[Open Source] A Small Program: Implementing Table Lineage for DW Metadata


weixin_39929138 · published 2021-03-05 23:35:52

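As a data warehouse (DW) ingests more tables and builds more models on top of them, metadata management becomes increasingly important. Table lineage in the metadata is informally called "the relationships between tables". Good metadata management makes the relationships between every table and model clear and explicit.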

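Without a tool, these relationships can only be maintained by hand. Whenever a script changes and the manual update is missed or made late, the recorded relationships become inaccurate. A tool that analyzes table-to-table lineage shows exactly how each table relates to the others, even when there are hundreds or thousands of tables, so problems can be located and traced back to their source quickly.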

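In project XXX, I used Java and Hive to produce a table that records the relationships between tables. I am sharing the approach and the code here for discussion. The program certainly has plenty of room for improvement; if anything is wrong, corrections are welcome. Thanks.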

This article also presents one approach to parsing SQL.

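Approach:

1. Get the input path.
2. Read the contents of the text files (every file found under the path, including subdirectories).
3. Parse by rules:
   - Source tables: parsed mainly from the table naming convention.
   - Target tables: parsed from insert into table and insert overwrite table statements.
4. Write the parsing results to a txt file.
5. Upload the file to HDFS and load it into Hive.
6. In Hive, split the comma-separated table lists into one row per table.
7. Produce three fields: file path (filepath), source table (source_table), and target table (target_table).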
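The final (source_table, target_table) rows are what make the tracing described above possible: every upstream table of a given table can be found with a breadth-first walk over those rows. The sketch below is hypothetical (the class name `LineageTrace` and the sample table names are made up) and assumes the rows have already been read from the lineage table into memory.

```java
import java.util.*;

// Hypothetical sketch: find all upstream tables of a table from
// (source_table, target_table) rows like those in the final lineage table.
class LineageTrace {

    // edges: each element is {source_table, target_table}
    static Set<String> upstream(List<String[]> edges, String table) {
        // Index each target table to its direct source tables
        Map<String, List<String>> sourcesOf = new HashMap<>();
        for (String[] e : edges) {
            sourcesOf.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Breadth-first walk towards the sources; the visited set also stops cycles
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(table);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String src : sourcesOf.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(src)) {
                    queue.add(src);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
                new String[]{"ods_order", "dwd_order"},
                new String[]{"ods_user", "dwd_order"},
                new String[]{"dwd_order", "dws_order_day"});
        // prints [dwd_order, ods_order, ods_user]
        System.out.println(upstream(edges, "dws_order_day"));
    }
}
```

Walking the edges in the opposite direction (source to target) would answer the reverse question: which downstream tables are affected when a table changes.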

Main code:

1. Main code for getting the input path:


ArrayList<File> files = getListFiles("D:\\HQL\\xx"); // directory that holds the HQL files to parse


// imports: java.io.File, java.util.ArrayList
// Recursively collect every file under a path (a single file is returned as-is)
public static ArrayList<File> getListFiles(Object obj) {
    File directory;
    if (obj instanceof File) {
        directory = (File) obj;
    } else {
        directory = new File(obj.toString());
    }
    ArrayList<File> files = new ArrayList<>();
    if (directory.isFile()) {
        files.add(directory);
    } else if (directory.isDirectory()) {
        File[] fileArr = directory.listFiles();
        if (fileArr != null) { // listFiles() returns null if the directory cannot be read
            for (File fileOne : fileArr) {
                files.addAll(getListFiles(fileOne)); // recurse into subdirectories
            }
        }
    }
    return files;
}
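On Java 8 or later, the same recursive listing can also be written with `java.nio.file.Files.walk`. This is an alternative sketch, not the original code; the class name `ListFilesNio` is hypothetical. Unlike `getListFiles`, it returns `Path` objects and sorts them so the order is stable between runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical alternative to getListFiles using java.nio.file
class ListFilesNio {

    static List<Path> listFiles(Path root) throws IOException {
        // Files.walk visits root and everything below it; the stream must be closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile) // keep files, drop directories
                        .sorted()                     // deterministic order
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Root directory from the command line, defaulting to the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listFiles(root)) {
            System.out.println(p);
        }
    }
}
```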

2. Main code for reading the text files


String filePath = files.get(i).toString(); // path of a single file, inside a loop over files


// imports: java.io.*
// Read the whole content of a text file
public static String readFile(String filePath) throws IOException {
    StringBuffer sb = new StringBuffer();
    readToBuffer(sb, filePath);
    return sb.toString();
}

// Read a text file line by line into the buffer
public static void readToBuffer(StringBuffer buffer, String filePath) throws IOException {
    // try-with-resources closes the reader and the underlying stream, even on error
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(filePath)))) {
        String line = reader.readLine(); // read the first line
        while (line != null) {           // null means the end of the file
            buffer.append(line);         // append the line to the buffer
            buffer.append("\n");         // restore the line break
            line = reader.readLine();    // read the next line
        }
    }
}
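One refinement the original program does not include: HQL scripts often contain commented-out statements, and the regex rules in the next two steps would still pick up table names from them. A minimal sketch, assuming only `--` line comments need handling (block comments and `--` inside string literals are not handled); the class name `HqlComments` is hypothetical.

```java
// Hypothetical preprocessing step: drop "--" line comments before parsing
class HqlComments {

    static String stripLineComments(String hql) {
        // Remove "--" and everything after it up to the end of the line;
        // a "--" inside a quoted string would be removed too (limitation)
        return hql.replaceAll("--[^\\r\\n]*", "");
    }

    public static void main(String[] args) {
        String hql = "insert overwrite table dwd.order_detail\n"
                   + "-- select * from ods.old_orders\n"
                   + "select * from ods.orders";
        System.out.println(stripLineComments(hql));
    }
}
```

The result would then be passed to the source- and target-table parsers instead of the raw file content.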

3. Main code for parsing the source tables


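// imports: java.util.*, java.util.regex.*
public static String hqltoSourceTable(String hql) {
    // Distinct table names found in the HQL text, in order of appearance
    Set<String> tables = new LinkedHashSet<>();
    // Add your table naming rules here; "......" is a placeholder that must be replaced,
    // otherwise it matches any six characters. '.' is a regex wildcard, so "ods."
    // matches both "ods.xxx" and "ods_xxx".
    String pattern = "(ods.|dwd.|......)\\w+";
    Pattern r = Pattern.compile(pattern);
    // Lowercase, and collapse every run of whitespace into a single space
    Matcher m = r.matcher(hql.toLowerCase().replaceAll("\\s+", " "));
    while (m.find()) {
        tables.add(m.group(0));
    }
    // Comma-separated list, e.g. "ods.a,dwd.b"
    return String.join(",", tables);
}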
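To see what this rule returns, here is a self-contained copy with the `......` placeholder dropped, that is, assuming only the `ods` and `dwd` prefixes exist (an assumption for illustration; the class name is hypothetical). Note that the target table is matched too, because it follows the same naming convention. The final Hive step drops rows whose source equals the target, which only works if the source and target parsers emit table names in the same form.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Worked example of the source-table rule, assuming only ods/dwd prefixes
class SourceTableDemo {

    static String hqltoSourceTable(String hql) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = Pattern.compile("(ods.|dwd.)\\w+")
                .matcher(hql.toLowerCase().replaceAll("\\s+", " "));
        while (m.find()) {
            tables.add(m.group(0));
        }
        return String.join(",", tables);
    }

    public static void main(String[] args) {
        String hql = "INSERT OVERWRITE TABLE dwd.order_detail\n"
                   + "SELECT * FROM ods.orders o\n"
                   + "JOIN ods.users u ON o.uid = u.id";
        // prints dwd.order_detail,ods.orders,ods.users
        System.out.println(hqltoSourceTable(hql));
    }
}
```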

4. Main code for parsing the target tables


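// imports: java.util.*, java.util.regex.*
public static String hqltoTargetTable(String hql) {
    // Distinct target tables, in order of appearance
    Set<String> tables = new LinkedHashSet<>();
    // Target table: the name after "insert into table" or "insert overwrite table"
    String pattern = "(insert into table|insert overwrite table)\\s+((\\w+)\\.(\\w+)|(\\w+))";
    Pattern r = Pattern.compile(pattern);
    // Lowercase, and collapse every run of whitespace into a single space
    Matcher m = r.matcher(hql.toLowerCase().replaceAll("\\s+", " "));
    while (m.find()) {
        // Strip the keywords and the database names. Add your database-name rules here;
        // "....." is a placeholder that matches any five characters until replaced.
        tables.add(m.group(0)
                .replaceAll("insert into table|insert overwrite table|ods.|dwd.|.....", "")
                .trim());
    }
    return String.join(",", tables);
}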
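Removing keywords and database names with `replaceAll` depends on listing every database prefix. A hypothetical variant reads the name straight from the capture groups of the same pattern: group 2 is the whole reference (`db.table` or `table`), and group 4 is the table part of `db.table`. Keeping the database name makes the output directly comparable with the source-table output. The class name and the sample HQL are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: take the target table from the capture groups
class TargetTableDemo {

    private static final Pattern TARGET = Pattern.compile(
            "(insert into table|insert overwrite table)\\s+((\\w+)\\.(\\w+)|(\\w+))");

    // keepDatabase = true returns "db.table"; false returns only "table"
    static String hqltoTargetTable(String hql, boolean keepDatabase) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = TARGET.matcher(hql.toLowerCase().replaceAll("\\s+", " "));
        while (m.find()) {
            if (keepDatabase || m.group(4) == null) {
                tables.add(m.group(2)); // whole reference: db.table or table
            } else {
                tables.add(m.group(4)); // table part of db.table
            }
        }
        return String.join(",", tables);
    }

    public static void main(String[] args) {
        String hql = "INSERT OVERWRITE TABLE dwd.order_detail PARTITION (dt='2021-03-05')\n"
                   + "SELECT * FROM ods.orders;\n"
                   + "insert into table tmp_order select * from dwd.order_detail;";
        System.out.println(hqltoTargetTable(hql, true));  // prints dwd.order_detail,tmp_order
        System.out.println(hqltoTargetTable(hql, false)); // prints order_detail,tmp_order
    }
}
```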

5. Write the parsing results to a txt file


StringBuffer bf = new StringBuffer(); // holds the parsing results to be written out
FileUtil fileUtil = new FileUtil();
fileUtil.writerFile(path, "hqlToTable.txt", bf.toString());
System.out.println("succeed! The file path is " + path + "/hqlToTable.txt");


package com.xx.xx.hql;

import java.io.*;

public class FileUtil {

    // Read the whole file at path into a String
    public String readerFile(String path) throws IOException {
        File file = new File(path);
        if (!file.exists() || file.isDirectory())
            throw new FileNotFoundException(path);
        byte[] tempbytes = new byte[5120];
        int length;
        // Collect the raw bytes first, so a multi-byte character split across two reads decodes correctly
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (FileInputStream fin = new FileInputStream(file)) {
            // Read up to 5120 bytes at a time; length is the number of bytes actually read
            while ((length = fin.read(tempbytes)) != -1) {
                bytes.write(tempbytes, 0, length);
            }
        }
        return bytes.toString(); // decoded with the platform default charset
    }

    // Write fileContent to path/fileName, replacing any existing file
    public boolean writerFile(String path, String fileName, String fileContent) throws IOException {
        File dir = new File(path);
        if (!dir.isDirectory()) {
            dir.mkdirs(); // create the output directory if it does not exist
        }
        File file = new File(dir, fileName);
        // FileWriter truncates an existing file; try-with-resources flushes and closes it
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
            out.write(fileContent);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}
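The original does not show how `bf` is filled before `writerFile` is called. A minimal sketch, assuming one line per HQL file with three tab-separated columns (filepath, comma-separated source tables, comma-separated target tables). The tab delimiter and the class name `LineageTxt` are assumptions: the delimiter must match the `FIELDS TERMINATED BY` clause of the staging table `tmp_app_xxxxx_table_rel_a` (its DDL is not shown), and the column order must match that table's columns, because Hive reads the loaded file by position.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build the content of hqlToTable.txt, one line per HQL file
class LineageTxt {

    // One output line: filepath \t sources \t targets
    static String toLine(String filepath, String sources, String targets) {
        return filepath + "\t" + sources + "\t" + targets + "\n";
    }

    // parsed maps each file path to {comma-separated sources, comma-separated targets}
    static String build(Map<String, String[]> parsed) {
        StringBuilder bf = new StringBuilder();
        for (Map.Entry<String, String[]> e : parsed.entrySet()) {
            bf.append(toLine(e.getKey(), e.getValue()[0], e.getValue()[1]));
        }
        return bf.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> parsed = new LinkedHashMap<>();
        parsed.put("/hql/dwd_order_detail.sql",
                new String[]{"dwd.order_detail,ods.orders,ods.users", "dwd.order_detail"});
        System.out.print(build(parsed));
    }
}
```

The returned string is what would be passed as `fileContent` to `FileUtil.writerFile`.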

6. Main Hive code for splitting the table lists into rows


load data inpath '/tmp/hqlToTable.txt' into table tmp_app_xxxxx_table_rel_a;

insert overwrite table app_xxxxx_table_rel_a partition(dt='${hivevar:today}')
select
     source_table
    ,target_table
    ,filepath
    ,from_unixtime(unix_timestamp())
from
(
    SELECT distinct
         filepath
        ,trim(new_source_table) as source_table
        ,trim(new_target_table) as target_table
    FROM (
        SELECT filepath
              ,source_table
              ,target_table
        FROM tmp_app_xxxxx_table_rel_a   -- staging table that holds the parsed file data
        WHERE target_table IS NOT NULL
          AND target_table <> ''
          -- skip scripts whose path contains any of these words
          AND filepath NOT LIKE '%create%'
          AND filepath NOT LIKE '%bak%'
          AND filepath NOT LIKE '%test%'
          AND filepath NOT LIKE '%bkt%'
    ) t
    LATERAL VIEW explode(split(source_table, ',')) src_view AS new_source_table
    LATERAL VIEW explode(split(target_table, ',')) tgt_view AS new_target_table
) t
where trim(source_table) <> trim(target_table)
;
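For readers less familiar with `LATERAL VIEW explode`, this Java sketch mimics what the statement does to a single staging row: both comma-separated lists are split, every (source, target) combination becomes one row, duplicates are removed as `distinct` does, and rows whose source equals the target are dropped. It is an illustration only (the class name `ExplodeDemo` is hypothetical) and leaves out the `filepath` filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration of the Hive explode step for one staging row
class ExplodeDemo {

    static List<String[]> explode(String filepath, String sourceTables, String targetTables) {
        // Plays the role of DISTINCT; LinkedHashSet keeps first-seen order
        Set<List<String>> rows = new LinkedHashSet<>();
        for (String src : sourceTables.split(",")) {     // explode(split(source_table, ','))
            for (String tgt : targetTables.split(",")) { // explode(split(target_table, ','))
                String s = src.trim();
                String t = tgt.trim();
                if (!s.equals(t)) {                      // trim(source_table) <> trim(target_table)
                    rows.add(Arrays.asList(s, t, filepath));
                }
            }
        }
        List<String[]> result = new ArrayList<>();
        for (List<String> r : rows) {
            result.add(r.toArray(new String[0]));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints:
        // ods.orders | dwd.order_detail | /hql/dwd_order_detail.sql
        // ods.users | dwd.order_detail | /hql/dwd_order_detail.sql
        for (String[] r : explode("/hql/dwd_order_detail.sql",
                "dwd.order_detail,ods.orders,ods.users", "dwd.order_detail")) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```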
