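You may have large volumes of JSON data produced by applications, and you may need to tidy it up: remove unwanted fields, keep only the fields you want, or simply query the data.
In that case, Alibaba Cloud Data Lake Analytics (DLA) may well be the most convenient cloud service currently available for the job. It takes only 3 steps to process massive amounts of JSON data, or to build a more complex ETL flow.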
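Use any available means to deliver the JSON data to OSS.
For log pipelines on the cloud, there is usually also a JSON-to-OSS delivery path; see the JSON part of the guide "Getting Started with Cloud-Native Log Data Analysis" (云原生日志数据分析上手指南).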
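The table used in this post expects each line of a data file to hold one JSON document. As a minimal sketch of preparing such a file before delivery, the Python snippet below serializes records to line-delimited JSON. The function name `to_json_lines` and the sample records are hypothetical, and the upload to OSS itself is not shown here.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line."""
    return "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )


# Hypothetical application records; real data would come from your own systems.
sample_records = [
    {"glossary": {"title": "example glossary"}},
    {"glossary": {"title": "another glossary"}},
]

payload = to_json_lines(sample_records)
print(payload, end="")
```

The resulting text can be written to a file and placed under the OSS path that the table points to.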
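The guide mentioned above, "Getting Started with Cloud-Native Log Data Analysis", already covers how to create partitioned tables over massive JSON data. This example uses a non-partitioned table instead. Assume that each line of a data file holds one JSON document, and that the JSON data is stored at the following OSS path: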
oss://your_bucket/json_data/...
Then create the table in DLA:
CREATE EXTERNAL TABLE simple_json (
  data STRING
)
STORED AS TEXTFILE
LOCATION 'oss://your_bucket/json_data/';
json_remove
Removes the data at the specified JSON path from a JSON document. It can process a single JSON path or several JSON paths in one call. Note: wildcard-style JSON path matching such as ".." is not supported yet; support is expected soon.
json_remove(json_string, json_path_string) -> json_string
json_remove(json_string, array[json_path_string]) -> json_string
Examples:
select json_remove(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  '$.glossary.GlossDiv') a;
-> {"glossary":{"title":"example glossary"}}

select json_remove(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title', '$.glossary.GlossDiv.title']) a;
-> {"glossary":{"GlossDiv":{"GlossList":{"GlossEntry":{"GlossTerm":"Standard Generalized Markup Language","GlossSee":"markup","SortAs":"SGML","GlossDef":{"para":"A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso":["GML","XML"]},"ID":"SGML","Acronym":"SGML","Abbrev":"ISO 8879:1986"}}}}}
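To make the removal semantics concrete, here is a rough Python emulation of what the examples above suggest. It is not DLA's implementation: it assumes plain dot-separated paths of the form `$.a.b.c` only, the helper `_split_path` is hypothetical, and the key order of the output may differ from what DLA returns.

```python
import json


def _split_path(path):
    """Split a simple '$.a.b.c' path into its keys (no wildcards, no arrays)."""
    if not path.startswith("$.") or ".." in path:
        raise ValueError("only simple '$.a.b' paths are handled: " + path)
    return path[2:].split(".")


def json_remove(json_string, paths):
    """Return the JSON text with the value at each given path removed."""
    if isinstance(paths, str):
        paths = [paths]
    doc = json.loads(json_string)
    for path in paths:
        keys = _split_path(path)
        node = doc
        # Walk down to the parent of the key that should be removed.
        for key in keys[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist, nothing to remove
        else:
            if isinstance(node, dict):
                node.pop(keys[-1], None)
    return json.dumps(doc, separators=(",", ":"))


sample = '{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S"}}}'
print(json_remove(sample, "$.glossary.GlossDiv"))
# -> {"glossary":{"title":"example glossary"}}
```

Paths that do not exist in the document are silently ignored in this sketch; whether DLA behaves the same way is not stated in the post.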
json_reserve
Keeps the data at the specified JSON path in a JSON document and removes everything else. It can process a single JSON path or several JSON paths in one call. Note: wildcard-style JSON path matching such as ".." is not supported yet; support is expected soon.
json_reserve(json_string, json_path_string) -> json_string
json_reserve(json_string, array[json_path_string]) -> json_string
Examples:
select json_reserve(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title']) a;
-> {"glossary":{"title":"example glossary"}}

select json_reserve(
  '{ "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } }',
  array['$.glossary.title', '$.glossary.GlossDiv.title', '$.glossary.GlossDiv.GlossList.GlossEntry.ID']) a;
-> {"glossary":{"title":"example glossary","GlossDiv":{"GlossList":{"GlossEntry":{"ID":"SGML"}},"title":"S"}}}
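The keep-only behavior can be sketched the same way in Python. Again, this is an emulation inferred from the examples above, not DLA's implementation: it assumes plain dot-separated paths of the form `$.a.b.c`, the helper `_split_path` is hypothetical, and output key order may differ from DLA's.

```python
import json


def _split_path(path):
    """Split a simple '$.a.b.c' path into its keys (no wildcards, no arrays)."""
    if not path.startswith("$.") or ".." in path:
        raise ValueError("only simple '$.a.b' paths are handled: " + path)
    return path[2:].split(".")


def json_reserve(json_string, paths):
    """Return JSON text holding only the values found at the given paths."""
    if isinstance(paths, str):
        paths = [paths]
    doc = json.loads(json_string)
    kept = {}
    for path in paths:
        keys = _split_path(path)
        node = doc
        # Resolve the value first, so missing paths leave no empty shells behind.
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:
            target = kept
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return json.dumps(kept, separators=(",", ":"))


sample = (
    '{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", '
    '"GlossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML"}}}}}'
)
print(json_reserve(sample, ["$.glossary.title"]))
# -> {"glossary":{"title":"example glossary"}}
```

Building the result from an empty object, rather than deleting everything that is not listed, keeps the sketch short and makes multiple paths under the same parent merge naturally.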
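You can also use the powerful cloud data processing capabilities of Data Lake Analytics to combine, process, and analyze data from multiple sources, and to write the results back to other databases and storage systems.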
This article is original content from the Yunqi Community (云栖社区) and may not be reproduced without permission.