This article is published on the NetEase Cloud Community with the authorization of its author, Yue Meng.
Visit the NetEase Cloud Community to learn more about NetEase's experience in technology, products, and operations.
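Real-time computing is becoming a trend, and essentially every offline (batch) computing task can also be done with real-time computing. Beyond the hard metrics of performance, latency, and throughput, I think ease of use should be a major direction for future development. Today's real-time engines, such as Storm, Flink, and Spark Streaming, are all programmed through APIs, which is not very convenient. A larger focus going forward should be SQL on streaming. I do not know Storm well, although some companies have already built SQL wrappers around it, so below I only discuss how far two popular open-source stream processing engines have gone in supporting SQL.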
Flink: SQL on Streaming Tables
code examples
// imports follow the Flink 1.2 package layout
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)

// read a DataStream from an external source
val ds: DataStream[(Long, String, Integer)] = env.addSource(...)

// register the DataStream under the name "Orders"
tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount)

// run a SQL query on the Table and retrieve the result as a new Table
val result = tableEnv.sql(
  "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")
Version 1.2 supports only SELECT, FROM, WHERE, and UNION. Aggregations and joins are not supported, so it still feels some distance away from real-world use.
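One way to see why aggregation is a bigger step than SELECT and WHERE: a filter plus projection can be evaluated one record at a time, while an aggregate has to carry state across records. The following is a minimal plain-Scala sketch of that difference, with no Flink dependency. The `Order` case class, the `QuerySketch` object, and its method names are hypothetical and only mirror the three fields registered for "Orders" above; this is not how Flink implements either operation.

```scala
// Hypothetical record type mirroring the 'user, 'product, 'amount fields.
final case class Order(user: Long, product: String, amount: Int)

object QuerySketch {
  // Rough equivalent of:
  //   SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'
  // Each record is handled on its own; nothing is remembered between records.
  def rubberProducts(orders: Iterator[Order]): Iterator[(String, Int)] =
    orders
      .filter(_.product.contains("Rubber"))
      .map(o => (o.product, o.amount))

  // Rough equivalent of SUM(amount) ... GROUP BY product on a stream:
  // a running total per product must be kept and updated for every record.
  // One snapshot of the totals is emitted after each incoming record.
  def runningTotals(orders: Iterator[Order]): Iterator[Map[String, Int]] =
    orders
      .scanLeft(Map.empty[String, Int]) { (totals, o) =>
        totals.updated(o.product, totals.getOrElse(o.product, 0) + o.amount)
      }
      .drop(1) // skip the initial empty state
}
```

The first method needs no memory beyond the current record, whereas the second grows with the number of distinct products. On an unbounded stream, that state is what an engine has to manage, which may explain why stateless clauses were supported first.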
Spark Structured Streaming code examples
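import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

// in spark-shell, `spark` already exists; the app name here is arbitrary
val spark = SparkSession.builder.appName("StructuredWordCount").getOrCreate()
import spark.implicits._ // needed for .as[String]

// read text files as a stream and split each line into words
val input = spark.readStream.text("file:///home/hadoop/data/")
val words = input.as[String].flatMap(_.split(" "))

// running count per word
val wordCounts = words.groupBy("value").count()

// print the complete result table to the console on every trigger
val query = wordCounts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()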
Only two output modes are implemented, and each has restrictions:
Append mode (default): This is the default mode, where only the new rows added to the result table since the last trigger will be outputted to the sink. This is only applicable to queries that do not have any aggregations (e.g. queries with only select, where, map, flatMap, filter, join, etc.).

Complete mode: The whole result table will be outputted to the sink. This is only applicable to queries that have aggregations.
Update mode is not supported at the time of writing.
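The difference between the two modes can be illustrated with a small plain-Scala simulation that has no Spark dependency. This is only a sketch of the semantics described above, not Spark's implementation: the `OutputModeSketch` object and its method names are hypothetical, and each inner `Seq` stands for the lines that arrive between two triggers.

```scala
object OutputModeSketch {
  // Append mode: each trigger emits only the rows added since the last
  // trigger. There is no aggregation, so emitted rows never change later.
  def appendMode(batches: Seq[Seq[String]]): Seq[Seq[String]] =
    batches.map(batch => batch.flatMap(_.split(" ").toSeq))

  // Complete mode: each trigger emits the whole result table of the
  // aggregation (here, word counts over everything seen so far).
  def completeMode(batches: Seq[Seq[String]]): Seq[Map[String, Long]] =
    batches
      .scanLeft(Map.empty[String, Long]) { (counts, batch) =>
        batch.flatMap(_.split(" ").toSeq).foldLeft(counts) { (acc, word) =>
          acc.updated(word, acc.getOrElse(word, 0L) + 1L)
        }
      }
      .tail // drop the initial empty table
}
```

For the input `Seq(Seq("a b", "a"), Seq("b c"))`, the second trigger in the append sketch emits only `b` and `c`, while the complete sketch re-emits the full table, including the unchanged count for `a`. An update mode would sit in between, emitting only the rows whose counts changed.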
Link: https://www.jianshu.com/p/9a9f8675bb3e