Pandas and SQL Data Analysis in Practice
https://study.163.com/course/courseMain.htm?courseId=1006383008&share=2&shareId=400000000398149
http://impala.apache.org/
Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.
Do BI-style Queries on Hadoop
Impala provides low latency and high concurrency for BI/analytic queries on Hadoop (not delivered by batch frameworks such as Apache Hive). Impala also scales linearly, even in multitenant environments.
Unify Your Infrastructure
Utilize the same file and data formats and metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication.
Implement Quickly
For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the wheel.
Count on Enterprise-class Security
Impala is integrated with native Hadoop security and Kerberos for authentication, and via the Sentry module, you can ensure that the right users and applications are authorized for the right data.
Retain Freedom from Lock-in
Impala is open source (Apache License).
Expand the Hadoop User-verse
With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata store from source through analysis.
Overview
Impala raises the bar for SQL query performance on Apache Hadoop while retaining a familiar user experience. With Impala, you can query data in real time, whether it is stored in HDFS or Apache HBase, using SELECT, JOIN, and aggregate functions. Furthermore, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries. (For that reason, Hive users can utilize Impala with little setup overhead.)
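To make the kind of query described here concrete, the following is a minimal sketch of a SELECT with a JOIN and aggregate functions, run through Python's DB-API. It is an illustration only: the `orders` and `customers` tables are hypothetical, and an in-memory `sqlite3` database stands in for an Impala cluster so that the example runs anywhere. Against a real deployment, the connection would instead come from an Impala client library (for example `impala.dbapi.connect` from the impyla package, with a host and port that depend on your cluster).

```python
import sqlite3

# A query that uses only SELECT, JOIN, and aggregate functions.
# Table and column names are hypothetical, chosen for illustration.
QUERY = """
SELECT c.region, COUNT(*) AS orders, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""


def run_query(conn, sql):
    """Run sql on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


def make_demo_connection():
    """Build a tiny in-memory database as a stand-in for a cluster."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER, region TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'east'), (2, 'west');
        INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
        """
    )
    return conn


rows = run_query(make_demo_connection(), QUERY)
print(rows)  # [('east', 2, 15.0), ('west', 1, 7.5)]
```

Because `run_query` depends only on the DB-API cursor interface, the same function should work unchanged with any client library that follows that interface.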
To avoid latency, Impala circumvents MapReduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs. The result is order-of-magnitude faster performance than Hive, depending on the type of query and configuration.
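There are many advantages to this approach over alternative approaches for querying Hadoop data, including:
Thanks to local processing on data nodes, network bottlenecks are avoided.
A single, open, and unified metadata store can be utilized.
Costly data format conversion is unnecessary, so no overhead is incurred.
All data is immediately queryable, with no delays for ETL.
All hardware is utilized for Impala queries as well as for MapReduce.
Only a single machine pool is needed to scale.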
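The first advantage in the list, local processing on data nodes, can be pictured with a toy model. The sketch below is not Impala's engine; it only assumes a common pattern in parallel query engines, where each node aggregates its own rows and sends a small partial result to a coordinator, so raw rows never cross the network. The function names and sample data are hypothetical.

```python
from collections import Counter


def local_aggregate(rows):
    """Runs on one data node: sum amounts per key over local rows only."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def merge_partials(partials):
    """Runs on the coordinator: combine the per-node partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values per key
    return dict(total)


# Hypothetical rows held by two data nodes: (region, amount).
node_data = [
    [("east", 10), ("west", 4), ("east", 1)],
    [("west", 6), ("east", 3)],
]

partials = [local_aggregate(rows) for rows in node_data]
result = merge_partials(partials)
print(result)  # {'east': 14, 'west': 10}
```

In this toy run, five raw rows are reduced to four partial entries before anything is merged; with realistic data volumes the reduction would typically be far larger.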