Apache Impala

Developer(s)	Apache Software Foundation
Initial release	April 28, 2013
Stable release	3.3.0 / August 22, 2019[1]
Repository	Impala Repository
Written in	C++, Java
Operating system	Cross-platform
Type	Relational Hadoop-analytics
License	Apache License 2.0
Website	impala.apache.org

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.[2] Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.[3]

Description

Apache Impala is a query engine that runs on Apache Hadoop. The project was announced in October 2012 with a public beta test distribution[4][5] and became generally available in May 2013.[6]

Impala brings scalable parallel database technology to Hadoop, enabling users to run low-latency SQL queries against data stored in HDFS and Apache HBase without moving or transforming that data. Impala is integrated with Hadoop and uses the same file and data formats, metadata, security and resource management frameworks as MapReduce, Apache Hive, Apache Pig and other Hadoop software.
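The idea behind parallel query processing can be illustrated with a toy sketch. This is not Impala's implementation; it only assumes a general scatter-gather pattern in which each worker aggregates its own partition of the data and a coordinator merges the partial results. The data, the function names and the use of threads in place of cluster nodes are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one node.
PARTITIONS = [
    [("us", 3), ("de", 1)],
    [("us", 2), ("fr", 5)],
    [("de", 4)],
]


def partial_sum(rows):
    """Worker step: sum values per key, looking only at one partition."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def parallel_group_sum(partitions):
    """Coordinator step: run the workers in parallel, then merge their results."""
    merged = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(partial_sum, partitions):
            merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)
```

Calling parallel_group_sum(PARTITIONS) computes the equivalent of a SUM with GROUP BY over all partitions without first copying the rows into one place.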

Impala is promoted as a way for analysts and data scientists to perform analytics on data stored in Hadoop via SQL or business intelligence tools. As a result, large-scale data processing (via MapReduce) and interactive queries can be done on the same system using the same data and metadata, removing the need to migrate data sets into specialized systems or proprietary formats simply to perform analysis.

Features include:

Supports HDFS and Apache HBase storage
Reads Hadoop file formats, including text, LZO, SequenceFile, Avro, RCFile, and Parquet
Supports Hadoop security (Kerberos authentication)
Provides fine-grained, role-based authorization with Apache Sentry
Uses metadata, ODBC driver, and SQL syntax from Apache Hive
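Because Impala is queried with SQL, a client program typically opens a connection, sends a query string and reads back rows. The sketch below shows that flow in Python under loud assumptions: it uses the standard library's sqlite3 module as a stand-in, since a real Impala query needs a running cluster and an Impala-capable driver. The table name page_views, its columns and the helper run_query are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Execute one SQL statement on a DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


def demo():
    # Stand-in only: an in-memory SQLite database replaces the Impala cluster.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [("us", 3), ("de", 1), ("us", 2)],
    )
    rows = run_query(
        conn,
        "SELECT country, SUM(views) FROM page_views "
        "GROUP BY country ORDER BY country",
    )
    conn.close()
    return rows
```

With a real deployment, only the line that creates the connection would be expected to change, and the setup statements would be unnecessary because the data would already sit in Hadoop.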

In early 2013, a column-oriented file format called Parquet was announced for architectures including Impala.[7] In December 2013, Amazon Web Services announced support for Impala.[8] In early 2014, MapR added support for Impala.[9] In 2015, a columnar storage engine called Kudu was announced, which Cloudera proposed to donate to the Apache Software Foundation along with Impala.[10] Impala graduated to an Apache Top-Level Project (TLP) on 28 November 2017.[11]

References

[1] "3.3.0 release". Retrieved August 23, 2019.
[2] "Apache Impala". Retrieved September 15, 2017.
[3] Cade Metz (October 24, 2012). "Man Busts Out of Google, Rebuilds Top-Secret Query Machine". Wired Magazine. Retrieved October 10, 2016.
[4] Larry Dignan (October 24, 2012). "Cloudera aims to bring real-time queries to Hadoop, big data". Between the lines blog. ZDNet. Retrieved January 20, 2014.
[5] Andrew Brust (October 25, 2012). "Cloudera's Impala brings Hadoop to SQL and BI". ZDNet. Retrieved January 20, 2014.
[6] Marcel Kornacker, Justin Erickson (May 1, 2013). "Cloudera Impala 1.0: It's Here, It's Real, It's Already the Standard for SQL on Hadoop". Archived from the original on April 13, 2014. Retrieved April 10, 2014.
[7] "Parquet: Columnar Storage for Hadoop". Project web site. 2013. Retrieved January 20, 2014.
[8] "Announcing Support for Impala with Amazon Elastic MapReduce". Amazon.com. December 12, 2013. Retrieved January 20, 2014.
[9] "Impala for MapR". MapR.com. February 2, 2014. Retrieved April 10, 2014.
[10] David Ramel (November 18, 2015). "Cloudera to Donate Impala and Kudu Big Data Projects to Apache". Application Development Trends. Retrieved October 10, 2016.
[11] "The Apache Software Foundation Announces Apache® Impala™ as a Top-Level Project". November 28, 2017. Retrieved November 30, 2017.

External links

Apache Impala project website
Impala GitHub project source code

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
