Apache Druid

Druid
Original author(s)	Eric Tschetter; Fangjin Yang
Developer(s)	Apache Druid
Stable release	0.18.1 / 13 May 2020
Repository	Druid Repository
Written in	Java
Operating system	Cross-platform
Type	Distributed, real-time, column-oriented data store
License	Apache License 2.0
Website	druid.apache.org

Druid is a column-oriented, open-source, distributed data store written in Java. It is designed to ingest massive quantities of event data quickly and to serve low-latency queries over that data.[1] The name Druid comes from the shapeshifting Druid class in many role-playing games, reflecting the idea that the system's architecture can shift to solve different types of data problems.

Druid is commonly used in business intelligence/OLAP applications to analyze high volumes of real-time and historical data.[2] Druid is used in production by technology companies such as Alibaba,[2] Airbnb,[2] Cisco,[3] Deep.BI,[2][4] eBay,[5] Lyft,[6] Netflix,[7] PayPal,[2] Pinterest,[8] Twitter,[9] Walmart,[10] the Wikimedia Foundation,[11] and Yahoo.[12]

History

Druid was started in 2011 to power the analytics product of Metamarkets. The project was open-sourced under the GPL in October 2012,[13][14] and moved to the Apache License in February 2015.[15][16]

Over time, a number of organizations and companies have integrated Druid into their backend technology,[2] and committers from many of these organizations have joined the project.[17]

Architecture

Fully deployed, Druid runs as a cluster of specialized processes (called nodes in Druid) to support a fault-tolerant architecture[18] in which data is stored redundantly and there is no single point of failure.[19] The cluster relies on external dependencies for coordination (Apache ZooKeeper), metadata storage (e.g. MySQL, PostgreSQL, or Derby), and deep storage (e.g. HDFS or Amazon S3) for permanent data backup.

Query management

Client queries first hit broker nodes, which forward them to the appropriate data nodes (either historical or real-time). Since Druid segments may be partitioned, an incoming query can require data from multiple segments and partitions (or shards) stored on different nodes in the cluster. Brokers learn which nodes hold the required data, and they merge the partial results from those nodes before returning the aggregated result.
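The routing and merging described above can be illustrated with a small in-memory sketch. It is not Druid code: the segment names, node names, row data, and the functions query_node and broker_query are all hypothetical, and real data nodes are reached over the network rather than by a function call.

```python
from collections import defaultdict

# Hypothetical cluster state: which data node serves which segment.
SEGMENT_LOCATIONS = {
    "events_2020-05-01_shard0": "historical-1",
    "events_2020-05-01_shard1": "historical-2",
    "events_2020-05-02_shard0": "realtime-1",
}

# Hypothetical rows stored in each segment: (country, clicks).
SEGMENT_ROWS = {
    "events_2020-05-01_shard0": [("US", 3), ("DE", 1)],
    "events_2020-05-01_shard1": [("US", 2), ("FR", 4)],
    "events_2020-05-02_shard0": [("DE", 5), ("US", 1)],
}


def query_node(node, segments):
    """Stand-in for a data node: sum clicks per country over its segments."""
    partial = defaultdict(int)
    for segment in segments:
        for country, clicks in SEGMENT_ROWS[segment]:
            partial[country] += clicks
    return dict(partial)


def broker_query(segments):
    """Route a query to the nodes holding the segments, then merge results."""
    # Scatter: group the requested segments by the node that serves them.
    by_node = defaultdict(list)
    for segment in segments:
        by_node[SEGMENT_LOCATIONS[segment]].append(segment)

    # Gather: merge each node's partial sums into one aggregated result.
    merged = defaultdict(int)
    for node, node_segments in by_node.items():
        for country, clicks in query_node(node, node_segments).items():
            merged[country] += clicks
    return dict(merged)


result = broker_query(list(SEGMENT_LOCATIONS))
print(result)
```

Summing works here because a sum over all rows equals the sum of per-node sums; aggregates without that property (an average, for example) would need each node to return enough state, such as a sum and a count, for the merge to be correct.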

Cluster management

Operations relating to data management in historical nodes are overseen by coordinator nodes. Apache ZooKeeper is used to register all nodes, manage certain aspects of internode communications, and provide for leader elections.
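The article does not say how Druid carries out leader election through ZooKeeper. As a rough illustration only, the sketch below shows one common coordination pattern in which each live node registers under an increasing sequence number and the lowest surviving number leads. ToyRegistry is a hypothetical in-memory stand-in and does not use the ZooKeeper API; the node names are made up.

```python
import itertools


class ToyRegistry:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._sequence = itertools.count()
        self._members = {}  # sequence number -> node name

    def register(self, node_name):
        """Register a live node and return its sequence number."""
        seq = next(self._sequence)
        self._members[seq] = node_name
        return seq

    def unregister(self, seq):
        """Drop a node, as would happen when its session ends."""
        self._members.pop(seq, None)

    def leader(self):
        """The live node with the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


registry = ToyRegistry()
a = registry.register("coordinator-a")
b = registry.register("coordinator-b")
first_leader = registry.leader()   # coordinator-a registered first
registry.unregister(a)             # simulate coordinator-a going away
second_leader = registry.leader()  # leadership passes to coordinator-b
print(first_leader, second_leader)
```

Because leadership is derived from the set of registered nodes, a standby takes over as soon as the current leader's registration disappears, without any node having to be told explicitly.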

Features

Low-latency (streaming) data ingestion
Arbitrary slice and dice data exploration
Sub-second analytic queries
Approximate and exact computations
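The difference between approximate and exact computation can be shown with a distinct-count example. The article does not name the algorithms Druid uses, so the sketch below substitutes a simple K-minimum-values estimator as a stand-in; the function names, the parameter k, and the sample data are assumptions made for illustration.

```python
import hashlib
import heapq


def hash_to_unit(value):
    """Map a string to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def exact_distinct(values):
    """Exact count: memory grows with the number of distinct values."""
    return len(set(values))


def approx_distinct(values, k=256):
    """K-minimum-values estimate: memory is bounded by k hashes."""
    heap = []     # max-heap via negated hashes; holds the k smallest hashes
    kept = set()  # hashes currently in the heap, used to skip duplicates
    for value in values:
        h = hash_to_unit(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with this smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    # The k-th smallest hash sits near k / n, so n is about (k - 1) / hash.
    return int((k - 1) / -heap[0])


events = [f"user-{i % 5000}" for i in range(20000)]
exact = exact_distinct(events)
approx = approx_distinct(events)
print(exact, approx)
```

The estimator trades a small relative error (roughly 1/sqrt(k) for this technique) for fixed memory, which is the usual reason to prefer approximate aggregates on very large event streams.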

References

[1] Hemsoth, Nicole. "Druid Summons Strength in Real-Time". Datanami, 8 November 2012. Archived 2013-02-27 at the Wayback Machine.
[2] "Druid | Powered by Druid". druid.apache.org. Retrieved 2016-06-29.
[3] Butler, Brandon. "Under the hood of Cisco's Tetration Analytics platform". Retrieved 2016-06-23.
[4] "Real-time Stream Analytics and User Scoring Using Apache Druid, Flink & Cassandra at Deep.BI".
[5] "Druid at Pulsar" (eBay's column, CSDN blog channel). blog.csdn.net. Retrieved 2016-06-23.
[6] Malakar, Arup. "Streaming SQL and Druid". Retrieved 2020-01-29.
[7] "The Netflix Tech Blog: Announcing Suro: Backbone of Netflix's Data Pipeline". techblog.netflix.com. Retrieved 2016-06-23.
[8] "Pinterest: Powering Ad Analytics with Apache Druid". Retrieved 2020-01-29.
[9] "Interactive Analytics at MoPub: Querying Terabytes of Data in Seconds". blog.twitter.com. Retrieved 2020-01-29.
[10] Nayak, Amaresh (2018-02-23). "Event Stream Analytics at Walmart with Druid". Medium. Retrieved 2020-01-29.
[11] https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/60986
[12] "Complementing Hadoop at Yahoo: Interactive Analytics with Druid". Retrieved 2016-06-23.
[13] Tschetter, Eric. "Introducing Druid". druid.apache.org, 24 October 2012.
[14] Higginbotham, Stacey. "Metamarkets open sources Druid, its in-memory database". GigaOM, 24 October 2012.
[15] Harris, Derrick (2015-02-20). "The Druid real-time database moves to an Apache license". Retrieved 2015-08-04.
[16] "Druid Gets Open Source-ier Under the Apache License". Retrieved 2015-08-04.
[17] "Druid | Druid Community". druid.apache.org. Retrieved 2016-06-23.
[18] Druid Project Documentation.
[19] Yang, Fangjin; Tschetter, Eric; Léauté, Xavier; Ray, Nelson; Merlino, Gian; Ganguli, Deep. "Druid: A Real-time Analytical Data Store". Metamarkets. Retrieved 6 February 2014.

External links

Official website

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
