TiDB
TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.[3] It is MySQL-compatible and provides horizontal scalability, strong consistency, and high availability. It is developed and supported primarily by PingCAP, Inc. and is licensed under Apache 2.0. TiDB drew its initial design inspiration from Google's Spanner[4] and F1[5] papers.[6]
Developer(s) | PingCAP Inc. |
---|---|
Initial release | October 15, 2017[1] |
Stable release | 4.0.1[2] / June 12, 2020 |
Repository | |
Written in | Go (TiDB), Rust (TiKV) |
Available in | English, Chinese |
Type | NewSQL |
License | Apache 2.0 |
Website | https://pingcap.com/ |
TiDB received an InfoWorld 2018 Bossie Award, which recognized it as one of the best open-source software projects for data storage and analytics.[7]
History
PingCAP Inc., a software company founded in April 2015, began developing TiDB after its founding. The company is the primary developer and maintainer of TiDB and the driver of its associated open-source communities. PingCAP is venture-backed; in September 2018 it announced a US$50 million Series C financing round.[8]
Main Features
Horizontal Scalability
TiDB can expand both SQL processing and storage capacity by adding new nodes. This makes scaling infrastructure capacity easier and more flexible than with traditional relational databases, which typically scale only vertically (by moving to larger machines).
MySQL Compatibility
To applications, TiDB presents itself as a MySQL 5.7 server, so users can continue to use existing MySQL client libraries.[9] Because TiDB’s SQL processing layer was built from scratch rather than forked from MySQL, its compatibility is not complete, and there are known behavioral differences between MySQL and TiDB.[10]
Distributed Transactions with Strong Consistency
TiDB internally shards tables into small, range-based chunks called "regions".[11] Each region defaults to approximately 100 MB in size, and TiDB uses two-phase commit internally so that transactions spanning multiple regions remain consistent.
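The routing and commit ideas in this paragraph can be illustrated with a small in-memory model. In the sketch below, each region owns a contiguous key range, a key is routed to its region by binary search over the regions' start keys, and a write that touches several regions is applied with a generic textbook two-phase commit: every region must first lock and stage its keys (prepare), and only if all succeed are the values made visible (commit). This is an assumption-laden illustration, not TiDB's implementation: `Region`, `RegionMap`, and `two_phase_commit` are hypothetical names, there is no networking, persistence, or timestamps, and TiDB's actual transaction protocol is derived from Google's Percolator and differs in detail.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical in-memory region owning the key range [start_key, end_key)."""
    region_id: int
    start_key: bytes
    end_key: bytes  # b"" means the range is unbounded above
    data: dict = field(default_factory=dict)
    locks: dict = field(default_factory=dict)  # key -> value staged by prepare()

    def prepare(self, writes: dict) -> bool:
        # Phase 1: refuse if any key is already locked by another transaction;
        # otherwise lock every key and stage its new value.
        if any(key in self.locks for key in writes):
            return False
        self.locks.update(writes)
        return True

    def commit(self, keys) -> None:
        # Phase 2: make the staged values visible and release the locks.
        for key in keys:
            self.data[key] = self.locks.pop(key)

    def abort(self, keys) -> None:
        # Roll back: drop the staged values without applying them.
        for key in keys:
            self.locks.pop(key, None)


class RegionMap:
    """Routes keys to regions; the regions must tile the key space starting at b""."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self.starts = [r.start_key for r in self.regions]

    def locate(self, key: bytes) -> Region:
        # The owning region is the last one whose start_key <= key.
        return self.regions[bisect.bisect_right(self.starts, key) - 1]


def two_phase_commit(region_map: RegionMap, writes: dict) -> bool:
    # Group the transaction's writes by the region that owns each key.
    groups = {}
    for key, value in writes.items():
        region = region_map.locate(key)
        groups.setdefault(region.region_id, (region, {}))[1][key] = value
    # Phase 1: every participating region must prepare successfully.
    prepared = []
    for region, region_writes in groups.values():
        if not region.prepare(region_writes):
            # One refusal aborts the transaction on every region already prepared.
            for done, done_writes in prepared:
                done.abort(done_writes.keys())
            return False
        prepared.append((region, region_writes))
    # Phase 2: all regions voted yes, so commit everywhere.
    for region, region_writes in prepared:
        region.commit(region_writes.keys())
    return True
```

For example, with two regions split at `b"m"`, a write to `b"apple"` and `b"zebra"` touches both regions and succeeds only if both prepare; if `b"zebra"`'s region refuses because of a conflicting lock, the already-staged write to `b"apple"` is rolled back, so neither region ends up with a partial transaction.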
Cloud Native
TiDB is designed to run in the cloud, which makes deployment, provisioning, operations, and maintenance flexible. Its storage layer, TiKV, was accepted into the Cloud Native Computing Foundation (CNCF) as a Sandbox project in August 2018.[12] The architecture of the TiDB platform also allows SQL processing and storage to be scaled independently of each other.
Minimize ETL for Analytics
TiDB can support both online transaction processing (OLTP) and online analytical processing (OLAP) workloads. Traditionally, a user might run transactions on MySQL and then extract, transform, and load (ETL) the data into a separate column store for analytical processing; with TiDB, this ETL step can largely be avoided.
High Availability
TiDB uses the Raft consensus algorithm[13] to replicate data safely across storage nodes in Raft groups, keeping it highly available. If a node fails, any Raft group whose leader was on that node automatically elects a new leader from its remaining replicas, and the TiDB cluster heals itself without manual intervention. Failure and self-healing operations are transparent to applications.
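Raft can elect a leader and commit writes only while a majority (a quorum) of a group's replicas is reachable, which is what bounds how many failures a group survives. The arithmetic below is a general property of majority-based consensus rather than TiDB code, and the function names are illustrative; with the three-replica setup commonly used for TiDB, a group tolerates the loss of one replica.

```python
def raft_quorum(replicas: int) -> int:
    """Number of votes (or acknowledgements) that form a majority of a Raft group."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while the rest still form a majority."""
    return replicas - raft_quorum(replicas)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} replicas: quorum {raft_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # 3 replicas: quorum 2, tolerates 1 failure(s)
    # 4 replicas: quorum 3, tolerates 1 failure(s)
    # 5 replicas: quorum 3, tolerates 2 failure(s)
```

The output also shows why odd group sizes are the usual choice: a fourth replica raises the quorum without letting the group survive any additional failure.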
Deployment Methods
Kubernetes with Operator
TiDB can be deployed in a Kubernetes-enabled cloud environment by using TiDB Operator.[14] An Operator is a method of packaging, deploying, and managing a Kubernetes application; the pattern is designed for running stateful workloads and was first introduced by CoreOS in 2016.[15] TiDB Operator[16] was originally developed by PingCAP and open-sourced in August 2018.[17] It can be used to deploy TiDB on a laptop,[18] on Google Cloud Platform’s Google Kubernetes Engine (GKE),[19] and on Amazon Web Services’ Elastic Container Service for Kubernetes (EKS).[20]
Tools
A set of open-source tools has been built around TiDB to help existing MySQL and MariaDB users with data replication and migration.
Syncer and Data Migration (DM)
Syncer is a tool that supports full data migration or incremental data replication from MySQL or MariaDB instances into a TiDB cluster.[23] Data Migration (DM), the second generation of Syncer, is suited to replicating data from already-sharded MySQL or MariaDB tables into TiDB.[24] A common use case of Syncer or DM is to connect MySQL or MariaDB tables to TiDB, treating TiDB almost as a replica, and then run analytical workloads directly on the TiDB cluster in near real time.
Lightning
Lightning is a tool for high-speed full import of a large MySQL dump into a new TiDB cluster; it is faster than replaying the dump by executing each SQL statement. It is used to quickly populate an initially empty TiDB cluster with a large amount of data, speeding up testing or production migration. The speedup comes from parsing the SQL statements into key-value pairs and then directly generating Sorted String Table (SST) files for RocksDB, the storage engine underlying TiKV.[25]
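The key step Lightning relies on, turning parsed rows into key-value pairs and putting them in key order so they can be written straight into sorted files, can be sketched as follows. The key and value layouts here are hypothetical simplifications loosely modeled on a "table ID plus row ID" key; TiDB's real encoding differs in detail. `encode_row_key`, `encode_row_value`, and `build_sorted_run` are illustrative names, and the sketch stops before writing any actual SST file.

```python
import struct


def encode_row_key(table_id: int, row_id: int) -> bytes:
    # Hypothetical, simplified layout: b"t" + table_id + b"_r" + row_id, each packed as a
    # big-endian unsigned 64-bit integer so that byte order matches numeric order.
    return b"t" + struct.pack(">Q", table_id) + b"_r" + struct.pack(">Q", row_id)


def encode_row_value(columns) -> bytes:
    # Hypothetical value encoding: column values joined by an ASCII unit separator.
    return "\x1f".join(str(c) for c in columns).encode()


def build_sorted_run(table_id: int, rows):
    """Convert (row_id, columns) tuples into key-value pairs in ascending key order.

    SST files store keys in sorted order, so pairs must be sorted before they are written.
    """
    pairs = [(encode_row_key(table_id, row_id), encode_row_value(cols)) for row_id, cols in rows]
    pairs.sort(key=lambda kv: kv[0])
    # Reject duplicate keys, since each key should appear only once in the output.
    for (prev, _), (cur, _) in zip(pairs, pairs[1:]):
        if prev == cur:
            raise ValueError(f"duplicate key {cur!r}")
    return pairs


# Rows as they might be parsed from INSERT statements in a dump, not in key order.
dump_rows = [(3, ("carol", 30)), (1, ("alice", 25)), (2, ("bob", 41))]
for key, value in build_sorted_run(42, dump_rows):
    print(key.hex(), value)
```

Sorting once in bulk and writing the result sequentially avoids the per-statement parsing, locking, and transaction overhead that replaying each `INSERT` through the SQL layer would incur, which is the source of the speedup described above.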
TiDB-Binlog
TiDB-Binlog is a tool that collects the logical changes made to a TiDB cluster. It provides incremental backup and replication, either between two TiDB clusters or from a TiDB cluster to another downstream platform.
It is similar in functionality to MySQL master-slave replication. The main difference is that, because TiDB is a distributed database, the binlogs generated by the individual TiDB instances must be merged and sorted by transaction commit time before they are consumed downstream.[26]
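Because each TiDB instance's binlog is already in commit order, combining them amounts to a k-way merge on the commit timestamp. The sketch below assumes finite, already-sorted Python lists; `BinlogEvent` and `merge_binlogs` are hypothetical names, and the event contents are made up. The real TiDB-Binlog, whose Pump and Drainer components collect and merge the logs, must also handle unbounded streams and decide when it is safe to emit an event, which this sketch ignores.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class BinlogEvent(NamedTuple):
    commit_ts: int  # commit timestamp of the transaction
    instance: str   # TiDB instance that produced the event
    change: str     # description of the logical change (illustrative)


def merge_binlogs(*streams: Iterable[BinlogEvent]) -> Iterator[BinlogEvent]:
    """Lazily merge per-instance streams, each already sorted by commit_ts, into one."""
    return heapq.merge(*streams, key=lambda event: event.commit_ts)


tidb_1 = [BinlogEvent(100, "tidb-1", "INSERT t1"), BinlogEvent(130, "tidb-1", "UPDATE t1")]
tidb_2 = [BinlogEvent(110, "tidb-2", "INSERT t2"), BinlogEvent(120, "tidb-2", "DELETE t3")]
for event in merge_binlogs(tidb_1, tidb_2):
    print(event.commit_ts, event.instance, event.change)
```

A heap-based merge holds only one pending event per stream at a time, so memory grows with the number of TiDB instances rather than with the volume of changes.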
Use Cases
Currently, TiDB is used by nearly 1,000 companies, including Shopee, BookMyShow, Xiaomi, Zhihu, Meituan-Dianping, iQiyi, Zhuan Zhuan, Mobike, Yiguo.com, and Yuanfudao.com.
References
- "1.0 GA release notes".
- "TiDB 4.0.1 Release Notes".
- "How TiDB combines OLTP and OLAP in a distributed database".
- "Spanner: Google's Globally-Distributed Database".
- "F1: A Distributed SQL Database That Scales".
- "TiDB Brings Distributed Scalability to SQL".
- "The best open source software for data storage and analytics".
- "TiDB developer PingCAP wants to expand in North America after raising $50M Series C".
- "Meet TiDB: An open source NewSQL database".
- "Compatibility with MySQL".
- "TiKV Architecture".
- "CNCF to Host TiKV in the Sandbox".
- "The Raft Consensus Algorithm".
- "Database Operators Bring Stateful Workloads to Kubernetes".
- "Introducing Operators: Putting Operational Knowledge into Software".
- "TiDB Operator GitHub repo".
- "Introducing the Kubernetes Operator for TiDB".
- "Deploy TiDB to Kubernetes on Your Laptop".
- "Deploy TiDB, a distributed MySQL compatible database, to Kubernetes on Google Cloud".
- "Deploy TiDB, a distributed MySQL compatible database, on Kubernetes via AWS EKS".
- "Ansible Playbook for TiDB".
- "How to Spin Up an HTAP Database in 5 Minutes With TiDB + TiSpark".
- "Syncer User Guide".
- "DM GitHub Repo".
- "Introducing TiDB Lightning".
- "TiDB-Binlog Architecture Evolution and Implementation Principles".