Who They Are
PingCAP is a database software company founded in Beijing in 2015. It quickly established itself as one of the leading providers of the emerging application architecture that Gartner, Inc. calls Hybrid Transactional/Analytical Processing (HTAP). Its main products are TiDB, an open-source distributed HTAP database, and TiDB Cloud, a fully managed service built on TiDB. Both are designed to serve online transactions and analytics from the same system.
PingCAP claims that TiDB has been adopted by approximately 1,500 companies worldwide. Examples include the video-sharing platform Dailymotion, the financial services company Square, and the e-commerce platform Shopee.
TiDB and TiDB Cloud
PingCAP’s main offering, TiDB (Titanium Database), handles both online transactional processing (OLTP) and online analytical processing (OLAP) in the same database. Because analytical queries run against live transactional data rather than a copy loaded through a separate ETL pipeline, analytics can be closer to real time than in architectures that keep the two workloads in separate systems.
PingCAP launched TiDB Cloud, described as “TiDB as a Service,” in June 2020; it is available on Amazon Web Services and Google Cloud. TiDB Cloud isn’t currently offered to Microsoft Azure customers, although Azure customers already have HTAP functionality available through Azure Synapse Link.
TiDB is an open-source NewSQL database released under the Apache 2.0 license. NewSQL refers to a class of relational database management systems that aim to combine the horizontal scalability of NoSQL databases with the ACID guarantees of traditional database systems for OLTP workloads.
TiDB speaks the MySQL wire protocol, so existing applications can connect to it with standard MySQL connectors and drivers. For the most part, this also means that SQL written for MySQL works unchanged.
On the technical side, however, there are some key differences between TiDB and a plain MySQL deployment. Teams whose architectures currently rely on MySQL with read replicas (for example, Amazon Relational Database Service, or RDS) will find that TiDB works somewhat differently.
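Because only the connection settings change, pointing an application at TiDB can be as small as the sketch below. It assumes the third-party PyMySQL driver and TiDB's default SQL port of 4000; the host, user, and database names in the usage note are hypothetical placeholders.

```python
from typing import Any, Dict

TIDB_DEFAULT_PORT = 4000  # TiDB accepts MySQL-protocol clients on port 4000 by default


def connection_params(host: str, user: str, database: str,
                      password: str = "", port: int = TIDB_DEFAULT_PORT) -> Dict[str, Any]:
    """Build keyword arguments accepted by PyMySQL's connect()."""
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": database,
    }


def connect(params: Dict[str, Any]):
    """Open a connection with an ordinary MySQL driver (assumes `pip install pymysql`)."""
    import pymysql  # imported lazily so connection_params() works without the driver

    return pymysql.connect(**params)
```

In use, `conn = connect(connection_params("tidb.example.internal", "app_user", "shop"))` returns a regular DB-API connection, and queries run through `conn.cursor()` exactly as they would against MySQL.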
Scaling and Query Distribution
TiDB natively distributes both query execution and storage, whereas MySQL deployments generally scale through replication. A common MySQL pattern is a single master database with multiple subordinate databases, each of which stores a complete copy of the data. This works well for read-heavy workloads, because reads can be spread across the subordinates. Write-heavy workloads are a problem, however: every write must go through the single master and then be replayed on every subordinate so that each one keeps a complete, exact copy of the data.
TiDB takes a different approach: queries are handled by a layer of TiDB servers, and as demand grows, more TiDB servers can be added to this processing layer. This fits well with Kubernetes ReplicaSets because TiDB servers are stateless; the storage layer is responsible for all data persistence. That storage layer is made up of TiKV servers, which together form a distributed, transactional key-value store.
TiDB automatically splits each table’s data into small chunks, called Regions, and distributes them among the TiKV servers. By default, three replicas of each Region are kept in the TiKV cluster, but no single server needs a full copy of the data. The replicas of a Region are kept consistent with the Raft consensus protocol, with one replica acting as the leader and the others as followers. In other words, each TiKV server is both a master and a subordinate: it is the leader for some Regions and a follower for others.
Queries and transactions that span multiple shards are coordinated with the help of the Placement Driver (PD), a management component present in every TiDB cluster, which tracks where each shard lives and issues the timestamps used to order transactions. These operations are ACID compliant, and transactions that modify data across multiple shards use a two-phase commit. In this sense, a TiDB server behaves much like a proxy that translates SQL into batches of key-value requests, which it then sends to the TiKV servers. Tables are stored in TiKV with range-based partitioning: each Region holds a contiguous range of keys and is automatically split or merged to stay near a configurable target size (96MB by default at the time of writing). PD keeps track of where each range is located and handles rebalancing as needed.
The advantage is that processing and storage scale independently. In a MySQL replica setup, one of these resources usually becomes the bottleneck before the other, yet each new replica adds both compute and a full copy of the storage. TiDB also scales each layer incrementally, one server at a time, which allows much more efficient use of hardware.
The TiKV servers use the RocksDB storage engine, which is built on a log-structured merge-tree (LSM-tree). Because it writes data as sorted, immutable, compressed files instead of updating pages in place, RocksDB can typically compress large datasets more effectively than traditional storage engines built on B+trees, and it can sustain insert performance even when indexes no longer fit in memory. Beyond this, TiDB exposes an interface that allows other storage engines to be used.
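The read/write asymmetry of master/subordinate replication can be made concrete with a toy model. This is a hypothetical sketch, not MySQL's actual replication code: it replicates synchronously for simplicity (real MySQL replication is usually asynchronous) and routes reads round-robin across subordinates.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    reads: int = 0
    writes_applied: int = 0


class PrimaryReplicaCluster:
    """Toy model of MySQL-style scaling: one master, N full-copy subordinates."""

    def __init__(self, replica_count: int) -> None:
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self._next_replica = cycle(self.replicas)

    def write(self, key: str, value: str) -> None:
        # Every write lands on the single master...
        self.primary.data[key] = value
        self.primary.writes_applied += 1
        # ...and is replayed on every subordinate (synchronously, as a simplification).
        for replica in self.replicas:
            replica.data[key] = value
            replica.writes_applied += 1

    def read(self, key: str) -> str:
        # Reads are spread round-robin, so adding replicas divides the read load.
        replica = next(self._next_replica)
        replica.reads += 1
        return replica.data[key]
```

With three subordinates, six reads are split two per node, but every subordinate still applies all six writes; adding replicas divides reads, never writes.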
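Statelessness is what makes the SQL layer easy to grow: a newly added server can answer queries immediately because all data lives in the shared storage layer. The sketch below models that idea with hypothetical `KVStore`, `SqlServer`, and `ComputeLayer` classes standing in for TiKV and TiDB; it is an illustration, not TiDB's internals.

```python
from typing import Dict, List, Optional


class KVStore:
    """Stand-in for the TiKV storage layer: the only place data is persisted."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class SqlServer:
    """Stand-in for a TiDB server: keeps no data, only a handle to storage."""

    def __init__(self, name: str, storage: KVStore) -> None:
        self.name = name
        self.storage = storage
        self.handled = 0

    def execute(self, op: str, key: str, value: str = "") -> Optional[str]:
        self.handled += 1
        if op == "put":
            self.storage.put(key, value)
            return None
        return self.storage.get(key)


class ComputeLayer:
    """Round-robin load balancer over interchangeable, stateless servers."""

    def __init__(self, storage: KVStore, size: int) -> None:
        self.storage = storage
        self.servers: List[SqlServer] = []
        self._counter = 0
        for _ in range(size):
            self.add_server()

    def add_server(self) -> SqlServer:
        # Scaling out copies no data: the new server just points at shared storage.
        server = SqlServer(f"tidb-{len(self.servers)}", self.storage)
        self.servers.append(server)
        return server

    def route(self, op: str, key: str, value: str = "") -> Optional[str]:
        server = self.servers[self._counter % len(self.servers)]
        self._counter += 1
        return server.execute(op, key, value)
```

A row written through one server is readable from a server added afterwards, which is the property that lets replica-based schedulers treat SQL nodes as interchangeable.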
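The leader/follower arrangement can be sketched as a placement table. This is a simplification under stated assumptions: real placement is decided by PD from load and capacity, whereas the hypothetical `place_regions` below just rotates three replicas across stores, treating the first replica of each Region as its Raft leader.

```python
from typing import Dict, List

REPLICAS = 3  # TiKV keeps three replicas of each Region by default


def place_regions(region_count: int, store_count: int,
                  replicas: int = REPLICAS) -> Dict[int, List[int]]:
    """Return region_id -> store ids; the first store listed hosts the leader."""
    if replicas > store_count:
        raise ValueError("need at least as many stores as replicas")
    placement: Dict[int, List[int]] = {}
    for region in range(region_count):
        # Rotate the starting store so leaders and followers spread evenly.
        placement[region] = [(region + i) % store_count for i in range(replicas)]
    return placement


def roles_per_store(placement: Dict[int, List[int]],
                    store_count: int) -> Dict[int, Dict[str, int]]:
    """Count how many Regions each store leads and follows."""
    roles = {s: {"leader": 0, "follower": 0} for s in range(store_count)}
    for stores in placement.values():
        roles[stores[0]]["leader"] += 1
        for s in stores[1:]:
            roles[s]["follower"] += 1
    return roles
```

With 10 Regions on 5 stores, each store leads 2 Regions, follows 4, and holds 6 of the 10 Regions, so no store has the full data set.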
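Range-based partitioning boils down to two operations: find the Region whose key range contains a key, and split a Region when it grows past the target size. The hypothetical `RangeRouter` below folds the PD directory and the storage into one class and uses byte counts far smaller than the real default; it omits replication and two-phase commit entirely.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    start_key: str  # inclusive lower bound; "" means the start of the key space
    rows: Dict[str, int] = field(default_factory=dict)  # key -> size in bytes

    def size(self) -> int:
        return sum(self.rows.values())


class RangeRouter:
    """Toy directory of contiguous key ranges, kept sorted by start key."""

    def __init__(self, max_region_size: int) -> None:
        self.max_region_size = max_region_size
        self.regions: List[Region] = [Region("")]  # one Region covers every key

    def _index_of(self, key: str) -> int:
        starts = [r.start_key for r in self.regions]
        # The owning Region is the last one whose start key is <= key.
        return bisect.bisect_right(starts, key) - 1

    def locate(self, key: str) -> Region:
        return self.regions[self._index_of(key)]

    def put(self, key: str, size: int) -> None:
        idx = self._index_of(key)
        region = self.regions[idx]
        region.rows[key] = size
        if region.size() > self.max_region_size:
            self._split(idx)

    def _split(self, idx: int) -> None:
        region = self.regions[idx]
        keys = sorted(region.rows)
        if len(keys) < 2:
            return  # a single oversized row cannot be split further
        mid = keys[len(keys) // 2]  # split at the median key (one split per put)
        right = Region(mid, {k: v for k, v in region.rows.items() if k >= mid})
        region.rows = {k: v for k, v in region.rows.items() if k < mid}
        self.regions.insert(idx + 1, right)
```

Writing ten 30-byte rows with a 100-byte limit leaves five Regions starting at "", "c", "e", "g", and "i", each holding two rows; a lookup for "f" resolves to the Region starting at "e".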
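Independent scaling turns capacity planning into two separate calculations. The sketch below uses hypothetical per-node throughput and disk figures (not TiDB benchmarks) to show that growing the data set changes only the storage tier.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    peak_qps: float  # queries per second the SQL layer must serve
    data_gb: float   # logical data size before replication


def compute_nodes(w: Workload, qps_per_node: float) -> int:
    """Stateless SQL servers needed to absorb peak query load."""
    return max(1, math.ceil(w.peak_qps / qps_per_node))


def storage_nodes(w: Workload, usable_gb_per_node: float, replicas: int = 3) -> int:
    """Storage servers needed to hold every replica of every Region."""
    # Raw capacity scales with the replica count, and there must be at least
    # as many nodes as replicas so that copies land on different servers.
    return max(replicas, math.ceil(w.data_gb * replicas / usable_gb_per_node))
```

With 50,000 QPS at 10,000 QPS per node and 2,000 GB at 1,000 GB per node, this gives 5 compute and 6 storage nodes; doubling the data to 4,000 GB raises storage to 12 nodes while compute stays at 5.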
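The LSM-tree idea behind RocksDB can be shown in miniature: writes go to an in-memory table, which is flushed as a sorted, immutable run, and reads check the newest data first. This hypothetical `TinyLSM` leaves out everything that makes RocksDB production-grade (write-ahead log, levels, bloom filters, compression, deletes).

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Minimal log-structured merge-tree: a mutable memtable plus sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        self.runs: List[List[Tuple[str, str]]] = []  # oldest first, newest last

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; nothing is updated in place on disk.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable as one sorted, immutable run (an SSTable in RocksDB).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Search runs newest-first so later writes shadow older ones.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one, keeping only the newest value for each key.
        merged: Dict[str, str] = {}
        for run in self.runs:  # oldest to newest, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Sequential, append-only runs are what make bulk compression and steady insert rates possible; compaction pays the cost of reorganizing data later, in the background.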
TiDB’s metrics are collected by Prometheus and visualized with Grafana, a popular monitoring stack used by many operations teams. Insights are available out of the box, since Grafana immediately charts the metrics gathered from TiDB. There is also a wide range of configuration options for setting alerts and tracking KPIs.
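Because the metrics live in Prometheus, they can also be pulled programmatically through Prometheus's HTTP API (`GET /api/v1/query`). The sketch below uses only the Python standard library; the Prometheus address and the metric name `tidb_server_query_total` are assumptions, so check the names your cluster actually exports before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(prometheus: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL."""
    return f"{prometheus.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(prometheus: str, promql: str, timeout: float = 5.0) -> list:
    """Run a PromQL query and return the `data.result` list from the response."""
    with urlopen(instant_query_url(prometheus, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]


# Assumed metric name for illustration: total queries handled by TiDB servers.
QPS_QUERY = "sum(rate(tidb_server_query_total[1m]))"
```

For example, `instant_query("http://localhost:9090", QPS_QUERY)` would return the current cluster-wide query rate as a list of samples, which a script could compare against a KPI threshold.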
PingCAP customers rated TiDB highly for cost-effectiveness and ease of use. Its MySQL compatibility appears to be a major selling point for many of them, along with the scalability and risk mitigation built into the product.
Customers did, however, raise a few complaints. Chief among them is the complexity of many of TiDB’s components. Some expertise is required when integrating new features, and while the open-source community is a tremendous help in this respect, teams without a strong technical background may need to invest considerable time and effort. Other users reported that data synchronization can be buggy, an issue PingCAP may want to address going forward. DDL handling could also be improved, as users currently cannot run multiple DDL changes against the same table at the same time.
PingCAP’s TiDB appears to offer a cost-effective, easy-to-use solution for those seeking a distributed relational database. Being open source and MySQL compatible makes TiDB even more attractive, and its HTAP functionality makes it a strong candidate for organizations with large volumes of data that require real-time analytics and consistent performance.