The latest release of CockroachDB from Cockroach Labs focuses on making it easier for developers to deploy and manage distributed databases. The highlights include new SQL syntax, which can be entered on the command line, for controlling the latency and availability of data in multi-region deployments. The 21.1 release, announced for general availability today, is CockroachDB’s spring release; the company typically issues two major releases a year (spring and fall).
This is the direct follow-on to a promise that CEO Spencer Kimball made in a discussion with Big on Data bro Andrew Brust, in a post about how the company would use the $160 million Series E funding round that was announced back in January.
The back story to all this is that distributed transaction databases are inherently complex. It’s one thing to commit a transaction when you have a single master. In the cloud, there are several approaches for relational transaction databases, but most still rely on a single write master, even when compute and storage are separated to allow scale-out, as with Amazon Aurora and Microsoft Azure SQL Database Hyperscale. But over the past few years, we’ve seen new entrants emerge, such as Yugabyte and, more recently, TiDB (which we’ll cover in a future post), showing that global databases are no longer one-offs.
The challenge with global reads and writes is that they require special measures for maintaining ACID guarantees, not to mention forethought about how to lay out the data based on expected usage patterns.
For instance, if a database is deployed across two or three world regions, where do you actually persist the data? Do you partition it by region and keep local data there, or do you replicate all records worldwide? There are reasons for doing it either way, based on usage. If most writes or updates are likely to be confined to data that pertains to a specific region, then for performance reasons, that data should be locally partitioned and live in its own region; the same applies if countries in a region require data to reside within their national borders. Otherwise, if read and write patterns are likely to be global, the data should be replicated to some or all global regions.
In the new release, developers and DBAs can write SQL statements specifying, first, the clusters and regions in which a database is to operate. Then, in another SQL command, the developer specifies where, geographically, the data should reside. Finally, the developer specifies “survivability,” which means designating regions and clusters for disaster recovery. With the new feature, developers who know SQL will be able to lay out CockroachDB without having to learn special configuration statements.
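As a rough illustration of the three steps described above, the following Python sketch assembles the kind of SQL statements involved. The statement forms (ALTER DATABASE ... SET PRIMARY REGION, ADD REGION, SURVIVE ... FAILURE, and ALTER TABLE ... SET LOCALITY) follow our reading of CockroachDB's multi-region documentation and should be checked against the docs for your version. The helper function, the "bank" database, its table names, and the region names are all hypothetical, and actual region names depend on how the cluster's nodes were started.

```python
ALLOWED_LOCALITIES = ("GLOBAL", "REGIONAL BY ROW", "REGIONAL BY TABLE")


def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def multi_region_statements(database, regions, survive="ZONE", table_localities=None):
    """Return multi-region SQL statements as a list of strings (hypothetical helper).

    regions: region names; the first one is treated as the primary region.
    survive: "ZONE" or "REGION", the failure the database should tolerate.
    table_localities: mapping of table name to one of ALLOWED_LOCALITIES.
    """
    if not regions:
        raise ValueError("at least one region is required")
    if survive not in ("ZONE", "REGION"):
        raise ValueError("survive must be 'ZONE' or 'REGION'")
    # Assumption: tolerating the loss of a whole region needs three or more regions.
    if survive == "REGION" and len(regions) < 3:
        raise ValueError("surviving a region failure needs at least three regions")

    db = quote_ident(database)
    # Step 1: declare the regions the database operates in.
    stmts = [f"ALTER DATABASE {db} SET PRIMARY REGION {quote_ident(regions[0])};"]
    for region in regions[1:]:
        stmts.append(f"ALTER DATABASE {db} ADD REGION {quote_ident(region)};")
    # Step 2: declare the survival goal.
    stmts.append(f"ALTER DATABASE {db} SURVIVE {survive} FAILURE;")
    # Step 3: declare where each table's data should live.
    # Table names are unqualified, so the target database must be the current one.
    for table, locality in (table_localities or {}).items():
        if locality not in ALLOWED_LOCALITIES:
            raise ValueError(f"unknown locality: {locality}")
        stmts.append(f"ALTER TABLE {quote_ident(table)} SET LOCALITY {locality};")
    return stmts


# Example: regionally partitioned account rows, globally replicated reference data.
statements = multi_region_statements(
    "bank",
    ["us-east1", "us-west1", "europe-west1"],
    survive="REGION",
    table_localities={"accounts": "REGIONAL BY ROW", "currencies": "GLOBAL"},
)
print("\n".join(statements))
```

The two table localities in the example mirror the trade-off discussed earlier: row-level regional placement for data that is mostly read and written locally, and global placement for data that is read everywhere.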
A related developer-friendly feature is standardizing on common JSON formats for log data from CockroachDB, so it can readily be ingested into observability tools such as Datadog or New Relic. The new release also makes debugging and query optimization easier through EXPLAIN output generated by CockroachDB’s query optimizer, which developers can then use to fix or tweak queries.
Before CockroachDB and its inspiration, Google Cloud Spanner, were released, distributed databases were largely the domain of NoSQL key-value stores because of their more relaxed requirements for database consistency. But the spread of digital business, accelerated during the COVID pandemic, has spurred new apps and use cases demanding the ability to perform global reads and writes. As noted above, CockroachDB and Spanner no longer have this space to themselves. The new features in CockroachDB 21.1 are steps toward lowering the complexity barriers for developers. We’re waiting for the next shoe to drop, when Cockroach Labs turns these new SQL capabilities governing latency and availability into visual low-code/no-code tools.