InfluxData, a time-series database platform provider that already has distinct cloud and open source/on-premises versions, is adding to the stew. It is announcing upgrades to the 2.0 generation of its open source platform, including some features borrowed from its cloud offering and a smattering of incremental updates, along with a new open source project that will extend InfluxData's reach to cloud object storage.
The open source platform now supports Flux, the scripting-oriented query language (paired with a visual, drag-and-drop query builder) initially introduced with the InfluxDB Cloud 2.0 platform last spring. It's very much a departure from the original InfluxQL, a more declarative, SQL-like language. It's also a departure from rivals like Timescale and Amazon Timestream, which have heavily embraced SQL.
InfluxData's rationale is that a scripting-oriented language, even one simplified with a drag-and-drop front end, is far more powerful for building analytic queries on time series data. Nonetheless, InfluxData is not pulling the plug on InfluxQL. Hold that thought.
Also part of this release are jumpstart templates: single-file monitoring configurations for common use cases such as network and IoT sensor monitoring. This reflects a strategy of meeting users where they live; if core use cases are what bring users in, InfluxData may as well make them easier.
And then there is the announcement of yet another new front in the war. InfluxData is unveiling IOx, which would extend the platform's reach to data stored in Parquet format in cloud object storage. As envisioned, IOx would embrace a modern cloud-native architecture by separating storage from compute. Storage would act as the durability layer, while query, ingest, and indexing servers run in stateless Kubernetes clusters, with a management layer on top to maintain state and keep the whole operation sane. It would build atop existing open source building blocks, including Apache Arrow Flight for marshalling data for high-speed transport.
Down the road, we would expect the project to develop APIs that could both subscribe to pub/sub feeds from Kafka and, as part of change data streaming, publish update feeds from the database. There will be different APIs so you can query IOx, or whatever InfluxData brands as the commercially supported version of this open source project, with the language of your choice. Also down the road, we wouldn't be surprised if the project got renamed, as a Google search for IOx today brings up Cisco's IoT application environment.
Our concern is that, for a modest-sized company, InfluxData is juggling multiple data platforms. For now, there is the classic InfluxData platform, which now has a 2.0 version; two generations of cloud platforms; and now a new project that adds yet another engine to extend InfluxData's reach to cloud object storage with a cloud-native architecture. That's a lot of targets to support for a vendor that is not the size of AWS or Microsoft. InfluxData claims it has the resources to push this forward, having not even dipped into the $60M Series D funding it secured back in February 2019. But it's not just about resources; it's also about the ability to focus, which gets challenging when there are many platforms or engines to juggle.
Keep in mind that the InfluxDB platform is arguably the most popular open source time series database, with the company estimating over 400,000 daily active instances in the wild. No wonder third parties are getting into the act, hosting their own InfluxDB clouds.
We'll start with our own confusion: there are two InfluxDB 2.0 platforms. When InfluxDB 2.0 was introduced to us a year ago, it was positioned as the cloud-native successor to InfluxDB 1.0, and we were told that the InfluxQL query language was being deprecated. We assumed that InfluxDB Cloud 2.0 would become the future design point for InfluxData's platforms. InfluxData characterizes the two 2.0-version platforms (cloud and open source) as "complementary," each optimized for different use cases. The open source platform is single-node, good for local and edge deployments, while the cloud is designed for scaling.
At the time, we observed that asking customers of a popular platform to jump to a new version with (at the time) limited backward compatibility was a riverboat gamble. Since then, the existing customer base has responded with, in effect, "Not so fast." InfluxData will keep InfluxQL alive through APIs.
Clearly, the extension of common APIs and support for Flux on the current platform are positive steps toward a bridging strategy. APIs are a popular way of making cloud data platforms extensible, and for developers and users, they will open InfluxData's platforms to different audiences. But in the cases of Azure Cosmos DB; Google Cloud Firestore and Spanner; and, with the new Stargate open source project, DataStax and Cassandra, the different APIs all expose the same underlying storage engine. In contrast, InfluxData is attempting to pull this off with multiple storage engines, in a manner reminiscent of MariaDB: it's clearly the more challenging road.
We would have preferred that InfluxData adopt a more active platform convergence strategy centered on a single target, InfluxDB Cloud 2.0, with the understanding that the paths would not converge overnight. And we would have liked the cloud platform to have been designed from the start for an extensible future, supporting time series data types beyond logs and metrics, with federated query to cloud storage in mind. We would have preferred that to devising new query and storage engines after the fact.