Following an almost 2-year preview, AWS’s time series database, Amazon Timestream, is now generally available. Timestream is a serverless, purpose-built database that exposes time series data through SQL.
AWS is tapping into a segment that wasn’t exactly born overnight but that, until recently, was mostly populated by niche open source platforms or relational databases with SQL extensions for capabilities such as time period definitions, temporal primary keys, and syntax for time-slicing tables. The explosion in data volumes has driven the emergence of databases designed specifically for time series.
It’s the use cases, stupid. Interest in purpose-built time series databases has been stirred by the use cases themselves, most of which involve data that lives in the cloud and often arrives in unpredictable torrents. Gauging product demand in real time, analyzing clickstream data, managing smart utility grids, monitoring IT infrastructure, tracking commodity prices and capital markets, and optimizing supply chains in real time are among the use cases driving demand for fit-for-purpose time series platforms engineered for the cloud. That demand has spurred open source and quasi open source platforms like InfluxDB Cloud and Timescale Cloud, and into the mix now jumps Amazon Timestream.
Time series data stresses the design parameters of most SQL and NoSQL platforms. Among the sticking points are handling and partitioning sliding time windows, treating both numeric values (e.g., a meter reading) and alphanumeric text (e.g., “STATUS: OK”) as first-class entities, and automating data lifecycle management so that aging data does not clog the high-performance tiers designed for landing real-time data feeds.
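To make the lifecycle point concrete, here is a minimal Python sketch of age-based tiering. It is a toy model under stated assumptions, not a description of how any particular database implements it: the names `Point`, `TieredStore`, `hot_retention_s` and `apply_retention` are all hypothetical, and the "tiers" are just in-process lists. It also shows a numeric and a text value carried by the same record type.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Point:
    """One time series observation; the value may be numeric or text."""
    timestamp: float              # seconds since an arbitrary epoch
    measure: str
    value: Union[float, str]


@dataclass
class TieredStore:
    """Toy two-tier store: recent points stay hot, older ones move to cold."""
    hot_retention_s: float
    hot: List[Point] = field(default_factory=list)
    cold: List[Point] = field(default_factory=list)

    def ingest(self, point: Point) -> None:
        # New data always lands in the hot tier.
        self.hot.append(point)

    def apply_retention(self, now: float) -> int:
        """Move points older than the hot retention window to the cold tier.

        Returns the number of points moved.
        """
        cutoff = now - self.hot_retention_s
        expired = [p for p in self.hot if p.timestamp < cutoff]
        self.hot = [p for p in self.hot if p.timestamp >= cutoff]
        self.cold.extend(expired)
        return len(expired)


# Hypothetical usage: a one-hour hot window, evaluated at t = 4000 seconds.
store = TieredStore(hot_retention_s=3600)
store.ingest(Point(timestamp=0.0, measure="meter_reading", value=42.5))
store.ingest(Point(timestamp=3000.0, measure="status", value="STATUS: OK"))
moved = store.apply_retention(now=4000.0)
```

In a real system the retention pass would run automatically and the cold tier would be cheaper, slower storage; the point of the sketch is only that the hot tier stays bounded by age rather than by manual cleanup.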
As noted, Timestream is a database platform that AWS designed from the ground up. The SQL interface and multi-AZ automatic replication might conjure up similarities to Amazon Aurora, while the serverless architecture might make it look like a clone of DynamoDB. But Timestream is its own creature. It is serverless, with the ability to autoscale to ingest trillions of events. It can automatically tier data from a durable in-memory store to magnetic storage. Unlike DynamoDB, Timestream is not exclusively an operational database; it is also designed to handle complex analytic queries that, with SQL support, can include complex joins across tables or time slice partitions. Timestream’s SQL support also includes time series functions for approximation and interpolation.
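For readers unfamiliar with what interpolation means for time series, the following Python sketch fills gaps between known readings with straight-line estimates. It illustrates the general technique only and is not Timestream’s SQL syntax or implementation; the function name `fill_gaps_linear` and the sample data are hypothetical, and the input is assumed to be sorted by timestamp.

```python
from typing import List, Optional, Tuple

# (timestamp, value) pairs; a value of None marks a missing reading.
Series = List[Tuple[int, Optional[float]]]


def fill_gaps_linear(series: Series) -> List[Tuple[int, float]]:
    """Fill gaps that lie between two known readings by linear interpolation.

    Assumes `series` is sorted by timestamp. Gaps before the first or after
    the last known reading cannot be interpolated and are dropped.
    """
    known = [(t, v) for t, v in series if v is not None]
    filled: List[Tuple[int, float]] = []
    for t, v in series:
        if v is not None:
            filled.append((t, v))
            continue
        before = [(kt, kv) for kt, kv in known if kt < t]
        after = [(kt, kv) for kt, kv in known if kt > t]
        if not before or not after:
            continue  # nothing to interpolate between
        (t0, v0), (t1, v1) = before[-1], after[0]
        # Straight line between the nearest known neighbours.
        filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return filled


# Hypothetical meter readings with two missing samples.
readings = [(0, 10.0), (10, None), (20, None), (30, 40.0)]
result = fill_gaps_linear(readings)
```

Here the two missing samples are estimated from the readings at timestamps 0 and 30. A database that offers interpolation as a SQL function does this work inside the query, so clients do not have to pull raw data out to patch gaps.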
Not surprisingly, the first connectors coming to Timestream at launch are focused on ingesting streaming and IoT data. They include a connector for Amazon Kinesis Data Analytics (KDA) for Apache Flink. The KDA Flink adapter can be used with Amazon Kinesis, Amazon MSK and Apache Kafka. Additionally, for DevOps monitoring data coming from IT infrastructure and similar sources, Timestream supports the open source Telegraf agent and, soon, the Prometheus time series database for system data (the two are often used together). Specifically, Timestream has a connector to pull data in from Telegraf; after release, AWS promises a bidirectional (read and write) connector to Prometheus.
As the data is exposed as relational, not surprisingly, there is a JDBC interface for SQL clients, which in turn lets popular BI tools such as Amazon QuickSight hook in for visualization. There is also support for open source Grafana. On the machine learning side, there is an interface to Amazon SageMaker for developing predictive models. Amazon Timestream customers will be able to interact with the database through the console, analytic apps, and the SDKs that AWS supports for languages such as Python, Go, Node.js and others. Additionally, the SDKs and connectors can be used for developing analytics and models in Jupyter notebooks.
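As a rough sketch of what the Python SDK route can look like, the snippet below builds records in the shape accepted by the `write_records` call on the boto3 `timestream-write` client, with one numeric and one text measure. The helper names `to_record` and `write`, the `device_id` dimension, and the `demo_db` and `demo_table` names are hypothetical; actually sending the records assumes boto3 is installed, AWS credentials are configured, and the database and table already exist.

```python
from typing import Dict, List, Union


def to_record(device_id: str, measure: str, value: Union[float, str],
              time_ms: int) -> Dict[str, object]:
    """Build one record dict for write_records (values are sent as strings)."""
    is_text = isinstance(value, str)
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": measure,
        "MeasureValue": str(value),
        "MeasureValueType": "VARCHAR" if is_text else "DOUBLE",
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


def write(records: List[Dict[str, object]],
          database: str = "demo_db", table: str = "demo_table"):
    """Send records to Timestream. Hypothetical names; needs AWS access."""
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("timestream-write")
    return client.write_records(
        DatabaseName=database, TableName=table, Records=records
    )


# Hypothetical readings: a numeric meter value and a text status.
records = [
    to_record("meter-1", "reading", 42.5, 1_600_000_000_000),
    to_record("meter-1", "status", "STATUS: OK", 1_600_000_000_000),
]
# write(records)  # uncomment once credentials and the table are in place
```

The same record-building code would work unchanged inside a Jupyter notebook, which is one reason the SDK path is attractive for exploratory analytics.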
Like InfluxDB Cloud, Timestream is serverless, but unlike InfluxDB Cloud, it does not use a specialized query language. Like Timescale Cloud, Timestream is queried through SQL, but it is not an adaptation of the PostgreSQL relational database.
On our wish list, we would like to see federated query from other AWS data platforms, such as Amazon Redshift for analytics, and Neptune for a graph view that could provide insights into interrelationships, which would be useful for connected use cases such as smart utility grids and supply chain optimization.