The problem: Buying, installing, and provisioning computers takes time, money, and space. Installing databases and applying security upgrades is a chore. To make matters worse, the demand to store and retrieve data often spikes, and that means buying enough capacity to handle the maximum load, not just the average.
The solution is to rent what you need — in other words, what some companies call database-as-a-service (DBaaS). Companies are lining up to take a packet of your bits and promise to give you another copy sometime in the future whenever you want. For this, they’ll bill the tiniest amount. AWS Glacier, for instance, charges $0.004 per gigabyte per month. Sending them the data costs nothing. But if you want to retrieve a copy, well, that will cost you. If you want a faster response, that will cost more.
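A little back-of-the-envelope arithmetic shows why that model is attractive for cold data. The sketch below uses the $0.004 per gigabyte per month figure quoted above; the retrieval rate is a hypothetical placeholder for illustration, not a published price, so check the provider's current price sheet before relying on it.

```python
# Back-of-the-envelope cost sketch for archive storage priced per gigabyte-month.
# The $0.004/GB-month storage rate comes from the article; the retrieval rate
# is a hypothetical placeholder for illustration only.

STORAGE_PER_GB_MONTH = 0.004   # USD, from the article
RETRIEVAL_PER_GB = 0.01        # USD, assumed for illustration

def monthly_storage_cost(gigabytes: float) -> float:
    """Cost of simply holding the data for one month."""
    return gigabytes * STORAGE_PER_GB_MONTH

def retrieval_cost(gigabytes: float) -> float:
    """One-time cost of pulling a full copy back out."""
    return gigabytes * RETRIEVAL_PER_GB

if __name__ == "__main__":
    archive_gb = 10_000  # a 10 TB archive
    print(f"Hold 10 TB for a month: ${monthly_storage_cost(archive_gb):,.2f}")
    print(f"Retrieve the whole archive once: ${retrieval_cost(archive_gb):,.2f}")
```

At those rates, storing the archive is a rounding error on most budgets; the bill only becomes interesting when you start pulling the data back.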
Many of the services want to go beyond simply storing raw bits and returning them. They're adding extra data structures and processing that deliver faster answers. Finding the right values, for instance, can be sped up by adding indices. Some systems speed up the computation of statistical values like averages or maxima. Others add assurances that the data hasn't changed.
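The index trick is easy to see on a laptop. The minimal sketch below uses a local SQLite database with an invented table; managed services build and maintain similar structures on your behalf.

```python
import random
import sqlite3
import time

# A toy illustration of how an index speeds up lookups. The table and column
# names are invented for the example; managed services automate this work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust-{random.randint(0, 100_000)}", random.random() * 100) for _ in range(200_000)],
)

def timed_lookup() -> float:
    start = time.perf_counter()
    conn.execute("SELECT AVG(total) FROM orders WHERE customer = 'cust-42'").fetchone()
    return time.perf_counter() - start

before = timed_lookup()                                    # full table scan
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
after = timed_lookup()                                     # index lookup
print(f"scan: {before:.4f}s  indexed: {after:.4f}s")
```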
Some of the services are also making life easier for programming teams by honoring the old APIs of legacy software. Some run versions of the original databases and bill for them as a service. Others merely mimic the old API, usually in SQL, so your software won't notice it's talking to an imitation.
How the legacy players are approaching it
The major database companies recognize that customers are entranced by the simplicity of the cloud, and they are launching many different services to meet that demand. Oracle runs close to a dozen major services that deliver everything from its flagship product to versions of its newer options like MySQL or NoSQL. Older versions remain available to support legacy applications that haven't been upgraded to work with the newest releases.
Microsoft is following the same path with SQL Server, which is now not just one product but a family of them. The main offering is available either as a standalone instance running on its own virtual machine or as a pure service billed by the transaction. The first is closer to the server in your datacenter that it's replacing. The second is a flexible, modern pricing scheme that comes with the ability to scale quickly to handle databases as large as 100 terabytes.
IBM’s Db2 has also been transformed into a collection of products that can run either on your own machines or in IBM’s cloud. There are several different versions, including some tuned for Big Data applications spanning a Hadoop cluster. Others target particular types of applications, like tracking event-driven data feeds.
The three companies compete heavily on price and total cost of ownership while adding new features that simplify management and speed up responses. Oracle, for instance, has added its HeatWave engine to the venerable MySQL product; the company claims it speeds up some complex analytical queries by a factor of 400. Oracle also advertises that HeatWave is more than 2.7 times faster than other cloud versions at one-third the price.
The upstarts
It's probably not fair to put Amazon Web Services in the same class as the startups given its size, but it is much younger than legacy databases like Oracle, which began more than forty years ago. AWS now offers more than a dozen different database services, as well as another dozen, like S3, that just store raw bits. AWS is also blurring the line between these classes, because S3 buckets can be searched with SQL queries.
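That blurring is visible in S3 Select, which runs a SQL expression against a single object sitting in a bucket. A minimal sketch with the boto3 client might look like the following; the bucket name, key, and query are placeholders for illustration.

```python
import boto3

# Run a SQL expression against a single CSV object in S3 (S3 Select).
# The bucket, key, and query below are placeholders, not real resources.
s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="example-bucket",
    Key="orders/2023/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.customer, s.total FROM s3object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; Records events carry the matching rows.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```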
Some of AWS’ offerings are new services, like its ledger databases. Others are managed instances of popular open source databases like PostgreSQL or MySQL. One service, Aurora, offers compatibility with both PostgreSQL and MySQL while promising to be three to five times faster than the stock version. It’s impossible to summarize the breadth and complexity of AWS’ data storage options in any short amount of time.
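The compatibility is the point: an application written against stock PostgreSQL can usually keep its driver and its SQL and swap only the connection endpoint. Here is a sketch using the psycopg2 driver, with the hostname and credentials invented as placeholders.

```python
import psycopg2

# The same PostgreSQL driver and SQL work whether the endpoint is a self-hosted
# server or a wire-compatible managed service such as Aurora. Only the
# connection details change; every value below is a placeholder.
conn = psycopg2.connect(
    host="my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
    dbname="appdb",
    user="app_user",
    password="example-password",
    port=5432,
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])  # the application never has to know who hosts the database
```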
Google is another behemoth that may not seem like an upstart, but it is still much younger than the legacy databases. The company offers a wide range of databases with different interfaces. Its Firebase is notable not for its storage capacity or query style, but for the way it uses smart replication to make it simpler for mobile developers to create applications that merge local storage with cloud backups.
There are many other services that will also manage popular open source databases as instances built from pre-defined images. Some clouds, like DigitalOcean, will start up a managed instance delivering MySQL or PostgreSQL. Others, like Linode, offer only block or object storage. The cloud companies want to offer commodity storage alongside commodity computation.
Others are competing on price and the way the bills are computed. Wasabi, for instance, says it’s one-fifth the cost of AWS’ S3. It’s not just the raw storage costs, because Wasabi also won’t bill for data egress or some API calls. Figuring out the cheapest way to store and retrieve your bits is not as simple as looking at one number.
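Comparing providers therefore means modeling the whole workload, not just the per-gigabyte rate. The sketch below folds storage and egress into one monthly figure; all of the rates are illustrative assumptions, not quoted prices.

```python
# Compare two hypothetical providers on a workload, not just the storage rate.
# Every price in this sketch is an assumption for illustration; check the
# providers' current price sheets before drawing conclusions.

def monthly_bill(stored_gb, egress_gb, storage_rate, egress_rate):
    """Total monthly cost: data held plus data transferred out."""
    return stored_gb * storage_rate + egress_gb * egress_rate

workload = {"stored_gb": 50_000, "egress_gb": 20_000}  # 50 TB held, 20 TB read back out

provider_a = monthly_bill(**workload, storage_rate=0.023, egress_rate=0.09)  # assumed rates
provider_b = monthly_bill(**workload, storage_rate=0.006, egress_rate=0.0)   # assumed rates

print(f"Provider A: ${provider_a:,.2f}/month")
print(f"Provider B: ${provider_b:,.2f}/month")
```

With a read-heavy workload, the cheaper headline storage rate is only part of the answer; the egress column can dominate the bill.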
What about governance?
The control of the data and the software will continue to be an issue for customers, and there is no simple solution. In the past, proprietary database vendors would often squeeze customers when licenses came up for renewal. Migrating to a different database can be so time-consuming and fraught with danger that many database administrators simply pay the extra fees and learn to quietly curse vendor lock-in.
The most stable solution is to use a cloud service that provides a clone of an open source database, particularly one released under a license that places no restrictions on use. The databases can be easily migrated to any computer, including one that sits on your desk (if you have enough local storage). In the worst case, you can also download the code from an open source project, rebuild it, and maybe even enhance the software.
Many services are developing triage plans for emergencies that organize the data into tiers. The most mission-critical data might be regularly replicated to several on-site databases located in offices in multiple geographic locations. This might include client contact information, login credentials, and employee records. The goal is to maintain the minimum viable data necessary to bring the enterprise back to work as soon as possible.
Lower-tier information includes data that may be nice to have but isn't essential, such as historical records of orders, marketing projects, or log files. It covers anything that isn't necessary for processing current orders and maintaining short-term continuity.
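One way to make such a plan concrete is to write the tiers down as data, so that backup jobs and restore drills read the same policy. A minimal sketch follows; the category names, datasets, replica counts, and recovery targets are all invented for the example.

```python
# A toy tiering policy expressed as data, so backup and restore tooling can
# share it. Tier names, datasets, and targets are invented examples.
RECOVERY_TIERS = {
    "tier-1-critical": {
        "datasets": ["client_contacts", "login_credentials", "employee_records"],
        "replicas": 3,                     # on-site copies in separate offices
        "restore_objective_hours": 1,
    },
    "tier-2-deferrable": {
        "datasets": ["order_history", "marketing_projects", "log_files"],
        "replicas": 1,
        "restore_objective_hours": 72,
    },
}

def restore_order():
    """Yield datasets in the order a recovery drill should bring them back."""
    for tier in sorted(RECOVERY_TIERS, key=lambda t: RECOVERY_TIERS[t]["restore_objective_hours"]):
        yield from RECOVERY_TIERS[tier]["datasets"]

if __name__ == "__main__":
    print(list(restore_order()))
```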
Is there anything a database as a service can’t do?
Largely, the prices are proportional to the amount of data stored, and that often means the premium for the service becomes too great for larger businesses with substantial data sets. Dropbox, for instance, started building its own datacenter in 2016, and some estimates suggested it saved tens of millions of dollars by doing so. Heavy users with long-term, sustained storage needs will find it simpler and dramatically cheaper to follow Dropbox's lead.
But Dropbox is an extreme case of a company that stores quite a bit of data. Not only that, but its margins are slim because it’s also reselling the storage at a commodity price. It’s competing directly with the major cloud providers. Many other companies find the premium charged to be lower than hiring their own employees and running a similar process in their own datacenter.
This article is part of a series on enterprise database technology trends.