1. Amazon DynamoDB
2. Google BigQuery
3. Azure SQL Database
4. Azure Cosmos DB
5. Amazon Redshift
In 2012, Amazon released DynamoDB, one of the first fully managed cloud databases, and changed the database landscape forever. Since then, cloud databases have experienced a meteoric rise in adoption and innovation. As the whole software development industry moves towards cloud-native development, cloud databases will become increasingly important in the coming years. Gartner has predicted that by the end of 2022, 75% of all databases will be deployed or migrated to a cloud platform (Gartner, "Gartner Says the Future of the Database Market Is the Cloud", www.gartner.com).
Why are cloud databases getting popular? In terms of database technology, the public cloud databases are not different from other SQL or NoSQL databases. However, the key selling point of the public cloud databases lies in database management and scaling.
In traditional SQL databases and many NoSQL databases, the application owners manage the databases themselves, including replication, sharding, backup, restoration, and scaling. In cloud databases, the cloud provider takes over these tasks.
Most cloud-native databases offer the following features along with the basic database management system:
- Horizontal Scaling with managed partitioning/sharding.
- Automatic Backup and restoration.
- High availability with guaranteed SLA.
- Cross data-center replication.
- Support different consistency levels (Strong consistency, eventual consistency).
- Cloud-Native.
- Multi-model support.
- Move data to the edges with global distribution.
- Serverless.
- Planet scale.
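To make the "managed partitioning/sharding" item concrete, here is a minimal sketch of hash partitioning, the basic idea behind spreading keys over several nodes. It is an illustration only: the function names and the key format are my own assumptions, and real cloud databases use more elaborate schemes (for example consistent hashing or automatic range splits) that they manage for you.

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def distribute(keys, num_partitions):
    """Group keys by the partition they hash to."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions

# Hypothetical keys; any string identifier would work.
keys = [f"user-{i}" for i in range(1000)]
layout = distribute(keys, 4)
for partition, members in layout.items():
    print(partition, len(members))
```

Every key always lands on the same partition, so a read can go straight to the node that owns the key. With a managed service, the provider does this routing and rebalancing instead of the application owner.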
Although the mainstream SQL and NoSQL databases are now trying to retrofit these features, they were not built from the ground up for these needs.
In this article, I will rank the five most in-demand cloud-native databases according to the following criteria:
- Key Features
- Popularity
- Trending
- Mainstream uses
- Bright Future
1. Amazon DynamoDB
During the Christmas sale of December 2004, Amazon learned the hard way that a centralized, strongly consistent RDBMS could not handle web-scale application load. With their strict consistency model, relational structure, and two-phase commit, traditional SQL databases could not provide the high availability and horizontal scalability Amazon was looking for. Amazon’s engineering team developed a new NoSQL data store, Dynamo, and published their findings in the Dynamo paper in 2007. The paper played a crucial role in the later development of NoSQL databases such as Cassandra and Riak.
Although Dynamo served as the primary database of Amazon’s shopping cart application, DynamoDB, the managed service built on its ideas, was only made public in 2012. Since then, DynamoDB has been one of the most popular public cloud databases and one of the most popular AWS services.
5 Key Features
- It is a key-value and document-based NoSQL database.
- It is a fully managed, multi-region, multi-master, highly available database.
- It is designed for web-scale applications. It can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second.
- DynamoDB Accelerator (DAX) provides a fully managed in-memory cache.
- With its multi-region replication, it offers single-digit millisecond response time at any scale.
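The "key-value and document-based" model is easier to picture with an example. DynamoDB's low-level API represents every attribute with a type descriptor such as S (string), N (number, sent as a string), BOOL, NULL, L (list), and M (map). The sketch below is a hand-rolled encoder for a simplified subset of those types. The function name, the item, and its attribute names are hypothetical, and the AWS SDKs ship their own serializers, so treat this as an illustration rather than the way you would do it in production.

```python
def to_attribute_value(value):
    """Encode a Python value in DynamoDB's typed JSON (simplified subset)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")

# Hypothetical order item: a key plus a nested, document-style payload.
item = {
    "customer_id": "c-1001",  # assumed partition key
    "order_id": 42,           # assumed sort key
    "shipped": False,
    "lines": [{"sku": "book-1", "qty": 2}],
}
encoded = {name: to_attribute_value(v) for name, v in item.items()}
print(encoded)
```

The key attributes identify the item, while the nested list and map show why DynamoDB can also act as a document store.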
When to Use DynamoDB
- When AWS is the preferred public cloud provider.
- When a key-value or document database is needed.
- When hyper-scaling and high availability are preferred over consistency.
- When fully managed public cloud database is preferred.
- When Geospatial Data type is required.
When not to Use DynamoDB
- When AWS is not the preferred public cloud provider.
- As a primary database (OLTP) needing an ACID transactional guarantee.
- When on-prem Database is preferred due to regulations, data-protection, or as a key business requirement.
- When a columnar database or graph database is required.
- When Distributed SQL (NewSQL) database is required.
Alternatives
Popularity:
Amazon DynamoDB is one of the most used hyper-scale cloud databases. It is also one of the most popular AWS services.
In recent years, it has faced fierce competition from open-source databases (e.g., Cassandra, MongoDB) and other public cloud databases (e.g., Azure Cosmos DB).
As Amazon is the leading public cloud provider, DynamoDB is still the most popular NoSQL database in the public cloud.
According to the popular database ranking site DB-Engines, it is the second most popular public cloud database, just behind Azure SQL database:
The Stack Overflow Developer Survey 2020 has placed DynamoDB as the 11th most popular database for 2020. This is a feat considering that DynamoDB was the only public cloud database in that list:
Trending
Since its launch in 2012, DynamoDB has been one of the trendiest databases in the industry. DB-Engines shows continuous growth in the trend for DynamoDB over its whole lifespan:
Google Trends also shows a linear increase in popularity for DynamoDB over the last decade:
2. Google BigQuery
Very few companies have to deal with datasets as massive as Google's. It is no wonder that Google is leading the Big Data landscape in the 21st century with many novel ideas and innovations. At the beginning of this century, Google found that "one size fits all" SQL databases were not good enough for analytics workloads. They developed a new database, "Dremel," for data warehousing, i.e., handling large volumes of analytics data. Google published a paper, “Dremel: Interactive Analysis of Web-Scale Datasets,” in 2010 to make their findings public.
Later, in 2011, Google made its internal Dremel database public as BigQuery. Since then, it has been the leading and most innovative database for data warehousing and analytics workloads. Google Cloud (GCP) has a strong presence in the data storage landscape, and BigQuery plays a pivotal role there.
5 Key Features
- Highly scalable, multi-cloud data warehouse solution with separate storage, compute, and processing.
- It has Serverless architecture with managed provisioning, maintenance, security, and scaling. It has automatic high availability with multi-location replicated storage.
- BigQuery ML enables users to create and execute machine learning models in BigQuery with standard SQL queries.
- Its high-speed streaming insertion API provides a robust foundation for real-time analytics.
- The BI Engine (in-memory analysis service) offers sub-second query response time and high concurrency for popular BI tools via standard ODBC/JDBC.
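To show what "machine learning models with standard SQL queries" means in practice, here is a small sketch that builds a BigQuery ML `CREATE MODEL` statement as plain text. The dataset, table, and column names are hypothetical, and `create_model_sql` is my own helper, not part of any Google library. Sending the statement to BigQuery would require a client library and credentials, which are left out here.

```python
def create_model_sql(model, source_table, label, features):
    """Build a BigQuery ML CREATE MODEL statement for a linear regression."""
    columns = ", ".join(list(features) + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )

# All names below are assumptions made up for this example.
sql = create_model_sql(
    model="shop.order_value_model",
    source_table="shop.orders",
    label="order_value",
    features=["basket_size", "customer_age_days"],
)
print(sql)
```

The point is that training is expressed as a query over a table that already lives in the warehouse, so no data has to be exported to a separate ML system first.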
When to Use BigQuery
- For a large-scale (petabyte) Data warehouse solution.
- When built-in ML and AI integration, e.g., TensorFlow, is desired.
- When real-time analytics is a vital requirement.
- When Geospatial Data type is required.
- When a serverless database is preferred.
When not to Use BigQuery
- Its multi-cloud offering, “BigQuery Omni,” is still in the “private alpha” stage. Use it in multi-cloud scenarios with caution.
- As a primary database (OLTP) needing an ACID transactional guarantee.
- When an on-prem database is preferred due to regulations, data protection, or business confidentiality requirements.
- When a document database or graph database is required.
- When the dataset is not large.
Alternatives
- Amazon Redshift
- Snowflake
- SAP Data Warehouse Cloud
- IBM Db2 Warehouse
- Azure Synapse Analytics
- Oracle Autonomous Data Warehouse
- Teradata
Popularity:
BigQuery has revolutionized Data Warehousing. It is the third most popular Cloud Database according to the DB-Engines ranking:
Trending
BigQuery achieved a steep rise in popularity over the years, as confirmed by the DB-Engines trending:
It is one of the trendiest data warehousing solutions and has generated a lot of hype in recent years, as shown below:
3. Azure SQL Database
Microsoft is another big player in the database landscape. With Microsoft SQL Server, Microsoft dominated the commercial database market of the mid-range Windows Systems. When Microsoft adopted its Cloud-first approach during the 2010s, it offered a managed database service on top of the Microsoft SQL Server. In the following years, Microsoft managed SQL Server went through many changes.
Currently, Azure SQL Database is not only the managed database-as-a-service of the Microsoft SQL Server, but it also offers many other value-added extra features. For many enterprises, especially those who are already using Microsoft SQL Server, it is the preferred database in the cloud as they can easily lift-and-shift their on-prem Microsoft SQL Server to the cloud.
Key Features
- Managed SQL database on the Azure cloud.
- Along with a standalone database, it offers flexible Elastic pools to manage and scale multiple databases with variable loads in a cost-effective way.
- It offers a serverless compute tier.
- It is a Hyperscale SQL database with 99.99% availability even during infrastructure failures, almost instantaneous backup, and fast database restore.
- In addition to the standard tier, it offers a Hyperscale service tier for a very large-scale SQL dataset.
- Offers lift-and-shift move of the on-prem Microsoft SQL databases to Azure SQL database in an effortless way.
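Why elastic pools can be cost-effective comes down to simple arithmetic: databases whose peaks fall at different times need less shared capacity than the sum of their individual peaks. The sketch below uses made-up hourly demand figures in arbitrary units. It is not Azure's sizing or pricing model, only an illustration of the principle.

```python
# Hypothetical demand per time slot (arbitrary units) for three databases
# whose busy periods do not overlap.
loads = {
    "db_eu":   [10, 80, 20, 10],
    "db_us":   [10, 15, 90, 20],
    "db_asia": [70, 10, 10, 15],
}

# Standalone: every database is provisioned for its own peak.
standalone_capacity = sum(max(series) for series in loads.values())

# Pooled: the pool only has to cover the highest combined demand.
pooled_capacity = max(sum(slot) for slot in zip(*loads.values()))

print(standalone_capacity)  # 240
print(pooled_capacity)      # 120
```

In this toy scenario the pool needs half the capacity. The saving shrinks as the peaks start to overlap, which is why pools suit databases with variable and unsynchronized loads.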
When to Use Azure SQL Database
- If Azure is the preferred public cloud provider.
- If a company already has Microsoft SQL servers that it wants to migrate to the Cloud.
- If a hyper-scale SQL database is required for various reasons (large SQL database, fast backup/restore, high throughput).
- If enterprise-grade data protection with encryption, authentication, and limiting user access to the appropriate subset of the data is desired.
- If elastic pooling of databases is desired to control cost.
When not to use Azure SQL Database
- When Azure is not the preferred public cloud provider.
- When an on-prem database is preferred due to regulations, data protection, or as a crucial business requirement.
- When a columnar database or graph database is required.
- When Distributed SQL (NewSQL) database is required.
- When data is semi-structured or unstructured.
Alternatives
Popularity:
Azure SQL database is not as disruptive or innovative as some other databases in this list. But there is a massive market of the managed SQL database in the cloud. In that domain, the Azure SQL database excels.
It is the most popular public cloud database as per DB-Engines ranking:
Trending
Azure SQL Database is not trending as strongly as some other databases in this list. But it has still shown a positive trend over the last decade, with a sharp spike in the previous year:
Google Trends also shows a stable trend for the Azure SQL Database.
4. Azure Cosmos DB
Microsoft is a traditional tech giant with a global presence. When Microsoft started its cloud-first policy in 2010, it wanted to develop its own planet-scale NoSQL database focused on maximum flexibility and developer friendliness. After seven years of intensive research and development, it released its multi-model, multi-consistency, globally distributed database, Azure Cosmos DB, in 2017. In many ways, Azure Cosmos DB introduced several novel features in database technology. Although it is not the first multi-model database, it is arguably the most advanced one. It also offers additional developer-friendly features.
Today, Azure Cosmos DB is one of the fastest-growing databases in the market. The search for a “Master Database,” i.e., “one database to rule them all,” is currently a hot topic. Among all the potential “Master Database” candidates, Azure Cosmos DB is the most suitable at this moment.
Key Features
- Multi-model, planet-scale NoSQL database for the Cloud.
- It supports almost all mainstream data models: Document database (Semi-structured data), Graph database for highly relational data, wide-column storage for high throughput data.
- It also offers multiple and already known access patterns and APIs: SQL, MongoDB API (Document database), Cassandra API (Wide-column database), and Gremlin (Graph Database).
- It offers the most advanced consistency levels with guaranteed SLA: strong, bounded staleness, session, consistent prefix, eventual.
- It is a globally distributed database system that allows reading and writing data from the database’s local replicas with single-digit millisecond latency.
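The difference between the consistency levels is easiest to see in a toy model. The sketch below simulates one primary and one lagging replica and compares three of the five levels: strong, session, and eventual. It is my own simplified illustration, not how Cosmos DB is implemented. The class and method names are assumptions, and bounded staleness and consistent prefix are not modeled.

```python
class ReplicatedStore:
    """Toy store: one primary plus one replica that catches up only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []          # writes not yet applied to the replica
        self.version = 0           # version of the latest write on the primary
        self.replica_version = 0   # latest version the replica has applied

    def write(self, key, value):
        self.version += 1
        self.primary[key] = value
        self.pending.append((key, value, self.version))
        return self.version        # doubles as a session token

    def sync(self):
        """Replication step: apply all outstanding writes to the replica."""
        for key, value, version in self.pending:
            self.replica[key] = value
            self.replica_version = version
        self.pending.clear()

    def read_strong(self, key):
        return self.primary.get(key)    # always the latest committed value

    def read_eventual(self, key):
        return self.replica.get(key)    # may be stale until sync() runs

    def read_session(self, key, session_token):
        # Read-your-writes: use the replica only if it has caught up
        # with everything this session has written.
        if self.replica_version >= session_token:
            return self.replica.get(key)
        return self.primary.get(key)

store = ReplicatedStore()
token = store.write("cart", ["book"])
print(store.read_strong("cart"))          # ['book']
print(store.read_eventual("cart"))        # None, the replica has not caught up
print(store.read_session("cart", token))  # ['book']
store.sync()
print(store.read_eventual("cart"))        # ['book']
```

The trade-off is the usual one: the stronger the level, the more often a read has to wait for or travel to the most up-to-date copy, which costs latency in a globally distributed setup.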
When to Use Azure Cosmos DB
- When a multi-model NoSQL database is required.
- When a NoSQL database with an industry-standard API is required.
- When a globally distributed database with a flexible consistency level is required.
- When Microsoft Azure is the preferred public cloud.
- When a fully managed and serverless database is required.
When not to use Azure Cosmos DB
- When Microsoft Azure is not the preferred public cloud provider.
- When an on-prem database is preferred due to regulations, data protection, or business confidentiality requirements.
- When a data warehouse system is required.
- When Distributed SQL (NewSQL) database is required.
- If budget and cost are an issue, the relatively expensive Cosmos DB is not a good option.
Alternatives
Popularity:
Azure Cosmos DB is the youngest database on this list and has been on the market for only four years. Nevertheless, it has experienced very high adoption in the industry and ranks 4th in terms of cloud database popularity:
Trending
5. Amazon Redshift
As the leading and pioneering cloud provider, Amazon wanted to move fast. Amazon has, famously or infamously, taken many open-source data stores and built its AWS service offerings on top of them. When Google shook up the data warehouse landscape in 2011 with BigQuery, Amazon took the popular and innovative SQL database PostgreSQL and built its own data warehouse solution on top of it. In 2013, it released Amazon Redshift as an enterprise-grade cloud data warehouse solution.
Amazon Redshift is one of the leading data warehouse solutions thanks to the dominance of AWS in the public cloud landscape. On the flip side, Amazon Redshift is not moving as fast as its competitors (e.g., BigQuery, Snowflake) due to its strong dependency on PostgreSQL.
5 Key Features
- Fully-managed, Cloud-ready, petabyte-scale data warehouse solution.
- Works seamlessly with many AWS cloud and data services (S3, Amazon Athena, Amazon EMR, DynamoDB, and Amazon SageMaker).
- Native integration with the AWS Analytics ecosystem (AWS Glue for ETL, Amazon QuickSight for advanced BI, AWS Lake Formation for secure data lakes).
- With its hardware-accelerated query cache AQUA, it can offer up to 10x better query performance.
- Its shared-nothing Massively Parallel Processing (MPP) architecture enables caching, efficient storage, lightning-fast querying for analytics, and concurrent analysis.
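The shared-nothing MPP idea can be sketched in a few lines: rows are spread over slices by a distribution key, every slice aggregates only its own rows, and a leader merges the partial results. The code below is a single-process illustration under my own assumptions (the function names, the sample rows, and the choice of CRC32 as the hash are all made up for the example). In a real MPP system the per-slice step runs in parallel on separate nodes.

```python
import zlib
from collections import defaultdict

def slice_for(key, num_slices):
    """A stable hash of the distribution key picks the slice that owns the row."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices

def distribute(rows, dist_key, num_slices):
    """Spread rows over slices; each slice holds its own data and shares nothing."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[dist_key], num_slices)].append(row)
    return slices

def local_sum(rows, group_key, value_key):
    """Work a single slice does on its own rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return totals

def merge(partials):
    """Leader step: combine the partial results from all slices."""
    result = defaultdict(int)
    for partial in partials:
        for group, total in partial.items():
            result[group] += total
    return dict(result)

# Hypothetical order rows.
rows = [
    {"order_id": 1, "region": "eu", "amount": 10},
    {"order_id": 2, "region": "us", "amount": 20},
    {"order_id": 3, "region": "eu", "amount": 30},
    {"order_id": 4, "region": "us", "amount": 40},
    {"order_id": 5, "region": "eu", "amount": 50},
]
slices = distribute(rows, "order_id", 3)
partials = [local_sum(s, "region", "amount") for s in slices]
totals = merge(partials)
print(totals)
```

The final totals do not depend on how the rows were distributed, which is what lets the system add slices to speed up a query without changing its result.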
When to Use Amazon Redshift
- For a large-scale (petabyte) Data warehouse solution.
- When Amazon is your public cloud provider.
- When various Amazon Data Analytics tools and Data Platforms are already in use.
- When a team is familiar with PostgreSQL syntax and connectivity.
- When enhanced database security capabilities and an extensive integrated compliance program are required.
When not to Use Amazon Redshift
- When Amazon is not your public cloud provider.
- When built-in ML and AI integration, e.g., TensorFlow, is desired.
- As a primary database (OLTP) needing an ACID transactional guarantee.
- When an on-prem database is preferred due to regulations, data protection, or business confidentiality requirements.
- When a serverless data warehouse with instant horizontal scaling is a key requirement.
Alternatives
- BigQuery
- Snowflake
- SAP Data Warehouse Cloud
- IBM Db2 Warehouse
- Azure Synapse Analytics
- Oracle Autonomous Data Warehouse
- Teradata
Popularity:
In terms of popularity, Amazon Redshift lags behind the other public cloud databases in this list, as shown below:
Trending
Amazon Redshift is not the trendiest data warehouse solution in the market and lags behind BigQuery and Snowflake. In recent years, its traction has flattened, as shown by the DB-Engines trending:
Conclusion
In this list, the Azure SQL Database is the only public cloud SQL database.
Amazon DynamoDB is the most used NoSQL database among the cloud databases.
Google BigQuery has revolutionized the data warehouse landscape and is the most innovative data warehouse solution. Amazon Redshift is another popular data warehouse solution, built on PostgreSQL.
Although relatively new, Azure Cosmos DB is a very promising database and a leading candidate for the master database.
Many other public cloud databases did not make this shortlist. Among them, Google Spanner and Amazon Aurora are very promising in the landscape of Distributed SQL databases.
If you are already in a public cloud or planning to move to the public cloud, you should also consider a public cloud database. Public cloud databases are here to stay and will offer managed databases in different scenarios in the future.
Ref: https://towardsdatascience.com/5-best-public-cloud-database-to-use-in-2021-5fca5780f4ef