So there’s this website called DB-Engines that ranks database systems according to their popularity. (They determine popularity based on various factors, including number of Google results for the name of that system, number of Stack Overflow questions about it, number of appearances in LinkedIn profiles, and so on.)

The DB-Engines rankings also classify systems by “database model”. For instance, they classify Redis, memcached, and Riak as “key-value stores”. Those three systems are, in that order, the top three key-value stores in the DB-Engines rankings. (That’s leaving out DynamoDB, which is listed as both a key-value store and a document store – but more on document stores in a moment.)

But any system of categorization that lumps Redis, memcached, and Riak together must be ignoring a lot. The most obvious thing that “key-value store” isn’t telling us here is anything about persistence. Redis and memcached – especially memcached, it seems to me – are more like caching layers that one might use in front of a database, rather than standalone databases themselves. memcached keeps everything in memory and has no persistence at all, and although Redis can snapshot to disk or keep an append-only log, it is still primarily an in-memory store. Riak, on the other hand, is generally intended to be used as a persistent backing store (although Riak also gives you the option to use memory as a backend, say, for testing or debugging).

Aside from persistence, here are some other things that we don’t know about a system just from knowing that it is a “key-value store”:

  • How does the system accomplish horizontal scaling, if it does at all? Sharding? Consistent hashing? None of the above?
  • What other options are there for querying the data, besides just the ability to look up a key and get a value? Full-text search? Something else?
  • What consistency options does the system offer? Strong consistency? Eventual consistency? Something else?
  • How much does the design emphasize fault-tolerance? Does it deal with node failure gracefully?
  • How much does the design emphasize high availability?

So – while it’s certainly not incorrect to say that Redis is a key-value store, or that Riak is a key-value store, or that memcached is a key-value store – the term “key-value store” may not tell us what we need to know when we’re choosing what kind of data store to use for a given purpose. I would say the same for “document store”, “wide column store”, and some of the other DB-Engines categorizations. Indeed, Riak might have more in common with, say, Cassandra, which DB-Engines lists as a wide column store, than it does with the other key-value stores.

It might even make sense to think of wide column stores and document stores as specialized flavors of key-value stores. A wide column store could be thought of as a key-value store where the values are “columns”, or records that can be of wildly varying length (and the database is optimized to store such data in a way that uses space efficiently). A document store could be thought of as a key-value store where the values are “documents” with internal structure that the database knows something about and is able to query on. But those descriptions also don’t go very far toward answering the questions that I’m posing above, not to mention any number of other questions that I haven’t thought of. There are a lot of ways to slice up the space of choices, and there doesn’t seem to be any one right way to do so.
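
To make the “specialized flavors of key-value stores” idea a bit more concrete, here is a rough sketch in Python that models all three as plain dictionaries. Everything in it is made up for illustration: the names `kv_store`, `wide_column_store`, `document_store`, and `find_by_field` are hypothetical, and none of this reflects how Redis, Riak, Cassandra, or any real system is implemented.

```python
import json
from typing import Any, Dict, List

# Plain key-value store: the value is an opaque blob. The store can
# hand it back, but it knows nothing about what is inside.
kv_store: Dict[str, bytes] = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}).encode(),
}

# Wide-column flavor (toy model): each key maps to a record of named
# columns, and records can have wildly different sets of columns.
wide_column_store: Dict[str, Dict[str, bytes]] = {
    "user:1": {"name": b"Ada", "city": b"London"},
    "user:2": {"name": b"Grace", "rank": b"Rear Admiral", "ship": b"none"},
}

# Document flavor (toy model): each key maps to a structured value
# that the store understands well enough to query on.
document_store: Dict[str, Dict[str, Any]] = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Grace", "city": "Arlington"},
}


def find_by_field(store: Dict[str, Dict[str, Any]], field: str, value: Any) -> List[str]:
    """Return the keys of all documents whose `field` equals `value`."""
    return [key for key, doc in store.items() if doc.get(field) == value]


# With the opaque store, a lookup by key is the only built-in operation;
# the caller has to decode the blob to learn anything about it.
ada = json.loads(kv_store["user:1"])

# With the document flavor, the store itself can answer a field query.
londoners = find_by_field(document_store, "city", "London")
```

Note that all three are still dictionaries keyed by a string, which is the point: the sketch says nothing about persistence, scaling, consistency, or fault tolerance, so it leaves the questions above just as open as the category names do.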
