| CouchDB |
Redis |
MongoDB |
Riak |
Membase |
Neo4j |
Cassandra |
HBase |
| Written in: Erlang |
Written in: C/C++ |
Written in: C++ |
Written in: Erlang & C, some JavaScript |
Written in: Erlang & C |
Written in: Java |
Written in: Java |
Written in: Java |
| Main point: DB consistency, ease of use |
Main point: Blazing fast |
Main point: Retains some friendly properties of SQL. (Query, index) |
Main point: Fault tolerance |
Main point: Memcache compatible, but with persistence and clustering |
Main point: Graph database – connected data |
Main point: Best of BigTable and Dynamo |
Main point: Billions of rows × millions of columns |
| License: Apache |
License: BSD |
License: AGPL (Drivers: Apache) |
License: Apache |
License: Apache 2.0 |
License: GPL, some features AGPL/commercial |
License: Apache |
License: Apache |
| Protocol: HTTP/REST |
Protocol: Telnet-like |
Protocol: Custom, binary (BSON) |
Protocol: HTTP/REST or custom binary |
Protocol: memcached plus extensions |
Protocol: HTTP/REST (or embedding in Java) |
Protocol: Custom, binary (Thrift) |
Protocol: HTTP/REST (also Thrift) |
| Bi-directional (!) replication, |
Disk-backed in-memory database, |
Master/slave replication (auto failover with replica sets) |
Tunable trade-offs for distribution and replication (N, R, W) |
Very fast (200k+/sec) access of data by key |
Standalone, or embeddable into Java applications |
Tunable trade-offs for distribution and replication (N, R, W) |
Modeled after BigTable |
| continuous or ad-hoc, |
Currently without disk-swap (VM and Diskstore were abandoned) |
Sharding built-in |
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security. |
Persistence to disk |
Full ACID conformity (including durable data) |
Querying by column, range of keys |
Map/reduce with Hadoop |
| with conflict detection, |
Master-slave replication |
Queries are JavaScript expressions |
Map/reduce in JavaScript or Erlang |
All nodes are identical (master-master replication) |
Both nodes and relationships can have metadata |
BigTable-like features: columns, column families |
Query predicate push down via server side scan and get filters |
| thus, master-master replication. (!) |
Simple values or hash tables by keys, |
Run arbitrary JavaScript functions server-side |
Links & link walking: use it as a graph database |
Provides memcached-style in-memory caching buckets, too |
Integrated pattern-matching-based query language (“Cypher”) |
Writes are much faster than reads (!) |
Optimizations for real time queries |
| MVCC – write operations do not block reads |
but complex operations like ZREVRANGEBYSCORE. |
Better update-in-place than CouchDB |
Secondary indices: search in metadata |
Write de-duplication to reduce IO |
Also the “Gremlin” graph traversal language can be used |
Map/reduce possible with Apache Hadoop |
A high performance Thrift gateway |
| Previous versions of documents are available |
INCR & co (good for rate limiting or statistics) |
Uses memory mapped files for data storage |
Large object support (Luwak) |
Very nice cluster-management web GUI |
Indexing of nodes and relationships |
I admit to being a bit biased against it because of its bloat and complexity, which is partly due to Java (configuration, seeing exceptions, etc.) |
HTTP supports XML, Protobuf, and binary |
| Crash-only (reliable) design |
Has sets (also union/diff/inter) |
Performance over features |
Comes in “open source” and “enterprise” editions |
Software upgrades without taking the DB offline |
Nice self-contained web admin |
|
Cascading, Hive, and Pig source and sink modules |
| Needs compacting from time to time |
Has lists (also a queue; blocking pop) |
Journaling (with --journal) is best turned on |
Full-text search, indexing, querying with Riak Search server (beta) |
Connection proxy for connection pooling and multiplexing (Moxi) |
Advanced path-finding with multiple algorithms |
|
JRuby-based (JIRB) shell |
| Views: embedded map/reduce |
Has hashes (objects of multiple fields) |
On 32-bit systems, limited to ~2.5 GB |
In the process of migrating the storing backend from “Bitcask” to Google’s “LevelDB” |
|
Indexing of keys and relationships |
|
No single point of failure |
| Formatting views: lists & shows |
Sorted sets (high score table, good for range queries) |
An empty database takes up 192 MB |
Masterless multi-site replication and SNMP monitoring are commercially licensed |
|
Optimized for reads |
|
Rolling restart for configuration changes and minor upgrades |
| Server-side document validation possible |
Redis has transactions (!) |
GridFS to store big data + metadata (not actually an FS) |
|
|
Has transactions (in the Java API) |
|
Random access performance is like MySQL |
| Authentication possible |
Values can be set to expire (as in a cache) |
|
|
|
Scriptable in Groovy |
|
|
| Real-time updates via _changes (!) |
Pub/Sub lets one implement messaging (!) |
|
|
|
Online backup, advanced monitoring, and High Availability are AGPL/commercially licensed |
|
|
| Attachment handling |
|
|
|
|
|
|
|
| thus, CouchApps (standalone js apps) |
|
|
|
|
|
|
|
| jQuery library included |
|
|
|
|
|
|
|
| http://couchapp.org/page/index |
http://redis.io/commands |
|
|
|
|
|
|
| Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important. |
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory). |
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks. |
Best used: If you want something Cassandra-like (Dynamo-like), but no way you’re gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you’re ready to pay for multi-site replication. |
Best used: Any application where low-latency data access, high concurrency support, and high availability are requirements. |
Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense. |
Best used: When you write more than you read (logging). If every component of the system must be in Java. (“No one gets fired for choosing Apache’s stuff.”) |
Best used: If you’re in love with BigTable. And when you need random, realtime read/write access to your Big Data. |
| For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments. |
For example: Stock prices. Analytics. Real-time data collection. Real-time communication. |
For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back. |
For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server. |
For example: Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga). |
For example: Social relations, public transport links, road maps, network topologies. |
For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis. |
For example: Facebook Messaging Database (more general example coming soon) |
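
The "tunable trade-offs for distribution and replication (N, R, W)" listed for Riak and Cassandra can be illustrated with a little arithmetic. The sketch below assumes the usual Dynamo-style reading of the three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. The function name `quorum_properties` and the keys of the returned dictionary are made up for illustration and do not belong to either database's API. Real clusters add refinements (sloppy quorums, hinted handoff, per-request overrides) that this ignores.

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Summarize what a given (N, R, W) choice implies (hypothetical helper)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return {
        # If R + W > N, every read set shares at least one replica
        # with every write set, so a read can see the latest write.
        "quorums_overlap": r + w > n,
        # How many replicas may be unavailable while reads still succeed.
        "read_fault_tolerance": n - r,
        # How many replicas may be unavailable while writes still succeed.
        "write_fault_tolerance": n - w,
    }


if __name__ == "__main__":
    # Three illustrative settings: balanced, fastest, and read-optimized.
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print((n, r, w), quorum_properties(n, r, w))
```

Lowering R and W makes operations cheaper and more available at the cost of overlap. Raising W toward N makes reads cheap but lets a single unavailable replica block writes.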