Thursday, February 16, 2012

More on database consistency

I've written a few times about database consistency before, mainly in conjunction with NoSQL and the concept of Eventual consistency. Now, I'm about to do an update on the subject, as I have come to realize a few things.

From an oldtimer like myself, having been an SQL guy for 25 years, I remember Punk-rock and even The Beatles and I having hair growing out of my ears, what can be contributed? Well, let me beging with stating what I mean when I say Database consistency. What I mean is Consistency as the C in ACID (no, we aren't talking drugs here, we are talking databases). Let's see what the online authorative reference work on just about anything on this planet, from the size of J-Lo's feet to the number of Atoms in the universe (those two numbers are quite far apart by the way), Wikipedia: "The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof." In other words, consistency means that the databas is always in a consistent state, the different data items in it (rows, if you wish) are "in sync" with eachother. I think most of you agree with this notion.

Now, when it comes to NoSQL databases, like MongoDB, this terminology is different. These guys introduced Eventual consistency, which means that the database will eventually reach a consistent state with regards to a specific transaction that changes that "state" of the database. But there are multiple transactions at the same time, and they aren't necessarily, in an Eventual consistentcy model, consistent with eachother as they aren't on the same node. But the theory goes that some time, eventually, they will. If the system never stops, and transactions keep coming, then eventual consistens is determined to happen within 100 ms or less from the point in time when pigs fly. But if you stop all state changing transactions, then the state of the database will reach consistency. Eventually.

Now in NoSQL circles there is a thing called a Consistent read. If my database was consistent, then any read is consistent, right? And in the case of use SQL RDBMS folks, consistency is about the state of the database when I write to it? Well, if you have an eventual consistency model, where you have data distributed all over the place, things are different. To begin with, the basic thing that you have to make sure, and the NoSQL databases do this, is to ensure that the writes to the databases are all in order (we know this from MySQL also, and it is part of the issue with the MySQL slaves, and the NoSQL guys aren't fixing this particular bottleneck). And here we mean they are in order in each and every node. Across nodes, we don't care, which is where I get my abilility to scale out writes from!

A consistent read is a read where the data I am reading is in a consistent state, or sometime that my data is the most recent data. These two aren't always the same, but the second (reading most recent data) typically implies the former, although I assume this is not always the case. This is VERY different from the meaning of Database Consistency as we RDBMS folks look at it. All the same, the concept sure is useful, and as the NoSQL distributed systems doesn't need to keep the data consistent on a global level, a lot of shortcuts can be taken. But having Read Consistency has litte to do with Database Consistency. Your NoSQL fans will complain here and try to tell you that these achieve the same thing, but they don't. Achieving global Database Consistency costs an arm and a leg or two in performance, but the database is ALWAYS consistent.

So two different things, both with advantages and disadvantages, but they are STILL different! And the NoSQL folks will confuse things by allowing you not to have even Read Consistency, somewhat implying that turning it on means you get Database Consistency and that the Read Consistency model (which is very very simple by the way) means you get the effect of Database Consistency using Eventual Consistency. Nope. You don't. Which doesn't make it bad, but IT IS NOT THE SAME THING!

/Karlsson

2 comments:

rpbouman said...

Yup, good post.

Some NoSQLs (couchdb) don't claim just consistency, but claim to be ACID as well. That's fine, but that concept was described and given meaning in the context of a transaction model. In a DBMS that does not support transactions, almost all letters in ACID become meaningless or trivial, which makes the claim as good as empty.

(See here for a discussion on that topic http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/#comment-3545)

Unknown said...

Anders -

Great post.

The interesting discussion is about what appropriately resides at what level of a system, or to put it differently, what is better handled by reusable infrastructure and what is better handled by application programmers.

Too often ACID properties and consistency are discussed in semi-academic terms relating to when you absolutely need transactional semantics. That's a fair discussion, but more important is the point that ACID transactions simplify application development, delivering more reliable, cheaper and more easily understood systems. It is a very good abstraction for a large class of applications.

If we as an industry want to let go of ACID transactions then we will need to come up with something else that has equivalent benefits of system simplification. Or perhaps we should adopt a SQL/ACID database architecture that offers elastic scalability and related cloud-native characteristics :-)

Barry Morris, CEO NuoDB Inc.