Concurrency control is one area where databases differentiate themselves. It is an area that sets a database apart from a file system and databases apart from each other.
As a programmer, you must make sure that your database application works correctly under concurrent access conditions, and yet time and time again this is something people fail to test.
Techniques that work well if everything happens consecutively do not necessarily work so well when everyone does them simultaneously. If you don’t have a good grasp of how your particular database implements concurrency control mechanisms, then you will:
• Corrupt the integrity of your data
• Have applications run slower than they should with a small number of users
• Decrease your applications’ ability to scale to a large number of users
Notice I don’t say, “you might…” or “you run the risk of…” but rather that invariably you will do these things. You will do these things without even realizing it. Without correct concurrency control, you will corrupt the integrity of your database because something that works in isolation will not work as you expect in a multiuser situation. Your application will run slower than it should because you’ll end up waiting for data. Your application will lose its ability to scale because of locking and contention issues. As the queues to access a resource get longer, the wait gets longer and longer.
An analogy here would be a backup at a tollbooth. If cars arrive in an orderly, predictable fashion, one after the other, there won’t ever be a backup. If many cars arrive simultaneously, queues start to form. Furthermore, the waiting time does not increase linearly with the number of cars at the booth. After a certain point, considerable additional time is spent “managing” the people who are waiting in line, as well as servicing them (the parallel in the database would be context switching).
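To put rough numbers on the tollbooth analogy, here is a minimal sketch that assumes the simplest textbook queueing model: a single booth with random arrivals and random service times (often called M/M/1). The model, the function name `mean_time_in_system`, and the rates used are my assumptions for illustration only. The model ignores the extra "managing" overhead entirely, yet the time per car still climbs far faster than the arrival rate does.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Average time a car spends queuing plus being served (M/M/1 model).

    Both rates are in cars per minute; the result is in minutes.
    """
    if arrival_rate >= service_rate:
        # The booth cannot keep up, so the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 10.0  # hypothetical booth that can serve 10 cars per minute
    for arrival_rate in (5.0, 8.0, 9.0, 9.5, 9.9):
        minutes = mean_time_in_system(arrival_rate, service_rate)
        print(f"{arrival_rate:4.1f} cars/min arriving -> {minutes:5.2f} min per car")
```

Under these assumptions, going from 5 to 9 cars per minute (less than double the traffic) makes each car spend about five times as long at the booth, and the time keeps growing sharply as arrivals approach the booth's capacity.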
Concurrency issues are the hardest to track down; the problem is similar to debugging a multithreaded program. The program may work fine in the controlled, artificial environment of the debugger but crash horribly in the real world. For example, under race conditions, you find that two threads can end up modifying the same data structure simultaneously. These kinds of bugs are terribly hard to track down and fix. If you only test your application in isolation and then deploy it to dozens of concurrent users, you are likely to be (painfully) exposed to an undetected concurrency issue.
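One way to see why such a bug hides during isolated testing is to write the interleaving out by hand so that it happens every time. The sketch below is hypothetical: `Account`, `read_balance`, and `write_balance` are made-up names, and no real database, thread, or lock is involved. Two sessions each read a balance, add a deposit, and write the result back. Run one after the other, they are correct; overlapped, one update silently disappears.

```python
class Account:
    """A stand-in for a single row; purely illustrative, with no locking."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def read_balance(self) -> int:
        return self.balance

    def write_balance(self, value: int) -> None:
        self.balance = value


def run_serially(start: int) -> int:
    """Each session finishes its read-modify-write before the next begins."""
    account = Account(start)
    for deposit in (50, 25):
        seen = account.read_balance()
        account.write_balance(seen + deposit)
    return account.balance


def run_interleaved(start: int) -> int:
    """Both sessions read before either writes, so one deposit is lost."""
    account = Account(start)
    seen_by_a = account.read_balance()
    seen_by_b = account.read_balance()      # B reads the same value A saw
    account.write_balance(seen_by_a + 50)   # A writes its result
    account.write_balance(seen_by_b + 25)   # B overwrites A's update
    return account.balance


if __name__ == "__main__":
    print("serial:     ", run_serially(100))
    print("interleaved:", run_interleaved(100))
```

Neither session contains an error when read on its own, which is exactly why single-user testing never catches it; only the ordering of the steps differs between the two runs.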
Over the next two sections, I’ll relate two small examples of how a lack of understanding of concurrency control can ruin your data or inhibit performance and scalability.