The goal is to fully use the facilities available to you, but ensure you can change the implementation on a case-by-case basis. As an analogy, Oracle is a portable application. It runs on many operating systems. On Windows, however, it runs the Windows way: using threads and other Windows-specific facilities. On UNIX/Linux, in contrast, Oracle runs as a multiprocess server, using individual processes to do what threads do on Windows—that’s the UNIX/Linux way. The “core Oracle” functionality is available on both platforms, but it is implemented in very different ways under the covers. Your database applications that must run against multiple databases should be built the same way: a common core, implemented in whatever way suits each database best.
For example, a common requirement of many database applications is the generation of a unique key for each row: when you insert the row, the system should generate the key for you automatically. Oracle implements the database object called a SEQUENCE for this, and also provides the SYS_GUID() function for generating unique keys. Informix has a SERIAL datatype. Sybase and SQL Server have an IDENTITY type. Each database has a way to do this, but the methods differ, both in how you use them and in their possible outcomes. So the knowledgeable developer has two paths that can be pursued:
• Develop a totally database-independent method of generating a unique key.
• Accommodate the different implementations and use different techniques when implementing keys in each database.
Note Oracle has an IDENTITY type column. Under the covers, it creates a sequence and defaults your column to the sequence’s next value—making it work very much like the SQL Server IDENTITY type or the MySQL AUTO_INCREMENT type.
The theoretical advantage of the first approach is that to move from database to database you need not change anything. I call it a “theoretical” advantage because the downside of this implementation is so huge that it makes this solution totally infeasible. What you’d have to do to develop a totally database-independent process is to create a table such as this:
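A sketch of such a table and its key-generation logic follows (the table and key names, ID_TABLE and MY_KEY, are illustrative):

```sql
create table id_table
( id_name   varchar2(30) primary key,
  id_value  number
);

insert into id_table ( id_name, id_value ) values ( 'MY_KEY', 0 );
commit;

-- every session that needs a new key would then increment and read the counter:
update id_table set id_value = id_value + 1 where id_name = 'MY_KEY';
select id_value from id_table where id_name = 'MY_KEY';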
Looks simple enough, but the outcomes (notice plural) are as follows:
• Only one user at a time may process a transaction row. You need to update that row to increment a counter, and this will cause your program to serialize on that operation. At best, one person at a time will generate a new value for this key.
• In Oracle (and the behavior might be different in other databases), all but the first user to attempt to concurrently perform this operation would receive the error “ORA-08177: can’t serialize access for this transaction” in the SERIALIZABLE isolation level.
For example, using a serializable transaction (which is more common in the J2EE environment, where many tools automatically use this as the default mode of isolation, often unbeknownst to the developers), you would observe the following behavior:
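A sketch of such a session, assuming a key table like the illustrative ID_TABLE with a single counter row named MY_KEY:

```sql
SQL> set transaction isolation level serializable;

Transaction set.

SQL> update id_table set id_value = id_value + 1 where id_name = 'MY_KEY';

1 row updated.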
Now (while leaving the prior session connected), we’ll go to another SQL*Plus session and perform the same operation, a concurrent request for a unique ID:
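In that second session, we would issue the same statements (again assuming the illustrative ID_TABLE):

```sql
SQL> set transaction isolation level serializable;

Transaction set.

SQL> update id_table set id_value = id_value + 1 where id_name = 'MY_KEY';
```

The second UPDATE does not return; the session simply hangs.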
This will block at this point, as only one transaction at a time can update the row. This demonstrates the first possible outcome—we would block and wait for the row. But since we’re using SERIALIZABLE in Oracle, we’ll observe the following behavior as we commit the first session’s transaction:
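In the first session, the commit is unremarkable:

```sql
SQL> commit;

Commit complete.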
The second session will immediately display the following error:
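A reconstruction of how SQL*Plus would report that error for the blocked UPDATE (the exact formatting may vary by version):

```sql
update id_table set id_value = id_value + 1 where id_name = 'MY_KEY'
       *
ERROR at line 1:
ORA-08177: can't serialize access for this transaction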
That error would occur regardless of the ordering of the preceding commit statement. All it takes is for your transaction to attempt to modify any record that was modified by some other session since your transaction began.
So, that database-independent piece of logic really isn’t database-independent at all. It may not even perform reliably in a single database, depending on the isolation level! Sometimes we block and wait; sometimes we get an error message. To say the end user would be upset in either case (either a long wait, or a long wait followed by an error) is putting it mildly.
This issue is compounded by the fact that our transaction is much larger than just outlined. The UPDATE and SELECT in the example are only two of the potentially many statements that make up the transaction. We have yet to insert the row into the table using the key we just generated, and to do whatever other work it takes to complete the transaction. This serialization would be a huge limiting factor in scaling. Think of the ramifications if this technique were used on a website that processed orders, and order numbers were generated this way: there would be no multiuser concurrency, so every order would be processed sequentially.
The correct approach to this problem is to use the best code for each database. In Oracle, there are two approaches shown as follows (assuming the table that needs the generated primary key is T):
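Sketches of both approaches follow (column names are illustrative; the IDENTITY syntax requires Oracle 12c or later):

```sql
-- Approach 1: an IDENTITY column. Oracle creates a sequence under the
-- covers and uses it to populate PK automatically on insert.
create table t
( pk          number generated by default as identity primary key,
  other_data  varchar2(30)
);

insert into t ( other_data ) values ( 'hello world' );

-- Approach 2: a plain numeric key populated from a sequence you create
-- yourself (dropping and re-creating T for illustration).
drop table t;
create table t
( pk          number primary key,
  other_data  varchar2(30)
);
create sequence t_seq;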
Note An IDENTITY column must be of a numeric datatype.
But note that the INSERT did not really skip the sequence generation—the sequence was created and used automatically by the database. If you didn’t have such an automatically generated sequence, you could use the following logic to get an autogenerated surrogate primary key:
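A sketch of that trigger logic, assuming a table T with a numeric primary-key column PK and the sequence T_SEQ used elsewhere in this example:

```sql
create sequence t_seq;   -- if not already created

create trigger t_trigger
before insert on t
for each row
begin
    :new.pk := t_seq.nextval;
end;
/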
Trigger created.
This will have the effect of automatically—and transparently—assigning a unique key to each row inserted. A more performance-driven approach would be simply this:
Insert into t ( pk, …. ) values ( t_seq.NEXTVAL, …. );
That is, skip the overhead of the trigger altogether (this is definitely my preferred approach). You can achieve the same effect in the other databases using their native types. The CREATE TABLE syntax will differ, but the net result will be the same. Here, we’ve gone out of our way to use each database’s feature to generate a nonblocking, highly concurrent unique key, and we’ve introduced no real changes to the application code—all of the logic is contained, in this case, in the DDL.