Once you understand that each database will implement features in a different way, another example of defensive programming to allow for portability is to layer your access to the database when necessary.
Let’s say you are programming using JDBC. If all you use is straight SQL SELECTs, INSERTs, UPDATEs, and DELETEs, you probably don’t need a layer of abstraction. You may very well be able to code the SQL directly in your application, as long as you limit the constructs you use to those supported by each of the databases you intend to support—and that you have verified work exactly the same (remember the NULL= NULL discussion!).
This means you’ll have poorly performing SQL, though—and you’ll apparently have in your head more knowledge about more databases than most anyone I know of (after all, that’s the only way to know if something has a chance of working the same on all databases!).
Another approach that is both more portable and offers better performance would be to use stored procedures to return resultsets. You will discover that every vendor’s database can return resultsets from stored procedures, but how they are returned is different. The actual stored procedure source code you must write is different for different databases.
Your two choices here are to either not use stored procedures to return resultsets or to implement different code for different databases. I would definitely follow the different-code-for-different-vendors method and use stored procedures heavily.
This might seem as if it would increase the time it takes to implement on a different database. However, you’ll find it is actually easier to implement on multiple databases withthis approach. Instead of having to find the perfect SQL that works on all databases (perhaps better on some than on others), you will implement the SQL that works best on that database.
You can do this outside of the application itself, which gives you more flexibility in tuning the application. You can fix a poorly performing query in the database, and deploy that fix immediately, without having to patch the application.
Additionally, you can take advantage of vendor extensions to SQL using this method freely. For example, Oracle supports a wide variety of SQL extensions, such as analytic functions, the SQL model clause, row pattern matching, and more. In Oracle, you are free to use these extensions to SQL since they are “outside” of the application (i.e., hidden in the database). In other databases, you would use whatever features they provide to achieve the same results, perhaps. You paid for these features so you might as well use them.
Another argument for this approach—developing specialized code for the database you will deploy on—is that finding a single developer (let alone a team of developers) who is savvy enough to understand the nuances of the differences between Oracle, SQL Server, and DB2 (let’s limit the discussion to three databases in this case) is virtually impossible. I’ve worked mostly with Oracle for the last 20 years (mostly, not exclusively). I learn something new about Oracle every single day I use it.
To suggest that I could be expert in three databases simultaneously and understand what the differences between all three are and how those differences will affect the “generic code” layer I’d have to build is highly questionable. I doubt I would be able to do that accurately or efficiently.
Also, consider that we are talking about individuals here; how many developers actually fully understand or use the database they currently have, let alone three of them? Searching for the unique individual who can develop bulletproof, scalable, database-independent routines is like searching for the holy grail.
Building a team of developers that can do this is impossible. Finding an Oracle expert, a DB2 expert, and a SQL Server expert and telling them “We need a transaction to do X, Y, and Z”—that’s relatively easy. They are told, “Here are your inputs, these are the outputs we need, and this is what this business process entails,” and from this they can produce transactional APIs (stored procedures) that fit the bill. Each will be implemented in the manner best for that particular database, according to that database’s unique set of capabilities. These developers are free to use the full power (or lack thereof, as the case may be) of the underlying database platform.
These are the same techniques developers who implement multiplatform code use. Oracle, for example, uses this technique in the development of its own database. There is a large amount of code (though a small percentage of the database code overall) called OSD (Operating System Dependent) code that is implemented specifically for eachplatform.
Using this layer of abstraction, Oracle is able to make use of many native OS features for performance and integration, without having to rewrite the majority of the database itself. The fact that Oracle can run as a multithreaded application on Windows and a multiprocess application on UNIX/Linux attests to this feature.
The mechanisms for interprocess communication are abstracted to such a level that they can be reimplemented on an OS-by-OS basis, allowing for radically different implementations that perform as well as an application written directly, and specifically, for that platform.
In addition to SQL syntactic differences, implementation differences, and differences in performance of the same query in different databases outlined earlier, there are the issues of concurrency controls, isolation levels, query consistency, and so on.
We cover these items in some detail in Chapter 7 of this book, and you’ll see how their differences may affect you. SQL92/SQL99 attempted to provide a straightforward definition of how a transaction should work and how isolation levels should be implemented, but in the end, you’ll get different results from different databases. It is all due to the implementation. In one database, an application will deadlock and block all over the place. In another database, the same exact application will run smoothly. In one database, the fact that you did block (physically serialize) was used to your advantage, but when you deploy on another database and it does not block, you get the wrong answer. Picking an application up and dropping it on another database takes a lot of hard work and effort, even if you followed the standard 100 percent.