Schema Evolution in
Objectivity/DB
INTRODUCTION
Objectivity/DB® supports high-availability applications with a patented
schema evolution mechanism that allows developers to modify the structure
of persistent data without requiring the database to be taken off-line.
This increases the flexibility of the application development process,
reducing the technical and business risks associated with modifying
deployed applications.
As the requirements of a database
application evolve over time, changes are made to the definition of the
physical data structure, or "schema", of the data elements stored within
the database. Schema evolution is the process of redefining the persistent
datatypes in a database application. Data conversion is the step in the
process that converts the contents of the database from the old schema to
the new schema.
Objectivity/DB provides a mechanism for
implementing changes to existing applications with large, populated
databases. This ability to modify database schema "on-the-fly" provides
developers the option of making data structure changes to deployed
applications that would not otherwise be feasible. This reduces the risk
of application deployment and makes project management more
flexible.
The rest of this document describes Objectivity/DB schema
evolution in more detail. A general discussion of the restrictions placed
on schema evolution by relational technology is followed by a description,
including examples, of Objectivity/DB's schema evolution capabilities.
RELATIONAL SCHEMA EVOLUTION
Relational technology provides little support for schema evolution and
data conversion, offering, at best, the ability to add a statically
initialized column to a table. The bulk of the work for more complex
schema changes is the conversion of the existing data, which must be
performed after the database has been taken off-line.
Application Changes
Data is usually stored in a relational database in a normalized format
to provide a common view of the data across multiple applications.
Persistent data is defined strictly as rows and columns within tables. The
translation of data from the logical data structures of the application
into the tables that form the schema of the database is left to the
application.
As a result, relational database applications must
always be conscious of the tables, field definitions, alternate views of
tables, and, most importantly, table joins that must be performed during
normal execution. Since the database schema is defined by the same
mechanism that provides access to the database, SQL, every point in the
application that accesses the database must be altered to reflect the
change in the external data structure.
Of course, it is possible to
encapsulate the physical data structure in a relational database
application by providing alternate views of tables. The proper use of
modular programming techniques can isolate the knowledge of the physical
structure of the database. However, object oriented applications, built on
object databases, do this encapsulation as a natural part of the
development process, rather than as an extra step in the design,
programming, and administration of the database.
Data Conversion
Assume for a moment that the application is sufficiently modular to be
changed with a relatively small effort. What about the database that
already exists that is filled with operational data?
The effort
required to convert the data in an existing relational database to a new
schema is all too familiar. Traditionally, changing the
definition of a table in a relational database requires shutting down the
database, converting the contents of the database, updating the
applications to use the new table definition, redistributing the
application, restarting the database, and allowing the users back on the
system.
Making a copy of some or all of the database and doing the
job offline allows the users to remain operational, but it raises the
problems of disk space and data integrity. The converted data will be out
of sync due to normal operations that occur during the conversion process,
requiring some form of update conflict resolution to be
performed.
The key issue with changing the schema is that the
on-line database must be converted at some point in time. There is no way
to avoid inconveniencing the end-users during this phase of the data
conversion process.
While the technique of schema evolution and
data conversion described above is also applicable to some object
databases, it is possible for an object database to provide assistance
with the schema evolution process. In particular, the conversion of
previously stored data is a process that is facilitated by
Objectivity/DB's schema evolution capabilities.
A Relational Example
Consider an example in which the performance of a relational database
application is found to be limited by the normalization of the data
model.
The administrators discover that one of the original
assumptions in their system design is false. Table A was expected to be
accessed independently of Table B most of the time, but actual usage
shows that A is used only after B is accessed. This lookup of A
for every B is performed through a join that is repeated over and over,
causing extremely poor performance. Denormalizing the physical
implementation of the data model by merging A's information into each record
of B would improve performance by eliminating the unnecessary join.
RDBMSs do not provide support for such schema modifications. In order to
make the changes indicated in the example above, the development team must
perform some variation of the following general steps:
- Modify the application to use the new
schema
- Write a monolithic Upgrade Application that
performs three steps:
1. Reads the old data from database DB
2. Converts each record from the old schema to the new schema
3. Writes the data in the new schema into database DB'
- Kick all the users off and shut down the
database
- Perform the monolithic data
conversion
- Distribute the new version of the
application
- Let users run the new application
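The monolithic Upgrade Application at the heart of these steps reduces to a single read-convert-write pass over every record. The sketch below illustrates that pass with hypothetical record layouts (OldRecord, NewRecord are invented for illustration); the point is that the whole database must be touched in one off-line batch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical old and new record layouts.
struct OldRecord { int id; short quantity; };
struct NewRecord { int id; long quantity; std::string note; };

// Step 2: convert one record from the old schema to the new schema.
NewRecord convert(const OldRecord& r) {
    return NewRecord{r.id, static_cast<long>(r.quantity), ""};
}

// Steps 1-3: read everything from DB, convert, write into DB'.
// The database is unavailable for the entire time this loop runs.
std::vector<NewRecord> monolithicConvert(const std::vector<OldRecord>& db) {
    std::vector<NewRecord> dbPrime;
    dbPrime.reserve(db.size());
    for (const OldRecord& r : db)
        dbPrime.push_back(convert(r));
    return dbPrime;
}
```

Note that DB and DB' coexist until the pass completes, which is exactly the disk-space problem discussed below.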
The Problems with the Relational Approach
The problems with the monolithic data conversion described above are
the lack of database availability during the data conversion process,
excess disk space requirements, and general risk involved.
Availability
User inconvenience can be reduced by the steps outlined above, but it
cannot be completely removed. The data conversion takes a finite amount of
time. The larger the database, the longer it is unavailable during data
conversion.
This puts a great deal of pressure on the development
staff to make the conversion process go smoothly. It also requires access
to the database when it is not being heavily used so that it can be shut
down without disrupting operations. This is not possible for many
applications, since they require the database to be available
continuously.
Disk Space
During data conversion itself, the data is copied from one place to
another, meaning that there will be two images of the database in storage.
If the entire database were copied during the conversion program, the disk
requirements would double. If the tables are copied back into the same
database, then obsolete versions of tables may remain that can be archived
or deleted, as appropriate.
Disk space
is a particular issue for schema evolution in object databases, since
object databases are able to hold significantly more operationally useful
data than relational databases.
Risk
Monolithic data conversion incurs tremendous risk to the schema
evolution process in terms of the lost business opportunities during the
data conversion time period. The business costs associated with the
database being unavailable are entirely application dependent, but can be
considerable in strategic applications.
The primary technical risk
involves data integrity. At some point, the users will all change over to
a new version of the end-user application to run against a new version of
the database that was created with a separate upgrade application. The
possibility of corrupting the database with two applications is greater
than with a single application.
SCHEMA EVOLUTION WITH OBJECTIVITY/DB
Objectivity/DB provides a robust schema evolution mechanism that
handles most schema changes quite simply, giving the developer control
over the timing and the granularity of the data conversion process. A
developer is able to alter the schema of a deployed application and
convert the existing database without forcing end-users off the database
during a lengthy off-line, monolithic data conversion
process.
Rather than have to convert the entire database at once,
only the objects whose definitions have changed are candidates for data
conversion. Those objects are referred to as "affected" objects. They may
be converted one at a time, or in various size groups. When converted,
affected objects are written back into the space in which they existed
before, allocating or freeing incremental disk space according to the type
of change being made to the schema.
During the data conversion
process, and in stark contrast to relational databases, the database
remains on-line for the business function it supports, minimizing business
risk. The technical risk is also minimized because in most cases the
"conversion program" and the "end-user application" are the same. Data is
converted automatically by Objectivity/DB in the end-user application,
which greatly reduces the risk of programming errors.
The remainder
of this discussion revolves around the type of schema changes that can be
made, and how the timing and granularity of the data conversion is
controlled by the developer.
Types of Schema Change
Many schema changes are possible, ranging from purely logical changes
(such as changing the name of a data member) to inheritance changes. The
basic types of schema changes supported by Objectivity/DB are:
- Logical changes
- Class member changes
- Association and reference changes
- Class changes
- Inheritance changes
Basic schema changes of each of these types can be handled
automatically by Objectivity/DB, with the optional use of Conversion
Functions as required.
Automatic Conversion
Objectivity/DB handles many types of schema
changes automatically, such as the conversion of one primitive datatype to
another, the addition or deletion of class members, and the
modification of the access control of a base class.
Conversion Functions
Conversion Functions are developer defined call-back
functions that provide an opportunity for application dependent processing
to be applied at the point of data conversion. The Conversion Function is
executed by the database engine during the automatic conversion of
affected objects. Each time an object of the old schema is accessed, the
Conversion Function is executed. When objects of the new schema are
accessed, the Conversion Function is not executed.
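Conceptually, a Conversion Function behaves like a developer-supplied callback that the engine runs once per old-format object, at the moment of access. The following sketch simulates that behavior in plain C++; the names (StoredObject, access, and the version field) are invented for illustration and are not the Objectivity/DB API.

```cpp
#include <cassert>
#include <functional>

// Simulated stored object: schema version 1 holds only a count,
// schema version 2 adds a derived field.
struct StoredObject {
    int schemaVersion;   // 1 = old schema, 2 = new schema
    long count;
    long countSquared;   // added in schema version 2
};

// Developer-defined callback: the "Conversion Function".
using ConversionFunction = std::function<void(StoredObject&)>;

// Simulated access path: the engine runs the Conversion Function
// only when an old-schema object is touched; new-schema objects
// are returned untouched.
StoredObject& access(StoredObject& obj, const ConversionFunction& convert) {
    if (obj.schemaVersion == 1) {
        convert(obj);            // application-dependent processing
        obj.schemaVersion = 2;   // object now carries the new schema
    }
    return obj;
}
```

The version check is what guarantees the callback fires exactly once per affected object, no matter how many times the object is subsequently read.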
Conversion Modes
After the type of schema change has been specified, the issues of
timing and granularity of data conversion must be addressed. In other
words, we have to decide when to convert the existing data, and how much
of it to convert at a time.
Relational databases, and some object
databases, only provide monolithic data conversion. In object database
terms this is called "Immediate Mode" conversion, because all the data has
to be converted immediately before any user application can be given
access to the database. This makes the database completely unavailable to
the users.
By comparison, Objectivity/DB does
not limit access to the database during data conversion.
In
addition to Immediate Mode, Objectivity/DB also offers alternative
conversion modes that allow the application requirements to dictate the
timing and granularity of data conversion. Data conversion can either be
deferred until objects are physically accessed by an application (Deferred
Mode) or performed when the developer demands (On-Demand Mode); neither
mode limits database availability. Even Immediate Mode data conversion
leaves the database available, because only the affected objects are made
unavailable.
Mode      | Granularity
----------|-------------------------------------------
Deferred  | Object
On-Demand | Container, Database, or Federated Database
Immediate | Federated Database
Deferred Mode Conversion
Deferred Mode Conversion leaves the affected objects in the database in
the old form until they are required for use by the end-user application.
Objectivity/DB converts each affected object as it is used in the course
of normal end-user operations.
Deferred Mode, which encompasses the
majority of schema changes, is the easiest form of conversion from the
developer's standpoint: the end-user application is simply modified to use
the new schema. The process of changing the schema in the application
source code automatically sets the program up to convert affected
objects as they are encountered, i.e., in Deferred Mode. If a Conversion
Function is required to augment the data conversion, it would be added to
the end-user application.
The end-user simply receives a new
version of the application and operates it as before. The conversion takes
place in the database engine automatically. There is no need to stop all
the users from using the system for an extended period of time, because
down-time for an individual end-user is limited to the amount of time it
takes them to restart their application.
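Deferred Mode can be pictured as lazy, per-object conversion: only the objects the application actually reads are rewritten, and everything else stays in the old format. The small simulation below (invented names, not Objectivity/DB code) shows that object-level granularity.

```cpp
#include <cassert>
#include <vector>

// Simulated affected object, initially still in the old schema.
struct PersistentObject {
    bool converted;  // has this object been rewritten in the new schema?
    int value;
};

// A database full of affected, not-yet-converted objects.
std::vector<PersistentObject> makeDatabase(int n) {
    return std::vector<PersistentObject>(n, PersistentObject{false, 0});
}

// Simulated read: the engine converts the object in place on first
// access, then returns it. Unaccessed objects remain in the old schema.
int readObject(std::vector<PersistentObject>& db, int i) {
    if (!db[i].converted) {
        db[i].value = 42;        // stand-in for the real conversion work
        db[i].converted = true;
    }
    return db[i].value;
}
```

After normal end-user operations touch a handful of objects, only those objects carry the new schema; the rest wait until they are needed.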
On-Demand Mode Conversion
On-Demand Mode Conversion does the same type of conversions as Deferred
Mode, defined above, but to groups of objects explicitly indicated by the
application developer at various points in an
application.
On-Demand Mode is implemented by calling a member
function for one of the data storage constructs in the end-user
application. The function call is placed at the point where the
application encounters new containers, databases, or federated
databases.
Objects, and groups of objects, are flagged as they are
converted, so that each affected object is only converted once. Unless
On-Demand conversion is used for the entire federation, it is likely that
there will be some unconverted objects in the database. This is not a
problem, since they will simply be converted when the end-user application
tries to use them.
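On-Demand Mode, by contrast, converts a whole group of objects when the application asks for it. The sketch below (with a hypothetical convertAll member function standing in for the real On-Demand call) shows container-level conversion and the flagging that guarantees each affected object is converted only once.

```cpp
#include <cassert>
#include <vector>

// Simulated affected object with a conversion flag.
struct Obj {
    bool converted = false;
    int conversions = 0;  // how many times this object has been converted
};

// Simulated storage construct (a container of objects).
struct Container {
    std::vector<Obj> objects;

    // Hypothetical stand-in for the On-Demand conversion member function:
    // convert every not-yet-converted object in this container.
    int convertAll() {
        int n = 0;
        for (Obj& o : objects) {
            if (!o.converted) {      // flag check: convert each object once
                ++o.conversions;
                o.converted = true;
                ++n;
            }
        }
        return n;                    // objects converted by this call
    }
};
```

A second call over the same container is a cheap no-op, which is why leaving some objects unconverted is harmless: they are picked up by whichever access (deferred or on-demand) reaches them first.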
Immediate Mode Conversion
The use of Immediate Mode data conversion allows schema evolution to be
performed despite the presence of unidirectional associations and
inherited references in the schema. Objectivity/DB offers a flexible
implementation of Immediate Mode conversion that leaves the database
on-line, making only the affected objects unavailable during data
conversion. Note that Immediate Mode conversion is only required for two
specific types of schema change: replacing base classes and deleting
classes.
OTHER SCHEMA EVOLUTION ISSUES
Multiple Changes Over Time
Objectivity/DB can keep track of an arbitrary number of Deferred Mode
schema changes to the same class. This becomes an important issue when
multiple changes are made to the schema over time, and not all of the
objects in the database have been converted.
For example, assume
that a particular class is changed two or three times using Deferred Mode
data conversion. The database will contain both converted and unconverted
objects midway through a Deferred Mode conversion. Objectivity/DB allows
the subsequent schema evolution processes to be started, even though all
the data has not yet been converted from the earlier schema changes. As
objects are accessed, they will be converted to the newest schema
automatically. The above example assumes that no user-defined Conversion
Functions are in use; only one Conversion Function can be registered per
class per application.
Upgrade Applications
Upgrade Applications are typically small programs that make one or more
calls to the On-Demand Mode member functions to convert objects in
containers, databases, or across the entire federation. Such an Upgrade
Application can usually be run in parallel with the new version of the
end-user application, with the knowledge that when it completes, all
affected objects will have been converted.
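Such an Upgrade Application amounts to a sweep over the storage hierarchy, invoking the On-Demand conversion call at each level. The structure might look like the sketch below, where the type names and convertContainer function are invented for illustration rather than taken from the Objectivity/DB API.

```cpp
#include <cassert>
#include <vector>

// Simulated storage hierarchy: federation -> databases -> containers.
struct Object    { bool converted = false; };
struct Container { std::vector<Object> objects; };
struct Database  { std::vector<Container> containers; };

// Hypothetical On-Demand conversion of one container; returns the
// number of affected objects converted by this call.
int convertContainer(Container& c) {
    int n = 0;
    for (Object& o : c.objects)
        if (!o.converted) { o.converted = true; ++n; }
    return n;
}

// The whole Upgrade Application: sweep every container in the
// federation, so that when it completes, no affected objects remain
// unconverted.
int upgradeFederation(std::vector<Database>& federation) {
    int total = 0;
    for (Database& db : federation)
        for (Container& c : db.containers)
            total += convertContainer(c);
    return total;
}
```

Because each object is flagged once converted, this sweep can run concurrently with end-user applications that are converting objects in Deferred Mode; the two never convert the same object twice.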
Of course, it is not
possible to anticipate every type of schema change. Objectivity/DB schema
evolution supports unanticipated schema changes through the sequential
execution of multiple schema changes. Some of these multiple step schema
changes will require an Upgrade Application to explicitly traverse all of
the affected objects prior to moving on to the next step in the schema
evolution.
In a similar fashion, schema changes that are complex in
nature, such as those where Immediate Mode conversion is required, are
dependent upon application-specific information to be provided in an
Upgrade Application in order to be able to apply integrity constraints
during the schema evolution process.
SCHEMA EVOLUTION SCENARIOS
Application Distribution
Objectivity/DB In Centralized Client/Server Applications
Take the example of a repository built using Objectivity/DB, where the
end-users start and stop the client application each day. In this
scenario, the client applications are able to be redistributed as a normal
part of operations. In an application in which Objectivity/DB resides in
the client workstations, performing schema evolution simply requires
updating the client workstation applications.
The only time that the database would be "unavailable" is during the
brief moment when the client applications are being restarted.
Objectivity/DB In Server Application Only
Removing Objectivity/DB from the client workstations changes the
situation.
This might be an advanced Web server application built with
Objectivity/DB, where the "client application" is an off-the-shelf Web
browser. Since the client and server processes are effectively decoupled
through the use of HTML, it is unnecessary to redistribute the client
portion of the application. The only time that the Web site would be
unavailable is during the restart of the Web server
application.
One way to prevent even this minor interruption of
service is with the Objectivity/DB Data Replication Option, which allows an
individual server to be taken off-line for service, and brought back
on-line again, without disrupting access to replicated data in a federated
database.
SCHEMA EVOLUTION EXAMPLES
Adding Data Members
This example is the classic situation where a new piece of information
needs to be maintained in an object.
The steps to performing the schema evolution are quite simple.
- Change the schema and application to add the new
data type.
- Recompile, redistribute, and run the application as
before.
The conversion of objects residing in the database will be deferred
until they are accessed in the normal operation of the application.
Objectivity/DB can automatically initialize a new data member to a
predefined value. If the initial value must be calculated, a Conversion
Function is required.
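The two cases can be sketched side by side: a new member with a predefined value needs no extra code, while a computed value takes a Conversion Function. The class layouts below are hypothetical examples, not the document's actual schema.

```cpp
#include <cassert>
#include <string>

// Old schema: an employee with just a name and salary.
struct EmployeeV1 { std::string name; long salary; };

// New schema adds two members: one with a predefined default,
// one whose initial value must be computed from existing data.
struct EmployeeV2 {
    std::string name;
    long salary;
    bool active;       // new member, predefined default: true
    long monthlyPay;   // new member, computed from salary
};

// What happens when an old object is first accessed: automatic
// conversion fills in the predefined default, and a Conversion
// Function supplies the computed value.
EmployeeV2 convertEmployee(const EmployeeV1& old) {
    EmployeeV2 e{old.name, old.salary, /*active=*/true, 0};
    e.monthlyPay = old.salary / 12;   // Conversion Function logic
    return e;
}
```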
Conversion Of Primitive Datatypes
In this example, the number of unique BufferIDs required was
underestimated. Converting BufferID from a short to a long will solve the
problem.
The steps to performing the schema evolution are quite simple.
- Change the schema and application to use the new
data type.
- Recompile, redistribute, and run the application as
before.
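The widening of BufferID can be pictured directly in C++ (the struct names below are illustrative only): the stored value carries over unchanged while its physical representation grows.

```cpp
#include <cassert>

// Old schema: BufferID stored as a short, now too small.
struct BufferV1 { short bufferId; };

// New schema: BufferID widened to a long.
struct BufferV2 { long bufferId; };

// Automatic primitive-datatype conversion: the value is preserved,
// only its physical representation changes.
BufferV2 widenBufferId(const BufferV1& old) {
    return BufferV2{static_cast<long>(old.bufferId)};
}
```

Because short-to-long is a widening conversion, no value can be lost, which is why this change qualifies for fully automatic conversion with no Conversion Function.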
Logical Schema Change
In this example, no change to the physical structure of the persistent
objects is required. Objects of a persistent class each have an
association to another object, and new requirements dictate that the
visibility of the association be changed from private to public. Changing
the visibility of an association - or a member - requires only a logical
schema change.
The steps to performing such an operation are as follows:
- Change the schema to reflect the different access.
- Recompile, redistribute, and run the application as
before.
Modifying Inheritance
Objectivity/DB's schema evolution support is not limited to modifying
the contents of a class. It is also possible to modify the inheritance
relationships between existing classes in Objectivity/DB.
For
example, adding a non-persistent base class to a persistent class is a
schema change that can be implemented in Deferred Mode. The same is true
for removing a non-persistent base class. In this example, we also wish to
ensure that all the objects are converted in a finite amount of time.
The steps are the same as in the earlier examples:
- Change the end-user application to add or delete the
base class.
- Recompile, redistribute, and run the user
application as before.
Over time, most of the affected objects are likely to be converted. In
order to force the remaining affected objects to be converted, an Upgrade
Application can be written that calls the function to convert the
remaining affected objects in the federated database. If Conversion
Functions are used in the end-user application, they should also be used
in the Upgrade Application.
- Create an Upgrade Application that also converts the
objects in On-Demand mode against the entire Federated Database.
- Run the Upgrade Application to convert the affected
objects simultaneously with the execution of the end-user applications.
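Adding a non-persistent base class can be sketched as follows. Since the base contributes no persistent data members, the stored layout of the derived class's own members is what the Deferred conversion preserves; the class names here are invented for illustration.

```cpp
#include <cassert>
#include <string>

// A non-persistent (transient) base class added by the schema change.
// It carries no persistent data members of its own.
class Printable {
public:
    virtual std::string label() const { return "object"; }
    virtual ~Printable() = default;
};

// The persistent class before the change held only these members...
struct PartV1 { long id; long weight; };

// ...and after the change it also derives from the transient base.
struct PartV2 : Printable {
    long id;
    long weight;
    std::string label() const override { return "part"; }
};

// Deferred conversion preserves the persistent members unchanged;
// the new base class adds behavior, not stored data.
PartV2 convertPart(const PartV1& old) {
    PartV2 p;
    p.id = old.id;
    p.weight = old.weight;
    return p;
}
```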
Managing Deployed Applications
Providing mechanisms to allow the migration of objects for changes in
the database schema meets the needs of maintaining a deployed database,
but how do you manage deploying the new schema? The traditional
requirement that the new schema be built on the same platform as the
existing database is often a problem for embedded applications that are
deployed in a minimal environment. What is required is the ability to make
all changes to the schema in the development environment and then simply
take the revised schema to the field to marry it with the existing data.
Two tools, to dump and load schemas, are provided for just this purpose.
These tools can also be used to allow independent development of database
changes, which can then be merged into one schema.
CONCLUSION
Schema evolution is a key requirement in high availability
applications. Objectivity/DB provides powerful and flexible schema
evolution capabilities which clearly demonstrate our support of
mission-critical application environments in which the database must
remain available at all times.
Not only are application developers
able to make schema changes that were not possible before, but they are
able to do it easily with Objectivity/DB. The flexibility of Deferred and
On-Demand Mode data conversion allows the developer to select the timing
and granularity of the data conversion appropriate for the
application.
Objectivity/DB's support of on-line data conversion
minimizes the risk traditionally associated with making schema changes in
deployed applications. Application developers are better able to plan
incremental application modifications, reducing the risk of being locked
into a deployed application that might be inadequate to meet future needs.