February 18, 2004

Working with industrial-strength databases

Author: JT Smith

It's rare to find a PHP/PERL/Python developer
who is also a database administrator. This is understandable. If they really
spent time doing real database administration, their application development
skills (or, at the very least, timelines) would suffer greatly for it. However,
in some situations, where the back end is concerned, what you don't know can
have a non-trivial and negative affect on your application's performance or
security. At the very least, it will cause you to write a
significant amount of front end code to do things a database
feature might do for you with just a line or two of SQL or the tweaking of a
configuration directive.It is for this reason that I thought it best to describe the commonly-hyped
features of more fully-featured database systems in terms that can easily be
understood by those who don't have the time to do all of the investigative
work themselves.

Data Integrity

Data integrity, in an overwhelmingly small nutshell, means that an
operation on one piece of data cannot invalidate, orphan, or otherwise
corrupt other pieces of data in a database. Data integrity can be enforced
in many ways, but I'll just cover the basics here:

Enforcing Data Integrity In the Code

This is what currently goes on more regularly than most people would
probably like to admit. This is probably due in large part to the fact that
in many open source projects, there is no 'proper' database administrator
(DBA) involved in any phase of the project. The developer is the DBA!

Here's an example of enforcing data integrity 'in the code' - don't fret -
you won't have to be a coder to get this:

You have a database with 3 database tables, named 'people', 'places' and
'things', respectively. Every person in the 'people' table lives in a 'place'
and has one or more 'things'. Even if you've done only simple database
applications, you can probably envision that there's an 'id' field in the
'things' table, which corresponds to an 'id' in the 'people' table, so you
can match up people with the things they own. But what happens when a person
goes away and you need to delete the record in 'people' for this person?
Quite naturally, you'd need to also delete their 'things'! In a large many
software packages, this means the application does a database call, saying
'DELETE FROM people WHERE id = 1'. The application then does an additional
database call, saying 'DELETE FROM things WHERE id = 1'. This is not to even
mention what to do about the 'places' table!

If you've seen similar situations in your applications, and you've handled
similar problems in the same way, you have, indeed, enforced data integrity
in your own code. You have also (quite possibly unwittingly) done the
database's job, and made your own job more complex as a result. Enforcing
data integrity solely through the use of front end application code will
heretofore be referred to as 'the wrong way'.

Enforcing Data Integrity in the Database

Enforcing data integrity is
a job that has fallen squarely on the shoulders of the back end database for
many years. The simplest way to handle this task in the most database-agnostic
way possible is through the use of foreign keys. In older releases of
MySQL, foreign keys are only supported at the syntax level. What this means is
that defining a foreign key in the SQL code won't cause MySQL to make loud
belching noises, but the relationships you meant to form with them are not
enforced.

In some newer 3.x versions of MySQL using InnoDB tables, some
functionality related to foreign keys has been supported. In the 4.x
versions and upward, foreign keys are starting to become more usable.
PostgreSQL has had very good, mature support for foreign keys for some time. No
matter which database you use, you should be using foreign keys. I'll refer
back to our 3 example tables to explain how they basically work.

In the earlier example, you had to code all of the checks involved in making
sure that deleting a record in one table didn't invalidate data in other
tables. Now, with a foreign key, things are a bit smoother. You define the 'id'
field in your 'things' table as a foreign key referencing the corresponding
'id' field in the 'people' table. You add a restriction to the foreign key
definition along the lines of 'ON DELETE CASCADE' or 'ON DELETE RESTRICT', and
with those three words, you've potentially saved yourself many lines of
front-end code.

In the first case, a DELETE operation on the 'people' table will 'cascade' down
to the child table, such that all of the records in 'things' that had the same
key value will also be deleted. In the second case, your database will throw an
error if you try to delete from the 'people' table someone who still has
'things' in the database.

There's not nearly enough room here to go into a detailed discussion on how
foreign keys make your life easier. However, if you've sifted through huge
lists of database features not knowing which ones were really important,
foreign keys, to most administrators, rank right up there with... I dunno...
tables?!

Views

Views are one of the DBA's best friends. Sure, you can string together some
subselects and fancy SQL hacks to sort of emulate the functionality of a view,
but it's not nearly as clean and elegant, and it's not nearly as easy to use or
maintain. MySQL 5.0 or 5.1 will reportedly have views. PostgreSQL already has
usable, though not perfect, support for views.

You can think of views as a 'virtual table' that stores a particularly useful
view of your data. So, for example, using our imaginary tables again, if you
had an application that listed all of the things in a particular place, you
would generally do a join on the two tables involved, and select a subset of
the data from the result of the join. You would repeat this activity every time
you needed data that depended on fields from more than one table... ad nauseum.

Instead, what you can do is basically store this join in the database as a
view, and use a much simpler 'SELECT' statement - against the view itself. A
view, to an application, looks, smells and acts just like a table. It doesn't
know the difference. It's quite a lot like assigning a value (in the form of an
SQL statement) to a variable (in the form of the view name), and then calling
the variable to get some subset of the data it holds.

If you're thinking this sounds like an abstraction layer over your 'base' data
tables, you've pretty much got it. As an application developer, this allows you
the freedom to make changes to the base tables between releases for the purpose
of adding functionality (or removing it), without greatly affecting the
front-end code, without the need for a complex install routine that has to
migrate tons of data to an entirely new database design, and without requiring
your users to bear the pain of some crazy, manual data migration. As long as the view of
the data that the application needs can be maintained, everything should 'just
work'(tm). You could even rewrite the view definitions if necessary, to allow
for name changes in the base tables or columns. As long as the end result looks
like the same table the application has always called upon, your code can
remain unchanged.

For applications (or parts of applications) that provide 'read only' services
to the end user (such as report generation), views also act as a layer of
security for your base tables. While your base tables pretty much have to be
read-write, your views can certainly be made 'read only'. In addition, in the
event that the application is somehow hijacked, the malicious user would have a
tough time finding a way to write to a read-only view, and guessing the base
table names would be quite difficult.

In conclusion
I do hope that this information provides you with some inspiration to go off
and find ways to exploit (slightly) more advanced features of the back end to help your
applications be more robust, more secure, faster, and easier to maintain and
develop. While you could get by treating a database as an arbitrary collection
of tables, with some research, you'll find many features that are easy to use,
and have a very large impact on your application.

Indeed, it is entirely possible, with the use of views, temporary
tables, stored procedures, triggers, views and other advanced database
features, to offload entire sections of an applications functionality to the
database. Furthermore, though this may sound extremely impractical, some
corporate policies greatly restrict the operations that can be directly
performed on data using front-end application code, preferring to have those
applications call stored procedures and other back-end functionality developed
by the database administration and development team. The reasons for this
revolve only partially around performance and flexibility. Other reasons
revolve around accountability and data security, or the fact that they want
their C-coders to code C, not SQL - so it's easier to call a function than
write the SQL the function represents.

This is just a light intro. There is much more to discover. I will leave you
simply by saying that those who develop database software build in advanced
and, at times, complex features for a reason. Being that databases are pretty
useless without applications to access them, it stands to reason that some of
these features might be useful to application developers. Do your research, and
use 'the right tool for the job'.

Click Here!