May 8, 2008

Synchronize your databases with SqlSync

Author: Ben Martin

SqlSync lets you compare two databases to see which tuples have been added, removed, and changed. You can also use SqlSync to make one database a clone of another and maintain its contents to be that way. One benefit of using SqlSync to perform synchronization is that you can perform heterogeneous syncs -- for example, from MySQL to PostgreSQL.

No SqlSync packages are available in the standard repositories for Ubuntu, Fedora, or openSUSE. For the article I'll build from source using version 1.0.0-rc1 on a Fedora 8 64-bit machine. Although it is not mentioned on the project's freshmeat page or home page, SqlSync uses ODBC to access your databases. If you do not have the unixODBC development packages installed, the build will fail.

SqlSync doesn't use autotools to build itself, relying on Makefiles instead. The Makefile.log might hint at a reason if the build fails. I found that the build broke complaining about trying to make a shared library without the -fPICposition-independent code option. PIC code has the advantage that it can be loaded at different locations in memory; this is useful for compiling code for shared libraries, because they can be moved when multiple libraries would otherwise want the same address. When I attempted to build the static linked version I ran into another problem with linking. To solve it, I modified the Makefile.def file as shown below and ran make as shown to build SqlSync. This produces a statically linked tool src/sqlsync_static.

$ vi Makefile.def
- ODBC_STATIC_LIB=`odbc_config --static-libs` -L`odbc_config --lib-prefix` -liconv
+ ODBC_STATIC_LIB=`odbc_config --static-libs` -L`odbc_config --lib-prefix` -ldl
$ make WITH_SHARED=NO WITH_STATIC=YES clean all
...

The primary options to the sqlsync command specify the source and destination ODBC Data Source Name (DSN), any restriction on the tables and columns you want to compare, and whether you want to only show the queries that would perform the sync or actually run the sync as well. An ODBC DSN is a name that identifies a database, what driver to use, and other configuration parameters. If you run sqlsync without specifying to either show the queries or to update the destination database, it will show you a summary of the changes.

For initial testing I'll use two PostgreSQL databases: sourcedb and destinationdb. The initial database setup is shown below. Note that the database schema must exist in destinationdb prior to running sqlsync. The final command shown below will dump only the schema from sourcedb into destinationdb, meaning the tables will exist in destinationdb but there will be no tuples in destinationdb.

$ psql
ben=# create database sourcedb;
CREATE DATABASE
ben=# create database destinationdb;
CREATE DATABASE
ben=# \c sourcedb
sourcedb=# create table customers ( cid serial primary key, name varchar(100) ) ;
sourcedb=# create table info ( id serial primary key, cid int references customers(cid), note text );
sourcedb=# insert into customers values ( default, 'fred' );
sourcedb=# insert into customers values ( default, 'john' );
sourcedb=# insert into customers values ( default, 'peter' );
sourcedb=# insert into info values ( default,
(select cid from customers where name='fred'), 'fred is home' );
sourcedb=# insert into info values ( default,
(select cid from customers where name='john'), 'john is with fred' );
sourcedb=# insert into info values ( default,
(select cid from customers where name='fred'), 'a second note for him' );
sourcedb=# \q

$ pg_dump -c -s sourcedb | psql destinationdb

To make things clear, I'll use the same ODBC DSN as the postgresql database name. You can set up ODBC DSNs with the ODBCConfig graphical client. First I show a summary of the differences between the two ODBC DSNs, followed by the SQL queries that would bring the destinationdb into the same state as the sourcedb, using the sqlsync --show-queries option.

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-summary
Connected to dsn_src server sourcedb
Connected to dsn_dst server destinationdb
All available tables will be synchronized
User didn't specify list of tables to synchronize
Checking availability of specified tables
Comparing tables customers
Comparing tables customers completed
[BEGIN] -----------------------------
[SUMMARY] ---------------------------
Table customers
Number of columns 2
Equal rows 0
Different rows 0
Missing rows 3
Additional rows 0
[COLUMNS] ---------------------------
cid| name|
[DIFF] ------------------------------
[MISSING] ---------------------------
1| fred|
2| john|
3| peter|
[ADDITIONAL] ------------------------
[END] -------------------------------
Comparing tables info
Comparing tables info completed
[BEGIN] -----------------------------
[SUMMARY] ---------------------------
Table info
Number of columns 3
Equal rows 0
Different rows 0
Missing rows 3
Additional rows 0
[COLUMNS] ---------------------------
id| cid| note|
[DIFF] ------------------------------
[MISSING] ---------------------------
1| 1| fred is home|
2| 2| john is with fred|
3| 1| a second note for him|
[ADDITIONAL] ------------------------
[END] -------------------------------
Comparing tables sql_parts
Error occured while running SELECT query on source table,
probably generated query is corrupted for sql_parts table
[unixODBC]Error while executing the query (non-fatal);
ERROR: relation "sql_parts" does not exist
Comparing tables sql_parts completed
[BEGIN] -----------------------------
[SUMMARY] ---------------------------
Table sql_parts
...
[END] -------------------------------

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries
Connected to dsn_src server sourcedb
Connected to dsn_dst server destinationdb
...
Creating synchronization queries
INSERT INTO "customers" ("cid", "name") VALUES (1, 'fred');
INSERT INTO "customers" ("cid", "name") VALUES (2, 'john');
INSERT INTO "customers" ("cid", "name") VALUES (3, 'peter');
Creating synchronization queries completed
Comparing tables info
Comparing tables info completed
Creating synchronization queries
INSERT INTO "info" ("id", "cid", "note") VALUES (1, 1, 'fred is home');
INSERT INTO "info" ("id", "cid", "note") VALUES (2, 2, 'john is with fred');
INSERT INTO "info" ("id", "cid", "note") VALUES (3, 1, 'a second note for him');
Creating synchronization queries completed
...

The --run-queries command will execute the queries that you saw with the --show-queries option to bring the destination database into the same state as the source database. After this sync has been performed, I add a new tuple into the source database and destination database. The final command shows that sqlsync will add the new tuple from the source database and delete the new tuple that I added to the destination database.

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --run-queries
Connected to dsn_src server sourcedb
Connected to dsn_dst server destinationdb
...
Creating synchronization queries
Running UPDATE synchronization queries
Running DELETE synchronization queries
Running INSERT synchronization queries
Running synchronization query (0/3)
...

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries
Connected to dsn_src server sourcedb
Connected to dsn_dst server destinationdb
...
Creating synchronization queries
Creating synchronization queries completed
Comparing tables info
Comparing tables info completed
Creating synchronization queries
Creating synchronization queries completed
...

$ psql sourcedb
sourcedb=# insert into info values ( default, (select cid from customers where name='john'), 'another' );
sourcedb=# \c destinationdb
destinationdb=# insert into info (cid, note) values ( (select cid from customers where name='peter'), 'this will be gone' );
destinationdb=# \q

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries
...
Creating synchronization queries
DELETE FROM "info" WHERE "id"=20;
INSERT INTO "info" ("id", "cid", "note") VALUES (4, 2, 'another');
Creating synchronization queries completed

Next, I added a new tuple in the info table with a foreign key reference to the customers table, then synced it with sqlsync.

$ psql sourcedb
sourcedb=# insert into customers values ( default, 'luke' );
sourcedb=# insert into info values ( default, (select cid from customers where name='luke'), 'lucky' );
sourcedb=# \q

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries
...
INSERT INTO "customers" ("cid", "name") VALUES (4, 'luke');
...
DELETE FROM "info" WHERE "id"=20;
INSERT INTO "info" ("id", "cid", "note") VALUES (4, 2, 'another');
INSERT INTO "info" ("id", "cid", "note") VALUES (5, 4, 'lucky');

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --run-queries

Working across DBMS engines

To test out the cross-database synchronization, I created a MySQL database to sync into called destinationdb, with an ODBC DSN called mysql-destinationdb. You can use the isql command from unixODBC to connect to a database through ODBC, which comes in handy if you are troubleshooting ODBC access issues. Instead of using the serial data type for the MySQL database schema, I just use a plain integer type. I used the user and password options in order to connect to the MySQL database DSN as shown in the sqlsync command below.

$ isql mysql-destinationdb user passwd
SQL> create table customers ( cid int primary key, name varchar(100) ) ;
SQL> create table info ( id int primary key, cid int references customers(cid), note text );
SQL> quit

$ sqlsync_static --src-dsn sourcedb \
--dst-dsn mysql-destinationdb \
--dst-user user --dst-password passwd \
--show-queries
...
Creating synchronization queries
INSERT INTO `customers` (`cid`, `name`) VALUES (1, 'fred');
INSERT INTO `customers` (`cid`, `name`) VALUES (2, 'john');
INSERT INTO `customers` (`cid`, `name`) VALUES (3, 'peter');
INSERT INTO `customers` (`cid`, `name`) VALUES (5, 'luke');
...
INSERT INTO `info` (`id`, `cid`, `note`) VALUES (1, 1, 'fred is home');
INSERT INTO `info` (`id`, `cid`, `note`) VALUES (2, 2, 'john is with fred');
INSERT INTO `info` (`id`, `cid`, `note`) VALUES (3, 1, 'a second note for him');
INSERT INTO `info` (`id`, `cid`, `note`) VALUES (4, 2, 'another');
INSERT INTO `info` (`id`, `cid`, `note`) VALUES (6, 5, 'lucky');

You can restrict which tables and which columns from these tables will be synced using the --tables option to sqlsync. For example, I could sync only the name and note field using --tables customers[name],info[note]. To test this out I created a new database where the info table had only the id and note field. The sync is shown below.

The ability to pick out only some columns for syncing can be useful, but you cannot set up a table name mapping, for example syncing the info table into a limitedinfo table in the destination. There is nothing precluding such functionality being added to SqlSync in the future.

$ psql
ben# create database partdestdb
ben# \c partdestdb
partdestdb# create table info ( id serial primary key, note text );
partdestdb# \q

$ sqlsync_static --src-dsn sourcedb --dst-dsn partdestdb --tables info[id,note] --show-queries
Parsing tables and columns passed from command line
Parsing tables and columns completed
Connected to dsn_src server sourcedb
Connected to dsn_dst server partdestdb
Checking availability of specified tables
Comparing tables info
Comparing tables info completed
Creating synchronization queries
INSERT INTO "info" ("id", "note") VALUES (1, 'fred is home');
INSERT INTO "info" ("id", "note") VALUES (2, 'john is with fred');
INSERT INTO "info" ("id", "note") VALUES (3, 'a second note for him');
INSERT INTO "info" ("id", "note") VALUES (4, 'another');
INSERT INTO "info" ("id", "note") VALUES (6, 'lucky');
Creating synchronization queries completed

SqlSync might be worth considering if you maintain a backup of a database. Although many databases provide tools for replication, using SqlSync lets you back up the database using a different database server. This could be useful if you are migrating between two database engines and wish to maintain the database on two different database servers while you are testing.

While SqlSync is handy, it's not perfect. The ability to attempt to set up a schema in the destination database automatically would be a welcome addition.

Categories:

  • Databases
  • Tools & Utilities