Re: Homegrown synchronization




In our tough IT environment, they don't
allow unapproved third-party software (like TSI Synchronizer).
Further IT has not approved the Jet Synchronizer for use and it
currently cannot be installed. Even if they do eventually allow
it, they may at some point in the future take it away. Also since
the Replication Manager is now no longer provided with Office
Professional 2003, I seriously doubt they will allow its
installation even if they can get a copy.

Show them the Jet Replication security update. Even if you were
doing direct replication, you'd need that, obviously, and that
includes the synchronizers. So, it *is* supplied direct from
Microsoft.

Are you saying that the synchronizer needed for indirect
synchronization is already on the workstations??! Direct
synchronization does work as I've already tested it and nothing was
added in the way of executables, dlls, etc.

Sounds like a good plan at this point.

Glad that someone with your experience agrees. I'd hate to put in the
time and then find I'd hit a dead end or had neglected to consider a
showstopper.


I've done this already in a master/slave relationship. Here's what
you do:

1. use the PKs to determine new additions. That is, do an outer join
and find the records that don't exist on the other end.

I have very few tables that use autonumber PKs, but these could be a
problem. I have one table where none of the human-enterable fields are
unique, therefore there has to be an autonumber PK. But generally I
use compound keys. I do not use Access to enforce referential
integrity and do it all in code (Okay, I got tired before deployment
and have about 5 out of 110 tables that do; but these are all small and
it would be relatively easy to do the referential integrity in code).


2. use DAO to walk the recordset fields to find updated records.
This is a matter of writing on-the-fly SQL that is something like
this:

UPDATE table1
INNER JOIN table1 ON table1.PK = table2.PK
SET table1.field = table2.field2
WHERE table1.field <> table2.field2

Now, it's more complicated than that, as you have to account for
Nulls. And you can't use the same WHERE clause for all data types.
What I usually do is have a CASE SELECT on the field type that
writes an appropriate WHERE clause.

Then, to run it, you walk the fields collection of a recordset with
the updatable fields, and then write the SQL for each field. This
SQL is *very* efficient and runs very quickly, and it only updates
the fields that have been changed.

I had forgotten that I've already coded this for about ten tables or
so. When I first started playing around with replication I was
concerned about mass deletions bloating the replicated backend. This
is because I have chosen to allow users to make backup copies of
certain tables directly in the backend that they are connected to.
That way if they screw up that table when editing/deleting, they can
quickly restore the original (i.e. they don't have to restore the
entire backend, just the table). But when they make a new backup,
previously I was simply deleting all records in the backup table.
Since some of my tables have 100000+ records, this really bloats
replicated backends. Therefore, I had already implemented the code you
suggest. Of course, now I have to do it for the other 100 tables :).
The GUIDs caused me problems when implementing this code and I couldn't
just use Select *. I had to explicitly write each field which was
ugly with one table that has 70+ fields (yes, these are virtually all
unique fields not contained in any other table). But now without the
GUIDs this should be a whole lot easier. Still it's a lot of work.


3. deletes are harder, because you can't delete a record that isn't
there. There are two approaches for this:

A. use a DELETED flag and a DELETED field. The flag means "a user
deleted this" and the DELETED field (a date/time field) says when
it was deleted from the other replica. Now, the problem here is
that you still need to get it around to all the other replicas,
so your topology is going to matter a lot. You won't be able to
synch with just any other database -- you'll only be able to have
pairs of databases. I've never done this except in a master/slave
relationship between two databases, so I didn't have to worry
about that, and don't know what the ideal solution would be.

B. do it like Jet replication and maintain a table of deletions.
Then all you'd be doing is checking for additions to that table
(just like in #1) and then processing the new records
accordingly. This seems to me to be much easier than A) in a
multi-database scenario like yours.

Yes, the deletes are a problem and I'll study your two suggestions
later.

But all of these are dependent on generating unique PKs in each
database that are the permanent PKs.




Your temp database is just your method of sending the data, so I
guess that works.
Yes. I thought about a text file, but the coding will be substantially
tougher (at least for me), than if I just use an Access database. There
will be very, very few cases where that temp database will be greater
than 2MB; most of the time I'm sure it will be less than 1MB after
compacting (this is based on my experience with my current temp
database which has about 65 tables.


5) When my app starts up, it will look for temp updateDBs in a
folder on the server. If it finds any that have not already been
processed (based on prior logs in a frontend table), it will copy
them from the server to the workstation. Then code in the app
will transfer "update/add" data from each table in the temp
updateDB(s), copied over from the server, to the equivalent local
backend table. If the flag for the record in the table in the
temp updateDB is set to 'Yes' that record will be deleted from the
local backend. As noted, my app will contain a front-end table
that will log the filename of each temp updateDB that was imported
as well as the import date.

I would suggest you'll want to archive both the temp databases and
the state of the real database before applying the changes from the
temp databases. This is so you can easily roll back to a previous
state if something goes wrong.

You'll want to do all your synching wrapped in transactions, too.

Good idea on the transaction wrappers. I already have those in my app,
so I've done it before. With regards to archiving, I think I'll do
this only for the server copy. I'm going to beef up the small backup
and compact application (intially provided by Candace Tripp and
modified by me) to include handling the updating of the server backend
file. For me that is the key backend file. If any of the workstation
backend files blow up, it's not a real problem because the launcher is
already setup to allow users to download a fresh copy of the backend
file from the server if they need to.


What about between the servers?
Yeah, this is where it gets trickier.

When the app opens it will look for temp
updateDB(s) to IMPORT and when it closes it will look to EXPORT
any table changes made by that user into a temp updateDB that will
be copied to the server. Again synchronizing table design changes
is not critical. There are other ways to handle this. If we
change the table design, then we will do it to the most current
copy we can find and run code to take all temp updateDB files on
the server and import them. Most of these updates will likely fail
due to key violations. My app launcher already looks for updates
to my app located on the server and will download them to the
workstation. It is a simple matter to also force downloading of
fresh copies of the backend from the server. Further, in this
scenario, even if the supposedly updated backend on the server
does not truly have all current data, a new copy of my app will
have an empty frontend table that logged previous imports of temp
updateDBs. So it will try again to import temp updateDBs once it
is on the workstation. Most if not all will fail due to key
violations, but it should ensure that the new backend file is
updated.

Sounds like there are pitfalls there, but the basic outline sounds
about right.

HOWEVER, I still need to have a current copy of the backend on
the server. My app launcher allows users to download a copy of
the backend from the server (in case it ever gets corrupted,or if
we've changed the table designs, or if they haven't logged on in
30 to 60 days).

But I'm unclear on how to keep the copy of the backend on the
server up-to-date. I guess I will need to have one or more of the
workstations handle this. My app already has a user setup form
that defines the group a user belongs to as well as what
permissions they have to see data. Only administrators can
add/delete users and set permissions. I could add a checkbox to
that user setup form that sets whether their copy of my app can
update the server backend. Only users that are within the LAN of
the SERVER would be allowed to do this; no users connecting to the
server via a WAN would be allowed to do this. When these users
close my app, the code would not only update their local copy but
also the server copy. An alternative would be to create a program
that would schedule these updates.

Yes, you'll have to schedule somehow. I know that it's possible to
launch an Access MDB that runs code automatically, and this works
when there's not a user logged on (I had a client whose daily
compacts were done this way).

I already have a small
program for scheduled backup and compaction of the server backend
so I could add updating of the server backend tables to this
program. It has a timer (IT won't allow us to use Windows
Scheduler).

Assholes.
You got that right :)


You could write a VBScript to do the same thing, you know, using
DAO. Would they let you schedule that?

This might
actually be the best solution. Users on the LAN where the server
is could just launch this small program, make sure the timer is
set and then leave their machines on over night.

It could work, but I'd be worried about it -- it's too easy for it
to get shut down and not run.
Yeah, you're right. Or users could forget to run it. HOWEVER, it just
occurred to me that I've already got programming in my app to send
e-mails. What if I just copy and paste this email code into the
backup/compact/update server backend app. When the update of the
server backend file is complete, it will send an e-mail to the
appropriate individual(s). E-mail received, everything went okay. No
e-mail, then there were most likely problems. Of course the
backup/compact/update app can be operated manually, so that's what the
administrator would have to do if the timed update failed, for whatever
reason.

Fortunately, keeping the data up-to-date is not super critical most of
the time. But there are certain times of the year where it is.

Also "housekeeping" is an issue. At what point do we start
getting rid of these temp updateDB files that are accumulating not
only in the server folder but also in the workstation folders.
Since the frontend will have a table that logs every temp updateDB
filename and a date of import, any entries where the date of
import is over 60 days old, for example, could be deleted from not
only the table but also the folder.

If I'm going to use the compact/backup/update app for the server
backend, I should probably log which temp updateDB files were already
imported. I don't think it is wise to store that in the
compact/backup/update front end. Probably should store that
information in the backend file so that it will be universally
available to anyone who uses the compact/backup/update app (each
administrator will need their own copy). Besides I might find it
useful in the future to have a history of updates files applied to the
server backend. Since I plan to include the user's logon names in the
temp updateDB files, I can learn who is doing what and if they're doing
what they are supposed to with regards to maintaining data.

What about zipping them up? That gives you a lot more overhead?

Also, you might want to consider not updating every time, and just
updating when a certain number of changes have accumulated.

Yeah, I could probably get away with that


These are just my initial thoughts and I would appreciate
comments, suggestions, etc.

It's a *very* hard task, and you're running into some of the same
problems with IT, because you really need scheduling to remove as
much latency as possible. But I think it's doable.

But I don't envy you the task.

Everything I've mentioned about IT is only getting worse. Most of the
users are disillusioned and literally have to practice TM and chant to
keep from going crazy with the slowness of the network connections.
The response of anything dealing with the server in general is slow.
For example, at my location, it takes 7 minutes to copy a 50 MB file
from my hard drive to the server using Windows Explorer. At home with
cable modem, I can download 50 MB in just over a minute. This by the
way is why I provide users with a local copy of the backend. My app is
mostly about complicated analyses on tens to hundreds of thousands of
rows from multiple tables with multiple criteria provided by users.
Some take several minutes to run even with the backend file on the
local PC. A comment I get from many users is that they like the speed
of the app (it's only because the backend is on the local PC). They
are used to applications (MAXIMO, etc) where they have to wait and wait
and wait to accomplish their work. I wanted my app to be used [since
it is really cool :)], so I decided I could forgo best practice and not
make everyone deal with an ultraslow backend on a LAN server (the fact
that we're not constantly updating data makes this feasible, IMHO). Of
course, if they want to, users can connect to the backend file on the
server. My launcher makes this very easy. Just click a radio button
and the server location is input into a textbox and that's where my app
will go to find the backend file. But no one does that, because they
prefer the speed. However, that means that the responsibility is then
on me to do what I can to make sure they have current data.

At this point it is really about me maintaining control. I would hate
to implement Jet indirect synchronization and then have them yank the
synchronizer in a year or two causing my application to fail when it
comes to updating replicas. No, this will be better. Also, I don't
care for the bloat and size of replicated backends. My backend went
from 130 MB to close to 300MB after it was replicated and after
multiple compactions.

One thing I didn't consider yet is conflicts. I don't expect this to
be a huge problem because data entry personnel are only allowed to
update data for their site. There are four sites and while everyone can
see data from all four sites, they cannot update somebody else's site
data even though they may have the rights to update their own site
data. This will assist in minimizing data conflicts. Further, not
everyone can update data. Only about 1 out of three users has been
given permissions to update data; the rest can only view and run
analyses. But conflicts still can and probably will happen, and we
won't really know about them until probably well after the fact. Right
now it looks like the last temp updateDB file to update a backend is
the winner if there is a conflict. If a new record's PK already
exists, then the fields will simply be updated instead of a new record
added. If a row has already been deleted, well then I guess in that
case, the first deletion won. I don't think it is going to be a show
stopper. I'm finding that my users don't really want to update the
data, they just want to use the data for their analyses and view data
they need for their work. Most want somebody else to do it.


--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

.