Re: Identifying Duplicate Records
- From: Peter Hallett <PeterHallett@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 18 Feb 2008 03:35:03 -0800
Gordon,
As a postscript to my earlier reply, examination of the SQL statement you so
kindly analysed for me shows that it is indeed that used in the Find
Duplicate query. I am very grateful for your explanation. It certainly
clarifies what seems a pretty arcane piece of code to someone, like me, who
tries to avoid writing SQL wherever possible. One day I must get around to
learning it but, in the mean time, I try to use VBA in what I term
‘high-level mode’. It is amazing what can be achieved without ever defining
a record set or writing SQL statements. Indeed, the duplicate records
problem represented one of the few occasions on which I have been truly
stumped over the last few years – and I have had to deal with some pretty
complex situations during that time. Discovering the Find Duplicate query,
however, has allowed me to plug another gap in my knowledge and further
enhanced my respect for Access. In the personal and small business
environment there is certainly very little that it can’t do.
It was clearly not a case of ‘two minds with but a single thought’ – mine
was just stubborn, and rather frustrating, slog – but we seem to have arrived
at the same solution by different routes. Meanwhile, I have had the chance
to study your response in detail and I have filed a printed copy for later
reference, together with the relevant sections of the other replies. Thank
you all, once again.
--
Peter Hallett
"gllincoln" wrote:
Hi Peter,
The first order of business would be to determine what column of data is
unique except when it isn't? <grin>
Phone number mught be a good one, particularly if you force a uniform format
on the phone numbers entered. Address is what you mentioned but nailing down
dupe addresses isn't necessarily straightforward. Unless you are running
your addresses through an address correction/validating scrub, accurately
finding all of the duplicate street addresses can get complicated.
Managing the address data for a couple bulk mail campaigns makes one cynical
regarding their fellow man's ability to accurately and/or consistently enter
a postal address into a form; maybe I'm a little prejudiced on this subject.
However, you might need to concatenate two or more columns into a single
entity to get a psuedo primary key (for querying purposes), to double-check
things. For example, if first and last names are two columns - you might
use FullName:[firstname] & " " & [lastname] in the query designer or '" &
[firstname] & ' ' & [lastname] & "' AS FullName in SQL.
Whatever you use, you then can use the results to flag the dupes for review
in one way or another, or bring up the dupe records to look at them.
The traditional dupe finder query looks be something like this:
SELECT * FROM mytable WHERE address=IN(SELECT address FROM mytable AS tmp
GROUP BY address HAVING Count(*)>1;);
This is a tricky little query to understand, from a beginner to intermediate
point of view. I didn't come up with it on my own - Access 95/97 used to
have a find duplicate records wizard. Deduping a set of records is such a
common chore that I soon had that criteria phrase memorized.
If you are using the query designer - you would put
=IN(SELECT address FROM mytable AS tmp GROUP BY address HAVING Count(*)>1;)
in the criteria box of the address column.
For those who don't quite get it (how this works) don't feel badly. I
struggled wth this one. Just had a mental block at first - I saw that it
worked, I memorized it character by character and used it. Later, when I
started using an occasional aggregating query and got stuck doing some
elaborate reports, I began to understand GROUP BY and then 'I got it'.
A quick overview for the slow learners like me.
Whatever you put inside the IN( ) (and we can have more than one item
separated by commas) will be compared to the left-hand side and return true
or false if we get a match or not.
5= IN(1,2,3) will be false, no match
5= IN(3,4,5) will be true, we have a match
To find our duplicate records we build a query inside the IN function that
will return only those values that occur more than once in the table.
In plain English what our query is saying, give us all the addresses (SELECT
address AS tmp) where we get more than one item (HAVING Count(*)>1) that is
the same when we group them together (GROUP BY).
This query inside the IN - returns a list similar to our IN example above.
When the query is running, each row's address column is being compared using
the IN function to an array of dupes only addresses (dupeaddress1,
dupeaddress2,dupeaddress3, etc)
Hope this helps...
Gordon
"Peter Hallett" <PeterHallett@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:049A1BFD-A581-488C-AE0B-D2636A98AFE7@xxxxxxxxxxxxxxxx
I have a dynamically created and amended table which occasionally contains
duplicate records. These duplicates must be allowed but need to be
flagged-up, when they occur, and marked in the table. Setting the No
Duplicates property is clearly inappropriate.
I can think of one or two ways of going about the job, including, perhaps
running two nearly identical queries, one with the unique records property
set and the other without, then comparing the results but the details of
subsequently identifying and marking the appropriate entries have started
to
suggest some rather prolix VBA and I have a suspicion that I have missed
something obvious or, at least, simpler. Has anyone got any thoughts?
--
Peter Hallett
.
- References:
- Re: Identifying Duplicate Records
- From: gllincoln
- Re: Identifying Duplicate Records
- Prev by Date: Re: Identifying Duplicate Records
- Next by Date: Using select as button controlsource
- Previous by thread: Re: Identifying Duplicate Records
- Next by thread: Nightmare Bug. !! Plz Help
- Index(es):