Re: Identifying Duplicate Records
- From: Peter Hallett <PeterHallett@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Sun, 17 Feb 2008 16:06:00 -0800
Thank you, Gordon, and everyone else who so kindly responded. As it happens,
however, the solution, as I suspected, was staring me in the face. Having
not quite got the answer I wanted, I spent another couple of hours with
Microsoft Access Help (never the most helpful of services imho!). Finally,
and not for the first time, I fell over what I was looking for in the
response to what appeared to be a barely relevant question. (I was trying
anything that appeared even vaguely related at that stage.)
For the benefit of anyone who, like me, has been looking at the thing for
years, without seeing it, just go to the database window, select Queries,
click New and there, in the list of options is Find Duplicates query. How on
earth I missed it is a mystery but I have given it a whirl and it works just
fine, faithfully listing the duplicated records in the required table. My
only satisfactions are that at least I did not have to wait for someone else
to point out that I was being dumb and I now have a solution where it was
becoming increasingly apparent that it was going to be very difficult to find
a suitable work-around.
--
Peter Hallett
"gllincoln" wrote:
Hi Peter,.
The first order of business would be to determine what column of data is
unique except when it isn't? <grin>
Phone number mught be a good one, particularly if you force a uniform format
on the phone numbers entered. Address is what you mentioned but nailing down
dupe addresses isn't necessarily straightforward. Unless you are running
your addresses through an address correction/validating scrub, accurately
finding all of the duplicate street addresses can get complicated.
Managing the address data for a couple bulk mail campaigns makes one cynical
regarding their fellow man's ability to accurately and/or consistently enter
a postal address into a form; maybe I'm a little prejudiced on this subject.
However, you might need to concatenate two or more columns into a single
entity to get a psuedo primary key (for querying purposes), to double-check
things. For example, if first and last names are two columns - you might
use FullName:[firstname] & " " & [lastname] in the query designer or '" &
[firstname] & ' ' & [lastname] & "' AS FullName in SQL.
Whatever you use, you then can use the results to flag the dupes for review
in one way or another, or bring up the dupe records to look at them.
The traditional dupe finder query looks be something like this:
SELECT * FROM mytable WHERE address=IN(SELECT address FROM mytable AS tmp
GROUP BY address HAVING Count(*)>1;);
This is a tricky little query to understand, from a beginner to intermediate
point of view. I didn't come up with it on my own - Access 95/97 used to
have a find duplicate records wizard. Deduping a set of records is such a
common chore that I soon had that criteria phrase memorized.
If you are using the query designer - you would put
=IN(SELECT address FROM mytable AS tmp GROUP BY address HAVING Count(*)>1;)
in the criteria box of the address column.
For those who don't quite get it (how this works) don't feel badly. I
struggled wth this one. Just had a mental block at first - I saw that it
worked, I memorized it character by character and used it. Later, when I
started using an occasional aggregating query and got stuck doing some
elaborate reports, I began to understand GROUP BY and then 'I got it'.
A quick overview for the slow learners like me.
Whatever you put inside the IN( ) (and we can have more than one item
separated by commas) will be compared to the left-hand side and return true
or false if we get a match or not.
5= IN(1,2,3) will be false, no match
5= IN(3,4,5) will be true, we have a match
To find our duplicate records we build a query inside the IN function that
will return only those values that occur more than once in the table.
In plain English what our query is saying, give us all the addresses (SELECT
address AS tmp) where we get more than one item (HAVING Count(*)>1) that is
the same when we group them together (GROUP BY).
This query inside the IN - returns a list similar to our IN example above.
When the query is running, each row's address column is being compared using
the IN function to an array of dupes only addresses (dupeaddress1,
dupeaddress2,dupeaddress3, etc)
Hope this helps...
Gordon
"Peter Hallett" <PeterHallett@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:049A1BFD-A581-488C-AE0B-D2636A98AFE7@xxxxxxxxxxxxxxxx
I have a dynamically created and amended table which occasionally contains
duplicate records. These duplicates must be allowed but need to be
flagged-up, when they occur, and marked in the table. Setting the No
Duplicates property is clearly inappropriate.
I can think of one or two ways of going about the job, including, perhaps
running two nearly identical queries, one with the unique records property
set and the other without, then comparing the results but the details of
subsequently identifying and marking the appropriate entries have started
to
suggest some rather prolix VBA and I have a suspicion that I have missed
something obvious or, at least, simpler. Has anyone got any thoughts?
--
Peter Hallett
- References:
- Re: Identifying Duplicate Records
- From: gllincoln
- Re: Identifying Duplicate Records
- Prev by Date: Re: Highlight a row
- Next by Date: Re: AddNew but I Don't See the New Record
- Previous by thread: Re: Identifying Duplicate Records
- Next by thread: Re: Identifying Duplicate Records
- Index(es):
Loading