Re: Name parsing routine



On Thu, 4 May 2006 15:27:37 -0400, "Matt Williamson"
<ih8spam@xxxxxxxxxxx> wrote:

I'm looking for some input on the best way to accomplish this. I have a data
extract from Outlook that I'm parsing to go into a SQL table. I need to take
a Full name and put it into 6 fields. PrimaryFirst, PrimaryMiddle,
PrimaryLast, SecondaryFirst, SecondaryMiddle, SecondaryLast.

90% of the names fall into the following patterns:

FirstName Lastname
'Joe Anderson
FirstName MiddleName Lastname 'Joe
J. Anderson
Prefix FirstName MiddleName LastName 'Mr. Joe
J. Anderson
Prefix FirstName MiddleName LastName Suffix 'Mr. Joe
J. Anderson Jr.
FirstName and/& SecFirstName LastName 'Joe &
Jane Anderson

[etc]

This is one of those problems that can't really be solved with today's
technology, especially when data entry has been at all slapdash in the
past.

I've written stuff (not in VB) to attempt solutions for both names and
addresses - in both cases, just like you say, you've got a large
unformatted string (several for addresses) to deal with, and you need
to split the data into more precise elements.

Addresses I'll leave aside - here in the UK, they can be
cross-referenced against postal code data for validation, which helps
a lot.

Oh, and we did telephone numbers too, back when UK area codes all
changed. That was easy in comparison, since you've got rules to work
by, and you can pretty much guess that the letter O is an operator
error for the number 0, I for 1, etc.

If it's any help, what the bosses wanted was a way to format
free-entry name fields into title, first name, middle name, surname,
suffix - just like you describe - and it proved to be impossible,
especially trying to fit a single algorithm across the 250+ users'
databases we had to convert.

The end solution was to make a best guess, to print out the before and
after values in a structured report, and get the clients to distribute
copies of the relevant parts of the report among their users for
confirmation or written correction. Since the clients were firms of
lawyers, and the users themselves lawyers (for whom time is money),
this was an unpopular but necessary part of the solution.

Another consideration was case. Ideally, "Smith", "SMITH" and "smith"
should all come out the other end as "Smith", but what about names
like "DeGroot" (a Dutch surname), where the internal capital is very
important?

The best I could come up with was about 40% valid names overall, very
much depending on which clients had input their own clients' names
under which system, fine-tuning for each client and *always* making
sure the users could check the resultant data against the original,
even if the self-checking algorithm reported a valid conversion.

You've got a tough job on - but that depends a lot on how consistently
well the data's been entered in the first place.

I'll not even mention non-European naming conventions, where the
distinction between forename and surname can get extremely confusing
depending on the individual person's cultural heritage.
--
John Hatpin
.



Relevant Pages

  • Re: MS Access Reports and VB.NET Program
    ... think that I want to ship a 30mb .NET framework to each of my clients? ... Barns & Noble has some books that could help you learn .NET. ... It is Access TIMES eCommerce TIMES Sql Server. ... that you actually posted how to execute the report using a macro. ...
    (microsoft.public.dotnet.languages.vb)
  • Re: Summarizing records in one table that match a different table
    ... the report. ... using "Language" as the example (i had to add a field to my Clients test ... i created a report based on the query with the joins ...
    (microsoft.public.access.reports)
  • Re: Print Preview with EXPORT capability
    ... I would guess the txText control is what you want, ... It has rtf and html export and V7 came with a pdf component to export PDF. ... But I am trying to NOT have to have my clients acquire/install any version ... just to vew the report as a PDF. ...
    (microsoft.public.vb.general.discussion)
  • Re: Count records, but not dupes
    ... As you can imagine, it's hard for us to see exactly what you are doing, as we cannot see your report, its sections, or its source query and the tables that feed that. ... Allen Browne - Microsoft MVP. ... I may have 20 clients that have 25 invoices that sum to $x. ...
    (microsoft.public.access.reports)
  • Re: report showing all
    ... If a report returns blank, then go to the query behind the report. ... named Restaurants, and that form is open when the query runs, then... ... That worked wonderfully for my Clients. ...
    (microsoft.public.access.reports)