Re: A Challenge

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



This got sent by accident, and was not complete (and many of the thoughts
were "re-thunk" ... sorry for the seeming double post and PLEASE -- NO
FLAMES FROM ANYONE, ACCIDENTS HAPPEN :-)


"William Benson" <wbenson1(SPAMSUCKS)@nycap.rr.com> wrote in message
news:OOE7rF3qFHA.1168@xxxxxxxxxxxxxxxxxxxxxxx
>I would use Access. Access is fast, and the table results are easy to work
>with. Just my prefereence I guess.Come up with a final table layout that
>will cover all the fields you might ever run into.You'll have to write some
>"stage" logic. Pull off prefixes in one stage, populating the "Prefix"
>field and dumping the rest into a "Remainder" field. Pull off Suffix1,
>Suffix2, Suffix3, ... as many as you think you might need for Sr., M.D.,
>PhD., etc. The logic to parse can be done through VBA/SQL.
>
>
> A sample including Dr. or M.D. from the Pisacano Leadership Foundation
> Drs. John and Katherine Miller
> Dr. and Mrs. Michael Needleman
> Dr. and Mrs. David Hutcheson-Tipton
> Lillian Gelberg, M.D. and Steven C. Spronz
> Patricia Glowa, M.D. and Donald Kollisch, M.D.
>
>
> based on processing logic, not you are not dealing with slow Excel files
> nor worrying how big the spreadsheets.
>
> I would run routines which "pre-process" the data run separate stages.
>
> For example, I would have in my ending table, fields like: First, Middle,
> Last, Prefix, Suffix1, Suffix2 (example of the different between suffix1
> & 2 would be Mr. James R. Smith Jr, PHD).
>
> Each intermediary result table would have fields you know are what they
> are, and a "AllTheRest" field. So, in Stage One, you might look to pull
> out all the prefixes, so intermediary table 1 will be 3 fields
>
> NameID Prefix AllTheRest.
>
> Write logic like If the full text includes both Mr. and Mrs., regardless
> of whether they are next to each other, the Prefix is converted to Mr. &
> Mrs.
>
> Then, look over ALL the prefixes you have been able to harvest, and
> specifically look
> Processing: I would have a table of known prefixes... and "pull out"
> prefixes, leave ."harvest" what I know, like all the Mr.
>
> and remove text (and fill fields)
> "mjones" <michele@xxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:1125188660.447356.150380@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> Hi,
>>
>> My client is expecting me to perform miracles. I thought I'd throw
>> this one out for those who like a challenge. Many large spreadsheets
>> (so large they have to be split up to get them into Excel) have a name
>> column. An example of the type of data might be:
>>
>> Mary Jane Fox
>> Marg F. Smith
>> Matt Del Sandro
>> Frank George Di Marco
>> Paul Joseph Rental/Storage Ltd.
>> Kim O'Neil
>> Don Jones-St. James
>> Andrew K.G. Black
>> Mr. and Mrs. Albert F. Casey
>> Brown
>> Sisters of the Holly Cross of 3rd Street
>> M Thomas
>> Christine J. Main & Ken Henry
>>
>> My client wants me to separate the last name into its own column so
>> they can run duplicate tests with other files against it and other
>> things. Remember, there could be 65,000 names.
>>
>> As you can imagine, this is very time consuming so anything will help.
>> I have been using space delimited to separate the fields and then doing
>> things like sorting or splitting off the first two characters and
>> sorting what's left to separate out the single initials and then
>> concatenating back again.
>>
>> I know this can't be an exact science, but I'm wondering if a macro can
>> be written to perform some of the work and perhaps stop and ask about
>> questionable names. For example,
>>
>> - single letters with or without periods would be first names
>> - a database of words could show companies and take the whole word as
>> last name like Ltd. or Co.
>> - a database of words like Di, St., Del, O', or Le could be defined as
>> part of the last name
>> - one word names are last names
>> - take the last of two names and forget the first one
>> - stuff like that
>>
>> When you use text to columns space delimited it puts the first word in
>> the first column and second in second column, etc. so depending upon
>> how many words are in the name, you never know which column the last
>> word will end up in. It would be nice to make it start in the last
>> column. We could use Access, too, if that would make any difference.
>>
>> I don't expect to receive much help on this, but you never know. I
>> find some people in this newsgroup have the most amazing ideas.
>>
>> Thanks,
>>
>> Michele
>>
>
>


.



Relevant Pages

  • Re: A Challenge
    ... I would run routines which "pre-process" the data run separate stages. ... the prefixes, so intermediary table 1 will be 3 fields ... prefixes, leave ."harvest" what I know, like all the Mr. ... > My client is expecting me to perform miracles. ...
    (microsoft.public.excel.programming)
  • Conditional Truncation of a Cell
    ... I have a list of 500 email addresses in excel. ... trying to figure out how to separate based on condition ... those prefixes containing a. ... Thx. ...
    (microsoft.public.excel.newusers)
  • Re: Using interfaces "everywhere" due to (EMF) modelling framework
    ... I'd hate it very much if a project had a thousand classes all called ... the redundant naming prefixes that some people use. ... class that implements Client. ...
    (comp.lang.java.programmer)