Duplicate detection (looking for code)

I would like to add code to detect duplicates or records with similar 
information.
I want to run this check on the name (firstname and lastname) and on the 
address (street1, street2, city, state, zip).

But I want something a little more advanced than just checking for exact 
matches.

Wondering if anyone has some code they would care to share that might make 
my job of writing it a little easier?

Example:
Bob Smith and Bobby Smith would be detected as duplicates
Rob Jones and Robert Jones would be detected as duplicates
"123 main street pittsburgh, pa 15126" and
"123 main st pittsburgh pa 15230" might be detected as duplicates

Thanks in advance,
Mark 

Mark
3/18/2010 5:40:30 PM

You may not have received any responses yet because what you are proposing 
is not particularly simple.

... and there are some potentially serious flaws with your analysis!

How do you expect Access to be able to correctly categorize "Bob Smith" as 
duplicating "Bobby Smith" when your database could legitimately contain two 
separate individuals with those names?

And what happens when you have two unique individuals, both named Lynne 
Johnson?  (there are two in my state, and they were both born on the same 
date!)

I suspect you'll have to create your own code that tells Access exactly when 
and how to consider two records to be close enough to be a match ... and you 
might want to consider them only as "potential" matches.
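
One way to "tell" a program when two first names should count as close enough is a lookup table of nicknames. The sketch below is in Python rather than Access VBA, and the `NICKNAMES` table and both function names are hypothetical, not something from this thread. It only flags potential matches for a person to review.

```python
# Hypothetical nickname table: maps informal first names to one canonical form.
# The entries are illustrative only; a real table would be much larger.
NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "rob": "robert",
    "johnny": "john",
}


def canonical_first_name(name):
    """Lower-case the name, strip periods, and expand a known nickname."""
    cleaned = name.strip().lower().replace(".", "")
    return NICKNAMES.get(cleaned, cleaned)


def is_potential_match(first_a, last_a, first_b, last_b):
    """Flag two names for human review; this never proves they are one person."""
    same_last = last_a.strip().lower() == last_b.strip().lower()
    same_first = canonical_first_name(first_a) == canonical_first_name(first_b)
    return same_last and same_first


print(is_potential_match("Bob", "Smith", "Bobby", "Smith"))    # True
print(is_potential_match("Rob", "Jones", "Robert", "Jones"))   # True
print(is_potential_match("Bob", "Smith", "Bob", "Jones"))      # False
```

Note that two different people who are both named Lynne Johnson would also be flagged, which is why the result should be treated as a prompt for review and not as a verdict.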

After all, can YOU be sure that all of the following are duplicates?:

    John Smith    12345 Elm St
    J. J. Smith     12345 Elm Street
    John J. Smith 12345 Elm St NW
    Johnny Smith 12354 Elm St.
    J. Smith         12345 Elm St
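
For the address half of that list, a minimal normalization sketch (Python, not VBA) shows which variants collapse to the same string and which do not. The `SUFFIXES` table and the function name `normalize_street` are assumptions made for illustration only.

```python
import re

# Hypothetical abbreviation table; extend it for the suffixes in your own data.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}


def normalize_street(address):
    """Lower-case, drop punctuation, and abbreviate common street suffixes."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


addresses = ["12345 Elm St", "12345 Elm Street", "12345 Elm St NW",
             "12354 Elm St.", "12345 Elm St"]
for address in addresses:
    print(normalize_street(address))
```

After normalization the first, second, and last addresses are identical, while the "NW" variant and the transposed house number (12354) still differ, so those would need a fuzzier comparison or a human decision.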

Regards

Jeff Boyce
Microsoft Access MVP

-- 
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.

Jeff
3/18/2010 7:24:01 PM
The idea is:
- The user enters "Bobby Smith" and the program pops up a screen listing 
similar contacts, "just to make sure this is not a duplicate contact being 
entered".  Example: Bob Smith 123 Main Street Pittsburgh PA 15230.
If they knew that the "Bobby Smith" they were entering lived on Main Street 
in Pittsburgh, they would recognize the duplicate and stop entering the new 
contact; otherwise they could continue entering it.

- Similar logic would apply to addresses.

Yes, I should have used the wording "potential duplicates" in my post, and 
yes, it's not extremely simple, thus the newsgroup post.
Don't you MVPs like a challenge once in a while?

First thoughts: I could do a simple comparison with 'like' or some sort of 
character-by-character comparison (if 90% of the characters match, consider 
it a "potential duplicate").  I need some sort of "Similar" function.
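
A rough sketch of such a "Similar" function, written in Python rather than VBA, could lean on the standard library's `difflib.SequenceMatcher`. The function names and the 0.9 default are assumptions taken from the 90% idea above, not a tested recommendation.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_potential_duplicate(a, b, threshold=0.9):
    """Flag the pair for review when the ratio reaches the threshold."""
    return similarity(a, b) >= threshold


for left, right in [("Bob Smith", "Bobby Smith"), ("Rob Jones", "Robert Jones")]:
    print(left, "/", right, round(similarity(left, right), 2))
```

With this measure "Bob Smith" / "Bobby Smith" lands right at the 0.9 cut-off, while "Rob Jones" / "Robert Jones" scores about 0.86 and would be missed, so the threshold probably needs tuning against real data.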

I don't think there are any potentially serious flaws with my analysis. My 
analysis at this point is "hey this might be a little work, I wonder if 
anyone else has attempted this and would let me see their code".

Now is your chance to post the code you wrote to do these types of checks. 
I'm sure others have tackled duplicate issues in various ways (some 
approaches better than others).
Thanks,
Mark
Mark
3/18/2010 7:55:34 PM
Mark

Just for the record, the folks who read and write here in the newsgroups are 
not all MVPs ... and some of the best answers I've seen come from folks who 
aren't.  Don't limit your audience...

Most of the approaches I've seen that work for this involve USB (using 
someone's brain).  It sounds like that's part of your approach, too.

Do a search on "Soundex".  This is an algorithm that encodes words (e.g., 
names) by how they sound so they can be compared.  Words that produce the 
same Soundex code sound similar.  This could help with last names and street 
names, but I don't see it helping with Bobby vs. Robert, or with all the 
embellishments that addresses have.  Again, you'd need to tell Access that 
Bobby and Robert are (sometimes) synonymous.
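
For reference, here is a minimal sketch of the common American Soundex variant in Python (it would need porting to VBA for Access). The function name and layout are my own; the coding rules are the standard ones: keep the first letter, map consonants to digits, collapse repeated codes, and pad to four characters.

```python
# Digit codes for the standard American Soundex consonant groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code, or "" if the name has no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this to "", allowing a repeat
    return (code + "000")[:4]


for name in ("Smith", "Smyth", "Bob", "Bobby", "Rob", "Robert"):
    print(name, soundex(name))
```

Smith and Smyth share a code, and so do Bob and Bobby, but Bobby and Robert start with different letters and Rob and Robert differ after the first letter, which is why a nickname table is still needed for first names.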

One approach might be to sort all entries by "lastname, firstname - delivery 
address" (as a concatenated field) and USB to break the ties.
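
A small sketch of that sorting idea, in Python rather than an Access query: the field names (`lastname`, `firstname`, `address`) and the sample rows are assumptions for illustration. Likely duplicates end up next to each other so a person can break the ties.

```python
def review_order(records):
    """Sort case-insensitively by last name, first name, then address."""
    return sorted(records, key=lambda r: (r["lastname"].lower(),
                                          r["firstname"].lower(),
                                          r["address"].lower()))


def display(record):
    """Format a record as 'lastname, firstname - delivery address'."""
    return f"{record['lastname']}, {record['firstname']} - {record['address']}"


# Illustrative sample data only.
contacts = [
    {"lastname": "Smith", "firstname": "John", "address": "12345 Elm St"},
    {"lastname": "Jones", "firstname": "Rob", "address": "9 Oak Ave"},
    {"lastname": "smith", "firstname": "J.", "address": "12345 Elm St"},
]
for contact in review_order(contacts):
    print(display(contact))
```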

Another approach might be to use the built-in autocomplete feature in Access 
comboboxes on a form.

Have your user start typing a lastname and have Access jump to the 
"lastname, firstname - delivery address"es that start that way.

You don't mention whether you're working with a couple hundred entries, a 
couple thousand, or a couple hundred thousand.  The approach you take may 
need to differ, depending on volume.

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP
Jeff
3/18/2010 8:11:27 PM
Jeff,

OK, searching on Soundex should help me dig up some code or flesh out the 
best approach a little better.

If anyone has done this, I would love to see your approach!

The code I'm writing will be installed at multiple companies; most will have 
5,000 or fewer contact records.
The method should be designed to work well for 10,000 contacts.

Thanks,
Mark
Mark
3/18/2010 8:38:43 PM
Mark

Check on-line for Allen Browne's website.  He has a routine that helps limit 
the number of records a combobox has to pull down by "waiting" for the first 
"n" letters to get entered.  This would be useful if you followed the 
combobox and concatenation route.
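
To be clear, the following is not Allen Browne's routine, just a Python sketch of the general idea of holding off on the lookup until enough letters have been typed. The function name `lookup` and the `min_chars` parameter are hypothetical.

```python
def lookup(prefix, names, min_chars=3):
    """Return names starting with prefix, or nothing until min_chars are typed."""
    prefix = prefix.strip().lower()
    if len(prefix) < min_chars:
        return []  # too few letters: skip the expensive search entirely
    return [name for name in names if name.lower().startswith(prefix)]


# Illustrative sample data only.
names = ["Smith, Bob - 123 Main Street", "Smyth, Ann - 4 Oak Ave",
         "Jones, Rob - 9 Elm St"]
print(lookup("sm", names))
print(lookup("smi", names))
```

The first call returns an empty list because only two letters have been entered; the second returns the single "Smith" entry.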

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP

Jeff
3/18/2010 8:49:44 PM
I searched a little, my best lead right now is the code I found at:
http://www.kdkeys.net/forums/thread/6450.aspx

You do need to sign up to the forum to download it.

It's an MDE but it looks like he included the source code for most of the 
fuzzy logic search algorithms.

Algorithms included:
- Levenshtein Edit Distance
- Dice Coefficient
- Longest Common Subsequence
- Double Metaphone
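
To give a feel for the first two, here is a minimal Python sketch of Levenshtein edit distance and a bigram-based Dice coefficient. This is my own illustration, not the code from the download above, and the function names are assumptions.

```python
from collections import Counter


def levenshtein(a, b):
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(text):
    """Return the overlapping two-character slices of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dice(a, b):
    """Dice coefficient over character bigrams, from 0.0 to 1.0."""
    pairs_a, pairs_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0  # strings too short to have bigrams
    overlap = sum((pairs_a & pairs_b).values())
    return 2.0 * overlap / total


print(levenshtein("bob", "bobby"))                     # 2
print(levenshtein("123 main street", "123 main st"))   # 4
print(dice("night", "nacht"))                          # 0.25
```

Edit distance is an absolute count, so it usually gets divided by the longer string's length before being compared against a threshold; Dice is already a ratio.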

I also read that soundex2 is good (soundex is a little too general).

I am by no means an expert, but I did a little searching, and if you want 
to do fuzzy matching of some sort, I guess you need to jump into this stuff.  
Perhaps even store some algorithm results so that at runtime you can compare 
faster against the thousands of records you have in the db.
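
Storing the results might look something like this Python sketch: compute a key once per record, bucket the record ids by key, and at entry time compare only against the matching bucket. The `crude_key` function is a hypothetical stand-in for a real phonetic code such as Soundex, and the `(id, last name)` record layout is an assumption.

```python
from collections import defaultdict


def crude_key(name):
    """Hypothetical stand-in key: first letter plus the following consonants."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    return letters[0] + "".join(ch for ch in letters[1:] if ch not in "aeiouy")


def build_index(records, key_func=crude_key):
    """Compute each record's key once and bucket record ids by that key."""
    index = defaultdict(list)
    for record_id, last_name in records:
        index[key_func(last_name)].append(record_id)
    return index


def candidates(index, last_name, key_func=crude_key):
    """Return ids in the same bucket; only these need a detailed comparison."""
    return index.get(key_func(last_name), [])


# Illustrative sample data only: (id, last name) pairs.
records = [(1, "Smith"), (2, "Smyth"), (3, "Jones")]
index = build_index(records)
print(candidates(index, "Smithe"))   # [1, 2]
```

In a database the same idea would be an extra indexed column holding the precomputed code, so the lookup for a new contact touches a handful of rows instead of all 10,000.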

Maybe this will help someone else out?
Mark

Mark
3/18/2010 9:49:39 PM
Thanks for posting back what you found.  That will undoubtedly help someone 
in their (future) search.

Be aware that loading "thousands" of records into a combobox leads to poor 
response time.  Allen Browne's approach (waiting for the first "n" letters 
before filling the list) speeds that up considerably.
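
For anyone following the "wait for the first n letters" idea, here is a 
minimal sketch of the logic. It is written in Python rather than VBA, so it 
is only an illustration and not Allen Browne's actual routine. The names 
candidates and MIN_CHARS and the threshold of 3 are made up for this example. 
In Access the same test would presumably run in the combo box's Change event 
before its RowSource is set.

```python
from bisect import bisect_left

MIN_CHARS = 3  # assumed threshold; tune it for your own data


def candidates(sorted_names, typed, min_chars=MIN_CHARS):
    """Return entries that start with `typed`, or [] until enough is typed.

    `sorted_names` is assumed to be a sorted list of lower-case
    "lastname, firstname - delivery address" strings.
    """
    prefix = typed.strip().lower()
    if len(prefix) < min_chars:
        return []  # too short: do not load anything yet

    # Binary search to the first entry >= prefix, then walk forward
    # only while entries still share the prefix.
    i = bisect_left(sorted_names, prefix)
    matches = []
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches


names = sorted([
    "smith, bob - 123 main st",
    "smith, bobby - 9 oak ave",
    "jones, rob - 4 elm st",
])
print(candidates(names, "sm"))   # nothing is loaded for two letters
print(candidates(names, "Smi"))  # both Smith entries
```

The point of the design is that the list is never filled with every contact; 
only the handful that share the typed prefix are ever fetched.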

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP



0
Jeff
3/18/2010 10:20:28 PM
No problem.  Hope it ends up working.  The word Soundex got me started down 
the right path.
"Fuzzy logic vba" is also a good search keyword.
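
To make the Soundex lead a bit more concrete, here is a rough sketch (Python, 
not VBA, and not the code from the kdkeys download) that pairs a standard 
American Soundex with a Levenshtein-based similarity score. The function 
names and the 0.8 threshold are assumptions for illustration only. Any hit 
should be treated as a "potential" duplicate for a person to confirm.

```python
# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes; vowels do
            prev = code
    return (result + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, 0.0 for completely different ones."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def potential_duplicate(name_a, name_b, threshold=0.8):
    """Compare two (firstname, lastname) pairs; threshold is an assumption."""
    (first_a, last_a), (first_b, last_b) = name_a, name_b
    if soundex(last_a) != soundex(last_b):
        return False
    return (soundex(first_a) == soundex(first_b)
            or similarity(first_a, first_b) >= threshold)


print(potential_duplicate(("Bob", "Smith"), ("Bobby", "Smith")))
print(potential_duplicate(("Rob", "Jones"), ("Robert", "Jones")))
```

With these rules Bob Smith / Bobby Smith is flagged, but Rob Jones / Robert 
Jones is not (R100 vs. R163, and only half the characters match). That fits 
the earlier point in this thread that nicknames need their own lookup table.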
0
Mark
3/18/2010 11:22:34 PM
On Thu, 18 Mar 2010 12:24:01 -0700, "Jeff Boyce" <nonsense@nonsense.com>
wrote:

>After all, can YOU be sure that all of the following are duplicates?:

And what about my friends, Fred Brown and Fred Brown? Young Fred is no longer
living at home, but he was when I first met him...
-- 

             John W. Vinson [MVP]
0
John
3/19/2010 1:51:29 AM
Reply:

Similar Artilces:

Send to look up problem
When sending an email I would like to only view contacts with email addresses and not ones with FAX also. What do I need to change in Outlook 2002 to make this work? See http://www.slipstick.com/contacts/nofax.htm Michael Servidio wrote: > When sending an email I would like to only view contacts > with email addresses and not ones with FAX also. What do > I need to change in Outlook 2002 to make this work? ...

Print Forms as they Look
I have a custom form that users will need to print the form as it looks, but with the regular print everything is changed to text. I tried XPath but the problem with that is if a machine does not have that dll, the information on the sent form will not appear ...

Which Office X 10.1.5 duplicate fonts can be removed from which folders?
Which of these duplicate fonts can be removed from the Applications/Microsoft/Office X/Office/Fonts folder AND/OR the User/nnager/Library/Fonts folder? All of the following are duplicated in the two folders. Two of them, Times New Roman and Verdana, also are in OS 10.3.3/Library/Fonts. Arial Arial Black Century Gothic Comic Sans MS Copperplate Gothic Bold Copperplate Gothic Light Curlz MT Edwardian Script ITC Impact Lucida Handwriting Monotype Sorts Tahoma Times New Roman Verdana Wingdings Respectfully, Norm On Tue, 6 Apr 2004 04:42:59 -0400, Norman R. Nager, Ph.D. wrote (in article <...

Duplicates
One worksheet has 600 + lines with a lot of expenses, including SOME of the local expenses. This is a check register.. Another worksheet has 300 + lines with ONLY local expenses. Trying to get ALL expenses together, but eliminating the duplicates from new list. Thinking B2=date. C2=name, E2=amount - - SO B2&C2&E2 on first long worksheet and then the same on short worksheet... Bring BOTH over to new tab and then stuck. Any ideas greatly appreciated, or another better way to attack it. You could use excel filter: Data - Filter - Advanced Filter - Unique records on...

Duplicate data from field into another field in same form?
Can I duplicate data from one field on a form into another field in the same form? For example: On our exhibitor entry form, the Program Contact info (fields for name, address, phone, fax, email) is entered first. Mail Contact info is often the same, but not always. Is it possible to autopopulate the Mail Contact with the data from the Program Contact, but allow me to enter new data if necessary? If so, can you tell me exactly how? Thanks! -- Andi On Fri, 23 Mar 2007 09:46:24 -0700, Andi <Andi@discussions.microsoft.com> wrote: >Can I duplicate data from one field on a ...

Receiving duplicate emails
I have just recently created my email account in Microsoft Outlook and now when I receive email I get the same one 15 times (and that is no stretch!) Is there something I have done wrong? How do I fix this?! Thanks What version of Outlook do you have? What sort of mail account(s)? Do you only get a single message many times, or all the messages many times? -- Jeff Stephenson Outlook Development This posting is provided "AS IS" with no warranties, and confers no rights "Carrie" <cadelaney@rogers.com> wrote in message news:0a0e01c3afaf$f8efab60$a001280a@phx.g...

Duplicate, Duplicate, dupli ...
Hi, I'm new to the group, so hope I don't disgrace myself. I have a list of names in the first column, and their respective postal codes in the second. Unfortunately, many of the names, (and respective codes), are duplicated... sometimes there are up to four of the same name, (and code), under each other. Is there a way to get rid up the duplicates and make a usable list please? I can do it manually of course, but .. .. Thanks. Dave Dave Data>Filter>Advanced Filter. "unique records only" and "copy to another location" For more on this visit Debra Dalgl...

IDE - 2003
Recently installed Office 2003 and just noticed how when in my Code window the cursor "covers/highlights" a full single character versus allowing me to click and get the "|" between the characters (true notepad/edit environment). Do I have a "setting" engaged that is causing this? I'de like to have it (the cursor) appear as "|". never mind;;; I needed to press the "insert" key daaa.. "Jim May" <jmay@cox.net> wrote in message news:pvOEe.82035$Fv.16277@lakeread01... > Recently installed Office 2003 and just noticed...

Quit creating duplicate desktop icons when doing upgrades/hotfix
Upon any sort of upgrade or hotfix, RMS creates a whole new set of desktop icons. What a pain! Pls fix. Ask if we want to create new desktop icons, pls. ---------------- This post is a suggestion for Microsoft, and Microsoft responds to the suggestions with the most votes. To vote for this suggestion, click the "I Agree" button in the message pane. If you do not see the button, follow this link to open the suggestion in the Microsoft Web-based Newsreader and then click "I Agree" in the message pane. http://www.microsoft.com/Businesssolutions/Community/NewsGroups/...

How do I duplicate numbering on a raffle?
I am creating an event ticket but would like to number the tickets on both the left and right sides of the ticket. Create a data base with your numbers. Use mail merge for the tickets. You can insert a mail merge field on both sides of your ticket. When you merge have one ticket on your screen. What version Publisher are you using. Some early versions of Publisher shows all the merge items the same in print preview. This is a Publisher bug. Mail, e-mail, and catalog merge http://office.microsoft.com/en-us/publisher/CH100502901033.aspx -- Mary Sauer http://msauer.mvps.org/ "Candi...

New to Charts
Hello there! I have a task to put together 5 "ladders" showing sales targets. Each of the 5 have a different "goal", but I need all the ladders to be the same height. (So, some "rungs" will be thicker or some ladders will have more rungs, but the overall height of each ladder must be the same). These ladders are reflecting sales for this quarter and will "grow" as the quarter goes on. Each ladder is a different color. Raw data is collected daily, but I think I'm only going to use the accumulative total each time I plot the graph. Bob's go...

Generating Code...
Recently I have encountered what appears to be an endless hangup, tying up 100% of the cpu. Could someone explain what is going on here? ThePlanner3.cpp Generating Code... cl.exe terminated at user request. Tool execution canceled by user. I might add that this is still occurring after resetting my Release Project Settings. The Debug configuration is fine. ----------- "SteveR" <srussell@removethisinnernet.net> wrote in message news:udziG18cHHA.2316@TK2MSFTNGP04.phx.gbl... > Recently I have encountered what appears to be an endless hangup, tying up > 100% of th...

Does this look scary to you?
I'm reviewing some code and I'm concerned about how the code is dealing with returning a BSTR value. Consider the following function (shorter version of the real function). STDMETHODIMP CForm::GetStringValue(LONG lFieldID, BSTR* pBstr) { if (!pBstr) return E_POINTER; CString csDesc = GetRuntimeText(lFieldID); *pBstr = (*pBstr ? csDesc.SetSysString(pBstr) : csDesc.AllocSysString()); if (csDesc.IsEmpty()) return E_FAIL; else return S_OK; } My first concern is the line: *pBstr = (*pBstr ? csDesc.SetSysString(pBstr) : csDesc.AllocSys...

Using look-up wizard to create a mulivalued "checklist"
I am trying to create a field in one of my tables that displays a list of courses that my school offers. Students would access the databse and click on the courses they would like to take. I am able to create a field using the lookup wizard that refers to another table in my database; however, I have no option to have it allow multiple selections from the students. I have followed through the instructions as found here: http://office.microsoft.com/en-us/access/HA100140981033.aspx Under the heading, "Create the multivalued lookup based on a table or query" Step 7 sa...

Visual Basic Code #2
Hi, I am trying to write some visual basic code behind a macro and I want it to do the following. 1) Select the whole of coloum C 2) Activate the find box as in (CTRL + F) Alternativly some code so that I could enter something in cell A1 then click the macro and it would search coloumn C for the value that is in cell A1. Many thanks, Glenn If you search VBA Help for: FindMethod you'll see some examples. Meanwhile....Here's something to get you started: Sub UseFind() Dim vResult Dim sht As Worksheet Set sht = ActiveSheet With sht Set vResult = .Range("C:C") _ ...

Complete code that works with MemoryStream but not with FileStream
Hi! This is complete code that is a console application. I encrypt a string test using the symmetric algoritm RijndaelManaged and write the encrypting string to a file using a FileStream. I then read the decryped file and put the contents in a byte array If I now use a MemoryStream as the first parameter of CryptoStream it works perfect. If I instead use the FileStream refering to the encrypted file as the first parameter of CryptoStream it cause an IndexOutOfRangeException saying "The index laid outside the limit for the matrix" I mean that because fsEncrypt is a Fi...

How to delete "non-identical" duplicate records in an Access table
How to delete "non-identical" duplicate records in an Access Table? Where "non-identical" duplicate record means a record in the table that has slightly different datum in one of the fields, but an identical duplicate datum in the field that I am concerned with. For example: SSN MRN CLIENT NAME 001-00-2222 11170419 Smith, Jane 001-00-2222 11170419 Smith, Jane T 001-00-2222 11170419 Smith, Jane Thompson The data of these two records in the fields SSN and MRN are identical; but "non-identical" in the CLIENT NAME field (notice...

Duplicate ItemLookupcode
Any clues on how I could get duplicate itemlookupcodes?? I found a few of them recently. No SQL queries have been run to generate them and if I try to create a duplicate itemLookupCodes I am stopped with a message telling me to use a unique name. How could they have gotten in the database? Any clues, boos or comments appreciated. Thanks for all the help! Jamie W This is a multi-part message in MIME format. ------=_NextPart_000_001F_01C755DB.F64982C0 Content-Type: text/plain; charset="Utf-8" Content-Transfer-Encoding: quoted-printable As you mentioned, RMS Manager doesn't ...

Find duplicates in a column
Hi, I have 6000 emails addresses in a col and wish to know if there are any duplicates. How can I do this please. I have Excel 2002. rock Assuming your emails are in A1 to A6000, in B1 enter =COUNTIF($A$1:$A$6000,A1) copy down B1 all the way to B6000 Apply a conditional format of red pattern on cells B1 to B6000 where Cell is greater than 1 and all duplicates will be shown in red. "rock" wrote: > Hi, > > I have 6000 emails addresses in a col and wish to know if there are any > duplicates. > > How can I do this please. > > I have Excel 2002. > ...

Duplicates!
Why do the songs in my media player library often duplicate or tripulicate themselves, so i end up with a library three times the size it should be. I have on occassions gone through and deleted all the replicated songs, but over a period of time they seem to "grow again" filling my library again with unwanted duplicates, taking up more space and memory. Is there anyway i can stop this happening????? Thanks. What version of WMP/Windows? First, make sure that WMP isn't monitoring the same folder twice. Duplicate songs can appear when, for example, both C:\Music a...

area codes is auto filling my own 9 digit ph# vs just area code
When I enter a phone number for a contact, it autofills with my personal area code and phone number instead of just the area code. How do I change this to autofill just the area code? I am using Outlook 2007 on an ACER laptop.
Make sure you have your area code entered correctly in "Dialing Properties". ...

VBA Code for Pasting Sheets
I would like a spreadsheet that pastes the contents of one sheet into another sheet. I would like to do this for 7 different sheets. For example, I would like to paste the contents from the sheet titled "sheet1" into a sheet titled "data1", then continue the process by pasting "sheet2" into "data2" and "sheet3" into "data3", all the way up to "sheet7" and "data7". Thanks, Curt
Subject: Automated Copy Paste Subject: Copy/Paste Import/Export Data VBA Code On Apr 27, 10:49 am, Curt <C...@discussions.mi...

owner draw bitmap button, sloppy looking bitblt
My ownerdraw button class has three states: up, mouse over, and down. Sometimes -- but not always -- the transition between states produces an ugly flicker. What causes this and how can I remove it? Thanks.
To answer my own question, implementing OnEraseBkgnd so it does nothing fixes the problem.
    BOOL CSkinButton::OnEraseBkgnd(CDC* pDC)
    {
        // TODO: Add your message handler code here and/or call default
        return FALSE; //return CButton::OnEraseBkgnd...

Integrate Paycodes, benefit codes and deduction codes
Has anyone used Integration Manager to update new pay rates, deduction amounts, and benefit amounts for employees? At the beginning of each year, our company gives pay increases, and we need to update the pay codes, deduction codes, and benefit codes for 40 employees using information we get from a spreadsheet. I thought that maybe I could use Integration Manager to update the pay, benefit, and deduction codes instead of going into each employee's card, which is time consuming. Thanks, Laura
Integration Manager will allow you to do this. Use the Payroll Master Destination. one ...

zip codes don't merge #2
I am trying to mail merge w/ Word 2000 the names and addresses in my worksheet. When I get to the part to choose the format for the mailing labels, I choose F1, F2, etc. to F6 (which is the zip code column). A few do get there, but the vast majority stop at the state, leaving off the entire zip code. I have gone to menu/format and selected text in the number tab. I have gone to format/cells and chosen special/zip code in the number tab. I've read Excel for Dummies. Please help me. TIA bb ...