Microsoft Office Forums

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 06-22-2018, 11:55 AM
strodden strodden is offline Locating/deduping similar but not identical text strings Windows 10 Locating/deduping similar but not identical text strings Office 2010 32bit
Novice
Locating/deduping similar but not identical text strings
 
Join Date: Aug 2017
Posts: 22
strodden is on a distinguished road
Default Locating/deduping similar but not identical text strings

Hello - the attached list has a lot of companies which are similarly, but identically spelled which I assume are duplicates - i.e., ABC Company vs ABC Company Inc. I have worked with my IT dept to try and use Fuzzy Lookup add in to fix this, but we could not get it to function properly. Does someone have a suggestion?
Attached Files
File Type: xlsx Company List.xlsx (141.4 KB, 12 views)
Reply With Quote
  #2  
Old 06-25-2018, 10:24 AM
ArviLaanemets ArviLaanemets is offline Locating/deduping similar but not identical text strings Windows 8 Locating/deduping similar but not identical text strings Office 2016
Expert
 
Join Date: May 2017
Posts: 873
ArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud of
Default

When you hope to get some formula or code which assigns correct company names everywhere, then you are out of luck! There a so much combinations of less than 10 stray words, that when Excel can decide, which ones are same company and which not, it's IQ must be a lot higher than IQ of average Joe. As IQ of Excel is the one of useful idiot (equals with 0), then you have to work yourself.

But you can use tools, which make this work easier. The attached file can be one.

The sheet FirmList is your original sheet, defined as Table (With tables writing/editing formulas is much easier, and formulas are added automatically whenever new data is added into table). I also added aa couple of columns - about them later. And on same sheet you can select firm/company you want to check. Company name is selected from list based on Firm column of Table on same sheet (look at name lSourceFirm).

When the firm is selected, on sheet SelectedFirm the selected firm name without leading or trailing spaces is calculated, and number of words in firm name (separated by spaces) is calculated. And the firm name is splitted into separate words in Table on same sheet. In this example workbook, up to 5 words are processed - when you want more you have to modify the design. And to add according number of rows into Table on this sheet too.

Now follows sheet Combinations. At top of sheet is Table with word numbers (W1 ... W5). When you want to increase the number of processed words, you have to add columns there.

Below is another table. Here you determine, for which combinations of words in company name the application will search for.
Column Code indicates, which words form Table on sheet SelectedFirm are combined.
Columns W1...W5 are for company names consisting 1 to 5 words. When you increase max number of words, you have to add those W# columns (up to 9, when more, then you have to change the coding system). Value 1 in column means, the according word with same number as added to header is read from table on sheet SelectedFirm and combined at position determined with Code. Value 0 means, that according word is left out. This is the table where you have a lot of work when you decide to increase the number of processed words - you have to declare all those combinations (to add a lot of records, to add Codes, and right values into columns W1...Wn according those Codes.
Column RowNo simply counts rows in Table, and is used in formulas.
Column IsSelection - here the formula reads data from column W#, where # is max number of words in Table on sheet SelectedFirm, and return1 Code when the value in cell is 1. Otherwise 0 is returned.
SelectionNo counts rows with IsSelection not 0.
Word1...Word5 - words matching with IsSelection > 0 are returned, ortherwise empty string is returned. When you increase max number of words, you have to add those Word# columns with proper formula.
CheckStr0 combines all words from Word1...Word5.
CheckStr reads data from CheckStr0, but excludes result strings with length 1(as otherwise all companies where such character is in name are considered as matching), and all result strings matching with some entry in Table on sheet ExcludeList.

About sheet ExcludeList is all clear, I hope.

Sheet CheckList has a Table, which reads data from Combinations sheet (Empty CheckStr0 values are excluded).

Sheet QryTbl contains the result table of ODBC query, which combines Firm column on FirmList sheet with CheckStr column on CheckList sheet - only records where Firm column value contains a value in CheckStr column is displayed. It was only way, I could think of. It gives a lot of abundant matches for very sort checkstrings, but a couple of hundred is anyway less then several thousand. You can also consider excluding all checkstrings with length less than 3 in column CheckStr on Combinations sheet.
This query is the reason, why the workbook is in Macro Workbook format - whenever the workbook is renamed or copied to another location, and after that opened anew, the Open event of workbook updates the connection, so no error is returned or no data is read from wrong workbook.

The work order is: you select a company, and refresh the query (right-click on querytable, and select Refresh, or select from Data menu Refresh All). In sheet FirmList in column In Selection all rows where some match with querytable is found, TRUE is returned, otherwise FALSE is returned. You can set autofilter to TRUE, and work through the list. OR you activate the autofilter, select from autofilter list companys you want to check (the list is in alphabetical order), and display only those records.

Now 2 sheets are remaining.

On sheet FirmRegistry is Table, where you can register the correct company name. It may be some name from your table, or entirely different (e.g. found by Google).

On sheet FirmAliases, you can register all variants of company name you did find in your table, and attach the correct company name from sheet FirmRegistry to them. On sheet FirmList, a correct company name is displayed in column FirmReg, whenever a registered alias is found.
Reply With Quote
  #3  
Old 06-25-2018, 10:37 AM
ArviLaanemets ArviLaanemets is offline Locating/deduping similar but not identical text strings Windows 8 Locating/deduping similar but not identical text strings Office 2016
Expert
 
Join Date: May 2017
Posts: 873
ArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud ofArviLaanemets has much to be proud of
Default

I added the workbook, but somehow it got lost. A new try.
Attached Files
File Type: xlsm CompanyList.xlsm (277.9 KB, 8 views)
Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Locating/deduping similar but not identical text strings Renaming several strings of text by their pattern noname91 Word 10 01-05-2017 05:46 PM
Locating/deduping similar but not identical text strings Highlighting placeholder text similar to Adobe ohi Word 1 07-29-2016 09:09 PM
Locating/deduping similar but not identical text strings How to link similar strings. PRA007 Word VBA 2 01-20-2016 04:22 AM
Locating/deduping similar but not identical text strings Need help with a VLOOKUP formula & two similar, yet non-identical columns of data. mikey386 Excel 1 12-18-2014 01:59 AM
Locating/deduping similar but not identical text strings update style of all strings available between two specific strings vikrantkale Word 1 03-28-2011 06:13 PM

Other Forums: Access Forums

All times are GMT -7. The time now is 11:19 PM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2024 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft