Microsoft Office Forums

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 05-21-2015, 05:40 AM
saurabhlotankar saurabhlotankar is offline Removing duplicates Windows XP Removing duplicates Office 2010 32bit
Novice
Removing duplicates
 
Join Date: May 2015
Posts: 29
saurabhlotankar is on a distinguished road
Post Removing duplicates

Hi Folks,



I have an excel sheet. In column A, I have 200 names(Firstname Lastname format). I want to remove duplicates from this column. Now the challenge is somes firstname and last names gets interchanged(e.g. value in A3 is John Smith and value is A80 is Smith John). Ideally its the same person and hence i have to keep his name only once.

So can anyone please assist me in removing such interchanged names duplicates.

Thanks.
Reply With Quote
  #2  
Old 05-21-2015, 09:59 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Hmmmn I will need to see all the names to see if the code I am thinking will work. Can you post a sample book and I can work on this for you.

Thanks
Reply With Quote
  #3  
Old 05-24-2015, 05:00 PM
charlesdh charlesdh is offline Removing duplicates Windows 7 32bit Removing duplicates Office 2010 32bit
Expert
 
Join Date: Apr 2014
Location: Mississippi
Posts: 382
charlesdh is on a distinguished road
Default

HI,

As excelledsoftware mention. If you can attach a workbook so we can see what the data is we may be able to help.
Reply With Quote
  #4  
Old 05-24-2015, 11:47 PM
saurabhlotankar saurabhlotankar is offline Removing duplicates Windows XP Removing duplicates Office 2010 32bit
Novice
Removing duplicates
 
Join Date: May 2015
Posts: 29
saurabhlotankar is on a distinguished road
Default

Hi,

I have attached an excel sheet. In column A you will find few names one below another. Few names are repeate and few names are repeated with interchnaged first and last names. e.g. A4= Ethan Anderson and A10= Anderson Ethan. I am facing challange here in removing such duplicate names.

Kindly assist.

Thanks,
Attached Files
File Type: xlsx Duplicates.xlsx (9.6 KB, 12 views)
Reply With Quote
  #5  
Old 05-25-2015, 11:01 AM
charlesdh charlesdh is offline Removing duplicates Windows 7 32bit Removing duplicates Office 2010 32bit
Expert
 
Join Date: Apr 2014
Location: Mississippi
Posts: 382
charlesdh is on a distinguished road
Default

Hi,

Thanks for the file. But, do you have something in the other columns?
Can you provide more info for the other columns?
Reply With Quote
  #6  
Old 05-25-2015, 12:40 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Charles, Ive got the idea on how to do this but you look pretty interested. Do you want to take a stab at it before I submit the code?
Reply With Quote
  #7  
Old 05-25-2015, 01:04 PM
Bruno Campanini Bruno Campanini is offline Removing duplicates Windows 8 Removing duplicates Office 2013
Novice
 
Join Date: Nov 2014
Posts: 19
Bruno Campanini is on a distinguished road
Default

Public Sub Test()
Dim h, i, j, k As Long, Split_1
Dim FirstName As String, SR As Range
Set SR = [Sheet1!A2:A11]
k = 1
For Each i In Range(SR(1), SR(SR.Count - 1))
k = k + 1
FirstName = Mid(i, 1, InStr(1, i, " ") - 1)
For Each j In Range(i(k), SR(SR.Count))
Split_1 = Split(j, " ")
For Each h In Split_1
If FirstName = h Then
MsgBox FirstName & " (" & i.Address & _
" " & j.Address & ")"
End If
Next
Next
Next

Bruno
End Sub
Reply With Quote
  #8  
Old 05-25-2015, 01:07 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Quote:
Originally Posted by Bruno Campanini View Post
Public Sub Test()
Dim h, i, j, k As Long, Split_1
Dim FirstName As String, SR As Range
Set SR = [Sheet1!A2:A11]
k = 1
For Each i In Range(SR(1), SR(SR.Count - 1))
k = k + 1
FirstName = Mid(i, 1, InStr(1, i, " ") - 1)
For Each j In Range(i(k), SR(SR.Count))
Split_1 = Split(j, " ")
For Each h In Split_1
If FirstName = h Then
MsgBox FirstName & " (" & i.Address & _
" " & j.Address & ")"
End If
Next
Next
Next

Bruno
End Sub
Ok then I guess Bruno here decided to just post some code. Nevermind Charles. Be sure to remove the text Bruno from the sub or you will get a compile error. I can see that you went ahead and indented this code Bruno but it was not surrounded by code tags so the formatting was lost.
Reply With Quote
  #9  
Old 05-25-2015, 01:40 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Actually this code has a bug. It only checks for the first name and not the last name. If you change a first name to Jacob the code will show that there is a duplicate when in fact there is not. Also I am not sure if msgboxes are the most efficient way to let the user know what duplicates to remove. I am curious about the solution to this myself so I am working on a sub that should do it. However since Bruno spent some time on providing a "in the works" solution I would like to see if he wants to fix the issues before I submit my code.

Thanks
Reply With Quote
  #10  
Old 05-25-2015, 03:13 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Ok it has been over an hour with no reply so I am submitting the code that I wrote for this. This code will place a result in column B next to the value in Column A.
I have written 2 codes. One that will identify every duplicate to where you could sort and filter to identify which ones to remove and then another code that will identify just the duplicates. Meaning that if cell A3 has Jacob Parkar and cell A5 and A6 has Jacob Parkar or Parkar Jacob then only cells A5 and A6 will be identifed as duplicates. The only issue with this is that you may want the name with the last name starting as the original so play around with each and see which one works best.

As with all code be sure to back up your workbook and save before running.

Code:
Sub CheckAllDuplicates()
  'Checks each name to see if there are duplicates
  Dim ws As Worksheet, CheckRow As Long, FirstName As String, LastName As String
  Dim LastRow As Long, NameString As String, CheckName As String, CheckString As String
  Dim Counter As Integer
  
  'Set the references
  Set ws = ThisWorkbook.Worksheets(1)
  LastRow = ws.Range("A2").End(xlDown).Row
  For CheckRow = 2 To LastRow
    CheckString = ws.Range("A" & CheckRow).Value
    CheckName = "<" & CheckString & ">"
    FirstName = Mid(CheckString, 1, InStr(1, CheckString, " ") - 1)
    LastName = Mid(CheckString, Len(FirstName) + 2)
    'NameString = NameString & "<" & FirstName & " " & LastName & ">"
    NameString = NameString & "<" & LastName & " " & FirstName & ">"
  Next CheckRow
  For CheckRow = 2 To LastRow
    CheckString = ws.Range("A" & CheckRow).Value
    CheckName = "<" & CheckString & ">"
    Counter = 0
    Counter = WorksheetFunction.CountIf(ws.Range("A2" & ":A" & LastRow), CheckString)
    If InStr(1, NameString, CheckName) Then
      Counter = Counter + 1
    End If
    If Counter <> 1 Then
      Range("B" & CheckRow).Value = "Duplicate"
    End If
  Next CheckRow
  
End Sub
This one will show ONLY duplicates
Code:


Sub CountNameDuplicates()
  'Checks each name to see if there are duplicates
  Dim ws As Worksheet, CheckRow As Long, FirstName As String, LastName As String
  Dim LastRow As Long, NameArray() As String, CheckName As String, CheckString As String
  Dim Counter As Integer, arr As Long
  
  'Set the references
  Set ws = ThisWorkbook.Worksheets(1)
  LastRow = ws.Range("A2").End(xlDown).Row
  ReDim NameArray(2 To LastRow) As String
  For CheckRow = 2 To LastRow
    CheckString = ws.Range("A" & CheckRow).Value
    CheckName = "<" & CheckString & ">"
    FirstName = Mid(CheckString, 1, InStr(1, CheckString, " ") - 1)
    LastName = Mid(CheckString, Len(FirstName) + 2)
    'NameString = NameString & "<" & FirstName & " " & LastName & ">"
    NameArray(CheckRow) = "<" & LastName & " " & FirstName & ">"
  Next CheckRow
  For CheckRow = 2 To LastRow
    CheckString = ws.Range("A" & CheckRow).Value
    CheckName = "<" & CheckString & ">"
    Counter = 0
    Counter = WorksheetFunction.CountIf(ws.Range("A2" & ":A" & CheckRow), CheckString)
    'Check the array
    For arr = 2 To CheckRow
      If NameArray(arr) = CheckName Then
        Counter = Counter + 1
        Exit For 'no need to grab more
      End If
    Next arr
    'Check
    If Counter <> 1 Then
      Range("B" & CheckRow).Value = "Duplicate"
    End If
  Next CheckRow
End Sub
any questions please let me know.

To Bruno,

I appreciate the code that you wrote and effort you put forth. Charles and I were just unaware that you were going to provide a solution. I have had a time where I posted something without looking at the replies and caused unnecessary work for the person that was already working on it. Please let me know if you have any concerns regarding my comments.

Thanks so much.
Reply With Quote
  #11  
Old 05-25-2015, 04:21 PM
Bruno Campanini Bruno Campanini is offline Removing duplicates Windows 8 Removing duplicates Office 2013
Novice
 
Join Date: Nov 2014
Posts: 19
Bruno Campanini is on a distinguished road
Default

No concerns at all!
I just noted the bug you mentioned and the non-ability of my code to discover
names like:
Juan Carlos Primero -- Juan Primero Carlos
aa bb cc dd ee ff gg -- ff gg cc aa bb ee dd
and so on.
Then this new code:
=========================
Public Sub Test()
Dim k As Long, Split_1, Split_2
Dim h, i, j, n, SR As Range
Set SR = [Sheet1!A2:A26]
k = 1
For Each i In Range(SR(1), SR(SR.Count - 1))
k = k + 1
Split_1 = Split(i, " ")
For Each j In Range(i(k), SR(SR.Count))
Split_2 = Split(j, " ")
If UBound(Split_1) = UBound(Split_2) Then
k = 0
For Each h In Split_1
For Each n In Split_2
If h = n Then
k = k + 1
End If
Next
If k = UBound(Split_1) + 1 Then
MsgBox i & " " & j & vbCrLf & _
i.Address & " " & j.Address
End If
Next
End If
Next
k = 1
Next
End Sub
======================

Your opinion would be much appreciated

Bruno
Reply With Quote
  #12  
Old 05-25-2015, 05:05 PM
Bruno Campanini Bruno Campanini is offline Removing duplicates Windows 8 Removing duplicates Office 2013
Novice
 
Join Date: Nov 2014
Posts: 19
Bruno Campanini is on a distinguished road
Default

If you want to immediately delete duplicate (record or entire row):
=============================
Public Sub Test_1()
Dim k As Long, Split_1, Split_2
Dim h, i, j, n, SR As Range
Set SR = [Sheet1!A2:A29]
k = 1
For Each i In Range(SR(1), SR(SR.Count - 1))
k = k + 1
Split_1 = Split(i, " ")
For Each j In Range(i(k), SR(SR.Count))
Split_2 = Split(j, " ")
If UBound(Split_1) = UBound(Split_2) Then
k = 0
For Each h In Split_1
For Each n In Split_2
If h = n Then
k = k + 1
End If
Next
If k = UBound(Split_1) + 1 Then
'MsgBox i & " " & j & vbCrLf & _
i.Address & " " & j.Address
j(1, 1) = "*" & j.Address
End If
Next
End If
Next
k = 1
Next
For k = SR.Count To 1 Step -1
If Left(SR(k), 1) = "*" Then
' Delete Record
SR(k).Delete Shift:=xlUp

' Delete Entire Row
'SR(k).EntireRow.Delete Shift:=xlUp
End If
Next
End Sub
===============================

Bruno
Reply With Quote
  #13  
Old 05-25-2015, 08:43 PM
excelledsoftware excelledsoftware is offline Removing duplicates Windows 8 Removing duplicates Office 2003
IT Specialist
 
Join Date: Jan 2012
Location: Utah
Posts: 455
excelledsoftware will become famous soon enough
Default

Quote:
Originally Posted by Bruno Campanini View Post
No concerns at all!
I just noted the bug you mentioned and the non-ability of my code to discover
names like:
Juan Carlos Primero -- Juan Primero Carlos
aa bb cc dd ee ff gg -- ff gg cc aa bb ee dd
and so on.
Then this new code:
=========================
Public Sub Test()
Dim k As Long, Split_1, Split_2
Dim h, i, j, n, SR As Range
Set SR = [Sheet1!A2:A26]
k = 1
For Each i In Range(SR(1), SR(SR.Count - 1))
k = k + 1
Split_1 = Split(i, " ")
For Each j In Range(i(k), SR(SR.Count))
Split_2 = Split(j, " ")
If UBound(Split_1) = UBound(Split_2) Then
k = 0
For Each h In Split_1
For Each n In Split_2
If h = n Then
k = k + 1
End If
Next
If k = UBound(Split_1) + 1 Then
MsgBox i & " " & j & vbCrLf & _
i.Address & " " & j.Address
End If
Next
End If
Next
k = 1
Next
End Sub
======================

Your opinion would be much appreciated

Bruno
I would be happy to lend my opinion. Everyone has their own coding style and I can see you have your own. I will say that the variable names are very difficult to understand but they work for you and if you are the only person that needs to understand your code you are just fine with that.

I absolutely love how you used the Split function to extract out the first and last name. Brilliant I of course used 2 mids but it would have been much easier to use your method.

The first code fixed the issue it displayed a msgbox identifying the duplicates. The next code went a step further by deleting the entry right out and not actually deleting the entire row (Good call on that) the only issue I ever have with code deleting data is if there was some reason the user needed to validate what the code did. I am confident that your code works but users may want to see the data to delete before they actually do so. I learned a good amount from your code and the techniques you did. The last bit of feed back is to place any code in the code brackets. use [ then "code" then another ] without the quotes then to end your code tag use [ then /code then antoher ] This is very very important as it will maintain your indentation and make things more consistent.

Keep up the great work.

Thanks
Reply With Quote
  #14  
Old 05-26-2015, 12:11 AM
saurabhlotankar saurabhlotankar is offline Removing duplicates Windows XP Removing duplicates Office 2010 32bit
Novice
Removing duplicates
 
Join Date: May 2015
Posts: 29
saurabhlotankar is on a distinguished road
Default

Hello Folks,

Both codes are working perfectly fine. Thank you so much for such a quick turn around.

Once again thanks a ton to both excelledsoftware and Bruno Campanini for your help.

Have a great day ahead guys.
Reply With Quote
  #15  
Old 05-26-2015, 10:13 AM
charlesdh charlesdh is offline Removing duplicates Windows 7 32bit Removing duplicates Office 2010 32bit
Expert
 
Join Date: Apr 2014
Location: Mississippi
Posts: 382
charlesdh is on a distinguished road
Default

excelledsoftware, Bruno

Nice bit of code.
I was wanting to see if there was data in another column that would indicate if the names were duplicated. I'm still not too old to learn from others,
Reply With Quote
Reply

Tags
duplicate entries



Similar Threads
Thread Thread Starter Forum Replies Last Post
Extract duplicates in table goran.c Excel 0 01-21-2015 12:47 AM
Removing duplicates Adding duplicates as comments mnut Excel Programming 1 10-17-2014 10:05 PM
Mail duplicates mixy Outlook 0 02-10-2011 12:54 AM
Removing duplicates Inbox and subfolder duplicates JoJo Outlook 1 01-28-2011 04:39 PM
Removing duplicates sum of duplicates zxmax Excel 1 09-29-2006 08:29 PM

Other Forums: Access Forums

All times are GMT -7. The time now is 10:20 AM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2024 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft