![]() |
#1
|
|||
|
|||
![]()
This may sound ridiculously complicated, but here a stab.
Looking for a practical/proficient process to identify matches between data in two sheet columns. For example lets say I have Sheet 1 with columns A and B and 300,000 rows. Ref# Folder Path 1 C:\Folder 1\Test.png ... 243 C:\Folder 2\Subfolder 1\Testing.jpg ... ..... 300000 E:\Folder1\SubFolder2\Test.doc I have Sheet 2 with 5 columns and 100 rows SN Type Title Date Name 1-23 A Dogs 1//1/25 Test.doc .... 35-456 A Cats 1/2/24 Test.png ... 100-D-3 B Pigs 1/4/20 Testing.jpg I need to create a third sheet that gets the data from Sheet 1 Column A where the data in Sheet 2 Column 5 "Name" is "found" in Sheet 1 Column B. So in the example. Sheet 3 would look like this: A 1 35-456 Cats 1/2/24 Test.png 243 100-D-3 Pigs 1/4/20 Testing.jpg 300000 1-23 Dogs 1/1/25 Test.doc My initial process was like this Loop through each row in Sheet 1 (300,000) Loop through each row in Sheet 2 (100) Check InStr(Sheet1.Cell, Sheet2,Cell) > 1 Yes we have a match, develop Sheet3 row This process takes about a minute to run. I revised the process to: Create a collection Loop through each row in Sheet 2 (100) Check if Sheet2.Column 5 is Found in Sheet1 Yes For Each find Add row index to collection For Each Index in Collection Develop Sheet3 row This process takes about 12 seconds. Either method is working, but the potential exists for the row counts in sheet 1 or sheet 2 to grow considerably and the only result is longer processing time. Wondering if there is a better way. I know nothing about Power Query and very little about Excel. Hoping some expert knows a more efficient process to minimize looping to identify all row indexes in sheet 1 that requires processing? Thanks cross posted at Identify Matches between Sheet Columns |
#2
|
|||
|
|||
![]() Code:
Sub demo() Dim lastRow As Long, pos As Long, i As Long, count As Long, path As String, key As String, data(), result(), dic As Object With Worksheets("Sheet1") lastRow = .Cells(Rows.count, "A").End(xlUp).Row If lastRow = 1 Then Exit Sub ' there is no data in Sheet1 data = .Range("A2").Resize(lastRow - 1, 2).Value ' data from sheet1 to data() End With With Worksheets("Sheet2") lastRow = .Cells(Rows.count, "A").End(xlUp).Row If lastRow = 1 Then Exit Sub ' there is no data in Sheet2 End With Set dic = CreateObject("Scripting.Dictionary") dic.CompareMode = vbTextCompare For i = 1 To UBound(data, 1) path = data(i, 2) ' column B sheet1 pos = InStrRev(path, "\") If pos > 0 Then key = Mid(path, pos + 1, Len(path)) If Not dic.exists(key) Then dic.Add key, data(i, 1) ' item = column A End If Next i data = Worksheets("Sheet2").Range("A2").Resize(lastRow - 1, 5).Value ' data from sheet2 to data() ReDim result(1 To UBound(data, 1), 1 To 1) ' result in one column: <column A sheet1)><sheet2: <SN><Title>< Date><Name>> For i = 1 To UBound(data, 1) key = data(i, 5) If dic.exists(key) Then count = count + 1 result(count, 1) = dic.Item(key) & " " & data(i, 1) & " " & data(i, 3) & " " & data(i, 4) & " " & data(i, 5) End If Next i If count Then Worksheets("Sheet3").Range("A2").Resize(count).Value = result End If Set dic = Nothing End Sub |
#3
|
|||
|
|||
![]()
Thanks. I'll look at this and see if I can make it work.
|
#4
|
|||
|
|||
![]()
If the results are to be in 5 columns: Ref#, SN, Title, Date, Name then change it to
Code:
ReDim result(1 To UBound(data, 1), 1 To 5) … If dic.exists(key) Then count = count + 1 result(count, 1) = dic.Item(key) result(count, 2) = data(i, 1) result(count, 3) = data(i, 3) result(count, 4) = data(i, 4) result(count, 5) = data(i, 5) End If … Worksheets("Sheet3").Range("A2").Resize(count, 5).Value = result |
#5
|
|||
|
|||
![]()
batman1,
Interesting approach and while I am close to making it work, the actual file is much larger and more complicated than my simple example. Two things: 1. There is not always a match. So when I run your code, Count and Ubound(result) are not the same. 2. So, I think that is why I get error about subscript out of range with Resize(count, 5). I can't upload the file here. It is too big. You can get it at the cross post link if your want to look. |
#6
|
|||
|
|||
![]() Quote:
Code:
ReDim result(1 To UBound(data, 1), 1 To 5) But since there are UBound(data, 1) rows in Sheet2 then there are maximum count = UBound(data, 1) matches: Always count <= UBound(data, 1) = UBound(result, 1) Quote:
Since you didn't attach a file, I created the file myself and I'm testing for data in such structures. For the future, attach the file. |
#7
|
|||
|
|||
![]()
I tried to attach the file previously but it is too big. I was able to get your code to run but the result when run on my test file only returns 22 rows when it should return 43. I was able to post the test file here:
Identify Matches between Sheet Columns (see post #6) It is a pretty complicated process (for me at least) and the example I asked about was a feeler to see if there was a way to do it without loop through all of the rows (speed enhancement). I believe with some slight mods that your approach might work, but it stills does a lot of looping and I'm not sure if in the end there would be that much improvement in speed. Thank you for sharing your ideas. I have used a dictionary before but clearly not as proficient with them as you. Well done!! |
#8
|
|||
|
|||
![]() Quote:
For example, let's take row 5 in NDS_SHEET, the value to search for = ??? 2. We already have the value to search for, but where do we search in sheet REF_SHEET? It is best to attach a file: 1. NDS_SHEET with 20 rows of data 2. REF_SHEET with 100 rows of data 3. Manually enter all matches in the RESULT sheet |
#9
|
|||
|
|||
![]()
I'll have to create something for you tomorrow. Thank you.
|
#10
|
|||
|
|||
![]()
The attached should let you see the current process and results.
|
#11
|
|||
|
|||
![]() Sub demo in Module1 |
#12
|
|||
|
|||
![]()
No need for coding for this.
As start, I'd advice to split the 2nd column to path and file columns respectively (a golden database law - don't keep several types of information in same column/field). This will help not only with current task, but will be useful in future too. In my excample, on sheet Files I added formulas how to do this easily. Don't use special characters in column names (I replaced Ref# with RefNo). On Info sheet, I added a RefNo column with formula which returns matching RefNo from Files sheet. Now you can simply set the filter for this column >0 to display all entries where the matching RefNo was found - or set the filer = 0 to display all entries where matching RefNo was not found. |
#13
|
|||
|
|||
![]() Batman, While you matched the output of the last test file, your process when run on the actual data file returned 21 rows when I should have returned 39. I think the problem is your output is limited to the number of unique file names you find in NDS_SHEET. That is not the intended process. The intended process is to look at every row in the RS_SHEET For Every row look at Every row in the NDS_SHEET If the row NDS_SHEET File Name (column 8) is InStr(RS_SHEET column 2) Then create record in Overlay. I'm currently doing that process just in a different way and I get the result I need. It is faster than looping through every row and every row but still takes time. Your process seems faster but it isn't returning all the required results. I've attached another file which I think will help you see what I am talking about if you still have interest. Thank you. |
#14
|
|||
|
|||
![]() Quote:
Thank for you reply. I am currently reviewing two other methods 1) by Batman here and 2) In the cross posted forum. Give me a chance to look at what you propose. However, it does have to be something coded as it is part of a much larger utility where the end users just click buttons. |
#15
|
|||
|
|||
![]() Quote:
Arvil, Thanks again. Your points are well taken and deserve attention. However, my actual data and requirement is more complicated than simply splitting file paths. My fault for giving such a simplistic example. In a previous reply to Batman, I have posted a better example of what I am trying to achieve. |
![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to Remove the subtotal columns in a sheet | Marcia | Excel | 3 | 12-01-2023 05:48 AM |
![]() |
mbesspiata3 | Excel | 2 | 01-06-2017 05:42 AM |
![]() |
malam | Excel Programming | 1 | 10-17-2014 10:01 PM |
![]() |
Zubairkhan | Excel | 2 | 03-04-2014 10:57 PM |
Removing columns within sheet | shabbaranks | Excel | 2 | 09-11-2012 05:03 AM |