Quote:
Originally Posted by batman1
Well, because you didn't write many things. I can't guess your intention. You have to say everything.
1. The case of duplicates in column H in NDS_SHEET In NSD_SHEET H24 = H25 = DemoData_0000015_Import.zip You didn't write how to search, so I did this: When the code finds H24 = "DemoData_0000015_Import.zip" in B229 in REF_SHEET, it no longer searches for H25 in REF_SHEET (that's why I use dictionary DIC) So how do you want it now? Should there be 2 results (for H24 and H25)? <#266912.1><line 24> <#266912.1><line 25> ??? Or maybe something else? If you don't say anything, I won't know anything.
2. The issue of duplicates in column B in REF_SHEET. For example, let's say that "DemoData_0000015_Import.zip" occurs 3 times in column B of REF_SHEET - in B299, B367 and B489. How many results should there be? At the moment, the code will find "DemoData_0000015_Import.zip" in B299 and stop searching, it will not search any further to find in B367 and B489 (Exit For in If pos Then … End If). So what do you want now? The code should search further to B367 and B489?
3. Are we looking for "File Name" from column H of sheet NDS_SHEET in column B of sheet REF_SHEET, or vice versa: will we find for each "File Path" from column B of sheet REF_SHEET its "companion" in column H of sheet NDS_SHEET?
4. Or maybe something else that I don't know about? If you don't say anything, I won't know anything.
|
Batman,
I understand your frustration. As I tried to explain at the very start, it is a complicated process. Complicated and difficult to explain.
The REF_SHEET can contain hundreds of thousands of records. The one I am testing with has 440,000. The NDS_SHEET can have tens of thousands, but for testing I have only 21.
The code I provided in modMain launches a userform interface. With that interface you can create all of the possible outcomes. It works. I am just looking for something that might work faster.
DemoData_0000015_Import.zip You didn't write how to search, so I did this: When the code finds H24 = "DemoData_0000015_Import.zip" in B229 in REF_SHEET, it no longer searches for H25 in REF_SHEET (that's why I use dictionary DIC) So how do you want it now? Should there be 2 results (for H24 and H25)? <#266912.1><line 24> <#266912.1><line 25> ??? Or maybe something else? If you don't say anything, I won't know anything.
That depends on if you select "First Match Only" or "Duplicate Reference Row" in the first case then there should be only one record in the Overlay for #266912.1" in the second case there should be two records.
2. The issue of duplicates in column B in REF_SHEET. For example, let's say that "DemoData_0000015_Import.zip" occurs 3 times in column B of REF_SHEET - in B299, B367 and B489. How many results should there be? At the moment, the code will find "DemoData_0000015_Import.zip" in B299 and stop searching, it will not search any further to find in B367 and B489 (Exit For in If pos Then … End If). So what do you want now? The code should search further to B367 and B489?
Yes. Every record of the REF_SHEET that might have a match in the NDS_SHEET must be processed. That was my initial approach. Loop through every REF_SHEET record ... with my actual data this was taking about a minute.
I changed that initial approach to first loop through the smaller NDS_SHEET to find and create a collection of REF_SHEET Rows that "DO" have a match. I then looped through the Rows in that collection. With the actual data the process now takes about 12 seconds. The options for the REF_SHEET records are:
If No Match Found in NDS_SHEET
1. Discard (don't include) RS_SHEET REF/Control# records in Overlay
2. Copy control ID from RS_SHEET as new record in Overlay
3. Cancel processing - don't create overlay
3. Are we looking for "File Name" from column H of sheet NDS_SHEET in column B of sheet REF_SHEET.
Yes. NDS_SHEET column 8 list single file name e.g., Test.png.
REF_SHEET column 2 can list single or mulitple files names e.g., Testing.png//Test.png//This is a test.png. If InStr(REF_SHEET(2),NDS_SHEET(8)> 0 Then
If First Match Option
Record and get out
Else
Record either as new row, or delimited data in existing row.
End If
... will we find for each "File Path" from column B of sheet REF_SHEET its "companion" in column H of sheet NDS_SHEET?
It is not really a file path. It can be a single file name or multiple files name delimited with // / or \. No. There are three options if a "companion" is not found in the NDS_SHEET.
All the looping takes time. As I've said, I've gotten down to 12 seconds with the 440,000 REF_SHEET I have. I like your thought process and its possible it might shave off a a few seconds. I thought a power query might get it done in the blink of and eye?? Thank you.