Quote:
Originally Posted by gmaxey
Batman,
After working with this code a bit more, I am going to mark this thread solved. I found that I didn't really need to use a dictionary at all and I believe the real efficiency gain was writing data to the arrays vice directly in the worksheet. The following code does 95% on what I need and with a 440,000 row REF sheet, it does it in under 9 seconds. That is a big improvement and I think with a little more work I can call the cannon ball fully polished. If someone comes along in this tread or the crosspost with a PowerQuery solution, I'll be eager to look at it but you have been a tremendous help. Thank you again!!
|
1. Yes, when there is a lot of data, reading/writing to the sheet should not be done cell by cell. Data should be read once into tables, something should be done on this data and once put into the sheet. And a dictionary can be useful. Note that at the beginning I did not know much about what you intended to do, you did not say anything about the data. At the beginning I did not know whether you were taking duplicates in H in the NDS_SHEET sheet but I assumed that I was not taking duplicates. So I am taking only the first one and will not check the next ones. I did not know anything about the data but it is possible that there are e.g. 10 duplicates of "DemoData_0000015_Import.zip". Of course, you can not use the dictionary, but each time you have to assess whether it is better to use the dictionary or not. Let's assume that there are 10 rows in NDS_SHEET and each one contains "DemoData_0000015_Import.zip" and in REF_SHEET 1000 rows and only 1 row contains "DemoData_0000015_Import.zip". There is only 1 result. DIC contains only 1 key and in the worst case, when "DemoData_0000015_Import.zip" is in the last row of REF_SHEET, the code executes 1000 FOR loops (1000 INSTR). If the dictionary is not used, then in the worst case after executing 999*10 = 9990 FOR loops (999 external * 10 internal) - 9990 times of INSTR execution the code will not find a match yet. Only with lngIndex = 1000 and lngItemIndex = 1 will it find a match.
2. I don't understand this
Code:
lngNDSRowIndex = lngItemIndex + 1
…
varOverlay_Data(lngRecordIndex, lngFldIndex + 1) = varNDS_Data(lngNDSRowIndex, lngFldIndex)
In my opinion it should be
Code:
lngNDSRowIndex = lngItemIndex
…
varOverlay_Data(lngRecordIndex, lngFldIndex + 1) = varNDS_Data(lngNDSRowIndex, lngFldIndex)
or more simply
Code:
varOverlay_Data(lngRecordIndex, lngFldIndex + 1) = varNDS_Data(lngItemIndex, lngFldIndex)