This you call "random"? Actually this looks cleaner than I expected. I do see what you mean about it not putting the important data in consistent columns, but your program should be able to sort that out with very little difficulty; it's just a matter of determining the pattern and explaining it to your program.
Let's see what we can figure out. Each table starts with the word "Code", always in column 1, so your program can find the start of each table by searching for the next appearance of "Code" in column 1 (or, if you find that it sometimes appears in other columns, it can look there too). The other column headings in that same row are consistent even though they sit in varying columns, so your program can read the header row to learn where to find the Code, the Price and the other data. The code and the description are sometimes in separate columns and sometimes combined in one, but that's easy to detect. The sale-price column has no header, but it's missing only when there is no sale, and when it's present it always sits between Sales End and Price.
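Here is a minimal Python sketch of that header-reading step. It assumes the sheet has already been exported as a list of rows of strings (for example with the csv module), and that the headings read exactly "Code", "Sales End" and "Price". The function names are hypothetical, and the rule that the blank header sitting directly between Sales End and Price marks the sale-price column is an assumption drawn from the pattern described above.

```python
from typing import Dict, List

def find_table_starts(rows: List[List[str]]) -> List[int]:
    """Indices of rows whose first cell (sheet column 1) is 'Code'."""
    return [i for i, row in enumerate(rows) if row and row[0].strip() == "Code"]

def map_columns(header: List[str]) -> Dict[str, int]:
    """Map each non-blank header label to its column index.

    Assumed layout: the unheaded sale-price column is a blank header cell
    directly between 'Sales End' and 'Price'; it is added as 'Sale Price'.
    """
    cols = {cell.strip(): i for i, cell in enumerate(header) if cell.strip()}
    end, price = cols.get("Sales End"), cols.get("Price")
    if (end is not None and price is not None
            and price - end == 2 and not header[end + 1].strip()):
        cols["Sale Price"] = end + 1
    return cols

if __name__ == "__main__":
    # Hypothetical sample: one table with a sale column present.
    rows = [
        ["Code", "Description", "Sales End", "", "Price"],
        ["A1", "Widget", "6/30", "7.99", "9.99"],
    ]
    for start in find_table_starts(rows):
        print(start, map_columns(rows[start]))
```

Because each table's column positions are read from its own header row, it doesn't matter that "Price" lands in a different column from one table to the next.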
Figuring out the length of each table is the only tricky part, and only slightly; in most cases the end of the table is marked by a blank cell in the Code column, but in two of the tables there are sections where the code and description are shifted into the next column. Maybe if both columns are empty, that's the end of a table? No, in more than one table the footnote ("Retail price include...") is right up against the table with no blank row in between. Ah, here we go: in the Price column, "Page n" always appears at the end of the table. OK, let me play with this and come up with a way to turn this data into something more rational. Or maybe this gives you the right idea without a demo?
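A rough Python sketch of the "Page n" rule, under the same assumption that rows are lists of strings. It also assumes the "Page n" cell sits on its own row just after the last data row, so that row is excluded from the table body. The function names are hypothetical, and the price column index would come from the table's header row.

```python
import re
from typing import List, Optional

PAGE_MARK = re.compile(r"Page\s+\d+")  # matches e.g. "Page 3"

def table_end(rows: List[List[str]], header_row: int, price_col: int) -> Optional[int]:
    """Index of the first row below the header whose Price cell is 'Page n'.

    A blank Code cell is deliberately NOT used as the end marker, since some
    tables continue with code and description shifted one column right.
    Returns None if no 'Page n' marker is found.
    """
    for i in range(header_row + 1, len(rows)):
        row = rows[i]
        cell = row[price_col].strip() if price_col < len(row) else ""
        if PAGE_MARK.fullmatch(cell):
            return i
    return None

def table_body(rows: List[List[str]], header_row: int, price_col: int) -> List[List[str]]:
    """Data rows between the header and the 'Page n' row (or end of sheet)."""
    end = table_end(rows, header_row, price_col)
    stop = end if end is not None else len(rows)
    return rows[header_row + 1:stop]
```

If the footnote turns out to sit above the "Page n" row in some tables, those rows would still need to be filtered out separately.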