Microsoft Office Forums

PDF to Excel

#1
12-13-2013, 12:59 PM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

I am trying to convert an 800-page PDF file into Excel so I can automatically update pricing information. I have tried a number of different tools and have not had any luck with this conversion. There are programs that will export the data to Excel, but they do not do it in a clean format. I have tried all of the basic applications that come up in a basic Google search and have also tried converting it using the full version of Adobe Acrobat.

Here is a link to the PDF: http://www.lcbapps.lcb.state.pa.us/w...uctCatalog.PDF

Once I convert that into an organized Excel file, I am planning on writing a macro to have it automatically update a pricing field in an inventory spreadsheet. Any thoughts?

#2
12-13-2013, 01:17 PM
Pecoflyer, Expert (Windows 7 64bit, Office 2010 64bit)

Perhaps try Acrobat XI. Otherwise, there are tons of converters out there.

#3
12-13-2013, 01:27 PM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

I have already tried the conversion in Acrobat. It does not produce a usable format: it adds dozens of additional rows/columns, changes alignments, combines items in the same fields, etc. The same goes for the majority of the conversion programs available. These are some of the ones I have tried so far:

PDF to Excel
Zamzar
https://www.pdftoexcelonline.com/
www.pdftoexcel.org/
www.freepdfconvert.com/pdf-excel
www.pdfexcelconverter.com/

I know there has to be a way to convert this into a clean format, but I have been banging away at it for months now with no luck.

#4
12-13-2013, 07:58 PM
BobBridges, Expert (Windows 7 64bit, Office 2010 32bit)

Shane, it may seem a second-best solution to you, but if you were planning on writing a VBA program to modify the worksheet, why not have the program clean up the translation as well? Surely whatever problems you find with any one translation utility will occur the same way every time, right? So a VBA program should be able to run the translation, open the result, look through the resulting mess, pick out the part of the data that you actually want and do the updating that you wanted in the first place.

As a bonus, the VBA program could also fetch the latest copy of the PDF from the internet—one less step for the user.
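
A minimal sketch of that download step, assuming the catalog can be fetched with a plain HTTP GET on Windows. It relies on the Windows API function URLDownloadToFile from urlmon.dll; DownloadCatalog and its two parameters are names made up for this illustration, and the real catalog address has to be supplied by the caller.

```vba
Option Explicit

' Windows API declaration. The VBA7 branch is used by Office 2010 and later
' (32-bit and 64-bit); the other branch is for older versions of Office.
#If VBA7 Then
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
Private Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

' Hypothetical helper: saves the file at sourceUrl to targetPath.
' Returns True when the API reports success (a return value of 0).
Public Function DownloadCatalog(ByVal sourceUrl As String, _
                                ByVal targetPath As String) As Boolean
    DownloadCatalog = (URLDownloadToFile(0, sourceUrl, targetPath, 0, 0) = 0)
End Function
```

The calling macro would check the Boolean result before going on to the conversion, so a failed download does not silently reuse last month's file.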

#5
12-14-2013, 11:05 AM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

Thank you for your response. All of the conversion formats I have seen thus far are too inconsistent even to write code to clean up. PDF to Excel is the program I found that comes the closest, but it puts the retail pricing and sales pricing in different columns randomly. I attached an example of what I am referring to.

The ideal situation is to include VBA code to download a fresh copy of the PDF every month, but that is going to be more of a nice-to-have feature than a necessity for this project.
Attached Files
File Type: xlsx plugin-productCatalog - example.xlsx (71.6 KB, 20 views)

#6
12-14-2013, 11:37 AM
BobBridges, Expert (Windows 7 64bit, Office 2010 32bit)

This you call "random"? Actually this looks cleaner than I expected. I do see what you mean about it not putting the important data in consistent columns, but your program should be able to sort that out with very little difficulty; it's just a matter of determining the pattern and explaining it to your program.

Let's see what we can figure out. Each table starts with the word "Code", always in column 1; so your program can find the start of each table by searching for the next appearance of "Code" in col 1—or, if you determine that it sometimes appears in other columns you can look elsewhere too. In the same row, the other column headings are consistent even though they're in varying columns, so your program can determine where to find the Code, the Price and the other data. The code and the description are sometimes in separate columns and sometimes combined, but that's easy to figure out. And the column for the sale price has no header; but the column is missing only when there is no sale, and when it's present it's always between Sales End and Price.

Figuring out the length of the table is the only tricky part, and that only slightly; in most cases the end of the table is marked by a blank cell in the Code column, but in two of the tables there are sections where the Code and description are in the next column. Maybe if both columns are empty, that's the end of a table? No, in more than one table the footnote ("Retail price include...") is up against the table with no blank space intervening. Ah, here we go: In the Price column, "Page n" always appears at the end of the table. Ok, let me play with this and come up with a way to turn this data into something more rational. Or maybe this gives you the right idea without a demo?
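
The pattern described above could be sketched in Excel VBA roughly as follows. This is only an illustration of the idea, not the code in any attached workbook: the function names are made up here, and it assumes the converted data sits on one worksheet, that the heading cell in column 1 reads exactly "Code", and that a "Page n" cell in the Price column closes each table, as observed in the sample.

```vba
Option Explicit

' Cell contents as trimmed text; error values are treated as blank.
Private Function CellText(ByVal c As Range) As String
    If Not IsError(c.Value) Then CellText = Trim$(CStr(c.Value))
End Function

' Last row of the sheet's used range.
Private Function LastUsedRow(ByVal ws As Worksheet) As Long
    LastUsedRow = ws.UsedRange.Row + ws.UsedRange.Rows.Count - 1
End Function

' Row of the next "Code" heading in column 1 at or after startRow (0 if none).
Public Function FindNextTableStart(ByVal ws As Worksheet, _
                                   ByVal startRow As Long) As Long
    Dim r As Long
    For r = startRow To LastUsedRow(ws)
        If CellText(ws.Cells(r, 1)) = "Code" Then
            FindNextTableStart = r
            Exit Function
        End If
    Next r
End Function

' Column in headerRow whose heading equals headerText (0 if not found).
' The headings move between tables, so this is looked up once per table.
Public Function FindHeaderColumn(ByVal ws As Worksheet, ByVal headerRow As Long, _
                                 ByVal headerText As String) As Long
    Dim c As Long, lastCol As Long
    lastCol = ws.Cells(headerRow, ws.Columns.Count).End(xlToLeft).Column
    For c = 1 To lastCol
        If StrComp(CellText(ws.Cells(headerRow, c)), headerText, vbTextCompare) = 0 Then
            FindHeaderColumn = c
            Exit Function
        End If
    Next c
End Function

' Last data row of a table: the row just above the "Page n" cell in the
' Price column. Falls back to the last used row if no marker is found.
Public Function FindTableEnd(ByVal ws As Worksheet, ByVal headerRow As Long, _
                             ByVal priceCol As Long) As Long
    Dim r As Long, lastRow As Long
    lastRow = LastUsedRow(ws)
    FindTableEnd = lastRow
    For r = headerRow + 1 To lastRow
        If CellText(ws.Cells(r, priceCol)) Like "Page #*" Then
            FindTableEnd = r - 1
            Exit Function
        End If
    Next r
End Function
```

A driver loop would call FindNextTableStart, look up the Price column with FindHeaderColumn, copy the rows up to FindTableEnd, and then search again from the row after the "Page n" marker.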

#7
12-14-2013, 02:14 PM
BobBridges, Expert (Windows 7 64bit, Office 2010 32bit)

There, take a look at that. It takes a bit of work to write this sort of thing, and of course every time the publisher changes the layout you may have to adjust your work, but this is the general idea; and to my way of thinking it's worth the effort if you're going to have to run it multiple times.
Attached Files
File Type: xlsm x.xlsm (47.6 KB, 31 views)

#8
12-14-2013, 10:30 PM
macropod, Administrator (Windows 7 32bit, Office 2010 32bit)

Cross-posted at: http://www.mrexcel.com/forum/general...mpossible.html
For cross-posting etiquette, please read: http://www.excelguru.ca/content.php?184
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]

#9
12-15-2013, 01:35 AM
macropod, Administrator (Windows 7 32bit, Office 2010 32bit)

I'd suggest saving the PDF as a text file, then opening it in Word and running the following Word macro:
Code:
Sub ParsePDFData()
Application.ScreenUpdating = False
With ActiveDocument.Range
  .Paragraphs.First.Range.Delete
  .Paragraphs.First.Range.Delete
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchWildcards = True
    .Text = "^13[!^13]@^13[!^13]@^13[!^13]@^13^12^13[!^13]@^13"
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    .Text = "^13[!^13]@^13[!^13]@^13[!^13]@^13^12^13"
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    .Text = "[ ]{1,}^13"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "^13{2,}"
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "(^13[0-9]{1,}>)([!$]@)($[!^13]{1,})"
    .Replacement.Text = "\1^t\2^t^t\3"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"
    .Replacement.Text = "^t^&"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "(EACH )(^t)(^t$)"
    .Replacement.Text = "\2\1\3"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "([0-9]{3,}^t[!^t]@^t)([!0-9])"
    .Replacement.Text = "\1^t^t\2"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "($[0-9.]{4,}) ($[!^13]{1,})"
    .Replacement.Text = "\1^t\2"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "^t[ ]{1,}"
    .Replacement.Text = "^t"
    .Execute Replace:=wdReplaceAll
    DoEvents
    .Text = "[ ]{1,}^t"
    .Replacement.Text = "^t"
    .Execute Replace:=wdReplaceAll
  End With
  .Copy
End With
Call Export
Application.ScreenUpdating = True
End Sub
 
Sub Export()
Dim xlApp As Object, xlWkBk As Object
Set xlApp = CreateObject("Excel.Application")
xlApp.Visible = True
xlApp.ScreenUpdating = False
Set xlWkBk = xlApp.Workbooks.Add
With xlWkBk.Sheets(1)
  .Range("A1").PasteSpecial Paste:=-4163 'xlPasteValues
  .Columns.AutoFit
  .Columns("A:A").ColumnWidth = 8
  .Range("A1").Select
End With
xlApp.ScreenUpdating = True
Set xlWkBk = Nothing: Set xlApp = Nothing
End Sub
The result will be an Excel worksheet containing the data, all nicely aligned. As coded, the only substantive difference is that the red values and cross-out values are shifted one column to the right, so that all the current prices are in the same column.

Note: With 800 pages of data to process, the code will take some time to complete.

Last edited by macropod; 12-15-2013 at 04:15 AM. Reason: Enhanced XL output

#10
12-24-2013, 04:36 PM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

Bob, that is definitely what I am looking for! Could you let me know the code you used to clean it up that way so I could try running it through the entire file to see if it works? Thank you again very much for your help with this!

#11
12-24-2013, 04:58 PM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

Macropod, that macro does a pretty decent job of outputting the txt into an xls file. There are a couple fields that are still off (mainly the first item under each heading), but I don't think any of them are going to affect me.

My ultimate goal is to run a script that will pull all of the updated prices for each item and copy them to the Cost/Inventory Sheet. Is this something that you could help me write as well? I am thinking it would go something like this:

1. Search through the outputted text for the item "code". Ex. Jim Beam would be "4079".
2. Once the item is found in the outputted text, see if there are any changes in the "Price" field based on the price in the Cost/Inventory Sheet.
3. If that price has changed, update the price in the Cost/Inventory Sheet and highlight it to show there was a change.
4. Repeat this process for each item in the inventory.

Thanks again for all the help and I hope you have a good holiday!

#12
12-24-2013, 06:20 PM
BobBridges, Expert (Windows 7 64bit, Office 2010 32bit)

The code is actually in the workbook, Shane. Open it up and then look at the VBA code—if you don't know how, ask and any one of us will be able to explain it—and you can see how it works.

Then again, if you don't know how, maybe you've never written a VBA macro and you won't see how it works, in which case you're starting from further back than I suspected. If so, we should settle down and discuss the program a piece at a time.

#13
12-24-2013, 06:25 PM
macropod, Administrator (Windows 7 32bit, Office 2010 32bit)

Quote:
Originally Posted by shanemarkley View Post
There are a couple fields that are still off (mainly the first item under each heading)
Easily enough fixed:
Code:
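Sub ParsePDFData()
Application.ScreenUpdating = False
With ActiveDocument.Range
  'Delete the first paragraph, then overwrite the next one with a tab-delimited header row
  .Paragraphs.First.Range.Delete
  .Paragraphs.First.Range.Text = "Code" & vbTab & "Product" & vbTab & "Size" & vbTab & "Sales Start" & vbTab & "Sales End" & vbTab & "Price" & vbTab & "Old Price" & vbCr
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchWildcards = True
    'Replace the three lines before each page break (^12), the break itself
    'and the line after it with a single paragraph mark
    .Text = "^13[!^13]@^13[!^13]@^13[!^13]@^13^12^13[!^13]@^13"
    .Replacement.Text = "^p"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Strip trailing spaces, then collapse runs of paragraph marks
    '(the replacement text is still "^p")
    .Text = "[ ]{1,}^13"
    .Execute Replace:=wdReplaceAll
    .Text = "^13{2,}"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Delete footer text that starts with a date and "Page n" and ends with "notice."
    .Text = "^13[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} Page [0-9]{1,4}*notice.^13"
    .Replacement.Text = ""
    .Execute Replace:=wdReplaceAll
    'Where two stray lines precede an entry, fold them into the entry after its numeric code
    .Text = "(^13)([A-Z][!^13]@)^13([A-Z0-9][!$^13]@)^13([0-9]{1,}) ([!$]@$[0-9.]{4,}>)"
    .Replacement.Text = "\1\4 \2 \3 \5"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Put a tab after the leading code and four tabs before the first $ amount
    .Text = "(^13[0-9]{1,}>)([!$]@)($[!^13]{1,})"
    .Replacement.Text = "\1^t\2^t^t^t^t\3"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Where a pair of sale dates precedes those tabs, move the dates into their own columns
    .Text = "([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}) ([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}) ^t^t^t"
    .Replacement.Text = "^t^t\1^t\2"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Move the tab in front of the size (EACH, or a number plus ML/L)
    'so that the size gets its own column
    .Text = "(EACH )(^t)"
    .Replacement.Text = "\2\1"
    .Execute Replace:=wdReplaceAll
    .Text = "([0-9.]{2,5}[ ML]{2,4})(^t)"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Split two $ amounts separated by a space into separate columns
    .Text = "($[0-9.]{4,}) ($[!^13]{1,})"
    .Replacement.Text = "\1^t\2"
    .Execute Replace:=wdReplaceAll
    DoEvents
    'Trim spaces on either side of each tab
    .Text = "^t[ ]{1,}"
    .Replacement.Text = "^t"
    .Execute Replace:=wdReplaceAll
    .Text = "[ ]{1,}^t"
    .Execute Replace:=wdReplaceAll
    DoEvents
  End With
  .Copy
End With
Call Export
Application.ScreenUpdating = True
End Sub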
 
Sub Export()
Dim xlApp As Object, xlWkBk As Object
Set xlApp = CreateObject("Excel.Application")
With xlApp
  .Visible = True
  .ScreenUpdating = False
  Set xlWkBk = .Workbooks.Add
  With xlWkBk.Sheets(1)
    .Range("A1").PasteSpecial Paste:=-4163 'xlPasteValues
    .Columns.AutoFit
    .Columns("A:A").ColumnWidth = 8
    .Range("A2").Select
    .Columns("C:C").HorizontalAlignment = -4152 'xlRight (Excel constants are not defined in Word when late binding)
  End With
  With .ActiveWindow
      .SplitColumn = 0
      .SplitRow = 1
      .FreezePanes = True
  End With
  .ScreenUpdating = True
End With
Set xlWkBk = Nothing: Set xlApp = Nothing
End Sub
Note: There are a few extra enhancements to deal with wrapped lines in the source that come out as disjointed paragraphs in the text file. Plus there's a header row for the output.
Quote:
My ultimate goal is to run a script that will pull all of the updated prices for each item and copy it to Cost/Inventory Sheet. Is this something that you could help me write as well?
Does that mean you're really only concerned with the items that have the two prices?
Quote:
I am thinking it would go something like this:

1. Search through the outputted text for the item "code". Ex. Jim Beam would be "4079".
2. Once the item is found in the outputted text, see if there is any changes in the "Price" field based on the price in the Cost/Inventory Sheet.
3. If that price is changed, update the price in the Cost/Inventory Sheet and highlight it to show there was a change.
4. Repeat this process for each item in the inventory
As coded, the macro sends its output to a new Excel file. Obviously some changes would be required to output the results to a file that already exists. Indeed, in such a scenario it might be better to run the macro from Excel, let Excel automate a Word session for the parsing, then just do the necessary updating.
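
For the updating step itself, the four steps quoted above might look something like this when run from Excel. It is a rough sketch under stated assumptions, not tested against the real files: UpdatePrices is a made-up name, the inventory column positions are passed in because the actual Cost/Inventory layout isn't known here, the catalog sheet is assumed to follow the macro's header row (Code in column 1, Price in column 6), both sheets are assumed to have one header row, and Scripting.Dictionary requires Windows.

```vba
Option Explicit

' Cell contents as trimmed text; error values are treated as blank.
Private Function CellText(ByVal c As Range) As String
    If Not IsError(c.Value) Then CellText = Trim$(CStr(c.Value))
End Function

' Hypothetical helper: copies changed prices from the parsed catalog sheet to
' the inventory sheet, highlights each changed cell and returns how many changed.
Public Function UpdatePrices(ByVal wsCatalog As Worksheet, _
                             ByVal wsInventory As Worksheet, _
                             ByVal invCodeCol As Long, _
                             ByVal invPriceCol As Long) As Long
    Const CAT_CODE_COL As Long = 1    ' "Code" in the macro's header row
    Const CAT_PRICE_COL As Long = 6   ' "Price" in the macro's header row
    Dim prices As Object, r As Long, lastRow As Long, changed As Long
    Dim key As String, newPrice As Variant, oldPrice As Variant

    ' Step 1: index the catalog by item code so each lookup is cheap.
    Set prices = CreateObject("Scripting.Dictionary")
    lastRow = wsCatalog.Cells(wsCatalog.Rows.Count, CAT_CODE_COL).End(xlUp).Row
    For r = 2 To lastRow
        key = CellText(wsCatalog.Cells(r, CAT_CODE_COL))
        If Len(key) > 0 Then prices(key) = wsCatalog.Cells(r, CAT_PRICE_COL).Value
    Next r

    ' Steps 2 to 4: walk the inventory, compare, update and highlight changes.
    lastRow = wsInventory.Cells(wsInventory.Rows.Count, invCodeCol).End(xlUp).Row
    For r = 2 To lastRow
        key = CellText(wsInventory.Cells(r, invCodeCol))
        If prices.Exists(key) Then
            newPrice = prices(key)
            oldPrice = wsInventory.Cells(r, invPriceCol).Value
            ' Skip catalog rows whose price is blank or not a number.
            If IsNumeric(newPrice) And Not IsEmpty(newPrice) Then
                If Not IsNumeric(oldPrice) Then oldPrice = 0
                ' Comparing as Currency (four decimals) avoids floating-point noise.
                If CCur(oldPrice) <> CCur(newPrice) Then
                    With wsInventory.Cells(r, invPriceCol)
                        .Value = CDbl(CCur(newPrice))
                        .Interior.Color = vbYellow
                    End With
                    changed = changed + 1
                End If
            End If
        End If
    Next r
    UpdatePrices = changed
End Function
```

Because the loop is driven by the inventory sheet, only the items actually stocked are touched, however large the catalog is. Items whose code is missing from the catalog are left alone.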

#14
12-26-2013, 11:34 AM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

Bob, my apologies. I was able to see the macro in that file. I have been doing some training on VBA, but I am still pretty new to it. I tried to run the macro against the entire converted PDF and it ended up locking up Excel. I let it run for about 2 hours, but still no luck. Based on the code, I am guessing that the macro sorts through the data in tab2 and outputs it into tab1? What I did was paste the data from the fully converted PDF into tab2 (plugin-productCatalog2) and then ran the macro from that tab.

Maybe it would be best if the macro just pulled out the needed data based on what liquors an establishment sells and outputted that in the correct format instead of everything? This would save a ton of time and processing power since it would only need to be a couple hundred items. I attached part of the Cost/Inventory spreadsheet I am working with so you can see an example of the format of spreadsheet I am trying to update.

Also, the first converted PDF I sent you was done using an online program called "PDF to Excel". Would it be possible for the macro you wrote to clean up an .xls file that was converted from Adobe Acrobat? This would be the more ideal solution as opposed to using an online program to do the conversion every time. I attached part of the outputted spreadsheet from the Adobe Acrobat conversion.

Thanks again for all your help with this and I hope you had a happy holiday!
Attached Files
File Type: xlsm Liquor Inventory_Cost sheet - Example.xlsm (63.5 KB, 10 views)
File Type: xlsx Adobe PDF to Excel Convestion example.xlsx (130.6 KB, 14 views)

#15
12-26-2013, 11:38 AM
shanemarkley, Novice (Windows 7 64bit, Office 2010 64bit)

Macropod, thank you for fixing that piece. You are correct in saying that this would be more ideal if it could all be run via Excel if that is not too difficult. I am concerned with not only the fields that have special prices, but all of the fields that are in our inventory.

Not sure if you saw my response to Bob, but maybe it would be best if the macro just pulled out the needed data based on what liquors an establishment sells and outputted that in the correct format instead of everything? This would save a ton of time and processing power since it would only need to be a couple hundred items. In the above message, I attached part of the Cost/Inventory spreadsheet I am working with so you can see an example of the format of the spreadsheet I am trying to update.

Many thanks for all your help with this!