#1
Hi! I'm new to the forum.

This is sort of an advanced question, I think. THE FOLLOWING WORKS, but it's slow: it takes about half an hour to work through an 84-page document.

Maybe my biggest question is about state management. In Excel, I can turn off printer communication, formula calculation, etc. to gain a ton of speed. None of that appears to exist in Word VBA, except maybe screen updating. (And that doesn't seem to work: I can still watch the macro's progress before Word locks up.)

My second (less important) question: is there a simpler approach? (A lot of my first attempts relied on things that don't work for manipulating a table with a varying column count.) I have a lot of experience with Excel VBA but hardly any with Word VBA, so maybe I'm taking the wrong approach? Thinking in tables instead of strings?

Here's the situation: I'm using Acrobat Pro to export a PDF to Word so I can paste it into Excel and read/manipulate the numbers. The exporter (or the PDF itself?) has a few flaws:

- It doesn't create one uniform table (same number of columns, same widths, all rows). It sort of barfs everything into row after row, with diverse (apparently arbitrary?) column counts and widths.
- Worse, the exporter uses tab stops and alignment to lay out the table from there. It looks great (like the PDF), but it can't be read as data.

(Exporting directly to Excel is worse: it skips the tab stops and alignment altogether. Without that formatting, I have even fewer clues about which information belongs in which column.)

So here's what I have so far. The code below uses delimiters to:

- Replace each tab character with the associated tab stop's apparent page position
- Add the page position of the cell's apparent right margin to right-aligned cells

I can paste the result into Excel and use another (much faster) macro to redistribute the data across columns, e.g. 1.42 inches to 1.95 inches = destination column 3.

Code:
Option Explicit
Private Sub wordMacro_start()
    Dim xcount As Long
    Dim xcol As Long, xrow As Long, xlng As Long
    Dim xstr As String
    Dim xdbl As Double
    'Tracks macro's speed
    Dim startTime As Date
    startTime = Now()
    Application.ScreenUpdating = False
    Dim tbl As Table
    Dim tbs As TabStop
    Dim rng As Range
    'Loops through every table
    For Each tbl In ActiveDocument.Tables
        xcount = xcount + 1
        Debug.Print xcount 'So I can read progress
        DoEvents 'Lets Windows process events, so I can pause execution
        'Loops through every column
        xcol = 1
        Do
            If xcol > tbl.Columns.Count Then Exit Do
            'Loops through every row
            xrow = 1
            Do
                If xrow > tbl.Rows.Count Then Exit Do
                'Skips when cell address doesn't exist within the table
                Set rng = Nothing
                On Error Resume Next
                Set rng = tbl.Cell(xrow, xcol).Range
                On Error GoTo 0
                If Not rng Is Nothing Then
                    'Tab stops and tab characters
                    'Loops through every tab stop within the cell
                    For Each tbs In rng.Paragraphs.TabStops
                        'Gets starting position of next tab character
                        xlng = InStr(1, rng.Text, vbTab)
                        If xlng = 0 Then Exit For
                        'Gets position of the tab stop + previous cell widths in the row
                        'Equals page position, sorta
                        xstr = rng.Text
                        xstr = Left(xstr, xlng - 1) & _
                            "{" & (pullRecursive_width(tbl, xrow, xcol) - _
                            PointsToInches(tbl.Cell(xrow, xcol).Width)) + _
                            PointsToInches(tbs.Position) _
                            & "}" & _
                            Right(xstr, Len(xstr) - xlng)
                        'Replaces the tab character with the position number
                        rng.Text = xstr
                    Next tbs
                    'Alignment
                    'Fortunately, cells with right alignments don't generate with tab characters
                    'If alignment is right
                    If tbl.Cell(xrow, xcol).Range.Paragraphs.Alignment = wdAlignParagraphRight Then
                        'If cell didn't generate as 'empty'
                        If tbl.Cell(xrow, xcol).Range.Text <> Chr(13) & Chr(7) Then
                            'Gets this + all previous cell widths in the row
                            'Equals page position of the right margin of the cell
                            xdbl = pullRecursive_width(tbl, xrow, xcol)
                            xstr = "{" & xdbl & "}" & rng.Text
                            'Prefixes the cell text with the position
                            rng.Text = xstr
                        End If
                    End If
                End If
                xrow = xrow + 1
            Loop
            xcol = xcol + 1
        Loop
    Next tbl
    'Takes forever, sometimes start it and leave - so I have the macro save on finish
    ActiveDocument.Save
    'Describes macro's speed
    MsgBox "(" & Format(Now() - startTime, "Nn \mi\nute\s, Ss \se\co\n\d\s") & ")"
    Application.ScreenUpdating = True
End Sub

Private Function pullRecursive_width(sent_table As Table, _
        xrow As Long, _
        xcol As Long) As Double
    'Starts with current cell's width
    pullRecursive_width = PointsToInches(sent_table.Cell(xrow, xcol).Width)
    'If current cell is first cell, returns
    If xcol <= 1 Then Exit Function
    'If current cell isn't first cell, adds previous cell's width
    pullRecursive_width = pullRecursive_width + _
        pullRecursive_width(sent_table, xrow, xcol - 1)
End Function
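On the state-management question: Word has no calculation mode to switch off, but a few application and view settings are commonly turned off around long-running macros. A minimal sketch under that assumption; `SetWordFastMode` is a hypothetical name, and how much each setting actually helps on this document would need to be measured.

```vba
Option Explicit

' Hypothetical helper: toggles Word settings that often slow long-running macros.
' Call SetWordFastMode True before the work and SetWordFastMode False afterwards.
Private fastModeOn As Boolean
Private savedPagination As Boolean
Private savedSpelling As Boolean
Private savedGrammar As Boolean
Private savedViewType As WdViewType

Public Sub SetWordFastMode(ByVal fast As Boolean)
    If fast Then
        If fastModeOn Then Exit Sub
        ' Remember the current settings so they can be restored later
        savedViewType = ActiveWindow.View.Type
        savedPagination = Options.Pagination
        savedSpelling = Options.CheckSpellingAsYouType
        savedGrammar = Options.CheckGrammarAsYouType

        Application.ScreenUpdating = False
        ActiveWindow.View.Type = wdNormalView   ' Draft view needs less layout work
        Options.Pagination = False              ' no background repagination
        Options.CheckSpellingAsYouType = False  ' no live spell check
        Options.CheckGrammarAsYouType = False   ' no live grammar check
        fastModeOn = True
    Else
        If Not fastModeOn Then Exit Sub
        ' Put everything back the way it was
        Options.Pagination = savedPagination
        Options.CheckSpellingAsYouType = savedSpelling
        Options.CheckGrammarAsYouType = savedGrammar
        ActiveWindow.View.Type = savedViewType
        Application.ScreenUpdating = True
        Application.ScreenRefresh
        fastModeOn = False
    End If
End Sub
```

Usage: call `SetWordFastMode True` at the top of `wordMacro_start` and `SetWordFastMode False` just before the `MsgBox`. Calling `ActiveDocument.UndoClear` once per table is another commonly suggested tweak, since every `rng.Text = ...` assignment adds an entry to the undo stack.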

#2
Perhaps you could explain what it is you're trying to achieve? You've posted over 100 lines of code...
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]

#3
Have you tried opening the PDF from within Word? It might give you a more usable document.

#4
Have you thought about using the Range.ConvertToTable method?
If the PDF saves to Word in a fairly structured way, this built-in tool may be the easiest option. It may introduce some errors, but that is almost inevitable when extracting data from a PDF.
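A minimal sketch of that suggestion, assuming the rows are already plain tab-separated paragraphs (for instance after `Table.ConvertToText`, which is not shown here); `TabTextToTable` is a hypothetical name.

```vba
Option Explicit

' Hypothetical example: turn the selected tab-separated lines into a table.
Public Sub TabTextToTable()
    Dim rng As Range
    Dim tbl As Table

    Set rng = Selection.Range
    If rng.Start = rng.End Then
        MsgBox "Select the tab-separated lines first."
        Exit Sub
    End If

    ' Each paragraph becomes a row; each tab character starts a new cell
    Set tbl = rng.ConvertToTable(Separator:=wdSeparateByTabs)
    tbl.Borders.Enable = True   ' show borders so the result is easy to check
End Sub
```

Note that Word splits on tab characters, not on tab-stop positions, so a row containing a single tab still fills only the first two columns.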

#5
@macropod -
I am trying to speed up the code I posted, preferably with some state management, but I'm thinking that's not a thing in Word VBA the way it is in Excel VBA.

My end game is turning this FROM.png into this AFTER.png, because exporting a PDF to Word mangles the tables and I want to analyze the data. Exporting directly to Excel is worse. My code uses tabs, tab stop positions, column widths, alignment, etc. to determine where text is on the page. Once that's done, I paste it into Excel and move all the text into columns according to where Word put it.

@slaycock - I didn't think of that, thanks. Unfortunately, the file came through with different, less workable problems: some rows were moved to the beginning of their respective pages at random. I think that destroys the association I need.

@d4okeefe - I didn't know about .ConvertToTable, so I played around with it. Unfortunately, no joy. This is the kind of thing the Acrobat Pro export spits out:

Data1 [tab] Data4
Data1 | Data2 | Data3 | Data4

The single tab uses a tab stop to move Data4 into the correct spot visually, but it isn't actually in column 4, so ConvertToTable just put it in column 2, etc.
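The Excel-side step described above ("1.42 inches to 1.95 inches = destination column 3") amounts to looking up each `{position}` marker against a sorted list of column boundaries. A minimal sketch, assuming the boundaries are the left edges of the destination columns in inches; `PositionToColumn` and the sample edges are hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns the 1-based destination column for a page
' position in inches. leftEdges must be sorted ascending; entry i is the
' left edge of column (i - LBound(leftEdges) + 1).
' Returns 0 when the position lies left of the first edge.
Public Function PositionToColumn(ByVal pos As Double, leftEdges() As Double) As Long
    Dim i As Long
    PositionToColumn = 0
    For i = LBound(leftEdges) To UBound(leftEdges)
        If pos >= leftEdges(i) Then
            PositionToColumn = i - LBound(leftEdges) + 1
        Else
            Exit For   ' edges are sorted, so no later column can match
        End If
    Next i
End Function
```

For example, with left edges of 0, 0.75, 1.42 and 1.95 inches, a marker of `{1.5}` maps to column 3 and `{2.3}` maps to column 4.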

#6
It would be much easier to revise the code for your process if we had access to the Word file the conversion produces. Can you attach the actual document to a post, with some representative data (delete anything sensitive)? You do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]

#7

#8
In my experience, Word itself often does a better job of converting a PDF, especially when tables are involved. See if that streamlines the process and gives you a more usable Word version than the one created by Acrobat.
See https://www.techrepublic.com/article...-in-word-2013/ to get the import happening.
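To try that from VBA, Word 2013 and later can open a PDF directly and convert it on the way in. A minimal sketch; `OpenPdfInWord` is a hypothetical name, and the assumption that `wdAlertsNone` suppresses the "Word will now convert your PDF..." prompt may not hold in every Word version.

```vba
Option Explicit

' Hypothetical helper: opens a PDF in Word (2013 or later), letting Word
' convert it. Returns Nothing if the file cannot be opened.
Public Function OpenPdfInWord(ByVal pdfPath As String) As Document
    Dim oldAlerts As WdAlertLevel
    oldAlerts = Application.DisplayAlerts

    ' Assumption: this suppresses the conversion prompt
    Application.DisplayAlerts = wdAlertsNone
    On Error Resume Next
    Set OpenPdfInWord = Documents.Open(FileName:=pdfPath, _
                                       ConfirmConversions:=False, _
                                       AddToRecentFiles:=False)
    On Error GoTo 0

    ' Always restore the original alert level
    Application.DisplayAlerts = oldAlerts
End Function
```

Usage: `Set doc = OpenPdfInWord("C:\Temp\report.pdf")` (placeholder path), then check `If Not doc Is Nothing Then Debug.Print doc.Tables.Count` before comparing the result with the Acrobat export.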
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia

#9
Zars01: With the document you attached, I note that the first table contains more columns than the other tables do. Are those extra columns of any consequence? Similarly, are there any rows you don't need (so they can be deleted instead of being tidied up unnecessarily)?
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]

#10
The code is slow, I think, because you set a range object for every table cell in the document.
Code:
Set rng = tbl.Cell(xrow, xcol).Range

A Find loop that visits only the tab characters inside tables might avoid that:

Code:
Sub test_for_tab_inside_table()
    Dim r As Range
    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Text = "^t"
        .Forward = True
        .Wrap = wdFindStop 'Stop at the end of the document instead of wrapping around
        .Execute
        Do While .Found
            If r.Information(wdWithInTable) Then
                'Do the work
            End If
            .Execute
        Loop
    End With
End Sub
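One possible way to fill in the `'Do the work` step, sketched under assumptions: instead of adding up tab-stop positions and cell widths, ask Word where the text after each tab actually starts, via `Range.Information(wdHorizontalPositionRelativeToPage)`, which reports points from the left edge of the page. Word may return -1 when it cannot report a position, the value is generally only meaningful in Print Layout view, and whether this is faster on an 84-page document would need to be measured. `MarkTabPositions` is a hypothetical name; the `{position}` marker format is copied from the original macro.

```vba
Option Explicit

' Hypothetical variant of the Find loop: replaces every tab inside a table
' with "{x}", where x is the page position (inches) of the text after the tab.
Public Sub MarkTabPositions()
    Dim r As Range, afterTab As Range
    Dim starts As New Collection, marks As New Collection
    Dim xPts As Single
    Dim i As Long

    ActiveWindow.View.Type = wdPrintView   ' page positions need Print Layout

    ' Pass 1: measure only, so earlier edits cannot shift later positions
    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Text = "^t"
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False
        Do While .Execute
            If r.Information(wdWithInTable) Then
                Set afterTab = r.Duplicate
                afterTab.Collapse wdCollapseEnd
                xPts = afterTab.Information(wdHorizontalPositionRelativeToPage)
                If xPts >= 0 Then   ' -1 means Word could not report a position
                    starts.Add r.Start
                    marks.Add "{" & Format(PointsToInches(xPts), "0.00") & "}"
                End If
            End If
            r.Collapse wdCollapseEnd   ' continue searching after this tab
        Loop
    End With

    ' Pass 2: replace from the end backwards so the stored offsets stay valid
    For i = starts.Count To 1 Step -1
        ActiveDocument.Range(starts(i), starts(i) + 1).Text = marks(i)
    Next i
End Sub
```

The two passes matter: replacing a tab with a short marker shifts the text after it, so a later tab in the same cell could jump to an earlier tab stop if it were measured after the edit.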
Last edited by d4okeefe; 06-20-2018 at 01:41 PM.

#11
Assuming the extra columns on the first page of your attachment are superfluous, the following will reformat the table content quite quickly:
Code:
Sub Demo()
    Application.ScreenUpdating = False
    Dim t As Long, r As Long, c As Long, i As Long, Rng As Range, StrTmp As String
    With ActiveDocument
        For t = 1 To .Tables.Count
            With .Tables(t)
                .AllowAutoFit = False
                With .Range.Font
                    .Name = "Arial"
                    .Size = 7
                End With
                .Rows.HeightRule = wdRowHeightExactly
                .Rows.Height = 10
                With .Range.ParagraphFormat
                    .SpaceBefore = 0
                    .SpaceAfter = 0
                    .LineSpacingRule = wdLineSpaceSingle
                End With
                For r = 1 To .Rows.Count
                    With .Rows(r)
                        If .Cells.Count > 5 Then
                            Set Rng = .Range
                            Rng.Start = .Cells(6).Range.Start
                            Rng.Cells.Delete
                        End If
                        Do While Split(.Cells(.Cells.Count).Range.Text, vbCr)(0) = ""
                            .Cells(.Cells.Count).Delete
                        Loop
                        i = .Cells.Count
                        If i < 5 Then
                            Set Rng = .Range
                            With .Range.Tables(1)
                                If r < .Rows.Count Then .Split .Rows(r + 1)
                                .Split .Rows(r)
                                .Rows.Add
                            End With
                            Select Case i
                                Case 1
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                                    .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                                    .Cells(5).Range.Text = Split(StrTmp, vbTab)(2)
                                Case 2
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0)
                                    If InStr(StrTmp, vbTab) > 0 Then
                                        .Cells(4).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab)) - 1)
                                    End If
                                    .Cells(5).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab)))
                                Case 3
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                                Case 4
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(2).Range.Text = Split(StrTmp, vbTab)(0)
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 3).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(4).Range.Text = Split(StrTmp, vbTab)(0)
                                    StrTmp = Split(Rng.Tables(2).Cell(1, 4).Range.Text, vbCr)(0)
                                    If StrTmp <> "" Then .Cells(5).Range.Text = Split(StrTmp, vbTab)(0)
                            End Select
                            .Range.Tables(1).Rows(r).Range.Font.Bold = Not IsNumeric(Split(.Cells(.Cells.Count).Range.Text, vbCr)(0))
                            With Rng
                                .End = .Tables(2).Range.End + 1
                                .Start = .Tables(2).Range.Start - 1
                                .Delete
                            End With
                        End If
                    End With
                Next
            End With
            DoEvents
        Next
    End With
    Application.ScreenUpdating = True
End Sub
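The `Split(....Range.Text, vbCr)(0)` idiom used throughout the macro above works because a cell's `Range.Text` ends with the end-of-cell marker `Chr(13) & Chr(7)`; taking everything before the first `vbCr` drops that marker (along with any later paragraphs in the cell). If the idiom is hard to read, a small helper can name it. `CellText` is a hypothetical name, and unlike `Split(...)(0)` it keeps multi-paragraph content.

```vba
Option Explicit

' Hypothetical helper: strips Word's end-of-cell marker (Chr(13) & Chr(7))
' from the text of a table cell range; other text is returned unchanged.
Public Function CellText(ByVal rawText As String) As String
    Dim endMark As String
    endMark = vbCr & Chr(7)   ' Const cannot use Chr(), so build it at run time
    If Right$(rawText, Len(endMark)) = endMark Then
        CellText = Left$(rawText, Len(rawText) - Len(endMark))
    Else
        CellText = rawText
    End If
End Function
```

Usage: `StrTmp = CellText(Rng.Tables(2).Cell(1, 1).Range.Text)` returns `Data1 & vbTab & Data4` for a cell containing `Data1<tab>Data4`, and an empty string for an empty cell.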
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]