#1
|
|||
|
|||
State management?
Hi! I'm new to the forum
Sort of an advanced question, I think? THE FOLLOWING WORKS But it's slow. It takes about a half hour to work through an 84 page document Maybe my biggest question is state management In Excel, I can turn off printer communication, formula calculation, etc. to add a ton of speed. None of that appears to be a thing for Word VBA - except for maybe screen updating. (And that doesn't appear to work - I can still watch the macro's progress before Word locks up?) My second (less important) question is is there a simpler approach? (A lot of my first attempts included a few things that wouldn't work for manipulating a varied-column-count table.) I have a lot of experience with Excel VBA - but hardly any with Microsoft Word VBA. So maybe I'm taking the wrong approach? Thinking in tables instead of strings? Here's the situation: I'm using Acrobat Pro to export a PDF into Word ... so I can paste it into Excel and read/manipulate the numbers. The exporter (or the PDF itself?) has a few flaws: It doesn't create one, uniform table (same number of columns, same width, all rows) ... it sort of barfs it all into row after row - with diverse (apparently arbitrary?) column counts and widths. Worse, the exporter uses tab stops and alignment to format the table from there - it looks great (like the PDF) but cannot be read as data. (Exporting direct to Excel is worse - it just skips the tab stops and alignment part altogether. Without that formatting, I have even fewer clues to tell what information corresponds to what column, etc..) So here's what I have so far The code below uses delimiters to: Replace each tab character with the associated tab stop's apparent page position Add the apparent right margin's page position to right-justified cells What it creates, I can paste into Excel and use another (much faster) macro to redistribute data across columns - i.e. 1.42 inches to 1.95 inches = destination column 3, etc. Code:
Option Explicit Private Sub wordMacro_start() Dim xcount As Long Dim xcol As Long, xrow As Long, xlng As Long Dim xstr As String Dim xdbl As Double 'Tracks macro's speed Dim startTime As Date startTime = Now() Application.ScreenUpdating = False Dim tbl As Table Dim tbs As TabStop Dim rng As Range 'Loops through every table For Each tbl In ActiveDocument.Tables xcount = xcount + 1 Debug.Print xcount 'So I can read progress DoEvents 'So I have windows to pause execution 'Loops through every column xcol = 1 Do If xcol > tbl.Columns.Count Then Exit Do 'Loops through every row xrow = 1 Do If xrow > tbl.Rows.Count Then Exit Do 'Skips when cell address doesn't exist within the table Set rng = Nothing On Error Resume Next Set rng = tbl.Cell(xrow, xcol).Range On Error GoTo 0 If Not rng Is Nothing Then 'Tab stops and tab characters 'Loops through every tab stop within the cell For Each tbs In rng.Paragraphs.TabStops 'Gets starting position of next tab character xlng = InStr(1, rng.Text, vbTab) If xlng = 0 Then Exit For 'Gets position of the tab stop + previous cell widths in the row 'Equals page position, sorta xstr = rng.Text xstr = Left(xstr, xlng - 1) & _ "{" & (pullRecursive_width(tbl, xrow, xcol) - _ PointsToInches(tbl.Cell(xrow, xcol).Width)) + _ PointsToInches(tbs.Position) _ & "}" & _ Right(xstr, Len(xstr) - xlng) 'Replaces the tab character with the position number rng.Text = xstr Next tbs 'Alignment 'Fortunately, cells with right alignments don't generate with tab characters 'If alignment is right If tbl.Cell(xrow, xcol).Range.Paragraphs.Alignment = wdAlignParagraphRight Then 'If cell didn't generate as 'empty' If tbl.Cell(xrow, xcol).Range.Text <> Chr(13) & _ Chr(7) Then 'Gets this + all previous cell widths in the row 'Equals page position of the right margin of the cell xdbl = pullRecursive_width(tbl, xrow, xcol) xstr = "{" & _ xdbl & "}" & _ rng.Text 'Adds the position to the cell width rng.Text = xstr End If End If End If xrow = xrow + 1 Loop xcol = xcol + 1 Loop Next tbl 'Takes forever, sometimes start it and leave - so I have the macro save on finish ActiveDocument.Save 'Describes macro's speed MsgBox "(" & Format(Now() - startTime, "Nn \mi\nute\s, Ss \se\co\n\d\s") & ")" Application.ScreenUpdating = True End Sub Private Function pullRecursive_width(sent_table As Table, _ xrow As Long, _ xcol As Long) As Double 'Starts with current cell's width pullRecursive_width = PointsToInches(sent_table.Cell(xrow, xcol).Width) 'If current cell is first cell, returns If xcol <= 1 Then Exit Function 'If current cell isn't first cell, adds previous cell's width pullRecursive_width = pullRecursive_width + _ pullRecursive_width(sent_table, xrow, xcol - 1) End Function |
#2
|
||||
|
||||
Perhaps you could explain what it is you're trying to achieve? You've posted over 100 lines of code...
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#3
|
|||
|
|||
Have you tried opening the PDF from within Word. It might give you a more usable document.
|
#4
|
|||
|
|||
Have you thought about using the Range.ConvertToTable method?
If the pdf saves to word in a fairly structured way, it may be easiest to use this built in tool. It may have some errors, but that is almost inevitable when trying to access data from a pdf. |
#5
|
|||
|
|||
@macropod -
I am trying to add speed to the code I posted. Preferably with some state management ... but I'm thinking that's not a thing in Word VBA the way it is in Excel VBA. My end game is turning this FROM.png into this AFTER.png Because Exporting a PDF to Word mangles the tables and I want to analyze the data. Exporting direct to Excel is worse. My code uses tabs, tabstop position info, column widths, alignment info, etc. to determine where text is on the page. Once that's done, I paste it to Excel then move all the text into columns according to where Word put them @slaycock - I didn't think of that - thanks - unfortunately, the file came through with different, less workable problems? Some rows were moved to the beginning of their respective pages at random. I think this just straight up destroys the association I need @d4okeefe - I didn't know about .ConvertToTable, so I played around with it - unfortunately, no joy. I mean, this is the kind of thing Acrobat Pro export spits out: Data1 tab Data4 Data1 | Data2 | Data3 | Data4 The single tab there uses a tabstop to visually move Data4 into the correct spot visually - but it isn't actually in column 4 So ConvertToTable just moved it to column 2, etc. |
#6
|
||||
|
||||
Your process would be much easier to revise the code for if we had access to the Word file the conversion produces. Can you attach the actual document to a post with some representative data (delete anything sensitive)? You do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#7
|
|||
|
|||
|
#8
|
||||
|
||||
In my experience, Word itself often does a better job of converting a PDF especially when there are tables involved. See if that streamlines the process and results in a more useable Word version than that created by Acrobat.
See https://www.techrepublic.com/article...-in-word-2013/ to get the import happening.
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#9
|
||||
|
||||
Zars01: With the document you attached, I note that first table contains more columns than do the other tables. Are those extra columns of any consequence? Similarly, are there any rows you don't need (so they can be deleted instead of being tidied up unnecessarily)?
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#10
|
|||
|
|||
The code is slow, I think, because you set a range object for every table cell in the document.
Code:
Set rng = tbl.Cell(xrow, xcol).Range Code:
Sub test_for_tab_inside_table() Dim r As Range Set r = ActiveDocument.Content With r.Find .ClearFormatting .Text = "^t" .Execute Do While .Found If r.Information(wdWithInTable) Then 'Do the work End If .Execute Loop End With End Sub Last edited by d4okeefe; 06-20-2018 at 01:41 PM. |
#11
|
||||
|
||||
Assuming the extra columns on the first page of your attachment are superfluous, the following will reformat the table content quite quickly:
Code:
Sub Demo() Application.ScreenUpdating = False Dim t As Long, r As Long, c As Long, i As Long, Rng As Range, StrTmp As String With ActiveDocument For t = 1 To .Tables.Count With .Tables(t) .AllowAutoFit = False With .Range.Font .Name = "Arial" .Size = 7 End With .Rows.HeightRule = wdRowHeightExactly .Rows.Height = 10 With .Range.ParagraphFormat .SpaceBefore = 0 .SpaceAfter = 0 .LineSpacingRule = wdLineSpaceSingle End With For r = 1 To .Rows.Count With .Rows(r) If .Cells.Count > 5 Then Set Rng = .Range Rng.Start = .Cells(6).Range.Start Rng.Cells.Delete End If Do While Split(.Cells(.Cells.Count).Range.Text, vbCr)(0) = "" .Cells(.Cells.Count).Delete Loop i = .Cells.Count If i < 5 Then Set Rng = .Range With .Range.Tables(1) If r < .Rows.Count Then .Split .Rows(r + 1) .Split .Rows(r) .Rows.Add End With Select Case i Case 1 StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0) .Cells(1).Range.Text = Split(StrTmp, vbTab)(0) .Cells(5).Range.Text = Split(StrTmp, vbTab)(2) Case 2 StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0) StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0) If InStr(StrTmp, vbTab) > 0 Then .Cells(4).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab)) - 1) End If .Cells(5).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab))) Case 3 StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0) Case 4 StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0) StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(2).Range.Text = Split(StrTmp, vbTab)(0) StrTmp = Split(Rng.Tables(2).Cell(1, 3).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(4).Range.Text = Split(StrTmp, vbTab)(0) StrTmp = Split(Rng.Tables(2).Cell(1, 4).Range.Text, vbCr)(0) If StrTmp <> "" Then .Cells(5).Range.Text = Split(StrTmp, vbTab)(0) End Select .Range.Tables(1).Rows(r).Range.Font.Bold = Not IsNumeric(Split(.Cells(.Cells.Count).Range.Text, vbCr)(0)) With Rng .End = .Tables(2).Range.End + 1 .Start = .Tables(2).Range.Start - 1 .Delete End With End If End With Next End With DoEvents Next End With Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
Tags |
state, tabstops |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to Prevent auto period after two-letter state abbreviation | gogreen | Word | 12 | 05-19-2018 08:36 PM |
Cable management | adminstefan | Visio | 1 | 02-13-2017 08:48 AM |
User Selectable Buttons That Express State | Andrew H | Word | 1 | 11-08-2012 07:36 PM |
Resetting to Default State not working | carlgrossman | Word | 0 | 08-02-2008 01:31 AM |
Please help with Sum formula to add totals by State! asap | dutch4fire23 | Excel | 0 | 07-28-2006 12:41 PM |