Microsoft Office Forums

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 06-18-2018, 05:54 PM
Zars01 Zars01 is offline State management? Windows 10 State management? Office 2013
Novice
State management?
 
Join Date: Jun 2018
Posts: 3
Zars01 is on a distinguished road
Default State management?

Hi! I'm new to the forum
Sort of an advanced question, I think?

THE FOLLOWING WORKS
But it's slow. It takes about a half hour to work through an 84 page document
Maybe my biggest question is state management
In Excel, I can turn off printer communication, formula calculation, etc. to add a ton of speed.
None of that appears to be a thing for Word VBA - except for maybe screen updating.
(And that doesn't appear to work - I can still watch the macro's progress before Word locks up?)

My second (less important) question is is there a simpler approach?
(A lot of my first attempts included a few things that wouldn't work for manipulating a varied-column-count table.)

I have a lot of experience with Excel VBA - but hardly any with Microsoft Word VBA.
So maybe I'm taking the wrong approach? Thinking in tables instead of strings?
Here's the situation:
I'm using Acrobat Pro to export a PDF into Word ...
so I can paste it into Excel and read/manipulate the numbers.

The exporter (or the PDF itself?) has a few flaws:
It doesn't create one, uniform table (same number of columns, same width, all rows) ... it sort of barfs it all into row after row - with diverse (apparently arbitrary?) column counts and widths.
Worse, the exporter uses tab stops and alignment to format the table from there - it looks great (like the PDF) but cannot be read as data.

(Exporting direct to Excel is worse - it just skips the tab stops and alignment part altogether. Without that formatting, I have even fewer clues to tell what information corresponds to what column, etc..)

So here's what I have so far
The code below uses delimiters to:
Replace each tab character with the associated tab stop's apparent page position
Add the apparent right margin's page position to right-justified cells

What it creates, I can paste into Excel and use another (much faster) macro to redistribute data across columns - i.e. 1.42 inches to 1.95 inches = destination column 3, etc.

Code:
Option Explicit

Private Sub wordMacro_start()
    Dim xcount As Long
    
    Dim xcol As Long, xrow As Long, xlng As Long
    Dim xstr As String
    Dim xdbl As Double
    
    'Tracks macro's speed
    Dim startTime As Date
    startTime = Now()
    
    Application.ScreenUpdating = False
    
    Dim tbl As Table
    Dim tbs As TabStop
    Dim rng As Range
    'Loops through every table
    For Each tbl In ActiveDocument.Tables
        xcount = xcount + 1
        Debug.Print xcount 'So I can read progress
        DoEvents 'So I have windows to pause execution
        
        'Loops through every column
        xcol = 1
        Do
            If xcol > tbl.Columns.Count Then Exit Do
            'Loops through every row
            xrow = 1
            Do
                If xrow > tbl.Rows.Count Then Exit Do
                
                'Skips when cell address doesn't exist within the table
                Set rng = Nothing
                On Error Resume Next
                Set rng = tbl.Cell(xrow, xcol).Range
                On Error GoTo 0
                If Not rng Is Nothing Then
                    'Tab stops and tab characters
                        'Loops through every tab stop within the cell
                        For Each tbs In rng.Paragraphs.TabStops
                            
                            'Gets starting position of next tab character
                            xlng = InStr(1, rng.Text, vbTab)
                            If xlng = 0 Then Exit For
                            
                            'Gets position of the tab stop + previous cell widths in the row
                            'Equals page position, sorta
                            xstr = rng.Text
                            xstr = Left(xstr, xlng - 1) & _
                                "{" & (pullRecursive_width(tbl, xrow, xcol) - _
                                PointsToInches(tbl.Cell(xrow, xcol).Width)) + _
                                PointsToInches(tbs.Position) _
                                 & "}" & _
                                Right(xstr, Len(xstr) - xlng)
                            'Replaces the tab character with the position number
                            rng.Text = xstr
                        Next tbs
                    
                    'Alignment
                        'Fortunately, cells with right alignments don't generate with tab characters
                        'If alignment is right
                        If tbl.Cell(xrow, xcol).Range.Paragraphs.Alignment = wdAlignParagraphRight Then
                            'If cell didn't generate as 'empty'
                            If tbl.Cell(xrow, xcol).Range.Text <> Chr(13) & _
                                Chr(7) Then
                                
                                'Gets this + all previous cell widths in the row
                                'Equals page position of the right margin of the cell
                                xdbl = pullRecursive_width(tbl, xrow, xcol)
                                xstr = "{" & _
                                    xdbl & "}" & _
                                    rng.Text
                                'Adds the position to the cell width
                                rng.Text = xstr
                            End If
                        End If
                End If
                
                xrow = xrow + 1
            Loop
            xcol = xcol + 1
        Loop
    Next tbl
    
    'Takes forever, sometimes start it and leave - so I have the macro save on finish
    ActiveDocument.Save
    
    'Describes macro's speed
    MsgBox "(" & Format(Now() - startTime, "Nn \mi\nute\s, Ss \se\co\n\d\s") & ")"
    
    Application.ScreenUpdating = True
End Sub

Private Function pullRecursive_width(sent_table As Table, _
    xrow As Long, _
    xcol As Long) As Double
    
    'Starts with current cell's width
    pullRecursive_width = PointsToInches(sent_table.Cell(xrow, xcol).Width)
    'If current cell is first cell, returns
    If xcol <= 1 Then Exit Function
    
    'If current cell isn't first cell, adds previous cell's width
    pullRecursive_width = pullRecursive_width + _
        pullRecursive_width(sent_table, xrow, xcol - 1)
End Function

Reply With Quote
  #2  
Old 06-18-2018, 07:35 PM
macropod's Avatar
macropod macropod is offline State management? Windows 7 64bit State management? Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,963
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Perhaps you could explain what it is you're trying to achieve? You've posted over 100 lines of code...
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #3  
Old 06-18-2018, 08:32 PM
slaycock slaycock is offline State management? Windows 7 64bit State management? Office 2016
Expert
 
Join Date: Sep 2013
Posts: 256
slaycock is on a distinguished road
Default

Have you tried opening the PDF from within Word. It might give you a more usable document.
Reply With Quote
  #4  
Old 06-19-2018, 08:05 AM
d4okeefe d4okeefe is offline State management? Windows 10 State management? Office 2016
Advanced Beginner
 
Join Date: Apr 2013
Posts: 77
d4okeefe is on a distinguished road
Default

Have you thought about using the Range.ConvertToTable method?

If the pdf saves to word in a fairly structured way, it may be easiest to use this built in tool. It may have some errors, but that is almost inevitable when trying to access data from a pdf.
Reply With Quote
  #5  
Old 06-19-2018, 09:30 PM
Zars01 Zars01 is offline State management? Windows 10 State management? Office 2013
Novice
State management?
 
Join Date: Jun 2018
Posts: 3
Zars01 is on a distinguished road
Default

@macropod -

I am trying to add speed to the code I posted. Preferably with some state management ... but I'm thinking that's not a thing in Word VBA the way it is in Excel VBA.

My end game is turning this
FROM.png

into this
AFTER.png

Because
Exporting a PDF to Word mangles the tables and I want to analyze the data.
Exporting direct to Excel is worse.

My code uses tabs, tabstop position info, column widths, alignment info, etc. to determine where text is on the page.
Once that's done, I paste it to Excel then move all the text into columns according to where Word put them

@slaycock - I didn't think of that - thanks - unfortunately, the file came through with different, less workable problems? Some rows were moved to the beginning of their respective pages at random. I think this just straight up destroys the association I need

@d4okeefe - I didn't know about .ConvertToTable, so I played around with it - unfortunately, no joy.

I mean, this is the kind of thing Acrobat Pro export spits out:
Data1 tab Data4
Data1 | Data2 | Data3 | Data4

The single tab there uses a tabstop to visually move Data4 into the correct spot visually - but it isn't actually in column 4
So ConvertToTable just moved it to column 2, etc.
Reply With Quote
  #6  
Old 06-19-2018, 09:34 PM
macropod's Avatar
macropod macropod is offline State management? Windows 7 64bit State management? Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,963
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Your process would be much easier to revise the code for if we had access to the Word file the conversion produces. Can you attach the actual document to a post with some representative data (delete anything sensitive)? You do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #7  
Old 06-19-2018, 10:04 PM
Zars01 Zars01 is offline State management? Windows 10 State management? Office 2013
Novice
State management?
 
Join Date: Jun 2018
Posts: 3
Zars01 is on a distinguished road
Default

@macropod - ok, here's a sample
The actual document is 84 pages

No Data Payroll.docx
Reply With Quote
  #8  
Old 06-19-2018, 10:08 PM
Guessed's Avatar
Guessed Guessed is offline State management? Windows 10 State management? Office 2016
Expert
 
Join Date: Mar 2010
Location: Canberra/Melbourne Australia
Posts: 3,975
Guessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant future
Default

In my experience, Word itself often does a better job of converting a PDF especially when there are tables involved. See if that streamlines the process and results in a more useable Word version than that created by Acrobat.

See https://www.techrepublic.com/article...-in-word-2013/ to get the import happening.
__________________
Andrew Lockton
Chrysalis Design, Melbourne Australia
Reply With Quote
  #9  
Old 06-19-2018, 11:11 PM
macropod's Avatar
macropod macropod is offline State management? Windows 7 64bit State management? Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,963
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Zars01: With the document you attached, I note that first table contains more columns than do the other tables. Are those extra columns of any consequence? Similarly, are there any rows you don't need (so they can be deleted instead of being tidied up unnecessarily)?
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #10  
Old 06-20-2018, 08:33 AM
d4okeefe d4okeefe is offline State management? Windows 10 State management? Office 2016
Advanced Beginner
 
Join Date: Apr 2013
Posts: 77
d4okeefe is on a distinguished road
Default

The code is slow, I think, because you set a range object for every table cell in the document.
Code:
Set rng = tbl.Cell(xrow, xcol).Range
Rather than testing each cell for a tab, maybe you could use Range.Find to identify all tabs in the document, then test if that tab is inside of a table.
Code:
Sub test_for_tab_inside_table()
    Dim r As Range
    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Text = "^t"
        .Execute
        Do While .Found
            If r.Information(wdWithInTable) Then
                'Do the work
            End If
            .Execute
        Loop
    End With
End Sub
That said, I get the sense that you care more about the data in the document than the document's appearance. (Otherwise, the PDF would work fine for you.) You may want to look into setting up a database, and then extract data from the Word file into that. Or maybe the PDF was created from a database that already exists?

Last edited by d4okeefe; 06-20-2018 at 01:41 PM.
Reply With Quote
  #11  
Old 06-20-2018, 05:23 PM
macropod's Avatar
macropod macropod is offline State management? Windows 7 64bit State management? Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,963
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Assuming the extra columns on the first page of your attachment are superfluous, the following will reformat the table content quite quickly:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim t As Long, r As Long, c As Long, i As Long, Rng As Range, StrTmp As String
With ActiveDocument
  For t = 1 To .Tables.Count
    With .Tables(t)
      .AllowAutoFit = False
      With .Range.Font
        .Name = "Arial"
        .Size = 7
      End With
      .Rows.HeightRule = wdRowHeightExactly
      .Rows.Height = 10
      With .Range.ParagraphFormat
        .SpaceBefore = 0
        .SpaceAfter = 0
        .LineSpacingRule = wdLineSpaceSingle
      End With
      For r = 1 To .Rows.Count
        With .Rows(r)
          If .Cells.Count > 5 Then
            Set Rng = .Range
            Rng.Start = .Cells(6).Range.Start
            Rng.Cells.Delete
          End If
          Do While Split(.Cells(.Cells.Count).Range.Text, vbCr)(0) = ""
            .Cells(.Cells.Count).Delete
          Loop
          i = .Cells.Count
          If i < 5 Then
            Set Rng = .Range
            With .Range.Tables(1)
              If r < .Rows.Count Then .Split .Rows(r + 1)
              .Split .Rows(r)
              .Rows.Add
            End With
            Select Case i
              Case 1
                StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                .Cells(5).Range.Text = Split(StrTmp, vbTab)(2)
              Case 2
                StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0)
                If InStr(StrTmp, vbTab) > 0 Then
                  .Cells(4).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab)) - 1)
                End If
                .Cells(5).Range.Text = Split(StrTmp, vbTab)(UBound(Split(StrTmp, vbTab)))
              Case 3
                StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
              Case 4
                StrTmp = Split(Rng.Tables(2).Cell(1, 1).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(1).Range.Text = Split(StrTmp, vbTab)(0)
                StrTmp = Split(Rng.Tables(2).Cell(1, 2).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(2).Range.Text = Split(StrTmp, vbTab)(0)
                StrTmp = Split(Rng.Tables(2).Cell(1, 3).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(4).Range.Text = Split(StrTmp, vbTab)(0)
                StrTmp = Split(Rng.Tables(2).Cell(1, 4).Range.Text, vbCr)(0)
                If StrTmp <> "" Then .Cells(5).Range.Text = Split(StrTmp, vbTab)(0)
            End Select
            .Range.Tables(1).Rows(r).Range.Font.Bold = Not IsNumeric(Split(.Cells(.Cells.Count).Range.Text, vbCr)(0))
            With Rng
              .End = .Tables(2).Range.End + 1
              .Start = .Tables(2).Range.Start - 1
              .Delete
            End With
          End If
        End With
      Next
    End With
    DoEvents
  Next
End With
Application.ScreenUpdating = True
End Sub
On my laptop, the above takes ~1 second/table(page) - less if the character/paragraph formatting code is omitted.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
Reply

Tags
state, tabstops



Similar Threads
Thread Thread Starter Forum Replies Last Post
How to Prevent auto period after two-letter state abbreviation gogreen Word 12 05-19-2018 08:36 PM
Cable management adminstefan Visio 1 02-13-2017 08:48 AM
State management? User Selectable Buttons That Express State Andrew H Word 1 11-08-2012 07:36 PM
Resetting to Default State not working carlgrossman Word 0 08-02-2008 01:31 AM
Please help with Sum formula to add totals by State! asap dutch4fire23 Excel 0 07-28-2006 12:41 PM

Other Forums: Access Forums

All times are GMT -7. The time now is 10:53 AM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2024 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft