Microsoft Office Forums

Microsoft Office Forums (https://www.msofficeforums.com/)
-   Word VBA (https://www.msofficeforums.com/word-vba/)
-   -   Search and replace tabulators of different length...? (https://www.msofficeforums.com/word-vba/14782-search-replace-tabulators-different-length.html)

Flabbergaster 10-19-2012 12:55 AM

Search and replace tabulators of different length...?
 
2 Attachment(s)
Hi
I have this source text (*.rtf) containing information in (sort of) tab-seperated columns. I want to copy it into a excel worksheet to do further analyzing.
My problem is, that the tabulator signs 'moving' the text into columns are of different length in the .rtf/.doc file, but exporting it excel treats them the same:

In the rtf file the signs look like this:

------------->

------------------------------------------>

Reading them as separators excel treats them alike.

I guess what I'm looking for is an easy way to (maybe w/search&replace) make the 'long' tabs into several tabs:

------------->
------------->------------->------------->

Real examples attached

It must be a 'automatic' solution, since we're talking about a rtf file of 1000+ pages..

macropod 10-21-2012 03:42 PM

You document has been formatted with different numbers of tab-stops for the different paragraphs, depending on what they contain. You can reformat the document with consistent tabbing with the following macro:
Code:

Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, para As Paragraph, Fmt As ParagraphFormat
With ActiveDocument
  .DefaultTabStop = 0
  j = .Paragraphs(1).TabStops.Count
  .Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
  For i = 1 To .Paragraphs.Count
    While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
      .Paragraphs(i).Range.InsertBefore vbTab
    Wend
  Next
End With
Application.ScreenUpdating = True
End Sub

Note that, as your first paragraph has an unused tabstop, but some other paragraphs use that same tabstop, the reformatting may leave you with an unused first column in Excel.

Flabbergaster 10-22-2012 12:35 AM

1 Attachment(s)
Thank you sooooo much!!! This works (almost) exactly how I imagined. The only problem is:

When there is nothing in the last "column" in the source text, the macro moves the content from the second to last "column" to this one instead, or, moves everything a column too much to the right (maybe that what it does).


New example (1th page source text, 2nd page after macro)

macropod 10-22-2012 12:53 AM

Try the following code revision:
Code:

Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, para As Paragraph, Fmt As ParagraphFormat, HPos As Single
With ActiveDocument
  .DefaultTabStop = 0
  j = .Paragraphs(1).TabStops.Count
  For i = 1 To .Paragraphs.Count
    For k = 1 To j
      If .Paragraphs(i).Range.Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
        .Paragraphs(1).TabStops(k).Position Then .Paragraphs(i).Range.Characters.Last.InsertBefore vbTab
    Next
    While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
      .Paragraphs(i).Range.InsertBefore vbTab
    Wend
  Next
  .Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
End With
Application.ScreenUpdating = True
End Sub

Inevitably, you may still end up with some mis-aligned data if there are records with columns missing between the first and last with data (eg columns 2 & 5 have data, but columns 1, 3, 4 & 6 are empty). A post-processing visual inspection will be in order.

Flabbergaster 10-29-2012 09:06 AM

First of all thank you so much!!

Is it possible to add some code to the macro, making it possible to follow the progress - without delaying it too much? What I mean is, since the document is VERY long, literally thousands of pages, the process always takes a while (not to sat a very long time). It would be nice to be able to follow the process. Not by adding

Code:

Application.ScreenUpdating = True
which would slow it down too much, I guess. But by some kind of page x of xxx, or so, indicating the progression..

macropod 10-29-2012 02:16 PM

Try:
Code:

Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, l As Long
Dim para As Paragraph, Fmt As ParagraphFormat, HPos As Single
With ActiveDocument
  .DefaultTabStop = 0
  j = .Paragraphs(1).TabStops.Count
  l = .Paragraphs.Count
  For i = 1 To l
    Application.StatusBar = "Processing paragraph " & i & " of " & l
    For k = 1 To j
      If .Paragraphs(i).Range.Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
        .Paragraphs(1).TabStops(k).Position Then .Paragraphs(i).Range.Characters.Last.InsertBefore vbTab
    Next
    While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
      .Paragraphs(i).Range.InsertBefore vbTab
    Wend
    If i Mod 100 = 0 Then DoEvents
  Next
  .Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
End With
Application.StatusBar = "Done!!"
Application.ScreenUpdating = True
End Sub


Flabbergaster 10-30-2012 12:41 AM

Hi again

You're a genius.... Thanks.

Is there any tweaks one could add to make it run faster? I see now, that there is 275103 paragraphs and it takes appr. 1500 pr. 10 min now?

macropod 10-30-2012 04:38 AM

I've tweaked the code for about as much performance as can be eked out of it:
Code:

Sub Demo()
Application.ScreenUpdating = True
Dim i As Long, j As Long, k As Long, l As Long, m As Long
Dim pRng As Paragraph, HPos As Single, StrStops As String
With ActiveDocument
  .DefaultTabStop = 0
  j = .Paragraphs(1).TabStops.Count
  For k = 1 To j
    StrStops = StrStops & "," & .Paragraphs(1).TabStops(k).Position
  Next
  l = .Paragraphs.Count
  With .Range.Duplicate
    For Each pRng In .Paragraphs
      i = i + 1
      Application.StatusBar = "Processing paragraph " & i & " of " & l
      With pRng.Range
        m = 0
        For k = 1 To j
          If .Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
            Split(StrStops, ",")(k) Then
            m = j - k
            Exit For
          End If
        Next
        Select Case m
          Case 1: .Characters.Last.InsertBefore vbTab
          Case 2: .Characters.Last.InsertBefore vbTab & vbTab
          Case 3: .Characters.Last.InsertBefore vbTab & vbTab & vbTab
          Case 4: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab
          Case 5: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab
          Case 6: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case 7: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case 8: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case 9: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
        End Select
        m = Len(.Text) - Len(Replace(.Text, vbTab, ""))
        Select Case m
          Case j - 1: .InsertBefore vbTab
          Case j - 2: .InsertBefore vbTab & vbTab
          Case j - 3: .InsertBefore vbTab & vbTab & vbTab
          Case j - 4: .InsertBefore vbTab & vbTab & vbTab & vbTab
          Case j - 5: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab
          Case j - 6: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case j - 7: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case j - 8: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
          Case j - 9: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
        End Select
        If i Mod 10 = 0 Then
          DoEvents
          ActiveDocument.UndoClear
        End If
      End With
    Next
    .ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
  End With
End With
Application.StatusBar = "Done!!"
Application.ScreenUpdating = True
End Sub

We could also take out the progress report ... or make it update less often (eg in whole % iterations). The latter wouldn't be much faster, though and would update fairly infrequently.


Ultimately, you'd do far better getting whatever's generating the source file to put all the required tabs into the file at the outset.

Flabbergaster 10-30-2012 05:17 AM

Wow, thanks... Already much faster

Which part do I change to
a) make the updating less frequent ( is it i = i + 1 -> i = i +, say, 10 or 25??
b) remove the progress report completely

macropod 10-30-2012 05:40 AM

To delete the progress report, comment out or delete the line:
Application.StatusBar = "Processing paragraph " & i & " of " & l

To give, say, a % progress report requires adding more code to calculate the %. That calculation has its own overhead which, compared, updating the screen as per the existing progress report, takes only marginally less time. An alternative would be to relocate the existing status code to just above/below the 'DoEvents' line. That'll update the report only every 10th paragraph.


All times are GMT -7. The time now is 06:43 PM.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2025, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2025 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft