![]() |
|
#1
|
|||
|
|||
|
Hi
I have this source text (*.rtf) containing information in (sort of) tab-seperated columns. I want to copy it into a excel worksheet to do further analyzing. My problem is, that the tabulator signs 'moving' the text into columns are of different length in the .rtf/.doc file, but exporting it excel treats them the same: In the rtf file the signs look like this: -------------> ------------------------------------------> Reading them as separators excel treats them alike. I guess what I'm looking for is an easy way to (maybe w/search&replace) make the 'long' tabs into several tabs: -------------> ------------->------------->-------------> Real examples attached It must be a 'automatic' solution, since we're talking about a rtf file of 1000+ pages.. |
|
#2
|
||||
|
||||
|
You document has been formatted with different numbers of tab-stops for the different paragraphs, depending on what they contain. You can reformat the document with consistent tabbing with the following macro:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, para As Paragraph, Fmt As ParagraphFormat
With ActiveDocument
.DefaultTabStop = 0
j = .Paragraphs(1).TabStops.Count
.Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
For i = 1 To .Paragraphs.Count
While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
.Paragraphs(i).Range.InsertBefore vbTab
Wend
Next
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
|
#3
|
|||
|
|||
|
Thank you sooooo much!!! This works (almost) exactly how I imagined. The only problem is:
When there is nothing in the last "column" in the source text, the macro moves the content from the second to last "column" to this one instead, or, moves everything a column too much to the right (maybe that what it does). New example (1th page source text, 2nd page after macro) |
|
#4
|
||||
|
||||
|
Try the following code revision:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, para As Paragraph, Fmt As ParagraphFormat, HPos As Single
With ActiveDocument
.DefaultTabStop = 0
j = .Paragraphs(1).TabStops.Count
For i = 1 To .Paragraphs.Count
For k = 1 To j
If .Paragraphs(i).Range.Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
.Paragraphs(1).TabStops(k).Position Then .Paragraphs(i).Range.Characters.Last.InsertBefore vbTab
Next
While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
.Paragraphs(i).Range.InsertBefore vbTab
Wend
Next
.Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
|
#5
|
|||
|
|||
|
First of all thank you so much!!
Is it possible to add some code to the macro, making it possible to follow the progress - without delaying it too much? What I mean is, since the document is VERY long, literally thousands of pages, the process always takes a while (not to sat a very long time). It would be nice to be able to follow the process. Not by adding Code:
Application.ScreenUpdating = True Last edited by Flabbergaster; 10-29-2012 at 09:07 AM. Reason: misspelling |
|
#6
|
||||
|
||||
|
Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, l As Long
Dim para As Paragraph, Fmt As ParagraphFormat, HPos As Single
With ActiveDocument
.DefaultTabStop = 0
j = .Paragraphs(1).TabStops.Count
l = .Paragraphs.Count
For i = 1 To l
Application.StatusBar = "Processing paragraph " & i & " of " & l
For k = 1 To j
If .Paragraphs(i).Range.Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
.Paragraphs(1).TabStops(k).Position Then .Paragraphs(i).Range.Characters.Last.InsertBefore vbTab
Next
While (Len(.Paragraphs(i).Range.Text) - Len(Replace(.Paragraphs(i).Range.Text, vbTab, ""))) < j
.Paragraphs(i).Range.InsertBefore vbTab
Wend
If i Mod 100 = 0 Then DoEvents
Next
.Range.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
End With
Application.StatusBar = "Done!!"
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
|
#7
|
|||
|
|||
|
Hi again
You're a genius.... Thanks. Is there any tweaks one could add to make it run faster? I see now, that there is 275103 paragraphs and it takes appr. 1500 pr. 10 min now? |
|
#8
|
||||
|
||||
|
I've tweaked the code for about as much performance as can be eked out of it:
Code:
Sub Demo()
Application.ScreenUpdating = True
Dim i As Long, j As Long, k As Long, l As Long, m As Long
Dim pRng As Paragraph, HPos As Single, StrStops As String
With ActiveDocument
.DefaultTabStop = 0
j = .Paragraphs(1).TabStops.Count
For k = 1 To j
StrStops = StrStops & "," & .Paragraphs(1).TabStops(k).Position
Next
l = .Paragraphs.Count
With .Range.Duplicate
For Each pRng In .Paragraphs
i = i + 1
Application.StatusBar = "Processing paragraph " & i & " of " & l
With pRng.Range
m = 0
For k = 1 To j
If .Characters.Last.Information(wdHorizontalPositionRelativeToTextBoundary) < _
Split(StrStops, ",")(k) Then
m = j - k
Exit For
End If
Next
Select Case m
Case 1: .Characters.Last.InsertBefore vbTab
Case 2: .Characters.Last.InsertBefore vbTab & vbTab
Case 3: .Characters.Last.InsertBefore vbTab & vbTab & vbTab
Case 4: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab
Case 5: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab
Case 6: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case 7: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case 8: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case 9: .Characters.Last.InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
End Select
m = Len(.Text) - Len(Replace(.Text, vbTab, ""))
Select Case m
Case j - 1: .InsertBefore vbTab
Case j - 2: .InsertBefore vbTab & vbTab
Case j - 3: .InsertBefore vbTab & vbTab & vbTab
Case j - 4: .InsertBefore vbTab & vbTab & vbTab & vbTab
Case j - 5: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab
Case j - 6: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case j - 7: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case j - 8: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
Case j - 9: .InsertBefore vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab & vbTab
End Select
If i Mod 10 = 0 Then
DoEvents
ActiveDocument.UndoClear
End If
End With
Next
.ParagraphFormat = .Paragraphs(1).Range.ParagraphFormat
End With
End With
Application.StatusBar = "Done!!"
Application.ScreenUpdating = True
End Sub
Ultimately, you'd do far better getting whatever's generating the source file to put all the required tabs into the file at the outset.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
|
#9
|
|||
|
|||
|
Wow, thanks... Already much faster
Which part do I change to a) make the updating less frequent ( is it i = i + 1 -> i = i +, say, 10 or 25?? b) remove the progress report completely |
|
#10
|
||||
|
||||
|
To delete the progress report, comment out or delete the line:
Application.StatusBar = "Processing paragraph " & i & " of " & l To give, say, a % progress report requires adding more code to calculate the %. That calculation has its own overhead which, compared, updating the screen as per the existing progress report, takes only marginally less time. An alternative would be to relocate the existing status code to just above/below the 'DoEvents' line. That'll update the report only every 10th paragraph.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
|
| Tags |
| columns, search/replace, tabulator |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Advanced search and replace for bullet points | TishyMouse | Word | 5 | 02-17-2012 06:32 AM |
| Any wildcards search and replace in Powerpoint 2010? | tinfanide | PowerPoint | 0 | 09-10-2011 02:17 AM |
WORD: Rtf and search-replace (regexp/fonts)
|
seteshpl | Word | 1 | 09-06-2011 01:35 AM |
| Search Replace Copy | dblack7211 | Word | 0 | 05-05-2010 01:19 PM |
| Search and Replace - Clear Search box | JostClan | Word | 1 | 05-04-2010 08:46 PM |