#1
I have a long text, 45 to 50 thousand words. It has had many insertions, so several sentences, and even whole paragraphs with the same content, are repeated throughout it. Is there a way, such as a macro, to clean this up by comparing paragraphs across the whole text, or by finding phrases, sentences, or expressions of more than five or six words that are repeated at random places (not within the same paragraph, but anywhere in the text)? Sorry if I'm not making myself clear. For example, in paragraph 07 I have "'For Khandahar!', replied the inhabitants, raising their improvised weapons and facing the enemy with a bravery that surprised even themselves." and I know that something similar appears further on, but I don't know in which paragraph. How can I create a macro to help me clean up this text? I appreciate all the help.
#2
So which instance would you want to keep? Do be aware, too, that VBA has no idea what a grammatical sentence is. For example, consider the following:
Mr. Smith spent $1,234.56 at Dr. John's Grocery Store, to buy: 10.25kg of potatoes; 10kg of avocados; and 15.1kg of Mrs. Green's Mt. Pleasant macadamia nuts.

For you and me, that would count as one sentence; for VBA it counts as 5 sentences.
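
To see how Word splits a passage like the one above, a minimal sketch along these lines can list the sentences it reports for the current selection. The macro name `CountSentencesInSelection` is only illustrative; it assumes the example text is selected in the active document and writes its results to the Immediate window of the VBA editor.

```vba
Sub CountSentencesInSelection()
    ' Illustrative sketch: assumes some text is selected in the active document.
    Dim i As Long
    With Selection.Range
        ' Sentences is Word's own collection; it knows nothing about grammar.
        Debug.Print "Word reports " & .Sentences.Count & " sentence(s):"
        For i = 1 To .Sentences.Count
            Debug.Print i & ": " & .Sentences(i).Text
        Next i
    End With
End Sub
```

Each printed line shows where Word thinks one sentence ends and the next begins, which is why searching by a fixed number of words can be more predictable than searching by sentence.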
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#3
I understand. So the ideal would be a macro that finds strings of a given number of words that are repeated anywhere in the text. How do I build a macro that runs this search, reports "on page such-and-such, line such-and-such, this same sentence already appears", and then asks whether I want to delete it? Can you help me with this?
#4
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#5
This is an adaptation of the code Paul provided in his linked answer. It highlights every later repeat of each run of 6 consecutive words; you can change the "6" to whatever number you like.
Code:

    Sub FindDuplicateStrings()
        Application.ScreenUpdating = False
        Dim i As Long, RngSrc As Range, RngFnd As Range
        Dim oRng As Range
        Const Clr As Long = wdBrightGreen
        Dim eTime As Single
        eTime = Timer
        Options.DefaultHighlightColorIndex = Clr
        Dim oCol As New Collection
        With ActiveDocument
            With .Range.Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Forward = True
                .Format = False
                .MatchCase = False
                .MatchWholeWord = False
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
            End With
            Set RngSrc = .Range
            RngSrc.End = .Words(6).End 'Change "6" to a number to suit.
            MsgBox RngSrc.Text
            Do
                i = i + 1
                If i Mod 100 = 0 Then DoEvents
                On Error Resume Next
                'If RngSrc.HighlightColorIndex <> Clr Then
                Set RngFnd = .Range(RngSrc.End, .Range.End)
                If Len(RngSrc.Text) < 256 Then
                    With RngFnd.Find
                        .Text = RngSrc.Text
                        .Replacement.Text = "^&"
                        .Replacement.Highlight = True
                        .Wrap = wdFindStop
                        .Execute Replace:=wdReplaceAll
                    End With
                Else
                    With RngFnd
                        With .Find
                            .Text = Left(RngSrc.Text, 255)
                            .Wrap = wdFindStop
                            .Execute
                        End With
                        Do While .Find.Found
                            If RngSrc.Text = .Duplicate.Text Then
                                RngSrc.HighlightColorIndex = Clr
                                .Duplicate.HighlightColorIndex = Clr
                            End If
                            .Collapse wdCollapseEnd
                            .Find.Execute
                        Loop
                    End With
                End If
                'End If
                RngSrc.MoveStart wdWord, 1
                RngSrc.MoveEnd wdWord, 1
            Loop Until RngSrc.End = .Range.End
        End With
        ' Report time taken. Elapsed time calculation allows for execution to extend past midnight.
        MsgBox "Finished. Elapsed time: " & (Timer - eTime + 86400) Mod 86400 & " seconds."
        Application.ScreenUpdating = True
    End Sub
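
The macro above only highlights repeats. For the "tell me the page and line, then ask whether to delete" behaviour requested in post #3, one possible approach is sketched below. It is not part of Paul's code: the name `ReviewRepeatsOfSelection` is made up for illustration, and it assumes you first select a single phrase (within one paragraph, no more than 255 characters, and without a `^` character, which Find treats specially). It then walks through later occurrences one at a time.

```vba
Sub ReviewRepeatsOfSelection()
    ' Hypothetical helper: select one phrase in the document, then run this.
    Dim Rng As Range, sFind As String, lAnswer As VbMsgBoxResult
    sFind = Trim(Selection.Text)
    ' Find text is limited to 255 characters; paragraph marks are not handled here.
    If Len(sFind) = 0 Or Len(sFind) > 255 Or InStr(sFind, vbCr) > 0 Then
        MsgBox "Select a phrase of 1 to 255 characters within one paragraph first."
        Exit Sub
    End If
    ' Search only the text that follows the selected phrase.
    Set Rng = ActiveDocument.Range(Selection.End, ActiveDocument.Range.End)
    With Rng.Find
        .ClearFormatting
        .Text = sFind
        .Forward = True
        .Wrap = wdFindStop
        .MatchCase = False
        .MatchWildcards = False
    End With
    Do While Rng.Find.Execute
        Rng.Select 'Show the repeat on screen before asking.
        lAnswer = MsgBox("Repeat found on page " & _
            Rng.Information(wdActiveEndPageNumber) & ", line " & _
            Rng.Information(wdFirstCharacterLineNumber) & "." & vbCr & _
            "Delete it?", vbYesNoCancel)
        If lAnswer = vbCancel Then Exit Do
        If lAnswer = vbYes Then
            Rng.Delete 'The range collapses at the deletion point.
        Else
            Rng.Collapse wdCollapseEnd 'Skip this one and keep searching.
        End If
    Loop
End Sub
```

A reasonable workflow would be to run `FindDuplicateStrings` first to see where the repeats are, then select a highlighted phrase near its first occurrence and run this sketch to decide on each later copy. Try it on a copy of the document, since deletions change the text directly.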