#1
|
|||
|
|||
check for duplicates of a word within next 100 words
Good day,
Please, help me with a macro which can facilite proofreading of texts. This macro should look for dublicates of every word in all its forms within next ,let's say, 100 words or 3-4 sentences. There should be also a list of exclusions, like 'a, an, the, with, to' etc omitted from check. Found dublicates should be highlighted. Is it possible? |
#2
|
||||
|
||||
Yes, it is possible but not particularly quick. There may be more efficient ways of doing it but my attempt at it resulted in the following code. Note there wasn't any comprehensive error checking done on this.
Code:
Sub DupWordMarks() Dim aWord As String, iCounter As Long, iWords As Long, iLen As Long, iTotalWords As Long Dim aRng As Range, iEnd As Long, i As Long iLen = 20 'the size of the range searched forward iTotalWords = ActiveDocument.Words.Count For iCounter = 1 To iTotalWords - 1 Select Case Trim(ActiveDocument.Words(iCounter)) Case "you", "a", "the", "can", "for", ".", "to", "and", ",", ";" 'do nothing Case Else If iCounter + iLen < iTotalWords Then iEnd = iCounter + iLen Else iEnd = iTotalWords End If Set aRng = ActiveDocument.Range(ActiveDocument.Words(iCounter).End, ActiveDocument.Words(iEnd).End) For i = 1 To aRng.Words.Count If LCase(Trim(ActiveDocument.Words(iCounter))) = LCase(Trim(aRng.Words(i))) Then aRng.Words(i).HighlightColorIndex = wdPink Exit For End If Next i End Select 'If iCounter = 500 Then Exit Sub 'enable an artificial restraint to stop the code when testing Next iCounter End Sub
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#3
|
||||
|
||||
Perhaps:
Code:
Sub Demo() Application.ScreenUpdating = False Dim i As Long, j As Long, k As Long, Rng As Range, StrTmp As String, StrFnd As String Const StrExcl As String = "|,|.|!|:|;|<|>|(|)|{|}|{|}|a|an|the|to|from|with|": j = 100 With ActiveDocument For i = 1 To .Range.Words.Count - j If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then StrFnd = Replace(Trim(.Words(i)), vbCr, "") Set Rng = .Range(.Words(i).Start, .Words(i + j).End) With .Range(.Words(i).End, .Words(i + j).End) With .Find .ClearFormatting .Replacement.ClearFormatting .Text = StrFnd .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchCase = False .MatchWholeWord = True .MatchWildcards = False .MatchAllWordForms = True .Execute End With If .Find.Found = True Then Rng.End = .Words.Last.End With Rng If .Words.Count < j Then .Words.First.HighlightColorIndex = wdBrightGreen .Words.Last.HighlightColorIndex = wdBrightGreen End If End With End If End With End If Next End With Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] Last edited by macropod; 01-17-2019 at 12:00 AM. Reason: Code update |
#4
|
|||
|
|||
Guessed, thank you for fast response!
Macropod, many thanks, the code works as expected except these few points: 1) If there is an inline image OR a complicated field (from a reference software) I get Run-time error '5610': The Find What text for a Find All Word Forms search can only contain alphabetic letters. I guess the 'StrFnd' value should be "flattened" to pure text and any images ignored. How to do that? 2) How to avoid highlighting of the spaces after found duplicates? 3) How to add greek/mathematic symbols to the exclusion list? 4) I would like to have a possibility to apply the macros to the selection or to the complete document if there is no selection. Is it possible too? |
#5
|
||||
|
||||
Try:
Code:
Sub Demo() Application.ScreenUpdating = False Dim i As Long, j As Long, k As Long, RngTxt As Range, RngTmp As Range, StrTmp As String, StrFnd As String Const StrExcl As String = "|,|.|!|:|;|<|>|(|)|{|}|{|}|a|an|the|to|from|with|": j = 25 With ActiveDocument For i = 1 To .Range.Words.Count - j If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then If .Words(i).InlineShapes.Count = 0 Then If .Words(i).Fields.Count = 0 Then If Asc(.Words(i).Characters.First) < 128 Then StrFnd = Replace(Trim(.Words(i)), vbCr, "") Set RngTxt = .Range(.Words(i).Start, .Words(i + j).End) With .Range(.Words(i).End, .Words(i + j).End) With .Find .ClearFormatting .Replacement.ClearFormatting .Text = StrFnd .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchCase = False .MatchWholeWord = True .MatchWildcards = False .MatchAllWordForms = True .Execute End With If .Find.Found = True Then RngTxt.End = .Words.Last.End With RngTxt If .Words.Count < j Then Set RngTmp = .Words.First RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen Set RngTmp = .Words.Last RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen End If End With End If End With End If Else i = i + .Words(i).Fields(1).Result.Words.Count - 1 End If End If End If Next End With Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#6
|
|||
|
|||
macropod,
many thanks for the updated macros. But still I get the same error 5610 even on very simple texts... In the text below it hightlights "republic" which is not dublicated. The check proceeds up to the first word "states" and then I get the error... "The U.S. is a federal republic comprising 50 states and the Federal District of Columbia. It originated in the American Revolution, the successful rebellion of the colonies on the eastern coast against British rule in 1775 - 83. The original 13 states that formed the Union drew up a federal constitution in 1787, and George Washington was elected the first president in 1789. In the 19th century the territory of the U.S. was extended across the continent through the westward spread of pioneers and settlers and acquisitions such as that of Texas and California from Mexico in the 1840s. After a long period of isolation in foreign affairs, the U.S. participated on the Allied side in both world wars and emerged from the Cold War as the world's leading military and economic power" |
#7
|
||||
|
||||
You may need to add more characters (e.g. <>\/?*&^%$#@~) to the exclusions list. Alternatively, you might change:
Code:
If Asc(.Words(i).Characters.First) < 128 Then Code:
If ((Asc(.Words(i).Characters.First) > 64) And (Asc(.Words(i).Characters.First) <91)) Or ((Asc(.Words(i).Characters.First) > 96) And (Asc(.Words(i).Characters.First) <123))Then |,|.|!|:|;|<|>|(|)|{|}|{|} from: Const StrExcl As String
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#8
|
|||
|
|||
Many thanks, I have changed the code according to your suggestion, but set j = 100.
Exclusion list looks like this now: |a|an|as|and|of|on|the|to|from|for|with|in|as|it| I have checked the macros on the text with several dublicates, in total 98 words: "The US is a federal republic comprising 50 states and state the Federal District of Columbia. It originated in the American Revolution, the successful rebellion of the colonies on the eastern coast against British rule in 1775-83. The original 13 states that formed the Union drew up a federal constitution in 1787, and Texas George Washington is elected the first president in 1789. In the 19th century the territory of the US was extended across the continent through the westward spread of pioneers and settlers and acquisitions such as that of Texas and California from the Mexico state." In the attached image you will see that the macros does not find duplicates of "state" and "is". Manually through Find, Find all word forms, they all can be found easily. |
#9
|
||||
|
||||
Try:
Code:
Sub Demo() Application.ScreenUpdating = False Dim RngTxt As Range, RngFnd As Range, RngTmp As Range Dim StrTmp As String, StrTxt As String Dim i As Long, j As Long, k As Long Const StrExcl As String = "|a|an|and|from|in|is|of|the|to|with|": j = 100 With ActiveDocument For i = 1 To .Range.Words.Count - 1 If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then If .Words(i).InlineShapes.Count = 0 Then If .Words(i).Fields.Count = 0 Then StrTxt = .Words(i).Characters.First If ((Asc(StrTxt) > 64) And (Asc(StrTxt) < 91)) Or ((Asc(StrTxt) > 96) And (Asc(StrTxt) < 123)) Then StrTxt = Replace(Trim(.Words(i)), vbCr, "") Set RngTxt = .Range(.Words(i).Start, .Range.End) With RngTxt If .Words.Count > j Then .MoveEnd Unit:=wdWord, Count:=-(.Words.Count - j) End With Set RngFnd = .Range(.Words(i).End, RngTxt.End) With .Range(.Words(i).End, RngTxt.End) With .Find .ClearFormatting .Replacement.ClearFormatting .Text = StrTxt .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchCase = False .MatchWholeWord = True .MatchWildcards = False .MatchAllWordForms = True .Execute End With If .Find.Found = True Then Set RngTmp = RngTxt.Words.First RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen Set RngTmp = .Words.Last RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen End If End With End If Else i = i + .Words(i).Fields(1).Result.Words.Count - 1 End If End If End If Next End With Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#10
|
|||
|
|||
Thank you, it works better now!
A little problem I found making a check of a real scientific text. If a word is a mixture of Latin letters and digits (CO2) or contains both Latin and Greek symbols (Tχ), then again I get the error 5610. Also, how to apply the macro only to a selection, not to the complete document? |
#11
|
||||
|
||||
You can jump over the compound words causing the 5160 error by changing:
.Execute to: On Error Resume Next .Execute On Error GoTo 0 Limiting the code to Selections will take more work. It would have been nice if you'd said up front that's what you wanted to do.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#12
|
||||
|
||||
To limit operations to a Selection:
Code:
Sub Demo() Application.ScreenUpdating = False Dim RngTxt As Range, RngFnd As Range, RngTmp As Range, RngSel As Range Dim StrTmp As String, StrTxt As String Dim i As Long, j As Long, k As Long Const StrExcl As String = "|a|an|and|from|in|is|of|the|to|with|": j = 100 Set RngSel = Selection.Range With ActiveDocument For i = .Range(0, Selection.Start + 1).Words.Count To .Range(0, Selection.End).Words.Count - 1 If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then If .Words(i).InlineShapes.Count = 0 Then If .Words(i).Fields.Count = 0 Then StrTxt = .Words(i).Characters.First If ((Asc(StrTxt) > 64) And (Asc(StrTxt) < 91)) Or ((Asc(StrTxt) > 96) And (Asc(StrTxt) < 123)) Then StrTxt = Replace(Trim(.Words(i)), vbCr, "") Set RngTxt = .Range(.Words(i).Start, .Range.End) With RngTxt If .Words.Count > j Then .MoveEnd Unit:=wdWord, Count:=-(.Words.Count - j) End With Set RngFnd = .Range(.Words(i).End, RngTxt.End) With .Range(.Words(i).End, RngTxt.End) With .Find .ClearFormatting .Replacement.ClearFormatting .Text = StrTxt .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchCase = False .MatchWholeWord = True .MatchWildcards = False .MatchAllWordForms = True On Error Resume Next .Execute On Error GoTo 0 End With If .Find.Found = True Then If .InRange(RngSel) = False Then Exit For Set RngTmp = RngTxt.Words.First RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen Set RngTmp = .Words.Last RngTmp.MoveEndWhile " ", -1 RngTmp.HighlightColorIndex = wdBrightGreen End If End With End If Else i = i + .Words(i).Fields(1).Result.Words.Count - 1 End If End If End If Next End With Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#13
|
|||
|
|||
Dear macropod,
thank you very much for your patience and help. I guess now the macros works as it should. Just the last wish, which can be ignored without any problem actually. It would be helpful to see the same highlight color of similar/duplicated words. No idea how difficult to realize this, if too much, let's stop now Because I understand that a random coloration of, let's say, three words will lead to the same highlight color of only two last words, the first will be colored by another color. The macros should somehow check every word whether it has already a highlight, if not and if there is a duplicate then apply random highlight to the word and its duplicate, if yes apply the same color to the duplicate... |
#14
|
||||
|
||||
Although that would be possible, consistency of highlighting would require a major re-write of the code. And, in any event, there's only 15 highlights one could use for this, meaning they'd have to be recycled once you get beyond that.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#15
|
|||
|
|||
Just a thought
Hi,
Thank you for the Code, it works a treat with something I'm working on, but as you previously stated, it runs quite slow. The document I'm working on is about 60'000 words, I left this macro running. but 9 hours later it's still looking like it's running. Testing on smaller text, it works quicker, so my thought was, could we some how modify the code to run per page in the document, update what it's found and then repeat for the remaining pages? I accept it would take a little longer, but you would at least see a visual representation of progress of sorts. I'm also open to anything else that would make this run faster if possible. Thank you once again and sorry for bringing an old thread back up. |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Macro to check against Columns & Delete Duplicates | cjamps | Excel Programming | 27 | 12-18-2017 06:38 AM |
how to check for duplicated words / cells ? | iSlam Khaled | Word Tables | 11 | 05-05-2015 08:07 PM |
Adding words to spell check dictionary | oakwoodbank | Word | 17 | 02-27-2015 08:09 PM |
Spell check adding words | riweir | Word | 3 | 11-30-2011 09:03 PM |
Edit spell check dic to exclude words? | franklekens | Word | 1 | 07-03-2010 09:57 AM |