Microsoft Office Forums

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 01-16-2019, 05:48 AM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default check for duplicates of a word within next 100 words

Good day,

Please, help me with a macro which can facilite proofreading of texts.
This macro should look for dublicates of every word in all its forms within next ,let's say, 100 words or 3-4 sentences.
There should be also a list of exclusions, like 'a, an, the, with, to' etc omitted from check.
Found dublicates should be highlighted.



Is it possible?
Reply With Quote
  #2  
Old 01-16-2019, 08:21 PM
Guessed's Avatar
Guessed Guessed is offline check for duplicates of a word within next 100 words Windows 10 check for duplicates of a word within next 100 words Office 2016
Expert
 
Join Date: Mar 2010
Location: Canberra/Melbourne Australia
Posts: 3,932
Guessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant futureGuessed has a brilliant future
Default

Yes, it is possible but not particularly quick. There may be more efficient ways of doing it but my attempt at it resulted in the following code. Note there wasn't any comprehensive error checking done on this.
Code:
Sub DupWordMarks()
  Dim aWord As String, iCounter As Long, iWords As Long, iLen As Long, iTotalWords As Long
  Dim aRng As Range, iEnd As Long, i As Long
    
  iLen = 20   'the size of the range searched forward
  iTotalWords = ActiveDocument.Words.Count
  
  For iCounter = 1 To iTotalWords - 1
    Select Case Trim(ActiveDocument.Words(iCounter))
      Case "you", "a", "the", "can", "for", ".", "to", "and", ",", ";"
        'do nothing
      Case Else
        If iCounter + iLen < iTotalWords Then
          iEnd = iCounter + iLen
        Else
          iEnd = iTotalWords
        End If
        Set aRng = ActiveDocument.Range(ActiveDocument.Words(iCounter).End, ActiveDocument.Words(iEnd).End)
        For i = 1 To aRng.Words.Count
          If LCase(Trim(ActiveDocument.Words(iCounter))) = LCase(Trim(aRng.Words(i))) Then
            aRng.Words(i).HighlightColorIndex = wdPink
            Exit For
          End If
        Next i
    End Select
    'If iCounter = 500 Then Exit Sub   'enable an artificial restraint to stop the code when testing
  Next iCounter
End Sub
__________________
Andrew Lockton
Chrysalis Design, Melbourne Australia
Reply With Quote
  #3  
Old 01-16-2019, 11:17 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Perhaps:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, Rng As Range, StrTmp As String, StrFnd As String
Const StrExcl As String = "|,|.|!|:|;|<|>|(|)|{|}|{|}|a|an|the|to|from|with|": j = 100
With ActiveDocument
  For i = 1 To .Range.Words.Count - j
    If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then
      StrFnd = Replace(Trim(.Words(i)), vbCr, "")
      Set Rng = .Range(.Words(i).Start, .Words(i + j).End)
      With .Range(.Words(i).End, .Words(i + j).End)
        With .Find
          .ClearFormatting
          .Replacement.ClearFormatting
          .Text = StrFnd
          .Replacement.Text = ""
          .Forward = True
          .Wrap = wdFindStop
          .Format = False
          .MatchCase = False
          .MatchWholeWord = True
          .MatchWildcards = False
          .MatchAllWordForms = True
          .Execute
        End With
        If .Find.Found = True Then
          Rng.End = .Words.Last.End
          With Rng
            If .Words.Count < j Then
              .Words.First.HighlightColorIndex = wdBrightGreen
              .Words.Last.HighlightColorIndex = wdBrightGreen
            End If
          End With
        End If
      End With
    End If
  Next
End With
Application.ScreenUpdating = True
End Sub
The 100 in j = 100 tells Word what the upper limit for the distance between repeated word is.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]

Last edited by macropod; 01-17-2019 at 12:00 AM. Reason: Code update
Reply With Quote
  #4  
Old 01-17-2019, 01:59 AM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default

Guessed, thank you for fast response!

Macropod,

many thanks, the code works as expected except these few points:

1) If there is an inline image OR a complicated field (from a reference software) I get

Run-time error '5610': The Find What text for a Find All Word Forms search can only contain alphabetic letters.

I guess the 'StrFnd' value should be "flattened" to pure text and any images ignored. How to do that?

2) How to avoid highlighting of the spaces after found duplicates?

3) How to add greek/mathematic symbols to the exclusion list?

4) I would like to have a possibility to apply the macros to the selection or to the complete document if there is no selection. Is it possible too?
Reply With Quote
  #5  
Old 01-17-2019, 03:06 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, j As Long, k As Long, RngTxt As Range, RngTmp As Range, StrTmp As String, StrFnd As String
Const StrExcl As String = "|,|.|!|:|;|<|>|(|)|{|}|{|}|a|an|the|to|from|with|": j = 25
With ActiveDocument
  For i = 1 To .Range.Words.Count - j
    If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then
      If .Words(i).InlineShapes.Count = 0 Then
        If .Words(i).Fields.Count = 0 Then
          If Asc(.Words(i).Characters.First) < 128 Then
            StrFnd = Replace(Trim(.Words(i)), vbCr, "")
            Set RngTxt = .Range(.Words(i).Start, .Words(i + j).End)
            With .Range(.Words(i).End, .Words(i + j).End)
              With .Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = StrFnd
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = False
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchAllWordForms = True
                .Execute
              End With
              If .Find.Found = True Then
                RngTxt.End = .Words.Last.End
                With RngTxt
                  If .Words.Count < j Then
                    Set RngTmp = .Words.First
                    RngTmp.MoveEndWhile " ", -1
                    RngTmp.HighlightColorIndex = wdBrightGreen
                    Set RngTmp = .Words.Last
                    RngTmp.MoveEndWhile " ", -1
                    RngTmp.HighlightColorIndex = wdBrightGreen
                  End If
                End With
              End If
            End With
          End If
        Else
          i = i + .Words(i).Fields(1).Result.Words.Count - 1
        End If
      End If
    End If
  Next
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #6  
Old 01-18-2019, 05:07 PM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default

macropod,

many thanks for the updated macros.
But still I get the same error 5610 even on very simple texts...

In the text below it hightlights "republic" which is not dublicated.
The check proceeds up to the first word "states" and then I get the error...

"The U.S. is a federal republic comprising 50 states and the Federal District of Columbia. It originated in the American Revolution, the successful rebellion of the colonies on the eastern coast against British rule in 1775 - 83. The original 13 states that formed the Union drew up a federal constitution in 1787, and George Washington was elected the first president in 1789. In the 19th century the territory of the U.S. was extended across the continent through the westward spread of pioneers and settlers and acquisitions such as that of Texas and California from Mexico in the 1840s. After a long period of isolation in foreign affairs, the U.S. participated on the Allied side in both world wars and emerged from the Cold War as the world's leading military and economic power"
Reply With Quote
  #7  
Old 01-18-2019, 05:19 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

You may need to add more characters (e.g. <>\/?*&^%$#@~) to the exclusions list. Alternatively, you might change:
Code:
If Asc(.Words(i).Characters.First) < 128 Then
to:
Code:
If ((Asc(.Words(i).Characters.First) > 64) And (Asc(.Words(i).Characters.First) <91)) Or ((Asc(.Words(i).Characters.First) > 96) And (Asc(.Words(i).Characters.First) <123))Then
in which case, you could delete:
|,|.|!|:|;|<|>|(|)|{|}|{|}
from:
Const StrExcl As String
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #8  
Old 01-19-2019, 06:10 PM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default

Many thanks, I have changed the code according to your suggestion, but set j = 100.
Exclusion list looks like this now: |a|an|as|and|of|on|the|to|from|for|with|in|as|it|

I have checked the macros on the text with several dublicates, in total 98 words:
"The US is a federal republic comprising 50 states and state the Federal District of Columbia. It originated in the American Revolution, the successful rebellion of the colonies on the eastern coast against British rule in 1775-83. The original 13 states that formed the Union drew up a federal constitution in 1787, and Texas George Washington is elected the first president in 1789. In the 19th century the territory of the US was extended across the continent through the westward spread of pioneers and settlers and acquisitions such as that of Texas and California from the Mexico state."

In the attached image you will see that the macros does not find duplicates of "state" and "is". Manually through Find, Find all word forms, they all can be found easily.
Attached Images
File Type: jpg Capture.JPG (103.4 KB, 42 views)
Reply With Quote
  #9  
Old 01-19-2019, 10:41 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim RngTxt As Range, RngFnd As Range, RngTmp As Range
Dim StrTmp As String, StrTxt As String
Dim i As Long, j As Long, k As Long
Const StrExcl As String = "|a|an|and|from|in|is|of|the|to|with|": j = 100
With ActiveDocument
  For i = 1 To .Range.Words.Count - 1
    If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then
      If .Words(i).InlineShapes.Count = 0 Then
        If .Words(i).Fields.Count = 0 Then
          StrTxt = .Words(i).Characters.First
          If ((Asc(StrTxt) > 64) And (Asc(StrTxt) < 91)) Or ((Asc(StrTxt) > 96) And (Asc(StrTxt) < 123)) Then
            StrTxt = Replace(Trim(.Words(i)), vbCr, "")
            Set RngTxt = .Range(.Words(i).Start, .Range.End)
            With RngTxt
              If .Words.Count > j Then .MoveEnd Unit:=wdWord, Count:=-(.Words.Count - j)
            End With
            Set RngFnd = .Range(.Words(i).End, RngTxt.End)
            With .Range(.Words(i).End, RngTxt.End)
              With .Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = StrTxt
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = False
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchAllWordForms = True
                .Execute
              End With
              If .Find.Found = True Then
                Set RngTmp = RngTxt.Words.First
                RngTmp.MoveEndWhile " ", -1
                RngTmp.HighlightColorIndex = wdBrightGreen
                Set RngTmp = .Words.Last
                RngTmp.MoveEndWhile " ", -1
                RngTmp.HighlightColorIndex = wdBrightGreen
              End If
            End With
          End If
        Else
          i = i + .Words(i).Fields(1).Result.Words.Count - 1
        End If
      End If
    End If
  Next
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #10  
Old 01-21-2019, 05:25 AM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default

Thank you, it works better now!
A little problem I found making a check of a real scientific text.
If a word is a mixture of Latin letters and digits (CO2) or contains both Latin and Greek symbols (Tχ), then again I get the error 5610.

Also, how to apply the macro only to a selection, not to the complete document?
Reply With Quote
  #11  
Old 01-21-2019, 01:19 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

You can jump over the compound words causing the 5160 error by changing:
.Execute
to:

On Error Resume Next
.Execute
On Error GoTo 0

Limiting the code to Selections will take more work. It would have been nice if you'd said up front that's what you wanted to do.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #12  
Old 01-21-2019, 03:30 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

To limit operations to a Selection:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim RngTxt As Range, RngFnd As Range, RngTmp As Range, RngSel As Range
Dim StrTmp As String, StrTxt As String
Dim i As Long, j As Long, k As Long
Const StrExcl As String = "|a|an|and|from|in|is|of|the|to|with|": j = 100
Set RngSel = Selection.Range
With ActiveDocument
  For i = .Range(0, Selection.Start + 1).Words.Count To .Range(0, Selection.End).Words.Count - 1
    If InStr(StrExcl, "|" & LCase(Trim(.Words(i))) & "|") = 0 Then
      If .Words(i).InlineShapes.Count = 0 Then
        If .Words(i).Fields.Count = 0 Then
          StrTxt = .Words(i).Characters.First
          If ((Asc(StrTxt) > 64) And (Asc(StrTxt) < 91)) Or ((Asc(StrTxt) > 96) And (Asc(StrTxt) < 123)) Then
            StrTxt = Replace(Trim(.Words(i)), vbCr, "")
            Set RngTxt = .Range(.Words(i).Start, .Range.End)
            With RngTxt
              If .Words.Count > j Then .MoveEnd Unit:=wdWord, Count:=-(.Words.Count - j)
            End With
            Set RngFnd = .Range(.Words(i).End, RngTxt.End)
            With .Range(.Words(i).End, RngTxt.End)
              With .Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = StrTxt
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = False
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchAllWordForms = True
                On Error Resume Next
                .Execute
                On Error GoTo 0
              End With
              If .Find.Found = True Then
              If .InRange(RngSel) = False Then Exit For
                Set RngTmp = RngTxt.Words.First
                RngTmp.MoveEndWhile " ", -1
                RngTmp.HighlightColorIndex = wdBrightGreen
                Set RngTmp = .Words.Last
                RngTmp.MoveEndWhile " ", -1
                RngTmp.HighlightColorIndex = wdBrightGreen
              End If
            End With
          End If
        Else
          i = i + .Words(i).Fields(1).Result.Words.Count - 1
        End If
      End If
    End If
  Next
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #13  
Old 01-22-2019, 04:24 PM
sylvio sylvio is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Novice
check for duplicates of a word within next 100 words
 
Join Date: Jan 2017
Posts: 20
sylvio is on a distinguished road
Default

Dear macropod,

thank you very much for your patience and help.
I guess now the macros works as it should.

Just the last wish, which can be ignored without any problem actually.
It would be helpful to see the same highlight color of similar/duplicated words.
No idea how difficult to realize this, if too much, let's stop now
Because I understand that a random coloration of, let's say, three words will lead to the same highlight color of only two last words, the first will be colored by another color.
The macros should somehow check every word whether it has already a highlight, if not and if there is a duplicate then apply random highlight to the word and its duplicate, if yes apply the same color to the duplicate...
Reply With Quote
  #14  
Old 01-22-2019, 08:04 PM
macropod's Avatar
macropod macropod is offline check for duplicates of a word within next 100 words Windows 7 64bit check for duplicates of a word within next 100 words Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,956
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Although that would be possible, consistency of highlighting would require a major re-write of the code. And, in any event, there's only 15 highlights one could use for this, meaning they'd have to be recycled once you get beyond that.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote
  #15  
Old 06-25-2023, 02:10 PM
Harvi007 Harvi007 is offline check for duplicates of a word within next 100 words Windows 11 check for duplicates of a word within next 100 words Office 2019
Novice
 
Join Date: Aug 2022
Posts: 11
Harvi007 is on a distinguished road
Question Just a thought

Hi,

Thank you for the Code, it works a treat with something I'm working on, but as you previously stated, it runs quite slow.

The document I'm working on is about 60'000 words, I left this macro running. but 9 hours later it's still looking like it's running.

Testing on smaller text, it works quicker, so my thought was, could we some how modify the code to run per page in the document, update what it's found and then repeat for the remaining pages?

I accept it would take a little longer, but you would at least see a visual representation of progress of sorts.

I'm also open to anything else that would make this run faster if possible.

Thank you once again and sorry for bringing an old thread back up.
Reply With Quote
Reply

Thread Tools
Display Modes


Similar Threads
Thread Thread Starter Forum Replies Last Post
check for duplicates of a word within next 100 words Macro to check against Columns & Delete Duplicates cjamps Excel Programming 27 12-18-2017 06:38 AM
check for duplicates of a word within next 100 words how to check for duplicated words / cells ? iSlam Khaled Word Tables 11 05-05-2015 08:07 PM
check for duplicates of a word within next 100 words Adding words to spell check dictionary oakwoodbank Word 17 02-27-2015 08:09 PM
check for duplicates of a word within next 100 words Spell check adding words riweir Word 3 11-30-2011 09:03 PM
Edit spell check dic to exclude words? franklekens Word 1 07-03-2010 09:57 AM

Other Forums: Access Forums

All times are GMT -7. The time now is 11:02 AM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2024 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft