Thread: [Solved] Create list of unique words
View Single Post
 
Old 09-24-2015, 11:20 PM
macropod's Avatar
macropod macropod is offline Windows 7 64bit Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 22,342
macropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond reputemacropod has a reputation beyond repute
Default

Try the following. In columns B & C you should get sorted count of the words in column A.

The macro also has an exclusion list, so that various words and phrases can be excluded. That list (defined by the words & phrases in the StrExcl string variable) holds the words and phrases to be omitted from the concordance. Any phrases should be inserted into the exclusions list ahead of any of the single-word exclusions (so that conflicts don’t occur).

The code also strips out trailing apostrophes, with the result that some possessive word forms may look a bit odd.

Code:
Sub WordFrequencyCounter()
Application.ScreenUpdating = False
Dim StrWrds As String, StrTmp As String
Dim i As Long, j As Long, k As Long, l As Long, r As Long
'Define the exlusions list
Const StrExcl As String = "a,am,an,and,are,as,at,b,be,but,by,c,can,cm," & _
          "d,did,do,does,e,eg,en,eq,etc,f,for,g,get,go,got,h,has,have," & _
          "he,her,him,how,i,ie,if,in,into,is,it,its,j,k,l,m,me,mi," & _
          "mm,my,n,na,nb,no,not,o,of,off,ok,on,one,or,our,out,p,q" & _
          ",r,re,s,she,so,t,the,their,them,they,this,t,to,u,v,via," & _
          "vs,w,was,we,were,who,will,with,would,x,y,yd,you,your,z"
With ActiveSheet
  r = .UsedRange.Range("A" & .Cells.SpecialCells(xlCellTypeLastCell).Row).Row
  For i = 1 To r
    StrTmp = Trim(.Cells(i, 1).Text)
    If StrTmp <> "" Then StrWrds = StrWrds & " " & StrTmp
  Next
  'Strip out unwanted characters. Amongst others, hyphens and formatted single quotes are retained at this stage
  For i = 1 To 255
    Select Case i
      'To strip out numbers, delete ', 58'
      Case 1 To 35, 37 To 38, 40 To 43, 45, 47, 58 To 64, 91 To 96, 123 To 127, 129 To 144, 147 To 149, 152 To 162, 164, 166 To 171, 174 To 191, 247
      StrWrds = Replace(StrWrds, Chr(i), " ")
    End Select
  Next
  'Delete any periods or commas at the end of a word. Formatted numbers are thus retained.
  StrWrds = Replace(Replace(Replace(Replace(StrWrds, Chr(44) & Chr(32), " "), Chr(44) & vbCr, " "), Chr(46) & Chr(32), " "), Chr(46) & vbCr, " ")
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrWrds = Replace(Replace(Replace(Replace(StrWrds, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Convert to lowercase
  StrWrds = " " & LCase(Trim(StrWrds)) & " "
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    While InStr(StrWrds, " " & Split(StrExcl, ",")(i) & " ") > 0
      StrWrds = Replace(StrWrds, " " & Split(StrExcl, ",")(i) & " ", " ")
    Wend
  Next
  'Clean up any duplicate spaces
  While InStr(StrWrds, "  ") > 0
    StrWrds = Replace(StrWrds, "  ", " ")
  Wend
  StrWrds = " " & Trim(StrWrds) & " "
  If Trim(StrWrds) = "" Then Exit Sub
  j = UBound(Split(StrWrds, " "))
  l = j: r = 1
  .Cells(r, 2).Value = "Word"
  .Cells(r, 3).Value = "Count"
  For i = 1 To j
    'Find how many occurences of each word there are in the column
    StrTmp = Split(StrWrds, " ")(1)
    While InStr(StrWrds, " " & StrTmp & " ") > 0
      StrWrds = Replace(StrWrds, " " & StrTmp & " ", " ")
    Wend
    'Calculate the number of words replaced
    k = l - UBound(Split(StrWrds, " "))
    r = r + 1
    'Update the output range
    .Cells(r, 2).Value = StrTmp
    .Cells(r, 3).Value = k
    l = UBound(Split(StrWrds, " "))
    If l = 1 Then Exit For
    DoEvents
  Next
  With .Sort
    With .SortFields
      .Clear
      .Add Key:=Range("C2:C" & r), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
      .Add Key:=Range("B2:B" & r), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    End With
    .SetRange Range("B1:C" & r)
    .Header = xlYes
    .MatchCase = False
    .Orientation = xlTopToBottom
    .SortMethod = xlPinYin
    .Apply
  End With
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
Reply With Quote