![]() |
|
#1
|
|||
|
|||
![]()
Hello!
I want to create TOC of a word document. Someone on a polish forum help me with this, but partially. In this moment "my" code output the following results: words, frequency of occurence, and page number. Code:
Option Explicit Option Compare Text Sub Test() 'based on: http://www.vbaexpress.com/kb/getarticle.php?kb_id=727 'based on: http://www.authorsden.com/categories/article_top.asp?catid=20&id=39307 Dim lngRes As Long lngRes = WordCountAndPages(ThisDocument) Application.ScreenRefresh MsgBox "There were " & CStr(lngRes) & " different words ", vbOKOnly, "Finished" End Sub Function WordCountAndPages(SourceDoc As Word.Document, _ Optional ByVal sExcludes = "[the][a][of][is][to][for][by][be][and][are]", _ Optional ByVal lmaxwords As Long = 50000) As Long ' "[ale][ani][aby][do][od][czy][za][ze][przed][po]" 'excludes for example for polish On Error GoTo WordCountAndPages_Error Dim NewDoc As Word.Document Dim TmpRange As Word.Range Dim aWord As Object '---------------------------------------- Dim tmpName As String Dim strSingleWord As String Dim lngCurrentPage As Long Dim lngPageCount As Long Dim lngWordNum As Long 'Number of unique words Dim lngttlwds As Long 'Total words in the document Dim j As Long Dim k As Long Dim w As Long Dim bTmpFound As Boolean 'Temporary flag ReDim arrWordList(1 To 1) As String 'Array to hold unique words ReDim arrWordCount(1 To 1) As Long 'Frequency counter for unique words ReDim arrPageW(1 To 1) As String 'Pages unique words lngCurrentPage = 1 'we started to count from a first page to next pages With SourceDoc If ActiveDocument.FullName <> .FullName Then SourceDoc.Activate ' because below selection Set TmpRange = .Range ' document's page count (maybe must refresh) lngPageCount = .Content.ComputeStatistics(wdStatisticPages) 'we counted number of pages End With '-------------- ' The item in the Words collection includes both the word and the spaces after the word ' The Count property for this collection in a document returns the number of items in the main story only. ' Also, the Count property includes punctuation and paragraph marks in the total. lngttlwds = TmpRange.Words.Count ' SourceDoc.Words.Count '--------------------- System.Cursor = wdCursorWait Do Until lngCurrentPage > lngPageCount If lngCurrentPage = lngPageCount Then TmpRange.End = SourceDoc.Range.End 'last page (there won't be a next page) Else 'Find the beginning of the next page 'Must use the Selection object. The Range.Goto method will not work on a page Selection.GoTo wdGoToPage, wdGoToAbsolute, lngCurrentPage + 1 'page, next page 'Set the end of the range to the point between the pages TmpRange.End = Selection.Start End If '------------------------------------ For Each aWord In TmpRange.Words ' 160 is non breaking space strSingleWord = LCase(Replace(Trim(aWord.Text), Chr(160), "")) 'lower character to upper character 'maybe remove lower case form code, like this: (Replace(Trim(aWord.Text), Chr(160), "")) Select Case True Case Len(strSingleWord) = 1 'number of characters in single word ' if =1 then ignore "a" "&" if = 2 ignore for example: "an" "of" "is" "to" "by" "be" Case strSingleWord < "a" Or strSingleWord > "z" 'maybe remove this line form code Case InStr(1, sExcludes, "[" & strSingleWord & "]", vbTextCompare) Case Else bTmpFound = False For j = 1 To lngWordNum If StrComp(arrWordList(j), strSingleWord, vbTextCompare) = 0 Then arrWordCount(j) = arrWordCount(j) + 1 If (arrPageW(j) & "," Like "*," & CStr(lngCurrentPage) & ",*") = False Then arrPageW(j) = arrPageW(j) & "," & CStr(lngCurrentPage) End If bTmpFound = True Exit For End If Next j If Not bTmpFound Then lngWordNum = lngWordNum + 1 ReDim Preserve arrWordList(1 To lngWordNum) ReDim Preserve arrWordCount(1 To lngWordNum) ReDim Preserve arrPageW(1 To lngWordNum) arrWordList(lngWordNum) = strSingleWord arrWordCount(lngWordNum) = 1 arrPageW(lngWordNum) = arrPageW(lngWordNum) & "," & CStr(lngCurrentPage) End If If lngWordNum > lmaxwords - 1 Then MsgBox "Too many words.", vbOKOnly Exit For End If End Select lngttlwds = lngttlwds - 1 StatusBar = "Remaining: " & lngttlwds & ", Unique: " & lngWordNum Next aWord '------------------------------------ lngCurrentPage = lngCurrentPage + 1 'move to the next page TmpRange.Collapse wdCollapseEnd 'go to the next page Loop '------------------------------------ If lngWordNum > 0 Then tmpName = SourceDoc.AttachedTemplate.FullName 'output results Set NewDoc = Application.Documents.Add(Template:=tmpName, NewTemplate:=False) Selection.ParagraphFormat.TabStops.ClearAll Application.ScreenUpdating = False With Selection For j = 1 To lngWordNum .TypeText Text:=arrWordList(j) & vbTab & CStr(arrWordCount(j)) & vbTab & Mid(arrPageW(j), 2) & vbNewLine Next j End With Set TmpRange = NewDoc.Range TmpRange.ConvertToTable Separator:=wdSeparateByTabs With NewDoc.Tables(1) .Sort ExcludeHeader:=False, _ FieldNumber:="Kolumna 1", SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", _ FieldNumber3:="", _ CaseSensitive:=False, LanguageID:=wdPolish, IgnoreDiacritics:=False .Borders(wdBorderTop).LineStyle = wdLineStyleNone .Borders(wdBorderLeft).LineStyle = wdLineStyleNone .Borders(wdBorderBottom).LineStyle = wdLineStyleNone .Borders(wdBorderRight).LineStyle = wdLineStyleNone .Borders(wdBorderHorizontal).LineStyle = wdLineStyleNone .Borders(wdBorderVertical).LineStyle = wdLineStyleNone .Borders(wdBorderDiagonalDown).LineStyle = wdLineStyleNone .Borders(wdBorderDiagonalUp).LineStyle = wdLineStyleNone End With End If WordCountAndPages = lngWordNum '--------------------- WordCountAndPages_Exit: On Error Resume Next Set aWord = Nothing Set TmpRange = Nothing Set NewDoc = Nothing System.Cursor = wdCursorNormal Application.ScreenUpdating = True Exit Function WordCountAndPages_Error: MsgBox "Błąd : ( " & Err.Number & " ) " & Err.Description & vbCrLf & _ "Procedura : " & "WordCountAndPages", vbExclamation Resume WordCountAndPages_Exit End Function 1-3, 5, 6-8. Now the page numbers looks like this: 1,2,3,5,6,7,8. When the page numbers are consecutive (sequential), beetwen a page numbers I need to add a hyphen (dashes). Solution for this problem, I found at below page, but for an awk language: http://docstore.mik.ua/orelly/unix/s...12/combine.idx and also: http://docstore.mik.ua/orelly/unix3/sedawk/ch12_02.htm In another column I also need a number of page numbers. For example, when a some word appear on pages 1,3,5,7, in output results in another column will be 4. And the last case: my code found all words in document. I need also search for the keywords, for example this code do it: http://microsoft-personal-applicatio...3/Default.aspx I have also a some problems with some lines of my code: strSingleWord < "a" Or strSingleWord > "z" 'fist character of word must be a letter, this line ignore a special word, like "3w" "4you" "3ware" "3love" "2day" , and so on... I have also problem with some words, like: "@SCHL" "m&m", and so on..., something line of code ignore this words. Anyone help me with this? You can also visit my thread at polish forum: http://www.coderscity.pl/ftopic35919.html Thanks for your attention! Regards Last edited by markos97; 10-27-2010 at 06:31 AM. |
![]() |
Tags |
index, numbers, page |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Custom Table of Contents | NJPhillips | Word | 1 | 06-11-2010 12:39 PM |
![]() |
erika | Word | 1 | 06-10-2009 10:40 AM |
Table of contents automation through VB 6.0 | Developer | Word | 0 | 06-09-2009 06:30 AM |
Table of Contents Formatting | Rick5150 | Word | 1 | 03-16-2009 11:10 AM |