|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Looking for VBA code that can FIND and SELECT any text based on its properties
Short Story I am looking for VBA code to scan through a Word document and select any text that meets specific parameters like (Font = Arial, Text Size= 12, All Caps = True, Text = Bold… and so on.) and then highlight it. Long story I am looking for help finishing a macro I have been trying to create for a while now. The overall purpose is to extract the titles from manuals that I have and compare them to a list I have, to confirm they match. I have a portion of the codes I would like to combine, but I am missing the last piece and I cannot find it online or figure it out. I tried finding a way to run the “select text with similar formatting” but it would not record and I could not find anything online that would work. The steps I have managed so far are as follows: 1. Merge all the manuals into one document. 2. Find and replace the ^p with a “space” (because I would like the title on 1 line) 3. This is the missing step - Select and highlight all the titles (just the titles and stores item No.) in the newly merged document. 4. Extract all the highlighted text onto a new Word document. 5. Add them to my excel list to compare them. I have created or found everything I need except the 3rd step. Any help would be fantastic because I am not sure where else to look. I will include an attachment that is a template for the manuals I would be reviewing, so I would be combining multiple copies of the file and looking to extract just the titles. Also if someone knows of a better or more efficient method to get the same result please point me in the right direction. I know this is a long post but I wanted to make sure I was as clear as possible, thank you! |
#2
|
||||
|
||||
In my experience it is best to minimise the number of docs involved so I'm not sure why step 4 is needed if you are just going to send it immediately to Excel - just cut out that middleman.
In terms of #3, my suggestion is to add a TOC in the merged doc to collect all the instances of the Title style into a list which can then get extracted to Excel. The TOC field code of {TOC \t "Title,1" \n} will give you a nice list to extract for this purpose. You should try to get away from searching on a large number of font properties because styles are a much better way to work.
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#3
|
|||
|
|||
My initial thought was bringing all the manual covers into one document then running a search to extract all the titles was the easiest solution. Technically removing everything that is “not a title” would also work but making that code seemed more complicated. I would totally use that if someone could show me how to write that. As for the properties, I thought that was a reasonable search to isolate the text that makes up the title because that would also be a constant between all the different manual title (Without grabbing other text). Honestly I probably don’t need all those properties, I was mostly using them as an example, but I would also like to learn how so I can get better and creating the codes myself.
I’m not too familiar with the Table of Content coding, is that something I can automate or would I have to add it each time I run it on a new set of manuals? If it will give me a list that I can paste into an Excel column for a comparison I am totally on board! Edit: Forgot to mention, the only reason I was moving everything onto a new page was so I could just CTRL+A, CTRL+C, then paste in Excel. Plus seeing it on a new sheet I would be able to see it all layed out with everything on it's own row. so I can see if any other text was grabbed by mistake. I already have a similar macro that can copy anything that is highlights and add it to a new document, but as you know I am having trouble finding a way to get the text that makes up the title highlighted. Last edited by Mr J; 01-11-2022 at 04:45 PM. Reason: Forgot to mention |
#4
|
||||
|
||||
You can run code from Excel to open all the docx files in a folder and grab the titles before doing the other checks. This avoids the need to work with any other files.
What Excel code do you have already? Can you post a sample of the Excel list you already have?
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#5
|
|||
|
|||
Here is the coding I have per step listed in the original post.
Step 1: Sub CombineAllWordDocs() Dim baseDoc As Document, sFile As String Dim oRng As Range On Error GoTo err_Handler Set baseDoc = Application.Documents.Add sFile = Dir(sPath & "*.doc") 'Loop through all .doc files in that path Do While sFile <> "" Set oRng = baseDoc.Range oRng.Collapse wdCollapseEnd oRng.InsertFile sPath & sFile Set oRng = baseDoc.Range oRng.Collapse wdCollapseEnd oRng.InsertBreak Type:=wdSectionBreakNextPage sFile = Dir DoEvents Loop MsgBox "Process complete" lbl_Exit: Set baseDoc = Nothing Set oRng = Nothing Exit Sub err_Handler: MsgBox Err.Number & vbCr & Err.Description Err.Clear GoTo lbl_Exit End Sub Step 2: Sub Remove_Enter() ' ' Remove_Enter Macro ' ' Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "^p" .Replacement.Text = " " .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll End Sub Step 3: Missing Step Step 4: Sub Extract_Highlighted_Text() ' ' Extract_Highlighted_Text Macro ' ' Debug.Print oNum 'This is the output list Dim oDoc As Document Dim s As String With Selection .HomeKey Unit:=wdStory With .Find .ClearFormatting .Text = "" .Highlight = True Do While .Execute s = s & Selection.Text & vbCrLf Loop End With End With Set oDoc = Documents.Add oDoc.Range.InsertAfter s End Sub Step 5: Copy and Paste to Excel |
#6
|
|||
|
|||
I apologize I miss read your last reply and thought you asked for the WORD macros cause I glanced past the part were you asked for the Excel stuff. I don't have any excel macros specific to this request, but I have attached an example of the list I would have.
What I would like to do is to insert a Column between C & D then have the titles inserted there. I could then just compare the cells to confirm they match and make the corrections if needed. |
#7
|
||||
|
||||
This code is an entire module that you should put into your Excel workbook. You will need to add References to:
Microsoft Scripting Runtime Microsoft Word x.x Object Library Then run the Test macro after changing the filepath to point at the folder where your individual Word documents are. NOTE: The vast majority of this code came from a post on StackOverflow by Ryan Code:
Option Explicit Private mobjWordApp As Word.Application Sub Test() ProcessDirectory "C:\temp\" End Sub Property Get WordApp() As Word.Application If mobjWordApp Is Nothing Then Set mobjWordApp = CreateObject("Word.Application") mobjWordApp.Visible = True End If Set WordApp = mobjWordApp End Property Sub CloseWordApp() If Not (mobjWordApp Is Nothing) Then On Error Resume Next mobjWordApp.Quit Set mobjWordApp = Nothing End If End Sub Function GetWordDocument(FileName As String) As Word.Document On Error Resume Next Set GetWordDocument = WordApp.Documents.Open(FileName) If Err.Number = &H80010105 Then CloseWordApp On Error GoTo 0 Set GetWordDocument = WordApp.Documents.Open(FileName) End If End Function Sub ProcessDirectory(PathName As String) Dim fso As New FileSystemObject, objFile As File Dim objFolder As Folder, objWordDoc As Object, aSheet As Worksheet Const aSheetName As String = "DocTitles" On Error Resume Next Set aSheet = ActiveWorkbook.Sheets(aSheetName) On Error GoTo Err_Handler If aSheet Is Nothing Then Set aSheet = ActiveWorkbook.Sheets.Add aSheet.Name = aSheetName End If aSheet.Range("A1").Value = "Title" aSheet.Range("B1").Value = "Filename" Set objFolder = fso.GetFolder(PathName) For Each objFile In objFolder.Files If objFile.Name Like "*.doc*" Then Set objWordDoc = GetWordDocument(objFile.Path) ProcessDocument objWordDoc, aSheet objWordDoc.Close 0, 1 Set objWordDoc = Nothing End If Next Exit_Handler: CloseWordApp Exit Sub Err_Handler: MsgBox "Error " & Err.Number & ": " & Err.Description Resume Exit_Handler 'Resume Next ' or as above End Sub Sub ProcessDocument(objWordDoc As Document, aSheet As Worksheet) Dim aRng As Word.Range, sFound As String, iRow As Integer Set aRng = objWordDoc.Content With aRng.Find .ClearFormatting .Style = "Title" .Text = "" If .Execute = True Then sFound = aRng.Text iRow = aSheet.UsedRange.Rows.Count + 1 sFound = Trim(Replace(sFound, vbCr, " ")) 'replace paragraph marks with a space aSheet.Cells(iRow, 1).Value = sFound aSheet.Cells(iRow, 2).Value = objWordDoc.Name End If End With End Sub
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia Last edited by Guessed; 01-13-2022 at 03:50 PM. |
#8
|
|||
|
|||
That is freaking great!!! I honestly wouldn't have thought to do it from the excel side. However I'm a little embarrassed I was not clear enough. What you made is awesome and unfortunately, the original manual template was made by someone else a long time ago. So the Word file is not really structured properly which is why I’m trying to work with what I am given.
So the very top line the “Maintenance Manual# MMXXXX” is more of a label. The title I am trying to capture is the next couple of lines. However the biggest issue is that this template has been used for a very long time and has been slightly changed over the years, so one of the few consistencies I could think to search for is the Font, the text size (Times New Roman, size = 12). I have attached one more file just showing a couple examples of the different highlighted titles I might receive. I know this is pretty complex especially because I can’t fix what was already created so I have to find work arounds but this is what I’m looking for. Andrew thank you! I am very appreciative of what you have already given me so if I am asking too much I understand. Edit: Just wanted to be clear. In the updated attachment I showed multiple examples of how they would look but there shouldn't ever be more than one title in 1 file. I just showed them that way so you could get a visual but that would not be how I receive them. |
#9
|
||||
|
||||
You can modify the ProcessDocument sub to extract the relevant titles however you need to. I don't have much confidence that searching for Times New Roman 12pt is an approach that will work consistently but changing the code to look for those attributes is easy enough.
Code:
Sub ProcessDocument(objWordDoc As Document, aSheet As Worksheet) Dim aRng As Word.Range, sFound As String, iRow As Integer Set aRng = objWordDoc.Content With aRng.Find .ClearFormatting .Font.Name = "Times New Roman" .Font.Bold = True .Forward = True .Wrap = wdFindStop .Text = "" If .Execute = True Then sFound = aRng.Text iRow = aSheet.UsedRange.Rows.Count + 1 sFound = Trim(Replace(sFound, vbCr, "")) 'replace paragraph marks with a space aSheet.Cells(iRow, 1).Value = sFound aSheet.Cells(iRow, 2).Value = objWordDoc.Name aRng.Start = aRng.End Do While .Execute = True sFound = Trim(Replace(aRng.Text, vbCr, "")) aSheet.Cells(iRow, 1).Value = aSheet.Cells(iRow, 1).Value & " " & sFound aRng.Start = aRng.End Loop End If End With End Sub
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#10
|
|||
|
|||
You sir, are the best! That worked beautifully!
Although I am curious what you think would make it better. I know I suggested searching based on the properties but that is because I couldn't think of a better solution. I am totally open to a more effective option, shoot my original creation was a jumbled mess of word macros before you helped. This is a WAY more elegant solution, so please feel free to give me feedback. I am always trying to learn what I can, but my knowledge is all self taught through forums like this and youtube. If you don't have confidence in it let me know what I should look into. Thanks again though! |
#11
|
||||
|
||||
Macros require consistency in order to work consistently. If ALL your documents have exactly the same setup and formatting of titles never varies (and the same formatting is not used elsewhere). An alternative is to harvest paragraphs 2-4 from each document but again this is an indirect method of grabbing the info.
Creating code that searches for indirect arbitrary conditions (like TNR, 12pt, Bold) means that some documents created by knowledgeable but unaware authors are liable to not return results when they should and others are liable to return other content that they shouldn't. The best solution would be to pre-check and standardise all your documents to ensure that the Title is defined in a non-ambiguous way. Ideally this would make use of a document property or content control. But it could also be searching for a style which is ONLY used for Title information. The point of the macro including the filename column in the output is to provide a way for you to visually identify files which failed the consistency test so you can correct just those ones and make a second pass over the files. If a file doesn't appear in the Excel list then you know it didn't contain any text according to the conditions searched. If it does appear but contains too much text (or not the title info) then the formatting is incorrectly applied.
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
#12
|
|||
|
|||
You are correct. Myself and one other have really been trying to standardize everything as we get it but there is still a lot out there that we won't be able to touch until it comes across our desk. That's why we try whenever possible to find ways to identify these with the tools we have so we can correct them to weed them out slowly. Which you have been a great help in as well. On a side note, if I wanted to exclude the content in tables from the search how could I modify the code to do that? I thought I managed to find some code but I can't seem to make it work. Then I remembered I'm not looking for MS Word code since this is running in Excel, so I didn't quite know what to search for.
|
#13
|
||||
|
||||
You can test the found ranges to see if they are contained in a table
Code:
If Not aRng.Information(wdWithInTable) Then
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia |
Tags |
macro find text, select text, vba |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Macro/VBA code to select ALL text in a textbox in microsoft excel and add a new row | jyfuller | Excel Programming | 11 | 06-01-2015 08:49 PM |
Macro to select an { includepicture } field code and format the picture behind text and 100% scale | sanpedro | Word VBA | 3 | 03-30-2015 10:50 PM |
Microsoft Word macro to find text, select all text between brackets, and delete | helal1990 | Word VBA | 4 | 02-05-2015 03:52 PM |
VBA code for Microsoft Word macro — select text and insert footnote | ndnd | Word VBA | 10 | 01-06-2015 01:47 PM |
How to find and select text in a document? | mkhuebner | Word VBA | 8 | 02-04-2014 08:04 PM |