#1
Regular expressions and field codes
I have a function that uses a regular expression to search the document text and then needs to change the found text based on its position in the document.
This works fine, except when the document contains a field code: then the positions of the found text are not correct, and I can't figure out what is offsetting them. (I tried comparing the length of the field.result with the field.code, but the offset doesn't match any combination of these lengths.) Here is a boiled-down version of the code I am using.
Code:
    Set re = New RegExp
    re.Pattern = "(TEXT1)( Text2)? \(text3\)( text4)?(?: text5)?"
    re.IgnoreCase = True
    re.Global = True
    txt = ActiveDocument.Range.Text
    If re.Test(txt) Then
        ' Get all matches
        Set allmatches = re.Execute(txt)
        ' Look at each match and highlight the corresponding range
        For Each m In allmatches
            ' Set new Range
            startPos = m.FirstIndex
            endPos = startPos + m.Length
            Set newRNG = ActiveDocument.Range(Start:=startPos, End:=endPos)
            ' This range is NOT correct if there are fields
            newRNG.Select
            ' Code here to process found text
            If (condition) Then
                ' Edit range here
            End If
        Next m
    End If
Is there a proper way to do the regular expression search that will allow me to edit the found ranges when necessary? I don't believe I can use a Word 'Find' with wildcards, since I need the SubMatch values (left out here for brevity) rather than the full found text, and I don't think wildcards could perform the search I need.
I hope I have explained my issue clearly; please let me know if there is any more information I need to provide.
#2
To hopefully explain the problem better: I use a regular expression to search for all matches in the document's text. When I go through the matches, if there's a field preceding a match, its FirstIndex value doesn't correspond to the position of the found text in the document's range.
For example, if the first match is at index 168 in the text and there's a field before that point, the matched text might actually be at position 228 in the document.
#3
Hi Cosmo,
Can you attach a document to a post with some representative data (delete anything sensitive) demonstrating the problem? You do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen. I'm not that familiar with RegEx, but may be able to advise you on how to adjust the ranges to account for the fields.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#4
Thanks for the response. I'm attaching a demo file with a function 'testingRegEx'.
For the purposes of this test, I have reduced the regular expression to a simple text pattern; the one I will be using is much more complicated, but that doesn't affect the test. Running the function should highlight every instance of 'Lorem' within the document.
There are 2 fields (a 'CreateDate' field and a text form field) after the second paragraph. In the paragraphs after these fields, the highlighted text is not the found text.
I figured this was due to a discrepancy between the Field.code ( CREATEDATE \@ "M/d/yyyy" \* MERGEFORMAT ) and the Field.result (4/2/2018), but I don't see any correlation between those numbers. For example, the date field's code is 43 characters and its result is 8 characters, yet it seems to offset the found range by 46 characters.
If I could find out how to calculate the offset (46 for the date) from each field, I could loop through all fields that precede the found text and adjust the position.
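One way to investigate the offset is to print where each field sits in the document's character positions. The sketch below is only a diagnostic: `ListFieldSpans` is a hypothetical name that is not in the demo file, and it assumes every field in the document exposes both a `Code` and a `Result` range. The 3 extra characters (46 versus 43) would be consistent with the field's begin, separator, and end marker characters each occupying a position in the range without being part of the code text, but that is an inference from the numbers above, not something confirmed against the demo file.

```vba
' Diagnostic sketch: list where each field's code and result sit in
' the document's character positions, so the gap between text indexes
' and range positions can be compared field by field.
' ListFieldSpans is a hypothetical name, not part of the demo file.
Sub ListFieldSpans()
    Dim fld As Field
    For Each fld In ActiveDocument.Fields
        ' Assumption: both Code and Result are available for this field
        Debug.Print "Field " & fld.Index & _
            ": code " & fld.Code.Start & "-" & fld.Code.End & _
            " (" & Len(fld.Code.Text) & " chars)" & _
            ", result " & fld.Result.Start & "-" & fld.Result.End & _
            " (" & Len(fld.Result.Text) & " chars)"
    Next fld
End Sub
```

The output appears in the Immediate window of the VBA editor. Comparing each field's start and end positions with the lengths of its code and result should show how many positions the field occupies beyond its visible result.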
#5
I just found out while experimenting that the highlighted text is different if I have the field codes toggled open (i.e. the range text includes the field's code instead of its result). The offset at that point came out to 1 character more than the length of the date field's result. But this doesn't work with more complicated fields, or with text form fields.
Oddly, it doesn't work if I toggle the field codes on from within the function, only if they were toggled on manually. I'm not sure why that would be, but it is yet another annoyance. I'll have to experiment some more next week. I would like to solve this problem, but I might have to settle for converting ALL fields in the document to text before running the find function I need to perform.
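The fallback of converting all fields to text could look roughly like the sketch below. It is a minimal sketch under stated assumptions: `UnlinkFieldsThenCheck` is a hypothetical name, `ActiveDocument.Fields` only covers the main text story, the behaviour with text form fields has not been checked here, and unlinking permanently replaces each field with its current result, so it is best tried on a copy of the document.

```vba
' Sketch: convert every field in the main text story to plain text,
' then compare the text length with the end position of the range.
' UnlinkFieldsThenCheck is a hypothetical name.
Sub UnlinkFieldsThenCheck()
    Dim doc As Document
    Set doc = ActiveDocument

    ' Replaces each field with its current result (destructive)
    doc.Fields.Unlink

    ' If these two numbers still differ, something other than fields
    ' is shifting the positions as well
    Debug.Print "Text length: " & Len(doc.Range.Text) & _
        ", range end: " & doc.Range.End
End Sub
```

If the two numbers agree after unlinking, the FirstIndex values from the regular expression should line up with the range positions again, assuming fields were the only cause of the discrepancy.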
#6
Perhaps:
Code:
    Private Function testingRegEx()
        ' Early binding: requires a reference to
        ' Microsoft VBScript Regular Expressions 5.5
        Dim re As RegExp
        Dim txt As String
        Dim allmatches As MatchCollection, m As Match
        Set re = New RegExp
        re.Pattern = "(Lorem)"
        re.IgnoreCase = True
        re.Global = True
        txt = ActiveDocument.Range.Text
        If re.Test(txt) Then
            ' Get all matches
            Set allmatches = re.Execute(txt)
            ' Let Word's own Find locate each matched string, so the
            ' field offsets never have to be calculated
            For Each m In allmatches
                With ActiveDocument.Range.Find
                    .ClearFormatting
                    .Replacement.ClearFormatting
                    .Text = m.Value
                    .Replacement.Text = "^&"
                    .Replacement.Highlight = True
                    .Forward = True
                    .Wrap = wdFindStop
                    .Execute Replace:=wdReplaceAll
                End With
            Next m
        End If
    End Function
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
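Since the original question needs to edit each found range and read the SubMatch values, a variation on the Find approach is to locate the matches one at a time instead of using ReplaceAll. The following is a rough sketch, not a tested solution: `ProcessMatchesViaFind` is a hypothetical name, and it assumes each matched string is under Find's 255-character limit, contains no characters that Find treats specially (such as `^`), does not span a paragraph mark, and occurs in the document in the same order as in the regular expression results.

```vba
' Sketch: locate each regex match with Find so the resulting Range
' reflects real document positions, then edit that Range directly.
' ProcessMatchesViaFind is a hypothetical name.
Sub ProcessMatchesViaFind()
    Dim re As Object, m As Object
    Dim rng As Range

    ' Late binding, so no library reference is needed
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "(Lorem)"
    re.IgnoreCase = True
    re.Global = True

    Set rng = ActiveDocument.Range
    For Each m In re.Execute(ActiveDocument.Range.Text)
        With rng.Find
            .ClearFormatting
            .Text = m.Value
            .MatchCase = True
            .MatchWildcards = False
            .Forward = True
            .Wrap = wdFindStop
        End With
        If rng.Find.Execute Then
            ' rng now covers the found text; m.SubMatches is still
            ' available here to decide how to edit it
            rng.HighlightColorIndex = wdYellow

            ' Continue the search after this hit
            rng.Collapse wdCollapseEnd
            rng.End = ActiveDocument.Range.End
        End If
    Next m
End Sub
```

Because the range is collapsed to the end of each hit before the next search, every match is looked up in the remaining part of the document only, which keeps the hits in step with the order of the regular expression matches under the assumptions above.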