![]() |
#1
|
|||
|
|||
![]()
This post is a continuation of a discuss that started here:
https://www.msofficeforums.com/word-...x-pattern.html Batman1, Replying to your: "There is nothing that says that a number that has 4 or 5 digits has to be a word. So, it can be part of a word. Besides, your code will find 3452 in the string "a3452b" (and correctly), so it should find 12456 (1245 also meets the criteria, but we want the longest result possible) in the string "124567". Besides, if data = "13452234567" the code will not find 34522" Starting from your original challenge: "Find in the sequence all numbers consisting of 4 or 5 digits, whose first 4 digits must have at least 1 digit 2" 1. I would not consider "13452234567" to be a number consisting of 4 or 5 digits. I consider that to be a number consisting of 11 digits and my original reply treated it as such. 2. There was nothing in my original reply that implied that the 4 or 5 digit numbers "had" to be a word. 3. Now if one wishes to find 34522 embedded in a larger sequence of numbers that is certainly possible. 4. If our goal is to find 34522 in 13452234567, then do we also want to A) find 45223, 52234, 22345 and 23456 which are also number segments embedded in the larger 11 digit string or B) Find the first match in the sequence and escape or C) Continue looking for matches in even longer strings e.g., "1345223456712345" and return both 34522 and 67123 Code:
Sub ScratchMacroA() Dim oRng As Range Set oRng = ActiveDocument.Range With oRng.Find .Text = "[0-9]{1,}" .MatchWildcards = True While .Execute If InStr(Left(oRng.Text, 4), "2") > 0 Then Select Case Len(oRng.Text) Case Is > 4 Debug.Print Left(oRng.Text, 5) Case Is = 4 Debug.Print oRng.Text End Select End If oRng.End = oRng.Start + 1 Wend End With lbl_Exit: Exit Sub End Sub Sub ScratchMacroB() Dim oRng As Range Set oRng = ActiveDocument.Range With oRng.Find .Text = "[0-9]{1,}" .MatchWildcards = True Do While .Execute If InStr(Left(oRng.Text, 4), "2") > 0 Then Select Case Len(oRng.Text) Case Is > 4 Debug.Print Left(oRng.Text, 5) Exit Do Case Is = 4 Debug.Print oRng.Text Exit Do End Select End If oRng.End = oRng.Start + 1 Loop End With lbl_Exit: Exit Sub End Sub Sub ScratchMacroC() Dim oRng As Range Set oRng = ActiveDocument.Range Dim lngIndex As Long With oRng.Find .Text = "[0-9]{1,}" .MatchWildcards = True While .Execute lngIndex = 1 If InStr(Left(oRng.Text, 4), "2") > 0 Then Select Case Len(oRng.Text) Case Is > 4 Debug.Print Left(oRng.Text, 5) lngIndex = 5 Case Is = 4 Debug.Print oRng.Text lngIndex = 4 End Select End If oRng.End = oRng.Start + lngIndex Wend End With lbl_Exit: Exit Sub End Sub ![]() |
#2
|
|||
|
|||
![]()
Yes, Greg, I was trying to ask similar questions. Does it mean that the code should find all combinations of 4 and 5-digit numbers with 2 in first through fourth positions? Tough task!
|
#3
|
||||
|
||||
![]() Quote:
Quote:
Quote:
Quote:
Code:
Sub test1() Dim oRng As Range Set oRng = ActiveDocument.Range With oRng.Find .text = "aga" Do While .Execute Debug.Print oRng.text Loop End With End Sub We will get the result = "Aga", there is no second result "aga". So after finding one result, further search will occur after this result. Similarly, in my task, further search will occur after the current result, which means it should be C - results = 34522 and 67123 |
#4
|
|||
|
|||
![]() Quote:
In my post #26 I quoted vivek (he later deleted it): "Sometimes regex is more functional than Word standard instruments". And in my opinion there is a code with Regex simpler than Sub ScratchMacroC |
#5
|
|||
|
|||
![]()
Batman1,
English not being your first language, it is understandable that things aren't always are clearly expressed or understood and that is a two way street. Not to dispute you, but I would like to see a RegEx pattern that returns the same result as the Find\Do While Execute in ScatchMacroC. ScratchMacroC is simple to me because and am much more familiar with the Word Find method and range object than I am with RegEx. I have no doubt that with the right RegEx pattern that processing the data would be more efficient. |
#6
|
|||
|
|||
![]() Quote:
Code from Regex e.g. Code:
Sub test() Dim RegEx As Object, Matches As Object, Match As Object Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern ="(?=\d{0,3}2)\d{4,5}" End With Set Matches = RegEx.Execute(ActiveDocument.Range.text) For Each Match In Matches Debug.Print Match.Value Next End Sub |
#7
|
|||
|
|||
![]()
Batman1,
No denying that your RegEx code and pattern is simple compared to the Find method. Both your Test() and my ScratchMacroC() return the same results. But, how do you then process the Match object? Let's say that we not only want to Debug.Print the Match.Value, we want to highlight it in the text as well: Code:
Sub ScratchMacroC() Dim oRng As Range, oRngProcess As Range Set oRng = ActiveDocument.Range Dim lngIndex As Long With oRng.Find .Text = "[0-9]{1,}" .MatchWildcards = True While .Execute lngIndex = 1 Set oRngProcess = Nothing If InStr(Left(oRng.Text, 4), "2") > 0 Then Set oRngProcess = oRng.Duplicate Select Case Len(oRng.Text) Case Is > 4 Debug.Print Left(oRng.Text, 5) oRngProcess.End = oRngProcess.Start + 5 lngIndex = 5 Case Is = 4 Debug.Print oRng.Text oRngProcess.End = oRngProcess.Start + 4 lngIndex = 4 End Select End If oRng.End = oRng.Start + lngIndex If Not oRngProcess Is Nothing Then oRngProcess.HighlightColorIndex = wdBrightGreen Wend End With lbl_Exit: Exit Sub End Sub EDIT: Well, I won't say this is the best way, but I think I may have answered my own question: Code:
Sub RegExMethod() Dim RegEx As Object, Matches As Object, Match As Object Dim oRng As Range Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=(\d){0,3}2)\d{4,5}" End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches Set oRng = ActiveDocument.Range With oRng.Find .Text = Match.Value If .Execute Then oRng.HighlightColorIndex = wdYellow End If End With Next lbl_Exit: Exit Sub End Sub Now, would you mind breaking down your pattern and explaining how it is constructed? |
#8
|
|||
|
|||
![]() Quote:
Code:
Sub test() Dim RegEx As Object, Matches As Object, Match As Object Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=\d{0,3}2)\d{4,5}" End With Set Matches = RegEx.Execute(ActiveDocument.Range.text) For Each Match In Matches ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen Debug.Print Match.Value Next End Sub Quote:
.Pattern = "(?=pattern1)pattern2" means "find the result of pattern2 whose initial part is pattern1" It is "zero-width positive lookahead assertion". There is and "zero-width negative lookahead assertion" 1. zero-width positive lookahead assertion "(?=pattern1)pattern2" - find the result of pattern2 whose initial part is pattern1 pattern2(?=pattern1) - find the result of pattern 2, followed (in further data input) by a string of pattern 1 2. zero-width negative lookahead assertion (?!pattern1)pattern2 - find the result of pattern 2 whose initial part is not in the form pattern1 pattern2(?!pattern1) - find a result of the form pattern2 after which (in further input data) there is nothing of the form pattern1 Read about zero-width positive lookahead assertion and zero-width negative lookahead assertion on the internet. |
#9
|
|||
|
|||
![]()
Batman1
I consider myself schooled. Thank you for the lesson. I was reading up on look ahead and look behind assertions last night before reading your reply this morning. I fully understand your pattern now and this my code with my own notes. Code:
Sub RegExMethod() Dim RegEx As Object, Matches As Object, Match As Object Dim oRng As Range Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=\d{0,3}2)\d{4,5}" 'Positive lookahead. 'y = (?=\d{0,3}2) x = \d{4,5} 'Match second part x only if first part y matches. 'y matches 2, #2, ##2 or ###2 'x matches 2####, #2###, ##2##, and ###2# or 2###, #2##, ##2# and ###2 End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen Next lbl_Exit: Exit Sub End Sub Now, in reading about the look ahead and look behind assertions earlier, I came across an example: Pattern = "(?<=Tom|Jack)Sprat" 'Look behind assertions. - matches "x" only if "x" is preceded by "y". For example, (?<=Jack|Tom)Sprat matches "Sprat" only if it is preceded by "Tom" or "Jack." However, neither "Jack" nor "Tom" is part of the match results. Sample text: Jack Sprat, Johnny Sprat, Tom Sprat, Mary Sprat and Bill Sprat. I learned (after some frustration) that look behind assertions are unfortunately not supported in VBA. So, how do achieve the same results with VBA? Here is a look ahead that comes close, but unlike the look behind described above, it returns both parts y and x. Code:
Sub RegEx() Dim RegEx As Object, Matches As Object, Match As Object Dim oRng As Range Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=Tom|Jack)\S+ Sprat" End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches 'Returns Jack Sprat and Tom Sprat (Not just Sprat like lookback) ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).Select Next lbl_Exit: Exit Sub End Sub |
#10
|
|||
|
|||
![]() Quote:
As far as I remember, in VBA you can only use look ahead, you can't use look behind. I tested in Excel some time ago. Maybe in Perl, Python, but not in VBA. That's why I didn't mention look behind in the last post Since you can't use look behind, you have to experiment. In this example, you can e.g. Code:
Sub RegEx() Dim RegEx As Object, Matches As Object, Match As Object Dim oRng As Range Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=Tom|Jack)\S+ (Sprat)" End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches ActiveDocument.Range(Match.FirstIndex + Match.Length - Len(Match.SubMatches(0)), Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen Debug.Print Match.SubMatches(0) Next lbl_Exit: Exit Sub End Sub |
#11
|
|||
|
|||
![]()
The first thing I did when I came home from work today was visiting this thread. Greg and Batman1, I take off my hat to both of you!
|
#12
|
|||
|
|||
![]()
Batman1,
Very interesting. After studying your last (which works perfectly), I thought I would modify slightly just to confirm my understanding. The result is a little simpler for me to understand and will provide notes for myself should I ever need to revisit. Thanks again for your lessons in RegEx: Code:
Sub RegExPsuedoLookBack() Dim RegEx As Object, Matches As Object, Match As Object Dim oRng As Range 'Sample document text: Jack Sprat, Johnny Sprat, Tom Sprat, Mary Sprat and Bill Sprat. 'With RegEx look ahead, you can find all instances of x = "Sprat" preceded by either y = "Jack " or y = "Tom " Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?=Tom|Jack)\S+ Sprat" End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches 'Returns all instances of "Sprat" immediately after "Tom " or "Jack ". Debug.Print Match.Value 'Notice the match return includes the proceeding y qualifyer. Next 'A RegEx look back will return all instances of x = "Sprat" preceded by y = "Jack " or y = "Tom " but the match return only inclues the x value "Sprat" Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "(?<=Tom |Jack )\Sprat" End With On Error GoTo lbl_EH Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches 'If this worked, it would return all instances of "Sprat" immediately after "Tom " or "Jack ". Debug.Print Match.Value Next lbl_ER: 'VBA workaround for RegEx look back. With RegEx .Global = True 'Use a modified look ahead as demonstrated previously. .Pattern = "((?=Tom|Jack)\S+ )Sprat" 'Note the addition of parens "()" around the y element will result in a submatch. End With Set Matches = RegEx.Execute(ActiveDocument.Range.Text) For Each Match In Matches 'Returns x = "Jack Sprat" and "Tom Sprat" matches at before. 'However, also returns the submatches "Jack " from the "Jack Sprat" match and "Tom " from the "Tom Sprat" match. 'Use these manipulate the returned range to achieve a VBA pseudo lookback method. ActiveDocument.Range(Match.FirstIndex + Len(Match.Submatches(0)), Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen Next lbl_Exit: Exit Sub lbl_EH: MsgBox "Unfortunatley RegEx look back is not supported in VBA." & vbCr + vbCr _ & "Proceding to work around.", vbInformation + vbOKOnly, "INVALID OPERATION" Resume lbl_ER End Sub Please feel free to further comment, if my notes above could be clarified further. |
#13
|
|||
|
|||
![]() Quote:
With Backreferences: 1. You can use submatch 2. You can refer to an earlier group later in the Pattern () 3. You can use submatch in REPLACE Note 1. You already use it in your code. Note 2. Pattern = "([A-Za-z]+) \1" Pattern = <series1><space><series2>, where <series1> and <series2> are identical E.g. Pattern = "([A-Za-z]+) \1" will find "a a", "John John". If .IgnoreCase = True then it will also find "A a" a Pattern = "([A-Za-z]+) ([A-Za-z]+) \2 \1" will find "Jack Sprat Sprat Jack Note 3. In Repace we don't use \1, \2, … We use $1, $2, … Code:
Sub test2() Dim RegEx As Object, Matches As Object, Match As Object, text As String text = "Fellow John John is a good student. Fellow Jack Jack also" Debug.Print text Set RegEx = CreateObject("VBScript.RegExp") With RegEx .Global = True .Pattern = "([A-Za-z]+) \1" If .test(text) Then text = .Replace(text, "$1") ' in every place in string Text where a result is found, that result is replaced with its $1 Debug.Print text End If End With text = "John Smith is a good student. Jack Sprat also" With RegEx .Global = True .Pattern = "([A-Z][a-z]+) ([A-Z][a-z]+)" If .test(text) Then text = .Replace(text, "$2 $1") Debug.Print text End If End With End Sub |
#14
|
|||
|
|||
![]()
Interesting. Thanks for the additional tips!
|
![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Any work-around for this challenge | RRB | Word | 6 | 08-22-2023 11:47 AM |
Formula challenge 3 criteria | JMC44 | Excel Programming | 0 | 04-04-2015 07:38 PM |
![]() |
e.roberts | Office | 3 | 07-13-2012 01:36 AM |
![]() |
gbaker | Excel Programming | 11 | 06-22-2012 09:39 AM |
![]() |
jpotter2 | Word | 3 | 03-22-2011 02:07 PM |