![]() |
|
#1
|
|||
|
|||
|
This post is a continuation of a discuss that started here:
https://www.msofficeforums.com/word-...x-pattern.html Batman1, Replying to your: "There is nothing that says that a number that has 4 or 5 digits has to be a word. So, it can be part of a word. Besides, your code will find 3452 in the string "a3452b" (and correctly), so it should find 12456 (1245 also meets the criteria, but we want the longest result possible) in the string "124567". Besides, if data = "13452234567" the code will not find 34522" Starting from your original challenge: "Find in the sequence all numbers consisting of 4 or 5 digits, whose first 4 digits must have at least 1 digit 2" 1. I would not consider "13452234567" to be a number consisting of 4 or 5 digits. I consider that to be a number consisting of 11 digits and my original reply treated it as such. 2. There was nothing in my original reply that implied that the 4 or 5 digit numbers "had" to be a word. 3. Now if one wishes to find 34522 embedded in a larger sequence of numbers that is certainly possible. 4. If our goal is to find 34522 in 13452234567, then do we also want to A) find 45223, 52234, 22345 and 23456 which are also number segments embedded in the larger 11 digit string or B) Find the first match in the sequence and escape or C) Continue looking for matches in even longer strings e.g., "1345223456712345" and return both 34522 and 67123 Code:
Sub ScratchMacroA()
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "[0-9]{1,}"
.MatchWildcards = True
While .Execute
If InStr(Left(oRng.Text, 4), "2") > 0 Then
Select Case Len(oRng.Text)
Case Is > 4
Debug.Print Left(oRng.Text, 5)
Case Is = 4
Debug.Print oRng.Text
End Select
End If
oRng.End = oRng.Start + 1
Wend
End With
lbl_Exit:
Exit Sub
End Sub
Sub ScratchMacroB()
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "[0-9]{1,}"
.MatchWildcards = True
Do While .Execute
If InStr(Left(oRng.Text, 4), "2") > 0 Then
Select Case Len(oRng.Text)
Case Is > 4
Debug.Print Left(oRng.Text, 5)
Exit Do
Case Is = 4
Debug.Print oRng.Text
Exit Do
End Select
End If
oRng.End = oRng.Start + 1
Loop
End With
lbl_Exit:
Exit Sub
End Sub
Sub ScratchMacroC()
Dim oRng As Range
Set oRng = ActiveDocument.Range
Dim lngIndex As Long
With oRng.Find
.Text = "[0-9]{1,}"
.MatchWildcards = True
While .Execute
lngIndex = 1
If InStr(Left(oRng.Text, 4), "2") > 0 Then
Select Case Len(oRng.Text)
Case Is > 4
Debug.Print Left(oRng.Text, 5)
lngIndex = 5
Case Is = 4
Debug.Print oRng.Text
lngIndex = 4
End Select
End If
oRng.End = oRng.Start + lngIndex
Wend
End With
lbl_Exit:
Exit Sub
End Sub
|
|
#2
|
|||
|
|||
|
Yes, Greg, I was trying to ask similar questions. Does it mean that the code should find all combinations of 4 and 5-digit numbers with 2 in first through fourth positions? Tough task!
|
|
#3
|
||||
|
||||
|
Quote:
Quote:
Quote:
Quote:
Code:
Sub test1()
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.text = "aga"
Do While .Execute
Debug.Print oRng.text
Loop
End With
End Sub
We will get the result = "Aga", there is no second result "aga". So after finding one result, further search will occur after this result. Similarly, in my task, further search will occur after the current result, which means it should be C - results = 34522 and 67123 |
|
#4
|
|||
|
|||
|
Quote:
In my post #26 I quoted vivek (he later deleted it): "Sometimes regex is more functional than Word standard instruments". And in my opinion there is a code with Regex simpler than Sub ScratchMacroC |
|
#5
|
|||
|
|||
|
Batman1,
English not being your first language, it is understandable that things aren't always are clearly expressed or understood and that is a two way street. Not to dispute you, but I would like to see a RegEx pattern that returns the same result as the Find\Do While Execute in ScatchMacroC. ScratchMacroC is simple to me because and am much more familiar with the Word Find method and range object than I am with RegEx. I have no doubt that with the right RegEx pattern that processing the data would be more efficient. |
|
#6
|
|||
|
|||
|
Quote:
Code from Regex e.g. Code:
Sub test()
Dim RegEx As Object, Matches As Object, Match As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern ="(?=\d{0,3}2)\d{4,5}"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.text)
For Each Match In Matches
Debug.Print Match.Value
Next
End Sub
|
|
#7
|
|||
|
|||
|
Batman1,
No denying that your RegEx code and pattern is simple compared to the Find method. Both your Test() and my ScratchMacroC() return the same results. But, how do you then process the Match object? Let's say that we not only want to Debug.Print the Match.Value, we want to highlight it in the text as well: Code:
Sub ScratchMacroC()
Dim oRng As Range, oRngProcess As Range
Set oRng = ActiveDocument.Range
Dim lngIndex As Long
With oRng.Find
.Text = "[0-9]{1,}"
.MatchWildcards = True
While .Execute
lngIndex = 1
Set oRngProcess = Nothing
If InStr(Left(oRng.Text, 4), "2") > 0 Then
Set oRngProcess = oRng.Duplicate
Select Case Len(oRng.Text)
Case Is > 4
Debug.Print Left(oRng.Text, 5)
oRngProcess.End = oRngProcess.Start + 5
lngIndex = 5
Case Is = 4
Debug.Print oRng.Text
oRngProcess.End = oRngProcess.Start + 4
lngIndex = 4
End Select
End If
oRng.End = oRng.Start + lngIndex
If Not oRngProcess Is Nothing Then oRngProcess.HighlightColorIndex = wdBrightGreen
Wend
End With
lbl_Exit:
Exit Sub
End Sub
EDIT: Well, I won't say this is the best way, but I think I may have answered my own question: Code:
Sub RegExMethod()
Dim RegEx As Object, Matches As Object, Match As Object
Dim oRng As Range
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=(\d){0,3}2)\d{4,5}"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = Match.Value
If .Execute Then
oRng.HighlightColorIndex = wdYellow
End If
End With
Next
lbl_Exit:
Exit Sub
End Sub
Now, would you mind breaking down your pattern and explaining how it is constructed? |
|
#8
|
|||
|
|||
|
Quote:
Code:
Sub test()
Dim RegEx As Object, Matches As Object, Match As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=\d{0,3}2)\d{4,5}"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.text)
For Each Match In Matches
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen
Debug.Print Match.Value
Next
End Sub
Quote:
.Pattern = "(?=pattern1)pattern2" means "find the result of pattern2 whose initial part is pattern1" It is "zero-width positive lookahead assertion". There is and "zero-width negative lookahead assertion" 1. zero-width positive lookahead assertion "(?=pattern1)pattern2" - find the result of pattern2 whose initial part is pattern1 pattern2(?=pattern1) - find the result of pattern 2, followed (in further data input) by a string of pattern 1 2. zero-width negative lookahead assertion (?!pattern1)pattern2 - find the result of pattern 2 whose initial part is not in the form pattern1 pattern2(?!pattern1) - find a result of the form pattern2 after which (in further input data) there is nothing of the form pattern1 Read about zero-width positive lookahead assertion and zero-width negative lookahead assertion on the internet. |
|
#9
|
|||
|
|||
|
Batman1
I consider myself schooled. Thank you for the lesson. I was reading up on look ahead and look behind assertions last night before reading your reply this morning. I fully understand your pattern now and this my code with my own notes. Code:
Sub RegExMethod()
Dim RegEx As Object, Matches As Object, Match As Object
Dim oRng As Range
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=\d{0,3}2)\d{4,5}"
'Positive lookahead.
'y = (?=\d{0,3}2) x = \d{4,5}
'Match second part x only if first part y matches.
'y matches 2, #2, ##2 or ###2
'x matches 2####, #2###, ##2##, and ###2# or 2###, #2##, ##2# and ###2
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen
Next
lbl_Exit:
Exit Sub
End Sub
Now, in reading about the look ahead and look behind assertions earlier, I came across an example: Pattern = "(?<=Tom|Jack)Sprat" 'Look behind assertions. - matches "x" only if "x" is preceded by "y". For example, (?<=Jack|Tom)Sprat matches "Sprat" only if it is preceded by "Tom" or "Jack." However, neither "Jack" nor "Tom" is part of the match results. Sample text: Jack Sprat, Johnny Sprat, Tom Sprat, Mary Sprat and Bill Sprat. I learned (after some frustration) that look behind assertions are unfortunately not supported in VBA. So, how do achieve the same results with VBA? Here is a look ahead that comes close, but unlike the look behind described above, it returns both parts y and x. Code:
Sub RegEx()
Dim RegEx As Object, Matches As Object, Match As Object
Dim oRng As Range
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=Tom|Jack)\S+ Sprat"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
'Returns Jack Sprat and Tom Sprat (Not just Sprat like lookback)
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).Select
Next
lbl_Exit:
Exit Sub
End Sub
|
|
#10
|
|||
|
|||
|
Quote:
As far as I remember, in VBA you can only use look ahead, you can't use look behind. I tested in Excel some time ago. Maybe in Perl, Python, but not in VBA. That's why I didn't mention look behind in the last post Since you can't use look behind, you have to experiment. In this example, you can e.g. Code:
Sub RegEx()
Dim RegEx As Object, Matches As Object, Match As Object
Dim oRng As Range
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=Tom|Jack)\S+ (Sprat)"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
ActiveDocument.Range(Match.FirstIndex + Match.Length - Len(Match.SubMatches(0)), Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen
Debug.Print Match.SubMatches(0)
Next
lbl_Exit:
Exit Sub
End Sub
|
|
#11
|
|||
|
|||
|
The first thing I did when I came home from work today was visiting this thread. Greg and Batman1, I take off my hat to both of you!
|
|
#12
|
|||
|
|||
|
Batman1,
Very interesting. After studying your last (which works perfectly), I thought I would modify slightly just to confirm my understanding. The result is a little simpler for me to understand and will provide notes for myself should I ever need to revisit. Thanks again for your lessons in RegEx: Code:
Sub RegExPsuedoLookBack()
Dim RegEx As Object, Matches As Object, Match As Object
Dim oRng As Range
'Sample document text: Jack Sprat, Johnny Sprat, Tom Sprat, Mary Sprat and Bill Sprat.
'With RegEx look ahead, you can find all instances of x = "Sprat" preceded by either y = "Jack " or y = "Tom "
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?=Tom|Jack)\S+ Sprat"
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
'Returns all instances of "Sprat" immediately after "Tom " or "Jack ".
Debug.Print Match.Value 'Notice the match return includes the proceeding y qualifyer.
Next
'A RegEx look back will return all instances of x = "Sprat" preceded by y = "Jack " or y = "Tom " but the match return only inclues the x value "Sprat"
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "(?<=Tom |Jack )\Sprat"
End With
On Error GoTo lbl_EH
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
'If this worked, it would return all instances of "Sprat" immediately after "Tom " or "Jack ".
Debug.Print Match.Value
Next
lbl_ER:
'VBA workaround for RegEx look back.
With RegEx
.Global = True
'Use a modified look ahead as demonstrated previously.
.Pattern = "((?=Tom|Jack)\S+ )Sprat" 'Note the addition of parens "()" around the y element will result in a submatch.
End With
Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
'Returns x = "Jack Sprat" and "Tom Sprat" matches at before.
'However, also returns the submatches "Jack " from the "Jack Sprat" match and "Tom " from the "Tom Sprat" match.
'Use these manipulate the returned range to achieve a VBA pseudo lookback method.
ActiveDocument.Range(Match.FirstIndex + Len(Match.Submatches(0)), Match.FirstIndex + Match.Length).HighlightColorIndex = wdBrightGreen
Next
lbl_Exit:
Exit Sub
lbl_EH:
MsgBox "Unfortunatley RegEx look back is not supported in VBA." & vbCr + vbCr _
& "Proceding to work around.", vbInformation + vbOKOnly, "INVALID OPERATION"
Resume lbl_ER
End Sub
Please feel free to further comment, if my notes above could be clarified further. |
|
#13
|
|||
|
|||
|
Quote:
With Backreferences: 1. You can use submatch 2. You can refer to an earlier group later in the Pattern () 3. You can use submatch in REPLACE Note 1. You already use it in your code. Note 2. Pattern = "([A-Za-z]+) \1" Pattern = <series1><space><series2>, where <series1> and <series2> are identical E.g. Pattern = "([A-Za-z]+) \1" will find "a a", "John John". If .IgnoreCase = True then it will also find "A a" a Pattern = "([A-Za-z]+) ([A-Za-z]+) \2 \1" will find "Jack Sprat Sprat Jack Note 3. In Repace we don't use \1, \2, … We use $1, $2, … Code:
Sub test2()
Dim RegEx As Object, Matches As Object, Match As Object, text As String
text = "Fellow John John is a good student. Fellow Jack Jack also"
Debug.Print text
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.Pattern = "([A-Za-z]+) \1"
If .test(text) Then
text = .Replace(text, "$1") ' in every place in string Text where a result is found, that result is replaced with its $1
Debug.Print text
End If
End With
text = "John Smith is a good student. Jack Sprat also"
With RegEx
.Global = True
.Pattern = "([A-Z][a-z]+) ([A-Z][a-z]+)"
If .test(text) Then
text = .Replace(text, "$2 $1")
Debug.Print text
End If
End With
End Sub
|
|
#14
|
|||
|
|||
|
Interesting. Thanks for the additional tips!
|
|
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Any work-around for this challenge | RRB | Word | 6 | 08-22-2023 11:47 AM |
| Formula challenge 3 criteria | JMC44 | Excel Programming | 0 | 04-04-2015 07:38 PM |
Challenge for a genius ?
|
e.roberts | Office | 3 | 07-13-2012 01:36 AM |
Sorting Challenge
|
gbaker | Excel Programming | 11 | 06-22-2012 09:39 AM |
Word Challenge
|
jpotter2 | Word | 3 | 03-22-2011 02:07 PM |