Microsoft Office Forums

  #1  
12-30-2024, 07:56 AM
mstde (Windows 11, Office 2019)
Regex-pattern

Hi,

I would like to use a regex pattern to identify names in a given text. The structure of the text is always the same: the part of the sentence containing the name starts with a comma, then comes the name, then another comma and further text.
I cannot figure out how to stop at the first comma after the name:

some text some text, Pep Guardiola, some text some text, some text some text.

My pattern is: ,\s.*, - but it matches up to the third comma: , Pep Guardiola, some text some text,

I tried it on regex101.com for a while, but I cannot figure it out.
How should the pattern be formulated to match just , Pep Guardiola, ?



Any help is appreciated.

Regards

Michael
  #2  
12-30-2024, 03:13 PM
macropod (Windows 10, Office 2016)

Why not use Word's wildcard Find method, for which you could have:
Find = , [A-Z][a-z]@ [A-Z][a-z]@,
This ensures you'll only find a comma, followed by a space, then a proper-case word, followed by a space, then another proper-case word, followed by a comma.

A macro equivalent to italicize all such text (since I don't know what you want to do with what you find) would be:
Code:
Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Italic = True
    .Text = ", [A-Z][a-z]@ [A-Z][a-z]@,"
    .Replacement.Text = "^&"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
  #3  
12-31-2024, 03:33 AM
mstde (Windows 11, Office 2019)

Hi,
thank you very much for your replies.

@macropod: in my example the name consisted of just two words, but there can be other variants, too:

, Hans-Gerd Müller,
, Thomas Gerd Müller,
, Beate van Ackeren,
, Louise van der Ackeren,
, Gerd Müller-Lüdenscheid,
, Hans Gerd Müller-Lüdenscheid,

What they all have in common is that the first delimiter is a comma followed by a space and an uppercase letter, and the closing delimiter in every variant is the second comma, so I went for regex to catch them all.

Would your method find these variants, too?

Regards Michael
  #4  
12-31-2024, 04:01 AM
macropod (Windows 10, Office 2016)

Quote:
Originally Posted by mstde
@macropod: in my example the name consisted of just two words, but there can be other variants, too:
..

Would your method find these variants, too?

No, it wouldn't, because it was written to exclude anything other than the two-word kind of example you specified.

If you want something that will catch all the examples you have now given, you could use:
Find = , [A-Z][!,]@ [A-Z][!,]@,

This will capture all instances of a comma, followed by a space, then a proper-case word, any number of characters other than a comma, finally a space and another proper-case word (hyphenated or otherwise) before a comma.

Again, the macro equivalent to italicize all such text (since I still don't know what you want to do with what you find) would be:
Code:
Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Italic = True
    .Text = ", [A-Z][!,]@ [A-Z][!,]@,"
    .Replacement.Text = "^&"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
  #5  
01-01-2025, 01:18 PM
vivka (Windows 7 64bit, Office 2016)

While Paul's code does the job perfectly, I can propose a regex code to match any number of only first-upper-cased-letter words between two commas:
Code:
,(\s[A-Z][a-z]+)+,
I am really grateful to Paul (and gmayor, gmaxey, guessed, Italophil, to name just some of the experts on this forum) who are real vba gurus who kindly and patiently share their immense knowledge.

Last edited by vivka; 01-02-2025 at 12:45 PM.
  #6  
01-02-2025, 10:14 AM
gmaxey (Windows 10, Office 2019)

Vivka,


Thank you for your kind words. I was not able to get your suggestion to work. While it will match:


Test, Joe Smith, and test, test.
or
Test, Bob Miller, and test, text.
It will not match any of the examples given in post #3.


Consider:
Code:
Sub ScratchMacro()
'A basic Word Macro coded by Gregory K. Maxey
Dim strPattern As String
'Late binding: no reference to Microsoft VBScript Regular Expressions 5.5 is needed
Dim RegEx As Object, Matches As Object, Match As Object
Dim oPar As Paragraph
  'strPattern = ",(\s[A-Z][a-z]+)+," 'Vivka's
  strPattern = ",\s[A-Z][^,]*," 'Revised
  Set RegEx = CreateObject("vbscript.regexp")
  With RegEx
    .Global = True 'Return every match in a paragraph, not just the first
    .Pattern = strPattern
  End With
  For Each oPar In ActiveDocument.Range.Paragraphs
    Set Matches = RegEx.Execute(oPar.Range.Text)
    For Each Match In Matches
      Debug.Print Match.Value
    Next Match
  Next oPar
lbl_Exit:
  Exit Sub
End Sub
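
For anyone who wants to check this without a test document, here is a minimal sketch that runs both patterns against the names from post #3 plus one plain two-word name. The Sub name and the sample array are my own choices for illustration and are not part of the macro above.

```vba
Sub ComparePatternsOnPost3Names()
    ' Hypothetical helper: prints whether each pattern finds a match
    ' in each sample string.
    Dim RegEx As Object
    Dim vNames As Variant, vPatterns As Variant
    Dim i As Long, j As Long

    vNames = Array(", Hans-Gerd Müller,", ", Thomas Gerd Müller,", _
                   ", Beate van Ackeren,", ", Louise van der Ackeren,", _
                   ", Gerd Müller-Lüdenscheid,", _
                   ", Hans Gerd Müller-Lüdenscheid,", ", Joe Smith,")
    vPatterns = Array(",(\s[A-Z][a-z]+)+,", ",\s[A-Z][^,]*,")

    Set RegEx = CreateObject("VBScript.RegExp")
    For j = LBound(vPatterns) To UBound(vPatterns)
        RegEx.Pattern = vPatterns(j)
        Debug.Print "Pattern: " & vPatterns(j)
        For i = LBound(vNames) To UBound(vNames)
            ' .Test returns True if the pattern matches anywhere in the string
            Debug.Print "  " & vNames(i) & " -> " & RegEx.Test(vNames(i))
        Next i
    Next j
End Sub
```

With the first pattern, only the plain two-word sample should report a match: [a-z] does not cover the hyphen or "ü", and [A-Z] rejects lowercase words such as "van". The revised pattern should report a match for every sample.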
__________________
Greg Maxey
Please visit my web site at http://www.gregmaxey.com/
  #7  
01-02-2025, 12:21 PM
vivka (Windows 7 64bit, Office 2016)

Thank you, Greg, for your improvements! I tested my code only on text made up of words that begin with a capital letter and are delimited by spaces. Live and learn (this is my motto). Each of your Scratch macros makes me scratch the back of my head in amazement.
  #8  
01-02-2025, 12:59 PM
batman1 (Windows 11, Office 2013)

Quote:
Originally Posted by gmaxey
Test, Joe Smith, and test, test.
or
Test, Bob Miller, and test, text.
It will not match any of the examples given in post #3.

For data structured as in the attached image, sample code is below.
This is just an example; adapt it to your own needs.
Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .Pattern = ", [A-Z][^, ]+( [A-Za-z][^,]+){1,},"
    End With
    Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
    For Each Match In Matches
        Debug.Print Match.Value
    Next
End Sub
Attached image: regex.png
  #9  
01-03-2025, 09:21 AM
gmaxey (Windows 10, Office 2019)

Batman1,


Thanks for the post. I see that your suggested pattern "requires" two or more words between the commas to return a match. So


, John, won't match but
, John smith, would.


A RegEx pattern master I am not. I wonder if it is possible to construct the pattern such that the last word (the word before the ending comma) must be capitalized, e.g.,


,Gerd van Ackerman, matches
,Gerd van ackerman, would not
  #10  
01-03-2025, 10:19 AM
batman1 (Windows 11, Office 2013)

Quote:
Originally Posted by gmaxey
Batman1,

Thanks for the post. I see that your suggested pattern "requires" two or more words between the commas to return a match. So

, John, won't match but
, John smith, would.

If you read posts #1 to #4, you will understand why I wrote it that way: a first name plus last name has 2, 3, 4 ... parts, so there should be at least 2 parts.

Quote:
A RegEx pattern master I am not. I wonder if it is possible to construct the pattern such that the last word (the word before the ending comma) must be capitalized, e.g.,

,Gerd van Ackerman, matches
,Gerd van ackerman, would not

With my code ",Gerd van Ackerman," does NOT match. After the first comma there MUST be a space. This is a requirement of this thread.

The following code meets your requirements. I can't write a simpler pattern.

Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .Pattern = ", [A-Z][^, ]+( [^, ]+)*( [A-Z][^,]+)?,"
    End With
    Set Matches = RegEx.Execute(ActiveDocument.Range.text)
    For Each Match In Matches
        Debug.Print Match.Value
    Next
End Sub
  #11  
01-03-2025, 11:33 AM
gmaxey (Windows 10, Office 2019)

Batman1,


Yes, correct. The omitted space after the first comma was a typo. Now, testing with your latest version: ", [A-Z][^, ]+( [^, ]+)*( [A-Z][^,]+)?,"
when applied to the following example (see the attached image), it returns the same matches as my last version: ",\s[A-Z][^,]*,"



Specifically the last instance ", John smith," is returned as a match. How could we prevent that? If you don't mind, can you explain what each part of your pattern is intended to perform?


For others following, with mine it is
1. "," match a comma
2. "\s" match a space
3. "[A-Z]" match a capital letter A to Z
4. "[^,]*" match any characters excluding a comma zero or more times
5. "," match a comma
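
The five steps above can be tried on the sentence from post #1. This is a minimal sketch: the Sub name is made up, and the lazy variant ",\s.*?," is included only for comparison, since nobody proposed it in this thread.

```vba
Sub CompareGreedyAndNegatedClass()
    ' Sentence from post #1
    Const SAMPLE As String = "some text some text, Pep Guardiola, some text some text, some text some text."
    Dim RegEx As Object, Matches As Object
    Dim vPatterns As Variant
    Dim i As Long

    ' 1) original greedy pattern, 2) lazy variant (assumption, for comparison),
    ' 3) negated character class from the steps above
    vPatterns = Array(",\s.*,", ",\s.*?,", ",\s[A-Z][^,]*,")

    Set RegEx = CreateObject("VBScript.RegExp")
    For i = LBound(vPatterns) To UBound(vPatterns)
        RegEx.Pattern = vPatterns(i)
        ' .Global is left at its default (False), so only the first match is returned
        Set Matches = RegEx.Execute(SAMPLE)
        If Matches.Count > 0 Then
            Debug.Print vPatterns(i) & " -> " & Matches.Item(0).Value
        Else
            Debug.Print vPatterns(i) & " -> no match"
        End If
    Next i
End Sub
```

The greedy ".*" runs on to the last comma in the sentence, which is the over-long match reported in post #1, while the other two patterns stop at the first comma after the name. The lazy variant has no capital-letter check, so it would also accept lowercase text between two commas.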
Attached image: Example.jpg
  #12  
01-03-2025, 02:36 PM
batman1 (Windows 11, Office 2013)

Quote:
Originally Posted by gmaxey
Batman1,

Specifically the last instance ", John smith," is returned as a match. How could we prevent that?

Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .Pattern = ", [A-Z][^, ]+(|( [^, ]+)* [A-Z][^,]+),"
    End With
    Set Matches = RegEx.Execute(ActiveDocument.Range.text)
    For Each Match In Matches
        Debug.Print Match.Value
    Next
End Sub
The author of the thread did not really specify criteria for the input data. If we have requirements for the form of the results, we must also specify the form of the input data. If the input can be anything, we must list every character that is accepted between the commas. Note that the code above also returns ", Beate 123-van4 Ackeren," and that is not a first name and surname, right?

The code below accepts only characters in CONST characters. Tested with data as in the picture.
Code:
Sub ScratchMacro()
Const characters As String = "[A-Za-zü\-]"
Dim RegEx As Object, Matches As Object, Match As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .Pattern = ", [A-Z]" & characters & "+(|( " & characters & "+)* [A-Z]" & characters & "+),"
    End With

    Set Matches = RegEx.Execute(ActiveDocument.Range.text)
    For Each Match In Matches
        Debug.Print Match.Value
    Next
End Sub

Quote:
If you don't mind, can you explain what each part of your pattern is intended to perform?

For others following, with mine it is
1. "," match a comma
2. "\s" match a space
3. "[A-Z]" match a capital letter A to Z
4. "[^,]*" match any characters excluding a comma zero or more times
5. "," match a comma

2. "\s" matches any whitespace character (space, TAB, form feed, ...); it is equivalent to "[ \f\n\r\t\v]"
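
A quick way to see the difference between "\s" and a literal space is the sketch below. The Sub name and the test strings are assumptions for illustration. One practical consequence in Word: paragraph marks in Range.Text are carriage returns (vbCr), and "\s" accepts those as well.

```vba
Sub ShowWhitespaceClass()
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")

    ' "^" anchors the test at the start of each short sample string
    RegEx.Pattern = "^,\s[A-Z]"
    Debug.Print RegEx.Test(", John")              ' True: space
    Debug.Print RegEx.Test("," & vbTab & "John")  ' True: TAB
    Debug.Print RegEx.Test("," & vbCr & "John")   ' True: paragraph mark
    Debug.Print RegEx.Test(",,John")              ' False: a comma is not whitespace

    ' A literal space in the pattern accepts the space character only
    RegEx.Pattern = "^, [A-Z]"
    Debug.Print RegEx.Test(", John")              ' True
    Debug.Print RegEx.Test("," & vbTab & "John")  ' False
End Sub
```

So if the requirement really is "comma, then a space", a literal space in the pattern is the stricter choice.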
Attached image: regex.png
  #13  
01-04-2025, 06:37 AM
gmaxey (Windows 10, Office 2019)

Thanks. Not really sure how the "|" works but your pattern appears to work.
  #14  
01-05-2025, 04:20 AM
vivka (Windows 7 64bit, Office 2016)

Hi! I support Greg's idea about the pipe symbol or upright slash (|), which means "or". It seems redundant because the asterisk (*) that follows the capturing group means zero or unlimited times. So,
Code:
, [A-Z][A-Za-zü\-]+( [A-Za-zü\-]+)* [A-Z][A-Za-zü\-]+,
is OK.
Besides, this also appears to work, although all situations can't be predicted:
Code:
,\s[A-Z][^0-9,]* [A-Z][A-Za-zü\-]+,
  #15  
01-05-2025, 06:00 AM
batman1 (Windows 11, Office 2013)

Quote:
Originally Posted by vivka
Hi! I support Greg's idea about the pipe symbol or upright slash (|), which means "or". It seems redundant because the asterisk (*) that follows the capturing group means zero or unlimited times. So,
Code:
, [A-Z][A-Za-zü\-]+( [A-Za-zü\-]+)* [A-Z][A-Za-zü\-]+,
is OK.

I read Greg's post (#9) and understood that he wants to find ", John," in ", John, some text some text", i.e. to return the name even when there is only 1 word between the commas. That is why I gave this pattern and not another. If Greg really wants 1 word, then your pattern will not meet his requirement: it will not find ", John," in ", John, some text some text".


It is not without reason that I gave this pattern and not another one.
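
The difference between the two patterns can be checked with a short sketch. It uses the CONST-based pattern from post #12 and the shortened one from post #14; the Sub name and the sample strings are assumptions for illustration.

```vba
Sub CompareEmptyAlternative()
    ' Character class from post #12
    Const C As String = "[A-Za-zü\-]"
    Dim RegEx As Object
    Dim vPatterns As Variant, vSamples As Variant
    Dim i As Long, j As Long

    ' First pattern keeps the empty alternative "(|...)", second one drops it
    vPatterns = Array( _
        ", [A-Z]" & C & "+(|( " & C & "+)* [A-Z]" & C & "+),", _
        ", [A-Z]" & C & "+( " & C & "+)* [A-Z]" & C & "+,")
    vSamples = Array(", John, some text some text", _
                     ", Gerd van Ackeren, some text", _
                     ", John smith, some text")

    Set RegEx = CreateObject("VBScript.RegExp")
    For j = LBound(vPatterns) To UBound(vPatterns)
        RegEx.Pattern = vPatterns(j)
        Debug.Print "Pattern: " & vPatterns(j)
        For i = LBound(vSamples) To UBound(vSamples)
            ' .Test returns True if the pattern matches anywhere in the sample
            Debug.Print "  " & vSamples(i) & " -> " & RegEx.Test(vSamples(i))
        Next i
    Next j
End Sub
```

Only the first sample separates the two patterns: the empty alternative before "|" lets the match finish right after a single capitalized word, whereas the version without it always requires a second capitalized word before the closing comma.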