#1
Hi,

I would like to use a regex pattern to identify names in a given text. The structure of the text is always the same: the part of the sentence containing the name starts with a comma, then comes the name, then another comma and further text. I cannot figure out how to stop at the first comma after the name:

some text some text, Pep Guardiola, some text some text, some text some text.

My pattern is:

,\s.*,

but it matches up to the third comma:

, Pep Guardiola, some text some text,

I tried it on regex101.com for a while, but I cannot figure it out. How should the pattern be formulated to match just ", Pep Guardiola,"? Any help is appreciated.

Regards
Michael
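For readers following along, here is a minimal sketch of why the original pattern overshoots. It uses Python's `re` module as a stand-in for the regex engine (an assumption; the thread itself targets Word/VBA), and the variable names are illustrative only. The greedy `.*` first runs to the end of the text and then backs up to the last comma, whereas a lazy quantifier or a negated character class stops at the first comma after the name.

```python
import re

text = ("some text some text, Pep Guardiola, some text some text, "
        "some text some text.")

# Greedy: .* grabs as much as possible, then backs up to the LAST comma.
greedy = re.search(r",\s.*,", text).group()

# Lazy: .*? expands only as far as needed to reach the next comma.
lazy = re.search(r",\s.*?,", text).group()

# Negated class: [^,]* can never cross a comma in the first place.
negated = re.search(r",\s[^,]*,", text).group()

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

The negated class is usually the more robust choice here, because it states the intent ("anything except a comma") directly instead of relying on backtracking.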
#2
Why not use Word's wildcard Find method, for which you could have:

Find = , [A-Z][a-z]@ [A-Z][a-z]@,

This ensures you'll only find a comma, followed by a space, then a proper-case word, followed by a space, then another proper-case word, followed by a comma. A macro equivalent to italicize all such text (since I don't know what you want to do with what you find) would be:

Code:
Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Italic = True
    .Text = ", [A-Z][a-z]@ [A-Z][a-z]@,"
    .Replacement.Text = "^&"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#3
Hi,

thank you very much for your replies.

@macropod: in my example the name consisted of just two words, but there can be other variants, too:

, Hans-Gerd Müller,
, Thomas Gerd Müller,
, Beate van Ackeren,
, Louise van der Ackeren,
, Gerd Müller-Lüdenscheid,
, Hans Gerd Müller-Lüdenscheid,

What they all have in common is that the first delimiter is a comma followed by a space and an uppercase letter, and the delimiter at the end of every variant is the second comma. That is why I went for regex, to catch them all. Would your method find these variants, too?

Regards
Michael
#4
If you want something that will catch all the examples you have now given, you could use:

Find = , [A-Z][!,]@ [A-Z][!,]@,

This will capture all instances of a comma, followed by a space, then a proper-case word, any number of characters other than a comma, finally a space and another proper-case word (hyphenated or otherwise) before a comma. Again, the macro equivalent to italicize all such text (since I still don't know what you want to do with what you find) would be:

Code:
Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Italic = True
    .Text = ", [A-Z][!,]@ [A-Z][!,]@,"
    .Replacement.Text = "^&"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
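As a rough cross-check of the two wildcard searches, the sketch below translates them into ordinary regex syntax and runs them in Python. The translation is an assumption: Word's `@` is read as `+` (one or more) and `[!,]` as `[^,]`. Word's wildcard engine is not a regex engine, so results in Word itself may differ in edge cases. The names `strict` and `relaxed` are illustrative only.

```python
import re

# Assumed regex equivalents of the two Word wildcard searches.
strict = re.compile(r", [A-Z][a-z]+ [A-Z][a-z]+,")   # post #2
relaxed = re.compile(r", [A-Z][^,]+ [A-Z][^,]+,")    # post #4

names = [
    ", Pep Guardiola,",
    ", Hans-Gerd Müller,",
    ", Thomas Gerd Müller,",
    ", Beate van Ackeren,",
    ", Louise van der Ackeren,",
    ", Gerd Müller-Lüdenscheid,",
    ", Hans Gerd Müller-Lüdenscheid,",
]

strict_hits = [n for n in names if strict.search(n)]
relaxed_hits = [n for n in names if relaxed.search(n)]

print(strict_hits)
print(relaxed_hits)
```

In this translation the strict pattern only accepts the plain two-word name, while the relaxed one accepts every variant from post #3, because `[^,]+` also lets hyphens, umlauts, and inner spaces through.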
#5
While Paul's code does the job perfectly, I can propose a regex to match any number of capitalized words (first letter upper-case, the rest lower-case) between two commas:

Code:
,(\s[A-Z][a-z]+)+,

Last edited by vivka; 01-02-2025 at 12:45 PM.
#6
Vivka,

Thank you for your kind words. I was not able to get your suggestion to work. While it will match:

Test, Joe Smith, and test, test.

or

Test, Bob Miller, and test, text.

it will not match any of the examples given in post #3. Consider:

Code:
Sub ScratchMacro()
'A basic Word Macro coded by Gregory K. Maxey
Dim strPattern As String
'Late bound via CreateObject, so no library reference is needed
Dim RegEx As Object, Matches As Object, Match As Object
Dim oPar As Paragraph
  'strPattern = ",(\s[A-Z][a-z]+)+," 'Vivka's
  strPattern = ",\s[A-Z][^,]*," 'Revised
  Set RegEx = CreateObject("vbscript.regexp")
  For Each oPar In ActiveDocument.Range.Paragraphs
    With RegEx
      .Global = True 'Return every match in the paragraph, not just the first
      .Pattern = strPattern
      Set Matches = .Execute(oPar.Range.Text)
      For Each Match In Matches
        Debug.Print Match.Value
      Next
    End With
  Next oPar
lbl_Exit:
  Exit Sub
End Sub
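The difference between the two patterns in the macro can be reproduced outside Word. This is a hedged sketch in Python (standing in for VBScript.RegExp, which is an assumption) that loops over sample paragraphs the same way the macro does; the last two paragraphs are made up for illustration by wrapping names from post #3 in the same sentence frame.

```python
import re

vivka = re.compile(r",(\s[A-Z][a-z]+)+,")   # only plain capitalized ASCII words
revised = re.compile(r",\s[A-Z][^,]*,")     # anything up to the next comma

paragraphs = [
    "Test, Joe Smith, and test, test.",
    "Test, Bob Miller, and test, text.",
    "Test, Hans-Gerd Müller, and test, text.",   # illustrative sentence
    "Test, Beate van Ackeren, and test, text.",  # illustrative sentence
]

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group() if m else None

vivka_hits = [first_match(vivka, p) for p in paragraphs]
revised_hits = [first_match(revised, p) for p in paragraphs]

print(vivka_hits)
print(revised_hits)
```

The first pattern trips over the hyphen, the "ü" (which is outside `[a-z]`), and the lower-case "van", which is consistent with the behavior described in this post.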
#7
Thank you, Greg, for your improvements! I tested my code only on a range made up of words that begin with a capital letter and are separated by spaces. Live and learn (this is my motto). Every one of your Scratch macros makes me scratch the back of my head in amazement.
#8
For data structured as in the attached photo, the sample code is below. This is just an example; everyone should adapt it to their needs.

Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
  Set RegEx = CreateObject("VBScript.RegExp")
  With RegEx
    .Global = True
    .Pattern = ", [A-Z][^, ]+( [A-Za-z][^,]+){1,},"
  End With
  Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
  For Each Match In Matches
    Debug.Print Match.Value
  Next
End Sub
#9
Batman1,

Thanks for the post. I see that your suggested pattern "requires" two or more words between the commas to return a match. So ", John," won't match but ", John smith," would. A RegEx pattern master I am not. I wonder if it is possible to construct the pattern such that the last word (the word before the ending comma) must be capitalized, e.g. ",Gerd van Ackerman," matches but ",Gerd van ackerman," would not.
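The observation about the two-word requirement can be checked with a short sketch. Python's `re` is used here as a stand-in for VBScript.RegExp (an assumption), and the sample sentences and the helper `hit` are made up for illustration.

```python
import re

# Pattern from post #8: first word capitalized, then at least one more word.
two_plus = re.compile(r", [A-Z][^, ]+( [A-Za-z][^,]+){1,},")

def hit(text):
    """Return the first matched substring, or None if nothing matches."""
    m = two_plus.search(text)
    return m.group() if m else None

one_word = hit("text, John, more text, end")
two_words = hit("text, John smith, more text, end")
lower_last = hit("text, Gerd van ackerman, more text, end")

print(one_word, two_words, lower_last)
```

A single word between the commas is rejected, while a lower-case last word is still accepted, which is the case the question above is about.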
#10
If you read posts #1, #2, #3 and #4 you will understand why I wrote it that way: the names there consist of 2, 3, 4 ... parts, so there should be at least 2 parts.

The following code meets your requirements. I can't write a simpler pattern.

Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
  Set RegEx = CreateObject("VBScript.RegExp")
  With RegEx
    .Global = True
    .Pattern = ", [A-Z][^, ]+( [^, ]+)*( [A-Z][^,]+)?,"
  End With
  Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
  For Each Match In Matches
    Debug.Print Match.Value
  Next
End Sub
#11
Batman1,

Yes, correct. The omitted space after the first comma was a typo. Now, testing with your latest version:

", [A-Z][^, ]+( [^, ]+)*( [A-Z][^,]+)?,"

when applied to the following example, it returns the same matches as my last version:

",\s[A-Z][^,]*,"

Specifically, the last instance ", John smith," is returned as a match. How could we prevent that? If you don't mind, can you explain what each part of your pattern is intended to do? For others following, with mine it is:

1. "," match a comma
2. "\s" match a space
3. "[A-Z]" match a capital letter A to Z
4. "[^,]*" match any characters excluding a comma, zero or more times
5. "," match a comma
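The five steps above can be written out as an annotated pattern. This is a minimal sketch using Python's `re.VERBOSE` flag, which allows comments inside a pattern; VBScript.RegExp has no such flag as far as I know, so this is for illustration only, and the sample sentence is made up.

```python
import re

pattern = re.compile(r"""
    ,        # 1. a literal comma
    \s       # 2. one whitespace character
    [A-Z]    # 3. one capital letter A to Z
    [^,]*    # 4. zero or more characters that are not a comma
    ,        # 5. the closing comma
""", re.VERBOSE)

sample = "One, John Smith, two, John smith, three"
found = [m.group() for m in pattern.finditer(sample)]

print(found)
```

Because step 4 accepts anything except a comma, the pattern places no condition on the last word, so ", John smith," is returned as well.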
#12
Code:
Sub ScratchMacro()
Dim RegEx As Object, Matches As Object, Match As Object
  Set RegEx = CreateObject("VBScript.RegExp")
  With RegEx
    .Global = True
    .Pattern = ", [A-Z][^, ]+(|( [^, ]+)* [A-Z][^,]+),"
  End With
  Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
  For Each Match In Matches
    Debug.Print Match.Value
  Next
End Sub

The code below accepts only the characters listed in the constant characters. Tested with data as in the picture.

Code:
Sub ScratchMacro()
Const characters As String = "[A-Za-zü\-]"
Dim RegEx As Object, Matches As Object, Match As Object
  Set RegEx = CreateObject("VBScript.RegExp")
  With RegEx
    .Global = True
    .Pattern = ", [A-Z]" & characters & "+(|( " & characters & "+)* [A-Z]" & characters & "+),"
  End With
  Set Matches = RegEx.Execute(ActiveDocument.Range.Text)
  For Each Match In Matches
    Debug.Print Match.Value
  Next
End Sub
#13
Thanks. Not really sure how the "|" works, but your pattern appears to work.
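On how the "|" works: inside the group `(|...)` the part to the left of the bar is empty, so the group means "either nothing, or the rest of the name". A hedged sketch in Python (standing in for VBScript.RegExp, an assumption) with made-up sample sentences:

```python
import re

# First pattern from post #12. The group reads: EITHER nothing,
# OR any middle words followed by a capitalized last part.
pattern = re.compile(r", [A-Z][^, ]+(|( [^, ]+)* [A-Z][^,]+),")

def hit(text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group() if m else None

single = hit("a, John, b")                 # empty alternative is used
lower_last = hit("a, John smith, b")       # neither alternative fits
multi = hit("a, Gerd van Ackerman, b")     # second alternative is used
trailing = hit("a, John Smith jones, b")   # see the note after the code

print(single, lower_last, multi, trailing)
```

One caveat I noticed while testing this sketch: the final `[^,]+` also accepts spaces, so a lower-case word after a capitalized one (", John Smith jones,") still gets through. The second macro in post #12 avoids this, because its character class does not contain a space.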
#14
Hi! I share Greg's doubt about the pipe symbol, or vertical bar (|), which means "or". It seems redundant because the asterisk (*) that follows the capturing group means zero or more times. So,

Code:
, [A-Z][A-Za-zü\-]+( [A-Za-zü\-]+)* [A-Z][A-Za-zü\-]+,

Besides, this also appears to work, although not every situation can be predicted:

Code:
,\s[A-Z][^0-9,]* [A-Z][A-Za-zü\-]+,
#15
I read Greg's post (#9) and understood that he wants to find ", John," in ", John, some text some text", i.e. to find the name even when there is only 1 word between the commas. That is why I gave this pattern and not another. If Greg really wants 1 word, then your pattern will not meet his requirement: it will not find ", John," in ", John, some text some text".
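The point about single-word names can be verified side by side. This is a sketch in Python (standing in for VBScript.RegExp, an assumption); the names `with_pipe` and `without_pipe` are illustrative labels for the patterns from posts #12 and #14.

```python
import re

chars = r"[A-Za-zü\-]"

# Post #12: the empty alternative makes everything after the first word optional.
with_pipe = re.compile(
    ", [A-Z]" + chars + "+(|( " + chars + "+)* [A-Z]" + chars + "+),")

# Post #14: the capitalized last word is mandatory, so two words are required.
without_pipe = re.compile(
    ", [A-Z]" + chars + "+( " + chars + "+)* [A-Z]" + chars + "+,")

def hit(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group() if m else None

one = ", John, some text some text"
two = ", Pep Guardiola, some text some text"

results = {
    "with_pipe_one": hit(with_pipe, one),
    "without_pipe_one": hit(without_pipe, one),
    "with_pipe_two": hit(with_pipe, two),
    "without_pipe_two": hit(without_pipe, two),
}
print(results)
```

The asterisk only makes the middle words optional. The last `[A-Z]...` part stays mandatory without the pipe, so the two patterns differ exactly on one-word names.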