#1
|
||||
|
||||
Find whether string contains hyperlink and Process the same
As part of long process, I want to find following string:
Code:
EP [0-9]{5,} [0-9A-Z]{1,2} if in doc, there is EP 2614045 B1 with hyperlink https://data.epo.org/publication-ser...45&ki=B1&lg=en Code:
Sub EPHYPERLINKDOWN() Dim myURL As String, s As String, s1 As String, StrTxt As String myURL = "https://data.epo.org/publication-server/document?cc=EP&pn=2614045&ki=B1&lg=en" Dim HttpReq As Object, oStrm As Object Set HttpReq = CreateObject("Microsoft.XMLHTTP") HttpReq.Open "GET", myURL, False HttpReq.send s = HttpReq.responseText Set HttpReq = Nothing Set DocNew = Documents.Add With DocNew .Range.Text = s With ActiveDocument.Range With .Find .ClearFormatting .Replacement.ClearFormatting .Forward = True .Wrap = wdFindContinue .Format = False .MatchWildcards = True .Text = "https://data.epo.org/publication-server/pdf-document*.pdf*.pdf" .Replacement.Text = "" .Execute End With s1 = Replace(.Text, "amp;", "") End With .Close Set DocNew = Nothing Application.ScreenUpdating = True MsgBox (s1) End With End Sub So, I learned middle part but don't know how to run first part and last one. |
#2
|
||||
|
||||
Quote:
EP[!0-9]@[0-9]{7}[!0-9]@B1 Even then, all you'll have found is the part of the hyperlink containing EP 2614045 B1 - not the whole hyperlink. As for your actual Find expression, that doesn't seem to have anything to do with the hyperlink you say you want to find, so I can't really understand what you're trying to do. You also say you want to: Quote:
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#3
|
||||
|
||||
Please let me make it clearer.
I have in my document already hyperlinked numbers. One being EP 2614045 B1 having hyperlink https://data.epo.org/publication-ser...45&ki=B1&lg=en There are also other numbers with hyperlinks. I just want to search for EP number. If EP number have already hyperlink then I want to use that hyperlink in my winhttp request. winhttp request will give me one other hyperlink. I want to use that hyperlink to insert hyperlink to that EP number in place of original Hyperlink. |
#4
|
||||
|
||||
Perhaps you could attach a document to a post showing exactly how the content is displayed in it, plus what it is you want to do with that content. As I said, the hyperlinks you have posted cannot be found via the "EP [0-9]{5,} [0-9A-Z]{1,2}" wildcard expression.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#5
|
||||
|
||||
I think I am not good at explaining. That paid off. allowed me to try on my own. I found bit of solution by browsing my earlier Ques.
I recorded macro and then used solution from post to yield following solution. Code:
Sub EPHYPERLINKDOWN() Dim myURL As String, s As String, s1 As String, StrTxt As String Dim HttpReq As Object, oStrm As Object Dim oRng As Range Set oRng = ActiveDocument.Range With oRng.Find Do While .Execute(FindText:="EP [0-9]{5,} [0-9A-Z]{1,2}", MatchWildcards:=True) If Selection.Hyperlinks.Count > 0 Then s = Selection.Hyperlinks(1).Name End If Loop End With MsgBox (s) myURL = s Set HttpReq = CreateObject("Microsoft.XMLHTTP") HttpReq.Open "GET", myURL, False HttpReq.send s = HttpReq.responseText Set HttpReq = Nothing Set DocNew = Documents.Add With DocNew .Range.Text = s With ActiveDocument.Range With .Find .ClearFormatting .Replacement.ClearFormatting .Forward = True .Wrap = wdFindContinue .Format = False .MatchWildcards = True .Text = "https://data.epo.org/publication-server/pdf-document*.pdf*.pdf" .Replacement.Text = "" .Execute End With s1 = Replace(.Text, "amp;", "") End With .Close Set DocNew = Nothing Application.ScreenUpdating = True MsgBox (s1) End With End Sub Its not looping through all instances Other problem with this code here is that I want to remove original hyperlink with what I get here as s1 at last (I use msgbox to see weather code is working or not). file |
#6
|
||||
|
||||
Try:
Code:
Sub EPHYPERLINKDOWN() Application.ScreenUpdating = False Dim StrTxt As String, HttpReq As Object, i As Long Set HttpReq = CreateObject("Microsoft.XMLHTTP") With ActiveDocument.Range With .Find .ClearFormatting .Replacement.ClearFormatting .Text = "EP [0-9]{5,} [0-9A-Z]{1,2}" .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchWildcards = True .Execute End With Do While .Find.Found If .Hyperlinks.Count > 0 Then HttpReq.Open "GET", .Hyperlinks(1).Name, False HttpReq.Send StrTxt = HttpReq.ResponseText i = InStr(StrTxt, "https://data.epo.org/publication-server/pdf-document") StrTxt = Mid(StrTxt, i, Len(StrTxt) - i) i = InStr(InStr(StrTxt, ".pdf") + 3, StrTxt, ".pdf") StrTxt = Replace(Left(StrTxt, i + 3), "amp;", "") .Hyperlinks(1).Address = StrTxt End If .Collapse wdCollapseEnd .Find.Execute Loop End With Set HttpReq = Nothing Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#7
|
||||
|
||||
Code was working fine till yesterday, but european patent office have changed their website overnight.
I will have to study the website now for the change have have made. |
#8
|
||||
|
||||
It would have been helpful had you advised that beforehand. Try:
Code:
Sub EPHYPERLINKDOWN() Application.ScreenUpdating = False Dim StrTxt As String, HttpReq As Object, i As Long Set HttpReq = CreateObject("Microsoft.XMLHTTP") With ActiveDocument.Range With .Find .ClearFormatting .Replacement.ClearFormatting .Text = "EP [0-9]{5,} [0-9A-Z]{1,2}" .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchWildcards = True .Execute End With Do While .Find.Found If .Hyperlinks.Count > 0 Then HttpReq.Open "GET", .Hyperlinks(1).Name, False HttpReq.Send StrTxt = HttpReq.ResponseText i = InStr(StrTxt, "https://data.epo.org/publication-server/pdf-document") If i > 0 Then StrTxt = Mid(StrTxt, i, Len(StrTxt) - i) i = InStr(InStr(StrTxt, ".pdf") + 3, StrTxt, ".pdf") If i > 0 Then StrTxt = Replace(Left(StrTxt, i + 3), "amp;", "") .Hyperlinks(1).Address = StrTxt End If End If End If .Collapse wdCollapseEnd .Find.Execute Loop End With Set HttpReq = Nothing Application.ScreenUpdating = True End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#9
|
||||
|
||||
I will take care from next time.
Whenever I see opportunity of automation, I get excited and ask desperately, and forget to mention the conditions in fully. As I have edited the post, You can now see what had happened with me. EOP have changed the website core of the website still works. I now need to generate hyperlink by my self from found text. I will update Accordingly. I think with this code will solve my problem. |
#10
|
||||
|
||||
One last Question in this thread.
I want to extract link similarly from following website. view-source:https://patentscope.wipo.int/search/...b=PCTDocuments Text I want is in following formate "https://mirror.patentscope.wipo.int/patentscope/docservicepdf_pct_mirror/id00000031470708/PAMPH/WO2015177801.pdf" I can find it using Code:
"https://mirror.patentscope.wipo.int/patentscope/docservicepdf_pct_mirror/*.pdf" |
#11
|
||||
|
||||
As this has answered my original question, Can be marked as solved, as the code does the job.
|
#12
|
||||
|
||||
Please don't do that - changing posts that way after they've been replied to destroys the flow of the conversation in the thread. My reply to that post now doesn't make sense.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#13
|
||||
|
||||
Quote:
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#14
|
||||
|
||||
Don't know how but this works. If you can just give hint what this does, it would be grateful.
Code:
StrTxt = HttpReq.responseText i = InStr(StrTxt, "https://mirror.patentscope.wipo.int/patentscope/docservicepdf_pct_mirror/") If i > 0 Then StrTxt = Mid(StrTxt, i, Len(StrTxt) - i) i = InStr(StrTxt, ".pdf") If i > 0 Then StrTxt = Left(StrTxt, i + 3) |
#15
|
||||
|
||||
You could find out what these functions do by selecting them and pressing F1 in the VBA help. For example:
• The InStr function returns the starting position of one string within another. Since StrTxt holds the string returned by HttpReq.ResponseText: InStr(StrTxt, https://mirror.patentscope.wipo.int/...df_pct_mirror/) tells us where in that text the string 'https://mirror.patentscope.wipo.int/patentscope/docservicepdf_pct_mirror/' begins. That value is stored as a number in the variable 'i'. If 'i' = 0, the text wasn't found. • The Mid function retrieves the part of a string beginning between one location and another. Thus, StrTxt = Mid(StrTxt, i, Len(StrTxt) - i) keeps just that part of StrTxt that begins at position 'i' (the number returned by InStr) in StrTxt to the end of that string.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
Tags |
word vba |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Multiple Hyperlink to a single string | PRA007 | Word VBA | 7 | 11-09-2015 04:29 PM |
find a set of characters in a string and return a 0 (zero) if not found | MaineLady | Excel | 2 | 11-05-2015 03:23 PM |
Why is this Find string not working | TechEd | Word VBA | 5 | 07-05-2014 08:12 PM |
Find and replace a string of text | errtu | Word | 1 | 01-31-2013 02:09 PM |
Bad view when using Find and Find & Replace - Word places found string on top line | paulkaye | Word | 4 | 12-06-2011 11:05 PM |