Can I extract web data into Word based on a pattern in a Microsoft Word document?
For example, I have a document like this:
I want to find the patent numbers in the document and extract data from Google Patents based on a pattern.
If a cell contains multiple numbers, I want only the first line searched for the pattern.
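A minimal sketch of the first-line rule, assuming the numbers sit in Word table cells and the cell text is read with `Cell.Range.Text`. `FirstLinePatentNumber` is a hypothetical helper name. It cuts the text at the first paragraph mark (`Chr(13)`), manual line break (`Chr(11)`) or end-of-cell marker (`Chr(7)`), then looks for a number with `VBScript.RegExp`, using a pattern with the same shape as the Word wildcard in the `AddGPHLink` macro.

```vba
' Hypothetical helper: return the first patent number on the FIRST line of a
' cell's text, or "" if that line contains none.
' Word ends cell text with Chr(13) & Chr(7); lines inside a cell are split by
' Chr(13) (paragraph mark) or Chr(11) (manual line break).
Function FirstLinePatentNumber(ByVal cellText As String) As String
    Dim firstLine As String, i As Long, re As Object, m As Object
    firstLine = cellText
    For i = 1 To Len(cellText)
        Select Case Mid$(cellText, i, 1)
            Case Chr(13), Chr(11), Chr(7)
                firstLine = Left$(cellText, i - 1)
                Exit For
        End Select
    Next i
    ' RegExp version of the wildcard "([USEPCNAWO]{2}) ([0-9]{4,}) ([A-Z0-9]{1,2})"
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "[USEPCNAWO]{2} [0-9]{4,} [A-Z0-9]{1,2}"
    Set m = re.Execute(firstLine)
    If m.Count > 0 Then FirstLinePatentNumber = m(0).Value
End Function
```

Usage, for example: `FirstLinePatentNumber(ActiveDocument.Tables(1).Cell(2, 1).Range.Text)`.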
For WO numbers, I want to extract the title from Google Patents.
For other numbers, such as US, EP and CN, I want only the claims extracted from Google Patents.
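The WO/other split can be expressed as a small dispatch on the two-letter prefix. This is a sketch; `FieldToExtract` is a hypothetical name, and it assumes that only US, EP and CN get claims, so any other prefix the wildcard also matches (for example "CA" or "AU") is skipped. Change the `Case Else` branch if every non-WO number should get claims.

```vba
' Hypothetical helper: decide what to pull from Google Patents for a number.
' Returns "title", "claims", or "" (skip the number).
Function FieldToExtract(ByVal patentNo As String) As String
    Select Case UCase$(Left$(Trim$(patentNo), 2))
        Case "WO"
            FieldToExtract = "title"
        Case "US", "EP", "CN"
            FieldToExtract = "claims"
        Case Else
            FieldToExtract = ""
    End Select
End Function
```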
For some numbers the Google Patents link might not work; I want the code to ignore them.
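One way to "ignore" a broken link is to download the page and treat anything other than a successful response as "skip this number". A minimal sketch using `MSXML2.XMLHTTP`; `GetPageHtml` is a hypothetical name, and it assumes an unknown number produces a non-200 HTTP status or a request error.

```vba
' Hypothetical helper: download a page, or return "" when the link does not work.
Function GetPageHtml(ByVal url As String) As String
    Dim http As Object
    If Not (LCase$(url) Like "http*://*") Then Exit Function
    On Error GoTo Done                         ' any request error -> return ""
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False                ' False = synchronous request
    http.send
    If http.Status = 200 Then GetPageHtml = http.responseText
Done:
    Set http = Nothing
End Function
```

The caller can then simply skip a number when `GetPageHtml(...)` returns an empty string.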
The pattern for converting a number into a Google Patents link is as follows:
I learned on this forum how to convert the numbers into links:
Code:
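Sub AddGPHLink()
    ' Turns every patent number such as "EP 2431370 B1" in the document
    ' into a hyperlink to its Google Patents page.
    Dim oRng As Range
    Dim strLink As String
    Set oRng = ActiveDocument.Range
    With oRng.Find
        ' Any two of the letters U,S,E,P,C,N,A,W,O (US, EP, CN, WO, but also e.g. CA),
        ' a space, 4 or more digits, a space, then a 1-2 character kind code
        Do While .Execute(FindText:="([USEPCNAWO]{2}) ([0-9]{4,}) ([A-Z0-9]{1,2})", MatchWildcards:=True)
            strLink = oRng.Text
            strLink = Replace(strLink, Chr(32), "")    ' "EP 2431370 B1" -> "EP2431370B1"
            strLink = "https://www.google.co.in/patents/" & strLink & "?cl=en"
            ActiveDocument.Hyperlinks.Add Anchor:=oRng, _
                Address:=strLink, _
                TextToDisplay:=oRng.Text
            ' Move past the new hyperlink field so it is not matched again
            oRng.End = oRng.Fields(1).Result.End
            oRng.Collapse 0                            ' 0 = wdCollapseEnd
        Loop
    End With
lbl_Exit:
    Set oRng = Nothing
    Exit Sub
End Sub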
This can be used to build the target link.
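For fetching pages rather than only creating hyperlinks, the same link rule can be pulled out into a string function. A sketch; `PatentUrl` is a hypothetical name that applies exactly the rule used in `AddGPHLink` (remove the spaces, add the prefix and `?cl=en`).

```vba
' Hypothetical helper: build the Google Patents URL for a number using the
' same rule as AddGPHLink.
Function PatentUrl(ByVal patentNo As String) As String
    PatentUrl = "https://www.google.co.in/patents/" & _
                Replace(Trim$(patentNo), Chr(32), "") & "?cl=en"
End Function
```

For example, `PatentUrl("EP 2431370 B1")` gives the example page URL shown below.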
This is an example Google Patents page:
https://www.google.co.in/patents/EP2431370B1?cl=en
I want to extract data from the web into the Word document like this:
There are two fields on the webpage that I want to target.
The first is the title. On line 3 of the page source it appears like this:
<title>Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents</title>
I want "Optically active diamine derivative and process for producing the same" extracted into Word as the title.
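A minimal sketch of the title step, assuming the `<title>` element has exactly the form quoted above ("Patent NUMBER - title - Google Patents"). `ExtractTitle` is a hypothetical name. The live page markup may differ from the saved copy quoted in this post, and HTML entities such as `&amp;` are not decoded here.

```vba
' Hypothetical helper: pull the bare title out of the page's <title> element.
Function ExtractTitle(ByVal html As String) As String
    Dim p1 As Long, p2 As Long, t As String
    p1 = InStr(1, html, "<title>", vbTextCompare)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len("<title>")
    p2 = InStr(p1, html, "</title>", vbTextCompare)
    If p2 = 0 Then Exit Function
    t = Mid$(html, p1, p2 - p1)
    ' Drop the leading "Patent EP1925611A1 - "
    If Left$(t, 7) = "Patent " And InStr(t, " - ") > 0 Then
        t = Mid$(t, InStr(t, " - ") + 3)
    End If
    ' Drop the trailing " - Google Patents" (17 characters)
    If Right$(t, 17) = " - Google Patents" Then t = Left$(t, Len(t) - 17)
    ExtractTitle = t
End Function
```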
The second field is the claims. Each claim has two parts:
1. Claim number
There are two types of claim, and the tag around the claim number differs between them.
Independent claims have this pattern:
<li class="claim"> <div id="c-en-0001" num="0001" class="claim">
Dependent claims have this pattern:
</li> <li class="claim-dependent"> <div id="c-en-0002" num="0002" class="claim">
2. Claim text
<div class="claim-text">A process for producing a compound represented by formula (II):
<chemistry id="chem0047" num="0047"> <div class="patent-image"> <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png"> <img id="ib0047" file="imgb0047.tif" wi="40" he="40" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png" class="patent-full-image" width="160" height="160" alt="Figure imgb0047"> </a> </div> <attachments> <attachment idref="chem0047" attachment-type="cdx" file="CDX"> </attachment> <attachment idref="chem0047" attachment-type="mol" file="MOL"> </attachment> </attachments> </chemistry>
(wherein Y represents -COR, wherein R represents a C1-C8 alkoxy group, a C6-C14 aryloxy group, a C2-C8 alkenyloxy group, a C7-C26 aralkyloxy group, or a di(C1-C6 alkyl)amino group; and R<sup>1</sup> represents a C2-C7 alkoxycarbonyl group), which comprises treating a compound represented by formula (I):
<chemistry id="chem0048" num="0048"> <div class="patent-image"> <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png"> <img id="ib0048" file="imgb0048.tif" wi="26" he="34" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png" class="patent-full-image" width="104" height="136" alt="Figure imgb0048"> </a> </div> <attachments> <attachment idref="chem0048" attachment-type="cdx" file="CDX"> </attachment> <attachment idref="chem0048" attachment-type="mol" file="MOL"> </attachment> </attachments> </chemistry>
(wherein Y has the same meaning as defined above) in a solvent with aqueous ammonia or a solution of ammonia in C1-C4 alcohol and, subsequently, with a di(C1-C6 alkyl) dicarbonate.</div>
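A minimal sketch of turning one claim-text `<div>` into plain text. It swaps in regular expressions (`VBScript.RegExp`) rather than a real HTML parser, which is enough for markup shaped like the sample above. `ClaimPlainText` is a hypothetical name. The structural formulas exist only as images inside the `<chemistry>` blocks, so they are dropped.

```vba
' Hypothetical helper: strip tags from a claim's HTML and tidy the whitespace.
Function ClaimPlainText(ByVal claimHtml As String) As String
    Dim re As Object, s As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "<[^>]+>"            ' any tag, including <chemistry> and <img>
    s = re.Replace(claimHtml, "")
    re.Pattern = "\s+"                ' runs of spaces/newlines -> one space
    s = re.Replace(s, " ")
    ' Decode a few common entities after tag removal (&amp; last)
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    s = Replace(s, "&amp;", "&")
    ClaimPlainText = Trim$(s)
End Function
```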
I would like either two separate macros or one macro with an option, so that I can choose which claims I want: dependent or independent.
The claim text follows the claim number, and the element around the claim number also determines whether the claim is dependent or independent.
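The "one macro with an option" variant can be sketched as a single function with a `kind` argument. It assumes the structure seen in the saved page: every claim is an `<li class="claim">` or `<li class="claim-dependent">` closed by `</li>`, and its first `<div>` carries `num="NNNN"`. `ListClaims` is a hypothetical name, and the regular expressions stand in for a proper HTML parser.

```vba
' Hypothetical sketch: list the claims of one kind as plain-text lines.
' kind: "independent", "dependent" or "all".
Function ListClaims(ByVal html As String, ByVal kind As String) As String
    Dim reClaim As Object, reTag As Object, reWs As Object
    Dim m As Object, isDep As Boolean, wanted As Boolean
    Dim txt As String, result As String

    Set reClaim = CreateObject("VBScript.RegExp")
    reClaim.Global = True
    reClaim.Pattern = "<li class=""(claim|claim-dependent)"">\s*<div[^>]*\bnum=""(\d+)""[^>]*>([\s\S]*?)</li>"
    Set reTag = CreateObject("VBScript.RegExp")
    reTag.Global = True
    reTag.Pattern = "<[^>]+>"
    Set reWs = CreateObject("VBScript.RegExp")
    reWs.Global = True
    reWs.Pattern = "\s+"

    kind = LCase$(kind)
    For Each m In reClaim.Execute(html)
        isDep = (m.SubMatches(0) = "claim-dependent")
        wanted = (kind = "all") Or (kind = "dependent" And isDep) _
                 Or (kind = "independent" And Not isDep)
        If wanted Then
            txt = reTag.Replace(m.SubMatches(2), "")    ' drop tags, incl. <chemistry> images
            txt = Trim$(reWs.Replace(txt, " "))        ' collapse whitespace
            ' "0002" -> "2. ": literal text, not Word list numbering
            result = result & CLng(m.SubMatches(1)) & ". " & txt & vbCr
        End If
    Next m
    ListClaims = result
End Function
```

Two separate macros would then just call `ListClaims(html, "independent")` and `ListClaims(html, "dependent")`.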
I want the claim numbering to be plain text, not Word's automatic list numbering.
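A sketch of writing the result into the document so the numbers stay plain text, assuming the output should go below the number in the same table cell. `AppendPlainTextToCell` is a hypothetical name. As far as I know, text inserted through the object model does not trigger AutoFormat As You Type, so "1. ..." is not turned into a list; `ListFormat.RemoveNumbers` additionally clears any list formatting the new paragraphs might inherit.

```vba
' Hypothetical helper: append text below the existing content of a table cell.
Sub AppendPlainTextToCell(ByVal oCell As Cell, ByVal text As String)
    Dim r As Range
    Set r = oCell.Range
    r.MoveEnd 1, -1              ' 1 = wdCharacter: step back over the end-of-cell marker
    r.Collapse 0                 ' 0 = wdCollapseEnd
    r.InsertAfter vbCr & text    ' the range now covers the inserted text
    r.ListFormat.RemoveNumbers   ' make sure no list numbering is applied
End Sub
```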
Is this possible with a Word macro?
The Word files can be found here:
Sorry for not attaching them; attachments are blocked on my PC.
https://sites.google.com/site/rahula...edirects=0&d=1
https://sites.google.com/site/rahula...edirects=0&d=1