Can I extract web data to Word based on a pattern in the Microsoft Word document?
For example, I have a document like this: ![]()

I want to find the patent numbers in the document and extract data from Google Patents based on their pattern.

If a cell contains multiple numbers, I want only the first line to be searched for the pattern.

For WO numbers, I want to extract the title from Google Patents.

For others, such as US, EP and CN, I want only the claims to be extracted from Google Patents.

There will also be some numbers for which the Google Patents link might not work. I want the code to ignore them.

The pattern for converting a number to a Google Patents link is as follows. I learned from this forum how to convert numbers to links. Code:
```vba
Sub AddGPHLink()
    Dim oRng As Range
    Dim strLink As String
    Set oRng = ActiveDocument.Range
    With oRng.Find
        ' Wildcard search for numbers such as "EP 2431370 B1"
        Do While .Execute(FindText:="([USEPCNAWO]{2}) ([0-9]{4,}) ([A-Z0-9]{1,2})", MatchWildcards:=True)
            strLink = oRng.Text
            ' Remove the spaces, then build the Google Patents address
            strLink = Replace(strLink, Chr(32), "")
            strLink = "https://www.google.co.in/patents/" & strLink & "?cl=en"
            ActiveDocument.Hyperlinks.Add Anchor:=oRng, _
                Address:=strLink, _
                TextToDisplay:=oRng.Text
            ' Move past the new hyperlink field before searching again
            oRng.End = oRng.Fields(1).Result.End
            oRng.Collapse 0
        Loop
    End With
lbl_Exit:
    Set oRng = Nothing
    Exit Sub
End Sub
```

This is an example Google Patents page: https://www.google.co.in/patents/EP2431370B1?cl=en ![]()

I want to extract data from the web page into the Word document like this: ![]()

On the web page there are two fields I want to target.

The first is the title. On line 3 of the page source it is written like this:

```html
<html style="height: 100%;"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents</title>
<script src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/cb=gapi.loaded_0" async=""></script>
<script>(function(){(function(){function e(a){this.t={};this.tick=function(a,c,b){var d=void 0!=b?b:(new Date).getTime();this.t[a]=[d,c];if(void 0==b)try{window.console.timeStamp("CSI/"+a)}catch(e){}};this.tick("start",null,a)}var a;window.performance&&(a=window.performance.timing );var f=a?new e(a.responseStart):new e;window.jstiming={Timer:e,load:f};if(a){var c=a.navigationStart,d=a.responseStart;0<c&&d>=c&&( window.jstiming.srt=d-c)}if(a){var b=window.jstiming.load;0<c&&d>=c&&(b.tick("_wtsrt" ,void 0,c),b.tick("wtsrt_",
```

I want "Optically active diamine derivative and process for producing the same" to be extracted to Word as the title.

The second is the claims. There are two fields in a claim:

1. Claim number. Claim numbers come in two types. Independent claims have a pattern like this:

```html
<li class="claim"> <div id="c-en-0001" num="0001" class="claim">
```

Dependent claims have a pattern like this:

```html
</li> <li class="claim-dependent"> <div id="c-en-0002" num="0002" class="claim">
```

2. Claim text:

```html
<div class="claim-text">A process for producing a compound represented by formula (II):
  <chemistry id="chem0047" num="0047">
    <div class="patent-image">
      <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png">
        <img id="ib0047" file="imgb0047.tif" wi="40" he="40" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png" class="patent-full-image" width="160" height="160" alt="Figure imgb0047">
      </a>
    </div>
    <attachments>
      <attachment idref="chem0047" attachment-type="cdx" file="CDX"> </attachment>
      <attachment idref="chem0047" attachment-type="mol" file="MOL"> </attachment>
    </attachments>
  </chemistry>
  (wherein Y represents -COR, wherein R represents a C1-C8 alkoxy group, a C6-C14 aryloxy group, a C2-C8 alkenyloxy group, a C7-C26 aralkyloxy group, or a di(C1-C6 alkyl)amino group; and R<sup>1</sup> represents a C2-C7 alkoxycarbonyl group), which comprises treating a compound represented by formula (I):
  <chemistry id="chem0048" num="0048">
    <div class="patent-image">
      <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png">
        <img id="ib0048" file="imgb0048.tif" wi="26" he="34" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png" class="patent-full-image" width="104" height="136" alt="Figure imgb0048">
      </a>
    </div>
    <attachments>
      <attachment idref="chem0048" attachment-type="cdx" file="CDX"> </attachment>
      <attachment idref="chem0048" attachment-type="mol" file="MOL"> </attachment>
    </attachments>
  </chemistry>
  (wherein Y has the same meaning as defined above) in a solvent with aqueous ammonia or a solution of ammonia in C1-C4 alcohol and, subsequently, with a di(C1-C6 alkyl) dicarbonate.</div>
```

I would like to have either two different macros or one macro that lets me choose which claims I want, dependent or independent. The claim text comes after the claim number markup, which also determines whether the claim is dependent or independent. I want the claim numbering to be plain text.

Is this possible through a Word macro?

The Word files can be found here. Sorry for not attaching them as attachments, as that is forbidden on my PC.

https://sites.google.com/site/rahula...edirects=0&d=1

https://sites.google.com/site/rahula...edirects=0&d=1
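
To make the request concrete, here is a rough, untested sketch of the extraction step. It assumes the page source really contains the `<title>` and `<li class="claim">` / `<li class="claim-dependent">` markup quoted above, which may not match what the site serves today. The helper names (`PatentUrl`, `FetchHtml`, `ExtractTitle`, `ExtractClaims`, `DemoExtract`) are made up for illustration. Using `MSXML2.XMLHTTP` and `VBScript.RegExp` through `CreateObject` is one possible approach, not necessarily the best one.

```vba
Option Explicit

' Hypothetical helper: build the link the same way AddGPHLink does.
Function PatentUrl(ByVal strNumber As String) As String
    PatentUrl = "https://www.google.co.in/patents/" & _
                Replace(strNumber, Chr(32), "") & "?cl=en"
End Function

' Hypothetical helper: download the page source.
' Returns "" when the request fails, so the caller can skip that number.
Function FetchHtml(ByVal strUrl As String) As String
    Dim oHttp As Object
    On Error GoTo Failed
    Set oHttp = CreateObject("MSXML2.XMLHTTP")
    oHttp.Open "GET", strUrl, False
    oHttp.send
    If oHttp.Status = 200 Then FetchHtml = oHttp.responseText
    Exit Function
Failed:
    FetchHtml = ""
End Function

' Assumes a title of the form "Patent <number> - <title> - Google Patents".
Function ExtractTitle(ByVal strHtml As String) As String
    Dim oRe As Object, oMatches As Object
    Set oRe = CreateObject("VBScript.RegExp")
    oRe.IgnoreCase = True
    oRe.Pattern = "<title>Patent \S+ - ([\s\S]*?) - Google Patents</title>"
    Set oMatches = oRe.Execute(strHtml)
    If oMatches.Count > 0 Then ExtractTitle = oMatches(0).SubMatches(0)
End Function

' Returns the claims as plain text, one per paragraph, numbered from the
' num="0001" attribute. Assumes each claim sits in its own <li>...</li>.
Function ExtractClaims(ByVal strHtml As String, _
                       ByVal bIndependentOnly As Boolean) As String
    Dim oItem As Object, oNum As Object, oTag As Object, oSpace As Object
    Dim oMatch As Object, oNums As Object
    Dim strBody As String, strOut As String

    Set oItem = CreateObject("VBScript.RegExp")
    oItem.Global = True
    oItem.Pattern = "<li class=""(claim|claim-dependent)"">([\s\S]*?)</li>"

    Set oNum = CreateObject("VBScript.RegExp")
    oNum.Pattern = "num=""(\d+)"""

    Set oTag = CreateObject("VBScript.RegExp")
    oTag.Global = True
    oTag.Pattern = "<[^>]+>"          ' any HTML tag

    Set oSpace = CreateObject("VBScript.RegExp")
    oSpace.Global = True
    oSpace.Pattern = "\s+"            ' runs of whitespace

    For Each oMatch In oItem.Execute(strHtml)
        If Not (bIndependentOnly And oMatch.SubMatches(0) = "claim-dependent") Then
            Set oNums = oNum.Execute(oMatch.SubMatches(1))
            ' Strip the tags, then collapse the leftover whitespace
            strBody = oTag.Replace(oMatch.SubMatches(1), " ")
            strBody = Trim(oSpace.Replace(strBody, " "))
            ' "0001" becomes the plain-text prefix "1. "
            If oNums.Count > 0 Then
                strBody = CLng(oNums(0).SubMatches(0)) & ". " & strBody
            End If
            strOut = strOut & strBody & vbCr
        End If
    Next oMatch
    ExtractClaims = strOut
End Function

' Example: append the title and the independent claims to the document.
Sub DemoExtract()
    Dim strHtml As String
    strHtml = FetchHtml(PatentUrl("EP 2431370 B1"))
    If Len(strHtml) = 0 Then Exit Sub   ' link did not work: ignore this number
    ActiveDocument.Content.InsertAfter vbCr & ExtractTitle(strHtml) & _
        vbCr & ExtractClaims(strHtml, True)
End Sub
```

Passing `False` as the second argument of `ExtractClaims` would return dependent and independent claims together. Chemical structure images inside a claim are dropped by the tag stripping, so only the text survives.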
Tags: macro, website, word 2010