Microsoft Office Forums

Extract Webdata from word
Posted by PRA007, 10-12-2015, 11:31 PM (Windows 7 32bit, Office 2010 32bit)

Can I extract web data into Word based on a pattern in a Microsoft Word document?

For example:

I have a document like this:



I want to find the patent numbers in the document and extract data from Google Patents based on their pattern.

If a cell has multiple numbers, I want only the first line to be searched for the pattern.
For WO numbers, I want to extract the title from Google Patents.
For the others, like US, EP and CN, I want only the claims to be extracted from Google Patents.
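
The three rules above could be sketched in VBA roughly as follows. This is only a sketch under assumptions: FirstLineOfCell and WantedField are hypothetical helper names, and it assumes the numbers sit in table cells.

```vba
' Hypothetical helper: returns only the first line of a table cell's text.
Function FirstLineOfCell(oCell As Cell) As String
    Dim strText As String
    Dim lngPos As Long
    strText = oCell.Range.Text
    ' Treat manual line breaks (Chr(11)) like paragraph marks (Chr(13)).
    strText = Replace(strText, Chr(11), Chr(13))
    lngPos = InStr(strText, Chr(13))
    If lngPos > 0 Then strText = Left$(strText, lngPos - 1)
    ' Drop the end-of-cell marker if it is still present.
    strText = Replace(strText, Chr(7), "")
    FirstLineOfCell = Trim$(strText)
End Function

' Hypothetical helper: "title" for WO numbers, "claims" for US/EP/CN,
' and an empty string for anything else.
Function WantedField(strNumber As String) As String
    Select Case UCase$(Left$(Trim$(strNumber), 2))
        Case "WO": WantedField = "title"
        Case "US", "EP", "CN": WantedField = "claims"
        Case Else: WantedField = ""
    End Select
End Function
```

A calling macro could loop over the table cells, pass FirstLineOfCell(oCell) to WantedField, and skip the cell when the result is empty.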

There will also be some numbers for which the Google Patents link might not work. I want the code to ignore them.
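
One possible way to ignore numbers whose link does not work is to request the page and treat any error or non-200 status as "nothing to extract". A minimal sketch, assuming the MSXML2.XMLHTTP object is available on the PC; GetPageHtml is a hypothetical name, and whether Google Patents returns the full page to such a request is not verified here.

```vba
' Hypothetical helper: returns the page HTML, or "" if the request fails.
Function GetPageHtml(strUrl As String) As String
    Dim oHttp As Object
    On Error GoTo lbl_Fail
    Set oHttp = CreateObject("MSXML2.XMLHTTP")
    oHttp.Open "GET", strUrl, False      ' False = synchronous request
    oHttp.send
    ' Anything other than HTTP 200 leaves the result empty.
    If oHttp.Status = 200 Then GetPageHtml = oHttp.responseText
lbl_Exit:
    Set oHttp = Nothing
    Exit Function
lbl_Fail:
    GetPageHtml = ""
    Resume lbl_Exit
End Function
```

The caller would then test Len(GetPageHtml(strLink)) = 0 and move on to the next number when that is true.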

The pattern for converting a number to a Google Patents link is as follows.

I learned from this forum how to convert numbers to links:


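Code:
Sub AddGPHLink()
' Turns patent numbers such as "EP 2431370 B1" into Google Patents hyperlinks.
Dim oRng As Range
Dim strLink As String
    Set oRng = ActiveDocument.Range
    With oRng.Find
        ' Two letters from the set USEPCNAWO, four or more digits, then a one- or two-character kind code.
        Do While .Execute(FindText:="([USEPCNAWO]{2}) ([0-9]{4,}) ([A-Z0-9]{1,2})", MatchWildcards:=True)
            ' Remove the spaces to build the link, e.g. "EP2431370B1".
            strLink = oRng.Text
            strLink = Replace(strLink, Chr(32), "")
            strLink = "https://www.google.co.in/patents/" & strLink & "?cl=en"
            ActiveDocument.Hyperlinks.Add Anchor:=oRng, _
                                          Address:=strLink, _
                                          TextToDisplay:=oRng.Text
            ' Move past the new hyperlink field so the same number is not found again.
            oRng.End = oRng.Fields(1).Result.End
            oRng.Collapse wdCollapseEnd
        Loop
    End With
lbl_Exit:
    Set oRng = Nothing
    Exit Sub
End Sub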
This macro can be used to build the target link.

Here is an example Google Patents page:

https://www.google.co.in/patents/EP2431370B1?cl=en



I want to extract data from the web page into the Word document like this:



On the web page there are two fields I want to target.


The first looks like this:


On line 3 of the page source it is written like this:
<html style="height: 100%;"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents</title>...

I want
"Optically active diamine derivative and process for producing the same"
to be extracted into Word as the title.
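
A rough sketch of how the title could be pulled out of the page source, assuming the title tag always has the form "Patent NUMBER - TITLE - Google Patents" as in the line quoted above. ExtractTitle is a hypothetical name, and the late-bound VBScript.RegExp object is used so no reference needs to be set.

```vba
' Hypothetical helper: pulls the patent title out of the page's <title> tag.
' Returns "" when the tag does not match the expected form.
Function ExtractTitle(strHtml As String) As String
    Dim oRegEx As Object
    Dim oMatches As Object
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.IgnoreCase = True
    ' Capture the text between "Patent <number> - " and " - Google Patents".
    oRegEx.Pattern = "<title>Patent\s+\S+\s+-\s+(.*?)\s+-\s+Google Patents</title>"
    Set oMatches = oRegEx.Execute(strHtml)
    If oMatches.Count > 0 Then ExtractTitle = oMatches(0).SubMatches(0)
    Set oMatches = Nothing
    Set oRegEx = Nothing
End Function
```

HTML entities such as &amp; are not decoded by this sketch, so a title containing them would need an extra Replace step.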

The second is the claims.
There are two fields in each claim:



1. Claim number
Claim numbers come in two types.

Independent claims have a pattern like this:
<li class="claim"> <div id="c-en-0001" num="0001" class="claim">

Dependent claims have a pattern like this:
</li> <li class="claim-dependent"> <div id="c-en-0002" num="0002" class="claim">


2. Claim text
<div class="claim-text">A process for producing a compound represented by formula (II):
<chemistry id="chem0047" num="0047"> <div class="patent-image"> <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png"> <img id="ib0047" file="imgb0047.tif" wi="40" he="40" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0047.png" class="patent-full-image" width="160" height="160" alt="Figure imgb0047"> </a> </div> <attachments> <attachment idref="chem0047" attachment-type="cdx" file="CDX"> </attachment> <attachment idref="chem0047" attachment-type="mol" file="MOL"> </attachment> </attachments> </chemistry>
(wherein Y represents -COR, wherein R represents a C1-C8 alkoxy group, a C6-C14 aryloxy group, a C2-C8 alkenyloxy group, a C7-C26 aralkyloxy group, or a di(C1-C6 alkyl)amino group; and R<sup>1</sup> represents a C2-C7 alkoxycarbonyl group), which comprises treating a compound represented by formula (I):
<chemistry id="chem0048" num="0048"> <div class="patent-image"> <a href="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png"> <img id="ib0048" file="imgb0048.tif" wi="26" he="34" img-content="chem" img-format="tif" src="./Patent EP1925611A1 - Optically active diamine derivative and process for producing the same - Google Patents_files/imgb0048.png" class="patent-full-image" width="104" height="136" alt="Figure imgb0048"> </a> </div> <attachments> <attachment idref="chem0048" attachment-type="cdx" file="CDX"> </attachment> <attachment idref="chem0048" attachment-type="mol" file="MOL"> </attachment> </attachments> </chemistry>
(wherein Y has the same meaning as defined above) in a solvent with aqueous ammonia or a solution of ammonia in C1-C4 alcohol and, subsequently, with a di(C1-C6 alkyl) dicarbonate.</div>

I would like to have either two different macros, or one macro with an option, to choose which claims I want: dependent or independent.

Each claim's text follows its claim number element, which also determines whether the claim is dependent or independent.

I want the claim numbering to be plain text.
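
One way this could work as a single function with a switch is sketched below. It is an assumption-laden sketch, not a tested solution: ExtractClaims and blnIndependentOnly are hypothetical names, and it assumes every claim sits in an li element with class "claim" or "claim-dependent", carries a num attribute on its first div, and contains no nested li elements.

```vba
' Hypothetical helper: returns the claims as plain text, one per paragraph,
' numbered "1.", "2." and so on. Pass True to keep independent claims only.
Function ExtractClaims(strHtml As String, blnIndependentOnly As Boolean) As String
    Dim oClaimRx As Object, oTagRx As Object, oSpaceRx As Object
    Dim oMatch As Object
    Dim strText As String, strOut As String

    ' SubMatches: (0) li class, (1) num attribute, (2) everything up to </li>.
    Set oClaimRx = CreateObject("VBScript.RegExp")
    oClaimRx.Global = True
    oClaimRx.IgnoreCase = True
    oClaimRx.Pattern = "<li class=""(claim|claim-dependent)"">\s*<div[^>]*\bnum=""(\d+)""[^>]*>([\s\S]*?)</li>"

    ' Matches any HTML tag, including the chemistry image markup.
    Set oTagRx = CreateObject("VBScript.RegExp")
    oTagRx.Global = True
    oTagRx.Pattern = "<[^>]+>"

    ' Matches runs of whitespace left behind after the tags are removed.
    Set oSpaceRx = CreateObject("VBScript.RegExp")
    oSpaceRx.Global = True
    oSpaceRx.Pattern = "\s+"

    For Each oMatch In oClaimRx.Execute(strHtml)
        If Not (blnIndependentOnly And LCase$(oMatch.SubMatches(0)) = "claim-dependent") Then
            strText = oTagRx.Replace(oMatch.SubMatches(2), "")
            strText = Trim$(oSpaceRx.Replace(strText, " "))
            ' CLng turns "0001" into 1, so the number is typed as plain text.
            strOut = strOut & CLng(oMatch.SubMatches(1)) & ". " & strText & vbCr
        End If
    Next oMatch
    ExtractClaims = strOut
End Function
```

Because the tags are simply deleted, formula images are dropped and superscripts such as R<sup>1</sup> come out as "R1". The result can be written into the document with Range.InsertAfter, which keeps the numbers as typed text rather than automatic list numbering.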

Is this possible with a Word macro?

The Word files can be found here.
Sorry for not attaching them as attachments; that is blocked on my PC.

https://sites.google.com/site/rahula...edirects=0&d=1

https://sites.google.com/site/rahula...edirects=0&d=1

Tags
macro, website, word 2010


