#1
|
||||
|
||||
Extract Data From Text file based on Pattern
I have Table like this in word
In this table there are tow types of row Row having multiple numbers Row having single number In both the case I would like to find number in first line only I have a large .txt file (800 mb) containing text having formate. 8232394 06774483 N 19850910 19870818 19910818 EXP. 8309716 06774483 N 19850910 19870818 19910319 REM. 4687262 06908244 N 19860917 19870818 19990815 EXP. 4687262 06908244 N 19860917 19870818 19990309 REM. 4687262 06908244 N 19860917 19870818 19950221 M184 4687262 06908244 N 19860917 19870818 19910108 M173 4687262 06908244 N 19860917 19870818 19880802 ASPN 4687263 06868897 N 19860527 19870818 19990128 M185 4687263 06868897 N 19860527 19870818 19950509 RMPN 4687263 06868897 N 19860527 19870818 19950509 ASPN 4687263 06868897 N 19860527 19870818 19950119 M184 4687263 06868897 N 19860527 19870818 19910311 ASPN 4687263 06868897 N 19860527 19870818 19910124 M173 4687264 06882047 N 19860703 19870818 19990815 EXP. 4687264 06882047 N 19860703 19870818 19990309 REM. 4687264 06882047 N 19860703 19870818 19950503 RMPN 4687264 06882047 N 19860703 19870818 19950503 ASPN 4687264 06882047 N 19860703 19870818 19950119 M184 4687264 06882047 N 19860703 19870818 19910311 ASPN RE45781 14176526 N 20140210 20151027 20150929 ASPN RE45786 14260890 N 20140424 20151027 20150929 ASPN RE45790 14454285 Y 20140807 20151103 20151008 ASPN RE45793 13445791 N 20120412 20151103 20151006 ASPN there are three important column here Column I is having Patent Numbers Column 5 is having event Date column 6 is having event status code I want to search in ms word US [0-9]{7} or US [0-9,]{9} or US RE[0-9]{5} and use number after US (without coma in second case) to crawl in .txt file and extract data from last column in space separated file. for example if my word table is having US 8,309,716 B2 I would like to search corresponding number i.e. 8309716 from .txt file and extract data at last column for example in this case EXP. .txt file sometimes contain Numbers somewhere else, I just want to search in column 1 only. in case of multiple result during search in column 1, I would like to keep search having latest column 5 date. I want to extract event code from the text file to Word table column 3 After finding event code I want to replace it with corresponding event from another word document having following two column Finally the table should look like this. |
#2
|
||||
|
||||
An 800MB text file is way too large to process - I'm not even sure Word could open it. The published limit is 512MB with no more than 32MB of text. I suggest you use another app to filter the data to eliminate everything that doesn't match your US/RE pattern. You may even need to break those up into smaller chunks.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#3
|
||||
|
||||
Extract Data from Text file based on Pattern Match
I was able to cut the large file to small based on the pattern. I used word wild card search to find patent numbers and made a txt file neming small
I found answers from superuser to cut the file for what I want http://superuser.com/questions/99066...search-pattern grep -f small_list.txt big_file.txt Now I have made the resultant file in excel having only 3 column 1st column containing Patent numbers, second for event date and last for event code. Is it possible with small files to do the above task I also created word document having table with same as excel file if thats easy to be used. |
#4
|
||||
|
||||
Re: Extract Data From Text file based on Pattern.
I have Table like this in word (main.docx)
My search cell may contain multiple numbers. I would like to search for first line only in every case. I have another database word/Excel (database1.docx/xlsx) file containing following table. 4818798 19970406 EXP. 4818798 19961112 REM. 4818798 19921116 ASPN 4818798 19920810 M183 4894153 19980118 EXP. 4894153 19970826 REM. 4894153 19930715 M283 4896965 20020130 EXP. 4896965 20010821 REM. 4896965 19970203 M184 4896965 19930630 M183 5021423 19990606 EXP. 5021423 19981229 REM. 5021423 19941117 M183 5107211 19960421 EXP. 5107211 19951128 REM. 5117870 19960602 EXP. 5117870 19960109 REM. 5185937 20010218 EXP. Column one contains patent nubers Column two contains event date in yyyymmdd formate Column three contains event code in short formate. I Want to search (i): US ([3-9][0-9,RE]{7}) in my word document and search for corresponding \1 result in database1 file and put column 3 ^13 column 2 data in (i+1) cell. In case of multiple result from database file I would like to consider event code with latest event date. Generally it is first entry but if possible I would like to consider the date as filter as in future USPTO may change the formate. After putting the event code in target cell (i+1) I would like to replace it with real event from another two column database2 word/excel file(database2.docx/xlsx. I just want to find and replace Event code with real events as in table. Files can be found at: https://sites.google.com/site/rtsk2015/fo I am not able to attach it in my office as it is forbidden because of cyberoam. Sorry for setting up strict requrement as I am at learning stage. Last edited by PRA007; 10-26-2015 at 10:38 PM. Reason: Editing Requested by admin |
#5
|
||||
|
||||
It would be helpful if you posted an actual Word document and an actual Excel worksheet, rather than expecting others to recreate what they think you might be working with, only to find out it's something different.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#6
|
||||
|
||||
Files are uploaded and link to file is edited in above post.
I am not able to upload files because of cyberoam in my office. Files can be found at: https://sites.google.com/site/rtsk2015/fo Last edited by PRA007; 10-27-2015 at 03:06 AM. |
#7
|
||||
|
||||
Try:
Code:
Sub GetPatentStatus() Application.ScreenUpdating = False Dim Tbl As Table, i As Long, j As Long, k As Long, lRow As Long, ArrFnd Dim xlApp As Object, xlWkBk As Object, xlWkSht As Object, xlRng As Object Dim bStrt As Boolean, bFnd As Boolean, bOpen As Boolean, bBar As Boolean, bFit As Boolean Dim StrTxt As String, StrWkBkNm As String, StrFnd As String, StrWkSht As String 'Word Find expressions ArrFnd = Array("US [0-9]{7}", "US [0-9,]{9}", "US RE[0-9]{5}") 'Excel constants for use with late binding Const xlCellTypeLastCell As Long = 11: Const xlValues As Long = -4163 Const xlWhole As Long = 1: Const xlByRows As Long = 1 'Excel workbook name & path StrWkBkNm = "C:\Users\" & Environ("Username") & "\Downloads\Database1.xlsx" 'Excel worksheet name StrWkSht = "Sheet1" If Dir(StrWkBkNm) = "" Then MsgBox "Cannot find the designated workbook: " & StrWkBkNm, vbExclamation Exit Sub End If bStrt = False ' Flag to record if we start Excel, so we can close it later. bOpen = False ' Flag to record if we open the workbook, so we can close it later. ' Test whether Excel is already running. On Error Resume Next Set xlApp = GetObject(, "Excel.Application") 'Start Excel if it isn't running If xlApp Is Nothing Then Set xlApp = CreateObject("Excel.Application") If xlApp Is Nothing Then MsgBox "Can't start Excel.", vbExclamation Exit Sub End If ' Record that we've started Excel. bStrt = True End If On Error GoTo 0 With xlApp 'Hide our Excel session if we started it If bStrt = True Then .Visible = False 'Check if the workbook is open. For Each xlWkBk In .Workbooks If xlWkBk.FullName = StrWkBkNm Then ' It's open Set xlWkBk = xlWkBk bOpen = True Exit For End If Next ' If not open by the current user. If bOpen = False Then ' Check if another user has it open. If IsFileLocked(StrWkBkNm) = True Then ' Report and exit if true MsgBox "The Excel workbook is in use." & vbCr & "Please try again later.", vbExclamation, "File in use" GoTo ErrExit End If ' The file is available, so open it. Set xlWkBk = .Workbooks.Open(FileName:=StrWkBkNm) If xlWkBk Is Nothing Then MsgBox "Cannot open:" & vbCr & StrWkBkNm, vbExclamation GoTo ErrExit End If End If On Error Resume Next Set xlWkSht = xlWkBk.Sheets(StrWkSht) On Error GoTo 0 If xlWkSht Is Nothing Then MsgBox "Cannot find the worksheet named: '" & StrWkSht & "' in:" & vbCr & StrWkBkNm, vbExclamation GoTo ErrExit End If With xlWkSht.UsedRange lRow = .Cells.SpecialCells(xlCellTypeLastCell).Row End With End With ' Store current Status Bar status, then switch on bBar = Application.DisplayStatusBar Application.DisplayStatusBar = True With ActiveDocument For Each Tbl In .Tables With Tbl bFit = .AllowAutoFit .AllowAutoFit = False j = .Rows.Count For i = 1 To j Application.StatusBar = "Processing row " & i & " of " & j StrTxt = "" With .Cell(i, 2).Range.Paragraphs(1).Range 'Find the references For k = 0 To UBound(ArrFnd) StrFnd = ArrFnd(k): bFnd = False With .Find .ClearFormatting .Replacement.ClearFormatting .Format = False .Forward = True .Wrap = wdFindStop .MatchWildcards = True .Text = StrFnd .Execute End With If .Find.Found Then StrTxt = Split(.Text, " ")(1): bFnd = True With xlWkSht Set xlRng = .Range("A1:A" & lRow).Find(What:=StrTxt, LookIn:=xlValues, _ LookAt:=xlWhole, SearchOrder:=xlByRows, MatchCase:=True, SearchFormat:=False) If Not xlRng Is Nothing Then StrTxt = .Cells(xlRng.Row, 3).Text Else StrTxt = "" End If End With End If If bFnd = True Then Exit For Next End With With .Cell(i, 3).Range If Len(.Text) > 2 Then .InsertBefore StrTxt & vbCr Else .InsertBefore StrTxt End If End With Next .AllowAutoFit = bFit End With Next End With ' Clear the Status Bar Application.StatusBar = False ' Restore original Status Bar status Application.DisplayStatusBar = bBar MsgBox "Finished!", vbInformation ErrExit: If Not xlWkBk Is Nothing Then If bOpen = False Then xlWkBk.Close If Not xlApp Is Nothing Then If bStrt = True Then xlApp.Quit Set xlRng = Nothing: Set xlWkSht = Nothing: Set xlWkBk = Nothing: Set xlApp = Nothing Application.ScreenUpdating = True End Sub Function IsFileLocked(strFileName As String) As Boolean On Error Resume Next Open strFileName For Binary Access Read Write Lock Read Write As #1 Close #1 IsFileLocked = Err.Number Err.Clear End Function
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#8
|
||||
|
||||
WORKED.... still throws error '91' on some occasion. I will work on solution one given by you. Can we make code not to disturb content already present in target cell. I mean can we paste the resulting content to be pasted before the text present in the content.
I made Big mistake in pinning image for final.docx as from my office while pasting the images it is not visible. The image shown in above question for final.docx is actually of database 2 word file. Any way after finding event code I may use simple recorded macro to convert event code to actual event without bothering about database 2 file. |
#9
|
||||
|
||||
You could try changing:
.Cell(i, 3).Range.Text = StrTxt to: .Cell(i, 3).Range.InsertBefore StrTxt & vbcr
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#10
|
||||
|
||||
After trial and error I could find why my error '91' is still there.
Coincidently document I provided contains all the occurrences of word find in the corresponding database files. In my day to day word there may be some US numbers that may not be present in excel document. In those cases I want to go ahead without adding anything. Is there any way to skip if the search not foud |
#11
|
||||
|
||||
Quote:
That's what the code already does.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#12
|
||||
|
||||
It worked perfect for first sample files I provided.
Please forgive me for providing new sample files as it seems the only solution for now. In my current sample file there are all the elements (or say possible problems). For my current file till raw 4 your code works fine after that when it comes to raw 5 it shows error. I thought error is because US RE44048 is not present in the database 1 file. I don't know what the problem is but it shows I got temporary solution. I recorded macro to remove US [0-9]{5} and US [0-9]{8,11} in a temporary It works fine for now. I Replaced column 3 then with original document. Code:
Run-time Error '91': Object Variable or with block variable not set. I don't have any clue whats going on now. Please find my files at: https://sites.google.com/site/rtsk2015/fo Download: Main.docx Download: Database1.xlsx Last edited by PRA007; 10-31-2015 at 04:40 AM. |
#13
|
||||
|
||||
I've updated the original macro.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word] |
#14
|
||||
|
||||
That is Huge Help.
I have Now complete solution (from MSoffice forum and superuser.com) how to get Patent fees status from USPTO. |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to Extract Data from table based on pattern. | PRA007 | Word Tables | 4 | 03-17-2015 11:05 PM |
Retrieving data from data base based on text selection | capitala | PowerPoint | 0 | 12-10-2014 08:10 AM |
Inserting text from one word file into another based on an excel input | jmaxcy | Excel | 14 | 11-01-2013 04:07 PM |
Inserting text from one word file into another based on an excel input | jmaxcy | Word | 3 | 11-01-2013 01:26 AM |
Is there a way to do this? (automatically enter text based on form data) | TIKKI555 | Word | 0 | 05-26-2010 09:21 AM |