Microsoft Office Forums Extract Line of Text w/ specific characters up to the paragraph character, send to Excel

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 07-15-2015, 08:57 AM
dmarie123 dmarie123 is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2007
Novice
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel
 
Join Date: Dec 2012
Location: New Yuk
Posts: 23
dmarie123 is on a distinguished road
Default Extract Line of Text w/ specific characters up to the paragraph character, send to Excel

Hi all,



Long time no see . I swear I've seen similar questions but I can't find them now...

So, I receive the below data (I've "x"ed out the sensitive stuff) in a 380ish page report every day. There are about 1,000 instances of the text paragraph example. I need to extract the first line of each instance and send it to Excel so then I can import it into Access and build a query to search for businesses that have certain words in their names. However, getting the first line of text out of Word and into Excel so that I can then import it into Access is not as simple as I thought it would be. I found this, but it doesn't work at my end and I don't understand how to limit it by the paragraph character or get it to Excel.

Example:
150710000XXX CORP NAME, LLC
LAW : XXX LLC *FILER ADDRESS / PROCESS ADDRESS*
COUNTY : XXXX XXXXXXX NAME
EFF. DATE: 07/10/2015 123 XXXXXX ST. #XX
DUR. DATE: NEW YORK, NY XXXXX

I'd like to be able to extract the first line (bolded above) that holds that ID number, the corporation name and then ends at a paragraph return, and send those to Excel, two columns per record, one for the ID and one for the corporation name. The corp names don't always end in LLC so the paragraph return after the ID number should grab everything.

-The first six digits of the number beginning with 150710 above indicate the year, month and day. I'd like to leave the option open to enter either 4 or 6 digits, if we have to specify this now, because we will probably combine the reports for each week in to a single document.

-There is a paragraph character (^p) at the end of the line I want to extract because I was able to "find" them using Ctrl+F. There is not always a paragraph character preceding the 150710XXXXXXX number.

-Additionally there are three spaces between the ID Number and the corporation name.

Based on my limited knowledge of VBA I'd say a pop-up box that requests the prefix I want to search for would be best? Then, hypothetically, Word would use the entered prefix with a wild card to find all the instances that contain that prefix, extracting the line of text to the paragraph return. Ultimately those would all go into an Excel spreadsheet that I could then put in to Access so that I can build a query to search the keywords we're looking for as opposed to the poor soul that is currently Ctrl+Fing her way through almost 400 pages every day.

I'm using MS Office 2013. Going to update my profile right now.

Thank you in advance for your assistance.
Donna
Reply With Quote
  #2  
Old 07-15-2015, 05:34 PM
macropod's Avatar
macropod macropod is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 19,577
macropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to behold
Default

The lines you want can probably be found using a wildcard Find, something along the lines of:
Find = <[0-9]{4,6}[!^13]{4,}
but we'd probably need more of a description of what the lines to be found contain than what you've provided so far. In your example:
150710000XXX CORP NAME, LLC
I can understand what the first six digits (150710) represent, but then you have a further three 0s followed by what I suppose are just three random alpha characters followed by what appears to be just a single space before the corporation name. After the date, are there always three digits followed by three alpha characters, all as an unbroken string? Are there actually three spaces before the corporation name, or just those three alpha characters?
__________________
Cheers,
Paul Edstein
[MS MVP - Word]
Reply With Quote
  #3  
Old 07-16-2015, 09:49 PM
dmarie123 dmarie123 is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 8 Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2013
Novice
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel
 
Join Date: Dec 2012
Location: New Yuk
Posts: 23
dmarie123 is on a distinguished road
Default

Hi Paul,

I was hoping you'd be the one to post. The last "x"s represent the ID number assigned by the state. The ID number that contains the 1507 prefix seems to be a standard length in all the reports I've seen so far but I don't know if the number of characters would change if all of a sudden there was a surge in corporations for that day.

After that number there are three blank spaces before the corporation name. At the end of every line containing the above there is a return. I wasn't sure if it was possible to find a prefix and then somehow search to "the end of the line".

Donna
Reply With Quote
  #4  
Old 07-16-2015, 10:09 PM
macropod's Avatar
macropod macropod is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 19,577
macropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to behold
Default

In that case, the lines you want can probably be found using a wildcard Find, along the lines of:
Find = <[0-9]{7,9}[ ]{3}[!^13]{1,}
If you do some testing with that, you should be able to confirm that it correctly matches all your extract records with no false matches.

As for transferring the data to Access, I really don't see why you need to involve Excel - I'd have thought one just export from Word to a delimited text file and import that directly into Access?
__________________
Cheers,
Paul Edstein
[MS MVP - Word]
Reply With Quote
  #5  
Old 07-16-2015, 11:02 PM
gmayor's Avatar
gmayor gmayor is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Expert
 
Join Date: Aug 2014
Posts: 2,950
gmayor is just really nicegmayor is just really nicegmayor is just really nicegmayor is just really nice
Default

While Paul is a whizz with wildcards, based on your comments I would do it differently. If the required paragraphs always begin with four or 6 numeric characters, the required paragraphs can be obtained by analysing those characters at the start of each paragraph e.g.

Code:
Dim oPara As Paragraph
Dim oRng As Range
    For Each oPara In ActiveDocument.Paragraphs
        If IsNumeric(Left(oPara.Range, 4)) Or IsNumeric(Left(oPara.Range, 6)) Then
            Set oRng = oPara.Range
            oRng.End = oRng.End - 1
            'do what you want here with orng.text
            MsgBox oRng.Text
        End If
    Next oPara
__________________
Graham Mayor - MS MVP (Word) (2002-2019)
Visit my web site for more programming tips and ready made processes www.gmayor.com
Reply With Quote
  #6  
Old 07-16-2015, 11:42 PM
macropod's Avatar
macropod macropod is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 19,577
macropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to behold
Default

I was thinking of something like:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim StrTmp As String, StrRec As String, StrDat As String
Dim i As Long, j As Long, x As Long, StrFlNm As String
StrFlNm = "C:\Users\" & Environ("UserName") & "\Documents\CorpData.txt"
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[0-9]{7,9}[ ]{3}[!^13]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found = True
    x = x + 1
    StrRec = .Text
    StrTmp = Split(StrRec, " ")(0)
    i = Len(StrRec): j = Len(StrTmp)
    StrDat = StrDat & Chr(34) & StrTmp & Chr(34) & vbTab & _
      Chr(34) & Trim(Right(StrRec, i - j)) & Chr(34) & vbCrLf
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
  If Len(StrDat) > 1 Then
    Close #1
    StrDat = Left(StrDat, Len(StrDat) - 1)
    Open StrFlNm For Output As #1
    Print #1, StrDat
    Close #1
  End If
End With
StatusBar = "Done! The output for " & x & " records is now in: " & StrFlNm
Application.ScreenUpdating = True
End Sub
This produces a tab-delimited text file and the whole process should be much faster than looping through every paragraph.
To produce a csv file instead (e.g. for import into Excel), simply change CorpData.txt to CorpData.csv and change vbTab to ",".
__________________
Cheers,
Paul Edstein
[MS MVP - Word]
Reply With Quote
  #7  
Old 07-17-2015, 05:24 AM
dmarie123 dmarie123 is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 8 Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2013
Novice
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel
 
Join Date: Dec 2012
Location: New Yuk
Posts: 23
dmarie123 is on a distinguished road
Default

I'm on my way in to the office now and will try both!!! THANK YOU THANK YOU THANK YOU.
Reply With Quote
  #8  
Old 07-19-2015, 08:56 PM
dmarie123 dmarie123 is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 8 Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2013
Novice
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel
 
Join Date: Dec 2012
Location: New Yuk
Posts: 23
dmarie123 is on a distinguished road
Default

Hi Paul,

Thank you for your help!! Once again MS Office is singing because of the macro you wrote. Initially when I ran the macro it wasn't extracting anything but I played around with the .text line and figured it out. It was my description to you that was not accurate, I used "XXX" which was interpreted as characters, I should've said digits. I'm happy I figured it out .

Now I have more questions, please let me know if I should start a new thread and mark this one solved...
So, I've processed about 10 of the reports with the macro and then queried the imported text file (which is AWESOME) in Access. I have around 45 search terms in the Access query but I'm only getting hits on about 2% of the names so I'm wondering if it would be better/possible to search what the macro extracts?

When you helped me with this: https://www.msofficeforums.com/word-...xtensions.html, the macro used a spreadsheet to search and replace which I think could be applied here? How would I build wildcards in to that? For example, if I want to search for "* MD*", so a space plus "MD", with wildcards at either end but outside the space that precedes the "M".

There are only 5 parameters in Access that need a wildcard then a space then the word to be searched and then another wildcard. Maybe I could have Word use an Excel spreadsheet for concrete search words and then still use Access for the 5 different ones? Throwing ideas out, not sure what the best approach is...

This is similar to the bulk find replace macro in that my string got to long, that what lead me to search and ultimately brought me to the world of VBA haha. If you think it's better to use the current macro and import to Access I trust your advice.

This is what is working for me right now, I changed the second curly bracket to "12":
Code:
Sub CorpData()
Application.ScreenUpdating = False
Dim StrTmp As String, StrRec As String, StrDat As String
Dim i As Long, j As Long, x As Long, StrFlNm As String
StrFlNm = "C:\Users\" & Environ("UserName") & "\Documents\CorpData.txt"
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[0-9]{12}[!^13]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found = True
    x = x + 1
    StrRec = .Text
    StrTmp = Split(StrRec, " ")(0)
    i = Len(StrRec): j = Len(StrTmp)
    StrDat = StrDat & Chr(34) & StrTmp & Chr(34) & vbTab & _
      Chr(34) & Trim(Right(StrRec, i - j)) & Chr(34) & vbCrLf
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
  If Len(StrDat) > 1 Then
    Close #1
    StrDat = Left(StrDat, Len(StrDat) - 1)
    Open StrFlNm For Output As #1
    Print #1, StrDat
    Close #1
  End If
End With
StatusBar = "Done! The output for " & x & " records is now in: " & StrFlNm
Application.ScreenUpdating = True
End Sub
Reply With Quote
  #9  
Old 07-19-2015, 09:03 PM
dmarie123 dmarie123 is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 8 Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2013
Novice
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel
 
Join Date: Dec 2012
Location: New Yuk
Posts: 23
dmarie123 is on a distinguished road
Default

Hi Graham,

Thank you for this option. The pop-up window doesn't work for this specific issue because I get about 1500 results per report so that's a lot of times to hit the "enter" key . Initially I thought the pop-up was the best option but there aren't any other 12 digit numbers in the report, lucky for me.

I was working all weekend so I didn't have much time to work with what you wrote for me but I did want to say thank you and also thank you for all the work you do on your blog. http://www.gmayor.com/replace_using_wildcards.htm is what helped me figure out why I couldn't get Paul's macro to extract what I wanted. You guys are great!! Every time I submit something I learn a ton. Once I have a chance to "play" with what you wrote I'll be back with more questions Thanks again!
Donna
Reply With Quote
  #10  
Old 07-20-2015, 12:07 AM
gmayor's Avatar
gmayor gmayor is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Expert
 
Join Date: Aug 2014
Posts: 2,950
gmayor is just really nicegmayor is just really nicegmayor is just really nicegmayor is just really nice
Default

The message box was merely an indication that the macro had recovered the correct piece of text. In a real world situation it would not be required. You would use the value shown displayed in the text box i.e. oRng.Text, to add the required content to the CSV fille.
__________________
Graham Mayor - MS MVP (Word) (2002-2019)
Visit my web site for more programming tips and ready made processes www.gmayor.com
Reply With Quote
  #11  
Old 07-20-2015, 12:16 AM
macropod's Avatar
macropod macropod is offline Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Windows 7 64bit Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Office 2010 32bit
Administrator
 
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 19,577
macropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to beholdmacropod is a splendid one to behold
Default

Hi Donna,
Quote:
Originally Posted by dmarie123 View Post
So, I've processed about 10 of the reports with the macro and then queried the imported text file (which is AWESOME) in Access. I have around 45 search terms in the Access query but I'm only getting hits on about 2% of the names so I'm wondering if it would be better/possible to search what the macro extracts?

When you helped me with this: https://www.msofficeforums.com/word-...xtensions.html, the macro used a spreadsheet to search and replace which I think could be applied here? How would I build wildcards in to that? For example, if I want to search for "* MD*", so a space plus "MD", with wildcards at either end but outside the space that precedes the "M".
I think you meant this thread: https://www.msofficeforums.com/word-...cel-range.html

In that macro, you'd change:
.MatchCase = True
.MatchWholeWord = True
to:
.MatchWildcards = True
Naturally, you'd need a workbook with valid wildcard Find/Replace expressions, too.
Quote:
There are only 5 parameters in Access that need a wildcard then a space then the word to be searched and then another wildcard. Maybe I could have Word use an Excel spreadsheet for concrete search words and then still use Access for the 5 different ones? Throwing ideas out, not sure what the best approach is...
I can't really give any advice on what you'd need for Access, as I have no practical experience with that application.
Quote:
This is what is working for me right now, I changed the second curly bracket to "12
Given what you said earlier, about the dates, you should probably have changed:
.Text = "<[0-9]{7,9}[ ]{3}[!^13]{1,}"
to:
.Text = "<[0-9]{10,12} [!^13]{1,}"
__________________
Cheers,
Paul Edstein
[MS MVP - Word]
Reply With Quote
Reply

Tags
excel 2013, extract word to excel, word 2013

Thread Tools
Display Modes


Similar Threads
Thread Thread Starter Forum Replies Last Post
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Macro to Insert text into the beginning on specific paragraphs unless the paragraph is blank caboy Word VBA 2 04-01-2015 07:00 AM
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel How can select from a specific character to another character mohsen.amiri Word 2 02-19-2015 11:38 PM
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel How to insert paragraph character after every 500 characters? aditya_bokade Word VBA 21 05-09-2014 03:56 AM
Extract Line of Text w/ specific characters up to the paragraph character, send to Excel Replace paragraph-marks (line-breaks) in tables with a character-string Aztec Word VBA 2 04-02-2013 10:52 PM
How to import a text file but skip the first line regardless of characters? omahadivision Excel Programming 7 02-01-2013 08:30 PM


All times are GMT -7. The time now is 10:19 PM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2019, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2019 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft