Microsoft Office Forums

  #1  
03-03-2019, 04:17 AM
totoMSOF (Novice; Windows 7 64bit, Office 2010)
Regex over 700 matches in a long doc


Hello,
I have to move more than 700 bibliographic references into footnotes in a document hundreds of pages long. I'm trying to do this with Word VBA, but I'm almost a complete beginner, and documentation on the basics (object model, etc.) seems hard to find.
Here is what I've tried; the main problem I'm facing is described after the code. A sketch of the code:

Code:
Sub Replace_ref_1()

' Definition of all variables
' ...

Set docRange = ActiveDocument.Range

' Definition of the Regex needed  
With regEx
   ... ' Definition with required options etc. => OK
End With

Options.Overtype = False ' Insert mode, not overtype
ActiveDocument.TrackRevisions = False ' Turn off Track Changes
' Regex matches
Set regFound = regEx.Execute(docRange)

' Loop over the results, starting from the last one (highest index)
For cpt = regFound.Count To 1 Step -1
    refText = regFound(cpt - 1)
    Selection.SetRange Start:=regFound.Item(cpt - 1).FirstIndex, End:=regFound.Item(cpt - 1).FirstIndex
    Selection.Collapse Direction:=wdCollapseStart
    Selection.Footnotes.Add Range:=Selection.Range, Text:=LTrim(refText)
Next cpt

MsgBox "Number of matches processed = " & regFound.Count

End Sub
The idea is to use the index of each match (which, if I understand correctly, is the position of the first character of the matched string) to insert a footnote at that place.
The problem is that the footnote marks are not inserted at the correct place: in general they land a few characters to a few dozen characters before the expected position. I thought this was because adding footnotes adds characters to the text, which would invalidate the indexes. That's why I tried processing from the last match to the first (see the descending loop). (Beforehand, I checked that the matches were in increasing index order by writing them to a file.) But this is no better...

I also noticed that for a portion of text with only 3 matches, it works. But as soon as the text is too long, it doesn't work, for either the first or the last matches. It also works better if I remove the summary (which is at the beginning), but even with that "trick", for a portion of the document with 133 matches, only the first 57 are correct, and all the following ones are shifted (with no change in formatting between the 57th and 58th match...).

I guess there may be a problem in my understanding of the index or, more likely, of which portion of text the indexes cover...

I would be glad if someone could help me with this.

Thanks.
  #2  
03-03-2019, 01:49 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

You're unlikely to need regex for this. That said, without a sample document showing the kind of content you're trying to process, we'd only be guessing.

Can you attach a document to a post with some representative data (delete anything sensitive)? You do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
  #3  
03-03-2019, 02:49 PM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

The regex pattern is as follows:
.Pattern = " \(\[[A-Z][A-Z0-9-&*]*\](,[^\)]+|)\)"

Here are two example files (ignore the text itself; I took it from the Internet to have an English document):
1) input_example.doc, with a list of references in the text that follow the pattern above (e.g. " ([ABC], p. 145)");
2) output_example.doc, which is the same document with the references in footnotes.

I got the output example using my code, but to do so I had to remove the summary and make sure there were not too many references. Otherwise, I get the behavior described in the first post...

Even if this job doesn't require regex, I'm afraid I'll run into the same kind of problem with the exact positioning of footnotes. So any help is appreciated.

EDIT: 1) I'd also like to understand what's wrong with my code, just in case...
2) Correction to the regex (replace * with + and swap ) for \) at the end).
Attached Files
File Type: doc input_example.doc (52.5 KB, 9 views)
File Type: doc output_example.doc (54.0 KB, 9 views)
  #4  
03-03-2019, 03:02 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim StrNt As String
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "\(\[[!\(\[]@\][!\(]@\)"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrNt = .Text
    .Text = vbNullString
    .Footnotes.Add .Duplicate, , StrNt
    .Find.Execute
  Loop
End With
Application.ScreenUpdating = True
End Sub
  #5  
03-03-2019, 03:28 PM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

It seems to work well on the sample I tried (a third of my whole document, including the summary), but it didn't work on the entire document (i.e. it took too long: more than 5 minutes without finishing).
But I'll try again later, and I'll come back after studying the code, because I guess I'll have many questions...

Thanks for the help.
  #6  
03-03-2019, 03:38 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

Quote:
Originally Posted by totoMSOF View Post
It seems to work well on the sample I tried (a third of my whole document, including the summary), but it didn't work on the entire document (i.e. it took too long: more than 5 minutes without finishing).
Taking more than 5 minutes is hardly evidence of "it didn't work". Inserting a DoEvents command in the loop and triggering it periodically might speed things up, though, by giving Word some breathing space for housekeeping. For example:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, StrNt As String
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "\(\[[!\(\[]@\][!\(]@\)"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    i = i + 1
    StrNt = .Text
    .Text = vbNullString
    .Footnotes.Add .Duplicate, , StrNt
    If i Mod 50 = 0 Then DoEvents
    .Find.Execute
  Loop
End With
Application.ScreenUpdating = True
End Sub
  #7  
03-03-2019, 11:45 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

Cross-posted at: https://stackoverflow.com/questions/...-in-a-long-doc
For cross-posting etiquette, please read: http://www.excelguru.ca/content.php?184
  #8  
03-04-2019, 04:17 AM
totoMSOF (Novice; Windows 7 64bit, Office 2010)


I am sorry for cross-posting. I didn't know the conventions, but I've just read the etiquette and from now on I will mention it explicitly if I feel I have to cross-post.
(Indeed, I asked the question on a French forum but didn't get an answer, so I thought it would be hard to get one unless I posted on several sites...)
  #9  
03-04-2019, 12:33 PM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

"Taking more than 5 minutes is hardly evidence of "it didn't work"."
I agree; that's why I pointed out that it was a matter of duration. I could only compare with my own attempt, which doesn't put the footnote marks at the correct place but did this bad job in less than 2 minutes.
I made another try with the first code, and after 5 minutes Windows warned me that it was running low on memory. I ignored it, and my computer almost crashed (I could hardly close Word, which was using 5 GB of RAM...).

The last code, with DoEvents, ran for 10 minutes with the application still unresponsive. I had to stop it, and the file is broken... :-(
  #10  
03-04-2019, 02:09 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

With 700+ footnotes to create, it's not going to happen in the blink of an eye. There may also be other aspects of your document you haven't told us about (e.g. 'Track Changes' is 'on') that might compromise the macro. We can only work with what we're given.
  #11  
03-04-2019, 03:03 PM
Guessed (Expert; Windows 10, Office 2016)

If the code is running into memory issues but appearing to do the right thing then you could maybe modify the counter to exit the sub after 50 replacements.

If the resulting doc is fine then save the doc and run it again and see if that is also OK.

After a couple of successful runs, you could change the number to 100 or higher. Starting the macro 7 times is a whole lot better than never getting a result.

Perhaps throwing in a Save would clear the memory instead of doing the DoEvents.
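
As a rough illustration of the batching idea above, here is a minimal sketch. It reuses the Find settings from the macro posted earlier in this thread; the name DemoInBatches and the constant BATCH_SIZE are made up for this example, and whether saving actually frees memory is only a guess.

```vba
Sub DemoInBatches()
    ' Hypothetical cap on the number of references converted per run.
    Const BATCH_SIZE As Long = 50
    Dim i As Long, StrNt As String

    Application.ScreenUpdating = False
    With ActiveDocument.Range
        With .Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "\(\[[!\(\[]@\][!\(]@\)"   ' pattern from the earlier macro
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = True
            .Execute
        End With
        Do While .Find.Found
            i = i + 1
            StrNt = .Text                       ' keep the reference text
            .Text = vbNullString                ' remove it from the body
            .Footnotes.Add .Duplicate, , StrNt  ' put it in a footnote
            If i >= BATCH_SIZE Then Exit Do     ' stop after one batch
            .Find.Execute
        Loop
    End With
    Application.ScreenUpdating = True

    ' Save between batches (prompts for a file name if never saved).
    ActiveDocument.Save
    MsgBox i & " references moved to footnotes in this run."
End Sub
```

Each run starts again from the top of the main text. References converted by an earlier run are no longer in the body, so running the macro repeatedly should continue with the remaining ones until the message reports 0.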
__________________
Andrew Lockton
Chrysalis Design, Melbourne Australia
  #12  
03-07-2019, 07:15 AM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

"With 700+ footnotes to create, it's not going to happen in the blink of an eye. There may also be other aspects of your document you haven't told us about (e.g. 'Track Changes' is 'on') that might compromise the macro. We can only work with what we're given."

Yes, of course. As I said, I am comparing with the regex code I used, which doesn't bring my computer down and does run (with the major problem explained above...).

At the address below you can find the exact version of the document I used, as regards presentation, styles, text, etc., which I anonymized today (it took a long time! Does a tool exist for this purpose?).

http://dl.free.fr/fGzSid05k
(The file exceeds the 500 KB allowed by this forum.)
  #13  
03-07-2019, 07:23 AM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

@Guessed: I had already thought about it, but in an automated way, i.e. looping over the document to run my regex code on each slice of 100 pages. But I'm a beginner with Word VBA and it's hard to deal with Range and Selection (does anybody know a good introduction to these basics?). What happens is that the indexes I get are counted from the beginning of the selection (e.g. from page 200 to page 300), but the footnotes are placed as if the selection started at the beginning of my document!

But I keep thinking about it, and I would be very grateful if someone could answer one of my initial questions: what exactly does the index (in the regex object) represent, and how does it work? I am not far from what I am looking for, so I'd like to understand what is wrong. Then I will be able to concentrate on the Find approach...

Thanks.
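
On the index question: FirstIndex is a zero-based offset into the string handed to regEx.Execute, here the plain text of the range. Selection.SetRange, on the other hand, expects character positions in the document, and as far as I know those also count things that do not appear in that string (field codes, for example, if the summary is a table of contents field). The two numberings can therefore drift apart, and an offset taken from a slice of the document is relative to that slice, not to the document start. A small diagnostic sketch, assuming nothing beyond the standard Word object model (the Sub name is made up), can show whether the two disagree in a given document:

```vba
Sub CompareTextLengthWithRange()
    ' Compares the length of the plain text the regex sees with the
    ' number of character positions Word counts for the same range.
    Dim rng As Range
    Set rng = ActiveDocument.Content

    Debug.Print "Characters in .Text: "; Len(rng.Text)
    Debug.Print "Range positions:     "; rng.End - rng.Start
    ' Likely sources of a difference (an assumption, not a full list):
    Debug.Print "Fields in main text: "; rng.Fields.Count
    Debug.Print "Tables in main text: "; rng.Tables.Count
End Sub
```

If the first two numbers printed in the Immediate window differ, string offsets such as FirstIndex cannot be used directly as Range positions. Locating each reference with Find, as in the macros in this thread, avoids the conversion altogether.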
  #14  
03-07-2019, 11:03 AM
totoMSOF (Novice; Windows 7 64bit, Office 2010)

After some tries: I can see the page count increasing very quickly, so there is a problem here. I don't think it is due to the number of footnotes added, because with only 17 footnotes, 9 more pages are displayed!
I have to investigate what happens here...

Also, the mouse pointer starts to freeze as the number of matches processed increases...
  #15  
03-07-2019, 03:28 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

The Find expression is having issues with strings like ([ABC]). It's fine with strings like ([ABC]x), however. Change the Find expression to:
.Text = "\(\[[!\(\[]@\]*\)"
and the code should run to completion in less than 1 minute.
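
Before running the full macro on a long document, it may help to check what the revised expression matches without changing anything. The following is only a sketch: the Sub name and the safety cap MAX_MATCHES are invented for this example. In the expression, \( and \[ are literal characters, [!\(\[]@ means one or more characters that are neither ( nor [, and * stands for any run of characters up to the closing \).

```vba
Sub CountReferenceMatches()
    ' Dry run: counts wildcard matches without editing the document.
    Const MAX_MATCHES As Long = 10000   ' hypothetical guard against a runaway loop
    Dim n As Long

    With ActiveDocument.Range
        With .Find
            .ClearFormatting
            .Text = "\(\[[!\(\[]@\]*\)"
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = True
        End With
        Do While .Find.Execute
            n = n + 1
            If n >= MAX_MATCHES Then Exit Do
            .Collapse wdCollapseEnd     ' continue after the current match
        Loop
    End With

    MsgBox "Matches found: " & n
End Sub
```

If the count is far from the number of references expected, the expression probably needs adjusting before any footnotes are created.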

Tags
regex, replace
