Thanks Paul! However, I don't think I made myself clear at all; my mistake, really.
My use case is a little more specific. Since most of the paragraphs I want to extract contain a reference, I thought I would first collect the references and then use them to collect the paragraphs. To do so, I would rely on the fact that in most paragraphs the reference is written in italics. By finding all italicised text, I could build a collection of search strings, remove the duplicates, and then run the search. So in this first step I would be looking for the strings themselves, not the complete paragraphs.
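To make the two-pass idea concrete, here is a minimal Python sketch. It assumes the references have already been collected as plain strings and that each document has already been split into paragraphs; the function names and the sample data are hypothetical and only for illustration.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the order in which items first appear."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


def paragraphs_with_references(paragraphs, references):
    """Return the paragraphs that contain at least one of the references."""
    refs = unique_in_order(references)
    return [p for p in paragraphs if any(ref in p for ref in refs)]


# Hypothetical sample data.
paragraphs = [
    "As noted in Report A, costs rose.",
    "This paragraph cites nothing.",
    "Report B disagrees with Report A.",
]
refs = ["Report A", "Report B", "Report A"]

print(unique_in_order(refs))  # ['Report A', 'Report B']
print(paragraphs_with_references(paragraphs, refs))
```

Deduplicating first simply avoids searching for the same reference twice; the matching here is a plain case-sensitive substring test, which may need loosening for real documents.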
The trouble is, as you told me above, that searching for italics may be a problem. So I thought I'd replace all italic text with tags, such as the *** discussed above. (Though if there is a way to do this with the italics directly, that would be amazing.) After doing that, I could look for every string marked with the *** tag (or in italics, if at all possible) and build a list, which would in turn allow me to search through my collection of documents. Does that make any sense?
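For the extraction step, a rough Python sketch might look like the following. It assumes the document has been exported as plain text and that each italic run was wrapped with the marker on both sides, i.e. `***text***`; that wrapping convention and the function name are assumptions, not something settled in the thread.

```python
import re

# Assumption: every italic run was wrapped as ***text*** before export.
# The non-greedy (.+?) stops at the first closing *** after each opening one.
TAGGED = re.compile(r"\*\*\*(.+?)\*\*\*")


def tagged_strings(text):
    """Return the unique ***-tagged strings in order of first appearance."""
    found = (m.strip() for m in TAGGED.findall(text))
    # Skip empty matches and drop duplicates while keeping the original order.
    return list(dict.fromkeys(s for s in found if s))


sample = "See ***Report A*** and ***Report B***; also ***Report A*** again."
print(tagged_strings(sample))  # ['Report A', 'Report B']
```

The resulting list could then serve directly as the set of search strings for the second pass over the documents.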
Again, thank you so much for contributing to this.