11-04-2024, 10:05 AM
skydivetom (Windows 11, Office 2019)
Novice
 
Join Date: Mar 2023
Posts: 9
Modify Existing REGEX Macro to Extract Acronyms

Hello,

I currently use a REGEX-based macro in MS Word to extract all uppercase words and generate a separate document listing these words in a table. This has been particularly helpful for creating acronym lists, such as those used in appendices. However, I would like to modify the existing macro to reduce "false positives," that is, words that are in uppercase but are not actual acronyms.

To illustrate, I have attached two sample documents along with their respective outputs:

1. "Sample Document with Acronyms v01"
Contents: This document contains 10 uppercase acronyms highlighted in green. There is also a mixed-case acronym ("IoT," standing for "Internet of Things") marked in yellow. The current macro does not recognize mixed-case acronyms.

Process:
  • Open the document.
  • Select "View" from the Office ribbon.
  • Navigate to "Macros" > "View Macros" and run "ExtractAcronymsToNewDocument."
  • The macro generates a new Word document listing all found uppercase words in a table, with columns for "Acronym," "Definition" (left blank), and "Page" (indicating the first occurrence).
Outcome: This method works well but misses the mixed-case acronym ("IoT").

2. "Sample Document with Acronyms v02"
Contents: This is a copy of v01 with added words ("TABLE OF CONTENTS," "PARA #1," "PARA #2") marked in red. These uppercase headers are not acronyms but are extracted by the macro.
Process: Running the macro produces a new document that lists the 10 green acronyms but also includes the 4 additional uppercase header words (false positives).

Issues with the Current Macro:
  • Over-Inclusion: The macro extracts all uppercase words, including headers and other non-acronym text.
  • Under-Inclusion: It does not extract acronyms that use mixed-case formatting, such as "IoT."
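
To make the two issues concrete, here is a minimal sketch in Python rather than VBA. It is not the macro itself: the pattern `\b[A-Z]{2,}\b` is only an assumption about what an "all uppercase words" pattern typically looks like, and the sample sentence and the function name `extract_uppercase_words` are made up for illustration.

```python
import re

# ASSUMPTION: a typical "all-uppercase word" pattern; the macro's real
# pattern is in the attached files and may differ.
UPPERCASE_WORD = re.compile(r"\b[A-Z]{2,}\b")


def extract_uppercase_words(text):
    """Return unique uppercase words in order of first occurrence."""
    found = []
    for word in UPPERCASE_WORD.findall(text):
        if word not in found:
            found.append(word)
    return found


# Hypothetical sample text: a header plus one mixed-case and one uppercase acronym.
sample = (
    "TABLE OF CONTENTS. The Internet of Things (IoT) uses a "
    "Global Positioning System (GPS)."
)

print(extract_uppercase_words(sample))
```

With this assumed pattern, the header words "TABLE," "OF," and "CONTENTS" are picked up alongside "GPS" (over-inclusion), while "IoT" is skipped because it contains a lowercase letter (under-inclusion).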

Questions:
Q1: How can the REGEX macro be adjusted to only include uppercase words within parentheses, potentially reducing false positives?
Q2: Is there a way to modify the macro to extract mixed-case acronyms like "IoT," in addition to the standard uppercase acronyms?
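
As a rough idea of what I am after in Q1 and Q2, the sketch below (again in Python, not VBA) shows two candidate patterns. Both are assumptions on my part rather than tested macro code: `PAREN_UPPER` keeps only all-uppercase tokens enclosed in parentheses, and `PAREN_MIXED` also accepts parenthesised tokens that start with an uppercase letter and contain at least one more uppercase letter. The function name and the sample text are hypothetical. The patterns use only character classes and a capture group, so they should be portable to the regex engine used from VBA, though I have not verified that.

```python
import re

# Q1 (assumed pattern): all-uppercase token of 2+ letters inside parentheses.
PAREN_UPPER = re.compile(r"\(([A-Z]{2,})\)")

# Q2 (assumed pattern): parenthesised token that starts with an uppercase
# letter and contains at least one further uppercase letter, e.g. "IoT".
PAREN_MIXED = re.compile(r"\(([A-Z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*)\)")


def extract_parenthesised(text, pattern):
    """Return unique captured tokens in order of first occurrence."""
    found = []
    for match in pattern.finditer(text):
        token = match.group(1)  # text between the parentheses
        if token not in found:
            found.append(token)
    return found


# Hypothetical sample text modelled on the "v02" document described above.
sample = (
    "TABLE OF CONTENTS\n"
    "PARA #1\n"
    "The Internet of Things (IoT) relies on the Global Positioning "
    "System (GPS). GPS is also used elsewhere (see PARA #2)."
)

print(extract_parenthesised(sample, PAREN_UPPER))
print(extract_parenthesised(sample, PAREN_MIXED))
```

The obvious trade-off is that an acronym that is never introduced in parentheses anywhere in the document would be missed by either pattern.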

Even a solution that addresses only Q1 would be highly valuable, as the current macro generates an overwhelming number of false positives in lengthy documents.

Thank you for considering my request, and I appreciate any guidance you can provide!

Attached Files
Sample Documents and Outputs.zip (44.1 KB)