Microsoft Office Forums

#1
06-23-2016, 05:36 AM
bbreukelen (Novice; Mac OS X, Office 2016)
Join Date: Jun 2016
Posts: 6

How to clean useless tags from docx files

When I create a Word document and save it as .docx, the XML inside the file is filled with tags holding information that is not always useful.

For instance, every misspelled word marked in red is surrounded by proofing tags, replaced text has its own tags, and so on.

A single simple word often takes a couple of sentences' worth of XML to describe.
To convert the .docx, I need these useless tags removed.
Does Word offer a function to clean up these tags, or does anyone know of a tool that runs on a Mac and does this?

Thanks!
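A minimal sketch of such a cleanup, assuming Python 3 and that the noise in question is limited to `w:proofErr` proofing markers and `w:rsid*` revision-ID attributes in `word/document.xml`. The helper names `strip_noise` and `clean_docx` are hypothetical. It edits the XML as text with regular expressions, so everything else in the file, including namespace declarations, is left untouched:

```python
import re
import zipfile

# Proofing markers Word writes around flagged words, e.g. <w:proofErr w:type="spellStart"/>
PROOF_ERR = re.compile(r'<w:proofErr\b[^>]*/>')
# Revision-session ID attributes such as w:rsidR="00323925", w:rsidRPr="...", w:rsidP="..."
RSID_ATTR = re.compile(r'\s+w:rsid[A-Za-z]*="[0-9A-Fa-f]*"')


def strip_noise(xml: str) -> str:
    """Remove proofing markers and rsid attributes from WordprocessingML text."""
    xml = PROOF_ERR.sub('', xml)
    return RSID_ATTR.sub('', xml)


def clean_docx(src: str, dst: str) -> None:
    """Copy the .docx at src to dst, cleaning only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, 'w', zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == 'word/document.xml':
                data = strip_noise(data.decode('utf-8')).encode('utf-8')
            # Reusing the ZipInfo keeps each part's name, order and compression type
            zout.writestr(item, data)
```

Usage would look like `clean_docx('invoice-template.docx', 'invoice-template-clean.docx')` (placeholder file names); it is worth opening the result in Word to confirm it still loads. Note that this only removes markers: a variable that Word has already split across two `w:r` runs stays split.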
#2
06-23-2016, 05:48 AM
Charles Kenyon (Moderator; Windows 8, Office 2013)
Join Date: Mar 2012
Location: Sun Prairie, Wisconsin
Posts: 9,533

"Convert Word document?"

Save As?
#3
06-23-2016, 09:05 AM
bbreukelen

Thanks, but that gives me a .doc file or similar. I do need the .docx format.
#4
06-23-2016, 09:38 AM
Charles Kenyon

You say you are trying to "convert" the document. Convert it to what? I am asking to see whether Word's internal conversion process can meet your needs. You can convert to .rtf or .txt, which should not have the tags but will not be XML files.

I do not know of any process within Word that will remove those components of its XML files. I would be surprised if there is one.
#5
06-23-2016, 04:21 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 22,467

Setting the document's proofing to 'don't check' should be sufficient to remove all the proofing tags. If there's replaced text, that suggests the document also has tracked changes that haven't been accepted or rejected. Accepting or rejecting them would eliminate those data and their tags, too.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
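To check whether proofing markers or unresolved tracked changes remain after following that advice, a small audit along these lines could count them. This is a sketch assuming Python 3's standard `xml.etree.ElementTree` and that you pass in the contents of `word/document.xml`; `audit` is a hypothetical helper name, and it only looks at `w:proofErr`, `w:ins` and `w:del` elements (not, for example, tracked moves or formatting changes):

```python
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def audit(document_xml) -> dict:
    """Count proofing markers and unresolved tracked insertions/deletions.

    document_xml: the raw bytes of word/document.xml (a str without an
    XML declaration also works).
    """
    root = ET.fromstring(document_xml)
    counts = {'proofErr': 0, 'ins': 0, 'del': 0}
    for el in root.iter():
        # Strip the WordprocessingML namespace to get the local element name
        name = el.tag[len(W):] if el.tag.startswith(W) else None
        if name in counts:
            counts[name] += 1
    return counts
```

For a .docx on disk, the bytes can be read with `zipfile.ZipFile(path).read('word/document.xml')`. All three counts at zero means the proofing and revision markup is gone.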
#6
06-24-2016, 12:56 AM
bbreukelen

Disabling proofing helps. Track Changes was off; I'll do some more digging to see what else I can disable.

I was hoping there is a tool somewhere that removes all XML tags that are not needed for the content or formatting.

To clarify, I am using phpdocx to take a .docx template, replace the many variables defined in the template with text, and then save the result as a PDF document. Replacing the variables takes seconds because the XML has all these extra tags in between them. Without those tags, phpdocx would need only milliseconds.
#7
06-24-2016, 01:36 AM
macropod

Quote:
Originally Posted by bbreukelen
To clarify, I am using phpdocx to take a docx template, replace lots of variables defined in the template for text and then save it as a PDF document.
I have no idea what phpdocx is, so that doesn't actually add clarity. Furthermore, it's far from apparent what that might have to do with taking a template and saving it as a PDF file.

Note: there is no such thing as a docx template. Word templates use dot, dotx or dotm extensions, not docx.
#8
06-24-2016, 02:12 AM
bbreukelen

phpdocx lets a web server take a .docx document as input, manipulate it, and output another .docx file or a PDF.

For instance, I could create an invoice in Word with all the formatting, but instead of the actual content such as client details and amounts, I put in variables like $client_name$ and $invoice_total$. phpdocx calls that a template. Then, in the PHP code, I replace those variables with the text I need. That lets the web server create an invoice with all the right information while keeping the flexibility to change the document's look and feel quickly, without any coding.
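The substitution step described above can be sketched as follows, assuming Python 3 and that each variable is written as `$name$` using letters, digits and underscores. `fill` is a hypothetical helper for illustration, not phpdocx's API. It works on text that is already contiguous, which is exactly what breaks when Word splits a variable across runs:

```python
import re

# $name$ where name is letters, digits or underscores (an assumption about the template)
PLACEHOLDER = re.compile(r'\$([A-Za-z0-9_]+)\$')


def fill(text: str, values: dict) -> str:
    """Replace each $name$ with values[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


print(fill('Invoice for $client_name$: $invoice_total$ ($note$)',
           {'client_name': 'Acme Ltd', 'invoice_total': '1,250.00'}))
# prints: Invoice for Acme Ltd: 1,250.00 ($note$)
```

If the values are substituted into raw XML rather than plain text, they should first be escaped (for example with `xml.sax.saxutils.escape`) so that characters such as `&` and `<` do not corrupt the document.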
#9
06-24-2016, 02:24 AM
macropod

Quote:
Originally Posted by bbreukelen
phpdocx enables a webserver to take a docx document as input, then manipulate it and spit out another docx file or pdf.
In which case, how is the presence of any tag in the docx file's xml code at all relevant? After all, all you're concerned with is the final PDF and none of the docx xml code would be present in that...
#10
06-24-2016, 02:41 AM
bbreukelen

phpdocx has to go through all the XML to find the variables and replace them.

One of my hundreds of variables on a page is called $c12lvl1$, but in the .docx it is stored as follows:

HTML Code:
<w:r>
  <w:t>$c12</w:t>
</w:r>
<w:r w:rsidR="00323925">
  <w:t>lvl1$</w:t>
</w:r>
So a simple search for $c12lvl1$ finds nothing, because the variable is split across two separate runs. Therefore, phpdocx has to go through the XML and remove all those unneeded tags and attributes, like the rsidR attribute above.
That takes 3-4 seconds per page, which makes processing far too slow for a 10-page document (30-40 seconds).

If I could do that cleanup once, before saving my .docx template, I would save about 3 seconds per page every time a document is generated.
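One way to do that cleanup once on the template, rather than on every generation, is to merge adjacent runs that carry identical formatting. The sketch below assumes Python 3's `xml.etree.ElementTree`; it handles only runs that are direct children of a `w:p` and contain nothing but an optional `w:rPr` plus a single `w:t`, and it drops `w:proofErr` markers in passing. `merge_runs` is a hypothetical helper. Run attributes such as `w:rsidR` are ignored when comparing, which is what allows the two halves of `$c12lvl1$` to be joined:

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
W = '{%s}' % W_NS
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'
ET.register_namespace('w', W_NS)  # serialize with the usual w: prefix


def _fmt_key(el):
    """Comparable key for an element: tag, attributes and children (whitespace ignored)."""
    return (el.tag, tuple(sorted(el.attrib.items())), tuple(_fmt_key(c) for c in el))


def _text_run(run):
    """(formatting key, text) for a run holding only an optional w:rPr plus one w:t,
    else None. Run attributes such as w:rsidR are deliberately not part of the key."""
    children = list(run)
    rpr = children[0] if children and children[0].tag == W + 'rPr' else None
    rest = children[1:] if rpr is not None else children
    if len(rest) != 1 or rest[0].tag != W + 't':
        return None
    key = _fmt_key(rpr) if rpr is not None else None
    return key, rest[0].text or ''


def merge_runs(paragraph):
    """Merge consecutive simple text runs with identical w:rPr, in place."""
    prev_run, prev_key = None, None
    for child in list(paragraph):
        if child.tag == W + 'proofErr':
            paragraph.remove(child)  # proofing marker: drop it and keep merging
            continue
        info = _text_run(child) if child.tag == W + 'r' else None
        if info is None:
            prev_run = None  # tabs, fields, bookmarks etc. end the current merge
            continue
        key, text = info
        if prev_run is not None and key == prev_key:
            t = prev_run.find(W + 't')
            t.text = (t.text or '') + text
            t.set(XML_SPACE, 'preserve')  # keep leading/trailing spaces intact
            paragraph.remove(child)
        else:
            prev_run, prev_key = child, key
```

To apply it to a whole document, call it on every paragraph from `root.iter(W + 'p')` and write the XML back into the .docx. Be aware that ElementTree does not keep the original namespace declarations when serializing, and Word may reject a file whose `mc:Ignorable` prefixes are no longer declared; a parser that preserves them, such as lxml, is the safer choice for production use.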
#11
06-24-2016, 02:43 AM
macropod

Does phpdocx have to work with the XML? Could it not work with bookmarks in the document, for example?
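As an illustration of the bookmark approach, the sketch below replaces whatever runs sit between a named `w:bookmarkStart` and its matching `w:bookmarkEnd` with a single new run, so it no longer matters how Word split the original text. It assumes Python 3's `xml.etree.ElementTree` and that both bookmark markers are direct children of the same paragraph; `fill_bookmark` is a hypothetical helper, not part of Word or phpdocx:

```python
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'


def fill_bookmark(paragraph, name, value):
    """Replace the runs between bookmark `name`'s start and end markers with one
    run containing `value`. Returns False if the bookmark is not in this w:p.
    Assumes both markers are direct children of the paragraph."""
    children = list(paragraph)
    start = next((c for c in children
                  if c.tag == W + 'bookmarkStart' and c.get(W + 'name') == name), None)
    if start is None:
        return False
    bookmark_id = start.get(W + 'id')
    # Collect runs up to the bookmarkEnd carrying the same w:id
    runs = []
    for c in children[children.index(start) + 1:]:
        if c.tag == W + 'bookmarkEnd' and c.get(W + 'id') == bookmark_id:
            break
        if c.tag == W + 'r':
            runs.append(c)
    # Reuse the first old run's formatting, if it had any
    rpr = runs[0].find(W + 'rPr') if runs else None
    for r in runs:
        paragraph.remove(r)
    new_run = ET.Element(W + 'r')
    if rpr is not None:
        new_run.append(rpr)
    t = ET.SubElement(new_run, W + 't')
    t.text = value
    t.set(XML_SPACE, 'preserve')
    paragraph.insert(list(paragraph).index(start) + 1, new_run)
    return True
```

For example, `fill_bookmark(p, 'client_name', 'Acme Ltd')` on a paragraph containing a bookmark named `client_name` leaves exactly one run between the markers, however many runs Word had created there.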
#12
06-24-2016, 02:47 AM
bbreukelen

I guess it has to, yes. But I don't know exactly what it does; it's not open source.
#13
06-24-2016, 02:55 AM
macropod

I'd suggest looking into its capabilities, then, since working with bookmarks (or even strings such as $c12lvl1$) in the body of the document is quite easy.
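Along those lines, a quick way to find which `$...$` variables in a template have been split across runs is to compare each paragraph's combined text with the text of its individual `w:t` elements. A sketch, assuming Python 3's `xml.etree.ElementTree`, variables made of letters, digits and underscores, and the hypothetical helper name `split_placeholders`; text boxes nested inside a paragraph are not handled separately:

```python
import re
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PLACEHOLDER = re.compile(r'\$[A-Za-z0-9_]+\$')


def split_placeholders(document_xml):
    """Return variables that appear in a paragraph's combined text but not
    inside any single w:t element, i.e. ones a plain search would miss."""
    root = ET.fromstring(document_xml)
    split = []
    for p in root.iter(W + 'p'):
        texts = [t.text or '' for t in p.iter(W + 't')]
        # Variables that are fully contained in one w:t are fine as they are
        intact = {m for s in texts for m in PLACEHOLDER.findall(s)}
        split.extend(m for m in PLACEHOLDER.findall(''.join(texts)) if m not in intact)
    return split
```

Running it on the template before uploading shows which variables need retyping (or other cleanup) in Word, so they can be fixed once rather than on every generation.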

All times are GMT -7.


MSOfficeForums.com is not affiliated with Microsoft