#1
When I create a Word document and save it as docx, the docx XML file is filled with tags that are not always useful.
For instance, every misspelled word marked in red is surrounded by proofing tags, replaced text has its own tags, and so on. One simple word often takes two sentences' worth of XML to describe. To convert the docx I need to get these useless tags removed. Does Word offer a function to clean up these tags, or does anyone know of a tool that runs on a Mac and does this? Thanks!
#2
"Convert Word document?"
Save As?
#3
Thanks but that gives me a doc file or whatever. I do need the docx format.
#4
You say you are trying to "convert" the document. Convert it to what? I am asking to see if Word's internal conversion process can meet your needs. You can convert to .rtf or .txt which should not have the tags but will not be XML files.
I do not know of any process within Word that will remove those components of its XML files. I would be surprised if there is one.
#5
Setting the document's proofing to 'don't check' should be sufficient to remove all the proofing tags... If there's replaced text, that suggests the document also has tracked changes that haven't been accepted/rejected. Doing so would eliminate those data and their tags, too.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
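
For a file that was already saved with proofing enabled, the markers could also be stripped after the fact. A minimal Python sketch, assuming the markers sit in word/document.xml as self-closing `<w:proofErr .../>` elements with the usual `w:` prefix; `strip_proofing_marks` is a hypothetical helper name, not a Word or phpdocx feature:

```python
import re

# Assumption: Word writes proofing markers as empty, self-closing elements,
# e.g. <w:proofErr w:type="spellStart"/> and <w:proofErr w:type="spellEnd"/>.
PROOF_ERR = re.compile(r"<w:proofErr\b[^>]*/>")


def strip_proofing_marks(document_xml: str) -> str:
    """Return the XML text with every <w:proofErr .../> marker removed."""
    return PROOF_ERR.sub("", document_xml)


# Hypothetical fragment: one misspelled word wrapped in spelling markers.
sample = (
    '<w:p><w:proofErr w:type="spellStart"/>'
    '<w:r><w:t>teh</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/></w:p>'
)
print(strip_proofing_marks(sample))
```

The markers carry no content of their own, so the run and its text are left untouched.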
#6
Disabling proofing helps. Track changes was off; I'll do some more digging to see what else I can disable.
I was hoping there is a tool somewhere that removes all XML tags that are not needed for the content or formatting. To clarify, I am using phpdocx to take a docx template, replace lots of variables defined in the template with text, and then save the result as a PDF document. Replacing the variables takes seconds because the XML has all these weird tags in between. Without those tags, phpdocx would need just milliseconds.
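
A one-off cleanup pass over the template could look roughly like the following. This is a Python sketch under stated assumptions, not an existing tool: a docx is a zip archive whose main part is word/document.xml, and the `w:rsid*` attributes are optional revision identifiers that can be dropped. `rewrite_docx`, `strip_rsids` and `demo` are hypothetical names.

```python
import io
import re
import zipfile
from typing import Callable

# Assumption: revision-id attributes look like w:rsidR="00323925", w:rsidRPr="..." etc.
RSID_ATTR = re.compile(r'\sw:rsid\w*="[^"]*"')


def strip_rsids(document_xml: str) -> str:
    """Drop every w:rsid* attribute from the XML text."""
    return RSID_ATTR.sub("", document_xml)


def rewrite_docx(src, dst, transform: Callable[[str], str]) -> None:
    """Copy the archive src to dst, passing word/document.xml through transform."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)


def demo() -> bytes:
    """Run the pass on a tiny in-memory archive and return the cleaned part."""
    source = io.BytesIO()
    with zipfile.ZipFile(source, "w") as z:
        z.writestr("word/document.xml",
                   '<w:r w:rsidR="00323925"><w:t>lvl1$</w:t></w:r>')
    result = io.BytesIO()
    rewrite_docx(source, result, strip_rsids)
    with zipfile.ZipFile(result) as z:
        return z.read("word/document.xml")
```

`src` and `dst` can be file paths as well as in-memory buffers. Note that removing attributes alone does not join runs that Word has already split.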
#7
Note: there is no such thing as a docx template. Word templates use dot, dotx or dotm extensions, not docx.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#8
phpdocx enables a web server to take a docx document as input, manipulate it, and output another docx file or a PDF.
For instance, I could create an invoice in Word with all the formatting, but instead of the actual content such as client details and amounts, I put in variables like $client_name$ and $invoice_total$. phpdocx calls that a template. Then, in the PHP code, I replace those variables with the text I need, so the web server can create an invoice with all the right information while keeping the flexibility to make quick changes to the look and feel of the document without any coding.
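
The substitution step can be pictured roughly like this. It is a Python sketch of the general idea only, not phpdocx's actual implementation (which is not shown in this thread); `fill_template` and the sample values are made up for illustration. It works on the raw XML text and only finds a variable when the whole `$name$` token sits in one piece.

```python
import re
from xml.sax.saxutils import escape

# Assumption: variables are written as $name$, using letters, digits and underscores.
PLACEHOLDER = re.compile(r"\$(\w+)\$")


def fill_template(document_xml: str, values: dict[str, str]) -> str:
    """Replace each $name$ with its XML-escaped value; unknown names are left as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            return match.group(0)
        return escape(values[name])

    return PLACEHOLDER.sub(substitute, document_xml)


# Hypothetical template fragment and data.
fragment = "<w:t>Invoice for $client_name$: $invoice_total$</w:t>"
filled = fill_template(fragment, {"client_name": "Smith & Co",
                                  "invoice_total": "120.00"})
print(filled)
```

Escaping the values matters because characters such as `&` would otherwise make the XML invalid.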
#9
In which case, how is the presence of any tag in the docx file's xml code at all relevant? After all, all you're concerned with is the final PDF and none of the docx xml code would be present in that...
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#10
phpdocx has to go through all the XML to find the variables and replace them.
One of my hundreds of variables in a page is called $c12lvl1$. But in the docx it is stored as follows:
HTML Code:

    <w:r>
      <w:t>$c12</w:t>
    </w:r>
    <w:r w:rsidR="00323925">
      <w:t>lvl1$</w:t>
    </w:r>

That takes 3-4 seconds per page, which makes the processing time far too long for a 10-page document. If I could do that cleanup before saving my docx template, I would save 3 seconds per page every time a document is generated.
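
To check a template for variables that Word has broken across runs like this, something along these lines could be used. It is a Python sketch, assuming variables look like `$name$` (letters, digits, underscores) and that the input is the content of word/document.xml; `find_split_placeholders` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
PLACEHOLDER = re.compile(r"\$\w+\$")


def find_split_placeholders(document_xml: str) -> list[str]:
    """List variables that only appear once the text runs of a paragraph are joined."""
    root = ET.fromstring(document_xml)
    split = []
    for para in root.iter(f"{W}p"):
        texts = [t.text or "" for t in para.iter(f"{W}t")]
        # Variables that sit whole inside a single <w:t> element.
        intact = set()
        for text in texts:
            intact.update(PLACEHOLDER.findall(text))
        # Variables visible in the paragraph's full text but not in any single run.
        for name in PLACEHOLDER.findall("".join(texts)):
            if name not in intact:
                split.append(name)
    return split


# The fragment from the post above, wrapped in a paragraph with the namespace declared.
sample = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:r><w:t>$c12</w:t></w:r>'
    '<w:r w:rsidR="00323925"><w:t>lvl1$</w:t></w:r>'
    '</w:p>'
)
print(find_split_placeholders(sample))
```

The function only reports the affected variables; it does not rewrite the document.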
#11
Does phpdocx have to work with the XML? Can it not work with bookmarks, for example, in the document?
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#12
I guess it has to, yes. But I don't know exactly what it does. It's not open source.
#13
I'd suggest looking into its capabilities, then, since working with bookmarks (or even strings such as $c12lvl1$) in the body of the document is quite easy.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
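
In the document XML a bookmark is a `w:bookmarkStart` / `w:bookmarkEnd` pair whose name is held in a single `w:name` attribute, so it cannot be broken across runs the way typed text can. Whether phpdocx can target bookmarks is not established in this thread; the Python sketch below only lists the bookmark names found in a document part, and `bookmark_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def bookmark_names(document_xml: str) -> list[str]:
    """Return user-defined bookmark names in document order."""
    root = ET.fromstring(document_xml)
    names = []
    for start in root.iter(f"{W}bookmarkStart"):
        name = start.get(f"{W}name", "")
        # Names starting with "_" (such as _GoBack) are hidden bookmarks added by Word.
        if name and not name.startswith("_"):
            names.append(name)
    return names


# Hypothetical paragraph with one hidden and one user-defined bookmark.
sample = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>'
    '<w:bookmarkStart w:id="1" w:name="client_name"/>'
    '<w:r><w:t>placeholder</w:t></w:r>'
    '<w:bookmarkEnd w:id="1"/>'
    '</w:p>'
)
print(bookmark_names(sample))
```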