How do I see the XML of my DOCX document?

54

9

I want to see my .docx in its pure XML format.

Various applications, such as web browsers and Visual Studio, just open the file in Word for me.

I've also tried renaming the document to give it an .xml extension, but then it just opens in Notepad and shows a bunch of unintelligible text.

RoboShop

Posted 2011-05-02T23:30:43.807

Reputation: 2 788

Answers

90

It's a zip archive. Rename it so that it ends in .zip, then open or extract it to view the XML files inside.
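Because the file is an ordinary zip archive, it can also be read programmatically without renaming it. The following is a minimal Python sketch, not the only way to do it; the function names `list_parts` and `read_document_xml` are made up for illustration, and it assumes the main body lives in `word/document.xml`, as noted in the comments below.

```python
import zipfile
from xml.dom import minidom


def list_parts(docx_path):
    """Return the names of all files stored inside the .docx archive."""
    # zipfile looks at the file contents, not the extension, so no rename is needed.
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


def read_document_xml(docx_path, part="word/document.xml"):
    """Return one XML part of the archive, re-indented for reading."""
    with zipfile.ZipFile(docx_path) as archive:
        raw = archive.read(part)
    # The part is often stored as one very long line; pretty-print it.
    return minidom.parseString(raw).toprettyxml(indent="  ")
```

For example, `print(read_document_xml("report.docx"))` would print the indented XML of a (hypothetical) report.docx in the current directory.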

Hello71

Posted 2011-05-02T23:30:43.807

Reputation: 7 636

6...then look at the word/document.xml file beneath it. – Aidan Feldman – 2016-05-25T03:22:48.967

4When I unzip it, modify document.xml, then zip the folder and change the extension back to .docx, it does not open in Word; it says the file is corrupted. How can I save modifications to a DOCX file? – Renat Gatin – 2016-08-03T23:16:15.993

2No need to rename. WinRAR and 7-Zip recognize the archive; just right-click the file and select Extract – phuclv – 2017-06-06T14:13:31.263

5My mind is blown. How did I not know this... – Captain Hypertext – 2017-07-05T15:45:03.113

On macOS (High Sierra), the default archive app seemed to have trouble unpacking the file. Using The Unarchiver (https://theunarchiver.com/) helped, and I didn't need to change the file extension or type.

– Mattygabe – 2018-10-11T18:28:56.490

@RenatGatin That deserves to be its own question, not a comment. But you have to use OpenXML and open a WordprocessingDocument, then use string docText; using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream())) { docText = sr.ReadToEnd(); } That loads the XML into a string. Then use an XmlDocument and work on its XmlNodes: you can grab them, remove them, modify their values, insert new ones, etc. xmlDoc.LoadXml(docText); XmlNodeList nodes = xmlDoc.GetElementsByTagName("w:body"); XmlNode bodyNode = nodes[0]; XmlNode firstParagraph = bodyNode.ChildNodes[2];. Finally, write docText = xmlDoc.OuterXml; – vapcguy – 2018-11-15T02:45:16.730

Then you need a using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create))) { sw.Write(docText); } to write your changes back out. – vapcguy – 2018-11-15T02:46:30.063

This talks all about it...https://docs.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

– vapcguy – 2018-11-15T02:47:59.133

@RenatGatin You must select all the items inside the folder and zip those, not the folder itself. The [Content_Types].xml file must be at the root of the zip file – phuclv – 2018-12-12T08:58:01.457
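The repacking rule from the comment above (entries at the root of the archive, not nested inside a top-level folder) can be sketched in Python as follows. The function name `pack_docx` is hypothetical, and the sketch assumes the output file is written outside the folder being packed.

```python
import os
import zipfile


def pack_docx(source_dir, docx_path):
    """Zip the contents of source_dir so that entries sit at the archive root."""
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # Store names relative to source_dir, e.g. "word/document.xml",
                # rather than "file-content/word/document.xml".
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
```

With this, `pack_docx("file-content", "edited.docx")` would produce an archive in which `[Content_Types].xml` is a top-level entry; both names here are placeholders.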

2

Working on macOS, and don't want to install any software to see the XML from your .docx documents? Just open up the terminal and:

cd path/to/your/folder
unzip file.docx -d file-content

As mentioned above, .docx files are "disguised" zip files, and unzip is installed by default on macOS. After using it, your file-content folder will contain the various .xml files composing the Word document.

Clorichel

Posted 2011-05-02T23:30:43.807

Reputation: 121

1

I unpacked the zip file and edited document.xml using Notepad++, checking it with Plugins/XML Tools/Check XML syntax now. Notepad++ pointed out swapped elements, so I put the elements in a more logical order and repeated the check until no more issues were found. Then I copied all the files directly into a zip archive using Total Commander and finally renamed it back to *.docx. Word happily opened the file.

What I am saying is that if Word still refuses to open the file, there may be further problems in one or more of the XML files. Tip: use IE to quickly check an XML file. If you see only flat text, or nothing at all, you can bet there is something wrong with the XML.
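As a rough stand-in for checking each file by hand, here is a minimal Python sketch that reports which XML parts of a .docx fail to parse. The name `find_malformed_parts` is made up for illustration, and the check only covers well-formedness (mismatched or unclosed tags, for example); it does not validate the parts against the Office Open XML schemas, so a file that passes may still be rejected by Word.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_path):
    """Return {part name: parser error} for XML parts that are not well-formed."""
    problems = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Only the XML-based parts are checked; images and the like are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as error:
                problems[name] = str(error)
    return problems
```

An empty result means every XML part parsed; otherwise the keys name the parts to look at first.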

Victor

Posted 2011-05-02T23:30:43.807

Reputation: 21