Answer a question

I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?

The docx module has a savedocx that takes 7 inputs:

  • document
  • coreprops
  • appprops
  • contenttypes
  • websettings
  • wordrelationships
  • output

How do I keep everything in my original file the same except for the replaced word?

Answers

As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.

Howewer, here is how you could do it:

First, have a look at the docx tag wiki:

It explains how the docx file can be unzipped: Here's how a typical file looks like:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

Docx only gets one part of the document, in the method opendocx

def opendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

It only gets the document.xml file.

What I recommend you to do is:

  1. get the content of the document with **opendocx*
  2. Replace the document.xml with the advReplace method
  3. Open the docx as a zip, and replace the document.xml content's by the new xml content.
  4. Close and output the zipped file (renaming it to output.docx)

If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.

Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐