There is an open-source framework called transpect.
Its purpose is to convert complete documents from and to individual formats (docx, TeX, HTML, ...).
Among others, docx files are supported as input:
https://github.com/transpect/
If you have git or svn installed, you can simply follow this guide for a basic setup:
http://transpect.github.io/getting-started.html
The guide describes converting the whole docx document into hub.xml (basically DocBook + CSS).
Both MathType and OMML equations are translated to MathML during conversion.
You can simply extract the mml:math elements from hub.xml using any tool you like.
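As a rough illustration of that extraction step, here is a minimal Python sketch using only the standard library. The sample document is a made-up, heavily simplified stand-in for a real hub.xml, and the function name `extract_mathml` is my own; the only fixed fact it relies on is the MathML namespace URI.

```python
import xml.etree.ElementTree as ET

# The MathML namespace URI (this is what the mml: prefix is bound to).
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def extract_mathml(hub_xml_text):
    """Return every mml:math element in the document as a serialized string."""
    # Register the prefix so the output uses mml: instead of ns0:.
    ET.register_namespace("mml", MATHML_NS)
    root = ET.fromstring(hub_xml_text)
    return [
        # strip() drops whitespace that follows the element in the source.
        ET.tostring(math, encoding="unicode").strip()
        for math in root.iter(f"{{{MATHML_NS}}}math")
    ]


# Hypothetical, simplified sample; a real hub.xml carries much more markup.
SAMPLE = """<hub xmlns="http://docbook.org/ns/docbook"
     xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <para>Energy: <inlineequation><mml:math><mml:mi>E</mml:mi></mml:math></inlineequation></para>
  <equation><mml:math><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></equation>
</hub>"""

if __name__ == "__main__":
    for equation in extract_mathml(SAMPLE):
        print(equation)
```

For a real file you would read hub.xml from disk and pass its contents to the function; an XSLT or XQuery one-liner would work just as well.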
There are more transpect modules that convert hub.xml completely to your desired format.
They require more setup time, so whether that is worthwhile depends on what you intend to do with the results.
Background
The old Word Equation Editor was built upon MathType.
The new Word Equation Editor is based on OMML.
A MathType equation is displayed as an image in Word.
If you have MathType installed, a separate window opens when you click to edit the equation.
Otherwise, you cannot edit it.
New equations (OMML) can be edited directly inside Word.
OMML and MathML are both XML formats.
omml2mml.xsl is a file provided by Microsoft to go from OMML to MathML.
transpect uses a modified variant of it, because the original file has several flaws.
(There is also mml2omml.xsl, to go from MathML to OMML.)
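Because OMML is plain XML stored inside the docx package, you can locate the new-style equations with standard tools before deciding on a conversion route. The following is only a sketch: it assumes the equations live in word/document.xml as m:oMath elements, and the helper names and the tiny in-memory package are made up for the demo (it is not a complete, valid Word file).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of Office Math Markup (the m: prefix in document.xml).
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def count_omml_equations(docx_file):
    """Count m:oMath elements in word/document.xml of a docx package."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # Only exact m:oMath matches are counted; m:oMathPara wrappers are not.
    return sum(1 for _ in root.iter(f"{{{OMML_NS}}}oMath"))


def make_sample_docx():
    """Build a minimal, hypothetical docx-like package in memory."""
    document_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main" '
        f'xmlns:m="{OMML_NS}">'
        "<w:body><w:p><m:oMath><m:r><m:t>x</m:t></m:r></m:oMath></w:p></w:body>"
        "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(count_omml_equations(make_sample_docx()))
```

With a real document you would pass the path of the docx file instead of the in-memory sample. MathType equations will not show up in this count, since they are not stored as OMML.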
MathType uses a non-XML structure for its equations.
MathType can import MathML equations, but not OMML.
It therefore requires the omml2mml.xsl file to generate MathML first.
On a side note, MathType does not always preserve character styles (bold/italic) when exporting to MathML/TeX.
To support all equation types in Word, and to improve conversion performance, transpect is able to translate MathType equations to MathML.
For your info: I am a contributor to the transpect project.