Clean up Microsoft Word HTML
Microsoft Word generates terrible, sloppy, bloated, proprietary HTML. It’s ugly and near-impossible to hand-edit.
On a tip from Francois Jordaan, I learned of a handy way to coerce MS Word into generating decent markup: the Office 2000 HTML Filter 2.0, which adds a File > Export To > Compact HTML… menu item. The HTML it generates isn't as clean as what you or I, the hand-coding geeks of the world, would write, but it's acceptable. Observe:
- Source Word document with a few sentences and a bulleted list (19,456 bytes)
- Exported HTML document (3,507 bytes)
- Exported compact HTML document (1,292 bytes)
For the statistically inclined among you, that's a 63% decrease in file size from the standard HTML version to the compact one.
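For anyone who wants to check the figure, here is the arithmetic, using only the two HTML byte counts listed above:

```latex
\frac{3507 - 1292}{3507} = \frac{2215}{3507} \approx 0.632 \approx 63\%
```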
October 14th, 2006 at 4:38 pm
None of the links work!