Although Microsoft products include a function to convert content to HTML, the resulting markup is not regarded as standards-compliant. A sample of the code generated by the Save as Web file option in Microsoft Word can be seen in the Sample Conversion section below.
If you need to convert the content to HTML, a better option is to copy and paste the text into a dedicated Web editor that can remove the Word styles. Sites at Penn includes this feature, as do Dreamweaver and other Web editing tools.
Note: Even the Filtered option available in Word for Windows still embeds Microsoft styles, which need to be audited.
Accessibility and Usability Issues of Word Generated HTML
- Styles use fixed font sizes, not relative font sizes. Text set in a fixed font size cannot be resized by the reader in Internet Explorer 6 or earlier.
- The font will probably be fixed to Times New Roman, which is designed for print, not for computer monitors.
- Style sheets are embedded and are time-consuming to remove manually. In fact, all Word styles in your template are exported even if they are not used in the original document.
- Word HTML allows designers to specify unusual fonts and/or symbols, which may not be available on all computers.
- If Smart Quotes are turned on, the curly quotes are either converted to Unicode numeric character references or left intact. Older browsers and screen readers may not be able to decipher these curly symbols. This issue also affects apostrophes and lengthened hyphens (dashes).
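One way to sidestep the Smart Quotes issue is to replace the curly punctuation with plain ASCII before the text is published. The following is a minimal Python sketch, not a feature of Word or of any editor named here. The function name `straighten` and the choice of replacements (for example, two hyphens for an em dash) are assumptions made for illustration.

```python
# Map of Word "smart" punctuation to plain ASCII equivalents.
# The replacement choices are illustrative assumptions; adjust as needed.
SMART_PUNCTUATION = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also used as apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}

# Build the translation table once so it can be reused for every call.
_TABLE = str.maketrans(SMART_PUNCTUATION)


def straighten(text: str) -> str:
    """Return text with curly quotes and long dashes replaced by ASCII."""
    return text.translate(_TABLE)
```

Running the text through a function like this before pasting it into a Web editor leaves only characters that older browsers and screen readers are likely to handle.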
Sample Conversion of Word to HTML (Filtered Option)
Below is a sample showing that even the Filtered option includes embedded formatting, which can be inaccessible and difficult to remove.
This is unformatted text.
View the Code
<p>This is unformatted text</p>
Actual Result (Inaccessible)
(Font is fixed to 12 point, Times New Roman)
This is unformatted normal text.
View the Code
<p class=MsoNormal>This is unformatted text</p>
p.MsoNormal, li.MsoNormal, div.MsoNormal
{font-family:"Times New Roman";
mso-bidi-font-family:"Times New Roman";}
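The cleanup that a dedicated Web editor performs on markup like the sample above can be approximated with a short script. The following is a rough Python sketch under stated assumptions: the function name `strip_word_markup` is hypothetical, and it treats any `class` value beginning with `Mso` as a Word style. It uses regular expressions for brevity, so it is suitable only for simple fragments, not as a general HTML cleaner.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove embedded style sheets, Mso* classes, and inline styles."""
    # Drop embedded <style> blocks, including all exported Word styles.
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)
    # Drop class attributes whose value starts with "Mso"
    # (double-quoted, single-quoted, or unquoted as in the sample).
    html = re.sub(r"""\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "",
                  html, flags=re.IGNORECASE)
    # Drop inline style attributes, which often carry fixed font sizes.
    html = re.sub(r"""\s+style=(?:"[^"]*"|'[^']*')""", "", html,
                  flags=re.IGNORECASE)
    return html


sample = "<p class=MsoNormal>This is unformatted text</p>"
print(strip_word_markup(sample))
```

Applied to the sample paragraph, the script leaves the bare `<p>` element, so the font and size can then be set once in the site's own style sheet. For full documents, a real HTML parser would be a more robust choice than regular expressions.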