Foreign Language Web Pages and Unicode

Synopsis

Below are some considerations for the use of Unicode:

Use Unicode encoding whenever possible. This helps ensure that text is transferable between systems, including mobile platforms.
Use language tagging to mark content as English or non-English. This both ensures accurate pronunciation in screen readers and enables foreign language spell checking.
Screen reader users may need to either purchase foreign language extensions or install additional files.

Unicode Support for Foreign Language

Whenever possible, non-English text (including special characters such as © and †) should be inserted as is into a document encoded in Unicode.

Unicode is an encoding standard that assigns a unique numeric code (a code point) to each character across multiple scripts, including Greek, Cyrillic, Asian scripts, Middle Eastern scripts, ancient scripts and technical symbols.

This standard allows computers around the world to exchange data across multiple languages consistently and without the need for custom fonts.
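
As a rough illustration of inserting text "as is", the sketch below places Greek, Cyrillic and symbol characters directly in a page and then repeats the two symbols as numeric character references. The page title and sample words are placeholders chosen for this example, and the file is assumed to be saved as UTF-8 by the editor.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Unicode sample (illustrative)</title>
</head>
<body>
  <!-- Characters typed directly; this only works if the file is saved as UTF-8 -->
  <p>Greek: <span lang="el">Ελληνικά</span></p>
  <p>Cyrillic: <span lang="ru">Русский</span></p>
  <p>Symbols: © †</p>

  <!-- The same two symbols written as numeric character references:
       U+00A9 (decimal 169) is ©, U+2020 (hex 2020) is † -->
  <p>Symbols: &#169; &#x2020;</p>
</body>
</html>
```

Both symbol lines should render identically. The direct form is easier to read and edit, while the numeric references can be a fallback when an editor cannot save Unicode reliably.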

The Penn State Computing with Accents and Symbols page has information about individual languages, accent codes and math symbol codes.

Unicode Meta Tag

Note: A Web page encoded as Unicode often has a meta tag resembling the one below in the HEAD section.

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  ...
</head>
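
For comparison, HTML5 also accepts a shorter charset declaration. The sketch below shows that form; the title text is a placeholder, and the declaration is placed first in the HEAD because browsers expect to find it near the start of the file.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- HTML5 shorthand for the http-equiv form shown above -->
  <meta charset="utf-8">
  <title>Page title (placeholder)</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

Either form only labels the encoding. The file itself still has to be saved as UTF-8 for the label to be accurate.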

LANG Tag (Attribute)

The LANG tag (i.e., the lang="" attribute) is designed to signal a screen reader's pronunciation engine to switch to another language. For this and other reasons, tagging Web text as being in a particular language is required in WCAG 2.0 (Success Criteria 3.1.1, Language of Page, and 3.1.2, Language of Parts).
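
A minimal sketch of language tagging follows, assuming an English page that contains a French phrase and a Spanish paragraph. The sample sentences are invented for illustration, and "en", "fr" and "es" are the standard language codes for English, French and Spanish.

```html
<!DOCTYPE html>
<html lang="en">  <!-- default language for the whole page -->
<head>
  <meta charset="utf-8">
  <title>Language tagging sample (illustrative)</title>
</head>
<body>
  <!-- An inline phrase in another language gets its own lang attribute -->
  <p>The phrase <span lang="fr">c'est la vie</span> is French.</p>

  <!-- A whole block in another language can be tagged on the block element -->
  <p lang="es">Buenos días. ¿Cómo está usted?</p>
</body>
</html>
```

Setting lang on the html element covers the page as a whole, and the nested lang attributes mark the passages where a screen reader should switch pronunciation.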
