Glossary

A comprehensive reference guide to key concepts, terms, and technologies in Unicode encoding and internationalization.

Term Definitions

ASCII

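American Standard Code for Information Interchange (ASCII) is a 7-bit character encoding standard that defines 128 codes (0–127): 33 control characters and 95 printable characters (English letters, digits, punctuation, and the space). It dates from early computing, and its 128 characters form the first block of Unicode (U+0000–U+007F).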
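
Because ASCII is a 7-bit code, a string is pure ASCII exactly when every character's code is below 128 (0x80). The Python sketch below shows that check; `is_ascii_text` is a hypothetical helper name, and since Python 3.7 the built-in `str.isascii()` performs the same test.

```python
# Hypothetical helper for illustration; str.isascii() does the same job.
def is_ascii_text(text: str) -> bool:
    """Return True if every character has a 7-bit ASCII code (0-127)."""
    return all(ord(ch) < 0x80 for ch in text)


print(is_ascii_text("Hello, world!"))  # True
print(is_ascii_text("café"))           # False: é is U+00E9
print("Hello".encode("ascii"))         # b'Hello': one byte per character
```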

ASCII-8 / Extended ASCII

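Umbrella term for 8-bit character sets that keep ASCII in codes 0–127 and assign codes 128–255 to additional characters, such as graphical (box-drawing) symbols or accented letters for Western European languages. There is no single Extended ASCII: the upper half differs between sets such as ISO 8859-1 (Latin-1), Windows-1252, and IBM code page 437.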
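
Because these sets disagree about codes 128–255, the same bytes decode to different characters depending on which set is assumed. A small Python illustration using three codecs from the standard library, `latin-1` (ISO 8859-1), `cp1252` (Windows-1252), and `cp437`; the two byte values are arbitrary examples:

```python
# The same two bytes, interpreted under three different "extended ASCII" sets.
data = bytes([0x80, 0xE9])

for codec in ("latin-1", "cp1252", "cp437"):
    decoded = data.decode(codec)
    # Show each decoded character as a Unicode code point.
    print(codec, [f"U+{ord(ch):04X}" for ch in decoded])

# latin-1 ['U+0080', 'U+00E9']  (a C1 control character, é)
# cp1252 ['U+20AC', 'U+00E9']   (€, é)
# cp437 ['U+00C7', 'U+0398']    (Ç, Θ)
```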

Byte Order Mark (BOM)

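The character U+FEFF placed at the start of a text stream to signal its encoding form and, for UTF-16 and UTF-32, its byte order (endianness). In UTF-8 the BOM is encoded as the bytes EF BB BF; it is permitted there but neither required nor recommended, because UTF-8 has no byte-order ambiguity.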
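
A common use of the BOM is to sniff the first bytes of a file before decoding. The following is a minimal Python sketch, not a complete detector: `sniff_bom` and `decode_with_bom` are hypothetical helper names, and the table relies on the BOM constants in Python's `codecs` module. UTF-32-LE is checked before UTF-16-LE because its BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).

```python
import codecs

# Known BOMs; UTF-32-LE must come before UTF-16-LE (shared FF FE prefix).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


# Hypothetical helper: returns (encoding, bom_length) or (None, 0).
def sniff_bom(data: bytes):
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Hypothetical helper: decode bytes, honoring and stripping a leading BOM.
def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


print(sniff_bom(codecs.BOM_UTF8 + b"hi"))  # ('utf-8', 3)
print(decode_with_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # hi
```

For UTF-8 alone, Python's built-in `utf-8-sig` codec already strips a leading BOM when decoding. UTF-16-LE text whose first character is U+0000 looks the same as a UTF-32-LE BOM, so BOM sniffing remains a heuristic.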

Canonical Equivalence

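Relationship between two code point sequences that represent the same abstract character and should be displayed and interpreted identically, even though they are encoded differently; for example, precomposed é (U+00E9) and e followed by a combining acute accent (U+0065 U+0301). Normalizing to NFC or NFD maps canonically equivalent sequences to the same code points.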
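
Canonically equivalent strings compare unequal until they are normalized. A minimal Python sketch using the standard `unicodedata.normalize` function; `canonically_equal` is a hypothetical helper that relies on the fact that two strings are canonically equivalent exactly when their NFD forms are identical:

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT


# Hypothetical helper: compare strings by their NFD (canonical decomposition).
def canonically_equal(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: NFC composes
```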

Combining Character

Unicode character that attaches to the preceding base character instead of standing alone, producing a modified glyph; for example, U+0301 COMBINING ACUTE ACCENT after e renders as é.
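
Because a combining mark belongs to the character before it, counting code points overcounts what a reader sees. This Python sketch groups each base character with the marks that follow it, using the general categories Mn, Mc, and Me reported by `unicodedata.category`. `split_marks` is a hypothetical helper, and it is a simplification of full grapheme cluster segmentation (Unicode Standard Annex #29), which also handles cases such as emoji sequences and Hangul.

```python
import unicodedata


# Hypothetical, simplified helper (not full UAX #29 grapheme segmentation).
def split_marks(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    units: list[str] = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if unicodedata.category(ch).startswith("M") and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


word = "e\u0301te\u0301"           # "été" spelled with combining acute accents
print(len(word))                   # 5 code points
print(len(split_marks(word)))      # 3 base-plus-mark units
print(unicodedata.name("\u0301"))  # COMBINING ACUTE ACCENT
```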

Emoji

Pictographic symbols used in electronic communication. Unicode 6.0 (2010) added a large set of emoji, and their properties, sequences, and text vs emoji presentation are specified in Unicode Technical Standard #51 (UTS #51, "Unicode Emoji").

Last update: 2025-09-20

Emoji Variation Sequences (EVS)

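Sequences of a base character followed by a variation selector that request a specific presentation: U+FE0E (VS15) requests text presentation and U+FE0F (VS16) requests emoji presentation. Only the base characters listed in the Unicode data file emoji-variation-sequences.txt have defined emoji variation sequences.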

Last update: 2025-09-15
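
An emoji variation sequence is just the base character followed by one of the two selectors. A minimal Python sketch using U+2764 HEAVY BLACK HEART, which has both a text-style and an emoji-style sequence defined; the constant names are illustrative, and whether the requested style actually appears depends on the font and renderer:

```python
TEXT_VS = "\ufe0e"   # VS15: request text presentation (illustrative name)
EMOJI_VS = "\ufe0f"  # VS16: request emoji presentation (illustrative name)

heart = "\u2764"     # U+2764 HEAVY BLACK HEART

text_style = heart + TEXT_VS
emoji_style = heart + EMOJI_VS

for label, seq in (("text", text_style), ("emoji", emoji_style)):
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in seq))
# text U+2764 U+FE0E
# emoji U+2764 U+FE0F
```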

Private Use Area

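Code points that Unicode permanently leaves without standard character assignments so that their meaning can be set by private agreement, for corporate or national needs such as company logos, custom icon fonts, or characters not (yet) encoded in Unicode. The main Private Use Area is U+E000–U+F8FF in the Basic Multilingual Plane (BMP); planes 15 and 16 add the Supplementary Private Use Areas A and B (U+F0000–U+FFFFD and U+100000–U+10FFFD).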

Last update: 2022-05-22
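
Private-use status can be checked either by range or by general category, since Unicode gives every private-use code point the category Co. A minimal Python sketch; `is_private_use` and `PUA_RANGES` are hypothetical names:

```python
import unicodedata

# Hypothetical table of the three private-use ranges.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


# Hypothetical helper: range-based private-use check.
def is_private_use(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


print(is_private_use("\ue000"), unicodedata.category("\ue000"))  # True Co
print(is_private_use("A"), unicodedata.category("A"))            # False Lu
```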

UTF-8

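Unicode Transformation Format, 8-bit: a variable-width encoding that uses 1 to 4 bytes per code point. It is backward compatible with ASCII: code points U+0000–U+007F are encoded as the single bytes 0x00–0x7F, so any ASCII text is also valid UTF-8. UTF-8 is the most widely used character encoding on the web.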

Last update: 2025-08-15
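
The byte layout behind the 1-to-4-byte scheme can be shown with a hand-written encoder. This is an illustrative Python sketch (`utf8_encode` is a hypothetical helper); real code should use the built-in `str.encode("utf-8")`, which the loop uses as a cross-check:

```python
# Hypothetical, illustrative encoder; use str.encode("utf-8") in real code.
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                     # 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:                    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),  # 11110xxx followed by 3 x 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "é", "€", "\U0001F600"):
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # matches Python's built-in codec
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
```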

UTF-16

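Unicode encoding form that uses 16-bit code units. Code points in the Basic Multilingual Plane (U+0000–U+FFFF, excluding the surrogate range U+D800–U+DFFF) take one code unit; code points above U+FFFF take two code units, called a surrogate pair. UTF-16 should not be confused with UCS-2, an older fixed-width 16-bit encoding that covers only the BMP and has no surrogate mechanism.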

Last update: 2025-07-20
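
The surrogate-pair arithmetic subtracts 0x10000 from the code point and splits the resulting 20-bit value into two 10-bit halves, which are added to 0xD800 (high surrogate) and 0xDC00 (low surrogate). A Python sketch with a hypothetical helper `utf16_code_units`, cross-checked against the built-in `utf-16-be` codec:

```python
# Hypothetical helper illustrating UTF-16 surrogate pairs.
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for one Unicode scalar value."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]                # BMP: a single 16-bit code unit
    v = cp - 0x10000               # 20-bit value
    high = 0xD800 + (v >> 10)      # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)     # low (trail) surrogate: bottom 10 bits
    return [high, low]


units = utf16_code_units(0x1F600)   # U+1F600 GRINNING FACE
print([f"{u:04X}" for u in units])  # ['D83D', 'DE00']

# Cross-check: serialize the units big-endian and compare with the codec.
as_bytes = b"".join(u.to_bytes(2, "big") for u in units)
print(as_bytes == "\U0001F600".encode("utf-16-be"))  # True
```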

UTF-32

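Fixed-width encoding that uses one 32-bit code unit (4 bytes) per Unicode code point. Each code unit holds the code point value directly, which allows constant-time indexing by code point at the cost of higher memory overhead than UTF-8 or UTF-16 for most text.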

Last update: 2021-10-05
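
The fixed width is easy to see with Python's built-in codecs. This sketch uses `utf-32-be` (big-endian, no BOM); the plain `utf-32` codec would prepend a BOM in the platform's byte order. The sample string is arbitrary:

```python
text = "A\u20ac\U0001F600"  # "A€😀": 1-, 3-, and 4-byte characters in UTF-8

utf32 = text.encode("utf-32-be")  # big-endian, no BOM
print(len(utf32))                 # 12: always 4 bytes per code point
print(len(text.encode("utf-8")))  # 8: 1 + 3 + 4 bytes

# Each 4-byte unit holds the code point value directly.
units = [int.from_bytes(utf32[i:i + 4], "big") for i in range(0, len(utf32), 4)]
print([f"U+{u:04X}" for u in units])  # ['U+0041', 'U+20AC', 'U+1F600']
```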

Variation Selector

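Invisible code points that follow a base character to select a specific glyph variant. The standardized variation selectors VS1–VS256 occupy U+FE00–U+FE0F and U+E0100–U+E01EF; two of them, U+FE0E (VS15) and U+FE0F (VS16), are used to explicitly select text or emoji presentation of a base character.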

Last update: 2025-04-18
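
Applications sometimes remove variation selectors when building search or comparison keys, where presentation should not matter. A minimal Python sketch; `is_variation_selector` and `strip_variation_selectors` are hypothetical helpers that cover only the VS1–VS256 ranges, and stripping discards any presentation request:

```python
# Hypothetical helper: covers VS1-VS256 only.
def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00-U+FE0F) and VS17-VS256 (U+E0100-U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


# Hypothetical helper: drops all variation selectors (loses presentation info).
def strip_variation_selectors(text: str) -> str:
    return "".join(ch for ch in text if not is_variation_selector(ch))


emoji_heart = "\u2764\ufe0f"  # U+2764 followed by VS16
print(len(emoji_heart), len(strip_variation_selectors(emoji_heart)))  # 2 1
```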

Recently Added Terms

Variation Selector

2025-09-28

Emoji

2025-09-25

UTF-8

2025-09-20
