Glossary
A comprehensive reference guide to key concepts, terms, and technologies in Unicode encoding and internationalization.
Term Definitions
ASCII
American Standard Code for Information Interchange (ASCII) is a 7-bit character encoding standard that defines 128 code points: English letters, digits, punctuation, basic symbols, and control characters used since early computing.
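As a small illustration of the 7-bit limit, the sketch below checks whether a byte string stays within the ASCII range. The helper name `is_ascii` is hypothetical and chosen for this example only.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (0x00 to 0x7F)."""
    return all(b < 0x80 for b in data)


print(is_ascii("Hello, world!".encode("utf-8")))  # True
print(is_ascii("caf\u00e9".encode("utf-8")))      # False: the accented letter needs bytes above 0x7F
```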
ASCII-8 / Extended ASCII
Informal name for any of several 8-bit character sets that keep ASCII in the lower 128 values and use the upper 128 values for additional graphical characters and national language characters, such as those used in Western Europe.
Byte Order Mark (BOM)
U+FEFF character placed at the start of a text stream to signal the encoding of the text and, for UTF-16 and UTF-32, its byte order (endianness). In UTF-8 it may appear as an encoding signature but carries no byte-order meaning.
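A minimal sketch of BOM sniffing, using the BOM constants from Python's standard `codecs` module. The function name `sniff_bom` and the lookup table are assumptions made for this example. Treat the result as a heuristic: UTF-16 LE text that starts with U+0000 right after its BOM is indistinguishable from a UTF-32 LE BOM.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
encoding, skip = sniff_bom(sample)
print(encoding, sample[skip:].decode(encoding))  # utf-16-be hi
```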
Canonical Equivalence
Property where two character sequences have identical meaning and visual appearance but differ in their composition (e.g. a base letter followed by a combining accent vs. the equivalent precomposed character).
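One common way to test canonical equivalence is to normalize both strings to the same form before comparing. The sketch below uses `unicodedata.normalize` from the Python standard library; the helper name `canonically_equivalent` is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                        # False: different code points
print(canonically_equivalent(precomposed, decomposed))  # True
```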
Combining Character
Unicode character that attaches to the preceding base character to produce a modified glyph (e.g. U+0301 COMBINING ACUTE ACCENT).
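The following sketch lists the canonical combining class of each character in a string with `unicodedata.combining`. A nonzero class marks a combining character, although some combining marks also have class 0, so this is an illustration rather than a complete test. The function name `describe` is hypothetical.

```python
import unicodedata


def describe(text: str):
    """Return (code point, name, canonical combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


for row in describe("e\u0301"):
    print(row)
# ('U+0065', 'LATIN SMALL LETTER E', 0)
# ('U+0301', 'COMBINING ACUTE ACCENT', 230)
```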
Emoji
Pictographic symbols used in electronic communication, standardized in Unicode as of 2010 (Unicode 6.0), with encoding, variation, and rendering behavior defined in Unicode Technical Standard #51 (UTS #51).
Last update: 2025-09-20
Emoji Variation Sequences (EVS)
Sequences that request a specific presentation of a character (text vs. emoji) by following the base character with a variation selector: U+FE0E for text presentation or U+FE0F for emoji presentation.
Last update: 2025-09-15
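A brief sketch of building the two presentation sequences for a single base character. The helper `with_presentation` is hypothetical and does not check whether the base character actually has a registered variation sequence. How the result is drawn depends on the font and platform.

```python
VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Append a presentation selector to a single base character."""
    return base + (VS16 if emoji else VS15)


heart = "\u2764"  # HEAVY BLACK HEART
text_heart = with_presentation(heart, emoji=False)
emoji_heart = with_presentation(heart, emoji=True)

print([f"U+{ord(c):04X}" for c in text_heart])   # ['U+2764', 'U+FE0E']
print([f"U+{ord(c):04X}" for c in emoji_heart])  # ['U+2764', 'U+FE0F']
```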
Private Use Area
Code points reserved for internal or proprietary character assignments that Unicode itself will never define (U+E000 to U+F8FF in the BMP, plus most of planes 15 and 16), useful for corporate or national encoding needs.
Last update: 2022-05-22
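As an illustration, the sketch below tests whether a character falls in one of the three private use ranges. The names `PUA_RANGES` and `is_private_use` are assumptions for this example.

```python
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(ch: str) -> bool:
    """Return True if the single character `ch` is a private use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


print(is_private_use("\ue000"))  # True
print(is_private_use("A"))       # False
```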
UTF-8
Unicode Transformation Format, 8-bit: a variable-width encoding that uses 1 to 4 bytes per code point. It is backward compatible with ASCII (ASCII text is already valid UTF-8) and is a widely used internet standard.
Last update: 2025-08-15
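To make the 1 to 4 byte range concrete, this short sketch encodes one sample character from each length class using Python's built-in `str.encode`. The sample characters are arbitrary choices.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80
```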
UTF-16
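Unicode encoding format using 16-bit code units. Characters in the BMP take one code unit; characters outside the BMP take two code units (a surrogate pair), which lets UTF-16 cover the full Unicode range. Its obsolete predecessor UCS-2 is fixed-width and can represent only BMP characters.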
Last update: 2025-07-20
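The surrogate mechanism can be sketched as simple bit arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function name `to_surrogates` is hypothetical.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000 to U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

The result matches what Python's own codec produces: `"\U0001F600".encode("utf-16-be")` yields the bytes D8 3D DE 00.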
UTF-32
Fixed-width encoding using 32-bit code units (4 bytes) per Unicode code point, offering direct code point mapping at the cost of higher memory overhead.
Last update: 2021-10-05
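The trade-off described above can be seen by encoding the same short string three ways and then reading a code point directly by index from the UTF-32 bytes. The helper `code_point_at` is hypothetical and assumes little-endian UTF-32 without a BOM.

```python
text = "A\u00e9\U0001F600"  # three code points

for name in ("utf-8", "utf-16-le", "utf-32-le"):
    print(name, len(text.encode(name)))
# utf-8 7
# utf-16-le 8
# utf-32-le 12


def code_point_at(data: bytes, index: int) -> int:
    """Read the code point at `index` from UTF-32 LE bytes in constant time."""
    return int.from_bytes(data[4 * index : 4 * index + 4], "little")


data = text.encode("utf-32-le")
print(f"U+{code_point_at(data, 2):04X}")  # U+1F600
```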
Variation Selector
Unicode code points that follow a base character to select a specific variant of its glyph. The two most commonly seen, U+FE0E and U+FE0F, explicitly select between the text and emoji presentations of the base character.
Last update: 2025-04-18
Recently Added Terms
Variation Selector
2025-09-28
Emoji
2025-09-25
UTF-8
2025-09-20