RFC 4042 · INFORMATIONAL · 2005

UTF-9 and UTF-18 Efficient Transformation Formats of Unicode

Overview

RFC 4042, “UTF-9 and UTF-18 Efficient Transformation Formats of Unicode”, is an Informational document published in April 2005 by M. Crispin. The canonical text is published by the RFC Editor.

Abstract

ISO-10646 defines a large character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. The same set of codepoints is defined by Unicode, which further defines additional character properties and other implementation details. By policy of the relevant standardization committees, changes to Unicode and amendments and additions to ISO/IEC 10646 track each other, so that the character repertoires and code point assignments remain in synchronization.

The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet.

This document describes a transformation format of Unicode that takes advantage of the nonet so that the format will be storage and computation efficient. This memo provides information for the Internet community.

Abstract as published in the RFC, via rfc-editor.org.
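
The abstract does not spell out the encoding itself. As a rough illustration only, the sketch below assumes the layout described in the body of the RFC: each nonet carries one octet of the code point in its low 8 bits, and the high (ninth) bit is set on every nonet except the last to signal continuation. The function names `utf9_encode` and `utf9_decode_one` are hypothetical and do not come from the RFC, and nonets are modelled here as plain Python integers in the range 0 to 0x1FF.

```python
def utf9_encode(code_point: int) -> list:
    """Encode one code point as a list of 9-bit values (nonets).

    Assumed layout: low 8 bits = one octet of the code point,
    bit 0x100 = "more nonets follow".
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    # Split the code point into octets, least significant first.
    octets = []
    while True:
        octets.append(code_point & 0xFF)
        code_point >>= 8
        if code_point == 0:
            break
    octets.reverse()  # most significant octet first
    # Set the continuation bit on all but the final nonet.
    return [o | 0x100 for o in octets[:-1]] + [octets[-1]]


def utf9_decode_one(nonets) -> int:
    """Decode the first code point from an iterable of nonets."""
    value = 0
    for n in nonets:
        value = (value << 8) | (n & 0xFF)
        if not n & 0x100:  # continuation bit clear: last nonet
            return value
    raise ValueError("truncated UTF-9 sequence")


if __name__ == "__main__":
    for cp in (0x41, 0x391, 0x10330):
        encoded = utf9_encode(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{n:03o}" for n in encoded))
```

Under these assumptions, code points up to U+00FF occupy a single nonet, so Latin-1 text needs no multi-unit sequences, which is the kind of storage efficiency the abstract claims for nonet-based platforms.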

What “Informational” means

An Informational RFC is published for the general information of the Internet community. It does not define an IETF standard and carries no standards-track status.

Read this RFC

The canonical text of RFC 4042 is hosted at rfc-editor.org and is available in TXT and HTML.

