UTF-9 and UTF-18 Efficient Transformation Formats of Unicode
RFC 4042, “UTF-9 and UTF-18 Efficient Transformation Formats of Unicode”, is an Informational document published in April 2005 by M. Crispin. The canonical text is published by the RFC Editor.
Abstract
ISO/IEC 10646 defines a large character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. The same set of code points is defined by Unicode, which further defines additional character properties and other implementation details. By policy of the relevant standardization committees, changes to Unicode and amendments and additions to ISO/IEC 10646 track each other, so that the character repertoires and code point assignments remain in synchronization.
The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage- and computation-efficient on platforms that use the 9-bit nonet, rather than the 8-bit octet, as their natural storage unit.
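To make the storage cost concrete, the sketch below assumes a hypothetical nonet machine that keeps UTF-8 one octet per nonet. Under that assumption, the top bit of every nonet is left idle. The helper name `utf8_nonet_cost` is illustrative only and does not come from RFC 4042.

```python
def utf8_nonet_cost(text: str) -> tuple[int, int]:
    """Return (nonets_used, idle_bits) for UTF-8 stored one octet per nonet.

    Assumption (illustrative only): the platform keeps each 8-bit UTF-8
    octet in its own 9-bit storage unit, leaving the top bit unused.
    """
    octets = text.encode("utf-8")
    nonets_used = len(octets)
    idle_bits = nonets_used  # exactly one unused bit in every nonet
    return nonets_used, idle_bits


# Sample characters needing 1, 2, 3 and 4 UTF-8 octets respectively.
for sample in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    used, idle = utf8_nonet_cost(sample)
    print(f"U+{ord(sample):04X}: {used} nonet(s), {idle} idle bit(s)")
```

Whatever the text, one bit in nine (about 11%) of the storage goes unused under this layout.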
This document describes transformation formats of Unicode, UTF-9 and UTF-18, that take advantage of the nonet so that they are storage- and computation-efficient. This memo provides information for the Internet community.
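As a rough illustration, the following Python sketch models nonets as integers in the range 0..0o777. It assumes the two formats work as summarized here from RFC 4042:

- UTF-9 writes each code point as its big-endian octets with leading zero octets dropped, one octet per nonet. The high-order bit (0o400) is set on every nonet except the last.
- UTF-18 stores each code point in a single 18-bit unit (two nonets). Planes 0 through 2 are stored directly, and plane 14 is mapped onto the otherwise unused plane 3. Other planes cannot be represented.

The function and constant names are hypothetical. Check these details against the canonical RFC text before relying on them.

```python
# Nonets are modelled as Python ints in the range 0..0o777 (9 bits).
CONT = 0o400              # assumed UTF-9 continuation flag (high-order bit)
PLANE14_SHIFT = 0xB0000   # assumed UTF-18 mapping: plane 14 -> plane 3


def utf9_encode(text: str) -> list[int]:
    out: list[int] = []
    for ch in text:
        cp = ord(ch)
        # Big-endian 8-bit groups with leading zero groups dropped
        # (a single zero group is kept for U+0000).
        groups = [cp & 0xFF]
        cp >>= 8
        while cp:
            groups.append(cp & 0xFF)
            cp >>= 8
        groups.reverse()
        # Set the continuation flag on every nonet except the last.
        out.extend(g | CONT for g in groups[:-1])
        out.append(groups[-1])
    return out


def utf9_decode(nonets: list[int]) -> str:
    chars: list[str] = []
    cp, pending = 0, False
    for n in nonets:
        if not 0 <= n <= 0o777:
            raise ValueError(f"not a nonet: {n}")
        cp = (cp << 8) | (n & 0xFF)
        pending = bool(n & CONT)
        if not pending:
            chars.append(chr(cp))  # chr() rejects values above 0x10FFFF
            cp = 0
    # Note: non-shortest (overlong) sequences are not rejected here.
    if pending:
        raise ValueError("truncated UTF-9 sequence")
    return "".join(chars)


def utf18_encode(text: str) -> list[int]:
    out: list[int] = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x2FFFF:                    # planes 0-2: stored as-is
            out.append(cp)
        elif 0xE0000 <= cp <= 0xEFFFF:       # plane 14: moved to plane 3
            out.append(cp - PLANE14_SHIFT)
        else:
            raise ValueError(f"U+{cp:04X} is not representable in UTF-18")
    return out


def utf18_decode(units: list[int]) -> str:
    chars: list[str] = []
    for u in units:
        if not 0 <= u <= 0o777777:           # 18 bits = two nonets
            raise ValueError(f"not an 18-bit unit: {u}")
        chars.append(chr(u + PLANE14_SHIFT if u >= 0x30000 else u))
    return "".join(chars)


print([oct(n) for n in utf9_encode("A\u20ac")])      # ['0o101', '0o440', '0o254']
print([hex(u) for u in utf18_encode("A\U000E0001")])  # ['0x41', '0x30001']
```

Octal is used in the output because a nonet is exactly three octal digits. A leading digit of 4 through 7 therefore marks a nonet with the continuation bit set.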
What “Informational” means
Published for the general information of the community. It does not define an IETF standard and carries no standards-track status.
The canonical text of RFC 4042 is hosted at rfc-editor.org and is available in TXT and HTML formats.