UTF-16
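UTF-16 is one of the encoding forms of Unicode. UTF stands for Unicode/UCS Transformation Format, that is, a way of transforming Unicode code points into a particular format.
It is defined in Annex Q of ISO/IEC 10646-1, and RFC 2781 defines a similar scheme.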
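Characters defined in the Unicode Basic Multilingual Plane (BMP), whether Latin letters, Chinese characters, or other scripts and symbols, are always stored in 2 bytes. Characters defined in the supplementary planes are stored as a surrogate pair, that is, as two 2-byte values.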
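The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `encode_utf16_units` is hypothetical, and the constants are the standard surrogate ranges (D800..DBFF for the high half, DC00..DFFF for the low half).

```python
def encode_utf16_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # BMP character: a single 2-byte code unit that represents itself.
        return [code_point]
    # Supplementary-plane character: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


print([hex(u) for u in encode_utf16_units(0x6731)])   # BMP character
print([hex(u) for u in encode_utf16_units(0x1F600)])  # supplementary plane
```

Each returned value is one 2-byte code unit; the order of the two bytes inside a unit is a separate question of byte order.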
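Compared with UTF-8, the advantage of UTF-16 is that most characters are stored with a fixed length (2 bytes). However, UTF-16 is not compatible with ASCII encoding, because even ASCII characters occupy 2 bytes.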
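The trade-off can be observed with Python's built-in codecs. The sketch below is illustrative only; the helper name `encoded_lengths` is an assumption, and `utf-16-le` is used so that no byte order mark is counted.

```python
def encoded_lengths(text):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# An ASCII letter: 1 byte in UTF-8, 2 bytes in UTF-16.
print(encoded_lengths("A"))
# A BMP Chinese character (U+6731): 3 bytes in UTF-8, 2 bytes in UTF-16.
print(encoded_lengths("\u6731"))
# A supplementary-plane character (U+1F600): 4 bytes in both encodings.
print(encoded_lengths("\U0001F600"))
# Not ASCII-compatible: the letter "A" is followed by a zero byte.
print("A".encode("utf-16-le"))
```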
UTF-16: The term UTF-16 stands for UCS Transformation Format for 16 Planes of Group 00. UTF-16 is the ISO/IEC encoding that is equivalent to the Unicode Standard with the use of surrogates as described in Chapter 3, Conformance. In UTF-16, each UCS-2 code position represents itself. Non-BMP code positions of ISO/IEC 10646 in planes 1..16 are represented using pairs of special codes. UTF-16 defines the transformation between the UCS-4 code positions in planes 1 to 16 of Group 00 and the pairs of special codes, and is identical to the UTF-16 encoding form defined in the Unicode Standard under definition D35 in Section 3.9, Unicode Encoding Forms. Sample code for transforming UCS-4 into UTF-16 can be found on the Unicode Web site.
In ISO/IEC 10646, high-surrogates are called RC-elements from the high-half zone and low-surrogates are called RC-elements from the low-half zone. Together, they constitute the S (Special) Zone of the BMP.
UTF-16 represents the BMP and the next 16 planes. This system should not be an undue limitation because ISO JTC1/SC2/WG2 has no intention of assigning characters outside of planes 1..14, as that would break synchronization with the Unicode Standard. Planes 15 and 16 (000F0000..000FFFFF₁₆ and 00100000..0010FFFF₁₆) are reserved for private use.
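The reverse transformation, from a pair of special codes back to a code position in planes 1..16, can be sketched as follows. This is not the sample code mentioned in the quoted text; the function names `decode_surrogate_pair` and `plane_of` are hypothetical and exist only for this illustration.

```python
def decode_surrogate_pair(high, low):
    """Combine a high-surrogate and a low-surrogate into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first unit is not a high-surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second unit is not a low-surrogate")
    # Each surrogate carries 10 bits; the result starts at plane 1.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def plane_of(code_point):
    """Return the plane number (0 for the BMP, 1..16 otherwise)."""
    return code_point >> 16


cp = decode_surrogate_pair(0xDBFF, 0xDFFF)
print(hex(cp), plane_of(cp))  # the last code point of plane 16
```

Because each half contributes 10 bits, the pairs cover exactly 16 planes of 65,536 code points each, which matches the limit described above.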
UTF-16 encoding schemes
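Both big-endian and little-endian storage forms of UTF-16 are in use. Generally speaking, text created or saved on Macintosh uses the big-endian format, while text created or saved on Microsoft or Linux systems uses the little-endian format.
To make the byte order of a UTF-16 file unambiguous, a U+FEFF character is usually placed at the very start of the file as a Byte Order Mark (BOM). The bytes FF FE indicate UTF-16LE and the bytes FE FF indicate UTF-16BE. The BOM also signals that the text file is encoded in UTF-16.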
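A minimal sketch of BOM-based detection is shown below. The function name `detect_utf16_byte_order` is hypothetical; `codecs.BOM_UTF16_LE` and `codecs.BOM_UTF16_BE` are constants from Python's standard library. The sketch assumes the input is already known to be UTF-16, so it does not try to distinguish the UTF-32LE BOM (FF FE 00 00), which begins with the same two bytes.

```python
import codecs


def detect_utf16_byte_order(data):
    """Return "little", "big", or None if the data has no UTF-16 BOM.

    Assumes the data is UTF-16; a UTF-32LE BOM would also match "little".
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big"
    return None


print(detect_utf16_byte_order(bytes.fromhex("FFFE3167")))
print(detect_utf16_byte_order(bytes.fromhex("FEFF6731")))
print(detect_utf16_byte_order(bytes.fromhex("3167")))
```

When no BOM is present, the byte order has to come from somewhere else, such as an explicit UTF-16LE or UTF-16BE label.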
The following example contains three characters: "朱", a half-width comma, and "聿".
Example of UTF-16 encoding

| Encoding name | Byte order | BOM | 朱 (U+6731) | , (U+002C) | 聿 (U+807F) |
|---|---|---|---|---|---|
| UTF-16LE | Little-endian | (none) | 31 67 | 2C 00 | 7F 80 |
| UTF-16BE | Big-endian | (none) | 67 31 | 00 2C | 80 7F |
| UTF-16 | Little-endian, with BOM | FF FE | 31 67 | 2C 00 | 7F 80 |
| UTF-16 | Big-endian, with BOM | FE FF | 67 31 | 00 2C | 80 7F |
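The byte sequences in the table can be reproduced with Python's built-in `utf-16-le` and `utf-16-be` codecs. In this sketch the helper `hex_bytes` is a hypothetical name, and the BOM is prepended explicitly, because Python's plain `utf-16` codec writes a BOM in the byte order of the machine it runs on.

```python
import codecs

# The three characters from the example: U+6731, U+002C, U+807F.
TEXT = "\u6731,\u807f"


def hex_bytes(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


little = TEXT.encode("utf-16-le")
big = TEXT.encode("utf-16-be")

print("UTF-16LE        :", hex_bytes(little))
print("UTF-16BE        :", hex_bytes(big))
print("UTF-16 (LE, BOM):", hex_bytes(codecs.BOM_UTF16_LE + little))
print("UTF-16 (BE, BOM):", hex_bytes(codecs.BOM_UTF16_BE + big))
```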
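Relationship between UTF-16 and UCS-2
UTF-16 can be regarded as a superset of UCS-2. Before supplementary-plane characters existed, UTF-16 and UCS-2 meant the same thing. Once supplementary-plane characters were introduced, the encoding was referred to only as UTF-16. Today, when software claims to support UCS-2 encoding, this is effectively a euphemism for saying that it cannot handle supplementary-plane characters.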
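One way to see the difference is to check whether a piece of text stays within the BMP. The sketch below is illustrative; `fits_in_ucs2` and `utf16_code_unit_count` are hypothetical helper names, not part of any library.

```python
def fits_in_ucs2(text):
    """Return True if every character lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_code_unit_count(text):
    """Return the number of 16-bit code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


bmp_text = "\u6731,\u807f"          # three BMP characters
emoji_text = "\U0001F600"           # one supplementary-plane character

print(fits_in_ucs2(bmp_text), utf16_code_unit_count(bmp_text))
print(fits_in_ucs2(emoji_text), utf16_code_unit_count(emoji_text))
```

Text for which `fits_in_ucs2` returns True has the same byte representation in UCS-2 and UTF-16; text for which it returns False needs surrogate pairs, which UCS-2-only software would treat as two separate characters.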