Posted by: Anonymous Coward
on November 02, 2004 08:37 PM
You're confused.
UTF-8, when expanded to "Unicode Transformation Format, 8-bit", can encode every Unicode code point up to 0x10FFFF.
UTF-8, when expanded to "UCS Transformation Format, 8-bit" (UCS = ISO/IEC 10646), can encode every ISO code point up to 0x7FFFFFFF using sequences of up to six bytes, *although* ISO and Unicode have agreed never to assign anything above 0x10FFFF.
So UCS-4 and UTF-8 can represent the same characters; the only differences are that UTF-8 is variable-length, ASCII-compatible, not prone to endianness bugs, and _can_ produce larger files for some Asian scripts (three bytes per character where a 16-bit encoding such as UTF-16 needs two).
//mirabile - http://mirbsd.de/
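For anyone who wants to check those properties, here is a rough Python sketch. `utf8_length` is a hypothetical helper written for this post, not a library function; it follows the original six-byte UTF-8 layout. The encodings come from Python's built-in codecs, with "utf-32-be"/"utf-32-le" standing in for UCS-4.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed for a code point under the original six-byte UTF-8 layout."""
    # Largest code point that fits in 1, 2, 3, 4, 5 and 6 bytes respectively.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for size, limit in enumerate(limits, start=1):
        if 0 <= code_point <= limit:
            return size
    raise ValueError("code point out of range for UTF-8")


def compare(text: str) -> dict:
    """Encoded sizes of the same text in UTF-8 and in both UCS-4 byte orders."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "ucs-4 big-endian": len(text.encode("utf-32-be")),
        "ucs-4 little-endian": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for sample in ["A", "\u00e9", "\u65e5", "\U0001F600"]:
        print(f"U+{ord(sample):04X}", compare(sample))

    # ASCII text is byte-for-byte identical in UTF-8.
    print("ASCII".encode("utf-8") == "ASCII".encode("ascii"))

    # UCS-4 bytes depend on byte order; UTF-8 has only one byte order.
    print("A".encode("utf-32-be"), "A".encode("utf-32-le"))
```

The fixed four bytes of UCS-4 buy you constant-time indexing; UTF-8 trades that for ASCII compatibility and a single byte order.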
Re: UTF isn't all that it's cracked up to be.