International Components for Unicode 1.8 released

Author: JT Smith

Ram Viswanadha writes: “ICU from IBM is an open source library, released under the IBM Public License, that provides a Unicode implementation with functions for formatting numbers, dates, times, and currencies according to locale conventions, for parsing text in those formats, and for transliteration. It provides flexible patterns for formatting messages, where the pattern determines the order of the variable parts of the message and the format for each of those variables. These patterns can be stored in resource files for translation into different languages. ICU provides code and data [over 150 locales] to handle the complexities of native language collation, searching, and other processes. It also provides a mechanism for accessing strings from resource files, whereby common strings can be shared across countries that have the same language. Included are more than 100 codepage converters for interaction with non-Unicode systems.”
More info at http://oss.software.ibm.com/. Download:
http://oss.software.ibm.com/icu/download/1.8/index.html
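
To illustrate the message patterns mentioned in the announcement, here is a minimal sketch in which the pattern, not the calling code, decides the order of the variable parts. It uses the JDK's java.text.MessageFormat as a stand-in rather than ICU's own API, and the two pattern strings and the class name are hypothetical; in practice each pattern would live in a per-language resource file.

```java
import java.text.MessageFormat;

class MessagePatternSketch {
    // Hypothetical patterns, as a translator might store them in resource files.
    // {0}, {1}, {2} refer to the arguments by position, so each pattern is free
    // to place them in whatever order its language needs.
    static final String SUBJECT_FIRST = "{0} copied {1} to {2}.";
    static final String TARGET_FIRST = "To {2}, {1} was copied by {0}.";

    // The calling code always passes the arguments in the same order.
    static String render(String pattern, String who, String what, String where) {
        return MessageFormat.format(pattern, who, what, where);
    }

    public static void main(String[] args) {
        System.out.println(render(SUBJECT_FIRST, "Ana", "report.txt", "backup"));
        System.out.println(render(TARGET_FIRST, "Ana", "report.txt", "backup"));
    }
}
```

Because the argument order is fixed in the code and the placement is fixed in the pattern, a translation can reorder the sentence without any change to the program.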

Improvements and Features:

  • The collation code is reimplemented to improve performance significantly and to make it compliant with the Unicode Collation Algorithm. For details and current limitations, see the collation design document.
  • Support for all Unicode encodings in the converter API, including:
    UTF-32BE/LE (now complete)
    UTF-7
    SCSU (instead of a separate API)
  • Improvements in ISO-2022 converters; new JIS, JIS7, JIS8 converters
  • More complete set of conversion tables for IBM codepages
  • Additional Unicode string handling functions:
    Unicode string case handling – uppercasing, lowercasing, case folding, case-insensitive compare
    C/C++ string compare in Unicode code point order
    ANSI-C-style string functions like u_strrstr(), u_strtok_r(), u_memcpy(), etc.
    u_sprintf()/u_sscanf() functions in the extra/ustdio library (unsupported)
  • safeClone() functions for collators, converters and break iterators for creating object clones that are safe to be used in different threads
  • Transliterators:
    New Inter-Indic transliterators for nine Indic scripts
    LatinJamo transliterators much improved
  • Arabic letter shaping for handling legacy data (no handling yet of the “tail” glyph fragment character)
  • decmn tool to take apart ICU common data files (memory-mappable .dat files generated by gencmn)
  • Data for more locales
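
One item in the list above, string comparison in Unicode code point order, is easier to see with an example. In UTF-16, a supplementary character is stored as a surrogate pair (code units in the range D800 to DFFF), so a plain code unit comparison sorts it before BMP characters such as U+FF5E even though its code point is larger. The sketch below is written in Java with only the standard library; it is not ICU's implementation, and the class and method names are hypothetical.

```java
class CodePointOrderSketch {
    // Compares two strings by Unicode code point rather than by UTF-16 code unit.
    // Returns a negative value, zero, or a positive value, like String.compareTo.
    static int compareCodePointOrder(String a, String b) {
        int i = 0;
        // While the prefixes are equal, both strings have consumed the same
        // number of code units, so a single index is enough.
        while (i < a.length() && i < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca); // 1 for BMP, 2 for a surrogate pair
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                                        // U+FF5E, one code unit
        String supplementary = new String(Character.toChars(0x10000)); // U+10000, two code units

        // Code unit order: 0xFF5E > 0xD800, so the BMP string sorts last.
        System.out.println("code unit order:  " + Integer.signum(bmp.compareTo(supplementary)));
        // Code point order: 0xFF5E < 0x10000, so the BMP string sorts first.
        System.out.println("code point order: " + compareCodePointOrder(bmp, supplementary));
    }
}
```

The two orders only disagree when supplementary characters are compared with BMP characters at or above U+E000, which is why the difference is easy to miss in testing.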

Category: Open Source