Unicode is a core technology for developing and implementing a universal, international language solution. The Unicode Tool Kit was developed for Teradata customers who are migrating from the Latin server character set to Unicode and building a global data warehouse on the universal Unicode character set.
Developing a global data warehouse has become a strategic business direction for success in the international marketplace. Among the many technologies available today for globalization, Unicode is a core technology used to develop and implement a universal language solution. However, many Teradata customers may not have implemented Unicode on existing systems, because they already use the Teradata Latin server character set (even for non-Latin1 languages such as Chinese and Korean). These customers start to experience gaps between legacy data and Unicode data, as well as between existing ANSI applications and Unicode applications. Those gaps prevent them from using leading-edge Teradata Unicode applications such as CIM 7.x and TVA 4.x. As of TD 15.xx/TTU 15.xx, migrating from Teradata Latin to Unicode may not be an easy task. The current Teradata system has the following limitations:
• ALTER TABLE does not support changing the server character set for character data types
• The TRANSLATE() function only works with Japanese
This document introduces the Unicode tool kit for customers who are migrating from the Latin server character set to Unicode and building a global data warehouse on the universal Unicode character set. The tool kit consists of the following components:
1) User Defined Functions (UDFs) for migrating code page data to Unicode without import/export
2) cConv for migration with import/export
3) cMigration and cScript for table migration on the same system
4) Site-defined session character sets compatible with Windows code pages
5) Access modules for translation and validation, used to load code page data or UTF8 via the UTF8 session in FastLoad/MultiLoad/TPump/TPT
6) Unicode test data and a test application in Java/JDBC
7) Other materials, such as the Internationalization Orange Book
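As a rough illustration of what "migrating code page data to Unicode" means for component 1, the sketch below simulates text in a Windows code page whose bytes were stored as if they were Latin characters, and then recovers the intended characters. It is not the tool kit's UDF code: the function names are hypothetical, Python's `latin-1` codec stands in for a byte-for-byte Latin column (an assumption), and `cp950` is used only as an example code page.

```python
# Illustrative sketch only. The function names are hypothetical and do not
# correspond to the tool kit's UDFs, which run inside the database.

def as_stored_in_latin(text: str, code_page: str) -> str:
    """Simulate code page bytes that were stored as if each byte were a Latin character."""
    return text.encode(code_page).decode("latin-1")


def latin_to_unicode(stored: str, code_page: str) -> str:
    """Recover the intended characters by re-reading the stored bytes in their true code page."""
    return stored.encode("latin-1").decode(code_page)


# Two Chinese characters (U+4E2D, U+6587), written as escapes to keep the source ASCII-only.
original = "\u4e2d\u6587"
legacy = as_stored_in_latin(original, "cp950")   # looks like unrelated Latin symbols
recovered = latin_to_unicode(legacy, "cp950")    # the original characters again
```

The point of the sketch is that the stored bytes are unchanged; only the code page used to interpret them differs, which is why a byte-preserving conversion step is needed before the data can live in a Unicode column.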
What's New in Recent Releases
Updated translation UDFs
* New binary udf_utf8to16v2b.o for udf_utf8to16() for 32K output
* udf_16tow950() for Taiwan
Access modules (only for SUSE Linux and Windows)
* Support for 64-bit versions of the access modules
* Support for UTF16LE
* Updated the access module logic for malformed UTF8 byte sequences so that a malformed sequence no longer consumes the well-formed bytes that follow it
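The idea of handling a malformed UTF8 sequence without consuming well-formed bytes can be sketched as follows. This is a minimal illustration of the general technique, not the access module's actual logic: the function names are hypothetical, and replacing each offending byte with U+FFFD is an assumption made for the example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER (assumed substitution policy)


def expected_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def decode_utf8_lenient(data: bytes) -> str:
    """Decode UTF-8, replacing only the offending byte when a sequence is malformed."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        chunk = data[i:i + n]
        if n and len(chunk) == n:
            try:
                out.append(chunk.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass  # fall through and treat the lead byte as malformed
        out.append(REPLACEMENT)
        i += 1  # skip one byte only, so following well-formed bytes are kept
    return "".join(out)
```

For example, in the input `b"\xe4\xb8A"` the three-byte sequence started by `0xE4` is cut short by the ASCII letter `A`. Advancing one byte at a time after an error means the `A` is still decoded as itself rather than being swallowed as part of the broken sequence.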
Add an example on how to reload bad rows rejected by DBS due to the translation error 6706
Add udf_find16() for pass-through Unicode characters
Add more examples for pass-through UDFs including fexp and tpt (update)
Minor updates on pass-through doc
New pass-through UDFs are included in the translation UDFs.
pt_16BEHex2Char() -- Converts Unicode hex values in UTF16-BE form to Unicode characters (i.e. Teradata UTF16)
pt_utf8to16() -- Converts UTF8 to Teradata UTF16
With those UDFs, Unicode characters, including those currently unsupported by Teradata, can be stored in Teradata Unicode columns with the correct UTF16 values.
However, there are some limitations on accessing those unsupported characters. Please ask for technical consultation when implementing those functions.
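The two conversions that the pass-through UDFs are described as performing can be illustrated outside the database as follows. This is a sketch of the encoding arithmetic only: the Python function names are hypothetical, they do not reflect the UDFs' real signatures or return types, and producing big-endian UTF16 bytes is an assumption made for the example.

```python
# Hypothetical stand-ins for the conversions described above; not the UDF code.

def utf16be_hex_to_char(hex_text: str) -> str:
    """Turn a string of hex digits holding UTF-16BE code units into characters."""
    return bytes.fromhex(hex_text).decode("utf-16-be")


def utf8_to_utf16be(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16BE bytes."""
    return data.decode("utf-8").encode("utf-16-be")


# A BMP character: U+4E2D is the single code unit 4E2D.
bmp_char = utf16be_hex_to_char("4E2D")

# A supplementary-plane character: U+1F600 is the surrogate pair D83D DE00.
supplementary_char = utf16be_hex_to_char("D83DDE00")

# UTF-8 bytes E4 B8 AD become the UTF-16BE bytes 4E 2D.
utf16_bytes = utf8_to_utf16be(b"\xe4\xb8\xad")
```

The supplementary-plane case shows why the hex input is a sequence of 16-bit code units rather than code points: one character may occupy two code units.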
Add a new session character set ISO88592_9A0 for Latin 2
Found bug TGLOB-1025 in (3) translation UDFs: the UDFs returned a wrong character (0x00) as the first character
Add translation UDFs for Oracle for error-free Unicode data movement from Oracle to Teradata