Unicode Tool Kit
About this download

Unicode is a core technology for developing and implementing a universal, international language solution. The Unicode Tool Kit was developed for Teradata customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the universal Unicode character set.


Developing a global data warehouse has become a strategic business direction for success in the international marketplace. Among the many technologies available today for globalization, Unicode is a core technology for developing and implementing a universal language solution. However, many Teradata customers may not have implemented Unicode on their existing systems because they had already implemented the Teradata Latin server character set (even for non-Latin-1 languages such as Chinese and Korean). These customers begin to experience gaps between legacy data and Unicode data, as well as between existing ANSI applications and Unicode applications. These gaps prevent them from using leading-edge Teradata Unicode applications. As of TD 16.0/TTU 16.0, migrating from the Teradata Latin server character set to Unicode can be difficult because of the following limitations of the current Teradata system:
• ALTER TABLE does not support changing the server character set for character data types
• The TRANSLATE() function only works with Japanese
This document introduces the Unicode Tool Kit for customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the universal Unicode character set. The Unicode Tool Kit consists of the following components:
1) User-defined functions (UDFs) for migrating code page data to Unicode without import/export
2) Site-defined session character sets compatible with Windows code pages or other standards
3) Access modules for translating and validating code page or UTF8 data loaded through a UTF8 session with FastLoad, MultiLoad, TPump, or TPT
4) Unicode test data and a test application in Java/JDBC
5) Others
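The idea behind component 1 can be sketched as follows. When a Latin column actually holds code page bytes (for example, Korean CP949), each stored Latin character corresponds to one original byte, so the text can be recovered by taking those bytes and decoding them with the real code page. This is a minimal Python sketch under that assumption; the helper name `latin_column_to_unicode` and the choice of `cp949` are illustrative only and do not describe the kit's UDF interface.

```python
def latin_column_to_unicode(latin_text: str, code_page: str = "cp949") -> str:
    """Recover Unicode text from code page bytes stored as Latin characters.

    Assumption: every Latin character holds exactly one original byte.
    ISO-8859-1 maps U+0000..U+00FF to bytes 0x00..0xFF one-to-one, so
    encoding with 'latin-1' restores the raw bytes losslessly.
    """
    raw_bytes = latin_text.encode("latin-1")  # one character -> one byte
    return raw_bytes.decode(code_page)        # reinterpret with the real code page


# Simulate legacy data: Korean text encoded as CP949, stored byte-for-byte
# in a Latin column, then read back as Latin-1 characters (mojibake).
original = "한국어 데이터"
stored_in_latin = original.encode("cp949").decode("latin-1")

print(repr(stored_in_latin))                     # unreadable Latin characters
print(latin_column_to_unicode(stored_in_latin))  # prints 한국어 데이터
```

Because the Latin-1 step is lossless for every byte value, the only real decision is which code page the legacy bytes were written in; guessing it wrong produces wrong characters or a decode error rather than silent corruption.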

What's New in Recent Releases

Date: 2017-9-15
* Updated udf_utf16to16() to process only an even number of bytes of UTF16 input (ref: RECHAXDVL).
   If an odd number of bytes is given, the last byte is ignored.
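The even-byte rule can be illustrated with a short Python sketch. It assumes big-endian UTF16 input and uses the hypothetical helper name `utf16_even_bytes`; it mirrors only the documented behavior and is not the UDF's source code.

```python
def utf16_even_bytes(data: bytes, byte_order: str = "utf-16-be") -> str:
    """Decode UTF16 input, ignoring a trailing odd byte.

    UTF16 code units are 2 bytes wide, so only the first
    len(data) - len(data) % 2 bytes can form complete code units.
    """
    even_length = len(data) - (len(data) % 2)
    return data[:even_length].decode(byte_order)


print(utf16_even_bytes(b"\x00A\x00B"))      # 4 bytes, both code units used: AB
print(utf16_even_bytes(b"\x00A\x00B\x00"))  # 5 bytes, last byte ignored: AB
```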
Date: 2017-8-7
Updated translation UDFs
* Handle zero-length input strings in four UDFs (TLN-1240):
   pt_16BEHex2Char.o, pt_utf8to16v2a.o, pt_utf8to16v2a_s.o, pt_utf8to16v2a_apl.o
* Add a custom version of the udf under 
   ..\04 TranslationUDFs\01 Teradata UDFs\suselinux-x8664\udf_installation\pass-through UDFs\custom versions
Date: 2017-6-22
Translation UDFs for Oracle
* Support HP-UX Itanium 64-bit
Date: 2017-6-15
Access Modules
* Support 64-bit versions for Red Hat Linux and CentOS
Translation UDFs
* Support output characters up to 32K
* Reorganize site-defined session charsets
* Exclude cConv, cMigration, cScript, and others from the kit
* Internationalization Orange Book version G01 (2017-6-5)
Date: 2016-9-2
Updated translation UDFs 
* New binary udf_utf8to16v2b.o for udf_utf8to16(), supporting 32K output
* udf_16tow950() for Taiwan
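Judging by its name, udf_16tow950() converts Teradata UTF16 text to Windows code page 950 (Traditional Chinese); that reading is an assumption, not something stated here. The Python sketch below shows the equivalent conversion with the standard `cp950` codec, using the hypothetical helper name `unicode_to_cp950`.

```python
def unicode_to_cp950(text: str) -> bytes:
    """Encode Unicode text as Windows code page 950 (Traditional Chinese).

    Characters with no CP950 mapping raise UnicodeEncodeError here; a real
    conversion routine needs an explicit policy for such characters.
    """
    return text.encode("cp950")


data = unicode_to_cp950("台灣")
print(data.hex())            # two bytes per Chinese character
print(data.decode("cp950"))  # round-trips back to the original text
```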
Date: 2016-6-17
Access modules (only for SUSE Linux and Windows)
* Support 64-bit versions of the access modules
* Support UTF16LE
* Updated the access module logic so that remedying a malformed UTF8 byte sequence does not consume adjacent well-formed bytes
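Replacing a malformed UTF8 sequence without swallowing the well-formed bytes next to it corresponds to the Unicode Standard's recommended practice of substituting only the maximal ill-formed subpart. The Python sketch below relies on the built-in decoder with `errors="replace"`, which follows that practice; the substitute character U+FFFD is an assumption, since the access module's actual replacement character is not stated here, and `repair_utf8` is a hypothetical name.

```python
def repair_utf8(data: bytes) -> str:
    """Decode UTF8, substituting U+FFFD for each malformed subsequence.

    Python's decoder replaces only the maximal ill-formed subpart, so a
    well-formed byte that follows a truncated sequence is still decoded.
    """
    return data.decode("utf-8", errors="replace")


# E4 B8 starts a 3-byte sequence (U+4E2D is E4 B8 AD) but is cut short by
# the ASCII byte 'A'; the 'A' and the following complete character survive.
broken = b"\xe4\xb8A\xe4\xb8\xad"
print(ascii(repair_utf8(broken)))  # prints '\ufffdA\u4e2d'
```

A naive repair that skips a fixed three bytes after the E4 lead byte would drop the 'A', which is the kind of data loss the updated logic avoids.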
Add an example of how to reload bad rows rejected by the DBS because of translation error 6706
Date: 2015-7-9
Add udf_find16() for pass-through Unicode characters
Add more examples for pass-through UDFs including fexp and tpt (update)
Date: 2015-6-26
Add udf_16to16() 
Minor updates to the pass-through documentation
Date: 2015-6-23
New pass-through UDFs are included with the translation UDFs:
pt_16BEHex2Char() -- converts Unicode hex values in UTF16-BE form to Unicode characters (i.e., Teradata UTF16)
pt_utf8to16() -- converts UTF8 to Teradata UTF16
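The two pass-through conversions can be sketched in Python. The helper names are hypothetical, the sketch assumes hex input is a plain string of UTF16-BE code units with no separators, and it does not reproduce how the UDFs store results in Teradata. Characters outside the Basic Multilingual Plane, such as U+1F600, become a surrogate pair of two UTF16 code units.

```python
def hex16be_to_char(hex_text: str) -> str:
    """Turn UTF16-BE hex digits (e.g. '0041') into the characters they encode."""
    return bytes.fromhex(hex_text).decode("utf-16-be")


def utf8_to_utf16_units(data: bytes) -> list[str]:
    """Convert UTF8 bytes to a list of UTF16 code units as uppercase hex."""
    utf16 = data.decode("utf-8").encode("utf-16-be")
    return [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]


print(hex16be_to_char("00410042"))                      # AB
print(ascii(hex16be_to_char("D83DDE00")))               # '\U0001f600'
print(utf8_to_utf16_units("A\U0001F600".encode("utf-8")))  # ['0041', 'D83D', 'DE00']
```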
With these UDFs, Unicode characters, including those currently unsupported by Teradata, can be stored in Teradata Unicode columns with the correct UTF16 values.
However, there are some limitations on accessing these unsupported characters. Please request technical consultation when implementing these functions.
Add a new session character set ISO88592_9A0 for Latin 2