Unicode Tool Kit 1.6.0.2
About this download

Unicode is a core technology for developing and implementing a universal, international language solution. The Unicode Tool Kit was developed for Teradata customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the universal character set, Unicode.

Introduction

Developing a global data warehouse has become a strategic business direction for success in the international marketplace. Among the many technologies available today for globalization, Unicode is a core technology used to develop and implement a universal language solution. However, many Teradata customers may not have implemented Unicode on their existing systems because they have already implemented the Teradata Latin server character set (even for non-Latin1 languages, including Chinese and Korean). These customers start to experience gaps between legacy data and Unicode data, as well as gaps between existing ANSI applications and Unicode applications. Those gaps prevent the customer from accessing leading-edge Teradata Unicode applications. As of TD 16.0/TTU 16.0, migration from the Teradata Latin server character set to Unicode may not be an easy task. The current Teradata system has the following limitations:
 
• ALTER TABLE does not support changing the server character set for character data types
• The TRANSLATE() function only works with Japanese
 
The purpose of this document is to introduce the Unicode Tool Kit to customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the universal character set, Unicode. The Unicode Tool Kit consists of the following components:
 
1) User Defined Functions (UDFs) for migrating code page data to Unicode without import/export
2) Site-defined session character sets compatible with Windows code pages or other standards
3) Access Modules for translation and validation, used to load code page data or UTF8 data via the UTF8 session with FastLoad/MultiLoad/TPump/TPT
4) Unicode test data and a test application in Java/JDBC
5) Others
 

What's New in Recent Releases

 
Date: 2017-8-7
Version 1.6.0.2
Updated translation UDFs
* Handle zero-length input strings in four UDFs (TLN-1240):
   pt_16BEHex2Char.o, pt_utf8to16v2a.o, pt_utf8to16v2a_s.o, pt_utf8to16v2a_apl.o
* Add a custom version of the UDF under 
   ..\04 TranslationUDFs\01 Teradata UDFs\suselinux-x8664\udf_installation\pass-through UDFs\custom versions
 
Date: 2017-6-22
Version 1.6.0.1
Translation UDFs for Oracle
* Support HP-UX Itanium 64-bit
 
Date: 2017-6-15
Version 1.6.0.0
Access Modules
* Support 64-bit versions for Red Hat Linux and CentOS
Translation UDFs
* Support output of up to 32K characters
Others
* Reorganize site-defined session charsets
* Exclude cConv, cMigration and cScript and others from the kit
* Internationalization Orange Book version G01 (2017-6-5)
 
Date: 2016-9-2
Version 1.5.5.0
Updated translation UDFs
* New binary udf_utf8to16v2b.o for udf_utf8to16() with 32K output
* udf_16tow950() for Taiwan
 
Date: 2016-6-17
Version 1.5.4.0
Access modules (SUSE Linux and Windows only)
* Support 64-bit versions of the access modules
* Support UTF16LE
* Update the access module logic for malformed UTF8 byte sequences so that it does not consume the well-formed bytes that follow
Add an example of how to reload bad rows rejected by the DBS because of translation error 6706
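
The malformed-UTF8 behavior noted above can be illustrated with a minimal Python sketch. This is not the access module's code; it uses Python's built-in UTF-8 decoder as an assumed stand-in to show what "not consuming well-formed bytes" means: a truncated multi-byte sequence is replaced by U+FFFD, while the well-formed byte that follows it is preserved. The function name decode_keeping_valid_bytes is hypothetical.

```python
def decode_keeping_valid_bytes(raw: bytes) -> str:
    """Decode UTF-8, replacing each malformed sequence with U+FFFD.

    Python's decoder ends the replaced span at the first byte that cannot
    continue the malformed sequence, so following well-formed bytes survive.
    """
    return raw.decode("utf-8", errors="replace")


# 0xE4 0xBD starts a three-byte sequence, but the third byte is missing.
# The ASCII byte 0x41 ("A") that follows is well-formed and must be kept.
sample = b"\xe4\xbd" + b"A"
decoded = decode_keeping_valid_bytes(sample)
print(decoded.endswith("A"))
```

A decoder that instead skipped a fixed three bytes after seeing 0xE4 would silently drop the "A", which is the kind of data loss this release note describes avoiding.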
 
Date: 2015-7-9
Version 1.5.3.2
Add udf_find16() for pass-through Unicode characters
Add more examples for pass-through UDFs including fexp and tpt (update)
 
Date: 2015-6-26
Version 1.5.3.1
Add udf_16to16() 
Minor updates to the pass-through documentation
 
Date: 2015-6-23
Version 1.5.3.0
New pass-through UDFs are included in the translation UDFs:
pt_16BEHex2Char() -- Convert Unicode hex values in UTF16-BE form to Unicode characters (i.e. Teradata UTF16)
pt_utf8to16() -- Convert UTF8 to Teradata UTF16
With these UDFs, Unicode characters, including those currently unsupported by Teradata, can be stored in Teradata Unicode columns with the correct UTF16 values.
However, there are some limitations on accessing those unsupported characters. Please ask for technical consultation when implementing these functions.
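
The two conversions described for the pass-through UDFs can be sketched in Python for illustration. This is not the UDFs' implementation, and it does not reflect their SQL signatures; the helper names utf16be_hex_to_char and utf8_to_utf16be_hex are hypothetical, and U+1F600 is a supplementary-plane character chosen only as an example.

```python
def utf16be_hex_to_char(hex_text: str) -> str:
    """Interpret a hex string as UTF-16BE code units and return the text."""
    return bytes.fromhex(hex_text).decode("utf-16-be")


def utf8_to_utf16be_hex(utf8_bytes: bytes) -> str:
    """Decode UTF-8 bytes and return the UTF-16BE code units as hex."""
    return utf8_bytes.decode("utf-8").encode("utf-16-be").hex().upper()


# U+1F600 is the surrogate pair D83D DE00 in UTF-16
# and the byte sequence F0 9F 98 80 in UTF-8.
print(utf16be_hex_to_char("D83DDE00") == "\U0001F600")
print(utf8_to_utf16be_hex(b"\xf0\x9f\x98\x80"))  # prints D83DDE00
```

Both helpers arrive at the same UTF-16 code units from different inputs, which mirrors how the two UDFs are described: one starts from hex values, the other from UTF8 bytes.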
 
Add a new session character set ISO88592_9A0 for Latin 2