Unicode Tool Kit 1.7.0.0
About this download

Unicode is a core technology for developing and implementing a universal, international language solution. The Unicode Tool Kit has been developed for Teradata customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the Unicode universal character set.

Introduction

Developing a global data warehouse has become a strategic business direction for success in the international marketplace. Among the many technologies available today for globalization, Unicode is a core technology used to develop and implement a universal language solution. However, many Teradata customers may not have implemented Unicode on existing systems, because they have already implemented the Teradata Latin server character set (even for non-Latin1 languages such as Chinese and Korean). These customers start experiencing gaps between legacy data and Unicode data, as well as gaps between existing ANSI applications and Unicode applications. Those gaps prevent them from using leading-edge Teradata Unicode applications. As of TD 16.0/TTU 16.0, migrating from the Teradata Latin server character set to Unicode may not be an easy task. The current Teradata system has the following limitations:
 
• ALTER TABLE does not support changing the server character set for character data types
• The TRANSLATE() function only works with Japanese
 
The purpose of this document is to introduce the Unicode Tool Kit for customers who are migrating from the Latin server character set to Unicode and building a global data warehouse based on the Unicode universal character set. The Unicode Tool Kit consists of the following components:
 
1) User Defined Functions (UDFs) for migrating code page data to Unicode without import/export
2) Site-defined session character sets compatible with Windows code pages or other standards
3) Access Modules for translation and validation, used to load code page data or UTF8 data through a UTF8 session with FastLoad/MultiLoad/TPump/TPT
4) Language translation functions that translate text in the database from/to specified languages
5) Unicode test data and a test application in Java/JDBC
6) Others
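
To illustrate the idea behind component 1, here is a minimal sketch in Python of reinterpreting code page data that was stored byte for byte in a Latin column. It is not the tool kit's UDF code. The function name latin_passthrough_to_unicode is hypothetical, ISO 8859-1 is used as a stand-in for the Teradata Latin server character set, and CP949 (Korean) is only an example code page.

```python
def latin_passthrough_to_unicode(stored: str, code_page: str) -> str:
    """Reinterpret text that was stored byte for byte in a Latin column.

    Hypothetical helper for illustration; assumes each stored character
    stands for exactly one raw byte (0x00-0xFF) of the original data.
    """
    # Recover the raw bytes that the application originally loaded.
    raw = stored.encode("latin-1")
    # Decode those bytes with the code page the application really used.
    return raw.decode(code_page)


if __name__ == "__main__":
    original = "한글"
    # Simulate what a Latin column would hold after loading CP949 bytes as is.
    stored = original.encode("cp949").decode("latin-1")
    print(latin_passthrough_to_unicode(stored, "cp949") == original)
```

The sketch only shows why the source code page must be known: the same stored bytes decode to different text under different code pages.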
 

What's New in Recent Releases

Date: 2017-11-3
version 1.7.0.0
Language translation functions
* New database functions translate text from/to specified languages
Access Module 
* cp2uni_axm.so for Linux supports a new parameter Timeout=xx to avoid hanging while reading data from named pipes
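
The general technique behind a read timeout can be sketched as follows. This is an assumption about the approach, not the code of cp2uni_axm.so: the function name read_with_timeout and its parameters are hypothetical, and the sketch relies on select(), which works for pipes on Linux.

```python
import os
import select


def read_with_timeout(fd: int, max_bytes: int, timeout_s: float):
    """Return bytes read from fd (b"" at end of file),
    or None if nothing arrives within timeout_s seconds."""
    # select() waits until the descriptor is readable or the timeout expires.
    ready, _, _ = select.select([fd], [], [], timeout_s)
    if not ready:
        # Timed out: the caller can report an error instead of hanging.
        return None
    return os.read(fd, max_bytes)
```

For a named pipe, the descriptor could be opened with os.open(path, os.O_RDONLY | os.O_NONBLOCK) so that the open call itself does not block while no writer is attached.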
 
Date: 2017-9-15
version 1.6.0.3
* Updated udf_utf16to16() to process only an even number of bytes in UTF16. (ref:RECHAXDVL)
   If an odd number of bytes is given, the last byte is ignored
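
The even-length rule can be sketched in a few lines of Python. The helper name trim_to_even_length is hypothetical and the code only illustrates the described behavior; it is not taken from udf_utf16to16().

```python
def trim_to_even_length(data: bytes) -> bytes:
    """Drop a trailing odd byte so that only whole 16-bit code units remain."""
    # len(data) % 2 is 1 for odd-length input and 0 for even-length input.
    usable = len(data) - (len(data) % 2)
    return data[:usable]


if __name__ == "__main__":
    # Two complete UTF-16BE code units ("A", "B") followed by one stray byte.
    sample = b"\x00A\x00B\x00"
    print(trim_to_even_length(sample).decode("utf-16-be"))
```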
 
Date: 2017-8-7
version 1.6.0.2
Updated translation UDFs
* Handle zero-length input strings in four UDFs (TLN-1240)
   pt_16BEHex2Char.o, pt_utf8to16v2a.o, pt_utf8to16v2a_s.o, pt_utf8to16v2a_apl.o
* Add a custom version of the UDF under
   ..\04 TranslationUDFs\01 Teradata UDFs\suselinux-x8664\udf_installation\pass-through UDFs\custom versions
 
Date: 2017-6-22
version 1.6.0.1
Translation UDFs for Oracle
* Support HP-UX Itanium 64bit 
 
Date: 2017-6-15
version 1.6.0.0
Access Modules
* Support 64bit version for Redhat Linux and CentOS
Translation UDFs
* Support output characters up to 32K
Others
* Reorganize site-defined session charsets
* Exclude cConv, cMigration and cScript and others from the kit
* Internationalization Orange Book version G01 (2017-6-5)
 
Date: 2016-9-2
version 1.5.5.0
Updated translation UDFs 
* New binary udf_utf8to16v2b.o for udf_utf8to16() for 32K output
* udf_16tow950() for Taiwan
 
Date: 2016-6-17
version 1.5.4.0
Access modules (only for SUSE Linux and Windows)
* Support 64bit version of the access modules
* Support UTF16LE
* In the access module, updated the handling of malformed UTF8 byte sequences so that well-formed bytes are not consumed
* Add an example on how to reload bad rows rejected by DBS due to translation error 6706
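
The malformed UTF8 handling noted in this release can be illustrated with a small Python sketch. It is an assumption about the intended behavior, not the access module's code: the function name replace_malformed_utf8 is hypothetical, and substituting U+FFFD is one possible remedy among several.

```python
def replace_malformed_utf8(data: bytes, replacement: str = "\ufffd") -> str:
    """Decode UTF-8, substituting only the malformed bytes and keeping
    every well-formed byte around them."""
    out = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            out.append(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end delimit only the offending bytes in chunk.
            out.append(chunk[:err.start].decode("utf-8"))
            out.append(replacement)
            # Resume right after the bad bytes so later text is not lost.
            pos += err.end
    return "".join(out)
```

With an input such as a truncated multi-byte sequence followed by an ASCII letter, the letter survives in the output because decoding resumes immediately after the offending bytes.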