pmehrotr 16 posts Joined 12/13
20 Dec 2013
Does one need Unicode compression in Teradata 14.0?

I know in Teradata 13.x, the following syntax is provided to implement compression for Unicode:
CREATE TABLE Customer
  (Customer_Account_Number INTEGER,
   Customer_Name VARCHAR(50),
   Customer_Address CHAR(200) CHARACTER SET UNICODE
     COMPRESS USING TransUnicodeToUTF8
     DECOMPRESS USING TransUTF8ToUnicode);
 
But I am reading somewhere that in 14.0, Teradata stores Unicode in UTF8, so this compression/decompression should not be required? I am creating brand-new tables in 14.0.
 
Appreciate your response.

M.Saeed Khurram 544 posts Joined 09/12
21 Dec 2013

Hi,
AFAIK, TRANSUNICODETOUTF8 is a TD 13.10 enhancement. These functions are also present in TD 14.0 and are used to compress and decompress Unicode. I have found the reference below on how Unicode is stored within TD, and this documentation is for TD 14.0:
http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1143_111A/ch05.045.022.html
Can you please share your source of information that TD 14.0 stores Unicode as UTF8?
 

Khurram

pmehrotr 16 posts Joined 12/13
21 Dec 2013

http://goldenorbit.wordpress.com/2013/03/09/latin-utf8-and-utf16-with-teradata/
 
 

  • Unicode strings are stored as UTF16 on disk anyway. Yes, space is wasted; that’s why there is an algorithm compression function to just compress UTF16 to UTF8 in version 13. Only version 14 can store UTF8 on disk.
dnoeth 4628 posts Joined 11/04
21 Dec 2013

Only version 14 can store UTF8 on disk

This is obviously wrong.
But the first two sentences are correct :-)
 

Dieter

M.Saeed Khurram 544 posts Joined 09/12
24 Dec 2013

Hi,
I was going through some material and came to know that TRANSUNICODETOUTF8 can only be used to compress UNICODE columns that contain 7-bit ASCII Latin data. So I guess if TD 14 is storing Unicode in UTF8, it will require the data to be ASCII Latin; otherwise it will store it as UTF16.
 

Khurram

dnoeth 4628 posts Joined 11/04
24 Dec 2013

Hi Khurram,
TransUnicodeToUTF8 works for any UTF16 character, but if there are a lot of Latin characters it simply compresses better:
most Latin characters are stored in one byte in UTF8, while some of the more exotic characters might need more than two bytes.
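As an illustration, here is a minimal sketch applying that algorithmic compression to a mostly-Latin Unicode column (the table and column names are made up):

```sql
-- Hypothetical table: Comment_Text is UNICODE but holds mostly Latin text,
-- so storing it as UTF8 roughly halves the space versus UTF16 on disk.
CREATE TABLE Comments
  (Comment_Id   INTEGER NOT NULL,
   Comment_Text VARCHAR(500) CHARACTER SET UNICODE
     COMPRESS USING TransUnicodeToUTF8
     DECOMPRESS USING TransUTF8ToUnicode)
PRIMARY INDEX (Comment_Id);
```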

Dieter

Raja_KT 1246 posts Joined 07/09
24 Dec 2013

In the context of table joins, I feel we need to be careful that both joining fields use the same character set, else there will be performance degradation. I have heard of quite a number of cases.
Raja

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

dnoeth 4628 posts Joined 11/04
24 Dec 2013

Hi Raja,
This only relates to LATIN vs. UNICODE: of course they hash differently, and thus you can't get PI-to-PI joins. But algorithmic compression doesn't change the character set, only the storage (btw, you can't compress a PI column).
Joining on columns with different character sets is a sign of bad database design :-)

Dieter

teradatauser2 236 posts Joined 04/12
28 Jul 2015

Hi Dieter,
What I could understand from reading some manuals is that the space requirement for Unicode is double that of Latin. For joins, why do we say that they hash differently? Because the values in both of them will be different (Latin might not be able to store some special characters, whereas Unicode can)? Could you give an example here?
Can we not use MVC on Unicode columns? Can we use Unicode columns in a WHERE condition, and does that perform well? Are there any other issues/considerations to keep in mind while using Unicode columns?
Unfortunately, there is not much detail about this in the manuals; could you direct me to one if you have it?

dnoeth 4628 posts Joined 11/04
29 Jul 2015

When you need characters not covered by LATIN you must switch to UNICODE and then you don't care if comparisons are a bit less efficient.
Regarding different hashes:
Of course a 'bla' hashes the same regardless of the charset, but when you compare or join Latin to Unicode you will notice a TRANSLATE in Explain.
Simply create two tables with different charsets as PI and Explain a join.
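That experiment could look like the sketch below (table and column names are made up, and the exact Explain wording varies by release):

```sql
CREATE TABLE t_latin
  (id VARCHAR(10) CHARACTER SET LATIN NOT NULL)
PRIMARY INDEX (id);

CREATE TABLE t_unicode
  (id VARCHAR(10) CHARACTER SET UNICODE NOT NULL)
PRIMARY INDEX (id);

-- The Explain text should show a TRANSLATE step on the LATIN column,
-- i.e. no direct PI-to-PI join on matching row hashes.
EXPLAIN
SELECT *
FROM t_latin AS l
JOIN t_unicode AS u
  ON l.id = u.id;
```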
 

Dieter
