mardan 2 posts Joined 02/14
12 Feb 2014
TeradataImportTool charset problem

Hi.
I am importing data from Teradata to Hadoop with "Teradata Connector for Hadoop (Command Line Edition): Cloudera" v1.2:
http://downloads.teradata.com/download/connectivity/teradata-connector-for-hadoop-command-line-edition
I have a table like this:

create table testtable (
  id int not null,
  value varchar(50),
  text varchar(200),
  PRIMARY KEY (id)
);
And I have inserted this data:

insert into testtable values (1, '#1€', 'aá');
insert into testtable values (2, '#2€', 'eé');
The import job works normally:

export USERLIBTDCH=/usr/lib/tdch/teradata-connector-1.2.jar
hadoop jar $USERLIBTDCH com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://teradataServer/DATABASE=test,CHARSET=UTF8 -username dbc -password dbc -jobtype hdfs -fileformat textfile -targetpaths /temp/hdfstable -sourcetable testtable -splitbycolumn id
But the resulting file in HDFS looks like this:

1 #1? a?
2 #2? e?
How can I import "special" characters from Teradata to Hadoop (UTF-8)? If I use the JDBC driver directly (e.g. from a Java program), it works fine, so the problem seems to be in the connector...
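For reference, a minimal sketch of the direct JDBC check I mean (the host, user, and password below are placeholders for my test setup):

import java.io.PrintStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Reads testtable straight through the Teradata JDBC driver, bypassing the
// connector. Read this way, the euro and accented characters come back intact.
public class CharsetCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.teradata.jdbc.TeraDriver");
        String url = "jdbc:teradata://teradataServer/DATABASE=test,CHARSET=UTF8";
        // Force UTF-8 on stdout so the console encoding does not mangle the output.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        try (Connection con = DriverManager.getConnection(url, "dbc", "dbc");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select * from testtable order by id")) {
            while (rs.next()) {
                out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getString(3));
            }
        }
    }
}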

Raja_KT 1246 posts Joined 07/09
12 Feb 2014

I am also curious about these characters. Can you change DATABASE=test,CHARSET=UTF8 to DATABASE=test,CHARSET=UTF16 and see if that works?

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

individuodk 2 posts Joined 01/14
13 Feb 2014

Mardan, we have exactly the same problem.
The Cloudera Connector for Teradata (CDH4) works correctly, but we are interested in using the "Teradata Connector for Hadoop (Command Line Edition): Cloudera" because it is the connector recommended in terms of performance.

Have you managed to import the special chars yet?

mardan 2 posts Joined 02/14
13 Feb 2014

With "DATABASE=test,CHARSET=UTF16" I get the same resulting file.
The columns of my Teradata table are Unicode (CHARTYPE 2):

select columnname, chartype from dbc.columns where tablename = 'testtable';

ColumnName  CharType
----------  --------
id          0
text        2
value       2

david.craig 73 posts Joined 05/13
16 Feb 2014

What is the hexdump of the '?' in the HDFS file? Is it the 0x1A replacement character Teradata substitutes for untranslatable characters, or something else? Is there a byte order mark, or is UTF-16 assumed to be little-endian?
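As a minimal sketch of how to check (the part file name is a placeholder; copy it out of HDFS first with hadoop fs -get /temp/hdfstable/part-m-00000 .):

import java.io.FileInputStream;
import java.io.IOException;

// Dumps the raw bytes of a local copy of the HDFS part file, 16 per line,
// so the actual byte written in place of each '?' is visible.
public class HexDump {
    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream(args[0])) {
            int b;
            int count = 0;
            while ((b = in.read()) != -1) {
                System.out.printf("%02x ", b);
                if (++count % 16 == 0) {
                    System.out.println();
                }
            }
            System.out.println();
        }
    }
}

Run it as: java HexDump part-m-00000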
