akd2k6 54 posts Joined 12/14
09 Jul 2015
fast export and mload with unicode column type and value

Hi All,
I am trying to unload data from one table to a file with a fexp script, and then load that file into another table with the same DDL using mload.
But records are getting rejected and sent to the error table when a column defined as UNICODE contains special characters.
Column:
MERCH_NM_TXT VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
fexp script
-------------

fexp -c UTF8 << EOD
.RUN FILE $TD_SOURCE_LOGIN_FILE;
.LOGTABLE ${SOURCE_UTIL_DB}.${FEXP_UTIL_TBL_NM};
.BEGIN EXPORT
SESSIONS ${TD_UTIL_MAX_SESSN} ${TD_UTIL_MIN_SESSN};
LOCKING $SOURCE_DB.${SOURCE_TAB_NAME} FOR ACCESS
$UNLD_SQL
.EXPORT
        OUTFILE $TARGET_FILE
        MLSCRIPT $MLOAD_SCRIPT;
.END EXPORT;
.LOGOFF;
EOD

 

mload script

cat $mloadscript_file |mload -c "UTF-8"

 

Most of the records are loaded, but a few are getting rejected where there is an arrow-like character (that is not exactly it, but I can't paste it here).

 

Can you please advise how I can load all of the data successfully? I don't think there is a problem with the data itself, since it is already present in one table and I am only unloading it and loading it into another table with the same DDL.

 

Fred 1096 posts Joined 08/04
11 Jul 2015

Can you determine the problem character in hex? Teradata won't allow loading translation error characters U+001A or U+FFFD, for example.
 
Search for invalid characters in the source: 
SELECT MERCH_NM_TXT, CHAR2HEXINT(MERCH_NM_TXT) FROM mytable
WHERE TRANSLATE_CHK(MERCH_NM_TXT USING UNICODE_TO_UNICODE_FoldSpace) > 0

akd2k6 54 posts Joined 12/14
13 Jul 2015

"CARGO POR RECLAMACI N IMP"

"0043004100520047004F00200050004F00520020005200450043004C0041004D004100430049FFFD004E00200049004D0050"

The invalid character cannot be pasted here; it looks like an arrow.

Ivyuan 63 posts Joined 01/10
15 Jul 2015

Hi,
Since UTF8 is used, please double-check that the size of the Unicode column is tripled in the mload script.
For example, if "Column3 CHAR(10)" is specified in the target table, the corresponding FIELD statement in the mload script should be:
.FIELD c3  *  CHAR(30);
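Applied to the column in this thread, a minimal layout sketch would look like this (a sketch only, assuming a UTF8 client session; the layout name is illustrative and the other fields of the generated layout are omitted):
.LAYOUT data_layout;
/* VARCHAR(100) CHARACTER SET UNICODE -> 100 characters x 3 bytes = 300 bytes under UTF8 */
.FIELD MERCH_NM_TXT  *  VARCHAR(300);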
Thanks!
--Ivy.

Fred 1096 posts Joined 08/04
15 Jul 2015

A translation error occurred when loading the data. Teradata will allow you to export the data, but you cannot re-load the error substitution character.
"0043004100520047004F00200050004F00520020005200450043004C0041004D004100430049FFFD004E00200049004D0050"
Looks like this character should have been U+00D3 = Ó so the text would read
"CARGO POR RECLAMACIÓN IMP"
 
 
 

akd2k6 54 posts Joined 12/14
20 Jul 2015

While unloading data with FastExport and loading with mload, it works sometimes, but the unload file is not delimited-format data.
Is there any way I can read the FastExport unload file from Unix and load it into Teradata, Oracle, or any other database without using Teradata utilities?
Basically I want to archive these unload files and later, if required, load them into any database such as Oracle or DB2. But how can I read the file, given that its format is Teradata-specific?
My unload script is below; I want to read /tmp/table.dat from Unix, but without any Teradata utility.

fexp << EOD
.RUN FILE /tmp/SOURCE_PD.info;
.LOGTABLE aaa.table_utl;
.BEGIN EXPORT
SESSIONS 8 1;
LOCKING aaa.table FOR ACCESS
select * from aaa.table;
.EXPORT
        OUTFILE /tmp/table.dat
        MLSCRIPT /tmp/table.ml;
.END EXPORT;
.LOGOFF;
EOD

 
 
 

feinholz 1234 posts Joined 05/08
21 Jul 2015

You need a file in delimited format.
FastExport does not offer that capability.
If you would like that capability, you should switch to TPT (you should be switching to TPT anyway).
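A minimal TPT job sketch for a delimited export, using operator templates (a sketch only, not a tested script; file name, delimiter, and the other settings would be supplied through a job variable file):
USING CHARACTER SET UTF8
DEFINE JOB delimited_file_export
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
  /* $EXPORT reads the source table; $DATACONNECTOR_CONSUMER writes the flat file, */
  /* with Format = 'Delimited' and the delimiter set via the template job variables */
  APPLY TO OPERATOR ($DATACONNECTOR_CONSUMER())
  SELECT * FROM OPERATOR ($EXPORT());
);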
 

--SteveF

akd2k6 54 posts Joined 12/14
21 Jul 2015

Steve, I tried that with TPT as well, unloading to a delimited file and loading it into another table. But I am facing the issues below:
1. When a date is earlier than 1000-01-01, e.g. "0001-01-01", it is not unloaded correctly and the load process fails.
2. When a field value is '' (blank), it unloads fine, but during loading it is treated as NULL, tries to load into a NOT NULL field, and fails.
3. Unicode fields contain Unicode characters and they unload, but while loading they get rejected. It looks like either the consumer or the producer operator is not handling all Unicode characters correctly.
 
I am using the EXPORT and DATACONNECTOR_CONSUMER operators for unloading, and the DATACONNECTOR_PRODUCER and UPDATE operators for loading.
My requirement is to create the unload file so that I can later read it without Teradata for archival, and also load that file into another table.
 
 
 

feinholz 1234 posts Joined 05/08
21 Jul 2015

(Different thread in the Tools section, so was not aware this is the same user/problem as discussed in the other thread.)
 
1. I cannot reproduce; we are still looking at the problem
2. this is expected behavior; you would need to turn on Quoted Data so that blank fields will not be treated as NULL
3. what character set are you using? The DC operator, when writing data, should not care about the content of the data; it does not look at the data. If this is not working, it is an issue we need to be aware of, and I would need detailed information (sample data, script, etc.) to vet it
 

--SteveF

akd2k6 54 posts Joined 12/14
21 Jul 2015

Hi Steve, as I understand it, I have to unload and load with QuotedData enabled, so I am using the attributes below in the unload job variable file. But the script is not creating the file with quoted ("") data; it unloads the same as before. Is there anything I am doing wrong?

/* TPT  FILE_WRITER operator attributes */
,TargetDirectoryPath      = '/tmp/data'
,TargetFileName           = 'da_target.txt'
,TargetFormat             = 'Delimited'
,TargetOpenMode           = 'Write'
,TargetTextDelimiter      = '|'
,QuotedData = 'Yes'
,OpenQuoteMark = '"'
,CloseQuoteMark = '"'
,FileWriterPrivateLogName = 'daataconnector_log'
,FileWriterTraceLevel     = 'All'
,MaxDecimalDigits=38

 

Script

USING CHARACTER SET UTF8
DEFINE JOB delimited_file_unload
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
  APPLY TO OPERATOR ($DATACONNECTOR_CONSUMER())
  SELECT * FROM OPERATOR ($EXPORT());
);

 

 

feinholz 1234 posts Joined 05/08
21 Jul 2015

Your script is using templates (good!).
However, your job variables for QuotedData may not be correct.
Please check the template file for the DataConnector Consumer operator to make sure you are using the correct ones.
Otherwise you may not be enabling the feature.
This thread does not seem to indicate which version of TPT you are using, or which platform.
 

--SteveF

akd2k6 54 posts Joined 12/14
21 Jul 2015

I am using Teradata 14.10 and the OS is AIX. I also checked the $DATACONNECTOR_CONSUMER.txt template, and these are the correct attribute names:
    VARCHAR OpenQuoteMark           = @DCCOpenQuoteMark,
    VARCHAR OpenQuoteMarkHEX        = @DCCOpenQuoteMarkHEX,
    VARCHAR CloseQuoteMark          = @DCCCloseQuoteMark,
    VARCHAR CloseQuoteMarkHEX       = @DCCCloseQuoteMarkHEX,
    VARCHAR EscapeQuoteDelimiter    = @DCCEscapeQuoteDelimiter,
 
But it is still not working.
Regarding your comment "Otherwise you may not be enabling the feature": where can I check and enable it?

feinholz 1234 posts Joined 05/08
21 Jul 2015

I need to know the version of TPT, not Teradata ("Teradata" refers to the database).
And you need to use the Job Variable names (on the right side of the "=" sign).
Not the attribute names from the left side.
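For example, with the stock template pasted above, the job variable file would use the names on the right of the "=" sign, something like the snippet below (the QuotedData variable name is an assumption based on the same DCC prefix; confirm it against your $DATACONNECTOR_CONSUMER.txt):
/* job variable file: template variable names, not operator attribute names */
,DCCQuotedData      = 'Yes'      /* assumed name; verify in the template */
,DCCOpenQuoteMark   = '"'
,DCCCloseQuoteMark  = '"'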
 

--SteveF

feinholz 1234 posts Joined 05/08
21 Jul 2015

When I do this:
CREATE TABLE abc (
F1 DATE FORMAT 'YYYY/MM/DD'
);
INSERT INTO abc ('0001/01/01');
 
and then I use TPT to export the data to a flat file in delimited format, the result I get is this:
2901/01/01
 
I cannot reproduce the problem you are having on exporting the DATE data and only getting "1/01".
 

--SteveF

akd2k6 54 posts Joined 12/14
22 Jul 2015

I am trying to get the log, but the value that you are getting also looks wrong: instead of 0001 it is unloading 2901.
 

feinholz 1234 posts Joined 05/08
22 Jul 2015

That might be a database setting.
In other words, yes 2901 seems wrong, but with "0001", I think the DBS has to differentiate between the year 1901 and 2001.
I will talk to some database folks about why I am getting 2901.
 

--SteveF

akd2k6 54 posts Joined 12/14
10 Feb 2016

I am trying to load data from table to table using TPT templates, unloading from the table below and loading into a table with the same structure.
My source table DDL:

CREATE SET TABLE database.table1 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      col1 DECIMAL(18,0) NOT NULL,
      col2 VARCHAR(31000) CHARACTER SET UNICODE NOT CASESPECIFIC,
      col3 DECIMAL(18,0),
      col4 VARCHAR(1000) NOT CASESPECIFIC
      )
PRIMARY INDEX xxx ( col1 );

 

While using the script below, it throws error 3798 ("A column or character expression is larger than the max size") with UTF-8, and error 9804 ("Response Row size or Constant Row size overflow") with UTF-16.

Script

USING CHARACTER SET UTF16
DEFINE JOB tpt_table_to_table_load
(
  APPLY $INSERT TO OPERATOR($UPDATE()[1])
  SELECT * FROM OPERATOR($EXPORT()[1]);
);

 

While using the script below, it rejects some records that have Unicode characters in column col2.

Script

USING CHARACTER SET ASCII
DEFINE JOB tpt_table_to_table_load
(
  APPLY $INSERT TO OPERATOR($UPDATE()[1])
  SELECT * FROM OPERATOR($EXPORT()[1]);
);

Please advise how to solve this problem.

feinholz 1234 posts Joined 05/08
10 Feb 2016

What version of TPT are you using?
Can you send me a sample of the data?
Since you indicated the errors were database errors, I am assuming the job ran to completion and the rows in question were placed in the error table. Correct?
And can you please provide me with the entire error message?
 

--SteveF

akd2k6 54 posts Joined 12/14
10 Feb 2016

Steve, I think that when I use "USING CHARACTER SET UTF16", it multiplies every column's length by 2 and goes beyond 64K.
But if I use USING CHARACTER SET ASCII, then Unicode characters cannot be loaded; as long as the field col2 contains only ASCII values it will load.
If there is any Unicode character, the record is rejected.
 
 

akd2k6 54 posts Joined 12/14
10 Feb 2016

I am using Teradata Parallel Transporter Version 14.10.00.10 

 

akd2k6 54 posts Joined 12/14
10 Feb 2016

USING CHARACTER SET UTF-8:
$UPDATE: TPT10508: RDBMS error 3798: A column or character expression is larger than the max size.
 
USING CHARACTER SET UTF-16:
$EXPORT: TPT10508: RDBMS error 9804: Response Row size or Constant Row size overflow.
 
If I do not specify a CHARACTER SET, it defaults to ASCII and records with Unicode values are rejected.

 

Fred 1096 posts Joined 08/04
10 Feb 2016

Does col2 really need to be defined as 31000 characters? That would require 93000 bytes of buffer space in UTF8, which clearly exceeds the ~64K limit. In UTF16, col2 itself would need "only" 62000 bytes, but by the time you add the other columns, length fields, etc. you apparently are just slightly over the limit.
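Spelled out for this table (rough numbers; length and indicator bytes are ignored):
/* col2 VARCHAR(31000) CHARACTER SET UNICODE, client-side sizes in bytes:        */
/*   UTF8 : 31000 characters x 3 bytes = 93000  -> far beyond the ~64K row limit */
/*   UTF16: 31000 characters x 2 bytes = 62000  -> close to the limit once col1, */
/*          col3, col4 and the VARCHAR length fields are added                   */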

feinholz 1234 posts Joined 05/08
11 Feb 2016

The number of bytes actually exported depends on the "export width" setting in the DBS.
 
In order to better investigate the different errors, is it possible to send me some sample data that you know will cause the failure?
 

--SteveF

akd2k6 54 posts Joined 12/14
23 Feb 2016

Sorry Steve, I do not have data to share, as there are some restrictions.
If, in a table, one column out of five is declared with character set Unicode, then in the template job we have to use
USING CHARACTER SET UTF-8 or USING CHARACTER SET UTF-16;
if we do not use either statement, it means ASCII mode.
Will that Unicode field then be loaded properly in ASCII mode? For me it is getting rejected.
Now if we use UTF8 or UTF16, the size goes beyond 64K, because the template job treats every field as Unicode and multiplies by 2 or 3.
I think this is a bug and needs some improvement to handle the character set issue,
since most of our tables have a combination of Unicode, ASCII, and Latin character sets.
 

feinholz 1234 posts Joined 05/08
23 Feb 2016

When using a client session character set of ASCII, the Unicode field will only be loaded properly if the data for that column consists of single-byte characters.
 
If the data contains multi-byte characters, then you have to use UTF8 or UTF16. And when you use those client session character sets, the character data for all character fields could potentially be 1-3 bytes per character (UTF8), and will be expected to be 2 bytes per character for UTF16. And that is the behavior for all fields.
 
Of course, all single-byte ASCII characters will continue to be single byte with a UTF8 client session character set, but the schema must be adjusted (tripled) because TPT has no knowledge of the data and must be prepared for a character of any size (1 to 3 bytes).
 

--SteveF

akd2k6 54 posts Joined 12/14
23 Feb 2016

I did not get the last statement: "Of course, all single byte ASCII characters will continue to be single byte with a UTF8 client session character set, but the schema must be adjusted (tripled) because TPT has no knowledge of the data and must be prepared for a character of any size (1 to 3 bytes)."
In the table DDL below, col2 has special Unicode characters and they are loaded into the table. Will the template job be able to copy that table to another table with the same DDL? In my case it is failing with these errors:
USING CHARACTER SET UTF-8
$UPDATE: TPT10508: RDBMS error 3798: A column or character expression is larger than the max size.
USING CHARACTER SET UTF-16
$EXPORT: TPT10508: RDBMS error 9804: Response Row size or Constant Row size overflow.
Is there any way I can handle this? The data was loaded into one table, which Teradata allowed, and I need to copy it into another table (dev/COB).
CREATE SET TABLE database.table1 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      col1 DECIMAL(18,0) NOT NULL,
      col2 VARCHAR(31000) CHARACTER SET UNICODE NOT CASESPECIFIC,
      col3 DECIMAL(18,0),
      col4 VARCHAR(1000) NOT CASESPECIFIC
      )
PRIMARY INDEX xxx ( col1 );
 

feinholz 1234 posts Joined 05/08
23 Feb 2016

UTF8 characters can be 1-, 2-, or 3-bytes in length.
Thus, even with a character set of UTF8, a 7-bit ASCII character is still a single-byte UTF8 character.
 
However, our load/unload products need to account for the largest size possible for data for each column.
And thus, if you use a client session character set of UTF8, the CHAR field sizes must be tripled.
In the table definition you provided, the CHAR field sizes are in terms of "characters". The data, however (and the TPT schema), needs to be specified in terms of "bytes".
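As an illustration only (this is exactly the schema that overflows, so it is not a workable script): a hand-written TPT schema for this table under UTF8 would have to declare the character columns in bytes, roughly:
/* sketch: sizes in bytes for a UTF8 client session character set */
DEFINE SCHEMA table1_schema
(
  col1 DECIMAL(18,0),
  col2 VARCHAR(93000),  /* 31000 characters x 3 bytes; this alone exceeds the row limit */
  col3 DECIMAL(18,0),
  col4 VARCHAR(3000)    /* 1000 characters x 3 bytes */
);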
 
For the table definition you provided, if you specify a client session character set of UTF8, then your schema will be too large for the data to be loaded into Teradata.
 
And that is the error you are getting.
Do you happen to know what the "export width" setting is on your server?
 

--SteveF

akd2k6 54 posts Joined 12/14
23 Feb 2016

Hi Steve,
in that case, how is the data loaded into the production table by ETL tools like DataStage or Ab Initio? Shouldn't they face a similar issue?
In my case I can see the data was loaded using the DataStage tool.

akd2k6 54 posts Joined 12/14
03 Mar 2016

Is there any way at all that I can copy the data from table to table?
This multi-byte Unicode data was loaded into the source table by an ETL tool.
 
Does Teradata not have any solution for this?

akd2k6 54 posts Joined 12/14
30 Mar 2016

Is there any way I can unload and load with a translate function to handle the above scenario?
Or is there no solution for this issue?

feinholz 1234 posts Joined 05/08
30 Mar 2016

You can copy the data if you use a character set of UTF16.
You cannot do it with character set UTF8.
 

--SteveF

akd2k6 54 posts Joined 12/14
30 Mar 2016

Hi Steve, with UTF16 I am also facing the same problem.
With UTF16 it will multiply by 2, right? Then it will also go beyond 64K (18 + 31000 + 18 + 1000 = 32036, and 32036 * 2 = 64072).
CREATE SET TABLE database.table1 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      col1 DECIMAL(18,0) NOT NULL,
      col2 VARCHAR(31000) CHARACTER SET UNICODE NOT CASESPECIFIC,
      col3 DECIMAL(18,0),
      col4 VARCHAR(1000) NOT CASESPECIFIC
      )
PRIMARY INDEX xxx ( col1 );

feinholz 1234 posts Joined 05/08
30 Mar 2016

Only the CHAR/VARCHAR sizes are doubled. It is only the character data that is affected.
Not numeric.
So, it is 18+62000+18+2000=64036.
And even the DECIMAL fields will be less than 18. I think a DECIMAL(18,0) only uses 8 bytes of storage.
So, it might be 8+62000+8+2000=64016.
The USING data size can exceed 64000 bytes (I believe we can send a max of 64260 to the DBS).
Try UTF16 and see if it works for you.
 

--SteveF
