brian_m 5 posts Joined 06/15
26 Jun 2015
TPT UTF8 Import of "no valid unicode character" - need help

Hi,
I need help importing a .csv file into a table.
 
We have a .csv file with a few columns, one of which contains Unicode text. The import fails on the "U+10002A" characters in that column. Without these characters the import works fine.
TPT19350 I/O error on file xxx
TPT19003 read
 
Teradata TPT is version 13.10.00.08
If you need more information, please let me know.
 
Thanks!
brian
 
 

feinholz 1234 posts Joined 05/08
26 Jun 2015

Please provide the entire output from the console.
 

--SteveF

brian_m 5 posts Joined 06/15
28 Jun 2015

Teradata Parallel Transporter Load Operator Version 13.10.00.04
LOAD_OPERATOR: private log specified: best_log_name
FILE_READER: TPT19008 DataConnector Producer operator Instances: 1
FILE_READER: TPT19003 ECI operator ID: FILE_READER-10650
FILE_READER: TPT19222 Operator instance 1 processing file '/text1.TXT'.
LOAD_OPERATOR: connecting sessions
LOAD_OPERATOR: preparing target table
LOAD_OPERATOR: entering Acquisition Phase
FILE_READER: TPT19350 I/O error on file '/text1.TXT'.
FILE_READER: TPT19003 Read
FILE_READER: TPT19350 I/O error on file '/text1.TXT'.
LOAD_OPERATOR: disconnecting sessions
FILE_READER: TPT19221 Total files processed: 0.
LOAD_OPERATOR: Total processor time used = '0.23 Second(s)'
LOAD_OPERATOR: Start : Mon Jun 29 07:31:19 2015

Hi,
here is the output.
 
 

feinholz 1234 posts Joined 05/08
29 Jun 2015

Thank you.
And just checking that your script indicates "USING CHARACTER SET UTF8" prior to the DEFINE JOB?

--SteveF

feinholz 1234 posts Joined 05/08
29 Jun 2015

It appears that "U+10002A" is a character that requires the 4-byte UTF-8 encoding.
If so, Teradata load/unload products do not support 4-byte UTF-8 data.
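
To illustrate the 4-byte point, here is a minimal Python sketch (Python is used only for illustration and is not part of any Teradata tooling; the function name `utf8_bytes` is hypothetical). It shows that U+10002A, like any code point above U+FFFF, takes four bytes in UTF-8:

```python
def utf8_bytes(code_point):
    """Return the UTF-8 encoding of a single Unicode code point."""
    return chr(code_point).encode("utf-8")

encoded = utf8_bytes(0x10002A)

# Show the individual bytes in hex and the encoded length.
print(" ".join(f"{b:02x}" for b in encoded))  # f4 80 80 aa
print(len(encoded))                           # 4

# For comparison, the section sign used as a VARTEXT delimiter needs two bytes.
print(len(utf8_bytes(0x00A7)))                # 2
```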

--SteveF

brian_m 5 posts Joined 06/15
29 Jun 2015

Thank you.
So I think we cannot load the data without preprocessing the file first.
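
One possible way to prepare the file beforehand, sketched in Python under the assumption that the input is valid UTF-8 and that replacing every 4-byte character with "?" is acceptable. The function names and file paths are hypothetical and not taken from any Teradata tool:

```python
import re

# Matches any code point above U+FFFF, i.e. exactly the characters
# that need four bytes in UTF-8.
_SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")

def replace_supplementary(text, replacement="?"):
    """Replace every character outside the Basic Multilingual Plane."""
    return _SUPPLEMENTARY.sub(replacement, text)

def clean_file(src_path, dst_path, replacement="?"):
    """Copy src_path to dst_path, replacing 4-byte UTF-8 characters.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(replace_supplementary(line, replacement))

# Example on a single record: only the U+10002A character is replaced.
print(replace_supplementary("abc\U0010002A§def"))  # abc?§def
```

Calling `clean_file("in.csv", "out.csv")` (both names are placeholders) would then produce a copy of the file that contains only 1- to 3-byte UTF-8 sequences.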

brian_m 5 posts Joined 06/15
05 Aug 2015
BEGIN LOADING
   $DBX_LOAD....
   ERRORFILES
     $DBX_LOAD...._ERR1,
     $DBX_LOAD...._ERR2
     CHECKPOINT 3000000;
     SET RECORD VARTEXT "§" NOSTOP DISPLAY_ERRORS;

     axsmod /../.../work/cp2uni_axm.so "CodePage=UTF8, ErrorChar=U+003F";

Hi,
we found a solution for this Unicode import problem: use the "AXSMOD" file from the Unicode Toolkit.
The untranslatable character is now replaced by a "?" (defined by ErrorChar).
You can also use the axsmod in a TPT script:
Varchar AccessModuleInitStr = 'CodePage=UTF8, ErrorChar=U+003F, EOR=0A',
Varchar AccessModuleName = '/.../.../work/cp2uni_axm.so'
Simple, once you know...
 
greets,
brian
 

brian_m 5 posts Joined 06/15
05 Aug 2015

Info: the first code block in my last post is from a FastLoad script.
But you can use the axsmod in FastLoad, MLoad or TPT.
There are different axsmod files for AIX, SUSE, ...
Please refer to the documentation "Teradata Unicode Toolkit".

david.craig 73 posts Joined 05/13
05 Aug 2015

An import of U+10002A is uncommon, as it is a user-defined character in the Supplementary Private Use Area-B. It could also be a corrupted encoding. Private-use code points have been used by Japanese communications companies to encode emoji.
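
For anyone who wants to check where a code point falls, here is a small Python sketch using the standard `unicodedata` module (the function name `describe` is hypothetical). It reports the general category, the plane, and whether the code point lies in the Supplementary Private Use Area-B range U+100000 to U+10FFFD:

```python
import unicodedata

def describe(code_point):
    """Return basic Unicode facts about a single code point."""
    return {
        # "Co" is the general category for private-use characters.
        "category": unicodedata.category(chr(code_point)),
        # Each plane holds 0x10000 code points; plane 0 is the BMP.
        "plane": code_point >> 16,
        "in_spua_b": 0x100000 <= code_point <= 0x10FFFD,
    }

print(describe(0x10002A))
# {'category': 'Co', 'plane': 16, 'in_spua_b': True}
```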
