Gupta_Pieeater 8 posts Joined 12/14
19 Mar 2015
TPT - Delimited Data Parsing error: Invalid multi-byte character

I am developing a TPT load (Unix environment) for a UTF-8 encoded data file that populates a Teradata table whose columns are defined as VARCHAR() CHARACTER SET UNICODE. One character in the data file is causing my load to fail. If I remove this character, the load completes successfully.
When the data file is viewed in WinSCP, the problem character appears as a square box; when I copy and paste it into a Word document, it appears as a "smiley face" emoji/emoticon. WinSCP reports the following attributes for the character: character 55357 (0xD83D, encoding UTF-8),
whilst a bit of googling suggests the following: character 55357, Unicode code point U+D83D, UTF-8 (hex) ed a0 bd.
I'm afraid this means nothing to me. What do I need to do to ensure that the TPT load job doesn't fail on these spurious UTF-8 characters, which appear not to be supported by the Teradata Unicode character set? I don't want to preprocess the file to remove this specific character, as tomorrow I could easily receive a file with a different problem character.
Thanks for any assistance
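
For readers puzzling over the same values: 55357 is 0xD83D, which lies in the UTF-16 high-surrogate range, and ed a0 bd is that lone surrogate written out as three bytes. A lone surrogate is not a well-formed UTF-8 sequence, which would be consistent with an "Invalid multi-byte character" message. The Python sketch below only checks those facts. The helper names are made up for illustration, and the low surrogate 0xDE00 is a hypothetical value, since the post does not show the bytes that follow the problem character.

```python
# Code point ranges reserved for UTF-16 surrogates.
HIGH_SURROGATES = range(0xD800, 0xDC00)
LOW_SURROGATES = range(0xDC00, 0xE000)


def is_high_surrogate(code_point: int) -> bool:
    """True if the value is the first half of a UTF-16 surrogate pair."""
    return code_point in HIGH_SURROGATES


def is_strict_utf8(raw: bytes) -> bool:
    """True if the bytes decode as well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def combine_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into the code point it represents."""
    if high not in HIGH_SURROGATES or low not in LOW_SURROGATES:
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(hex(55357), is_high_surrogate(55357))
    # The three bytes reported for the problem character.
    print(is_strict_utf8(bytes.fromhex("eda0bd")))
    # 0xDE00 is a hypothetical low surrogate, used only to show the arithmetic.
    print(hex(combine_surrogates(0xD83D, 0xDE00)))
```

If the file really contains surrogates encoded this way, the data was probably converted from UTF-16 without combining the pairs first, so the cleaner fix may be at the point where the file is produced.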
 
 
 

feinholz 1234 posts Joined 05/08
24 Mar 2015

When you want assistance, it is always a good idea to provide:
1. the version of TPT you are using
2. the actual failure (is it a DBS failure or a TPT failure?)

The word "fail" can mean many things.
Did the job complete but the row(s) with the aforementioned character end up in the error table?
If so, that would indicate the character is not supported by Teradata.
Did TPT fail?
If so, what was the error message?

--SteveF

rai.sandeep03 2 posts Joined 08/15
17 Aug 2015

Hi,

I am also getting a similar error due to emoji/emoticons in the data.

FILE_READER: TPT19134 !ERROR! Fatal data error processing file 'users/data/tgtfiles/rep_t_hit.out'. Delimited Data Parsing error: Column length overflow(s) in row 230.
FILE_READER: TPT19003 TPT Exit code set to 12.
TPT_INFRA: TPT02255: Message Buffers Sent/Received = 0, Total Rows Received = 0, Total Rows Sent = 0
FILE_READER: Total files processed: 0.
LOAD_OPERATOR: Total processor time used = '0.337322 Second(s)'
LOAD_OPERATOR: Start : Mon Jul 20 13:41:55 2015
LOAD_OPERATOR: End   : Mon Jul 20 13:42:04 2015
Job step insert_data terminated (status 12)
Job rep_t_hit_85912063 terminated (status 12)
Job start: Mon Jul 20 13:30:04 2015
Job end:   Mon Jul 20 13:42:04 2015
Total available memory:          20000676
Largest allocable area:          20000676
Memory use high water mark:       3490272
Free map size:                       1024
Free map use high water mark:          19
Free list use high water mark:          0

Thanks in advance,
Sandeep Rai

rai.sandeep03 2 posts Joined 08/15
17 Aug 2015

Please ignore my previous post; here is the correct error.

I am also getting a similar error due to emoji/emoticons in the data.

DATACONN: TPT19003 Warning: EscapeTextDelimiter has been encountered as the last character of column data
LOAD_OPERATOR: disconnecting sessions
DATACONN: TPT19134 !ERROR! Fatal data error processing file '/home/HIT_TAB.txt'. Delimited Data Parsing error: Invalid multi-byte character in row 158989, col 97.
DATACONN: TPT19003 TPT Exit code set to 12.
DATACONN: Total files processed: 0.
DATACONN: TPT19003 11 occurances of EscapeTextDelimiter encountered as the last character of column data.
DATACONN: TPT19003 Warning: The use of the same EscapeTextDelimiter value ('\') to export this data in DELIMITED format will result in an error.
LOAD_OPERATOR: Total processor time used = '0.579601 Second(s)'
LOAD_OPERATOR: Start : Thu Aug 13 09:16:36 2015
LOAD_OPERATOR: End   : Thu Aug 13 09:16:56 2015
Job step MAIN_STEP terminated (status 12)
Job c1030983 terminated (status 12)
Job start: Thu Aug 13 09:16:34 2015
Job end:   Thu Aug 13 09:16:56 2015

Thanks in advance,
Sandeep Rai

feinholz 1234 posts Joined 05/08
18 Aug 2015

Most emoji/emoticon characters are not supported by Teradata.
If your data contains invalid multi-byte characters, you can try to set RecordErrorFileName to a valid file and the DataConnector operator will put the error rows into that file and continue processing.
You did not tell me on what platform you are running and what version of TPT you are using.
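
Before deciding how to handle such rows, it can help to know which ones are affected. The Python sketch below is not part of TPT; it is a hypothetical pre-check that reports rows containing either malformed UTF-8 or characters outside the Basic Multilingual Plane. Treating every non-BMP character as unsupported is an assumption drawn from the emoji examples in this thread, not a documented rule, and the function name and one-record-per-line layout are also assumptions.

```python
from typing import Iterable, Iterator, Tuple


def find_problem_rows(rows: Iterable[bytes]) -> Iterator[Tuple[int, str]]:
    """Yield (row_number, reason) for rows that are likely to be rejected.

    Rows are numbered from 1, one record per line (an assumption).
    """
    for row_number, raw in enumerate(rows, start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # Malformed UTF-8, for example an encoded lone surrogate.
            yield row_number, f"invalid UTF-8 at byte offset {exc.start}"
            continue
        for ch in text:
            if ord(ch) > 0xFFFF:
                # Assumption: characters beyond the BMP are the unsupported ones.
                yield row_number, f"supplementary character U+{ord(ch):04X}"
                break


if __name__ == "__main__":
    sample = [
        "1|plain text\n".encode("utf-8"),
        "2|runner \U0001F3C3\n".encode("utf-8"),
        b"3|broken \xed\xa0\xbd\n",
    ]
    for row, reason in find_problem_rows(sample):
        print(row, reason)
```

For a real file, open it in binary mode and pass the file object directly, for example `find_problem_rows(open(path, "rb"))`, so that the bytes are not decoded before they are checked.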

--SteveF

champ_cs 1 post Joined 12/15
28 Dec 2015

Hi Steve,
We are on 14.10. When using RowErrFileName, my TPT load puts the record into the error file but also terminates the load with the error below:
FILE_READER: TPT19134 !ERROR! Fatal data error processing file 'new_gen.out'. Delimited Data Parsing error: Invalid multi-byte character in row 276, col 3.

The multi-byte character encountered (U+1F3C3) is not currently supported in TD 14 or 15.
I just want to make sure the load doesn't terminate if it encounters any such unsupported multi-byte character.

Please guide.
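
For reference, U+1F3C3 lies outside the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a surrogate pair in UTF-16. The small Python sketch below (the `describe` helper is a made-up name) shows how such a code point can be inspected; it says nothing about what Teradata or TPT accept.

```python
def describe(code_point: int) -> dict:
    """Summarise how a Unicode code point is encoded.

    Not meant for surrogate values (U+D800..U+DFFF), which cannot be
    encoded as UTF-8 and would raise UnicodeEncodeError here.
    """
    ch = chr(code_point)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    return {
        "code_point": f"U+{code_point:04X}",
        "utf8_hex": " ".join(f"{b:02x}" for b in utf8),
        "utf8_length": len(utf8),
        "in_bmp": code_point <= 0xFFFF,
        # Split the big-endian UTF-16 bytes into 16-bit code units.
        "utf16_units": [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)],
    }


if __name__ == "__main__":
    print(describe(0x1F3C3))
```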

feinholz 1234 posts Joined 05/08
04 Jan 2016

Some errors will cause the DataConnector operator to terminate: once parsing can no longer reliably locate the end-of-record character, the operator cannot continue.
 
However, I will have someone look into this specific case.
 

--SteveF

feinholz 1234 posts Joined 05/08
04 Jan 2016

Was record number 276 the first record with an invalid character?
Can you provide me with all of the attribute values you set up in the script (or job variable file) for the DataConnector operator?

--SteveF

vijaydf 16 posts Joined 06/12
01 Feb 2016
Delimiter: U+00C7 (Ç), capital C with cedilla

I am using TPT version 14.10.00.05. The load file is a UTF-8 file that uses the above character as its delimiter.
When I try to load this file with CHARACTER SET UTF8 and
textdelimiter = Cedilla, I get the error
File_Reader: TPT19134 !Error! Fatal data error processing file X
Delimited data parsing Error : Column length overflow(s) in row1.
When I view the file content in UTF-8 format I can see the cedilla, but when I change the format to ISO in PuTTY the delimiter is displayed differently.
When I dump the hex value of the delimiter I get c3 87, which does not look like a cedilla to me.

Any input on this error?

Vijay Mani

feinholz 1234 posts Joined 05/08
01 Feb 2016

Please provide the script and first few rows of data.
 

--SteveF

mezahidali 6 posts Joined 11/14
17 Feb 2016

Why don't you use the OS Command operator in the TPT script, along with a sed command, to replace the character with some meaningful character? You just need to add one more step, and then the file can be parsed.
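
The same preprocessing idea can be sketched in Python instead of sed, which avoids having to list each problem character by hand. This is only an illustration under stated assumptions: characters outside the Basic Multilingual Plane and malformed byte sequences are treated as the problem, U+FFFD is used as the stand-in character, and the function names are invented. Whether U+FFFD is acceptable for the target column should be verified; any other placeholder can be substituted.

```python
import io
from typing import BinaryIO

REPLACEMENT = "\uFFFD"  # assumed placeholder; change if the target rejects it


def sanitize_line(raw: bytes) -> bytes:
    """Return the line as well-formed UTF-8 restricted to BMP characters."""
    # errors="replace" turns each undecodable byte sequence into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    cleaned = "".join(ch if ord(ch) <= 0xFFFF else REPLACEMENT for ch in text)
    return cleaned.encode("utf-8")


def sanitize_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy src to dst line by line, sanitizing each line."""
    for raw in src:
        dst.write(sanitize_line(raw))


if __name__ == "__main__":
    source = io.BytesIO("1|runner \U0001F3C3|ok\n".encode("utf-8"))
    target = io.BytesIO()
    sanitize_stream(source, target)
    print(target.getvalue())
```

Delimiters and line endings are left untouched, so the record layout stays the same. Run as a step before the load, the two streams could be real files opened in binary mode, or `sys.stdin.buffer` and `sys.stdout.buffer` in a pipeline.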

Fred 1096 posts Joined 08/04
17 Feb 2016

The C3 87 is UTF-8 encoding for U+00C7, so it would seem that your data is in fact UTF-8 encoded.
Does your TPT script specify USING CHARACTER SET UTF8 for the job?
How are you specifying the TextDelimiter in TPT? Is the script itself in UTF-8? If so, did you use the "-e UTF-8" option on the command line to indicate that fact?
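
The C3 87 observation is easy to confirm. The short Python check below shows the UTF-8 and single-byte encodings of U+00C7, and what the two UTF-8 bytes look like when read as ISO-8859-1, which may explain the different display in the terminal. It assumes that "ISO" in the PuTTY setting means ISO-8859-1.

```python
CEDILLA = "\u00C7"  # LATIN CAPITAL LETTER C WITH CEDILLA

# Two bytes in UTF-8, one byte in ISO-8859-1.
utf8_bytes = CEDILLA.encode("utf-8")
latin1_bytes = CEDILLA.encode("iso-8859-1")

# Reading the UTF-8 bytes as ISO-8859-1 yields two unrelated characters.
mis_decoded = utf8_bytes.decode("iso-8859-1")

if __name__ == "__main__":
    print(utf8_bytes.hex(), latin1_bytes.hex())
    print([f"U+{ord(ch):04X}" for ch in mis_decoded])
```

So a delimiter that is a single character is still two bytes in a UTF-8 file, and the script that names the delimiter has to be read with the same encoding for the bytes to match.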
