aalokshah 5 posts Joined 11/10
11 Nov 2011
Need Help on using TPT with UNICODE characters

Hi, I want to use TPT in one of my projects and I am facing the following issue.

(I have read parallel-transporter-unicode-usage#comment-17552 and am doing the things mentioned in that article, but something is missing. Here are the details.)

Table DDL:
Id DECIMAL(18,0) TITLE 'Identifier' NOT NULL,

In the TPT DEFINE SCHEMA, all columns are cast to an appropriate VARCHAR(n).
In the EXPORT_OPERATOR SQL, all four fields are cast to VARCHAR(n).

I am using the FILE_WRITER operator to write data into a delimited file.
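For concreteness, the setup described above might look roughly like this sketch (database, table, column, and file names are placeholders, and only the Id column from the DDL is shown):

```
DEFINE JOB MOVE_DATA_TO_FLAT_FILE
(
  DEFINE SCHEMA SOURCE_SCHEMA
  (
    Id VARCHAR(20)  /* DECIMAL(18,0) cast to text */
  );

  DEFINE OPERATOR EXPORT_OPERATOR
  TYPE EXPORT
  SCHEMA SOURCE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'mytdp',
    VARCHAR UserName     = 'myuser',
    VARCHAR UserPassword = 'mypassword',
    VARCHAR SelectStmt   = 'SELECT CAST(Id AS VARCHAR(20)) FROM MyDb.MyTable;'
  );

  DEFINE OPERATOR FILE_WRITER
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR FileName      = 'out.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|',
    VARCHAR OpenMode      = 'Write'
  );

  APPLY TO OPERATOR (FILE_WRITER)
  SELECT * FROM OPERATOR (EXPORT_OPERATOR);
);
```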

Scenario 1: When I run the script with the following command, the script runs fine, but the Chinese Unicode characters do not show up in the output file. (The script does not have USING CHAR SET UTF8 before DEFINE JOB MOVE_DATA_TO_FLAT_FILE.)
tbuild -f

Scenario 2: When I put USING CHAR SET UTF8 before DEFINE JOB MOVE_DATA_TO_FLAT_FILE and run it through tbuild -f, it gives me an error: EXPORT_OPERATOR: TPT12108: Output Schema does not match data from SELECT statement

My script is written in the ASCII character set (all English) and I am writing it on an AIX machine. I do not have the Windows script generator tool.

How do I configure my script so that it gives me UTF8 characters in the file?


nj4nagoor 1 post Joined 11/11
14 Nov 2011


Can anybody tell me about the TPT stage? Is it a separate stage that we can view in DataStage, or is it a UNIX script? I would like to know about the TPT stage for use in my project.


feinholz 1234 posts Joined 05/08
14 Nov 2011

There are 2 things to consider:

1. You cannot write data in delimited format by using the Export operator; you must use the Selector operator, and you must set the ReportMode attribute to the correct value.

2. you must make sure that the size of the VARCHAR columns is in terms of "bytes", not "characters".


For example, a CHAR(10) CHARACTER SET UNICODE will result in 30 bytes required for the column if the client session character set is set to UTF8.

Thus, in the schema object, you would use CHAR(30) for that column.

The DBS defines sizes in terms of characters.

The client products work in terms of bytes.
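Applying that rule to a hypothetical pair of UNICODE columns (column names are made up; the factor of 3 is the per-character byte requirement stated above for a UTF8 client session):

```
/* Database DDL -- sizes in characters */
Name  CHAR(10)    CHARACTER SET UNICODE,
Descr VARCHAR(50) CHARACTER SET UNICODE

/* TPT schema with a UTF8 client session -- sizes in bytes */
DEFINE SCHEMA SOURCE_SCHEMA
(
  Name  CHAR(30),     /* 10 characters x 3 bytes */
  Descr VARCHAR(150)  /* 50 characters x 3 bytes */
);
```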


aalokshah 5 posts Joined 11/10
23 Nov 2011

Thanks, the second point you mentioned worked for me. But I still do not understand why EXPORT will not work for delimited format. I am running my script and it works fine. Is there any specific reason why EXPORT should not be used?

I read that the Selector operator is much slower than the Export operator. My data export requirement is massive.

feinholz 1234 posts Joined 05/08
23 Nov 2011

The Export operator executes the FastExport protocol. That protocol returns the data in binary format, not text. Therefore, we cannot use that operator to write out delimited data, which is text.

The Selector operator can be used to retrieve the data in "report" mode, which retrieves the data in text format, and so can be used to create delimited output.

The only way to use the Export operator is for you (the user) to CAST the SELECT statement so all columns are converted (by the DBS) to VARCHAR. If you want to do that, it is up to you, but the operator cannot do that for you.
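A cast-everything SelectStmt along those lines might look like this sketch (table and column names are hypothetical, and the VARCHAR sizes must still follow the bytes-not-characters rule above):

```
VARCHAR SelectStmt = 'SELECT
    CAST(Id        AS VARCHAR(20)),
    CAST(Title     AS VARCHAR(300)),
    CAST(CreatedTs AS VARCHAR(26))
  FROM MyDb.MyTable;'
```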



aalokshah 5 posts Joined 11/10
23 Nov 2011

Thanks again, that probably explains why my script is running with EXPORT operator ( I'm casting every column to VARCHAR)

feinholz 1234 posts Joined 05/08
23 Nov 2011

Just be careful, because the first 2 bytes might be binary. The normal exported format includes a 2-byte row length.
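If you ever need to consume output that does carry that 2-byte length indicator, a minimal sketch of splitting such a stream follows. The little-endian unsigned 16-bit layout is an assumption, not something stated in this thread; verify the byte order against your own files.

```python
import struct

def read_records(data: bytes) -> list[bytes]:
    """Split a byte stream of 2-byte length-prefixed records.

    Assumes a little-endian unsigned 16-bit length before each row
    (an assumption; the actual byte order may differ by platform).
    """
    records = []
    pos = 0
    while pos < len(data):
        # Read the 2-byte row-length indicator, then the row itself.
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        records.append(data[pos:pos + length])
        pos += length
    return records

# Two synthetic "rows": b"hello" and b"world!"
stream = struct.pack("<H", 5) + b"hello" + struct.pack("<H", 6) + b"world!"
print(read_records(stream))  # [b'hello', b'world!']
```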



aalokshah 5 posts Joined 11/10
08 Dec 2011

It does not include the first 2 bytes. In fact, one of the reasons for going to TPT was to replace FastExport, which includes these additional two bytes and hence requires additional processing at our end.

Taking a step back,

2. you must make sure that the size of the VARCHAR columns is in terms of "bytes", not "characters".

This worked with the TD 13.10 TPT utility but is not working on 12.0. It again gives me the error:

EXPORT_OPERATOR: aborting due to the following error:
Output Schema does not match data from SELECT statement
Job step MAIN_STEP terminated (status 12)

One more question,

How (if at all) are FastExport sessions different from TPT sessions? I read in the documentation that the number of sessions is limited by the number of AMPs in the system. What is the correlation between sessions and AMPs? I cannot understand how AMPs influence the number of sessions.


feinholz 1234 posts Joined 05/08
08 Dec 2011

Ok, I will work backwards.

The first thing to understand is that TPT and the older legacy standalone utilities do the exact same thing.

This is called following (and executing) a particular "protocol".

So, the Export operator executes the FastExport protocol. The protocol describes the set of steps and communications between the client and the database. The sessions are the same. The method of data retrieval is the same.

As to AMPs, the special protocols (FastLoad, MultiLoad, FastExport) connect special "data" sessions that run in special partitions that connect directly to the AMPs.

And our utilities have a maximum limit on how many of these special sessions can be connected to the database, and that limit is one per AMP.

(Of course, with FastExport -- or the TPT Export operator -- no one really connects one session per available AMP, but that is the rule.)
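In a TPT script, that session count is capped per operator with the MinSessions/MaxSessions attributes; a fragment along these lines (the values are illustrative, not a recommendation):

```
DEFINE OPERATOR EXPORT_OPERATOR
TYPE EXPORT
SCHEMA SOURCE_SCHEMA
ATTRIBUTES
(
  INTEGER MaxSessions = 8,  /* never more than one per AMP */
  INTEGER MinSessions = 1
);
```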


Now, back to your schema mismatch problem. To help you further, at a minimum I will need to see your entire script. And a brief description of what you are trying to do.


gsoh 2 posts Joined 08/12
13 Aug 2013

I am faced with the same situation.
If you solved this problem, could you share the script with me?
