skunkwerk 17 posts Joined 08/15
08 Apr 2016
Exporting CSV from TPT

Hi,
I have installed FastExport but apparently it cannot output CSV files.
So now I'm trying to get ahold of Teradata Parallel Transporter to export a large table (hundreds of millions of rows).
I've read this post and the quickstart guide for TPT, but I still don't know how to export a CSV from a table.
Could someone please provide a sample script that connects to a database with username/password and saves a table as a .CSV file?
 
thank you,
imran

skunkwerk 17 posts Joined 08/15
08 Apr 2016

I just changed the select statement in the example file to my own table, and I get this error:
 
ubuntu@home:/opt/teradata/client/15.10/tbuild/sample/quickstart$ tbuild -f qstart2.txt -v jobvars2.txt -j qstart2
Teradata Parallel Transporter Version 15.10.00.04 64-Bit
Job log: /opt/teradata/client/15.10/tbuild/logs/qstart2-1.out
Job id is qstart2-1, running on spark-analytics
Teradata Parallel Transporter Export Operator Version 15.10.00.04
Teradata Parallel Transporter DataConnector Operator Version 15.10.00.04
$EXPORT: private log specified: exportprivate.log
$FILE_WRITER[1]: Instance 1 directing private log report to 'dtacop-ubuntu-14018-1'.
$FILE_WRITER[1]: DataConnector Consumer operator Instances: 1
$FILE_WRITER[1]: ECI operator ID: '$FILE_WRITER-14018'
$FILE_WRITER[1]: Operator instance 1 processing file 'flatfile2.dat'.
$EXPORT: connecting sessions
TPT_INFRA: TPT02639: Error: Conflicting data type for column(4) - "dt". Source column's data type (CHAR) Target column's data type (INTDATE).
$EXPORT: TPT12108: Output Schema does not match data from SELECT statement
$EXPORT: disconnecting sessions
$EXPORT: Total processor time used = '0.04642 Second(s)'
$EXPORT: Start : Sat Apr  9 00:57:08 2016
$EXPORT: End   : Sat Apr  9 00:57:11 2016
$FILE_WRITER[1]: Total files processed: 0.
Job step MAIN_STEP terminated (status 12)
Job qstart2 terminated (status 12)
Job start: Sat Apr  9 00:57:06 2016
Job end:   Sat Apr  9 00:57:11 2016
 
This is my qstart2.txt:
DEFINE JOB qstart2
(
  APPLY TO OPERATOR ($FILE_WRITER)
  SELECT * FROM OPERATOR($EXPORT);
);
 
This is my jobvars2.txt (with credentials replaced):
 SourceTdpId          = 'db.acmecorp.com'
,SourceUserName       = 'myusername'
,SourceUserPassword   = 'mypass'
,DDLPrivateLogName    = 'ddlprivate.log'
,ExportPrivateLogName = 'exportprivate.log'
,TargetErrorList      = ['3807']
,TargetFileName       = 'flatfile2.dat'
,TargetFormat         = 'delimited'
,TargetTextDelimiter  = ','
,TargetOpenMode       = 'write'
,SelectStmt           = 'SELECT * FROM rtl.base;'

dnoeth 4628 posts Joined 11/04
09 Apr 2016

This is probably related to the session's DateForm setting: AnsiDate returns a 10-character string 'yyyy-mm-dd', while IntegerDate returns the internal 4-byte integer.
 
Try adding a new job variable:
ExportDateForm = 'IntegerDate'
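
For example, the end of the jobvars2.txt you posted would then look like this (a sketch, assuming the $EXPORT template picks up the variable; only the last line is new):

,TargetOpenMode       = 'write'
,SelectStmt           = 'SELECT * FROM rtl.base;'
,ExportDateForm       = 'IntegerDate'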

Dieter

skunkwerk 17 posts Joined 08/15
09 Apr 2016

I was able to get it working by changing it to:
dt = ANSIDATE
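
In a hand-written TPT script the equivalent would be a DEFINE SCHEMA along these lines (a sketch; the schema name and the id column are placeholders for my two-column table):

DEFINE SCHEMA EXPORT_SCHEMA
(
  id INTEGER,
  dt ANSIDATE
);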
 
It runs, but it's very slow: about 75,000 rows (of 2 columns) exported per minute, and I have an 800 million row table.
I've tried increasing MaxSessions and all the other instance variables, but it has no impact on the speed.
I don't think it's a network issue, as I'm able to download files from the Internet at 65 MB/s on the same network.
 
Can someone please recommend how I can speed this up? The documentation says somewhere that SQL SELECTs are slow compared to exports. How do I do an export without a SQL SELECT?

dnoeth 4628 posts Joined 11/04
10 Apr 2016

You submitted a SQL SELECT, but it was run over the FastExport protocol, so this is already the fastest possible way.
Changing the number of sessions might not help at all, because this is usually set/adjusted by Workload Management.
 
You need to check the logs: tlogview -j yourjobid -f "*" -g, where yourjobid is the one returned when you start the job ("Job id is qstart2-1").
There's a lot of information returned; look for the FILE_WRITER details.
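
For this job that would be:

tlogview -j qstart2-1 -f "*" -g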
 
75,000 rows per minute is extremely slow. I just tried the QuickStart script and exported 3,000,000 rows to a 130MB flat file from a local TD VM to another local Windows VM; the whole TPT job ran in < 10 seconds, < 3 seconds for the FileWriter part.

Dieter

skunkwerk 17 posts Joined 08/15
11 Apr 2016

Thanks Dieter
 
Here's my log: http://pastebin.com/bXatXwKk
 
What's interesting is that it says the query itself takes 2-3 seconds, but the time from when it starts to when it finishes is close to 4 minutes.  What could be going on?
 
regards,
imran

dnoeth 4628 posts Joined 11/04
11 Apr 2016

I don't have expertise in reading detailed TPT logs, better wait for Steve Feinholz :-)

Sending the data took only 6 seconds:

**** 15:53:11 Select execution completed. 5850 data blocks generated.
**** 15:53:17 Finished sending rows

Your system is huge, 5850 AMPs, and you request way too many sessions, one per AMP (I didn't know this was possible for an Export):

**** 15:52:56 Maximum number of special sessions requested: All
**** 15:52:56 Minimum number of special sessions required:  1

              Instance Assigned Connected Result
              ======== ======== ========= ======================
                  1      2925      299    CLI Error 301
                  2      2925      300    CLI Error 301
              ======== ======== ========= ======================
                Total    5850      599    Warning

Logging on 599 sessions ran for more than 2 minutes, and logoff took another 40 seconds (there's a TASMSessionLimit event, but it doesn't seem to limit the number).

Try a single instance and a few sessions; for 290,413 rows a single session should be enough: ExportMaxSessions = 1
Even when your export is larger, you will still need only a few sessions.
You should ask your DBAs; they should know the settings for TPT exports that achieve good performance on your system.
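
A sketch of those changes using the quickstart files you posted (assuming the instance count is set in the APPLY statement, as in the shipped examples; [1] is also the default, shown here only for clarity):

DEFINE JOB qstart2
(
  APPLY TO OPERATOR ($FILE_WRITER[1])
  SELECT * FROM OPERATOR ($EXPORT[1]);
);

plus one added line in jobvars2.txt:

,ExportMaxSessions    = 1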

Dieter

feinholz 1234 posts Joined 05/08
11 Apr 2016

The default number of sessions for the Export operator is 1 per available AMP.
Unlike FastExport, which defaults to 4.
(We could not have a default of '4' when we support multiple instances, and all of the operators are now consistent with each other.)
The job in question specifies 2 instances, and so ExportMaxSessions needs to be at least 2.
But, as Dieter noted, since you only need 1 (or maybe 2) sessions, you only need 1 instance of the Export operator.
If you are saying that the entire job took a long time, it looks (to me, without seeing the whole log) as if the session connects took up most of the time.
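
In job-variable terms (a sketch, assuming the $EXPORT template reads the variable): either keep the 2 instances and set

,ExportMaxSessions    = 2

or, per Dieter's advice, run a single $EXPORT[1] instance with

,ExportMaxSessions    = 1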
 

--SteveF
