All Forums Tools
Kir Bojo 12 posts Joined 06/06
30 Jun 2006
FastLoad Performance Tuning for large volumes of input data

Hi,

I am facing a problem loading data from an input file. Though the data is successfully loaded into the target table in Teradata, the time taken for the task is high. Some key details:

* The system is MP-RAS UNIX
* The input records are read as VARTEXT
* **** 15:55:45 Number of FastLoad sessions requested = 8
* **** 15:55:51 Number of AMPs available: 10
* TOTAL RECORD LENGTH = 3502
* Total number of records inserted: 778,100
* Total CurrentPerm of the table after successful loading: 378,142,720
* Time taken to load: approx. 8 hrs (1 hr per 100,000 records)

Can someone help me increase the speed of this task for the above scenario?

Thanks,

Barry-1604 176 posts Joined 07/05
30 Jun 2006

Could you give us some insight into your target table? What is the primary index of your target table and how unique is the primary index?

Kir Bojo 12 posts Joined 06/06
02 Jul 2006

Target table is created as below:

CREATE TABLE fssLoad_t
( UnixTimestamp CHAR(24),
  transDateTime TIMESTAMP NOT NULL,
  UserID CHAR(6) NOT NULL,
  ErrorCode INTEGER,
  InterfaceType INTEGER,
  RowLimit INTEGER,
  RowsFound INTEGER,
  BytesRetrieved INTEGER,
  RowsReturned INTEGER,
  SizeReturned INTEGER,
  DBRespTime DECIMAL(10,2),
  TotalTransTime DECIMAL(10,2),
  ConstrainedTime DECIMAL(10,2),
  GSEQueryCost INTEGER,
  TotalCPUCost INTEGER,
  SQLString VARCHAR(3000),
  ClientAppID VARCHAR(25),
  ClientApplVersion CHAR(35),
  ClientObjectID CHAR(35),
  ClientObjectVersion CHAR(35),
  TopEndProd VARCHAR(10),
  GSEServerVer CHAR(15),
  GSEDicVer CHAR(15),
  GSEViews VARCHAR(50),
  OraInstNames VARCHAR(30),
  ReqOptimised CHAR(10),
  NoInstruments INTEGER DEFAULT NULL,
  NoFetches SMALLINT DEFAULT NULL
) PRIMARY INDEX (UserID);

As you can see above, there is a NUPI on UserID.

Thanks,

ANGEL 20 posts Joined 02/05
03 Jul 2006

Hi,

As stated in your query, you have a NUPI defined on the table. You have not mentioned whether your table is SET or MULTISET. Assuming it is SET: Teradata does a duplicate-row check on a SET table with no unique index defined, so a SET table with many rows per NUPI value has terrible insert performance. Create the table as MULTISET and see the change in performance.

Hope this helps.

Thanks
Angel
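A sketch of the change suggested above, reusing the DDL from the earlier post (only the first and last columns are repeated here; the rest are unchanged):

```sql
-- MULTISET skips the per-row duplicate check that a SET table
-- with no unique index must perform on every insert.
CREATE MULTISET TABLE fssLoad_t
( UnixTimestamp CHAR(24),
  transDateTime TIMESTAMP NOT NULL,
  UserID CHAR(6) NOT NULL,
  /* ... remaining columns as in the original DDL ... */
  NoFetches SMALLINT DEFAULT NULL
) PRIMARY INDEX (UserID);
```

Note that duplicate input rows will now be stored rather than silently discarded, so deduplicate downstream if that matters.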

Kir Bojo 12 posts Joined 06/06
03 Jul 2006

Thanks Angel. I recreated the table as MULTISET and made a test run with 100,000 records. There was a slight improvement in speed, but the load still takes a little less than an hour for that set of records. Please help me improve the performance further.

Thanks,

BBR2 96 posts Joined 12/04
03 Jul 2006

You may want to check the skew factor for the table. You can use the query below, or use Teradata Administrator to find it. For large tables it is best to have a skew factor of about 5-8, which is only possible with a sufficiently unique primary index. By selecting a unique enough PI, we ensure that the data is evenly distributed over the AMPs.

Another way to reduce the run time is to load the data into a staging table having a composite PI, then do an INSERT..SELECT into the target. In any case, there will be a redistribution of rows when you do the INSERT..SELECT.

Hope this helps.

SEL DatabaseName, TableName,
    (100 - (CAST(AVG(CurrentPerm) AS DECIMAL(18,2)) / MAX(CurrentPerm) * 100))
FROM DBC.TableSize
WHERE DatabaseName IN ('DB_Name')
  AND TableName = 'TableName'
GROUP BY DatabaseName, TableName
ORDER BY DatabaseName, TableName;
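A sketch of the staging approach described above (the staging table name is illustrative, and the composite PI shown is just one plausible choice):

```sql
-- Staging table: copy the target's definition, but use a composite PI
-- so FastLoad spreads rows more evenly across the AMPs.
CREATE MULTISET TABLE fssLoad_stg AS fssLoad_t WITH NO DATA
PRIMARY INDEX (UserID, transDateTime);

-- After FastLoad into the staging table completes, move the rows into
-- the target; Teradata redistributes them by the target's PI here.
INSERT INTO fssLoad_t
SELECT * FROM fssLoad_stg;
```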

Kir Bojo 12 posts Joined 06/06
04 Jul 2006

Thanks Vinay. Your reply was very informative. I checked the skew factor and it falls between 5 and 8. I have a requirement constraint that I can't create a unique PI; in fact, the records cannot support one. I tried your second suggestion of a composite PI, but even though the distribution across the AMPs was even, it still takes the same time to load 100,000 records. Kindly let me know if any other information is required.

If someone can tell me the optimum time the FastLoad utility should take to load an input file that occupies about 100,000,000 bytes of perm space on Teradata, and the configuration needed for that optimal performance, it would be a great help.

Please help me. Thanks,

BBR2 96 posts Joined 12/04
04 Jul 2006

Hmmm... Can you post more information:

* Network bandwidth details (client/host)
* Current workload details
* Number of sessions used by FastLoad
* Actual process - are you dropping and recreating the table, or truncating?
* If you are truncating, you may want to use the DELETE..ALL option (this may not be the problem, as you seem to have localized it to the FastLoad script itself)
* Snippets from your FastLoad log, if possible
* It may very well be a network issue in reading the file, if the file is on a file server

I don't think 1 GB of data should be an issue to load. It is typical to load hundreds of GB of data without a performance impact (I may be overboard here). Let's see if others have better suggestions.

Vinay
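For reference, the points above fit into a FastLoad script roughly like this (a minimal skeleton; the logon string, delimiter, file name, and session/checkpoint values are placeholders, and the DEFINE column list is elided):

```sql
SESSIONS 8;                          /* sessions actually used by the job  */
LOGON tdpid/user,password;
DELETE FROM fssLoad_t ALL;           /* truncate instead of drop/recreate  */
BEGIN LOADING fssLoad_t
      ERRORFILES fssLoad_e1, fssLoad_e2
      CHECKPOINT 100000;             /* checkpoint interval, in records    */
SET RECORD VARTEXT "|";              /* delimiter must match the input file */
DEFINE /* ... one entry per input field ... */
       FILE=input.dat;
INSERT INTO fssLoad_t VALUES ( /* ... :field list ... */ );
END LOADING;
LOGOFF;
```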

Kir Bojo 12 posts Joined 06/06
05 Jul 2006

Please find below some snippets of the FastLoad output log:

**** 10:28:43 Processing starting at: Tue Jul 4 10:28:43 2006
**** 10:28:43 Character set has been set to: ASCII
**** 10:28:43 Tenacity Enabled: 3 hour(s)
**** 10:28:43 Sleep Minutes Set: 5 minute(s)
**** 10:28:43 Buffer size has been set to 49 KBytes
0001 SHOW VERSION;
FastLoad Version 07.05.00.002 for UNIX 5.4 running Streams TCP/IP
0003 RECORD 1 thru 100000;
**** 10:28:43 Starting record number set to : 1
**** 10:28:43 Ending record number set to : 100000
**** 10:28:43 Current CLI or RDBMS allows maximum row size: 64K
**** 10:28:43 Character set for this job: ASCII
**** 10:28:45 Number of FastLoad sessions connected = 10
**** 10:28:45 FDL4808 LOGON successful

[Here I dropped the destination and error tables, then created the destination table as in my previous post, but MULTISET with PRIMARY INDEX(UserID, transDateTime).]

**** 10:28:50 Now set to read 'Variable-Length Text' records
**** 10:28:52 Number of AMPs available: 10
**** 10:28:52 BEGIN LOADING COMPLETE
**** 10:28:52 FDL4803 DEFINE statement processed
TOTAL RECORD LENGTH = 3502

[Here I did INSERT INTO...VALUES...]

**** 10:28:53 Number of recs/msg: 14
**** 10:28:53 Starting to send to RDBMS with record 1
**** 10:33:34 Sending row 10000
**** 10:39:44 Sending row 20000
**** 10:44:58 Sending row 30000
**** 10:49:53 Sending row 40000
**** 10:54:34 Sending row 50000
**** 10:59:52 Sending row 60000
**** 11:05:08 Sending row 70000
**** 11:10:47 Sending row 80000
**** 11:16:22 Sending row 90000
**** 11:22:01 Sending row 100000
**** 11:22:52 Finished sending rows to the RDBMS
0015 END LOADING;
**** 11:22:54 END LOADING COMPLETE
     Total Records Read          = 100000
     - skipped by RECORD command = 0
     - sent to the RDBMS         = 100000
     Total Error Table 1         = 0  ---- Table has been dropped
     Total Error Table 2         = 0  ---- Table has been dropped
     Total Inserts Applied       = 99987
     Total Duplicate Rows        = 13
     Start: Tue Jul 4 11:22:53 2006
     End  : Tue Jul 4 11:22:54 2006
**** 11:22:58 Logging off all sessions
**** 11:22:59 Total processor time used = '1.44 Seconds'
     . Start : Tue Jul 4 10:28:43 2006
     . End   : Tue Jul 4 11:22:59 2006
     . Highest return code encountered = '0'.
**** 11:22:59 FDL4818 FastLoad Terminated

From the above it is clear that most of the time is spent sending rows to the RDBMS. Checkpointing could seem to be a concern, but there was no improvement even without checkpoints. There is not much workload on the NCR node.

Thanks,

Fred 1096 posts Joined 08/04
07 Jul 2006

Is the FastLoad running directly on one of the database (TPA) nodes, or on a separate application server? Client CPU usage appears not to be an issue (1.44 seconds). Have you looked at disk I/O and network throughput on the box running FastLoad, and network throughput on the Teradata node(s)? Mismatched Ethernet speed or duplex settings between an adapter and a network switch can result in a connection that appears to work for trivial requests but is extremely slow when handling large data volumes.

As an aside, note that FastLoad 7.5.0 dates from 2002 (TTU 6.2). I really don't think that's causing your problem, but you might want to consider upgrading.

Kir Bojo 12 posts Joined 06/06
11 Jul 2006

Thanks Fred. FastLoad is running on a separate application server, not on the TPA node. The application server and the TPA node are on different networks, both configured for 100 Mbps transfer speeds. It may well be a network issue, which I am investigating. Meanwhile, any suggestions and views are welcome.

Thanks,

Kir Bojo 12 posts Joined 06/06
12 Jul 2006

Hi All,

Thanks for your help. The issue is resolved: the cause of the delay was the network and switches.

Thanks,
