EricSantAnna 15 posts Joined 04/11
19 Oct 2012
TPT Instances - How does this work?

Hi Teradata gurus,
 
I have developed a Java process that reads some files, adds some fields, and writes the result to named pipes that feed my TPT script.
In my tests I'm loading 6 pipes with 6 separate threads.
To read them I have these Producer operators:

DEFINE OPERATOR PIPE_READER1()
DESCRIPTION 'Defines file reading options'
TYPE DATACONNECTOR PRODUCER
SCHEMA T811_R2_EXCLUIDOS_DETRAF_2_SCHEMA
ATTRIBUTES
(
        VARCHAR AccessModuleName = 'np_axsmod.dll'
      , VARCHAR AccessModuleInitStr
      , VARCHAR FileName              = '\\.\pipe\EXCLUIDOS_DETRAF1'
      , VARCHAR Format      = 'DELIMITED'
      , VARCHAR TextDelimiter     = ';'
      , VARCHAR IndicatorMode         = 'N'
      , VARCHAR OpenMode              = 'Read'
);

...         /*PIPE READER 2,3,4,5*/

DEFINE OPERATOR PIPE_READER6()
DESCRIPTION 'Defines file reading options'
TYPE DATACONNECTOR PRODUCER
SCHEMA T811_R2_EXCLUIDOS_DETRAF_2_SCHEMA
ATTRIBUTES
(
        VARCHAR AccessModuleName = 'np_axsmod.dll'
      , VARCHAR AccessModuleInitStr
      , VARCHAR FileName              = '\\.\pipe\EXCLUIDOS_DETRAF6'
      , VARCHAR Format      = 'DELIMITED'
      , VARCHAR TextDelimiter     = ';'
      , VARCHAR IndicatorMode         = 'N'
      , VARCHAR OpenMode              = 'Read'
);
and an APPLY statement like this:
INSERT INTO MYTABLE ...
VALUES ...
TO OPERATOR (DATA_LOAD () [3]) /*DATA_LOAD is an Update Operator*/

SELECT *
FROM OPERATOR
(
  PIPE_READER1()[1]
)

UNION ALL
...        /*PIPE READER 2,3,4,5*/

UNION ALL

SELECT *
FROM OPERATOR
(
  PIPE_READER6()[1]
);

In other words:
Insert into DATA_LOAD() [using 3 instances] the union of my 6 PIPE_READER operators [using 1 instance each].
It works well:
              Rows Inserted: 42782588
              Rows Updated:  0
              Rows Deleted:  0

But I see this in the log:
                        Instance    Rows Sent  
                        ========  =============
                            1        49127971
                            2             290
                            3               0
                        ========  =============
                          Total      49128261

Why the hell does it load 49127971 rows using the first instance and 290 rows using the second, and not use the third instance at all?
(I also tried using 6 instances, and the other instances were not used either.)
Isn't UNION ALL the solution for "parallelism"?
I built this script following the examples in the Teradata Parallel Transporter User Guide.
Other questions:
- How can I measure how many pipes I should use to balance the load between producer and consumer?
- Could JMS be faster than pipes?

feinholz 1234 posts Joined 05/08
19 Oct 2012

The UNION ALL is to get the parallelism from the producer operator side.
When sending the rows to the consumer operator (the Update operator in your case), the data is not sent in a round robin fashion to the instances. That would hurt performance due to the context switching.
Instead, we send the data to the first instance. And if that first instance can keep up with the rate at which the data is going through the data streams, it will get all of the work.
When the first instance cannot keep up, we will begin to send data to the 2nd instance.
As you can see, the 3rd instance got no work, meaning you do not need 3 instances for that job.
In fact, you really do not need the 2nd instance either because it did so little. If you take away the 3rd instance, more sessions will be distributed to the other 2 instances and you will probably notice that the 2nd instance will get no rows.
In other words, the bottleneck is still with your pipes. A single instance of the Update operator can keep up with the rate at which those 6 pipes feed data to it.
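
To picture the "fill the first instance, then spill over" behavior described above, here is a toy model in Python. It is only a sketch under stated assumptions, not TPT's actual dispatch code: the function name `dispatch_fill_first`, the idea of discrete "ticks", and the per-tick capacity figure are all hypothetical and exist just for illustration.

```python
def dispatch_fill_first(rows_per_tick, capacity_per_tick, instances, ticks):
    """Toy model of fill-first dispatch (an assumption, not TPT internals).

    Each tick, the producers deliver `rows_per_tick` rows. Rows go to the
    first consumer instance until it reaches `capacity_per_tick`; only the
    overflow moves on to the next instance, and so on.
    """
    sent = [0] * instances
    for _ in range(ticks):
        remaining = rows_per_tick
        for i in range(instances):
            take = min(remaining, capacity_per_tick)
            sent[i] += take
            remaining -= take
            if remaining == 0:
                break
        # Simplification: rows beyond the total capacity of all instances
        # are not counted in this model.
    return sent


if __name__ == "__main__":
    # Producers slower than one consumer instance: instance 1 takes everything.
    print(dispatch_fill_first(100, 120, 3, 10))
    # Producers slightly faster than one instance: instance 2 gets the overflow.
    print(dispatch_fill_first(150, 120, 3, 10))
```

In this model, rows only reach the later instances once the producers outrun the first one, which matches the shape of the "Rows Sent" report in the question: nearly everything on instance 1, a small overflow on instance 2, nothing on instance 3.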

--SteveF

EricSantAnna 15 posts Joined 04/11
19 Oct 2012

Great, feinholz!
Now everything makes sense.
So I could increase the number of pipes (Java threads), if it weren't for the high CPU usage from opening more files. (I have dozens of processes like this running!)
Thanks!
 

03 Jul 2013

That was a great explanation, feinholz. You made it really clear.
Now, for a Producer Dataconnector Operator, when reading a flat file, can I guarantee that the more instances I use, the faster the file will be read?
Are the rows of the flat file evenly distributed across all instances I set?

feinholz 1234 posts Joined 05/08
03 Jul 2013

We do support the use of multiple instances reading from a single file. We have seen some nice performance improvement (due to the operating system caching the I/O reads). However, YMMV.
As far as multiple instances reading from different files, yes you can get good performance improvements by using more instances. However, you have other issues to deal with. Namely, disk head contention. If all files are on the same drive, the disk head contention and other operating system environmental issues might affect performance.
So, guarantee? No, but in a lot of cases, yes.
When using multiple instances to read a single file, yes the rows are distributed evenly across the reader instances.
When using multiple instances to read from multiple files, we load balance the files across the instances according to the file sizes.
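
As an illustration of balancing files across instances by size, here is a minimal Python sketch. The thread does not say which algorithm TPT uses, so this swaps in a common greedy approach (largest file first, always to the least-loaded instance). The function name `balance_files` and the file names and sizes are hypothetical.

```python
import heapq


def balance_files(file_sizes, instances):
    """Greedy size-based balancing (an assumed stand-in, not TPT's algorithm).

    `file_sizes` maps file name -> size in bytes. Files are handed out
    largest first, each one to the instance with the fewest bytes so far.
    Returns a dict mapping instance index -> list of file names.
    """
    # Min-heap of (bytes_assigned_so_far, instance_index).
    heap = [(0, i) for i in range(instances)]
    heapq.heapify(heap)
    assignment = {i: [] for i in range(instances)}

    # Sort by descending size; break ties by name so the result is stable.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return assignment


if __name__ == "__main__":
    files = {"a.txt": 900, "b.txt": 500, "c.txt": 400, "d.txt": 100}
    print(balance_files(files, 2))
```

With these made-up sizes, one instance ends up with 1000 bytes and the other with 900, which is the kind of near-even split that size-based balancing aims for.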

--SteveF

indranilbosu 3 posts Joined 02/16
19 Feb 2016

Can I use more than one instance of LOAD_OPERATOR if I am using FASTLOAD as the load method?

Fred 1096 posts Joined 08/04
19 Feb 2016

You can specify multiple instances for a LOAD_OPERATOR. Sessions will be divided among the instances as evenly as possible.
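
A rough way to picture "as evenly as possible" is the Python sketch below. It is not TPT code: the function name `split_sessions` is hypothetical, the session count of 8 is arbitrary, and which instances receive the leftover sessions is an assumption.

```python
def split_sessions(total_sessions, instances):
    """Split sessions across instances so that counts differ by at most one.

    Assumption for illustration: leftover sessions go to the
    lowest-numbered instances.
    """
    base, extra = divmod(total_sessions, instances)
    return [base + 1 if i < extra else base for i in range(instances)]


if __name__ == "__main__":
    print(split_sessions(8, 3))  # three instances sharing 8 sessions
    print(split_sessions(8, 2))  # two instances sharing the same 8 sessions
```

The second call shows the effect mentioned earlier in the thread: with fewer instances, each remaining instance gets more sessions.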
