

Executing Partial Table Copies with Teradata Data Mover
In the last Teradata Data Mover (TDM) article (Introduction to Teradata Data Mover: Create your first job), we discussed creating and executing a TDM job to copy a full table between Teradata systems. This use case is very common in the field when customers want to initially populate the target Teradata system with the same table that exists on the source Teradata system. Customers will not want to copy the entire table to the target system every time changes are made to the source system though. Tables on production systems can get quite large and it doesn't make sense to copy the entire table when only a subset of rows have been changed since the last copy took place. This is why TDM supports executing partial table copies as well as full table copies.
Determine What Changes Need To Be Copied
TDM only copies objects when told to do so by the user (either through the TDM command-line interface or the Viewpoint portlet). TDM does NOT automatically detect when changes have been made to the source system and then copy those changes to the target system. As a result of this, users must tell TDM what part of the table needs to be copied if they don't want the entire table copied in the job. Determining what changes have been made to a table can be complicated, but most customers accomplish this by maintaining a date/timestamp or batch id column in their source tables to track when changes have occurred.
Create a Partial Copy TDM Job
Let's use the same jg185041.items1
table we created in the Introduction to Teradata Data Mover: Create your first job article to create a partial copy TDM job. In this example, let's assume we want to copy all rows that correspond to when sales were made after 2009. This of course assumes that the date values in the table reflect when the row was updated and the sale was made. Here is the XML file we will use to create a TDM job that will copy all rows corresponding to sales made after 2009:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.teradata.com/dataMover/v2009/DataMover.xsd"> <job_name>jg_partialcopy_devX</job_name> <source_tdpid>dmdev</source_tdpid> <source_user>jg185041</source_user> <source_password>jg185041</source_password> <target_tdpid>dmsmp</target_tdpid> <target_user>jg185041</target_user> <target_password>jg185041</target_password> <log_level>1</log_level> <database selection="unselected"> <name>jg185041</name> <table selection="included"> <name>items1</name> <sql_where_clause><![CDATA[ WHERE salesdate > '2009-12-31']]></sql_where_clause> <key_columns> <key_column>deptId</key_column> <key_column>prodId</key_column> <key_column>salesdate</key_column> </key_columns> </table> </database> </dmCreate>
As you can see in the XML file above, the <sql_where_clause>
and <key_column>
tags are used to tell TDM that this is a partial table copy job. TDM will pass the where clause statement as-is to the underlying utility used to extract the data and then it will transfer only those extracted rows to the target table. TDM will also create the target table if it doesn't already exist prior to copying the data. If the target table already exists and has rows in it then TDM will copy the data to a staging table first before using merge or insert-select to get the data from the staging table to the target table. The key columns specified in the job will be used in the merge or insert-select statement to uniquely identify the rows that need to be updated in the target table. Specifying the proper key columns is required so that TDM only updates the appropriate rows in the target table.
Here is the status output of executing this job:
dm-agent4:/home/jg185041/datamover/testing # datamove status -job_name jg_partialcopy_devX -output_level 3 Data Mover Command Line 13.10.00.03 Status mode Command parameters: - job_name: jg_partialcopy_devX - output_level: 3 Permission authorized Requesting status... Connected to Daemon version 13.10.00.03 Job Name: jg_partialcopy_devX Job Execution Name: jg_partialcopy_devX-20111228233504EST TYPE ID STATUS CURRENT STEP START TIME DURATION TIME ----------------------------------------------------------------------------------------------------------------------------------------------------- Job: 25 COMPLETED_SUCCESSFULLY 4 12/28/11 11:35 PM 0:0:26 TYPE ID STATUS STEP TYPE START TIME DURATION TIME ----------------------------------------------------------------------------------------------------------------------------------------------------- Step: 3 COMPLETED_SUCCESSFULLY MOVE_TABLE_DATA 12/28/11 11:35 PM 0:0:24 Step: 4 COMPLETED_SUCCESSFULLY RESOLVE_TABLE_AFTER_LOAD 12/28/11 11:35 PM 0:0:1 TYPE ID NAME MOVE PHASE STATUS TYPE ROWS BYTES AGENT UTILITY TIMESTAMP ----------------------------------------------------------------------------------------------------------------------------------------------------- Task: 221 "jg185041"."items1" MOVING_DATA ARCHIVE table 19 2624 Agent1 ARC 12/28/11 11:35 PM Task: 221 "jg185041"."items1" MOVING_DATA COMPLETE table 19 2624 Agent1 ARC 12/28/11 11:35 PM Task: 221 "jg185041"."items1_b28cc4_t" MOVING_DATA COPY table 19 2624 Agent1 ARC 12/28/11 11:35 PM Task: 222 POST_DATA_MOVE COMPLETE Agent1 SQL 12/28/11 11:35 PM
You can see in the status output that only 19 rows were copied in this job instead of the full 200 in the source table. You can also see that ARC was used to execute this partial table copy. This might seem strange, but ARC can actually be used to execute partial table copies when PPI tables are specified in the TDM job. TDM will use TPTAPI or JDBC to execute partial table copies in all other cases though.
The next TDM article will discuss how to execute dynamic partial table copies with TDM.
Hi jgordon,
I am having 2 issues with partial copy can you help me out
Is it possible to copy a table partially using ARC when it's having a Identity Column.
And my second question would be my partial job is failing the Row Count Validation. As it was comparing the row count on target side with where caluse on target sides control date can't it check the source control date.
Any help would be appreciated. Thanks in Advance