

Introducing TDM into a production Dual System environment
Teradata Data Mover can be a valuable product to add into a dual system production environment either as a scheduled / triggered step in your load process or for ad-hoc re-synchronization of tables.
Teradata Data Mover introduction
With Teradata Data Mover, jobs can be defined to copy specified database objects, such as tables and statistics, from one Teradata Database system to another. In addition, Teradata Data Mover can copy complete databases.
In addition to tables and statistics, the following database objects associated with tables can be copied with the tables, but not by themselves:
- hash indexes
- join indexes
- journals
- triggers
Dual-system support with active replication between two systems is provided through Teradata Data Mover's command-line interface, which allows the user to:
- Enable regular full or partial table copies as part of a dual-system synchronization solution
- Recover databases or tables in response to unexpected system failures or other data loss
- Copy partial production data to a QA system to provide necessary data for accurate pre-production testing
Architecture considerations
The DMAgent can run on the same server as the DMDaemon and the DMCommandLine. However, you may achieve better performance by running the DMAgent on a second server. Remember that all data transferred is handled by the DMAgent and the system it resides on.
As you would expect, using multiple DMAgents can increase throughput, but remember that you cannot run more than one DMAgent on a single server. In addition, the actual number of requested concurrent tasks and the agent.maxConcurrentJobs setting will affect how effectively multiple DMAgents can be used.
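As a quick check (a minimal sketch; list_agents is the same command the monitoring script later in this article relies on), you can confirm which agents have registered with the daemon before counting on multi-agent throughput:
# List the agents the DMDaemon can currently communicate with; each agent
# server should appear here once its DMAgent process is running.
DMroot=/opt/teradata/client/14.00/datamover
$DMroot/commandline/datamove list_agents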
The network configuration is critical for good performance when copying large volumes of data. Two areas of network configuration to consider are:
(a) the network switch used should not be a shared-port switch; sharing switch resources can drastically reduce the effective bandwidth available at any given time, depending on the other systems also using that switch;
(b) additional performance gains can be achieved by using multiple network connections between the Teradata Data Mover server(s) and each Teradata Database system, such that one set of connections carries traffic to the primary Teradata system and the other set carries traffic to the secondary Teradata system.
Monitoring TDM components
The Teradata Data Mover components that should be monitored include not only hardware but also processes. Teradata Data Mover has two daemon processes, DMDaemon and DMAgent, that manage and control Data Mover jobs. These daemon processes are expected to run continuously and should be registered in TMSM as components. On the Teradata Data Mover server(s), DMDaemon and DMAgent are typically configured as services that start when the server is booted and stop when the server is shut down.
The following monitor script is an example of what you can place on each Teradata Data Mover server; it can be called by cron every few minutes:
#!/bin/bash
DMroot=/opt/teradata/client/14.00/datamover
system=`hostname`
if [ $system == "TDMprimary" ]
then
    ECO="Region1"
else
    ECO="Region2"
fi
SENDEVENT="/usr/bin/sendevent -e $ECO -t Heartbeat -s $system"

# Teradata Data Mover daemon health check section
resource=TDMdaemon-$system
$DMroot/commandline/datamove list_agents > /tmp/DMcheck_$$.log 2>&1
daemon=`grep -c "Connected to Daemon" /tmp/DMcheck_$$.log`
if [ $daemon -eq 1 ]
then
    # daemon is running
    $SENDEVENT --et HCHK -r $resource -m "DMdaemon is running"
fi

# Teradata Data Mover agents health check section
resource=TDMagent-$system
agentname=`grep agent.id $DMroot/agent/agent.properties|cut -d'=' -f2`
agent=`grep -c "$agentname found to be communicating" /tmp/DMcheck_$$.log`
if [ $agent -eq 1 ]
then
    # an agent is active
    $SENDEVENT --et HCHK -r $resource -m "DMagent is running"
fi
rm -f /tmp/DMcheck_$$.log
Since a process can be visible in the process stack yet be non-responsive, the script above is the preferred monitoring method: by issuing a Data Mover command (list_agents), it uses TDM itself to provide a higher-level acknowledgement that the DMDaemon and DMAgent processes are not just present but alive and functioning on the server.
This article will not cover the details of setting any desired TMSM thresholds or actions, but the Teradata Data Mover components should be monitored, with appropriate thresholds and alert actions for a critical production sub-system.
Static vs dynamic jobs
There are two approaches for defining and using Data Mover jobs:
- Static jobs that are pre-defined for copying specific database objects from a specific source system to a specific target system.
- Dynamic jobs that are built on an as-needed basis and can target any database object from any appropriate source system to any appropriate target system.
Each approach has its benefits and drawbacks.
Static approach drawbacks: requires the objects, the source and target systems, the direction of copy, and the type of copy (full versus partial) to be pre-defined.
Static approach advantages: a job can include a set of tables that may be copied in parallel, and the job startup time is minimal.
Dynamic approach drawbacks: each job handles just one table and the job startup involves more steps, so it takes more time (this extra overhead time may, or may not, be significant for any specific customer environment or SLA needs).
Dynamic approach advantages: highly flexible and provides for easily creating jobs as needed in a production cycle – as the table load order changes (as tables are, or are not, loaded) the dynamic approach does not require any special logic to handle these changes.
We will look at both approaches.
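For example, the static approach follows a create-once, start-many pattern (a minimal sketch with a hypothetical job name and XML path; the create and start commands are the same ones used by the dynamic script later in this article):
# One-time: define the static job from a pre-built XML job description
DM=/opt/teradata/client/14.00/datamover/commandline/datamove
$DM create -f /home/dmuser/xml/TBL_A-Primary_Secondary.xml
# Each load cycle: start the pre-defined job and wait for it to finish
$DM start -job_name TBL_A-Primary_Secondary -sync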
Triggering TDM jobs
Teradata Data Mover jobs can be triggered:
- By schedule, such as from an ETL script at the successful completion of a load job, directly from an enterprise scheduler following the successful load of a table, or triggered from TMSM when a load job is recognized as completing.
- As needed, on an ad-hoc basis or from TMSM as part of a table sync/re-sync action following the detection of an out-of-sync condition for a pair of tables.
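As a minimal sketch of the scheduled case (system, database, and table names and the script path are illustrative), an ETL wrapper might invoke the dynamic copy script described below immediately after a successful load:
# After the nightly load of SALES_DB.TBL_A succeeds, copy the full table to the
# secondary system, using a date-stamped Unit-Of-Work id for TMSM correlation.
UOW=TBL_A_`date +"%Y%m%d%H%M"`
/home/dmuser/bin/tdm_template.sh TDprimary TDsecondary SALES_DB TBL_A F $UOW
if [ $? -ne 0 ]
then
    echo "TDM copy of SALES_DB.TBL_A failed - see the dmuser log directory"
fi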
Teradata Data Mover table validation
The Teradata Data Mover UOWhealthAmt value may not match the ETL UOWhealthAmt value from the source load, due to any deletes executed as part of the load and, if ARC is used, due to ARC moving a complete partition or table. Because of this, table validation may be desired for every Teradata Data Mover job.
You may wish to validate tables using a custom method that does a checksum on all but LOBs and timestamp fields of a table, and if it is a partial table copy that was done, the records included could be just those for the current partition or the last 30 days, or whatever seems appropriate.
Since it is desirable to do table validation for most table copies, you may wish to do it for every copy and execute the table validation at the successful completion of every job.
The dynamic table copy script tdm_template.sh calls the table validation script checksum.sh for the copied table against both the source and the target database systems. Both the tdm_template.sh and checksum.sh scripts are provided as examples at the end of this article.
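The same comparison can also be run stand-alone. A minimal sketch (hypothetical system names and credentials); checksum.sh prints one line per call in the form table:system:checksum:
# Compare the checksum of a table on the primary and secondary systems
src=`/home/dmuser/bin/checksum.sh SALES_DB TBL_A TDprimary dmuser secret | cut -d':' -f3`
tgt=`/home/dmuser/bin/checksum.sh SALES_DB TBL_A TDsecondary dmuser secret | cut -d':' -f3`
if [ "$src" != "$tgt" ]
then
    echo "WARNING: SALES_DB.TBL_A checksums differ between TDprimary and TDsecondary"
fi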
Monitor TDM jobs
Of course you want to be actively monitoring any Teradata Data Mover jobs regardless of how they were triggered. Teradata Data Mover will send TMSM events if configured to do so; this is controlled by the value of the tmsm.mode key, whose default value is NONE and which should be set to ONLY_REAL_TMSM. Additionally, to more easily identify which TMSM event is associated with which trigger action, you can pass a Unit-Of-Work id (UOWid) to the datamove command so that all events generated by the job match up with the external UOWid. This is accomplished either by specifying -uowid $UOW as a parameter to the start, restart or move command, or by adding <uowid>$UOW</uowid> as the penultimate line of the XML file used for the job definition (assuming you have set UOW to the desired UOWid). If you use the tdm_template.sh script example, the UOWid is a required parameter and is passed to datamove for you.
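A minimal sketch of tagging a job start this way (hypothetical job name and UOWid):
# Start an existing job synchronously, tagging all TMSM events it generates
# with an externally supplied Unit-Of-Work id.
DM=/opt/teradata/client/14.00/datamover/commandline/datamove
UOW=TBL_A_20140301_0200          # hypothetical UOWid supplied by the load process
$DM start -job_name TBL_A-Primary_Secondary -uowid $UOW -sync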
Observations and best practices
- Teradata Data Mover cannot handle deleted records, so any load process that includes the possibility of deleting rows is not a good candidate for Teradata Data Mover unless full table copy mode is used. Note: for a partial table copy, if data has been deleted from the source system, the removed rows will still exist in the target.
- When using Data Mover's Command Line Interface (CLI), always invoke commands using a single user id, such as dmuser, to avoid log file write permission problems. The CLI always uses the current user and its associated permissions to write to dmCommandLine.log. Set the umask for dmuser to 002 so that other members of the same group (defaults to users) can also write to dmCommandLine.log, and so that everyone can read it.
- Create multiple dedicated Teradata user accounts to facilitate multiple ARC jobs when running concurrent Data Mover jobs. Each Teradata Data Mover job using ARC requires sole userid access (no other session for this userid can exist while the job runs). Teradata Data Mover will default to ARC when possible. The sample script tdm_template.sh utilizes userids from a dedicated pool, and <use_userid_pool>true</use_userid_pool> is set so that Teradata Data Mover will manage the pool of users itself (using each available id as needed and waiting for one to become available when they are all in use). See this section of configuration.xml below:

<property>
  <key>job.useUserIdPool</key>
  <value>true</value>
  <targetUserPool>
    <system name="ASTRO">
      <user>
        <name>tdm_user0</name>
        <password>pw0</password>
      </user>
      <user>
        <name>tdm_user1</name>
        <password>pw1</password>
      </user>
      <user>
        <name>tdm_user2</name>
        <password>pw2</password>
      </user>
    </system>
  </targetUserPool>
  <description>Purpose: Use a target user from the pool of users. This enables running multiple arc tasks at the same time</description>
</property>
- Define a Teradata Data Mover job naming convention that identifies the direction of flow (e.g. TBL_A-Primary_Secondary).
- When creating static jobs, for each job created, create a second job that reverses the flow to allow for the situation where a failover to the Secondary system is necessary and loads are re-directed to the Secondary system (e.g. TBL_A-Primary_Secondary and TBL_A-Secondary_Primary).
- When doing partial copies on a PPI table, either with static or dynamic jobs, ARC can be used for faster copies when the constraint field is also the partitioning field (such as an ActivityDate); a sketch follows this list.
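A minimal sketch of such a constraint (the ActivityDate partitioning column and 30-day window are hypothetical), written in the style of the sql_where_clause section that tdm_template.sh appends to its job XML (where $XML and $UOW are defined by that script):
# Constrain the partial copy on the partitioning column so that complete
# partitions can be copied; column name and window are illustrative only.
echo "<sql_where_clause>"                    >> $XML/$UOW.xml
echo "<![CDATA["                             >> $XML/$UOW.xml
echo "WHERE ActivityDate >= DATE - 30 ]]>"   >> $XML/$UOW.xml
echo "</sql_where_clause>"                   >> $XML/$UOW.xml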
Example Scripts
Note: While these scripts have been verified, they are provided “as is” with no support of any type.
tdm_template.sh
#!/bin/bash
# template for dynamically creating TDM cli jobs
# assumes user id dmuser exists and has log and xml directory
# with tdm_template.xml file in the xml directory
if [ $# -lt 6 ]
then
    echo "Usage: $0 Source_system Target_system Database Table (F|P) UOW"
    exit 1
fi
cmd="$0 $@"
source=$1
source_user=tdm_user0
source_pw=Teradata
target=$2
db=$3
table=$4
mode=$5
UOW=$6
streams=4      # this should be increased if table size is greater than 50Gb
Ssess=2
Tsess=`expr $Ssess \* 2`
job=${table}-${source}_${target}
HOME=/home/dmuser
XML=$HOME/xml
LOGDIR=$HOME/log
rm -f $LOGDIR/$job-*.log      # cleanup from any prior successful run
LOGFILE=$LOGDIR/$job-`date +"%Y%m%d%H%M"`.log
ERRFILE=$LOGDIR/$job-`date +"%Y%m%d%H%M"`.err
touch $LOGFILE
chmod 666 $LOGFILE
system=`hostname`
type="TDM"
resource=TDM-$table
if [ $system == "dirac" ]
then
    ECO="einstein"
else
    ECO="innovation"
fi
SENDEVENT="/usr/bin/sendevent -app DataMover -j $job -r $resource -t $type -e $ECO -s $system -w $UOW"
DM=/opt/teradata/client/14.00/datamover/commandline/datamove
$SENDEVENT --et START
echo "source=$source" > $LOGFILE
echo "target=$target" >> $LOGFILE
echo "db=$db" >> $LOGFILE
echo "table=$table" >> $LOGFILE
echo "mode=$mode" >> $LOGFILE
echo "UOW=$UOW" >> $LOGFILE
echo "$SENDEVENT --et START" >> $LOGFILE
echo "Job $job called `date`" >> $LOGFILE
rm -f $XML/$UOW.xml
cp $XML/tdm_template.xml $XML/$UOW.xml 2>>$LOGFILE
if [ $? -ne 0 ]
then
    msg="ERROR: $XML/tdm_template.xml does not exist or is not readable"
    echo $msg
    $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
    exit 1
fi
chmod 666 $XML/$UOW.xml
echo "<job_name>$job</job_name>" >> $XML/$UOW.xml
echo "<source_tdpid>$source</source_tdpid>" >> $XML/$UOW.xml
echo "<source_user>$source_user</source_user>" >> $XML/$UOW.xml
echo "<source_password>$source_pw</source_password>" >> $XML/$UOW.xml
echo "<target_tdpid>$target</target_tdpid>" >> $XML/$UOW.xml
echo "<target_user></target_user>" >> $XML/$UOW.xml
echo "<target_password></target_password>" >> $XML/$UOW.xml
echo "<use_userid_pool>true</use_userid_pool>" >> $XML/$UOW.xml
echo "<data_streams>$streams</data_streams>" >> $XML/$UOW.xml
echo "<source_sessions>$Ssess</source_sessions>" >> $XML/$UOW.xml
echo "<target_sessions>$Tsess</target_sessions>" >> $XML/$UOW.xml
echo "<overwrite_existing_objects>true</overwrite_existing_objects>" >> $XML/$UOW.xml
echo "<log_level>1</log_level>" >> $XML/$UOW.xml
echo "<online_archive>false</online_archive>" >> $XML/$UOW.xml
echo "<database selection=\"unselected\">" >> $XML/$UOW.xml
echo "<name>$db</name>" >> $XML/$UOW.xml
echo "<table selection=\"included\">" >> $XML/$UOW.xml
echo "<name>$table</name>" >> $XML/$UOW.xml
echo "<compare_ddl>false</compare_ddl>" >> $XML/$UOW.xml
utility="default"
if [ "$mode" == "P" ]
then
    # this section assumes a common timestamp field is available for last modified date/time
    echo ".logon $target/$source_user,$source_pw" > $XML/$table.btq
    echo ".set WIDTH 128" >> $XML/$table.btq
    echo "select max(last_update_dttm) from $db.$table;" >> $XML/$table.btq
    echo ".quit" >> $XML/$table.btq
    bteq < $XML/$table.btq > $XML/$table.ts
    TS="`tail -7 $XML/$table.ts|head -1`"
    TS=`echo $TS|sed "s/ //g"`      # strips any spaces
    first=`echo "$TS"|cut -c1`
    if [ "$first" == "2" ]
    then
        # first character matches expected year start, so prepare partial copy code
        utility="tptapi_load"
        dt=`echo $TS|cut -c1-10`
        tm=`echo $TS|cut -c11-`
        TS=$dt" "$tm
        echo ".logon $source/$source_user,$source_pw" > $XML/$table.btq
        echo ".set WIDTH 128" >> $XML/$table.btq
        echo "show table $db.$table;" >> $XML/$table.btq
        echo ".quit" >> $XML/$table.btq
        bteq < $XML/$table.btq > $XML/$table.pi
        # get first unique index definition
        UI=`grep "UNIQUE" $XML/$table.pi|head -1`
        if [ -z "$UI" ]
        then
            # if no unique index defined, get all column names
            echo ".logon $source/$source_user,$source_pw" > $XML/$table.btq
            echo ".set WIDTH 128" >> $XML/$table.btq
            echo "select 'COL:'as \"col\",ColumnName from dbc.\"columns\" where DatabaseName = '${db}' and TableName = '${table}';" >> $XML/$table.btq
            echo ".quit" >> $XML/$table.btq
            bteq < $XML/$table.btq > $XML/$table.col
            PI=`grep ^COL: $XML/$table.col|grep -v LAST_UPDATE_DTTM|sed "s/COL://"`
        else
            PI=`echo $UI|cut -d'(' -f2|cut -d')' -f1|sed "s/,//g"`
        fi
        echo "<sql_where_clause>" >> $XML/$UOW.xml
        echo "<![CDATA[" >> $XML/$UOW.xml
        echo "WHERE Last_Update_Dttm > '$TS' ]]>" >> $XML/$UOW.xml
        echo "</sql_where_clause>" >> $XML/$UOW.xml
        echo "<key_columns>" >> $XML/$UOW.xml
        for key in $PI
        do
            echo "<key_column>$key</key_column>" >> $XML/$UOW.xml
        done
        echo "</key_columns>" >> $XML/$UOW.xml
    fi
    rm -f $XML/$table.btq $XML/$table.ts $XML/$table.pi $XML/$table.col
fi
echo "</table>" >> $XML/$UOW.xml
echo "</database>" >> $XML/$UOW.xml
echo "<uowid>$UOW</uowid>" >> $XML/$UOW.xml
echo "</dmCreate>" >> $XML/$UOW.xml
echo "cleaning up and removing any earlier version of job $job" >> $LOGFILE
$DM stop -job_name $job >> $LOGFILE 2>&1
$DM cleanup -job_name $job >> $LOGFILE 2>&1
$DM delete_job -job_name $job -all -skip_prompt >> $LOGFILE 2>&1
echo "creating tdm job $job" >> $LOGFILE
try=0
while true
do
    if [ "$utility" != "default" ]
    then
        $DM create -force_utility $utility -f $XML/$UOW.xml >> $LOGFILE 2>&1
    else
        $DM create -f $XML/$UOW.xml >> $LOGFILE 2>&1
    fi
    if [ $? -eq 0 ]
    then
        break
    else
        err=`tail $LOGFILE|grep -c "Error: cannot connect to Daemon"`
        if [ $err -eq 1 -a $try -lt 3 ]
        then
            try=`expr $try + 1`
            echo "Create failed: re-trying [$try]" >> $LOGFILE
            sleep 180
        else
            msg="Job create error for $job - see $ERRFILE"
            echo $msg
            $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
            mv $LOGFILE $ERRFILE
            echo " " >> $ERRFILE
            echo "To re-run this job, execute the following command:" >> $ERRFILE
            echo "$cmd" >> $ERRFILE
            chmod 666 $ERRFILE
            rm -f $XML/$UOW.xml $LOGDIR/$UOW.status
            exit 1
        fi
    fi
done
try=0
while true
do
    $DM start -job_name $job -sync >> $LOGFILE
    if [ $? -eq 0 ]
    then
        $DM status -job_name $job -output_level 3 > $LOGDIR/$UOW.status
        rows=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f7`
        bytes=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f8`
        agent=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f9`
        if [ $agent == "ARC" ]
        then
            util=$agent
            agent=`grep COPY $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f9`
        else
            util=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f10`
        fi
        duration=`grep ^Job: $LOGDIR/$UOW.status|sed "s/  */ /g"|cut -d' ' -f8`
        sec=`echo $duration|cut -d':' -f3`
        min=`echo $duration|cut -d':' -f2`
        hr=`echo $duration|cut -d':' -f1`
        hsec=`expr $hr \* 3600`
        msec=`expr $min \* 60`
        secs=`expr $hsec + $msec`
        secs=`expr $secs + $sec`
        rate=`expr $bytes / $secs`
        rate=`expr $rate \* 3600`
        msg="Success | $job | rows=$rows | bytes=$bytes | duration=$duration | rate=$rate-bytes/hr | agent=$agent | utility=$util"
        echo $msg
        cat $LOGDIR/$UOW.status >> $LOGFILE
        $HOME/bin/checksum.sh $db $table $source $source_user $source_pw >> $LOGFILE 2>&1
        $HOME/bin/checksum.sh $db $table $target $source_user $source_pw >> $LOGFILE 2>&1
        $SENDEVENT --et END -v $rows -m "$msg"
        rm -f $XML/$UOW.xml $LOGDIR/$UOW.status $LOGDIR/$UOW.log
        exit 0
    else
        sleep 60
        err=`tail $LOGFILE|grep -c "Daemon cannot find any Agents"`
        if [ $err -gt 0 -a $try -lt 2 ]
        then
            # try again
            try=`expr $try + 1`
            echo "Unable to find Agent: re-trying [$try]" >> $LOGFILE
        else
            UserErr=`$DM status -job_name $job -output_level 4|grep -c ARC0700`
            if [ $UserErr -gt 0 -a $try -lt 3 ]
            then
                try=`expr $try + 1`
                echo "User already logged on: re-trying [$try]" >> $LOGFILE
                $DM cleanup -job_name $job > /dev/null
                sleep 180
            else
                msg="Failed - see $ERRFILE"
                echo $msg
                $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
                mv $LOGFILE $ERRFILE
                echo " " >> $ERRFILE
                echo "To get status details execute the following command:" >> $ERRFILE
                echo "$DM status -job_name $job -output_level 4" >> $ERRFILE
                echo " " >> $ERRFILE
                echo "To re-run this job, execute the following command:" >> $ERRFILE
                echo "$cmd" >> $ERRFILE
                chmod 666 $ERRFILE
                rm -f $XML/$UOW.xml $LOGDIR/$UOW.status
                exit 1
            fi
        fi
    fi
done
tdm_template.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://schemas.teradata.com/dataMover/v2009/DataMover.xsd">
checksum.sh
#!/bin/bash
# File: checksum.sh - creates checksum on all fields in a table
#       except for LOB and timestamp fields
Usage="Usage: $0 DataBase TableName System User Pw"
if [ $# -ne 5 ]
then
    echo $Usage
    exit 1
fi
db=$1
tab=$2
sys=$3
user=$4
pw=$5
bteq <<EOF > /tmp/$tab$DATE.columns
.logon $sys/$user,$pw
select 'COL:',columnname,columntype from dbc.columns
where databasename='$db' and tablename='$tab'
and columntype not in ('CO','BO','TS');
.quit
EOF
grep ^COL: /tmp/$tab$DATE.columns|sed "s/  */ /"|cut -d' ' -f2 > /tmp/$tab$DATE.col
Columns=
for col in `cat /tmp/$tab$DATE.col`
do
    Columns=$Columns,$col
done
Columns=`echo $Columns|sed "s/,//"`
bteq <<EOF > /tmp/$tab$DATE.sum
.logon $sys/$user,$pw
select 'CHECK:',SUM(CAST(HASHBUCKET(HASHROW($Columns)) AS DECIMAL(38,0))) from $db.$tab;
.quit
EOF
CheckSum=`grep ^CHECK: /tmp/$tab$DATE.sum|sed "s/  */ /"|cut -d' ' -f2|cut -d'.' -f1`
echo $tab:$sys:$CheckSum
rm -f /tmp/$tab$DATE.*