Because today's businesses depend on mission-critical applications for decision making, continuity and availability are vital to the success of Active Data Warehousing (ADW). Plans for recovery from any failure must therefore be built into the design and deployment of ETL jobs as early as possible. One of the main challenges for such a design is how to recover quickly from a failure.

While transactional processing through “message queues” is a common approach in ADW today, the file-oriented approach is also beginning to find its way into ADW because of its simplicity and ease of control. Today, many companies monitor and store thousands, or even hundreds of thousands, of transactions per day across their branches and stores. Transactional data is usually collected and stored as files in directories before being merged into the enterprise-wide data warehouse. In fact, some Teradata sites extract transactional data from message queues, pre-process it, and store it in different directories based on transaction type, in an “active” manner. By “active”, we mean that the files are created as transactions are collected.

With traditional Teradata utilities such as FastLoad, MultiLoad, and TPump, multiple data files are usually processed serially. For example, if the data to be loaded into the Data Warehouse resides in several files, the files must either be concatenated into a single file before loading or be processed sequentially, one file at a time, during loading.
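The concatenation step described above can be sketched in a few lines of Python. This is only an illustration of the pre-processing a serial load would need, not part of any Teradata utility; the function name `concatenate_files` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_files(source_dir: Path, pattern: str, target: Path) -> int:
    """Append every matching file, in sorted name order, to one target file.

    Returns the number of files merged. Bytes are copied as-is, so each
    input file is assumed to end with a record terminator.
    """
    merged = 0
    with target.open("wb") as out:
        for path in sorted(source_dir.glob(pattern)):
            # Skip the target itself if it lives in the source directory.
            if path.resolve() == target.resolve():
                continue
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
            merged += 1
    return merged
```

The merged file must be complete before the load can start, which is the extra step, and the extra delay, that the serial approach imposes.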

In contrast, Teradata Parallel Transporter (TPT) provides a feature called “directory scan”, which allows the data files in a directory to be processed in a parallel and scalable manner as part of the loading process. In addition, if multiple directories are stored across multiple disks, a TPT feature called “UNION ALL” can be used to process these directories of files in parallel, achieving higher throughput by spreading the work across disks.
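To make the idea concrete, here is a minimal Python sketch of the general pattern: scan several directories, read their files concurrently, and merge the rows into one stream. It is an analogy only. It is not TPT script syntax and says nothing about how TPT implements directory scan or UNION ALL; the names `scan_directory`, `read_rows`, and `load_directories` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from pathlib import Path
from typing import Iterable, List


def scan_directory(directory: Path, pattern: str = "*") -> List[Path]:
    """Return the regular files in one directory that match the pattern."""
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def read_rows(path: Path) -> List[str]:
    """Read one file and return its non-blank lines as rows."""
    with path.open("r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def load_directories(
    directories: Iterable[Path], pattern: str = "*", workers: int = 4
) -> List[str]:
    """Scan several directories, read their files concurrently, merge the rows."""
    # Collect the files from every directory (the "scan" step).
    files = list(chain.from_iterable(scan_directory(d, pattern) for d in directories))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each file is read by a worker thread; map() yields results in input order.
        per_file_rows = pool.map(read_rows, files)
        # Flatten the per-file results into a single combined row list.
        return list(chain.from_iterable(per_file_rows))
```

Calling `load_directories([Path("/data/store1"), Path("/data/store2")], "*.txt")` (both paths are placeholders) would return the rows of every matching file in both directories as one list, with no prior concatenation step.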

Teradata Parallel Transporter (TPT) is a flexible, high-performance Data Warehouse loading tool, optimized specifically for Teradata Database, that supports data extraction, transformation, and loading. Its infrastructure provides a parallel execution environment for product components called “operators”. Operators integrate with the infrastructure in a “plug-in” fashion and are therefore interoperable.