yxu's picture
Company:
Country:
Personal Website:
Day Job:
blog icon Recent Articles

Hadoop MapReduce programmers often find that it is more convenient and productive to have direct access from their MapReduce programs to data stored in a RDBMS such as Teradata Enterprise Data Warehouse (EDW) because:

  1. There is no benefit to exporting relational data into a flat file.
  2. There is no need to upload the file into the Hadoop Distributed File System (HDFS).
  3. There is no need to change and rerun the scripts/commands in the first two steps when they need to use different tables/columns in their MapReduce programs.

Hadoop systems [1], sometimes called Map Reduce, can coexist with the Teradata Data Warehouse allowing each subsystem to be used for its core strength when solving business problems. Integrating the Teradata Database with Hadoop turns out to be straight forward using existing Teradata utilities and SQL capabilities. There are a few options for directly integrating data from a Hadoop Distributed File System (HDFS) with a Teradata Enterprise Data Warehouse (EDW), including using SQL and Fastload. This document focuses on using a Table Function UDF to both access and load HDFS data into the Teradata EDW. In our examples, there is historical data already in Teradata EDW, presumably derived from HDFS for trend analysis. We will show examples where the Table Function UDF approach is used to perform inserts or joins from HDFS with the data warehouse.

blog icon Recent Reference

yxu hasn't created any reference articles.