watzke's picture
Recent Articles

 

Overview

This article provides an example of implementing a Java table operator. The specific use case is support for the SHA-2 family of hash functions. For more details on SHA-2, see https://en.wikipedia.org/wiki/SHA-2
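As a rough illustration of what SHA-2 hashing produces, the sketch below computes digests with Python's standard `hashlib` module. It is not the Java table operator the article implements; the helper name `sha2_hex` and its `variant` parameter are hypothetical and chosen only for this example.

```python
import hashlib

# Variants of the SHA-2 family exposed by hashlib under these names.
SHA2_VARIANTS = {"sha224", "sha256", "sha384", "sha512"}


def sha2_hex(data: bytes, variant: str = "sha256") -> str:
    """Return the hexadecimal SHA-2 digest of `data` (hypothetical helper)."""
    if variant not in SHA2_VARIANTS:
        raise ValueError(f"unsupported SHA-2 variant: {variant}")
    return hashlib.new(variant, data).hexdigest()


if __name__ == "__main__":
    # Hash the same input with two members of the family.
    print(sha2_hex(b"abc", "sha256"))
    print(sha2_hex(b"abc", "sha512"))
```

The digest length depends on the variant: a SHA-256 digest is 64 hexadecimal characters and a SHA-512 digest is 128.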

Background on why a Java table operator was chosen as the implementation mechanism for SHA-2.

Overview

This article describes how to combine exploratory analytics and operational analytics within the Teradata Unified Data Architecture (UDA). The UDA is a logical and physical architecture that adds a data lake platform to complement the Teradata Integrated Data Warehouse. In the Teradata-advocated solution, the data lake platform can be either Hadoop or a Teradata Integrated Big Data Platform optimized for storage and processing of big data. Query Grid is an orchestration mechanism that supports seamless integration of multiple types of purpose-built analytic engines within the UDA.

Introduction

This article describes how to use Teradata DBQL tables to analyze the performance of Teradata to Hadoop (T2H) queries that transfer data from Hadoop to Teradata. It focuses on the steps that are part of the interaction between Teradata and Hadoop; a query can also include multiple other Teradata pre-processing and post-processing steps.

T2H has two processing methods

Background

This article is a follow-on to article [1], which discussed implementing K-means using a Teradata release 14.10 table operator. The main contributions of this article are a discussion of how to use the new Teradata 15.0 multiple input stream feature and a short discussion of a gcc compiler performance optimization.

This article describes how to use Teradata Query Grid to execute a Mahout machine learning algorithm on a Hadoop cluster based on data sourced from the Teradata Integrated Data Warehouse. Specifically, the Mahout K-means cluster analysis algorithm is demonstrated. K-means is a computationally expensive algorithm that, under certain conditions, is advantageous to execute on the Hadoop cluster. Query Grid is an enabling technology for the Teradata Unified Data Architecture (UDA).

In a prior article [1] we described how to use the Teradata 14.10 CalcMatrix operator and R to perform a multiple variable linear regression analysis. This article extends that concept with a comprehensive in-database solution by introducing a new in-database table operator named “CM_Solve”. This approach has value when you want to solve a large number of independent systems of equations, or when you simply do not want to use the R client to solve the system of equations based on the SSCP matrix.

Linear Regression

In statistics, linear regression is an approach to model the relationship between a scalar dependent variable y and one or more independent variables denoted x. Linear regression is one of the oldest and most fundamental types of analysis in statistics. The British scientist Sir Francis Galton originally developed it in the latter part of the 19th century. The term "regression" derives from the nature of his original study in which he found that the children of both tall and short parents tend to "revert" or "regress" toward average heights.
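To make the link between the SSCP (sum of squares and cross products) matrix and a regression fit concrete, here is a minimal pure-Python sketch. It is not the CalcMatrix or CM_Solve operator; the function names `sscp`, `solve`, and `fit_linear` are hypothetical. It builds the SSCP matrix from rows of the form [1, x1, ..., xk, y] and then solves the normal equations by Gaussian elimination.

```python
def sscp(rows):
    """Return the SSCP matrix: the sum over all rows r of the outer product r * r^T."""
    n = len(rows[0])
    m = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                m[i][j] += r[i] * r[j]
    return m


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_linear(xs, ys):
    """Return [intercept, b1, ..., bk] for the least-squares fit of ys on xs."""
    rows = [[1.0] + list(x) + [y] for x, y in zip(xs, ys)]
    m = sscp(rows)
    k = len(m) - 1                      # last row/column belongs to y
    xtx = [row[:k] for row in m[:k]]    # X^T X block of the SSCP matrix
    xty = [m[i][k] for i in range(k)]   # X^T y column of the SSCP matrix
    return solve(xtx, xty)


if __name__ == "__main__":
    xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
    ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
    print(fit_linear(xs, ys))
```

Because the SSCP matrix is a sum over rows, it can be accumulated in a single pass over the data, and the system that remains to be solved has only as many equations as there are coefficients.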

Table Operators

This article discusses how to implement a Teradata 14.10 table operator using K-means clustering as an example use case. 
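For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of K-means (Lloyd's algorithm). It is not the table operator implementation from the article. The function name `kmeans`, the choice of the first k points as initial centroids, and the iteration cap are all assumptions made for this illustration.

```python
def kmeans(points, k, max_iterations=100):
    """Cluster `points` (tuples of numbers) into k groups and return the centroids."""
    # Assumption: seed the centroids with the first k points for determinism.
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster
            else centroids[i]  # keep the old centroid if its cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    print(kmeans(data, 2))
```

Each iteration scans every point against every centroid, which is why the algorithm becomes expensive on large data sets and benefits from parallel execution.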

Recent Reference

watzke hasn't created any reference articles.