Teradata Package for R - tdplyr

Teradata R Package Product Overview

Note: Teradata recommends installing tdplyr from https://github.com/Teradata/tdplyr.
If your organization does not allow you to download directly from https://github.com/Teradata/tdplyr, download it from downloads.teradata.com instead.

The Teradata® R Package product combines the benefits of the open-source R language environment with the massively parallel processing capabilities of Teradata Vantage, which includes the Teradata Machine Learning Engine analytic functions and the Teradata Advanced SQL Engine in-database analytic functions. The Teradata R Package allows users to develop and run R programs that take advantage of the Big Data and Machine Learning analytics capabilities of Teradata Vantage.

The Teradata R Package product is tdplyr, an R library package like other open-source R packages. The package interface gives R users a collection of analytic functions that reside on Teradata Vantage, so that they can perform analytics with no SQL coding required. Specifically, the tdplyr package provides functions for data manipulation and transformation and for data filtering and subsetting, and it can be used in conjunction with open-source R capabilities. Moreover, the tdplyr package conforms to and works with the verbs and functions of the dplyr and dbplyr packages.
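
As a rough illustration of the "no SQL coding" idea, the sketch below uses only dplyr and dbplyr, not tdplyr itself and no Vantage connection. It builds a lazy table against dbplyr's simulated Teradata dialect and renders the SQL that a chain of dplyr verbs would produce. The table columns (region, amount) are made up for the example, and the SQL that tdplyr generates on a real connection may differ.

```r
library(dplyr)
library(dbplyr)

# A lazy table with no database behind it. simulate_teradata() only selects
# dbplyr's Teradata SQL dialect; the columns are hypothetical.
sales <- lazy_frame(region = "east", amount = 10, con = simulate_teradata())

# Ordinary dplyr verbs: filter rows, group, and aggregate.
query <- sales %>%
  filter(amount > 100) %>%
  group_by(region) %>%
  summarise(total = sum(amount, na.rm = TRUE))

# Render the SQL text that these verbs translate to, without executing it.
generated_sql <- as.character(sql_render(query))
cat(generated_sql, "\n")
```

With a real connection, the same verbs would be applied to a remote table object instead of the simulated one, and the work would run in the database rather than in the R session.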

The Teradata R package depends on the rlang, dplyr, dbplyr, DBI, magrittr, jsonlite, purrr, and bit64 packages, which are available from CRAN, and on the teradatasql package, which is available from Teradata's GitHub repository and downloads.teradata.com.


IMPORTANT INSTALLATION NOTICE:

The Teradata R package is incompatible with dbplyr 2.0.0, which introduces changes that break tdplyr features. To use tdplyr, the dbplyr package version must be 1.4.4. Install tdplyr using the following commands until a new version of tdplyr that is compatible with dbplyr 2.0.0 is released.


To download and install tdplyr, along with any dependencies whose minimum required version is not met, specify the Teradata R package repository and CRAN in the repos argument of install.packages.

> Rscript -e "install.packages('tdplyr',repos=c('https://r-repo.teradata.com','https://cloud.r-project.org'))"
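
One possible way to meet the dbplyr 1.4.4 requirement is to pin that version before installing tdplyr. The sketch below is an assumption, not a procedure published by Teradata. It relies on the remotes package from CRAN and on install.packages leaving an already installed dbplyr 1.4.4 in place. The helper names is_required_dbplyr and install_tdplyr_pinned are hypothetical.

```r
# dbplyr version required by the installation notice.
required_dbplyr <- package_version("1.4.4")

# TRUE when the given version string equals the required dbplyr version.
is_required_dbplyr <- function(version) {
  package_version(version) == required_dbplyr
}

# Hypothetical helper: pin dbplyr to 1.4.4, then install tdplyr from the
# Teradata repository and CRAN. Returns TRUE if dbplyr is still 1.4.4 afterwards.
install_tdplyr_pinned <- function() {
  cran <- "https://cloud.r-project.org"
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes", repos = cran)
  }
  remotes::install_version("dbplyr", version = "1.4.4", repos = cran)
  install.packages("tdplyr", repos = c("https://r-repo.teradata.com", cran))
  is_required_dbplyr(as.character(utils::packageVersion("dbplyr")))
}

# Not run automatically because it downloads and installs packages:
# install_tdplyr_pinned()
```

If the final check returns FALSE, the tdplyr installation replaced the pinned dbplyr, and the pin would need to be reapplied.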

The Teradata R Package works over connections to:

- Teradata Vantage with Advanced SQL Engine and ML Engine

- Teradata Vantage with Advanced SQL Engine only

Sandbox environment

tdplyr will provide a sandbox environment that can be used to run user scripts outside Vantage. Users can test scripts in a Vantage-like environment before uploading them for execution in the target Advanced SQL Engine. The sandbox environment is based on a SLES12 SP3 Docker image that contains an R distribution (interpreter and add-on libraries) based on the latest Teradata In-nodes R release for SLES12 SP3:

  • R interpreter (Version 3.6.3)
  • Add-on libraries for R

Users can choose to set up the Docker environment and test R scripts by running them inside the Docker container, or they can execute scripts directly on Vantage.
The name of the sandbox environment Docker image is "rstosandbox:1.0".

The Docker image is about 4.5 GB. Because of its large size, Teradata recommends downloading the image beforehand and saving it to a local folder.

Model Cataloging

Model Cataloging allows users to save model-related information so that it can be reused by the supported functions of the Machine Learning Engine or Advanced SQL Engine through SQL, the Teradata Python Package (teradataml), or the Teradata R Package (tdplyr) client analytic libraries.
For example, an ML Engine DecisionForest (td_decision_forest_mle) model saved by using SQL can be retrieved for use with tdplyr for scoring with the DecisionForestPredict function from ML Engine (td_decision_forest_predict_mle) or Advanced SQL Engine (td_decision_forest_predict_sqle). Similarly, an ML Engine or Advanced SQL Engine model saved by using teradataml can be described and retrieved by tdplyr.


tdplyr offers functions for working with the Model Catalog, allowing users to:
•    Save a model and related information to the catalog;
•    List the saved models;
•    Describe a saved model;
•    Retrieve a saved model for reuse;
•    Publish a saved model to set its access level and status;
•    Delete a saved model.

Before any user can use the model cataloging feature in tdplyr, an administrator must perform setup on the Vantage system.
The required scripts, along with instructions for setting up the Vantage system, are part of the bundle vantage_model_cataloging_1.0.0.tar.gz.

General product information is available on the Teradata Documentation website.

Teradata R Package User Guide – B700-4005

Teradata R Function Reference – B700-4007

For Teradata customer support, please visit Teradata Access. For community support, please visit the Teradata Connectivity Forum.
