tdplyr 17.00.00.00
sto_sandbox_R3.5.1_sles12sp3.0.5.2_docker_image.1.0.0
vantage_model_cataloging 1.0.0
Other Releases
Version        Released
16.20.00.06    20 Feb 2020
About this download

Teradata R Package Product Overview

Note: Teradata recommends installing tdplyr from https://github.com/Teradata/tdplyr.
Download it from downloads.teradata.com if your organization does not allow you to download directly from https://github.com/Teradata/tdplyr.

The Teradata® R Package product combines the benefits of the open-source R language environment with the massively parallel processing capabilities of Teradata Vantage, which includes the Teradata Machine Learning Engine analytic functions and the Teradata Advanced SQL Engine in-database analytic functions. The Teradata R Package allows users to develop and run R programs that take advantage of the Big Data and Machine Learning analytics capabilities of Teradata Vantage.

The Teradata R Package product is tdplyr, an R library package like other open-source R packages. The package interface gives R users a collection of analytic functions that reside on Teradata Vantage, so that R users can perform analytics with no SQL coding required. Specifically, the tdplyr package provides functions for data manipulation and transformation and for data filtering and sub-setting, and it can be used in conjunction with open-source R capabilities. Moreover, the tdplyr package conforms to and works with the verbs and functions of the dplyr and dbplyr packages.

The Teradata R package depends on the rlang, dplyr, dbplyr, DBI, magrittr, jsonlite, purrr, and bit64 packages (available from CRAN) and on the teradatasql package (available from Teradata's GitHub repository and downloads.teradata.com).


IMPORTANT INSTALLATION NOTICE:

The Teradata R package is incompatible with dbplyr 2.0.0, which introduces changes that break tdplyr features. To use tdplyr, the installed version of the dbplyr package must be 1.4.4. Until a new version of tdplyr that is compatible with dbplyr 2.0.0 is released, install tdplyr using the following commands.

1. Ensure dbplyr 1.4.4 is installed.
There are various ways to install a specific version of an R package. Teradata recommends using the following command.
Note: To run the command below, the R package 'remotes' must be present on the client machine.

> Rscript -e "remotes::install_version('dbplyr', version = '1.4.4', repos = 'https://cloud.r-project.org')"

2. Install tdplyr.
Specify both the Teradata R package repository and CRAN in the repos argument of install.packages. This downloads and installs tdplyr and automatically installs any dependency whose minimum required version is not already met.

> Rscript -e "install.packages('tdplyr',repos=c('https://teradata-download.s3.amazonaws.com','https://cloud.r-project.org'))"
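Because step 2 depends on step 1 having succeeded, it can help to confirm the dbplyr version before installing tdplyr. The following is a minimal sketch, not part of the Teradata tooling: the function names dbplyr_version_ok, installed_dbplyr_version, and report_dbplyr are hypothetical, and the only R call used is the base function packageVersion.

```shell
#!/bin/sh
# Required dbplyr version, taken from the installation notice above.
REQUIRED_DBPLYR="1.4.4"

# Hypothetical helper: succeed only when the given version string
# matches the required dbplyr version exactly.
dbplyr_version_ok() {
    [ "$1" = "$REQUIRED_DBPLYR" ]
}

# Hypothetical helper: print the installed dbplyr version.
# Prints nothing when Rscript or dbplyr is not available.
installed_dbplyr_version() {
    command -v Rscript >/dev/null 2>&1 || return 0
    Rscript -e "cat(tryCatch(as.character(packageVersion('dbplyr')), error = function(e) ''))" 2>/dev/null
}

# Report whether a version (argument 1, or the installed one if omitted)
# satisfies the tdplyr requirement.
report_dbplyr() {
    v="${1:-$(installed_dbplyr_version)}"
    if dbplyr_version_ok "$v"; then
        echo "dbplyr $v: OK for tdplyr"
    else
        echo "dbplyr ${v:-not found}: install $REQUIRED_DBPLYR before tdplyr"
    fi
}
```

Calling report_dbplyr with no argument queries the local R installation; passing a version string, for example report_dbplyr 2.0.0, checks that string instead.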

The Teradata R Package works over connections to:

- Teradata Vantage with Advanced SQL Engine and ML Engine

- Teradata Vantage with Advanced SQL Engine only

Sandbox environment

tdplyr provides a sandbox environment that can be used to run user scripts outside Vantage. Users can test scripts in a Vantage-like environment before uploading them for execution in the target Advanced SQL Engine. The sandbox environment is based on a SLES12 SP3 Docker image that has:

    1. R Interpreter (Version 3.5.1)
    2. Add on packages for R


Users can set up the Docker environment and test R scripts by running them inside the Docker container using the function td_test_script(), or they can execute scripts directly on Vantage.
The Docker image is approximately 2.96 GB. Because of the large image size, Teradata recommends downloading it beforehand and saving it to a local folder.

Model Cataloging

Model Cataloging allows users to save model-related information so that it can be reused by the supported functions of the Machine Learning Engine or Advanced SQL Engine through SQL, the Teradata Python Package (teradataml), or the Teradata R Package (tdplyr) client analytic libraries.
For example, an ML Engine DecisionForest (td_decision_forest_mle) model saved by using SQL can be retrieved for use with tdplyr for scoring with the DecisionForestPredict function from ML Engine (td_decision_forest_predict_mle) or Advanced SQL Engine (td_decision_forest_predict_sqle). Similarly, an ML Engine or Advanced SQL Engine model saved by using teradataml can be described and retrieved by tdplyr.


tdplyr offers functions to use the Model Catalog, allowing the users to:
•    Save a model and related information to the catalog;
•    List the saved models;
•    Describe a saved model;
•    Retrieve a saved model for reuse;
•    Publish a saved model to set its access level and status;
•    Delete a saved model.
Before any user can use the model cataloging feature in tdplyr, an administrator must perform setup on the Vantage system.
The required scripts and the instructions for setting up the Vantage system are part of the bundle vantage_model_cataloging_1.0.0.tar.gz.
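As a rough sketch of the first administrator step, the bundle can be listed and unpacked with standard tar options before following its instructions. The function name unpack_bundle and the destination directory are hypothetical; the layout of the bundle is not described here, so the sketch only lists and extracts it.

```shell
#!/bin/sh
# Hypothetical helper: list, then unpack, a .tar.gz bundle.
# $1 = path to the archive, $2 = destination directory (chosen by the caller).
unpack_bundle() {
    archive="$1"
    dest="$2"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # List the archive contents first so they can be reviewed.
    tar -tzf "$archive" || return 1
    # Extract into the destination directory.
    tar -xzf "$archive" -C "$dest"
}
```

For example, unpack_bundle vantage_model_cataloging_1.0.0.tar.gz ./model_catalog_setup prints the file list and extracts the files into ./model_catalog_setup.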

General product information is available in the Teradata Documentation Website.

Teradata R Package User Guide – B700-4005

Teradata R Function Reference – B700-4007

For Teradata customer support, please visit Teradata Access. For community support, please visit the Teradata Connectivity Forum.