Downloads

Featured downloads

Recently published downloads

BYOM Extended Universe

Version: 3.0.1 - Created: 16 Sep 2025

This package contains a collection of Teradata functions complementary to Teradata BYOM (Bring Your Own Model). These functions do not replace BYOM; instead, they help you work more efficiently with Small Language Models directly in the database.

🚀 Functions Included

1. ArgMax

A table operator that extracts the index and value of the largest element from a vector embedded in table columns.
Use case: get the predicted class and confidence score from classification models that output a probability vector.

Inputs:
- Table with vector columns named like emb_0, emb_1, ..., all of type FLOAT.

Outputs:
- All input columns, plus:
- arg_max_index: index of the highest value in the vector.
- arg_max_value: the corresponding value.

Parameters:
- VectorColumnsPrefix (STRING): prefix for vector columns (e.g., 'emb_').
- VectorColumnsNumber (INTEGER): number of vector columns.

Example:

SELECT * FROM byom_extended_universe.ArgMax(
  ON sasha.complaints_sentiment
  USING
    VectorColumnsPrefix('emb_'),
    VectorColumnsNumber(2)
) AS a;

2. SoftMax

Transforms a vector of raw scores into a probability distribution (values sum to 1). Useful for making classification outputs more interpretable.

Inputs:
- Table with raw prediction vector columns.

Outputs:
- All original columns, with vector columns replaced by their SoftMax-transformed equivalents.

Parameters:
- VectorColumnsPrefix (STRING): prefix for vector columns.
- VectorColumnsNumber (INTEGER): number of vector columns.

Example:

SELECT * FROM byom_extended_universe.SoftMax(
  ON sasha.complaints_sentiment
  USING
    VectorColumnsPrefix('emb_'),
    VectorColumnsNumber(2)
) AS a;

3. LengthInTokens

A table operator that calculates the length in tokens of a text field. Especially handy for LLM/SLM input preparation.

Inputs:
- Data table with a txt column (text to process).
- Tokenizer table: one row, one column named tokenizer of type BLOB, passed with the DIMENSION keyword. The BLOB should contain the contents of the tokenizer.json file from the desired model. This is the same format as in BYOM's ONNXEmbeddings.

Outputs:
- All input columns.
- Additional column length_in_tokens (INTEGER): number of tokens generated by the provided tokenizer on the txt field.

Parameters: (none)

Example:

SELECT * FROM byom_extended_universe.LengthInTokens(
  ON (SELECT id, txt FROM complaints.complaints_clean)
  ON (SELECT tokenizer FROM embeddings_tokenizers WHERE model_id = 'bge-small-en-v1.5') DIMENSION
) a;
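The LengthInTokens example (and the ChunkText and NerReplace examples that follow) reads the tokenizer from a dimension table such as embeddings_tokenizers. The layout below is a minimal sketch of such a table; the table name and the model_id column are illustrative, and only the BLOB column named tokenizer is actually required by the functions. Loading the tokenizer.json contents into the BLOB column is typically done from a client-side tool that supports LOB loading.

-- Sketch of a tokenizer dimension table (names other than "tokenizer" are illustrative).
CREATE TABLE embeddings_tokenizers (
    model_id  VARCHAR(128),
    tokenizer BLOB
)
UNIQUE PRIMARY INDEX (model_id);

-- The tokenizer.json contents for each model are then loaded into the BLOB column.
-- The DIMENSION inputs in the examples project only the tokenizer column for one model_id.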
4. ChunkText

Text chunking is critical for working with language models: breaking large texts into model-friendly pieces. ChunkText is a tokenizer-aware chunker: it splits text so that no chunk exceeds a specified token limit, using the tokenizer you provide.

Inputs:
- Data table with a txt column (VARCHAR or CLOB; text to chunk). Use the CLOB type to process bigger texts.
- Tokenizer table: one row, one column named tokenizer of type BLOB, passed with the DIMENSION keyword. The BLOB should contain the contents of the tokenizer.json file from the desired model, as used in BYOM's ONNXEmbeddings.

Outputs:
- All input columns except txt.
- chunk_number (INTEGER): chunk index, starting at 0.
- txt (VARCHAR): text of the chunk, always Unicode.
- chunk_length_in_tokens (INTEGER): length of the chunk in tokens.

Parameters:
- MaxTokens (INTEGER, required): maximum tokens per chunk (must be > 2).
- MaxOverlapWords (INTEGER, default 0): words to carry over from the previous chunk (semantic overlap; 0 = no overlap).
- FirstNChunks (INTEGER, default 0): output only the first N chunks (0 = all chunks).
- OutputVarcharLength (INTEGER, default 6000): length of the output txt VARCHAR (1-32000 allowed).
- SplittingStrategy (STRING, default 'WORDS'): how to split the text (see below for strategies).

Splitting strategy (parameter SplittingStrategy):
- WORDS (default): splits the text by words. Chunks are created by grouping words so the total token count stays under MaxTokens.
- SENTENCES: splits by sentences using language-appropriate sentence boundaries. Chunks consist of full sentences, grouped as long as the token limit allows.
- PARAGRAPHS: splits by paragraphs, each chunk aiming to be one or more full paragraphs under the token limit.

Fallback logic: if a text unit (paragraph, sentence, or word) exceeds the MaxTokens limit, the function automatically falls back to a finer splitting strategy (e.g., PARAGRAPH → SENTENCE → WORD) for that chunk. If a single word is still too long, it is placed in its own chunk, even if it exceeds MaxTokens. Chunks may overlap by up to MaxOverlapWords to preserve semantic context (especially useful for RAG or embeddings). Internally, splitting uses regular expression rules for each unit type.

Example:

SELECT * FROM byom_extended_universe.ChunkText(
  ON (SELECT id, txt FROM complaints.complaints_clean)
  ON (SELECT tokenizer FROM embeddings_tokenizers WHERE model_id = 'bge-small-en-v1.5') DIMENSION
  USING
    MaxTokens(25)
    MaxOverlapWords(3)
    FirstNChunks(1)
    SplittingStrategy('WORDS')
    OutputVarcharLength(32000)
) a;
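Because chunk sizes depend on the tokenizer and the chosen strategy, it can help to sanity-check the settings before generating embeddings. The query below is a sketch that reuses the tables from the example above and only the documented output columns (id, chunk_number, chunk_length_in_tokens) to report the chunk count and largest chunk per document; FirstNChunks is left at its default so all chunks are returned.

-- Sketch: chunk count and largest chunk size per document.
SELECT id,
       COUNT(*)                    AS chunk_count,
       MAX(chunk_length_in_tokens) AS max_chunk_tokens
FROM byom_extended_universe.ChunkText(
  ON (SELECT id, txt FROM complaints.complaints_clean)
  ON (SELECT tokenizer FROM embeddings_tokenizers WHERE model_id = 'bge-small-en-v1.5') DIMENSION
  USING
    MaxTokens(25)
    SplittingStrategy('WORDS')
) a
GROUP BY id;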
5. NerReplace

NerReplace is a production-ready tool for finding and replacing sensitive information (PII) directly inside Teradata:
- Remove PII before sharing data: replace names, addresses, account numbers, and more without moving data outside the database.
- Stay compliant and secure: all processing happens in-database, minimizing the risk of exposing private information.
- Seamless integration: makes it safe to use third-party analytics or machine learning tools without compromising privacy.
- Customizable, auditable, and fast: NerReplace brings privacy-first data workflows directly to your Teradata environment.

This is a table operator. It is designed to operate on the output of the ONNXEmbeddings function.

Inputs:
- Table with input data. Required columns are:
  - txt: the input text (VARCHAR or CLOB)
  - logits: the output of the model executed with ONNXEmbeddings (BLOB or VARBYTE)
  - any other columns.
- One-row, one-column table with the tokenizer:
  - One record, with one column named tokenizer of BLOB datatype.
  - It should contain the contents of tokenizer.json from Hugging Face for the model used in ONNXEmbeddings (the same as the third table input of ONNXEmbeddings).
  - Passed with the DIMENSION keyword.
- One-row, one-column table with the model config:
  - One record, with one column named config of BLOB datatype.
  - It should contain the contents of config.json from Hugging Face for the model used in ONNXEmbeddings.
  - Passed with the DIMENSION keyword.

Outputs:
- All the columns from the input table. In column txt, the original text is replaced with the processed text.
- Column logits is copied only if the KeepLogits parameter is true.
- Column replaced_entities: a VARCHAR column with entity details. Only appears if OutputDetails is 'true'.

Parameters:
- OutputDetails (optional; "true"/"false"; default false): if "true", adds a column with details (JSON) about each replaced entity: begin, end, score, text, and label. Example: {"begin":109,"end":130,"text":"Quantum Analytics LLC","entity":"COMPANYNAME","score":0.9395}
- EntitiesToReplace (optional; list of entity labels, e.g. 'EMAIL', 'SSN', ...; default all): restrict replacements to these entity labels. With NONE aggregation, BIO prefixes are present (e.g. B-EMAIL). Raises an error if a label does not exist for the model. Details are reported in the replaced_entities column.
- AggregationStrategy (optional; "NONE", "SIMPLE", "AVERAGE", "FIRST", "MAX"; default SIMPLE): how tokens are grouped into entities: NONE = per token, SIMPLE = group by word, FIRST = use the first token, AVERAGE = average logits, MAX = max logit. (AVERAGE, FIRST, and MAX require a word-aware tokenizer.)
- ReplaceWithEntityName (optional; "true"/"false"; default true): if "true", replaces entities with their label (e.g. 'EMAIL'). If "false", uses the ReplacementText parameter.
- ReplacementText (optional; any string; no default): text to replace entities with (e.g. "[REDACTED]"). Requires ReplaceWithEntityName('false').
- KeepLogits (optional; "true"/"false"; default false): if "true", copies the logits column to the output.
- ReplacedEntiyInfoLength (optional; integer 1-32000; default 6000): length limit (characters) for the replaced_entities column (JSON). Only applies if OutputDetails is 'true'.

Aggregation strategies (parameter AggregationStrategy):
- NONE: no grouping; each token is an entity.
- SIMPLE: groups tokens into words using the tokenizer's word ids, assigns the entity by max score.
- FIRST: entity label from the first token of each word. (Requires a word-aware tokenizer.)
- AVERAGE: entity label by averaging logits for all tokens in a word, then softmax. (Requires a word-aware tokenizer.)
- MAX: entity label by maximum logit over all tokens in a word. (Requires a word-aware tokenizer.)

Example:

SELECT *
FROM byom_extended_universe.NerReplace(
  ON (SELECT id, txt AS orig_txt, txt, logits FROM sasha.ner_input_distilbert_finetuned_ai4privacy_v2)
  ON (SELECT model AS tokenizer FROM sasha.ner_tokenizers WHERE model_id = 'distilbert_finetuned_ai4privacy_v2') DIMENSION
  ON (SELECT model AS config FROM sasha.ner_model_configurations WHERE model_id = 'distilbert_finetuned_ai4privacy_v2') DIMENSION
  USING
    AggregationStrategy('AVERAGE')
    OutputDetails('True')
    EntitiesToReplace('FIRSTNAME', 'LASTNAME', 'SSN', 'DOB')
) a;
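A typical privacy workflow is to materialize the redacted text so that downstream consumers only see the processed txt column. The statement below is a minimal sketch reusing the example query above; the target table name sasha.ner_redacted is illustrative and not part of the package.

-- Sketch: persist the redacted output so only processed text is shared.
CREATE TABLE sasha.ner_redacted AS (
  SELECT id, txt, replaced_entities
  FROM byom_extended_universe.NerReplace(
    ON (SELECT id, txt AS orig_txt, txt, logits FROM sasha.ner_input_distilbert_finetuned_ai4privacy_v2)
    ON (SELECT model AS tokenizer FROM sasha.ner_tokenizers WHERE model_id = 'distilbert_finetuned_ai4privacy_v2') DIMENSION
    ON (SELECT model AS config FROM sasha.ner_model_configurations WHERE model_id = 'distilbert_finetuned_ai4privacy_v2') DIMENSION
    USING
      AggregationStrategy('AVERAGE')
      OutputDetails('true')
  ) a
) WITH DATA;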
🛠️ Installation

Pre-requisite: BYOM v.6.0 or newer must be installed on your Teradata system.

Installation steps:

1. Create the database and grant permissions:

CREATE DATABASE byom_extended_universe AS PERM = <50000000 * NUMBER OF AMPs IN A SYSTEM>;
GRANT CREATE EXTERNAL PROCEDURE ON byom_extended_universe TO dbc;
GRANT CREATE FUNCTION ON byom_extended_universe TO dbc;

2. Switch to the database and install the JAR file:

DATABASE byom_extended_universe;
CALL SQLJ.INSTALL_JAR('cj!<PATH TO JAR FILE>', 'BYOM_EU', 0);

3. Create the functions:

REPLACE FUNCTION byom_extended_universe.ArgMax()
RETURNS TABLE VARYING USING FUNCTION ArgMax_contract
LANGUAGE JAVA
NO SQL
PARAMETER STYLE SQLTable
EXTERNAL NAME 'BYOM_EU:com.teradata.byom.extended.universe.vector.ops.ArgMax.execute()';

REPLACE FUNCTION byom_extended_universe.SoftMax()
RETURNS TABLE VARYING USING FUNCTION SoftMax_contract
LANGUAGE JAVA
NO SQL
PARAMETER STYLE SQLTable
EXTERNAL NAME 'BYOM_EU:com.teradata.byom.extended.universe.vector.ops.SoftMax.execute()';

REPLACE FUNCTION byom_extended_universe.LengthInTokens()
RETURNS TABLE VARYING USING FUNCTION LengthInTokens_contract
LANGUAGE JAVA
NO SQL
PARAMETER STYLE SQLTable
EXTERNAL NAME 'BYOM_EU:com.teradata.byom.extended.universe.nlp.utils.LengthInTokens.execute()';

REPLACE FUNCTION byom_extended_universe.ChunkText()
RETURNS TABLE VARYING USING FUNCTION ChunkerTO_contract
LANGUAGE JAVA
NO SQL
PARAMETER STYLE SQLTable
EXTERNAL NAME 'BYOM_EU:com.teradata.byom.extended.universe.chunking.ChunkerTO.execute()';

REPLACE FUNCTION byom_extended_universe.NerReplace()
RETURNS TABLE VARYING USING FUNCTION ReplaceNerTO_contract
LANGUAGE JAVA
NO SQL
PARAMETER STYLE SQLTable
EXTERNAL NAME 'BYOM_EU:com.teradata.byom.extended.universe.ner.ReplaceNerTO.execute()';

4. Grant execution rights:

GRANT EXECUTE FUNCTION ON byom_extended_universe TO <DESIRED USER/ROLE>;
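After the installation steps, the data dictionary can be used to confirm that the functions were created. The query below is a simple sketch against the standard DBC.TablesV view; no TableKind filter is applied because the kind codes differ between object types.

-- Sketch: list the objects created in the byom_extended_universe database.
SELECT TableName, TableKind
FROM DBC.TablesV
WHERE DatabaseName = 'byom_extended_universe'
ORDER BY TableName;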

Teradata package for Langchain

Version: 20.0.0.0 - Created: 10 Sep 2025

langchain-teradata is a Teradata package for Langchain that provides users with access to Teradata's Vector Store capabilities.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2025, Teradata. All Rights Reserved.

.NET Data Provider for Teradata

Version: 20.00.07.00 - Created: 04 Sep 2025

The .NET Data Provider for Teradata is an implementation of the Microsoft ADO.NET specification. It provides direct access to the Teradata Database and integrates with the DataSet. .NET applications use the .NET Data Provider for Teradata to load data into the Teradata Database or retrieve data from it.

For Visual Studio 2017 and newer, you will need to download the Integrated Help package and/or the appropriate VS Integration package if you wish to use these features. For the VS Integration features, simply download the file and execute it. Microsoft Edge changes the file extension from VSIX to ZIP; you must rename the file back to VSIX to execute it. VSIXInstaller.exe is part of Visual Studio 2017, and the VSIX extension should already be associated with VSIXInstaller.exe. For Integrated Help, unzip the file to a temporary directory and then use the Help > Add and Remove Content menu to install the help. Use the Browse [...] button near the bottom of the dialog to select the extracted helpcontentsetup.msha file, then click Update.

PLEASE NOTE that the VS Integration package is self-contained and does not require the .NET Data Provider for Teradata to be installed. However, a runtime dependency conflict may arise if your project depends on both VS Integration and the .NET Data Provider for Teradata. To avoid such a conflict, the .NET Data Provider for Teradata must be the same or a greater version than the VS Integration package.

For older versions of Visual Studio, the main Windows installation package (version 16.10 and older) can optionally install the integrated help and Visual Studio integration features.

The Teradata Developer Tools for Visual Studio is available from the Visual Studio Marketplace for Visual Studio 2015-2019 and Visual Studio 2022. The release contains a query tool that enables queries to be composed and executed against a Teradata Database. Queries are composed using a custom editor window with IntelliSense capabilities. Separate windows are used to display results and history.

The .NET Data Provider for Teradata is also available as a NuGet package at https://www.nuget.org/packages/Teradata.Client.Provider/. The Entity Framework Core Provider is available as a NuGet package at https://www.nuget.org/packages/Teradata.EntityFrameworkCore/.

For community support, please visit the Connectivity forum.

Teradata Package for Generative AI

Version: 20.00.00.03 - Created: 17 Mar 2025

Overview

Note: Teradata recommends pip install from https://pypi.org/project/teradatagenai. Download from downloads.teradata.com if your organization does not allow you to install directly from https://pypi.org/project/teradatagenai.

teradatagenai is a Generative AI package developed by Teradata. It offers a comprehensive suite of APIs designed for a wide range of text analytics applications and seamless access to the Enterprise Vector Store. With teradatagenai, users can process and analyze text data from various sources, including emails, academic papers, social media posts, and product reviews. This enables users to gain insights with precision and depth that rival or surpass human analysis.

General product information is available on the Teradata Documentation website.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2025, Teradata. All Rights Reserved.

TdBench 8.01 For Any DBMS

Version: tdbench-8.01.04 - Created: 14 Oct 2024

TdBench is a tool designed to simulate realistic database workloads for applications and platforms. It can be used with any DBMS supporting JDBC to:
- Measure performance before vs. after a change to add indexes, partitioning, compression, etc.
- Measure the impact on your DBMS of changes to settings, a patch, or a new software release
- Simulate a workload for a new application or a proof of concept
- Compare the performance of one platform to another
- Compare the performance of different database vendors' products

Getting software and help for TdBench:
- You can download the latest package of the TdBench jar file and setup information, with scripts for Teradata and non-Teradata DBMSs, from this page and unzip it in a directory on a server or PC with connectivity to your DBMS. Teradata's JDBC driver is included. Search the web for other vendors' JDBC drivers and save them on your server or PC.
- Additional DBMS setup scripts and information may be found at https://github.com/Teradata/tdbench. You can submit issues, ask questions, or contribute DBMS setup information at https://github.com/Teradata/tdbench/discussions.
- Manuals, white papers, and videos are referenced at the bottom of this page.

What does TdBench do?
TdBench simulates realistic production systems by letting you define the different types of work and adjust the number of concurrent executions for each type of work. It captures the results of each query execution in its internal database. It facilitates analysis of host DBMS resource consumption by maintaining test metadata on the host DBMS to join with its query logs.

Tests are defined with:
- queues of SQL queries and scripts or OS commands
- a variable number of execution threads (workers) per queue
- commands to pace queries by time or percentage
- parameterized queries to simulate different users
- optional query prepare to reduce DBMS parsing
- scheduled start of processes or individual queries
- fixed-work or fixed-period execution models
- a scripting language to automate multiple tests

Tests can be defined with as few as 4 statements. The analysis capabilities have been used to track individual query performance over hundreds of runs during projects, with constraints like:

WHERE RunID in (79, 81, 105)

Example: basic test of all queries in 1 worker session:

define serial Test of queries executed serially
queue thequeries scripts/queries/*.sql
worker thequeries mydbms
run

Example: fixed-period test of 10 minutes with 2 queues and a total of 5 worker sessions:

define workload5 Test of 1 heavy and 4 reporting worker sessions
queue hvy scripts/queries/hvy*.sql
worker hvy mydbms 1
queue rpt scripts/queries/rpt*.sql
worker rpt mydbms 4
run 10m

There are nearly 60 commands for defining and scripting multiple tests.
You could use:
- the PACE command with an interval reference to control the arrival of queries on a queue, or
- PACE with a percentage to limit the percentage of total queries executed from one queue, or
- the AT command to schedule events, or
- QUERY LIST to replay queries starting as they executed in production.

There are built-in variables, user variables, and IF and GOTO statements. There are 69 built-in help files and a TdBench 8.01 User Guide to help you get started.

TdBench Documentation:
- TdBench 8.01 User Guide
- TdBench 8.01 Tri-Fold Command Reference

White Papers:
- Essential Guide to Benchmarks for DBAs
- 1-Page Essential Guide to Benchmarks for Executives
- Benchmark Deception and How to Avoid Benchmark Tricks

Videos:
- TdBench Overview: why it was created and what it does (0:10:09)
- TdBench Command Language: demonstration of use (0:14:19)
- Design of a Good Benchmark: training session on constructing a benchmark that models realistic database workloads (0:41:33)