extensibility

Content related to UDFs etc.
Java External Stored Procedure and UDF runtime library

Version: 1.0 - Created: 07 Jul 2008

This library is used when developing Java functions for execution in the Teradata database using an IDE like eclipse. It provides the interface descriptions for the database methods available to the functions. 

Teradata UDFs for popular Oracle functions

Version: - Created: 22 Jun 2008

A collection of UDFs that emulate the most-commonly used Oracle built-in SQL functions. README This zip file contains the C source and the installation and test scripts for the following User-Defined Functions: grepc (not an Oracle function) capitalize (not an Oracle function) add_months ascii ceil chr decode floor greatest grepc initcap instr last_day least length lpad ltrim months_between nvl replace round rpad rtrim sign to_char to_date to_number translate trunc While I have tested these as thoroughly as I can imagine how to do so, there is no test like the user's code. Since this is new code, you might expect to see problems which I have not anticipated. Test all these as well as you can before putting them into a production system. INSTALLATION: Unzip this file with "Restore Directory Structure" turned on. The source, compile script and test script for each function is contained in the directory for that function, in a file or files called (something).btq. Edit this file(s) and change $PGMPATH$ to the full path name of the directory where this code resides; then you can run it with bteq. I recommend you .SET WIDTH to 254. I also recommend you review the output carefully for any errors. It is probably best to install one function at a time in this manner. If you have perl installed on your PC, you can use ChangeAllPaths.bat to change all the $PGMPATH$ values in the bteq scripts. Otherwise, you can put all this on a UNIX system and change the .bat script to a shell script. Having done that, you can run CompileAll.btq (from bteq) to compile and test all the functions. These scripts are all set up to install the UDFs from a PC or other client to the Teradta server. If you are transferring the source files to the server and installing from there, be sure to change all the EXTERNAL NAME clauses of REPLACE FUNCTIONs from "CS" and "CI" to "SS" and "SI," respectively. Note that many of these UDFs use C ASCII string functions. If the user ID that installs the functions has a default character set of UNICODE, then the system will assume that the UDFs can process Unicode strings, and the UDFs will always see only the first character of each parameter. (The second byte will be 0x00, which looks like end-of-string in ASCII.) If Kanji support was installed on the Teradata system, then all users will have the UNICODE default character set. To be sure that the UDFs work as expected, make sure the installer ID is using LATIN: MODFIY USER <installer-ID> AS DEFAULT CHARACTER SET LATIN; The user <installer-ID> may be able to perform this operation itself, but it then needs to logon again before installing the UDFs. After they are installed, you can change the user back to UNICODE if you want: MODFIY USER <installer-ID> AS DEFAULT CHARACTER SET UNICODE; From then on, the UDFs will operate in Latin or ASCII mode no matter what the user's character set is. Do not alter the UDFs to non-protect mode until you have run the test scripts and some other samples that you make up yourself. You can use Alter.btq to change all the functions except decode to not-protected. (You'll have to make your own script for decode.) Once you are sure they are working, You may want to re-install the UDFs into the special SYSLIB database so they can be accessed by all users without having to qualify the function name by databasename. Please contact me with any questions, problems, or (especially) fixes. George Coleman +1 503 738 9050 george.coleman@teradata.com UPDATE June 11, 2008: -------------------- * Replace function rewritten by Nicholas Muller. This version throws an error if the result string will be longer than MAXIMUM_LENGTH. MAXIMUM_LENGTH is set to 16000 in the distribution, but you may want to make it shorter. You should not make it longer than 32000. It should match the length in the RETURNS clause of the REPLACE FUNCTION statement in replace.btq. If you use oreplace() to return a column, and there are other columns in the SELECT, and the RETURNS length is a large number like 32000, then you will get a "row size exceeded" error from Teradata. In this case, you must either reduce the maximum length or cast the result to something shorter. UPDATE May 14, 2008: ------------------- * Changed the way many functions handle null input. Most will now return NULL instead of throwing a SQL error. This should be closer to Oracle's behavior. * Added create_test_table.bteq and test.data. Some of the test scripts use the test table ("mytable"); these files will provide that. * Changed all EXTERNAL NAME clauses from Sx to Cx. * Updated copyright dates. Miscellaneous Functions Notes --------------- ---------------------------------------------------------------- grepc Usage: grepc(string1, string2) Returns: integer count of occurrences of string2 in string1. Satisfies RFC 52803. capitalize Capitalize(string) returns a string with the first word capitalized. This module is in the initcap directory. Oracle Functions Notes --------------- ---------------------------------------------------------------- add_months This works like the Oracle add_months function, such that if the starting date is at end-of-month, or the resulting date exceeds the end-of-month, then the resulting date is at end-of-month. Teradata already has an add_months(), so install this as oadd_months() or another likely name. ascii This rejects null input, but that is probably a bug. ceil Defined for FLOAT and DEC(18,5). If you need more than 5 decimal places, you must change this function. chr This rejects null input, but that is probably a bug. decode This one presented some real challenges. The Oracle decode() accepts up to 256 arguments. The only restrictions on data types are that the 2nd, 4th, 6th, ... arguments must be the same data type as the first, and that all the others must be the same data type, which need not be the same as the first. If we had UDFs that accepted all combinations of seven different data types with from 3 to 256 parameters, we would need ... well, a LOT of modules (10,668 I think). So I decided to let you decide how many parameters your users are going to want to use with decode, and I wrote a perl script (decode.pl) that generates combinations of CHAR, DEC(9) and INT, up to as many parameters as you specify. It will also generate a BTEQ script (decode.btq) to install them. To use it, move decode.pl to a machine that has Perl on it (e.g. MP-RAS) or install Perl on your PC (it's free). Then do: perl decode.pl N, where N is the number of parameters you think you'll need. 8 might be a good number. If someone tries to decode() with more arguments than that, or if they try to use a different data type, they'll get a "Function does not exist" error message. If you want to add more data types to this, feel free. But please forward your code to me so we can share it with everyone else. floor Defined for FLOAT and DEC(18,5). If you need more than 5 decimal places, you must change this function. greatest Defined for all pair combinations of CHAR, DEC(9) and INT. This Oracle function can actually take more than one argument, but I have not yet seen anyone using more than two. If we start having to support more data types and more arguments, we'll end up with the same problem we have with decode(). It might be easier to convince your user to use CASE operations instead. In fact, most uses of greatest() seem to be in combination with decode or sign, and these expressions can be done much more cheaply with CASE. Some users may be surprised to learn that Oracle now supports CASE too. initcap The initial version of this had some bugs, which should be fixed now. instr This rejects null input, but that is probably a bug. last_day (of the month.) This is defined for date, timestamp and timstamp-with-zone. least See the notes for greatest(). length Same as Teradata's char_length. Uses strlen(). lpad Offers functionality not available in Teradata SQL. ltrim Offers functionality not available in Teradata SQL. months_between Unlike "cast((date1-date2 month) as integer)", this returns a FLOAT value. Also, if the days in each date are the same or are both end-of-month, the result is a whole number. nvl Like COALESCE(). Defined for CHAR, INT, DEC(9) and FLOAT. replace Offers functionality not available in Teradata SQL. Install this as oreplace(), because we already have a REPLACE. The MAXIMUM_LENGTH defined in the .c files should match the RETURNS length in the replace.btq file. round Numbers: Defined for INTEGER, FLOAT and DEC(18,5). If you need more than 5 decimal places, you must change this function. Dates: Defined for DATE, TIMESTAMP and TIMESTAMP WITH TIME ZONE. rpad Offers functionality not available in Teradata SQL. rtrim Offers functionality not available in Teradata SQL. sign Defined for DEC(9), FLOAT and INTEGER. Most uses of this would be better rewritten as CASE operations. to_char Numbers: Defined for INTEGER, FLOAT and DEC(18,5). If you need more than 5 decimal places, you must change this function. There may be bugs in this, because I had a hard time understanding Oracle's format rules. Let me know if anyone complains, but get a good explanation of how this should work before you call me. Another note: the NLS_PARAM argument has not been implemented. Dates: Defined for DATE, TIMESTAMP and TIMESTAMP WITH TIME ZONE. In Oracle, you can Spell (using SP and optionally TH) any number that occurs in the date or timestamp, but I have only implemented this for Day Numbers (D, DD and DDD). If you want to add more spelling, feel free. I just got really tired of doing all that checking in the logic for SP and TH. to_date Defined for DATE, TIMESTAMP and TIMESTAMP WITH TIME ZONE. to_number This returns FLOAT, so when you use it you might want to recast the result as something more palatible, like INT or DEC(9,2). translate Offers functionality not available in Teradata SQL. Install this as otranslate(), because we already have a TRANSLATE. trunc Defined for FLOAT and DEC(18,5). If you need more than 5 decimal places, you must change this function. Directory / Function Program(s) Comments --------------- ----------------------- ---------------- add_months add_months_d.c Need to install this as "oadd_months" add_months_t.c or something similar, since we already add_months_tz.c have an "add_months" function. ascii ascii.c ceil ceilDec8.c Decimal ceilFloat.c Float ceilInt.c Integer chr chr.c decode decode.pl Use decode.pl perl script to generate decode.test C source modules and Bteq scripts. floor floorDec8.c Decimal floorFloat.c Float floorInt.c Integer greatest greatestcc.c character - character greatestcd.c character - decimal greatestci.c character - integer greatestdc.c decimal - character greatestdd.c decimal - decimal greatestdi.c decimal - integer greatestic.c integer - character greatestid.c integer - decimal greatestii.c integer - integer grepc grepc.c RFC 52803 (not an Oracle function) initcap capitalize.c (var. of initcap; not an Oracle fn.) initcap.c instr instr_1.c instr_2d.c decimal arg. instr_2i.c integer arg. instr_3dd.c decimal, decimal instr_3di.c decimal, integer instr_3id.c integer, decimal instr_3ii.c integer, integer last_day last_day_d.c date arg. last_day_t.c timestamp arg. last_day_tz.c timestamp-with-zone arg. least leastcc.c character - character leastcd.c character - decimal leastci.c character - integer leastdc.c decimal - character leastdd.c decimal - decimal leastdi.c decimal - integer leastic.c integer - character leastid.c integer - decimal leastii.c integer - integer length length.c lpad lpad_1.c integer length lpad_1d.c decimal length lpad_2.c integer length lpad_2d.c decimal length ltrim ltrim1.c ltrim2.c months_between months_between_dd.c date - date months_between_dt.c date - timestamp months_between_td.c timestamp - date months_between_tt.c timestamp - timestamp nvl nvlc.c char nvld.c decimal nvlf.c float nvli.c integer replace oreplace1.c "oreplace()" because Teradata oreplace2.c already has a "replace" round rounddate.h rounddate1.c date rounddate2.c RoundDec8.c decimal RoundDec81.c RoundFloat.c float RoundFloat1.c RoundInt.c integer RoundInt1.c roundts1.c timestamp roundts2.c roundtz1.c timestamp-with-zone roundtz2.c rpad rpad_1.c integer length rpad_1d.c decimal length rpad_2.c integer length rpad_2d.c decimal length rtrim rtrim1.c rtrim2.c sign signd.c decimal signf.c float signi.c integer to_char to_char_d1.c decimal to_char_d2.c to_char_date.h (header) to_char_dt1.c date to_char_dt2.c to_char_f1.c float to_char_f2.c to_char_i1.c integer to_char_i2.c to_char_num.h (header) to_char_ts1.c timestamp to_char_ts2.c to_char_tz1.c timestamp-with-zone to_char_tz2.c to_date to_date.h (header) to_date1.c 1 argument to_date2.c 2 arguments to_number to_number.h (header) to_number_1.c 1 argument to_number_2.c 2 arguments to_number_3.c 3 arguments translate otranslate.c "otranslate()" because Teradata already has a "translate" trunc truncdate.h (header) truncdate1.c date truncdate2.c truncDec8.c decimal truncDec81.c truncFloat.c float truncFloat1.c truncInt.c integer truncInt1.c truncts1.c timestamp truncts2.c trunctz1.c timestamp-with-zone trunctz2.c

Month-Year Difference Determination UDF

Version: - Created: 15 Jun 2008

These UDFs implement difference determination in months/years using smaller units of granularity, days/months, to accurately calculate the difference. According to the ANSI SQL standard, units smaller than the unit of the result are to be ignored in an SQL calculation, but this can lead to results not consistent with business practices (Refer to NTA 1553). This distribution contains full source code, REPLACE FUNCTION SQL, a test suite and documentation.

SHA-1 Message Digest UDF

Version: - Created: 12 Jun 2008

This UDF implements the Secure Hash Algorithm, SHA-1, for computing a condensed representation of a message. When a message of any length < 264 bits is input, the SHA-1 produces a 160-bit output called a message digest (FIPS 180-1 / RFC 3174). This distribution contains full source code, a test suite, scripts, and documentation. README SHA-1 UDF package for Teradata Version 1.1 (2008-05-08) Copyright (c) 2008 Teradata Corporation. All Rights Reserved. This package is provided "as is". No warranty is provided. 1. Summary This package includes a UDF which generates SHA-1 message digest. For SHA-1 itself, refer RFC1321[1]. 2. Syntax hash_sha1(varchar) This function takes one char/varchar argument of arbitary length and always returns 160-bit hash value as hexadecimal char(40) format. It does not accept NULL value. 3. Installation 1 unpack the archive and go to src directory 2 start bteq and login to teradata 3 .run file = hash_sha1.btq 4 if you would like to validate hash value generated by the UDF, run sha1test_cre.sql, sha1test_ins.sql and sha1test_chck.sql in the test directory. 4. History Removed malloc() function which may cause memory problem in UDF. Fixed compilation issues on Windows platform. 5. Caveat You should not use these function to encrypt password without 'salt'. 6. Author KURODA Akira <akira.kuroda@teradata.com> Teradata Japan A. Rererences [1] FIPS PUB 180-1, SECURE HASH STANDARD http://www.itl.nist.gov/fipspubs/fip180-1.htm B. Example hash_sha1('abc') -> A9993E364706816ABA3E25717850C26C9CD0D89D hash_sha1('abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq') -> 84983E441C3BD26EBAAE4AA1F95129E5E54670F1

MD5 Message Digest UDF

Version: - Created: 12 Jun 2008

This UDF implements the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input (RFC1321). This distribution contains full source code, a test suite, scripts, and documentation. README MD5 UDF package for Teradata Version 2.0 (2008-05-08) 1. Summary This package includes a UDF that generates MD5 message digest. For MD5 itself, refer RFC1321[1]. 2. Syntax hash_md5(varchar) This function takes one char/varchar argument of arbitary length and always returns 128-bit hash value as hexadecimal char(32) format. It does not accept NULL value. The name of the UDF is changed from there is an incompatibility against previous release. (see 4. History) 3. Installation 1 unpack the archive and go to src directory 2 start bteq and login to teradata 3 .run file = hash_md5.btq 4 if you would like to validate hash value generated by the UDF, run md5test_cre.sql, md5test_ins.sql and md5test_chck.sql in the test directory You can edit md5.h to customize hash generation. UDF_MD5_UPPERCASE: if set to 1 (default), generate MD5 result in upper case UDF_MD5_COMPAT: if set to 1 (not default), generate same result as previous release (see 4. History) 4. History Fixed MD5 algorithm that generates wrong result when the input length is 54 + 64*N bytes (N>= 0). Removed malloc() function which may cause memory problem in UDF. Fixed compilation issues on Windows platform. 5. Caveat You should not use these function to encrypt password without 'salt'. 6. Author KURODA Akira <akira.kuroda@teradata.com> Teradata Japan A. Rererences [1] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992. B. Example hash_md5('') -> d41d8cd98f00b204e9800998ecf8427e hash_md5('a') -> 0cc175b9c0f1b6a831c399e269772661 hash_md5('abc') -> 900150983cd24fb0d6963f7d28e17f72 hash_md5('message digest') -> f96b697d7cb7938d525a2f31aaf161d0 hash_md5('message digest') -> f96b697d7cb7938d525a2f31aaf161d0 hash_md5('abcdefghijklmnopqrstuvwxyz') -> c3fcd3d76192e4007dfb496cca67e13b hash_md5('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789') -> d174ab98d277d9f5a5611c2c9f419d9f hash_md5('123456789012345678901234567890123456789012345678901234567890123456 78901234567890') -> 57edf4a22be3c955ac49da2e2107b67a