Attachments:
  • LOBLoad.zip (11.03 KB)
  • Examples.zip (48.04 KB)

In part 1, we will look at getting those Large Objects of yours into Teradata. One way or another we will get them in, kicking and screaming if need be (although it will be relatively painless, I promise), and we will do this despite any "Objections" "Large" may have.

Later in part 2, we will drag those Large Objects back out and use them in a web application.

What is a Large Object?


As the name implies, Large Objects allow you to store large things in a database. They come in two flavours (sadly, neither Chocolate nor Vanilla):

  • Character Large Object or CLOB – used to store character data (e.g. text documents including plain text, XML, HTML, Rich Text Format etc) 
  • Binary Large Object or BLOB – used to store binary data such as images, sounds and structured binary documents (e.g. pdf, Word, Excel etc).

These sound very similar to existing data types (char/varchar and byte/varbyte), so why do we need them? Well, there are a couple of differences between Large Objects and the existing data types.

Large Objects and the “equivalent” data types

The first and most obvious difference is that the maximum size of Large Objects is significantly larger than that of their “equivalent” data types.
The maximum size of char, varchar, byte and varbyte (on Teradata) is around 64KB. For character fields, this translates to around 64,000 characters for single-byte character encoding schemes (e.g. ASCII) or around 30,000 characters for multi-byte character encoding schemes (e.g. Unicode). Large Objects, by contrast, can be up to 2 gigabytes (GB) in size.

Storing Large Objects

A second difference is how Large Objects are stored within the database.

Astute readers will note that the “maximum” row size (i.e. the maximum combined size of all columns in a single table row) is around 64KB; the actual value also depends on the database version. This presents us with a bit of a conundrum. If the maximum row size is 64KB, then how can we store (multiple) objects of up to 2GB in it?

The database internals folks tell us that Large Objects are not actually stored in the “row”. Instead they are stored in a sub-table.

[Figure: Table and sub-table containing extended data]

In simple terms, a sub-table is a set of data that augments one or more tables in some way. Although we might not think of them as such, the most common examples of sub-tables are indexes other than the primary index – secondary indexes and join indexes, for instance.

Management of the sub-table is handled automatically by Teradata. In most, if not all, circumstances, this “separation of data” within the row is completely transparent and doesn’t affect anything that you would be doing.

While splitting the row up into “basic attributes” (e.g. name, id, etc) and Large Objects might sound like a bad idea, it is actually a good one. Most queries are based upon the basic data stored in the numeric and character (i.e. varchar) attributes and not on the content of the Large Objects. If you are performing an “all rows scan”, for example, splitting the rows like this will make the query run faster when the table contains Large Objects.

I do not wish to get into an internals discussion (that may be the subject of a future article), but consider this query:
 

select count(*)
from customer
where gender = 'F';


When running the query, Teradata will need to scan the table to satisfy the "gender = 'F'" condition. This means reading each row (or block of data) from disk into memory, checking the gender column for the value 'F' and counting the row if the condition is satisfied.

Getting down to tin tacks, the performance of the query is directly related to the size of the rows being read from disk. It is not possible for any database to read just the “gender” column – the entire row must be read and the gender column extracted from it. In fact, databases don’t even read rows – they read blocks, which can contain multiple rows.

If the rows are large, the query will take a little longer than if they were smaller. Why? Because more rows fit into a block of data when they are smaller, and hence more rows are read per block. Hint: there is a performance tip hidden here for those of you building online “tactical” applications that need to run all rows scan queries (e.g. a search by name or partial name).
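As a rough, hypothetical illustration (the block and row sizes below are invented for the arithmetic, not actual Teradata values): if a data block holds 128KB, then 500-byte rows pack roughly 260 rows per block while 8KB rows pack only about 16, so an all rows scan over one million rows costs roughly 3,800 block reads in the first case versus about 62,500 in the second.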

If the above customer table had several large objects (e.g. a photo, digital signature etc) and these were stored in the row (as opposed to the sub-table), this would make the row larger, and queries such as the one above would take longer to run because more I/O is required. By splitting this “unnecessary” data out of the primary table and placing it in the sub-table, performance can be improved.

Having said all that, let me add that common sense must prevail. I am absolutely not advocating that you go out and remove all the less frequently used attributes from your entities, encode them into a BLOB or CLOB and store them in a sub-table on the basis that you are unlikely to filter your results on those less commonly used values. All you will achieve is to make life difficult – but if masochism is one of your guiding principles …

Loading Large Objects

Resistance is futile; whatever “Large’s” “Objection” is, he (or she) will be loaded into the database whether he/she likes it or not, kicking and screaming if need be (although it will be relatively painless, I promise).

So what are our options?

Options

There are three main options to get large objects into the database. These are:

  • Use a Teradata utility such as bteq or SQL Assistant (a.k.a. Queryman),
  • Use a third party utility that has good Large Object support,
  • Write your own program using CLI, JDBC or ODBC.

I will briefly cover the Teradata Utility option and present a Java (i.e. JDBC) application to load data. The code for the application is attached.

The table

In all of the examples, the following table will be used:

create table devX_lob (
    id varchar (80),
    mime_type varchar (80),
    char_lob character large object,
    binary_lob binary large object
)
unique primary index (id);

 

Both types of Large Object are present in the table, but the examples predominantly use the Binary Large Object. This is because Binary Large Objects are typically cooler than Character Large Objects and that’s what this article is all about – being cool and impressing your friends (or not)!

The Primary Key for this table is the id, which is a varchar. The reason for this will become apparent in part 2 of this article, but for now let’s just say it contains the name of the file representing the large object. Finally, there is the mime_type; this also becomes important in part 2, where we output the Large Objects in a web application.

Load LOBs with a Teradata Utility


bteq

Until recently, bteq did not really support Large Objects. However, more recent versions of bteq (12.00.00.02 or later) do provide support for them.


The following bteq script shows how a control file (lobjobbteq.txt) can be used to drive the loading of large objects.

 

delete from devX_lob all;
.import vartext ',' LOBCOLS=1 file='/temp/lob/lobjobbteq.txt';

.repeat *
using (lobFile blob as deferred, id varchar(40), lobType varchar(1), mimeType varchar(40))
insert into devX_lob (id, mime_type, binary_lob)
values (:id, :mimeType, :lobFile);

 

The content of the control file is this:

/temp/lob/thumbs down.gif,thumbsdown.gif,B,image/gif
/temp/lob/thumbs up.gif,thumbsup.gif,B,image/gif
/temp/lob/crying.gif,crying.gif,B,image/gif
/temp/lob/small.pdf,small.pdf,B,application/pdf


The Examples.zip attachment contains both of the above files together with the images and pdf.

Once the bteq script has been run, there are four records in the table as can be seen in the result set window.

Since this example does not use CLOBs, the char_lob column is always null.

The LOB data is identified by the file names, which must appear in the first column(s) of the control file (as indicated by the LOBCOLS setting).

 

TPT

Support for large objects can also be found in TPT (Teradata Parallel Transporter); refer to the TPT documentation for details.

 

SQL Assistant (QueryMan)


The tool formerly known as QueryMan has featured basic Large Object support for some time. You can import Large Objects using SQL Assistant by setting up an import job. An import job consists of the same elements as the bteq import, but is a little simpler.

First you need a parameterised query:

insert into devx_lob (id, mime_type, binary_lob)
values (?, ?, ?B);


The question mark parameters in the “values” clause correspond to the columns in your control file. The “?B” parameter tells SQL Assistant that the third column is a BLOB and that the control file lists the file name of the BLOB.

The control file looks like this:

thumbsdown.gif,image/gif,thumbs down.gif
thumbsup.gif,image/gif,thumbs up.gif
crying.gif,image/gif,crying.gif
small.pdf,application/pdf,small.pdf


Finally you set SQL Assistant into “Import mode” and run the query. When you do, SQL Assistant will prompt you for the location of the control file. Notice that the file names (the 3rd column) do not have paths. SQL Assistant assumes that the files representing the LOB data are in the same directory as the control file. If you attempt to use path names to locate files in other directories SQL Assistant will abort the import and display an error in the status bar.

The Examples.zip attachment contains the control script together with the images and pdf. You will need to copy and paste the query from the article.

When I tried this, SQL Assistant successfully imported the 3 graphics files, but for an unknown reason it could not import the PDF. The error reported was “A data file specified on record 4 is missing or too large”. The PDF was only 51.5KB, which is not too large, and it definitely existed in the same directory as the 3 GIF files. The other two methods presented here both successfully imported all 4 files. My suspicion is that this is an issue in my environment, not an issue with SQL Assistant, ODBC or the import function.

Update: Refer to the comments below for the solution to the above problem.


Write your own


At first thought, writing your own utility to import Large Objects may seem like overkill. However, if you are on a bteq release earlier than 12.00.00.02 and encounter a problem similar to the one I hit with SQL Assistant, then you may have to resort to this approach.

Fortunately, writing a program to import Large Objects into the database isn’t that hard. And once you have the basic program, you can modify it to work any way you want it to.

Your choices are to write the program using Java (and JDBC) or one of the many languages that use ODBC based database connectivity (e.g. C, C++, Visual Basic etcetera). In this article we will use Java and JDBC.


A JDBC Lob Loader


In the attachment (LOBLoad.zip), you will find a complete NetBeans project that contains the source code for LobLoad. The program is relatively simple, so simple that the entire project consists of just one class (LobLoad.java).


LobLoad is a Java program that, curiously enough, reads a control file and loads Large Objects into the database using JDBC. The format of the control file is:

  • The id of the Large Object (id column)
  • The type of Large Object (C or B, meaning character or binary). This determines which column in the above table is being loaded.
  • The MIME type, which is used in part 2
  • The name of the file containing the Large Object.


The sample control file looks like this:

thumbsdown.gif,B,image/gif,/temp/lob/thumbs down.gif
thumbsup.gif,B,image/gif,/temp/lob/thumbs up.gif
crying.gif,B,image/gif,/temp/lob/crying.gif
lobjob.btq,C,text/plain,/temp/lob/lobjob.btq
small.pdf,B,application/pdf,/temp/lob/small.pdf


Most of the program relates to establishing the database session and processing the control file.
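For completeness, the session set-up boils down to loading the Teradata JDBC driver and asking the DriverManager for a connection. The following is a minimal sketch only; the host name, user and password are placeholders, and the attached LobLoad.java obtains these values in its own way.

import java.sql.Connection;
import java.sql.DriverManager;
...
// Load the Teradata JDBC driver and connect (host, user and password are placeholders).
Class.forName("com.teradata.jdbc.TeraDriver");
Connection connection = DriverManager.getConnection(
        "jdbc:teradata://dbshost/CHARSET=UTF8", "myuser", "mypassword");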


The part relating to the query that actually loads the Large Objects is as follows:
 

...
sqlText = "insert into devX_lob (id, mime_type, char_lob, binary_lob)\n" +
          "values (?, ?, ?, ?);";
PreparedStatement lobStmt = connection.prepareStatement(sqlText);
...

The above establishes the query we will use to load the Large Objects and prepares it for use. Preparing a query means sending the query text (without data values) to Teradata for parsing. The data values are supplied later, which is what the rest of the program does.


The control file is read within a loop that is executed once for each input line. Each time through the loop, the current line is parsed (split into its component parts). For the sake of brevity I’ve omitted the loop and the sections that extract and set up the other values from the control file.
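As an illustration only (the variable names match the snippets below, java.io imports are assumed, and the attached LobLoad.java may differ in detail), the omitted loop might look something like this:

// Illustrative sketch only - the actual loop in LobLoad.java may differ.
BufferedReader control = new BufferedReader(new FileReader(controlFileName));
String line;
while ((line = control.readLine()) != null) {
    String[] parts = line.split(",");   // id, type (C/B), MIME type, file name
    String id          = parts[0];
    boolean isBlob     = parts[1].equalsIgnoreCase("B");
    String mimeType    = parts[2];
    String lobFileName = parts[3];
    // ... open the file and set the statement parameters as shown below ...
}
control.close();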


The following shows how the file representing the large object is opened. The resulting InputStream is then passed to our PreparedStatement for transmission to Teradata.

File lobFile = new File (lobFileName);
FileInputStream lobInputStream = new FileInputStream (lobFile);
long lobSize = lobFile.length();

lobStmt.setString(1, id);
lobStmt.setString(2, mimeType);
if (isBlob) {
    lobStmt.setNull (3, Types.CLOB);
    lobStmt.setBinaryStream(4, lobInputStream, (int) lobSize);
}
else {
    lobStmt.setAsciiStream(3, lobInputStream, (int) lobSize);
    lobStmt.setNull (4, Types.BLOB);
}
lobStmt.executeUpdate();
lobInputStream.close ();


In the above code we establish a FileInputStream, but this could be any of the other types of InputStream. For example, if you wish to store (persist) a Java object in your table, you could convert it to a byte array (byte[]) and use a ByteArrayInputStream to store it in a BLOB column, as sketched below.
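As a hedged sketch of that idea (assuming a hypothetical myObject that implements java.io.Serializable, and reusing the lobStmt prepared statement from above):

// Serialize a Java object to a byte array and supply it as a BLOB parameter.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
ObjectOutputStream objectOut = new ObjectOutputStream(buffer);
objectOut.writeObject(myObject);
objectOut.close();

byte[] bytes = buffer.toByteArray();
lobStmt.setBinaryStream(4, new ByteArrayInputStream(bytes), bytes.length);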


After setting the id and mime type, we decide if we are working with a Binary Large Object or a Character Large Object. If we are working with a BLOB then the 3rd parameter (the CLOB) is set to null and the InputStream is supplied to parameter 4 (the BLOB). Otherwise we have a CLOB and these assignments are reversed (the CLOB gets the InputStream and the BLOB is null).


Finally we execute the statement which causes the row to be created and our file to be uploaded into either the BLOB or CLOB column. With a bit of luck our Large Objects will appear in the database. 

A note for Java 6.0 users


The astute Java 6.0 reader may wonder why I went to all the bother to use the particular setXxxStream method that I did.


If you refer to the java.sql.PreparedStatement javadocs for release 6.0, you will find that there are two additional setXxxStream methods which would have been more convenient to use. Indeed, there are more than just two, but the point is the same. In the case of setBinaryStream, these additional methods are:

void setBinaryStream(int parameterIndex, InputStream x, long length);
void setBinaryStream(int parameterIndex, InputStream x);


Equivalent signatures exist for setAsciiStream. Both of the above methods would be more convenient to use:

  • If I used the first method, the cast of the file length to int would be unnecessary (saving a whopping 6 characters per call, not to mention the corresponding productivity boost! ).
  • If I used the second method all the code relating to figuring out how big the file is (and the cast to int) would not be required. This includes the construction of the File object whose only purpose in life is to obtain the length of the file (FileInputStream has a constructor that takes the file name as a String and thus does not need a File).


So why did I use such a “convoluted” approach? The answer is twofold:

 

  • Again in reference to the javadoc, you will note that the methods listed above are new in 1.6 (a.k.a. 6.0) and thus users of earlier JDKs would not have access to these methods. This means the attached code wouldn’t work for them.
  • The JDBC driver is compiled to be compliant with the JDBC 3.0 specification (which is part of Java 5.0 and 1.4). Thus the driver does not include these methods.

So if you are using JDK 6.0 (like me) you could try to call these methods. If you do, your program will compile, but you will get an “AbstractMethodError” thrown because the methods simply do not exist in the JDBC driver. This seems to apply up to and including version 12.00.00.106 of the driver. 
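To make that concrete, here is the kind of call I am talking about (using the same lobStmt and lobInputStream as the earlier example); it compiles under JDK 6.0 but fails at run time against a JDBC 3.0 driver:

// Compiles against the Java 6.0 PreparedStatement interface, but throws
// java.lang.AbstractMethodError at run time if the driver only implements JDBC 3.0.
lobStmt.setBinaryStream(4, lobInputStream);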

Conclusion


Hopefully this article gives you some ideas as to how you might go about loading Large Objects into Teradata.
In part 2 of this article, we will look at retrieving our Large Objects and using them in a web application. Stay tuned!
 

Discussion
aw185056 1 comment Joined 06/09
05 Jun 2009

Waiting anxiously to retrieve the stored objects...

MikeDempsey 94 comments Joined 10/06
05 Jun 2009

The problem you got when loading the PDF file using SQL Assistant was probably due to the fact that SQL Assistant has an option to specify the maximum size of a LOB to be imported.
My guess is that you had this value set smaller than 51K, which would result in the message that the LOB was too large.

This option is used to limit the amount of memory it consumes but it must always be set greater than the largest file you are planning to import.

Note that SQL Assistant 12 is limited to a maximum LOB size of 10MB;
version 13 has no such limit.
(BTEQ was limited to 64K - not sure about 12.0 or 13.0)

SuzanneA 1 comment Joined 05/09
09 Jun 2009

Please add a "bteq" tag to the article. Thanks. Also, BTEQ supports deferred mode transfer of LOB files starting with the 12.00.00.02 version. It is its support for inline mode transfer that is limited to 64K.

dbc1012 31 comments Joined 03/09
09 Jun 2009

Thanks Mike for the tip relating to the option in SQL Assistant that specifies the maximum LOB size.
We had a bit of discussion on the topic and discovered that in addition to setting this parameter there is a setting in the ODBC Data Source definition that must also be checked.

1) Select your Data source and choose configure.
2) Click the "Options >>>" button
3) Ensure that "Use Native Large Object support" is checked.

nosnow111 2 comments Joined 05/09
02 Nov 2009

attachment seems to be missing

dbc1012 31 comments Joined 03/09
02 Nov 2009

Hmmm, it was there, but now it is missing.
I've re-added it - all the best.

CarlosAL 6 comments Joined 04/08
09 Mar 2010

Windows Scripting Host & OLEDB/ADO is an alternative for uploading & downloading BLOBs/CLOBs from/to OS files as explained here:

http://carlosal.wordpress.com/2008/11/23/insertar-ficheros-de-imagen-en-teradata-usando-windows-scripting-host-y-ado/

http://carlosal.wordpress.com/2010/01/26/guardar-datos-de-columnas-blob-en-teradata-como-ficheros-usando-windows-scripting-host-y-oledbado/

HTH.

Cheers.

Carlos.

jaknap 1 comment Joined 06/10
10 Jun 2010

Attachment Examples.zip seems to be missing, please re-add. Thx

dbc1012 31 comments Joined 03/09
15 Jun 2010

I've re-added it. Hopefully it sticks around this time :-)

sutapa81084 3 comments Joined 07/10
19 Jul 2010

Hi,

I am new to Teradata and am working on putting image files into Teradata using your blog. I have successfully been able to do it using SQL Assistant. I was trying to check if I can do the same using BTEQ. I have been struggling with it and need your help and suggestions....

Here is my table which i created:
create table ALL_WKSCRATCHPAD_DB.devX_lob (
id varchar (80),
mime_type varchar (80),
binary_lob binary large object
)unique primary index (id);

The bteq file which I am using looks like this:

.set width 132
.logon usrname,pwd

delete from ALL_WKSCRATCHPAD_DB.devX_lob all;

.import vartext ',' file='/temp/lob/lobjobbteq.txt';

.repeat *
using (lobFile blob as deferred, id varchar(40), lobType varchar(1), mimeType varchar(40))
insert into ALL_WKSCRATCHPAD_DB.devX_lob (id, mime_type, binary_lob)
values (:id, :mimeType, :lobFile);
.quit

====================================================

my lobjobbteq.txt looks like this:

/temp/lob/thumbs down.gif,thumbsdown.gif,B,image/gif
/temp/lob/thumbs up.gif,thumbsup.gif,B,image/gif
/temp/lob/crying.gif,crying.gif,B,image/gif
/temp/lob/small.pdf,small.pdf,B,application/pdf

I am getting the following error when I am logging in:

import vartext ',' file='/temp/lob/lobjobbteq.txt';
BTEQ -- Enter your DBC/SQL request or BTEQ command:

.repeat *
BTEQ -- Enter your DBC/SQL request or BTEQ command:
using (lobFile blob as deferred, id varchar(40), lobType varchar(1), mimeType varchar(40))
insert into ALL_WKSCRATCHPAD_DB.devX_lob (id, mime_type, binary_lob)
values (:id, :mimeType, :lobFile);
*** Starting Row 0 at Mon Jul 19 11:57:41 2010

*** Failure 2673 The source parcel length does not match data that was defined.
Statement# 1, Info =2
*** Total elapsed time was 1 second.

*** Failure 2673 The source parcel length does not match data that was defined.
Statement# 1, Info =2
*** Total elapsed time was 1 second.

*** Failure 2673 The source parcel length does not match data that was defined.
Statement# 1, Info =2
*** Total elapsed time was 1 second.

*** Failure 2673 The source parcel length does not match data that was defined.
Statement# 1, Info =2
*** Total elapsed time was 1 second.

*** Warning: Out of data.
*** Finished at input row 4 at Mon Jul 19 11:57:41 2010
*** Total number of statements: 4, Accepted : 0, Rejected : 4

*** Total elapsed time was 1 second.

*** Total requests sent to the DBC = 4
*** Successful requests per second = 4.000

BTEQ -- Enter your DBC/SQL request or BTEQ command:
.quit
*** You are now logged off from the DBC.
*** Exiting BTEQ...
*** RC (return code) = 8

If you could kindly take a look and suggest what exactly is going wrong, it would help me a lot! I have used your steps throughout for entering images through SQL Assistant, but I want to do it with BTEQ and it returns the above errors. Kindly help.

Thanks
SG

Sutapa Ghosh

dbc1012 31 comments Joined 03/09
19 Jul 2010

Basically you need a side order of "LOBCOLS" with that script.
The .import command needs to know how many columns are Large Objects. You can read more about this in the .import command section of the bteq manual available from teradata.com.

However, if you modify your import to read as follows, you should be OK:
.import vartext ',' lobcols=1 file='/temp/lob/lobjobbteq.txt';

sutapa81084 3 comments Joined 07/10
20 Jul 2010

Hi,

Thanks so much for your reply. After trying to run the bteq query, I realized that according to your blog BLOBs are supported by bteq 12.00.00.02 or later, and we have bteq 12.00.00.00.00. I will try the above suggestions as soon as we update ours.

Also, what I am trying to figure out is whether there is any way we can upload the pics from a web front end and then, in a batch process, upload the pics into the database. I downloaded upload beans, which can be used to upload pics and store them in a folder or zip, and then have a context file made in the folder which can be run in batch mode. I don't know if there is any better way of doing it; if you have knowledge to share in this regard, let me know.

Thanks

Sutapa Ghosh

dbc1012 31 comments Joined 03/09
20 Jul 2010

I'm not familiar with "upload beans", but its web site (http://www.javazoom.net/jzservlets/uploadbean/uploadbean.html) says you can upload the files into memory (and to files and other targets - including a database!). Given that it is in memory, you can probably access its data as an array of bytes.

If so, the example code above that is based upon a file input stream can be used. Rather than creating a file input stream, you can probably use a java.io.ByteArrayInputStream.

At the end of the day, you should look at the methods your package uses to make data available (ideally in memory - because then you won't have to muck around with temporary files) and the classes in java.io that allow you to get an InputStream. Your "upload beans" package may even provide a "getUploadedDataAsInputStream" type of method that returns an InputStream of some sort.

I've also done this using my own custom class (i.e. I wrote my upload bean class), but I won't have access to this code for a couple of weeks.
My class provides the data as "byte []" which I can then use to create a ByteArrayInputStream and therefore upload it to the database.

There are plenty of examples out there that will upload from a web page and provide access to the data either as an InputStream or as something that can be used to construct an InputStream-based object. In the unlikely event your "upload bean" doesn't provide this access from the memory object, you could either use one of the files that it creates, examine its "database" option a bit more, or use google to find an alternative that does give you access to its uploaded data as a byte array.

Good luck

sutapa81084 3 comments Joined 07/10
21 Jul 2010

Hey ,

Thanks for your reply and I do appreciate your help. I worked around the database upload feature and it did the trick... I could upload it directly to the database. :)

Sutapa Ghosh

dibya_plus 3 comments Joined 05/08
17 Jul 2014

Thanks for such a nice article. In the above example, the id attribute is a varchar. I want to make the ID column an integer rather than a varchar. I tried to load the data with an integer but got an error. I have a couple of questions... 1. Can we load integer data with a LOB? 2. If yes, then how can I load an integer with a LOB?

dbc1012 31 comments Joined 03/09
17 Jul 2014

Thankyou for your feedback, I'm glad you found the article helpful.
If I understand your question, you are really asking two separate questions.
Q1) Can the ID be an integer instead of a varchar?
A1) Yes, just call "lobStmt.setInt(1, id);" instead of "lobStmt.setString(1, id);" Obviously you would need to make appropriate changes to the table definition and change the declaration of id in the Java code from "String id;" to "int id;"
Q2) Can we load integer data into a LOB column?
A2) Yes, but why? If you have some integers, put them in an integer column. However, if you really, really, really wanted/needed to store an integer as a LOB, you would have two options. Option One: convert it to a String and store it in a CLOB. Option Two: create your integer as an Integer object (i.e. a java.lang.Integer) and serialize it as described in Part 3 of this series. When serializing a Java object such as an Integer, you would need to store the serialized representation as a BLOB in the database.
Be aware that when you serialise a java object such as a java.lang.Integer, you might get more than just the integer that you have put in it. That is, a Serialised Integer will likely be a bit bigger than the four (eight?) bytes normally required to store an integer (bigInt) value in the database.
I hope this helps.

SREDDY11 1 comment Joined 05/15
01 Jun 2015

Hi, here we are trying to import a file which has clob data in the reference file

counts.dat: is the main file

counts.dat.001.lob: reference file for clob data

Following is the script: 

 

.import vartext ',' LOBCOLS=1 file='/etl/tst/scripts/ids/counts.dat';

.repeat *

using (CURR_XSITN_ST varchar(10), RTF_UNQ_ID varchar(26),MSG_CNTNT clob as deferred)

insert into ISO_WORK.ETL_RTF_ARCHV59(CURR_XSITN_ST, RTF_UNQ_ID, MSG_CNTNT)

values (:CURR_XSITN_ST, :RTF_UNQ_ID,:MSG_CNTNT);

 

 

Teradata table to which we are importing

 

CREATE SET TABLE ISO_WORK.ETL_RTF_ARCHV59 ,NO FALLBACK ,

     NO BEFORE JOURNAL,

     NO AFTER JOURNAL,

     CHECKSUM = DEFAULT,

     DEFAULT MERGEBLOCKRATIO

     (

      CURR_XSITN_ST INTEGER,

      RTF_UNQ_ID CHAR(26) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,

      MSG_CNTNT character large object)

PRIMARY INDEX pi_ETL_RTF_ARCHV ( RTF_UNQ_ID );

 

Following is the error: 

 

*** Error: Teradata Database returned an invalid LOB token.

 

 *** Exiting BTEQ...

 *** RC (return code) = 7

 

thanks

 

 

 
