All Forums Connectivity
jasonmoore2k 11 posts Joined 06/15
17 Aug 2016
Teradata JDBC Driver returns the wrong schema / column nullability

I'm running an Apache Spark application that uses terajdbc4 to fetch data from a Teradata server.  It seems to be returning the wrong value for the nullability for many of the columns in the table.  I've found two scenarios at work:
1. If I use dbtable = "TABLE_NAME" all columns are returned with nullable = true (despite some not being nullable)
2. If I use dbtable = "(SELECT * FROM TABLE_NAME) as t" all columns are returned with nullable = false (despite some being nullable)
Is this a known issue with either the Teradata driver or the server?

Tags:
tomnolan 594 posts Joined 01/08
18 Aug 2016

Which JDBC API method does your application call in order to obtain the column nullability information?
 
If your application is calling the ResultSetMetaData.isNullable method, for that method (and all the other ResultSetMetaData methods), the Teradata JDBC Driver returns information to the application that is provided by the Teradata Database.
 
You are observing a difference in metadata when referring to a table name directly versus referring to a derived table subselect expression. The Teradata Database does treat expressions differently from direct table/column references with respect to metadata such as nullability.
 
You can specify the LOG=DEBUG connection parameter and the Teradata JDBC Driver will print to System.out all the details of the information received from the Teradata Database.

jasonmoore2k 11 posts Joined 06/15
18 Aug 2016

The call is made in Apache Spark's infrastructure, and it uses ResultSetMetaData.isNullable as you expected:
https://github.com/apache/spark/blob/v2.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L140
val nullable = rsmd.isNullable(i + 1) != ResultSetMetaData.columnNoNulls
I'll try LOG=DEBUG to see if I get anything useful.

tomnolan 594 posts Joined 01/08
19 Aug 2016

That logic in Apache Spark is incomplete. The ResultSetMetaData.isNullable method can return 3 possible values:
columnNoNulls indicates that a column does not allow NULL values.
columnNullable indicates that a column allows NULL values.
columnNullableUnknown indicates that the nullability of a column's values is unknown.
 
 
As I mentioned earlier, the Teradata Database does treat expressions differently from direct table/column references with respect to metadata such as nullability. In particular, for expressions, the Teradata Database is likely to indicate that the nullability is unknown, which in turn means that the Teradata JDBC Driver will return columnNullableUnknown from the ResultSetMetaData.isNullable method.
 
The logic in Apache Spark is misleading. The "val nullable" is false when the column is definitely known to disallow NULLs, but the "val nullable" is true when the column permits NULLs and also when the column's nullability is unknown (such as if it's an expression).
 

jasonmoore2k 11 posts Joined 06/15
19 Aug 2016

True, it is reducing a three-value to a boolean by mapping the unknown to nullable = true, and this might explain case #1.
But I can see an actual problem in case #2 where every column was returning nullable = false, which should only happen if ResultSetMetaData.isNullable is returning ResultSetMetaData.columnNoNulls - which just isn't correct (and the column actual contains some null values).
I'll have some time in a few days to put together a simple example with the debug logging to confirm exactly what's happening.

jasonmoore2k 11 posts Joined 06/15
21 Aug 2016

I've put together an example outside of the Spark application which has confirmed my suspicion above (using 15.10.00.22):

val driver = new com.teradata.jdbc.TeraDriver()
val url = "jdbc:teradata://..."
val properties = new java.util.Properties()
val conn = driver.connect(url, properties)
val table = "DIM_OFFER" // contains a non-nullable field called OFFER_NBR and a nullable field called OFFER_DESC

val statement = conn.prepareStatement(s"SELECT OFFER_NBR, OFFER_DESC FROM $table WHERE 1=0")
val rs = statement.executeQuery()
val rsmd = rs.getMetaData
rsmd.isNullable(1) // 2 (i.e. ResultSetMetaData.columnNullableUnknown)
rsmd.isNullable(2) // 2

val statement = conn.prepareStatement(s"SELECT OFFER_NBR, OFFER_DESC FROM (SELECT OFFER_NBR, OFFER_DESC FROM $table) as t WHERE 1=0")
val rs = statement.executeQuery()
val rsmd = rs.getMetaData
rsmd.isNullable(1) // 0 (i.e. ResultSetMetaData.columnNoNulls)
rsmd.isNullable(2) // 0

Maybe it's related to sub-queries messing up the nullability?  Is this a known limitation?

tomnolan 594 posts Joined 01/08
22 Aug 2016

>>> Maybe it's related to sub-queries messing up the nullability?  Is this a known limitation?
 
In short, Yes.
 
Your subquery is used as a derived table. Here is an excerpt from the Teradata Database design notes for result set metadata:

For a column reference from a set operation (e.g. UNION), derived table, recursive WITH, or non-recursive WITH, the "Base Column name" field will be the Result Column Name, and Database Name and Table Name won’t be provided.  The presence of Database Name and Table Name indicate the presence of “Base Column Name”. When Database Name and Table Name aren’t present (indicating that the SQL statement caused the base table information to be either ambiguous or “lost”), then Base Column Name is simply Result Column Name. 

 
This design note indicates that the Teradata Database cannot provide accurate metadata for the schema name and table name (rsmd.getSchemaName and rsmd.getTableName) for a derived table's result set columns. The design note doesn't metion nullability, but there appears to be a similar limitation.
 
I did the following test...

create volatile table tomtest(c1 integer, c2 integer not null) on commit preserve rows

followed by

select c1, c2 from tomtest

and also

select c1, c2 from (select c1, c2 from tomtest) as dt

 
I enabled the Teradata JDBC Driver's debug logging with the LOG=DEBUG connection parameter. The first query, with direct table column references, produces the following debug log output. The Teradata Database indicates column c1 isNullable=Y and column c2 isNullable=N, both as expected per their column definitions in the create table command. The Teradata JDBC Driver returns rsmd.isNullable(1) = 1 (columnNullable) and rsmd.isNullable(2) = 0 (columnNoNulls), both as expected.

*** StatementInfoParcel ***

*** Begin List with 0 items ***

*** End of List ***

*** Begin List with 2 items ***

com.teradata.jdbc.jdbc_4.parcel.FullContentMetadataItem
-- Header --
itemLayoutType  : 1
itemContentType : 2
itemLength      : 91
receiveItemNumber        : 1
transmitItemNumber       : 1
-- Body --
dataType                 : 497
maxDataLengthInBytes     : 4
totalNumberOfDigits      : 10
numberOfIntervalDigits   : 0
numberOfFractionalDigits : 0
databaseName             : guest
procedureOrTableName     : tomtest
columnOrParameterName    : c1
columnPositionInTable    : 1
asClauseName             :
title                    : c1
format                   : -(10)9
defaultValue             :
isIdentityColumn         : N
isDefinitelyWritable     : Y
isNullable               : Y
mayBeNull                : Y
isSearchable             : Y
isWritable               : Y
udtIndicator             : 0
udtTypeName              :
dataTypeMiscInfo         :
charsetCode              : 0
maxNumberOfCharacters    : 0
isCaseSensitive          : U
isSigned                 : Y
isKeyColumn              : N
isUnique                 : N
isExpression             : N
isSortable               : Y
SPParameterDirection     : U
structDepth              : 0
isTemporal               : 0
attributeName            :
serverDataType           : 0
arrayNumberOfDimensions  : 0
arrayMaxCardinalities    :
arrayLowerBounds         :

com.teradata.jdbc.jdbc_4.parcel.FullContentMetadataItem
-- Header --
itemLayoutType  : 1
itemContentType : 2
itemLength      : 91
receiveItemNumber        : 2
transmitItemNumber       : 2
-- Body --
dataType                 : 496
maxDataLengthInBytes     : 4
totalNumberOfDigits      : 10
numberOfIntervalDigits   : 0
numberOfFractionalDigits : 0
databaseName             : guest
procedureOrTableName     : tomtest
columnOrParameterName    : c2
columnPositionInTable    : 2
asClauseName             :
title                    : c2
format                   : -(10)9
defaultValue             :
isIdentityColumn         : N
isDefinitelyWritable     : Y
isNullable               : N
mayBeNull                : N
isSearchable             : Y
isWritable               : Y
udtIndicator             : 0
udtTypeName              :
dataTypeMiscInfo         :
charsetCode              : 0
maxNumberOfCharacters    : 0
isCaseSensitive          : U
isSigned                 : Y
isKeyColumn              : N
isUnique                 : N
isExpression             : N
isSortable               : Y
SPParameterDirection     : U
structDepth              : 0
isTemporal               : 0
attributeName            :
serverDataType           : 0
arrayNumberOfDimensions  : 0
arrayMaxCardinalities    :
arrayLowerBounds         :

 
The second query, with a subquery and derived table column references, produces the following debug log output. In this situation, the Teradata Database indicates column c1 isNullable=N and column c2 isNullable=N. The nullability is not accurate for column c2. The Teradata JDBC Driver returns rsmd.isNullable(1) = 0 (columnNoNulls) and rsmd.isNullable(2) = 0 (columnNoNulls). While the nullability is not accurate for column c2, the Teradata JDBC Driver is dutifully reporting the value it obtained from the Teradata Database.

*** StatementInfoParcel ***

*** Begin List with 0 items ***

*** End of List ***

*** Begin List with 2 items ***

com.teradata.jdbc.jdbc_4.parcel.FullContentMetadataItem
-- Header --
itemLayoutType  : 1
itemContentType : 2
itemLength      : 79
receiveItemNumber        : 1
transmitItemNumber       : 1
-- Body --
dataType                 : 497
maxDataLengthInBytes     : 4
totalNumberOfDigits      : 10
numberOfIntervalDigits   : 0
numberOfFractionalDigits : 0
databaseName             :
procedureOrTableName     :
columnOrParameterName    : c1
columnPositionInTable    : 0
asClauseName             :
title                    : c1
format                   : -(10)9
defaultValue             :
isIdentityColumn         : N
isDefinitelyWritable     : N
isNullable               : N
mayBeNull                : Y
isSearchable             : Y
isWritable               : N
udtIndicator             : 0
udtTypeName              :
dataTypeMiscInfo         :
charsetCode              : 0
maxNumberOfCharacters    : 0
isCaseSensitive          : U
isSigned                 : Y
isKeyColumn              : N
isUnique                 : N
isExpression             : N
isSortable               : Y
SPParameterDirection     : U
structDepth              : 0
isTemporal               : 0
attributeName            :
serverDataType           : 0
arrayNumberOfDimensions  : 0
arrayMaxCardinalities    :
arrayLowerBounds         :

com.teradata.jdbc.jdbc_4.parcel.FullContentMetadataItem
-- Header --
itemLayoutType  : 1
itemContentType : 2
itemLength      : 79
receiveItemNumber        : 2
transmitItemNumber       : 2
-- Body --
dataType                 : 496
maxDataLengthInBytes     : 4
totalNumberOfDigits      : 10
numberOfIntervalDigits   : 0
numberOfFractionalDigits : 0
databaseName             :
procedureOrTableName     :
columnOrParameterName    : c2
columnPositionInTable    : 0
asClauseName             :
title                    : c2
format                   : -(10)9
defaultValue             :
isIdentityColumn         : N
isDefinitelyWritable     : N
isNullable               : N
mayBeNull                : N
isSearchable             : Y
isWritable               : N
udtIndicator             : 0
udtTypeName              :
dataTypeMiscInfo         :
charsetCode              : 0
maxNumberOfCharacters    : 0
isCaseSensitive          : U
isSigned                 : Y
isKeyColumn              : N
isUnique                 : N
isExpression             : N
isSortable               : Y
SPParameterDirection     : U
structDepth              : 0
isTemporal               : 0
attributeName            :
serverDataType           : 0
arrayNumberOfDimensions  : 0
arrayMaxCardinalities    :
arrayLowerBounds         :

 
The design note says that the database name and table name are not provided for a subquery and derived table column references. We see that behavior in the debug log output examples above. The debug log shows that the "databaseName" and "procedureOrTableName" field values are blank. This corresponds to an empty zero-length string returned from the rsmd.getSchemaName and rsmd.getTableName methods.
 
To summarize, this is a limitation of the Teradata Database. Your application must be aware that when the rsmd.getSchemaName and rsmd.getTableName methods return an empty zero-length string, then the other metadata values may not be completely accurate.
 

tomnolan 594 posts Joined 01/08
22 Aug 2016

Correction: In my post above, where I said that the nullability for column c2 is inaccurate -- it's actually the nullability for column c1 that is inaccurate.
 
Column c1 doesn't have a NOT NULL clause, so column c1 is nullable, therefore, the Teradata Database should indicate that column c1 isNullable=Y, but instead the Teradata Database indicates that column c1 isNullable=N, which is not correct.

jasonmoore2k 11 posts Joined 06/15
22 Aug 2016

I really appreciate the effort you've put in.  Looks like this is defintely an a issue in the database server itself (not the driver/connectivity).

jasonmoore2k 11 posts Joined 06/15
22 Aug 2016

Out of interest, what is the logic behind the property "mayBeNull" on FullContentMetadataItem?  I'm noticing that it differs from the isNullable property in the second query for the first column:

isNullable               : N

mayBeNull                : Y

jasonmoore2k 11 posts Joined 06/15
22 Aug 2016

And as a follow-up question, could the driver use isNullable=N + mayBeNull=Y to instead return columnNullableUnknown?

tomnolan 594 posts Joined 01/08
23 Aug 2016

isNullable refers to the storage, while mayBeNull refers to the result set column.
 
As I recall, OLEDB is the only database interface that exposes the concept of mayBeNull. The concept of mayBeNull, referring to the result set column, is not present in the JDBC API.
 
The javadoc for the ResultSetMetaData.isNullable method is fairly clear that the method returns information about the underlying storage column, not the result set column, thus it is more appropriate for the Teradata JDBC Driver to base the return value of the ResultSetMetaData.isNullable method on the isNullable value received from the Teradata Database.
 
Having said that, we'd like to help you get your application working as desired. Please provide some more detail about why your application needs the nullability information, and what your application would do with that information. Then we can perhaps suggest a workaround, or compose a new feature request.
 

jasonmoore2k 11 posts Joined 06/15
23 Aug 2016

> The javadoc for the ResultSetMetaData.isNullable method is fairly clear that the method returns information about the underlying storage column ...
I'm really not so sure about this interpretation.  Could you point me to the doc that lead you to this?  I read "Indicates the nullability of values in the designated column" as referring to the column in the result set itself not the storage column.
See https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#isNullable-int-
 
> Please provide some more detail about why your application needs the nullability information
Apache Spark uses the value returned from ResultSetMetaData.isNullable to generate Java code to e.g. read the value of the integer in a field.  If the column could contain nulls, it will first check whether the value is null before doing the read.  If the column is expected to never contain nulls, it doesn't need to do this check.
 
> Then we can perhaps suggest a workaround, or compose a new feature request.
I'm really starting to think that the logic should be changed so that ResultSetMetaData.isNullable actually returns the value of OLEDB's mayBeNull if (as you've said) it refers whether the result set column can contain null values.  Is this going to be a difficult change to make?

tomnolan 594 posts Joined 01/08
24 Aug 2016

The Teradata JDBC Driver's ResultSetMetaData.isNullable method has worked the way it does for several years, and we need to preserve its current behavior for applications that expect it.
 
We can consider providing a new connection parameter such as MAYBENULL=ON that would enable you to choose a different behavior for the ResultSetMetaData.isNullable method. The default would be MAYBENULL=OFF to yield the existing old behavior.

jasonmoore2k 11 posts Joined 06/15
24 Aug 2016

I can see why it would be important to preserve existing behaviour, and I think the approach of using a connection parameter as you've described would work well.

jasonmoore2k 11 posts Joined 06/15
30 Aug 2016

Tom, what would be the next steps for this?  Is there another system I should be putting this request into?  How would I track when it is likely to be resolved and a patch provided?  Thanks!

tomnolan 594 posts Joined 01/08
31 Aug 2016

To cover this feature request, I created JDBC RFC 183490 "Connection parameter MAYBENULL for ResultSetMetaData.isNullable variant behavior".
 
If you're a customer, you can use Teradata at Your Service (TaYS) to create an incident requesting the new feature and mentioning the JDBC RFC number. Doing that will enable us to classify the RFC as a customer RFC and to prioritize it higher.
 
Regards,
--Tom

jasonmoore2k 11 posts Joined 06/15
31 Aug 2016

Thanks Tom!  I'll see about getting our client to do as you've suggested.

You must sign in to leave a comment.