eejimkos 73 posts Joined 01/12
07 May 2014
Recommendation on varchar - char columns

Hello,
 
I would like to ask about the PDM (physical data model) for small (VAR)CHAR() columns.
Is it better to define these columns as CHAR() or VARCHAR() when the length is only up to 5-10 bytes?
Will we gain anything in table space or performance if these columns are used with MultiLoad/FastLoad, GROUP BY, UNION, or ORDER BY?
And what about the compression factor?
 
Thank you very much.

Qaisar Kiani 337 posts Joined 11/05
08 May 2014

I think the important factor is how much variance you expect in that specific data field. VARCHARs take 2 extra bytes in the table header, so if you expect the data to be of variable length and the frequency is fairly high, then VARCHAR will be more suitable than CHAR.

tomnolan 594 posts Joined 01/08
08 May 2014

VARCHAR is better than fixed-width CHAR with respect to the Teradata wire protocol.
 
(1) Null CHAR values will consume the entire width of the fixed-width CHAR value when they are sent over the wire. In contrast, Null VARCHAR values will only consume two bytes when they are sent over the wire.
 
(2) If you use fixed-width CHAR columns with the UTF8 session character set, then the Teradata Database's Export Width feature will pad the character values with more space characters than you expect.
 

david.craig 73 posts Joined 05/13
08 May 2014

I believe Qaisar meant that the 2 extra bytes are in each row. If you have more than 2 trailing bytes of pad (1 Unicode space, or 2 Latin spaces), then VARCHAR should take up less space. Both types should compress the same, but that depends on the compression technique.
There are also considerations for aligned row formats. See the database design reference for details.
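To put rough numbers on that break-even, here is a hypothetical LATIN example (the database/table name and the 4-character sample value are made up):

-- Value 'ABCD' stored in each column:
--   code_fixed CHAR(10)    -> always 10 bytes (6 bytes of trailing pad)
--   code_var   VARCHAR(10) -> 4 data bytes + 2 length bytes = 6 bytes
-- With 2 or fewer trailing pad bytes, the fixed CHAR is the same size or smaller.
CREATE TABLE MyDb.pad_demo
( id         INTEGER NOT NULL
, code_fixed CHAR(10)    CHARACTER SET LATIN
, code_var   VARCHAR(10) CHARACTER SET LATIN
)
PRIMARY INDEX (id);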

Qaisar Kiani 337 posts Joined 11/05
09 May 2014

Thanks for the correction David.
Yes, I meant that rows have 2 extra bytes in the case of VARCHAR :)

eejimkos 73 posts Joined 01/12
16 May 2014

Thank you very much for your answers.

eejimkos 73 posts Joined 01/12
22 May 2014

Hello,
Can I ask something more specific? (TD version 14)
I ran one test on a table that has almost 500 columns and 1 M rows (it is a source table).
Almost 200 of the columns are VARCHAR(1) and compressed.
1) I changed the VARCHAR(1) columns to CHAR(1), still compressed, and the perm space is now 2,186,168,832 bytes instead of 2,127,772,672. More space? Is this logical?
2) Since this is a source table that is loaded by FastLoad, will I gain anything if I change to CHAR(1)? Will I manage to transfer a smaller amount of data, and therefore spend less time?
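(For anyone repeating the comparison: current perm per table can be read from the DBC.TableSizeV view, summed over all AMPs. The database and table names below are placeholders.)

SELECT TableName
     , SUM(CurrentPerm) AS CurrentPermBytes
FROM DBC.TableSizeV
WHERE DatabaseName = 'MyDb'
  AND TableName IN ('src_table_varchar1', 'src_table_char1')
GROUP BY TableName
ORDER BY TableName;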
 
Thank you for your time.

dnoeth 4628 posts Joined 11/04
22 May 2014

CHAR(n) always needs exactly n bytes for LATIN or n*2 bytes for UNICODE, while a VARCHAR needs 1 or 2 bytes for each character actually stored, plus two bytes of additional overhead (a word indicating the length).
Assuming your columns are mainly Y/N or similar in LATIN, a single character needs 3 bytes as a VARCHAR, but only 1 byte as a CHAR. You never ever use a VARCHAR(1) on Teradata :-)
You start using VARCHARs when the actual length of a string varies greatly and the average length is at least 3 characters less than the maximum.
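A minimal DDL sketch of that flag-column case (database, table, and compress value list are made up; the byte counts assume LATIN and apply to values not covered by the compression):

CREATE TABLE MyDb.flag_demo
( id       INTEGER NOT NULL
, flag_chr CHAR(1)    CHARACTER SET LATIN COMPRESS ('Y','N')  -- 1 byte per stored value
, flag_vc  VARCHAR(1) CHARACTER SET LATIN COMPRESS ('Y','N')  -- 1 data byte + 2 length bytes = 3 bytes
)
PRIMARY INDEX (id);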
Q2: FastLoad calculates the number of records per block based on the maximum defined length.
If you load VARTEXT there should be no difference, but if you load DATA each VARCHAR will add 2 bytes more than CHAR, so in your case saving 400 bytes per row will result in a higher number of rows per block. To check, look for "Number of recs/msg" in the Insert Phase.
 

Dieter

david.craig 73 posts Joined 05/13
22 May 2014

There are many factors to consider when sizing a row. For character types the factors include server character set, nullability, compression, variable length, alignment, etc. If you have not done so already, perform the sizing procedure explained in the 'Database Design' reference under the 'Row Size Calculation' section.

eejimkos 73 posts Joined 01/12
22 May 2014

Hello,
 
Thank you for your answers.
Noeth, one question about Q1. I agree about CHAR(1) vs VARCHAR(1), but in the example I provided, when I changed from VARCHAR(1) to CHAR(1) the size of the table increased.
This is why I asked. I was expecting less space, since I changed almost half of the table's columns. (I believe compression has nothing to do with it, correct?) Any other thoughts on this behaviour?
 
About Q2: does the same happen with TPT?
 
Thank you once more.

eejimkos 73 posts Joined 01/12
19 Jun 2014

Hello,
I finally had some time to run some tests with FastLoad.
Using the same source file (SET RECORD VARTEXT), I tried to simulate two scenarios:
1. Setting the DDL length of my columns almost equal to the actual data length (all VARCHAR())
2. Setting the length of all my columns to 4 times the actual data length (all VARCHAR())
Results
--------
(case_1)
**** 16:57:29 Number of recs/msg: 326
**** 16:57:29 Starting to send to RDBMS with record 1
**** 16:57:30 Starting row 100000
**** 16:57:30 Sending row 119939
**** 16:57:30 Finished sending rows to the RDBMS

**** 16:57:30 Acquisition Phase statistics:
              Elapsed time: 00:00:01 (in hh:mm:ss)
              CPU time:     0.249602 Seconds
              MB/sec:       19.33
              MB/cpusec:    77.45
 
(case_2)
**** 17:15:08 Number of recs/msg: 80
**** 17:15:08 Starting to send to RDBMS with record 1
**** 17:15:08 Starting row 100000
**** 17:15:08 Sending row 119939
**** 17:15:11 Finished sending rows to the RDBMS

**** 17:15:11 Acquisition Phase statistics:
              Elapsed time: 00:00:03 (in hh:mm:ss)
              CPU time:     0.234002 Seconds
              MB/sec:       6.44
              MB/cpusec:    82.61
---
Since the recs/msg has decreased, does this mean I need more time to transfer the same amount of data?
Should I check anything else in dbqlogtbl?
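(Assuming DBQL query logging is enabled for the load user, the two runs can also be compared from the DBC.QryLogV view; the user name and time window below are placeholders.)

SELECT QueryID
     , StartTime
     , FirstRespTime
     , AMPCPUTime
     , TotalIOCount
FROM DBC.QryLogV
WHERE UserName = 'LOAD_USER'
  AND StartTime >= TIMESTAMP '2014-06-19 16:00:00'
ORDER BY StartTime;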
 
Thank you very much for your time.
