sreedhar_tera 4 posts Joined 09/11
18 Nov 2011
Multi-Value compression has increased the table size

Hi,

I have a table with 4 columns, like below:

CREATE SET TABLE T1 (
    col1 DECIMAL(10,0) NOT NULL,
    col2 CHAR(5) NOT NULL,
    col3 DECIMAL(10) NOT NULL,
    col4 DATE FORMAT 'yyyy-mm-dd' DEFAULT DATE
) UNIQUE PRIMARY INDEX p1 (col1, col2, col3);

Row count: 3.5 billion; table size (before compression): 100 GB.

So I applied multi-value compression to column col4 (DATE), which contains a very large number of duplicate date values.
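To be explicit about how the compression was applied: the compressed version of the table simply adds a COMPRESS value list to col4, roughly like this (the dates shown are placeholders, not my actual list):

CREATE SET TABLE T1_compressed (
    col1 DECIMAL(10,0) NOT NULL,
    col2 CHAR(5) NOT NULL,
    col3 DECIMAL(10) NOT NULL,
    col4 DATE FORMAT 'yyyy-mm-dd' DEFAULT DATE
         COMPRESS (DATE '2011-01-01', DATE '2011-01-02', DATE '2011-01-03')
) UNIQUE PRIMARY INDEX p1 (col1, col2, col3);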

But the problem is that after the compression completed, the table size increased to 120 GB instead of shrinking, which seems unusual to me.

Can anyone help me understand why the table size increased after compression?

A quick response would be appreciated.

Thank you..

Thanks, Sree..
dnoeth 4628 posts Joined 11/04
19 Nov 2011

According to your table definition, the size of an uncompressed record would be 40 bytes, so 3.5 billion rows work out to roughly 140 GB.

Dieter


meet_as 15 posts Joined 01/06
19 Nov 2011

Hi,

I think the row length would be 8 B + 5 B + 8 B + 4 B = 25 B (B denotes bytes).

DECIMAL(n,m) columns are stored by the Teradata database in 1, 2, 4, 8, or 16 bytes, depending on the value of n:

 

Number of Digits    Number of Bytes
 1 to  2            1
 3 to  4            2
 5 to  9            4
10 to 18            8
19 to 38            16

DATE is stored as an integer, so its size is 4 bytes.

Please correct me if I am wrong.

Regards,

AS

 

dnoeth 4628 posts Joined 11/04
19 Nov 2011

25 bytes of data + 2-byte record length + 8-byte ROWID + 2 presence bytes + 2 bytes in the pointer array = 39 bytes.

Rows are stored on even-byte boundaries, so each row actually occupies 40 bytes.
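Putting the row count against that width:

    3,500,000,000 rows x 40 bytes/row = 140,000,000,000 bytes, i.e. roughly 140 GB uncompressed.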

Dieter


sreedhar_tera 4 posts Joined 09/11
22 Nov 2011

Hi ,

Thank you for your efforts, but I still don't think I have the right answer.

My question was: why did the table size increase after compression was performed, when it should have decreased?

Any idea why?

 

Thanks,
Sree..

mjasrotia 66 posts Joined 08/11
22 Nov 2011

Hey Sreedhar,

 

As Dieter calculated, the actual uncompressed size of that data is around 140 GB, so in effect the table has been reduced to 120 GB by compression.

 

Thanks

Manik

sreedhar_tera 4 posts Joined 09/11
23 Nov 2011

Hi,

No, the table size was 100 GB before compression, but it has grown to 120 GB after compression.

I used the query below to find the table size before and after compression:

SELECT DatabaseName, TableName,
       SUM(CurrentPerm)/1024/1024 AS TableSize_MB,
       SUM(PeakPerm)/1024/1024 AS PeakPerm_MB
FROM DBC.TableSize
WHERE DatabaseName IN ('dbname')
  AND TableName IN ('tablename')
GROUP BY 1,2
ORDER BY 1,2;

Thanks,
Sree..

mjasrotia 66 posts Joined 08/11
23 Nov 2011

Sree,

Although compression can significantly reduce the size of a table, beyond a threshold the space savings are negated because both the row header and the table header grow.

That threshold depends on the number of values being compressed. You can get the details in Chapter 15, "Database-Level Capacity Planning," of the TD 12 Database Design manual, which demonstrates how presence bits are added to the row header as you compress more values in a record.

Just for quick reference, below is the text:

 

Consider the following example. Suppose you have a table with these row characteristics:
• 14-byte row header
• 4-byte nullable nonunique primary index column
• 2-byte nullable SMALLINT non-index data column

The total number of bytes for this row is 20. Because the primary index and SMALLINT columns are both nullable in this scenario, each uses a null presence bit in the row header, so 5 unused presence bits remain in the default presence octet.

Suppose you decide to compress 63 distinct values in the SMALLINT column. This requires an additional 6 presence bits (see "Presence Bits" on page 864 and "Number of Presence Bits Required to Represent Compressed Values" on page 868), rolling over into a new presence bits octet. The row header is now 15 bytes wide, where it was previously only 14 bytes, but when a row contains a compressed value for the SMALLINT column, it is 15 + 4 = 19 bytes wide, an apparent savings of 1 byte for each such row in the table.

Upon further analysis, you realize that all the rows you thought were 19 bytes wide are actually 20 bytes wide, so no savings are accrued by compression. The reason the rows expanded from 19 to 20 bytes is the system-enforced even-byte row alignment: the system added a 20th filler byte to the row to ensure an even offset.

Suppose that instead of compressing 63 distinct values in the SMALLINT column, you compress only 31. In this case, there is no need to roll over to a second presence octet, so many rows compress to 18 bytes.

You could also make the primary index non-nullable (the recommended practice anyway), which also removes the need to roll over to a second presence octet. In this case, all rows can compress to 18 bytes.
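As I read the manual's rule, the number of presence bits needed for n compressed values in a column is CEILING(LOG2(n + 1)), which gives these break points:

    Compressed values    Presence bits needed
    1                    1
    2 to 3               2
    4 to 7               3
    8 to 15              4
    16 to 31             5
    32 to 63             6
    64 to 127            7
    128 to 255           8

That is why 31 values in the example fit into the 5 spare bits of the first presence octet, while 63 values force a second octet.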

 

 

Manik

sreedhar_tera 4 posts Joined 09/11
23 Nov 2011

Thank you so much for your input, Manik.

I have one more doubt: compression also improves performance, right? Does that apply in my case above, where the size increased after compression? That is, will a performance gain still be achieved for the compressed table even though its size has increased?

 

 

Thanks,
Sree..

mjasrotia 66 posts Joined 08/11
24 Nov 2011

Compression is primarily for space reduction, but it also brings a performance benefit, because more rows fit into each data block, so fewer data blocks have to be read to retrieve the same number of rows.
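As a rough illustration of that point (assuming a 64 KB data block purely for the arithmetic and ignoring block overhead, so the numbers are only indicative):

    65,536 bytes / 40-byte rows ≈ 1,638 rows per data block uncompressed
    65,536 bytes / 36-byte rows ≈ 1,820 rows per data block if 4 bytes per row compress away

So when compression really does shrink the rows, a full-table scan touches correspondingly fewer blocks.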

But if compression is adding space, then I don't think there is any benefit.

I would rather not compress values beyond the threshold where compression actually starts adding space.

 

Thanks

Manik

logc 34 posts Joined 09/05
01 Dec 2011

Does a limit exist for the number of characters stored in the row header?  A user has DDL that fails (CREATE TABLE Failed. 5627: Column 'VAR28' exceeds system limit for COMPRESS.)

The column compression statement in question has 172 unique values but the compression values (including single quotes and commas) add up to 7257 characters. 

mjasrotia 66 posts Joined 08/11
02 Dec 2011

A row header has no limit as such, but the maximum row size is limited by what can fit into a data block.

TD has a max block size of 64 KB.

A data block (TD 12) = data block header + row data + reference array + data block trailer, where the fixed overhead (header, reference array and trailer) amounts to 72 bytes for new and updated tables (36 bytes in V2R6.2).

So a row in essence cannot grow beyond 64 KB - 72 bytes.

Also it depends upon the max data block size defined by your DBA.
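If you want to see what a particular table is actually using, an explicit table-level setting shows up in its DDL; if nothing is there, the system-wide default from DBS Control applies (dbname.tablename below is a placeholder):

    SHOW TABLE dbname.tablename;   -- look for a DATABLOCKSIZE clause in the output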

Hope it helps.

Manik

logc 34 posts Joined 09/05
06 Dec 2011

Thanks Manik, I appreciate your response.

ToddAWalter 316 posts Joined 10/11
06 Dec 2011

There are two limits that apply. There is a maximum number of bytes for the compression values of a single column, governed by the CompressValueList column in the TVFields dictionary table (8192 bytes currently). This is likely the cause of your 5627 error. Separately, there is a grand total number of bytes that can fit in a table header, which contains all compression values for all columns, index definitions, column definitions, partition definitions, format strings, and so on. It is not possible to give an exact number for the limit on total compression values because it depends on the size of all the other parts of the table definition. For instance, a very wide table with lots of columns and indexes will be able to accommodate fewer compression values than a narrower table with fewer other structures associated with it. The total table header size is currently 1 MB.
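If you want to see how large the stored value lists already are for tables that did create successfully, the dictionary table Todd mentions can be queried directly. Treat this as a sketch: the column names are as he gives them, and reading DBC.TVFields needs the appropriate access rights.

    SELECT TableId, FieldName,
           CHARACTER_LENGTH(CompressValueList) AS compress_list_chars
    FROM DBC.TVFields
    WHERE CompressValueList IS NOT NULL
    ORDER BY 3 DESC;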

ndbajde 6 posts Joined 09/06
25 Oct 2012

If you are running on a 64-bit system, you may also be running into an 8-byte alignment requirement.

JohnE

amittera 35 posts Joined 12/09
19 Mar 2013

Hi Dieter,
 
I proposed MVC to my client for Teradata 12 tables. As per the analysis, I got approximately 800 GB of savings on 2.3 TB of tables, with both table-level and column-level savings. After a lot of investigation of MVC, the client has come up with the concern below:
 
Concern: some of these columns are derived from bases that can change, e.g. pricing strategies, cost price changes, tax (VAT). If any of these bases change, the profile of the data in the tables will change, which means that a totally new set of 'ideal' compression values would apply. How often would the compression values be reviewed?
 
As per my understanding, if the values of the derived columns are volatile, then we would not suggest applying compression; but if the column values are largely duplicated and static, then we would apply compression to save space. On the whole, though, I am still confused: even though the columns are derived, I somehow still got savings of around 30-40% for those tables.
Can you please advise whether there is a way we can apply compression on tables with some/all derived columns, as I can see a lot of potential saving.
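For context, the review the client is asking about would essentially be re-running a value-frequency profile on each compressed column and comparing it with the current COMPRESS list; something along these lines (names are placeholders):

    SELECT col_x, COUNT(*) AS freq
    FROM dbname.tablename
    GROUP BY col_x
    ORDER BY freq DESC;

and then taking the most frequent values (at most 255 per column) as the new candidate list.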
 
Regards,
Amit

Amit Saxena
Teradata Consultant

amittera 35 posts Joined 12/09
04 Apr 2013

Hi,
 
Can you please help me with an update on whether the above concern looks meaningful?
 
Regards,

Amit Saxena
Teradata Consultant

raj786 23 posts Joined 04/14
04 Oct 2014

Hi,
 
What is the maximum number of values per column that can be compressed using MVC in TD 14?
 

Raja_KT 1246 posts Joined 07/09
06 Oct 2014

You can read the doc :)

You cannot compress more than 255 distinct values for an individual column.

 

There are a few more guidelines too.

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

vasudev 24 posts Joined 12/12
08 Jan 2015

Hi,
I wanted to know how much I can save by using MVC on multiple columns in a table.
Taking an example: in TD 12 I have a table A of 50 GB and I am going to compress DECIMAL(18,2) columns. Say col1 has 15 compressed values covering 15% of the table's rows, and col2 has 255 compressed values covering 35% of the rows. If I apply compression on both these columns, how much can I save? Is there any formula to calculate the savings from compression? Please help.
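My own rough attempt, using the mechanics discussed earlier in this thread, looks like this; please correct it if it is wrong (DECIMAL(18,2) occupies 8 bytes, and the row width at the end is a made-up figure, only to show the shape of the calculation):

    expected saving per row ≈ 0.15 x 8 bytes (col1) + 0.35 x 8 bytes (col2) ≈ 4 bytes
    presence-bit overhead   ≈ 4 bits (15 values) + 8 bits (255 values) = 12 bits ≈ 1-2 bytes per row
    net saving per row      ≈ roughly 2-3 bytes, before even-byte alignment

    e.g. if the average row were about 100 bytes, that would be only a few percent of the 50 GB.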
