Ok, so I shouldn’t even need to broach this topic, as I’m sure you have all heard it before: compress, compress, and compress.

However, for some reason, some sites are still not getting it, or they have not adequately incorporated this pertinent step into their development process, so tables are still being implemented without adequate compression. Perhaps the tables were compressed a long time ago but are not taking advantage of the advances in compression, like multi-value compression.

Compression not only decreases your storage space (anyone billing for that?) but also decreases the amount of IO that needs to be performed when retrieving and spooling queried data. For smaller tables, it can also increase the number of tables cached in memory, eliminating subsequent IOs for those tables.

This query takes a quick look at the type of compression applied to each table, pointing to opportunities for further compression. I am looking at tables 10 GB in size or greater; adjust the threshold to your site's needs:

SELECT dbt.DATABASENAME,
       dbt.TABLENAME,
       MAX(CASE WHEN (compressvaluelist IS NOT NULL) 
                THEN (CASE WHEN INDEX(compressvaluelist,',') > 0
                           THEN '3. MVC '
                           ELSE '2. SVC '
                           END)
                ELSE '1. NONE'
                END) COMPRESS_TYPE,
       MIN(pds.Current_Perm) CURRENT_PERM

  FROM dbc.columns dbt,

      (SELECT t.DATABASENAME,
              t.TABLENAME,
              SUM(ts.CurrentPerm) CURRENT_PERM 
         FROM DBC.Tables t,
              DBC.TableSize ts
        WHERE t.DATABASENAME = ts.DATABASENAME
          AND t.TABLENAME = ts.TABLENAME
          AND ts.TABLENAME <> 'ALL'
        GROUP BY 1,2
        HAVING CURRENT_PERM > 10000000000) pds

WHERE dbt.DATABASENAME IN ('AAAA','BBBB','CCCC')
  AND dbt.DATABASENAME = pds.DATABASENAME
  AND dbt.TABLENAME = pds.TABLENAME

GROUP BY 1,2

-- HAVING COMPRESS_TYPE = '1. NONE'

ORDER BY 1, 3, 4 DESC, 2
;

 

Here are some sample results from the query:

The query results simply give me a feel for what level of compression has been applied. NONE is bad, SVC (Single Value Compression) is better, but MVC (Multi-Value Compression) is where we really want to be, if possible.

The results are ordered within each database by type of compression and then by table size: the largest tables with NONE (no compression), followed by the largest tables with SVC compression, and so on. This should help you quickly target the large outliers.

 

BTW, if you are just starting to compress, or redoing your compression, I propose this methodology:

Identify the largest tables in each database, as well as all the frequently used smaller common reference tables (a starter query for this step is sketched after the list).

Start working down each database, hitting the largest tables first. For example, take the top 5 from databases A, B, and C. Compress, implement, then move on to the next-largest tables.

Simultaneously, work through as many of the common reference tables as you can, as quickly as you can; most of these will be your most widely used tables across applications and will therefore benefit as wide a user base as possible. (Remember, frequency of use equates to more IO, so making these tables smaller and potentially cacheable can have a large impact on overall throughput.)
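For the first step, here is a minimal sketch using the same DBC views as the query above; the database list and the roughly-10 GB threshold are assumptions to adjust for your site:

SELECT t.DATABASENAME,
       t.TABLENAME,
       SUM(ts.CurrentPerm) CURRENT_PERM
  FROM DBC.Tables t,
       DBC.TableSize ts
 WHERE t.DATABASENAME = ts.DATABASENAME
   AND t.TABLENAME = ts.TABLENAME
   AND t.TABLEKIND = 'T'                        -- data tables only
   AND t.DATABASENAME IN ('AAAA','BBBB','CCCC')
 GROUP BY 1,2
HAVING CURRENT_PERM > 10000000000               -- roughly 10 GB; adjust as needed
 ORDER BY 1, 3 DESC
;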

 

Try it at your site and see what you get. If you can, please share some feedback on your results with the rest of us.

As always, comments or questions are welcome.

Good Luck!

Dave

 

Discussion
GALPGR 10 comments Joined 08/09
21 Jul 2010

Hello !

What are the best practices to modify a table without compression?

Example:

Table uncompressed:

CREATE SET TABLE AAAA.TABLE_A,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
COLUMN0 BYTEINT NOT NULL,
COLUMN1 DATE FORMAT 'YYYYMMDD',
COLUMN2 CHAR(20) NOT NULL
)
PRIMARY INDEX ( COLUMN0 )
;

RENAME TABLE AAAA.TABLE_A TO AAAA.TABLE_A_old
;

New Table compressed:

CREATE SET TABLE AAAA.TABLE_A,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
COLUMN0 BYTEINT NOT NULL,
COLUMN1 DATE FORMAT 'YYYYMMDD',
COLUMN2 CHAR(20) NOT NULL COMPRESS ('CONSTANT0', 'CONSTANT1', 'CONSTANT2', 'CONSTANT3')
)
PRIMARY INDEX ( COLUMN0 )
;

INSERT INTO AAAA.TABLE_A
SELECT * FROM AAAA.TABLE_A_old
;

DROP TABLE AAAA.TABLE_A_old
;

Finally, collect statistics on the table AAAA.TABLE_A.
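For example, statistics could be collected along these lines (the index/column choices are purely illustrative for this hypothetical table):

COLLECT STATISTICS ON AAAA.TABLE_A INDEX ( COLUMN0 );
COLLECT STATISTICS ON AAAA.TABLE_A COLUMN ( COLUMN1 );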

Is there a better solution for it?

Regards,

David.Roth 17 comments Joined 05/09
21 Jul 2010

This is the process I generally see used. ALTERs are limited, especially if there is any volume in the table.

The other thing to remember is that when the new table is created, any permissions on the old table need to be applied to it as well.
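A quick way to see what would need re-granting is to list the explicit rights held on the old table; a sketch assuming the standard DBC.AllRights dictionary view (the database and table names are placeholders):

SELECT UserName,
       AccessRight,
       GrantorName
  FROM DBC.AllRights
 WHERE DatabaseName = 'AAAA'
   AND TableName = 'TABLE_A_old'
 ORDER BY 1, 2
;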

Dave

GALPGR 10 comments Joined 08/09
21 Jul 2010

Thank you very much for your reply Dave!

GALPGR 10 comments Joined 08/09
21 Jul 2010

Thank you very much for your reply Dave!

jkwight 13 comments Joined 06/10
03 Aug 2010

Dave,

Compression (or doing it) is usually something everyone would LOVE to do - BUT the tools available (???) are either limited, do not make the right decisions (e.g. compress all values up to and including the first 256 values), are cumbersome to use, etc. etc. What would you recommend (that you know of) to address these extremely time-consuming issues? Provided there is a good tool (or tool set) to better automate, analyze and implement, everyone would be 'compressing' and life in your world would be good ;-) Your approach is good; it's the execution of the analysis on 'what to do' that is the laborious part and where a better tool would help.

JK

JK - TD-PS
Arizona Territories

Denor 1 comment Joined 08/10
27 Aug 2010

Hi Dave,
I am a newbie to Teradata.
I just wanted to know what a Teradata DBA does every morning -
like checking nightly backups, checking permanent space and spool space, etc.
What else does he do?
Thanks in advance

DN

goldminer 9 comments Joined 05/09
08 Sep 2010

AtanaSoft has a pretty good tool called CompressTool (there is a pretty hefty price tag on it though). Dieter also has a pretty good compression algorithm that I have used quite a bit in the past.

jkwight 13 comments Joined 06/10
08 Sep 2010

Goldminer - I know Compress Tool well - I was a BETA tester for it. For one customer, by compressing only 11 of their big tables with the Compress Tool (single run), they were able to save more than the price of Compress Tool! So, I think it's a matter of perception - OR - as Teradactyls, we got used to getting 'free/cheap tools' from TD (e.g. Queryman and WinDDI) - not really free, but with thousands of copies all over the world, even at non-TD sites, they give that impression. AND, SQL Assistant can't hold a candle to AtanaSoft's Query Tool! Horses for courses, I guess. For me, I purchased AtanaSuite for myself while I was still working for TD. Now that I've left and am a DBA at a small TD customer, I still have it. If you've ever used Compress Tool in earnest, you'd never want anything else for doing compression. Haven't seen anything close to it - yet. But then again, that's just me, for what it's worth on today's market ;-) Thanks for your input.

JK - TD-PS
Arizona Territories

jkwight 13 comments Joined 06/10
08 Sep 2010

Dear DN. As a DBA on a small site (small just in data, not complexity), I use the PCMP suite of data collection facilities. Scripts run each night to scrape DBC tables (including DBQL tables) and put them off in history tables. From there, there are a lot of reports you can run. Also, I have TD Manager alerts set up to monitor my 'data databases', and if any of the databases get within X% of full, I get an email message and I then go in and take care of it. You can also create a BTEQ script or (I prefer) a stored procedure to run daily, check DB sizes and, if it finds any over X%, allocate more space using dynamic SQL. So you can automate a lot of the monitoring activities, in several ways. Hope this helps.
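As a rough illustration of the size check JK describes, assuming the DBC.DiskSpace dictionary view and a 90% threshold picked purely as an example:

SELECT DatabaseName,
       SUM(CurrentPerm) USED_PERM,
       SUM(MaxPerm) MAX_PERM
  FROM DBC.DiskSpace
 GROUP BY 1
HAVING SUM(CurrentPerm) >= 0.9 * SUM(MaxPerm)   -- databases at 90% or more of their perm limit
 ORDER BY 1
;

From there, a daily procedure could issue a MODIFY DATABASE <name> AS PERM = <new size> statement through dynamic SQL, along the lines JK mentions.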

JK - TD-PS
Arizona Territories

goldminer 9 comments Joined 05/09
08 Sep 2010

I agree JK... I just bought CompressTool (along with SyncTool) a couple of weeks ago and love it. My only point was that justifying the cost to upper management, even though we know it saves money, can be a hard thing to do.

jkwight 13 comments Joined 06/10
09 Sep 2010

Goldminer - that's great, and you're right about getting upper mgt to understand about these tools, especially with Teradata! If you go to an Oracle shop, you normally see TOAD, which is what AtanaSuite was modeled after for TD, and the cost there is 'no problem', but when you bring up the equivalent tool for TD for roughly the same price - they go nuts - 'are you kidding - that's too expensive'!!! Go figure. Glad you got those two tools - they'll save you a ton of time/effort, etc.

JK - TD-PS
Arizona Territories

anees1176 1 comment Joined 12/10
14 Dec 2010

JK How much did it cost you buying Atana Suite with compress tool for urself ... Just a ball park range ... ?

jkwight 13 comments Joined 06/10
14 Dec 2010

anees1176 - Sorry, I do not have a Compress Tool license and do not have any pricing. Please contact AtanaSoft directly through their website (atanasoft.com), or you can email Dave Richardson (AtanaSoft) directly with an inquiry at dbrichardson@atanasoft for pricing. Please feel free to mention me (JK Wight).

JK - TD-PS
Arizona Territories

TDlearner 2 comments Joined 09/10
22 Dec 2010

Hi,

I got to know that we can't compress VARCHAR columns or any of the PI columns.
Can somebody let me know whether we can compress secondary index columns or PPI columns? Are there any other exceptions for compression?

David.Roth 17 comments Joined 05/09
08 Jan 2011

Secondary index columns are no problem, but PPI columns are excluded, just like PI columns. On a bright note, the VARCHAR restriction is lifted in version 13.10.

TDlearner 2 comments Joined 09/10
19 Apr 2011

Thanks David

super25 4 comments Joined 07/11
18 Sep 2011

David, I have a general question on using MVC: if I am creating the DDL using COMPRESS for a column without specifying the values, what does it do?
Does it compress all the values in the column by default, or is MVC ignored?

David.Roth 17 comments Joined 05/09
18 Sep 2011

Using COMPRESS by itself will compress the NULL values in the column. There is no option to automatically compress all values.
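For illustration, a minimal sketch of the two forms (the table and column names are made up):

CREATE SET TABLE AAAA.COMPRESS_DEMO
(
COL_ID INTEGER NOT NULL,
STATUS_CD CHAR(1) COMPRESS,          -- bare COMPRESS: compresses NULLs only
FLAG_CD CHAR(1) COMPRESS ('Y','N')   -- MVC: compresses NULLs plus the listed values
)
PRIMARY INDEX ( COL_ID )
;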

SPOLISETTI 2 comments Joined 06/09
08 Jun 2012

David, do you have the Compression Tool Kit that runs on the UNIX/Linux platform? If so, can you please share it with me? We have the WINMVS but it does not run on Windows 7, so I am looking for a UNIX version of the compression scripts. Any help is highly appreciated.

ramubindu 10 comments Joined 06/08
09 Jun 2012

Hi Spolisetti,
I am looking for the WINMVS compression tool; could you help me out?

My Id : ramu.teradata@gmail.com

amisaxen 1 comment Joined 03/13
08 Mar 2013

Hi,
The proposed compression is saving table size by 30%+, but the client has a concern that some of these columns are derived from bases that can change.
If any of these bases change, the profile of the data in the tables will change, which means that a totally new set of 'ideal' compression values would apply.
How often should the compression values be reviewed?
 
Regards,
Amit
 

David.Roth 17 comments Joined 05/09
17 Mar 2013

Amit,
Most of the time the actual demographics of compressed data change infrequently, i.e. flags continue to have Y/N values, etc.
There are two situations that cause clients to re-evaluate the compression values:
First, when the structure or contents are changing due to development. Obviously, if you are adding substantially new data, a re-check of the compression values is probably called for.
Second, most clients set up a schedule for "re-checks". Most often, I have seen this scheduled once or twice a year for tables that are experiencing organic growth, i.e. transaction-type tables. Obviously, reference or static tables will not need this type of re-checking and would only fall under the first category.
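When a re-check is due, the usual starting point is a simple frequency profile of each candidate column; a minimal sketch with placeholder table and column names:

SELECT COLUMN1,
       COUNT(*) ROW_COUNT
  FROM AAAA.TABLE_A
 GROUP BY 1
 ORDER BY 2 DESC
;

The most frequent values, beyond what is already in the COMPRESS list, are the candidates for an updated value list.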
Regards,
Dave

amittera 10 comments Joined 12/09
19 Mar 2013

Thanks Dave,
I proposed MVC to my client for Teradata 12 tables. As per the analysis, I got around 800 GB (approx.) of savings on 2.3 TB of tables, with both table-level and column-level savings. After much investigation of MVC, the client has come up with a concern as below:
 
Concern: Some of these columns are derived from bases that can change, e.g. pricing strategies, cost price changes, tax (VAT).
If any of these bases change, the profile of the data in the tables will change, which means that a totally new set of 'ideal' compression values would apply.
As per my understanding, if the column values are more volatile, as with derived columns, then we do not suggest applying compression.
But if the column values are more duplicated and static, then apply compression to save the space. On the whole, though, I am still confused: even though the columns are derived, I somehow still got savings for that table, around 30%-40%.
Can you please advise if there is a way we can apply compression on tables with some or all derived columns, as I can see much saving.
 
Regards,
Amit

Amit Saxena
Teradata Consultant

amittera 10 comments Joined 12/09
02 Apr 2013

Hi Dave,
 
Can you please help me with the above request?
 
Regards,
Amit

Amit Saxena
Teradata Consultant

David.Roth 17 comments Joined 05/09
02 Apr 2013

I am not sure what you are asking for here. Either the values are persistent enough (months or more) or they are not, and in the latter case I would agree not to compress those values.
Dave

07 Oct 2014

Hi Dave,
We are in a phase of implementing MVC across QA and PRD. Our MS team recommends doing the analysis directly on the target table, but I am suggesting they load 60% of the source table (a big table) data into the compression database (which we are going to use for the compression activity), so that we can avoid the ETL window and CPU resource consumption and can run the analysis on this table during off hours. However, the MS team's position is:

  1. There is no need to copy the data to another database or environment.
  2. 100% of the data gives a more accurate analysis.
  3. The MVC analysis needs only an access lock, hence the tables will be completely available to users.

What would you suggest?
 
