
Why don't you use ALTER TABLE to alter a table?
To add or drop a column or modify the list of compressed values of an existing table is a quite expensive operation. For a large table it might result in a huge amount of CPU and IO usage and a loooooooong runtime. This blog discusses the pros and cons of the different ways to do it.
Alter Table vs. Insert Select vs. Merge Into
As always in SQL one got multiple choices to reach the same goal: modify a table directly or move the data to a new table. The former is ALTER TABLE (Alter), the latter INSERT SELECT (InsSel) or its less well-known variation MERGE INTO (Merge).
Let's start with a list of pros and cons (red = negative, green = positive):
ALTER TABLE | INSERT SELECT | MERGE INTO | |
---|---|---|---|
Needs Transient Journal? | no | no | no |
ABORT possible? | no | yes (fast) | yes (fast) |
Rollback during system restart? | no | yes (fast) | yes (fast) |
LOCK on source table |
exclusive |
read |
read |
Spoolspace used |
no |
yes, same as source |
no, when PI doesn't change |
Additional Permspace used |
low, 2 cylinders per AMP |
high, same as source |
high, same as source |
Works on a table copy? | no | yes | yes |
Must Create/Drop/Rename Table? |
no |
yes |
yes |
Must recreate Secondary/Hash/Join Indexes Foreign Keys/Statistics/Comments Access Rights? |
no | yes | yes |
Supports changing Primary Index/Partitioning? |
no | yes | yes |
You can easily spot that InsSel and Merge are quite similar, but Alter is usually different.
The only common ground is the Transient Journal, all three don't use it (of course there are some entries indicating there some work going on, but the actual rows are not journaled). Due to that fact InsSel and Merge can be easily aborted and will rollback quite fast (just deleting all rows in the target table), but once Alter started it must finish, there's no way to abort it. Even a system shutdown can't stop it, it will simply continue after the restart. Some will consider this as positive others as negative :-)
The most important difference is the availability during the restructure process: Both InsSel and Merge apply a read lock allowing concurrent read access while Alter needs an exclusive lock blocking any access to the target table. That's the main reason why Alter is not used in most environments. Additionally before TD13 there was a table level write lock on dbc.AccessRights which was held throughout the whole process easily blocking other sessions. Yet in current releases this lock duration has been greatly reduced, now other requests will only be blocked for a short period. Some additional RowHash locks on system tables usually don't interfere with other requests, but might block backups.
Alter doesn't use Spool, it just moves blocks on a cylinder level, Merge directly merges the source rows into the target table (when source and target share the same [P]PI). But InsSel always needs to spool the source data, of course this is especially bad for large tables when explain shows "The result spool file will not be cached in memory".
Keeping a copy of the original table is often regarded as an advantage of Merge and InsSel ("just in case"), but when you're constraint on permspace you might prefer Alter's low overhead of a few megabytes per AMP.
However the biggest advantage of Alter is its simplicity, just submit "ALTER TABLE tab ADD new_col int, ADD existing_col COPRESS ('bla');", that's it.
Compare this to all the additional steps needed for InsSel or Merge. It's not only CREATE/DROP/RENAME, all those COMMENTs, GRANTs, COLLECT STATS must be scripted before and then reapplied, too. Maintaining Referential Integrity might be complicated when the table is referenced in a Foreign Key. And to speed up processing the target table will be created with the Primary Index only, any additional index must be recreated subsequently.
Resource usage and runtime
I'm not showing exact numbers because your mileage may vary, but for tables without secondary indexes the CPU/IO scoring is usually:
- Alter Table
- Merge Into
- Insert Select
In my test cases (no change of [P]PI) Merge needed almost twice the CPU and IO of an Alter and InsSel added another 20%.
When Secondary/Hash/Join indexes exist InsSel gets closer to Merge but the gap to Alter increases drastically: Alter still needs to modify only the base rows instead of re-building all the indexes.
Runtime differences should be similar to CPU/IO, but they will vary greatly amongst systems due to different bottlenecks and you should run some tests on your own system.
Conclusion
I would strongly recommend implementing Alter Table, at least start considering it. If you're concerned about availability you should bear in mind that this process will probably be scheduled out of business hours anyway.
When you want the safeness of a copy of the old table you should definitely prefer Merge Into over good ol' Insert Select, only if you need to change the PI or partitioning of a table there's not much difference between both.
This is something we struggle with deciding how to roll changes into tables multiple terabytes and hundreds of columns wide. Often the business would like us to keep a backup of the table until the change has been verified so I typically elect to use MERGE. One thing that I stumbled across is the ability to copy statistics from one table to another.
So my typical approach to applying table changes to a large production table is as follows:
For the most part I have the luxury that space to store a redundant copy of the table has not been an issue. It's always a gamble to determine if this table restatement is faster than simply applying the change(s) via ALTER but when a backup of the table is required it almost becomes a moot point.
Thanks for the write-up though because it helps clarify things and provide options on how to manage change within your data model.
Thnk for the useful information. I was thinking about this in the past but i never had strong arguments to use it.
Hi Rob,
thanks for your supplementary notes.
Of course COLLECT STATS FROM avoids recollecting, it was one of the TD13 features i liked best, but maybe it's less well-know.
Dieter
Dieter
Dieter, are there any concerns about physical data fragmentation when perfoming an ALTER on a large table? In past versions we have been advised against ALTER statements for this reason.
Thanks a lot dieter & rob for such a nice information.
Just want to clear one doubt in the steps mentioned by rob above-->
In case , we miss to perform the step 2 & simple execute refresh stats script in the end.
Collect stats on new_tab ;
Is it equivalent or are we missing some stats info. here.?
Thanks in advance!
Cheers!
Nishant
Hi Nishant,
when you define NO DATA the stats definitions are copied, but the actual values are set to zero rows.
Now when you re-collect the stats you need resources to do it, while a COLLECT FROM simply copies the existing stats without any overhead. So you better do step #2 any time before you drop the original table :-)
Dieter
Hello Dieter,
When you run an alter statement on a very large non empty table to extend its current ppi date range, do you expect such an alter statement to run quickly without any transient journalling or rollback concerns(in case of an abort)?
The case in point is a 3TB table and I need to extend the current date range of this table(which is 31-dec-2015 )all the way to 31-12-3999.
Regards,
Suhail
Hi Suhail,
there might be one specific case where this is slow:
NO RANGE is defined and there are lots of rows in it. All rows matching the new partitioning will automatically be moved to the new partitions and deleted from NO RANGE using Transient Journal.
Otherwise it's just a simple modification of the system tables to update to the new definition.
Dieter
ok thanks for the confirmation Dieter. I reviewed a sample explain plan and verified that too.
-Suhail
I'm starting to use ALTER's more and more, particularly where the Column naturally resides at the end of the Table.
Interesting point about the Locking level required.
Oh by the way, for NOT NULL Columns I tend to use a Default value (Blank on Character based Columns) as part of the Alter Statement.