All Forums Database
tsk 13 posts Joined 10/11
15 Nov 2011
ROW LEVEL COMMITS IN TERADATA

Hi,

 

I have a requirement in my project that the view has to be available to users 24x7 with the latest data.

 

I am maintaining type 1 records in my base table (from which the view is built), and the table gets upserted via a BTEQ every time we receive data from the source.

My question here is:

Let's say I have a record in my base table with primary key P1, first name, last name, and balance:

P1 JACK MARY 100

 

On my next batch load, if I get a record matching the same primary key P1 plus 1 million new records, then I have to update the P1 record and insert the rest as new records.

Will the one updated row show up in the view only after my whole BTEQ runs, or will it be reflected as and when the row is processed? Another way of asking: is it a row-level commit or a batch-level commit?

Essentially, I do not want users who are querying for primary key P1 to see old data just because the other 999,999 records are still being inserted; this is a real-time system and users expect the latest data.

 

Please advise

 

 

 

robpaller 159 posts Joined 05/09
15 Nov 2011

If you are operating in Teradata mode and allowing dirty reads of the table (access locking or row access locking), your users should be able to see the data as the table is being loaded.
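
For example, a dirty read can be requested directly in the user SQL with a locking modifier. This is only a sketch; the placeholder object names and the literal key value just follow the example in the original post:

LOCKING ROW FOR ACCESS     /* sketch only: object names are placeholders */
SELECT P1, FIRSTNAME, LASTNAME, BALANCE
FROM <DATABASENAME>.<BASETABLENAME>
WHERE P1 = 'P1';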

 

Jimm 298 posts Joined 09/07
15 Nov 2011

If the data is loaded into a load table and then copied into your base table using BTEQ, then the base table will be locked for writing during the upsert operation and all the updates committed on completion.

During the whole of this operation the data will be locked out from other runs (including user queries) unless you use access locking within your views or directly from your user SQL. If you do use access locking, then you may get inconsistent data, including two rows (before and after image) or no rows. The chance is very slight, but a one in a million chance is a real risk when you are doing a million operations!

To prevent it, do not use access locking on sensitive tables, and use TPump to do the upsert operation. That is the main tool which will easily give you the low-level commit granularity you require. It means that you have to do the load from an external file directly into the target (in your terminology, base) table.

You could do a similar thing with BTEQ using an IMPORT operation on an external file - this will give you the same level of control, but at greater development cost and much greater operational overhead.
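
As a rough illustration of the BTEQ IMPORT route, a script along the following lines applies the upsert one input record at a time using Teradata's atomic UPDATE ... ELSE INSERT form. This is only a sketch: the logon placeholders, file name, field layout and object names are assumptions, not taken from this thread.

.LOGON <TDPID>/<USERID>,<PASSWORD>;
.IMPORT DATA FILE = upsert_input.dat;
.REPEAT *
/* Sketch only: field layout and object names are assumptions. */
USING (in_p1  VARCHAR(10),
       in_fn  VARCHAR(30),
       in_ln  VARCHAR(30),
       in_bal DECIMAL(12,2))
UPDATE <DATABASENAME>.<BASETABLENAME>
   SET FIRSTNAME = :in_fn,
       LASTNAME  = :in_ln,
       BALANCE   = :in_bal
 WHERE P1 = :in_p1
ELSE INSERT INTO <DATABASENAME>.<BASETABLENAME> (P1, FIRSTNAME, LASTNAME, BALANCE)
     VALUES (:in_p1, :in_fn, :in_ln, :in_bal);
.QUIT;

Each imported record drives one upsert request, so each row is committed (and visible) as soon as it is applied - the row-level granularity being discussed.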

 

tsk 13 posts Joined 10/11
16 Nov 2011

Hi Rob,

Even if I use row access locking, wouldn't the entire transaction (which is the UPSERT in my case) in the BTEQ have to complete before the data is even committed into the base table?

I presume atomicity doesn't allow partial commits within a transaction (the UPSERT query in my BTEQ)? If that's true, how could the latest rows be reflected in the view while the batch process is still running?

 

Thanks,

Kaushik

 

tsk 13 posts Joined 10/11
16 Nov 2011

Hi Jimm,

I would need to run a lot of data quality checks and other operations on the file I receive.

 

So I thought of the following structure:

 

1) Load the file into a staging table using MultiLoad/FastLoad.

2) Run a data quality BTEQ and load the good records into a temporary table.

3) Run a BTEQ to load from the temporary table into the base (target) table (a rough sketch of this step follows below).

 

So I am not sure if I can use TPump, and for such huge volumes I don't think TPump is a good utility anyway.
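
For what it is worth, the table-to-table load in step 3 above is often written as a classic two-statement upsert inside one explicit transaction; the sketch below assumes placeholder object names and the P1/FIRSTNAME/LASTNAME/BALANCE columns from my example:

BT;   /* explicit Teradata-mode transaction: changes commit only at ET */

UPDATE tgt
FROM <DATABASENAME>.<BASETABLENAME> AS tgt,
     <DATABASENAME>.<TEMPTABLENAME> AS src
SET FIRSTNAME = src.FIRSTNAME,
    LASTNAME  = src.LASTNAME,
    BALANCE   = src.BALANCE
WHERE tgt.P1 = src.P1;

INSERT INTO <DATABASENAME>.<BASETABLENAME> (P1, FIRSTNAME, LASTNAME, BALANCE)
SELECT src.P1, src.FIRSTNAME, src.LASTNAME, src.BALANCE
FROM <DATABASENAME>.<TEMPTABLENAME> AS src
LEFT JOIN <DATABASENAME>.<BASETABLENAME> AS tgt
       ON tgt.P1 = src.P1
WHERE tgt.P1 IS NULL;

ET;

As Jimm notes, the base table would stay write-locked until the ET - which is exactly my concern.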

 

Please correct me if I am wrong.

Thanks,

Kaushik

 

 

robpaller 159 posts Joined 05/09
16 Nov 2011

Kaushik,

I would defer to Jimm's more complete answer. Good luck with your testing.

Jimm 298 posts Joined 09/07
17 Nov 2011

If your BTEQ job does any sort of table-to-table copy (Insert/Update/Delete/Upsert), then your target is locked for the duration of the copy. Access locking will allow others to read the table, but at the risk of momentary inconsistencies.

The easiest way is probably a carry-forward approach: upsert an identical copy of the table and switch your views to the copy after the updating is complete. Do not try renaming the copy tables - if you do that, there will be a point in time when the target table does not exist under the correct name! And make sure any dictionary updates (REPLACE VIEW, etc.) are kept to a minimum.

With this method, there will be a brief moment when user access is delayed if a user query is running while your update is trying to replace the view. The REPLACE VIEW will wait until the first query has finished with that view.

A second (high development/high availability) approach is to load to staging and quality check as you suggest above, but then extract two files - one for updates and one for inserts. Use TPump to load the updates and MultiLoad to load the inserts. (MultiLoad is still deemed unsuitable for volume updates by many, but unless you are talking of tens of millions of records, it is not significantly worse than FastLoad and Insert/Select.)

Give your manager the development estimate for option 2, and he may well reconsider the damage of momentary inconsistency!

tsk 13 posts Joined 10/11
17 Nov 2011

Hi Jimm,

Thanks for the response. I am struggling a little to understand the carry forward approach.

I have the following understanding from your approach:

To start with, let's say I have a staging table T1, a target table T2, and an identical target table T3.

Now I have a view definition in place defined on T2. When I run my BTEQ, I would be upserting table T2 while the view still points to T2.

Once the UPSERT is done, I run another job to upsert the identical table (T3) and switch my view to point to T3 once that job or BTEQ has run.

On the next run, when I execute my upsert BTEQ for table T2, my view definition still points to T3. Then, when I run my job or BTEQ to upsert T3 from T2, we again end up with the same problem: a BTEQ is running against T3 while the view is still pointing to T3. Please let me know if I am missing something here.

 

Regards,

Kaushik

 

Jimm 298 posts Joined 09/07
17 Nov 2011

Not quite.

You need two sets of views - one for loading, another for user access.

You have a user view pointing to T3 in your scenario and you run the upsert into T2 via the load view.

At this point, users are reading T3 while you upsert T2.

At the end of the process, you switch the user view(s) to T2 and load view to T3.

You then run the upsert again - this time it updates T3. (If you are updating a high percentage of the table, it may be faster to Delete All and Insert/ Select - you will need to benchmark this if the percentage of records updated is over c. 10%.)

The next night (assuming a daily update) users are reading T2 while you upsert T3 with the next data files... and the loop repeats.
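
A minimal sketch of the view flip at the end of a load, using the T2/T3 tables from this scenario (the view names and column list are assumptions):

/* Sketch only: view names and columns are placeholders. */
REPLACE VIEW <DATABASENAME>.CUSTOMER_USER_VW AS
SELECT P1, FIRSTNAME, LASTNAME, BALANCE
FROM <DATABASENAME>.T2;     /* users now read the freshly upserted copy */

REPLACE VIEW <DATABASENAME>.CUSTOMER_LOAD_VW AS
SELECT P1, FIRSTNAME, LASTNAME, BALANCE
FROM <DATABASENAME>.T3;     /* the next load run upserts the other copy */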

anjanpaul 6 posts Joined 12/10
23 Nov 2011

Hi Kaushik,

You may try the steps below:

STEP 1) Create a view of the table as below:

REPLACE VIEW <DATABASENAME>.<BASETABLENAME>_VW AS
LOCKING ROW FOR ACCESS
SELECT
     P1,
     FIRSTNAME,
     LASTNAME,
     BALANCE
FROM
     <DATABASENAME>.<BASETABLENAME>;

STEP 2) Use a MERGE statement on this view in the BTEQ, as below:

MERGE INTO <DATABASENAME>.<BASETABLENAME>_VW TGT
USING
(
<put here the SELECT statement that is used to INSERT or UPDATE the base table>
) SRC
ON TGT.P1 = SRC.P1
WHEN MATCHED THEN UPDATE SET
     FIRSTNAME = SRC.FIRSTNAME,
     LASTNAME  = SRC.LASTNAME,
     BALANCE   = SRC.BALANCE
WHEN NOT MATCHED THEN INSERT
     (P1, FIRSTNAME, LASTNAME, BALANCE)
     VALUES (SRC.P1, SRC.FIRSTNAME, SRC.LASTNAME, SRC.BALANCE);

STEP 3) Have end users query the VIEW, not the base table.

AbeK 24 posts Joined 08/09
06 Dec 2011

For a consistent read, you could use a READ-lock view with the MERGE statement, and that should solve the problem. MERGE = Teradata mode = row-hash lock for write; READ view = committed data.
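
A minimal sketch of such a committed-read view, assuming the same placeholder names as the MERGE example above:

/* Sketch only: object names are placeholders. */
REPLACE VIEW <DATABASENAME>.<BASETABLENAME>_READ_VW AS
LOCKING ROW FOR READ
SELECT P1, FIRSTNAME, LASTNAME, BALANCE
FROM <DATABASENAME>.<BASETABLENAME>;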

Paul Johnson 12 posts Joined 01/10
15 Dec 2011

Is there an inherent conflict between a 'real time system' and 'huge volumes'?

tsk 13 posts Joined 10/11
25 Jan 2012

Thanks all for the comments - they were really helpful. I am trying out the carry-forward and other approaches suggested here to figure out the optimal solution.

ToddAWalter 316 posts Joined 10/11
25 Jan 2012

There are a couple of comments above about "momentary inconsistencies". These are not correct for normal in-place updates of a row - the normal case of an update or upsert setting a new value for a field in the row. For all in-place updates being read with an access lock, you will see either the row before the update or the row after the update, depending on when the query reaches that position.

The only exception is when the primary index is being updated, which results in a plan that first deletes the old rows and then inserts the new ones. If this type of update is performed, then possibly many rows will appear to be missing, depending on the timing of the read. I recommend using an exclusive lock on updates that change the PI in order to keep access-lock reads from seeing inconsistent data.

The other comments are correct - as soon as an insert or update row operation is performed, that row is visible to access lock queries.
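
For example, a PI-changing update could be wrapped in an explicit exclusive locking modifier; the object names and key values below are placeholders:

/* Sketch only: object names and key values are placeholders. Updating the PI
   is internally a delete plus an insert, so the exclusive lock keeps
   access-lock readers from seeing the row "missing" in between. */
LOCKING TABLE <DATABASENAME>.<BASETABLENAME> FOR EXCLUSIVE
UPDATE <DATABASENAME>.<BASETABLENAME>
SET P1 = 'P1_NEW'
WHERE P1 = 'P1';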

ToddAWalter 316 posts Joined 10/11
25 Jan 2012

No - there is no conflict between real time and huge volumes. It is all a matter of capacity planning and service level planning, as long as you have a fully scalable technology. Call detail at 10-minute latency, market basket at 15-minute latency, web logs at 5-minute latency and stock ticks at a few seconds' latency are all in production on Teradata.
