Your proposed solution wouldn't guarantee you see a complete set of data either. An access-lock read is a dirty read, i.e. read uncommitted: you can see changes to data before the transaction has completed (successfully or not). A multi-statement request only means that every statement must complete successfully before the transaction is committed; with an access lock you can still see the changes while they are happening.
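To make that concrete, here is a rough sketch (reusing the table name from your example) of the kind of read that still sees in-flight data. A reader that asks for an ACCESS lock is not blocked by the writer's WRITE lock, so even when the DELETE and INSERT are submitted as one multi-statement request it can return the table after the DELETE has been applied but before the INSERT has finished:

/* Hypothetical reader running while the upstream multi-statement
   request is still in flight. The ACCESS lock is granted despite
   the writer's WRITE lock, so the count can reflect the DELETE
   without the matching INSERT. */
LOCKING TABLE DBNAME.TABLENAME FOR ACCESS
SELECT COUNT(*)
FROM DBNAME.TABLENAME;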
One option is a table-swapping approach to the ETL. You keep two copies of the table in question (A and B): A is used by the job that updates the data and B by the job that reads it. Once table A has been fully loaded, you swap the tables so that the reading job now reads from A and the writing job writes to B. With this approach you never have to worry about getting incomplete or in-process data.
However, there are some complexities. You'll need a process to re-sync the tables after they have been swapped, so that the next time your write job runs it has the correct set of data to operate on. You'll also need a process to handle swapping the tables (either through a view change or a rename).
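For the re-sync step, a minimal sketch could look like the following, assuming the freshly loaded copy is DBNAME.TABLENAME_A and the stale copy is DBNAME.TABLENAME_B (both names are placeholders):

/* After the swap: bring the stale copy up to date so the next
   load starts from current data. Table names are hypothetical. */
DELETE FROM DBNAME.TABLENAME_B ALL;

INSERT INTO DBNAME.TABLENAME_B
SELECT *
FROM DBNAME.TABLENAME_A;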
One way to make this easier is to use views to hide the two tables. That way your ETL jobs don't need to be aware of which table is which: they point to a view, and the view definition is changed to point at the other table.
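With the same placeholder names, the swap itself can then be a single REPLACE VIEW that repoints the read view at the freshly loaded copy (the LOCKING modifier mirrors the access-lock views you already use):

/* Writer has just finished loading DBNAME.TABLENAME_A;
   repoint the read view from TABLENAME_B to TABLENAME_A.
   All object names here are placeholders. */
REPLACE VIEW DBNAME.V_TABLENAME AS
LOCKING ROW FOR ACCESS
SELECT *
FROM DBNAME.TABLENAME_A;

Alternatively, the swap can be done with a pair of RENAME TABLE statements if you would rather keep a single physical table name, but the view approach is usually the least disruptive.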
Many of our jobs need to wait for upstream tables that take a really long time to complete. We have SLAs to meet for our jobs and are OK with data as of yesterday in these upstream tables.
Now, the problem is that the upstream tables have their load logic written as a DELETE followed by an INSERT (in the same BTEQ script):
delete from DBNAME.TABLENAME
where UPI_COLUMNNAME in
( select UPI_COLUMNNAME
from DBNAME.INCR_DELTA_TABLENAME
)
;
insert into DBNAME.TABLENAME
(
COLUMN1,
COLUMN2,
etc.
)
This presents a serious risk: if the script that refreshes the upstream table runs concurrently with our ETL process, we may not see a chunk of records in the table (rows that have been deleted but not yet re-inserted). To avoid this, is it OK to re-write the upstream scripts as below, i.e. turn them into a multi-statement request?
delete from DBNAME.TABLENAME
where UPI_COLUMNNAME in
( select UPI_COLUMNNAME
from DBNAME.INCR_DELTA_TABLENAME
)
; insert into DBNAME.TABLENAME
(
);
Could anyone let us know whether this would solve the problem? Please also suggest any other ways to mitigate it.
P.S.: We are accessing the upstream tables through views, which place an ACCESS lock on the rows.