I have two primary instances of SQL Server, one for OLTP and one for reporting. The data warehouse for reporting is basically a Type 2 SCD: every table has DWDateStart and DWDateEnd columns and the row with the current data has DWDateEnd = '12/31/9999'. An SSIS package gets the new data from the OLTP instance using Change Tracking (whole row, not just changed values) and copies the data into staging tables in the reporting instance. Then a stored procedure executes in which the rows to be updated get a new DWDateEnd and the new values (i.e. current) from the new or updated rows get inserted. Here's an example:
UPDATE A
SET DWDateEnd = GETDATE()
FROM DWOrders A JOIN StagingOrders A1 ON A.UserId = A1.UserId AND DWDateEnd = '12/31/9999'
INSERT INTO DWOrders (OrderID, ApplicationId, UserId)
SELECT p.OrderID, p.ApplicationId, p.UserId FROM [StagingOrders] p WHERE StagingIsDeleted = 0
Here's my problem: When 30,000 rows are updated, for example, the update and insert from staging to DW can take over half an hour, whereas under normal load the whole process finishes in 40 seconds.
This happens even with tables that aren't very wide. There are lots of indexes on these tables but in my testing it seems like removing the indexes doesn't help much. I see tons of disk waits (PAGEIOLATCH_EX and SH) but my SAN is good; not SSD-type amazing, but very good. I can't redesign the database aside from trying partitioning but I worry about partitioning on DWDateEnd since a lot of rows would be moving in and out of the "active" partition all the time.
Any thoughts on what I should do to get my data updated more quickly from staging to DW?
Aucun commentaire:
Enregistrer un commentaire