An example from this question shows that SQL Server will choose a full index scan to solve a query like this:
select distinct [typeName] from [types]
Where [typeName] has a non-clustered, non-unique ascending index on it. His example has 200M rows but only 76 unique values. It seems like a seek plan would be a better choice with that density (~76 multiple binary searchs)?
His case could be normalized but the reason for the question is that I really want to solve something like this:
select TransactionId, max(CreatedUtc)
from TxLog
group by TransactionId
There is an index on (TransactionId, MaxCreatedUtc)
.
Re-writing using a normalized source (dt) does not not change the plan.
select dt.TransactionId, MaxCreatedUtc
from [Transaction] dt -- distinct transactions
cross apply
(
select Max(CreatedUtc) as MaxCreatedUtc
from TxLog tl
where tl.TransactionId = dt.TransactionId
) ca
Running just the CA subquery as a Scalar UDF does show a plan of 1 seek.
select max(CreatedUtc) as MaxCreatedUtc
from Pub.TransactionLog
where TransactionID = @TxId;
Using that Scalar UDF in the original query seems to work but loses parallelism (known issue with UDFs):
select t.typeName,
Pub.ufn_TransactionMaxCreatedUtc(t.TransactionId) as MaxCreatedUtc
from Pub.[Transaction] t
Rewriting using an Inline TVF reverts it back to the scan based plan.
From answer/comment @ypercube:
select TransactionId, MaxCreatedUtc
from Pub.[Transaction] t
cross apply
(
select top (1) CreatedUtc as MaxCreatedUtc
from Pub.TransactionLog l
where l.TransactionID = t.TransactionId
order by CreatedUtc desc
) ca
Plan looks good. No parallelism but pointless since so fast. Will have to try this on a larger problem sometime. Thanks.
Aucun commentaire:
Enregistrer un commentaire