vendredi 27 mars 2015

Is it possible to get seek based parallel plan for distinct/group by?


An example from this question shows that SQL Server will choose a full index scan to solve a query like this:



select distinct [typeName] from [types]


Where [typeName] has a non-clustered, non-unique ascending index on it. His example has 200M rows but only 76 unique values. It seems like a seek plan would be a better choice with that density (~76 multiple binary searchs)?


His case could be normalized but the reason for the question is that I really want to solve something like this:



select TransactionId, max(CreatedUtc)
from TxLog
group by TransactionId


There is an index on (TransactionId, MaxCreatedUtc).


Re-writing using a normalized source (dt) does not not change the plan.



select dt.TransactionId, MaxCreatedUtc
from [Transaction] dt -- distinct transactions
cross apply
(
select Max(CreatedUtc) as MaxCreatedUtc
from TxLog tl
where tl.TransactionId = dt.TransactionId
) ca


Running just the CA subquery as a Scalar UDF does show a plan of 1 seek.



select max(CreatedUtc) as MaxCreatedUtc
from Pub.TransactionLog
where TransactionID = @TxId;


Using that Scalar UDF in the original query seems to work but loses parallelism (known issue with UDFs):



select t.typeName,
Pub.ufn_TransactionMaxCreatedUtc(t.TransactionId) as MaxCreatedUtc
from Pub.[Transaction] t


Plans for Cross apply, Just the UDF, Using UDF


Rewriting using an Inline TVF reverts it back to the scan based plan.


From answer/comment @ypercube:



select TransactionId, MaxCreatedUtc
from Pub.[Transaction] t
cross apply
(
select top (1) CreatedUtc as MaxCreatedUtc
from Pub.TransactionLog l
where l.TransactionID = t.TransactionId
order by CreatedUtc desc
) ca


Plan using top/order


Plan looks good. No parallelism but pointless since so fast. Will have to try this on a larger problem sometime. Thanks.





Aucun commentaire:

Enregistrer un commentaire