mercredi 3 décembre 2014

Can an EXISTS clause be improved when the condition is checked for the same value multiple times?


Assume that I'm modelling a Blog engine. Blogs have n Posts. Each User can be granted a role on different Blogs (Admin, Approver or Editor stored as enums/ints). Each Post can have a State (Draft, PendingApproval, Published stored as enums/ints). Here's how it's modeled:



Blogs
-----
Id (PK, int, not-null)
Name (varchar, not-null)

Users
-----
Id (PK, int, not-null)
Name (varchar, not-null)

Posts
-----
Id (PK, int, not-null)
BlogId (FK, int, not-null)
State (int, not-null)
Title (varchar, not-null)
Body (varchar, not-null)
(and there is an index on (BlogId, State))

UserBlogRoles
-------------
Id (PK, int, not-null)
UserId (FK, int, not-null)
BlogId (FK, int, not-null)
Role (int, not-null)
(and there is a unique index on (UserId, BlogId))


And here are the enums:



UserBlogRole
------------
Editor = 0
Approver = 1
Admin = 2

PostState
---------
Draft = 0
PendingApproval = 1
Published = 2


As an Approver, I want to view a list of all Posts that have the State = PendingApproval for all blogs that I have at least the Approver role.


Here's the query:



declare @approverRole int = 1;
declare @pendingApprovalState int = 1;
declare @myUserId int = 12345;

select
p.Id, p.BlogId, p.State, p.Title, p.Body
from Posts as p
where
p.State = @pendingApprovalState and
exists (
select r.Id
from UserBlogRoles as r
where r.Role >= @approverRole and r.BlogId = p.BlogId and r.UserId = @myUserId
)


This works as expected. But, I've been wondering about the performance of the exists(). If the same values are passed to the exists() multiple times, is it cached? For example, there could be many pending approval posts for a particular blog. Will it need to check the permissions for each of those posts and their associated blogs?


I've been thinking of changing it to a where in (query) instead:



select
p.Id, p.BlogId, p.State, p.Title, p.Body
from Posts as p
where
p.State = @pendingApprovalState and
p.BlogId in (
select r.BlogId
from UserBlogRoles as r
where r.Role >= @approverRole and r.UserId = @myUserId
)


But I'm not sure if that'd be better or worse performance? Does SqlServer execute the "in" query once and then use those values to check against the BlogId?


I guess this could be changed into a join too.


The execution plans of both showed the same (50% index scan on Posts PK and 50% index scan on UserBlogRoles PK) I think because my sample database doesn't have any data in it.





Aucun commentaire:

Enregistrer un commentaire