lundi 2 mars 2015

MySql GROUP BY on two large joined tables


I have two tables:


Person

PersonID (PK)

GroupID

DeletedDate


PersonData

PersonID (PK)

FieldID (PK)

Value


The fields are custom-created; hence the schema. I am trying to run a histogram-type query to discover the range of values for a given FieldID together with a count of the times each value occurs. Here is the query:



SELECT Value, count(*)
FROM Person p
INNER JOIN PersonData pd ON p.PersonID=cd.PersonID
WHERE p.DeletedDate IS NULL AND GroupID=‘cAPHSWovx9d2yaN’


We use GUIDs for the GroupIDs.


Performance is fine for small groups but degrades rapidly. The group in question has 5 million records in table Person and an order of magnitude beyond that in PersonData. The above query takes an hour to run. It would be acceptable to get it down to a few minutes.


I created an index on PersonData on PersonID/FieldID/Value in that order. MySql generally chooses not to use it when I run explain. (Actual results of explain vary based on FieldID.) I also have an index on Person on GroupID/DeletedDate.


I have two questions:




  • On the explain I always get a “Using where; Using index; Using temporary; Using filesort” on the Person table. Why does MySql need to sort the Person table before joining to PersonData?




  • Why does MySql generally decline to use the PersonID/FieldID/Value index? Is there a better index I could use? I assumed I needed the PersonID in the index to help with the join.







Aucun commentaire:

Enregistrer un commentaire