I have two tables:
Person
PersonID (PK)
GroupID
DeletedDate
PersonData
PersonID (PK)
FieldID (PK)
Value
The fields are custom-created; hence the schema. I am trying to run a histogram-type query to discover the range of values for a given FieldID together with a count of the times each value occurs. Here is the query:
SELECT Value, count(*)
FROM Person p
INNER JOIN PersonData pd ON p.PersonID=cd.PersonID
WHERE p.DeletedDate IS NULL AND GroupID=‘cAPHSWovx9d2yaN’
GROUP BY Value
We use GUIDs for the GroupIDs.
Performance is fine for small groups but degrades rapidly. The group in question has 5 million records in table Person and an order of magnitude beyond that in PersonData. The above query takes an hour to run. It would be acceptable to get it down to a few minutes.
I created an index on PersonData on PersonID/FieldID/Value in that order. MySql generally chooses not to use it when I run explain. (Actual results of explain vary based on FieldID.) I also have an index on Person on GroupID/DeletedDate.
I have two questions:
On the explain I always get a “Using where; Using index; Using temporary; Using filesort” on the Person table. Why does MySql need to sort the Person table before joining to PersonData?
Why does MySql generally decline to use the PersonID/FieldID/Value index? Is there a better index I could use? I assumed I needed the PersonID in the index to help with the join.
Edit Here is the relevant portion of the explain:
id select_type table type key key_len ref rows Extra
1 SIMPLE p ref deleted_person 67 const,const 3476105 Using where; Using index; Using temporary; Using filesort
1 SIMPLE pd ref person_field_val 62 p.PersonID 10 Using index
Edit 2 Forgot the GROUP BY in the query above (face palm)
Aucun commentaire:
Enregistrer un commentaire