My question is about 3 tables that stores data of a single entity, and there is a report generation feature that refers these tables and calculate COUNT
and AVG
information.
There are three tables: review
, rating
, and rating_values
.
Review table
CREATE TABLE IF NOT EXISTS `review` (
`review_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Unique review ID of this review.',
`nid` int(10) unsigned NOT NULL,
`sub_target` int(10) unsigned NOT NULL,
`timestamp` int(10) unsigned NOT NULL DEFAULT '0' COMMENT,
`score` float DEFAULT NULL COMMENT,
PRIMARY KEY (`review_id`),
KEY `nid` (`nid`),
KEY `uid` (`uid`),
KEY `nid_uid` (`nid`,`uid`),
KEY `score` (`score`),
KEY `timestamp` (`timestamp`),
);
Review table acts as the the primary table. Each review
can have 0 or more (we have a limit of 18 at the moment in application level) rating
values. A rating
value is something like "Food", "Location", etc for a restaurant review.
Rating table
'rating' (
`review_id` int(10) unsigned NOT NULL, //foreign key to {review}.review_id
`type` varchar(255) NOT NULL,
`score` float DEFAULT NULL,
KEY `review_id` (`review_id`),
KEY `score` (`score`)
);
In this table, type
is a machine name of the "food", "location", etc I mentioned above. There is a foreign key to the base table that stores these rating type information. In our application, each rating can have multiple rating values that count towards the main rating. Say, there is "food", and there can multiple sub questions such as "food - appearance", "food - taste". These values can be 0 or more. If there are more than 1 sub question, this score
field contains the average of the sub questions (please see the table below).
Rating value table
'rating_value' (
`review_id` int(10) unsigned NOT NULL,// foreign key to the {review}.review_id
`type` varchar(255) NOT NULL,
`sid` int(10) unsigned DEFAULT NULL,
`score` int(10) unsigned DEFAULT NULL,
KEY `review_id` (`review_id`),
KEY `type` (`type`)
)
This is where I expect most of the data would go into. Each sub question has an sid
that is referred here.
A typical review would add rows like this:
Review:
review_id | nid | sub_target | timestamp | score
1 | 583 | 786| 1388167200| 4.32
Rating (please note how the two floats add up to 4.32 above):
review_id|type |score
1 |food |2.10
1 |location |2.22
Rating value:
review_id|type |sid|score
1 |food |1 |2.00
1 |food |1 |2.30
1 |location |2 |2.12
1 |location |3 |2.22
1 |location |3 |2.22
- We will be generating reports with
COUNT
andAVG
onrating_value
tables. However, there will be always aJOIN
withreview
table, and the filter would bereview.nid
, which is indexed. - When accessing reviews, application will almost every time use the
review_id
, which are either primary or foreign keys in all tables. - Sometimes we need to
CAST
thescore
field to nearest integer.
Admittedly I'm not an expert in databases, but I spent a lot of time normalizing the above, and I ended up with the above database after all three steps. I'm eager to learn, so if you have suggestions to above table structure, please don't hesitate to mention them. Note that all review
and rating
tables will contain some extra data that I didn't mention here. the rating
table has a text field to allow users to write a text review. That's why rating values and rating
are separated.
The reports we generate are never going to run on all rows. Just a small set of rows filtered by the nid
(plus additional filters).
My questions are:
Is it common to spread data of a single user operation across multiple rows? I expect the
rating_value
table to grow almost 10 times (in record count) than thereview
table.While (inner)
JOIN
ing tables, and calculating averages, can database engines filter down the eligible rows using the indexes first?If you have any thoughts about the above schema, please don't hesitate to comment on.
Aucun commentaire:
Enregistrer un commentaire