I have rows of the following type:
identifier (string), type (int), date (date), count (int), extra info
Rows have a unique constraint on (identifier, type, date)
.
I want to perform the following online queries:
- all entries, ordered by date
- all entries for identifier, ordered by date
- all entries for type, ordered by date
These are not log files, but they are similar. Data is inserted in batches, not in a continuous stream. It is not inserted in any particular order. I want a database for querying this data.
I had some success using MySQL MyISAM tables, but now I've got to 300 million entries it's getting unwieldy and I expect to reach at least 800 million.
The nature of the queries mean that I need indexes on identifier
, type
and to sort by date
.
I've considered perhaps using MongoDB and de-normalising so I'm storing the data once per query type (i.e. tables for per-identifier
, per-type
).
The only snag is that I need to enforce the uniqueness constraint, which may involve over-writing duplicate values.
Aucun commentaire:
Enregistrer un commentaire