Monday, February 9, 2015

Distributed Database Query on Homogeneous Data


We have a medium-sized database table (6.5 million rows) that drives reports, so it is read-only and does not need to support OLTP. It is partitioned on a YEAR field, which gives us an even distribution of records and quick results as long as YEAR is included as a filter. Until now, that has always been the case.


The client has now requested that virtually every column in this table be filterable. Performing ETL to pre-aggregate is not a great option because there are so many possible filter combinations in play.


In theory, I would love to distribute the data across homogeneous 'nodes', one for each year. Each 'node' would be its own machine hosting a year's worth of data. Queries might then be executed in parallel against each machine, with each returning a data table that can then be aggregated and the desired result produced.
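To make the idea concrete, here is a minimal scatter-gather sketch in Python. It is only an illustration under loud assumptions: each 'node' is simulated by an in-memory SQLite database rather than a separate machine, and the table name `facts`, the columns `region` and `amount`, and the helpers `make_node`, `query_node`, and `scatter_gather` are all hypothetical names that do not come from the real schema.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_node(year, rows):
    """Simulate one 'node': an in-memory database holding a single year of data."""
    # check_same_thread=False because the connection is created here
    # but queried later from a worker thread.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE facts (year INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [(year, region, amount) for region, amount in rows],
    )
    conn.commit()
    return conn


def query_node(conn, region):
    """Run the filter on one node and return a partial (row count, sum) pair."""
    cur = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM facts WHERE region = ?",
        (region,),
    )
    return cur.fetchone()


def scatter_gather(nodes, region):
    """Send the same query to every node in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda c: query_node(c, region), nodes.values()))
    total_count = sum(count for count, _ in partials)
    total_amount = sum(amount for _, amount in partials)
    return total_count, total_amount


# Hypothetical sample data: one node per year.
nodes = {
    2013: make_node(2013, [("east", 10.0), ("west", 5.0)]),
    2014: make_node(2014, [("east", 7.5)]),
    2015: make_node(2015, [("west", 2.0), ("east", 1.5)]),
}

result = scatter_gather(nodes, "east")
print(result)
```

Note that each node returns a count and a sum rather than an average. Partial counts and sums can be added together safely, whereas averaging per-node averages would give the wrong answer whenever the years hold different numbers of matching rows.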


Basically, I want some of the big-data-ish features for parallel processing of tasks without losing the advantages of a traditional RDBMS.


Is this a pipe dream or does a framework for something like this exist? Or should someone take me and my love for relational data out behind the shed and shoot me?




