While reading the official PostgreSQL documentation for version 9.0, I came across an interesting trick that performs better than CLUSTER for big tables:
The CLUSTER command reorders the original table by scanning it using the index you specify. This can be slow on large tables because the rows are fetched from the table in index order, and if the table is disordered, the entries are on random pages, so there is one disk page retrieved for every row moved. (PostgreSQL has a cache, but the majority of a big table will not fit in the cache.) The other way to cluster a table is to use:
CREATE TABLE newtable AS
    SELECT * FROM table ORDER BY columnlist;

which uses the PostgreSQL sorting code to produce the desired order; this is usually much faster than an index scan for disordered data. Then you drop the old table, use ALTER TABLE ... RENAME to rename newtable to the old name, and recreate the table's indexes. The big disadvantage of this approach is that it does not preserve OIDs, constraints, foreign key relationships, granted privileges, and other ancillary properties of the table — all such items must be manually recreated. Another disadvantage is that this way requires a sort temporary file about the same size as the table itself, so peak disk usage is about three times the table size instead of twice the table size.
The problem is that this suggestion no longer appears in the official documentation for versions after 9.0.
My question is whether this trick is still valid for 9.1, 9.2, 9.3 and 9.4, because I'm stuck with a CLUSTER
operation over two big tables (one of ~750 million rows and the other of ~1650 million rows), and the average disk read/write speed is 3 MB/s due to the CLUSTER algorithm explained in the official docs. It's a slow process over big tables, so I'd like to avoid it by using the "create a table ordered by the indexed column" trick instead. This would save me days of DB processing.