I have a Table where price information is stored in with approx 13 million rows stored in a PostgreSQL 9.5 database.
CREATE TABLE public.de_tt_priceinfo (
id integer NOT NULL DEFAULT nextval('priceinfo_id_seq'::regclass),
station_id character varying(60),
recieved timestamp with time zone NOT NULL DEFAULT now(),
e5 numeric(4,3),
e10 numeric(4,3),
diesel numeric(4,3),
CONSTRAINT de_tt_priceinfo_id_pkey PRIMARY KEY (id)
);
CREATE INDEX de_tt_priceinfo_recieved_station_id_idx
ON public.de_tt_priceinfo (recieved, station_id COLLATE pg_catalog."default");
CREATE INDEX index_station_id
ON public.de_tt_priceinfo (station_id COLLATE pg_catalog."default");
From this table I need to extract the latest prices at a certain point in time with maximum performance, since I have to simulate 32 million commuters which query this table (not at once, but still).
I have a working query!
SELECT station_id, e5, e10, diesel, recieved FROM de_tt_priceinfo a
WHERE a.recieved = (SELECT MAX(recieved) FROM de_tt_priceinfo b
WHERE a.station_id = b.station_id
AND recieved <= '2014-09-25 08:45:12'::TIMESTAMPTZ)
AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')
The performance of this Query is not usable. Execution time is varying around 900ms. The result looks like this
0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000"
578C44BB-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:00:09.000000"
6F2F48A8-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 07:08:57.000000"
9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"
A24C612B-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:00:09.000000"
BEC3EF55-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 06:53:49.000000"
F5137488-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 07:44:55.000000"
So I searched around a little bit and found buzz words like recursive CTE, loose indexscan and some answers on DBA which seemed very close, but I could not modify them to my needs.
- How do I efficiently get "the most recent corresponding row"?
- GROUP BY one column, while sorting by another in PostgreSQL
If I understood right, a recursive CTE would be the quickest way to query the data I want.
What I got so far is this:
WITH RECURSIVE cte AS (
(
SELECT station_id, e5, e10, diesel, recieved
FROM de_tt_priceinfo
WHERE recieved <= '2014-09-25 08:45:00'::TIMESTAMPTZ
AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')
ORDER BY station_id, recieved DESC NULLS LAST
LIMIT 1
)
UNION ALL
(
SELECT u.station_id, u.e5, u.e10, u.diesel, u.recieved
FROM cte c
JOIN de_tt_priceinfo u ON u.recieved > c.recieved
WHERE u.recieved <= '2014-09-25 08:45:00'::TIMESTAMPTZ -- repeat condition!
AND u.station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')
ORDER BY u.station_id, u.recieved DESC NULLS LAST LIMIT 1
)
)
SELECT * FROM cte;
But this just returns the following two lines:
0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000"
9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"
Update:
SELECT Version();PostgreSQL 9.5devel on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3, 64-bit- EXPLAIN ANALYSE: http://ift.tt/1B9KE1p
- XEON 1231v3, 16 GB Ram, Samsung 840 PRO SSD
- Changes to the default postgresql.conf
# Connection
listen_addresses = '*'
max_connections = 16
# Logging
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgres-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 1GB
log_min_duration_statement = 500ms
#log_checkpoints = on
#log_connections = on
#log_disconnections = on
log_lock_waits = on
#log_temp_files = 0
# Memory
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 256MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB
# Checkpoint ( When to write to disk )
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
checkpoint_segments = 32
random_page_cost = 1.1
# Import only!
#autovacuum = off
fsync = off
synchronous_commit = off
full_page_writes = off
Aucun commentaire:
Enregistrer un commentaire