dimanche 22 février 2015

Finding the current prices for n Fuelstations at a certain point in time


I have a Table where price information is stored in with approx 13 million rows stored in a PostgreSQL 9.5 database.



CREATE TABLE public.de_tt_priceinfo (
id integer NOT NULL DEFAULT nextval('priceinfo_id_seq'::regclass),
station_id character varying(60),
recieved timestamp with time zone NOT NULL DEFAULT now(),
e5 numeric(4,3),
e10 numeric(4,3),
diesel numeric(4,3),
CONSTRAINT de_tt_priceinfo_id_pkey PRIMARY KEY (id)
);

CREATE INDEX de_tt_priceinfo_recieved_station_id_idx
ON public.de_tt_priceinfo (recieved, station_id COLLATE pg_catalog."default");

CREATE INDEX index_station_id
ON public.de_tt_priceinfo (station_id COLLATE pg_catalog."default");


From this table I need to extract the latest prices at a certain point in time with maximum performance, since I have to simulate 32 million commuters which query this table (not at once, but still).


I have a working query!



SELECT station_id, e5, e10, diesel, recieved FROM de_tt_priceinfo a
WHERE a.recieved = (SELECT MAX(recieved) FROM de_tt_priceinfo b
WHERE a.station_id = b.station_id
AND recieved <= '2014-09-25 08:45:12'::TIMESTAMPTZ)
AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')


The performance of this Query is not usable. Execution time is varying around 900ms. The result looks like this



0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000"
578C44BB-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:00:09.000000"
6F2F48A8-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 07:08:57.000000"
9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"
A24C612B-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:00:09.000000"
BEC3EF55-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 06:53:49.000000"
F5137488-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 07:44:55.000000"


So I searched around a little bit and found buzz words like recursive CTE, loose indexscan and some answers on DBA which seemed very close, but I could not modify them to my needs.



If I understood right, a recursive CTE would be the quickest way to query the data I want.


What I got so far is this:



WITH RECURSIVE cte AS (
(
SELECT station_id, e5, e10, diesel, recieved
FROM de_tt_priceinfo
WHERE recieved <= '2014-09-25 08:45:00'::TIMESTAMPTZ
AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')
ORDER BY station_id, recieved DESC NULLS LAST
LIMIT 1
)
UNION ALL
(
SELECT u.station_id, u.e5, u.e10, u.diesel, u.recieved
FROM cte c
JOIN de_tt_priceinfo u ON u.recieved > c.recieved
WHERE u.recieved <= '2014-09-25 08:45:00'::TIMESTAMPTZ -- repeat condition!
AND u.station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
, '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
, 'F5137488-a-b-c-d')
ORDER BY u.station_id, u.recieved DESC NULLS LAST LIMIT 1
)
)
SELECT * FROM cte;


But this just returns the following two lines:



0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000"
9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"


Update:



  • SELECT Version(); PostgreSQL 9.5devel on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3, 64-bit

  • EXPLAIN ANALYSE: http://ift.tt/1B9KE1p

  • XEON 1231v3, 16 GB Ram, Samsung 840 PRO SSD

  • Changes to the default postgresql.conf



# Connection
listen_addresses = '*'
max_connections = 16

# Logging
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgres-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 1GB
log_min_duration_statement = 500ms
#log_checkpoints = on
#log_connections = on
#log_disconnections = on
log_lock_waits = on
#log_temp_files = 0

# Memory
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 256MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

# Checkpoint ( When to write to disk )
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
checkpoint_segments = 32

random_page_cost = 1.1

# Import only!
#autovacuum = off
fsync = off
synchronous_commit = off
full_page_writes = off



Aucun commentaire:

Enregistrer un commentaire