I have an application that writes raw log files to a table called raw in a PostgreSQL database -- lots of rows and lots of duplicate data. Via a cron job running every minute, I normalize the raw data and insert the normalized results into a table called summary, after which I truncate the new table.
The normalization is done all via SQL, but I realize that it may take some time do execute the SQL statement, during which time additional data may get inserted into the new table. This leads me to believe that said additional data inserted while the SQL statement is running may get lost when the table is truncated.
Not knowing exactly how PostgreSQL (or any other SQL-compliant database) handles access to tables when SQL statements are being executed, I'm not exactly sure what to do to ensure no data is lost. Is there a way to "move" all the data in the new table into a temporary table and perform the normalization on the data in the temporary table, leaving the new table empty until new data gets inserted (which will likely happen during the normalization of the data in the temporary table)? Perhaps there's additional/better approaches to this problem, perhaps using transaction blocks?
---- EDIT ----
It never fails... every time I write up a question on Stack Exchange, it gets me to think about the question in different ways and thus leads me to new terms to search for on Google. Happened this time too. As a result, is this answer a potential solution for what I'm wanting to do here? Is it the best solution, given it's now quite dated?
Aucun commentaire:
Enregistrer un commentaire