Tuesday, 24 February 2015

Is it a better idea (performance is key) to store keywords in a normalized table and retrieve them with a JOIN, or in a single nvarchar column and retrieve them using LIKE?


NOTE: The application is not really about websites or keywords, I am using that here only to provide reference points for discussion.


I have a data set that is really just a string of words; think of the "SEO" keywords for a website. I also have a table of entities that these words relate to, for example a list of websites.


The key scenario for this data set is retrieval against the master data, in both directions. That is, while I can fetch a website with all its associated keywords, I should also be able to input a keyword and get a list of websites attached to that keyword. There is no other metadata attached (or attachable) to the keyword itself.


Now, given that query performance for both the website -> keyword and the keyword -> website lookup (by far the more common case) is paramount, which of these design scenarios makes more sense?




  1. In the websites table, I have a single nvarchar column that contains a string of all the keywords for that website, possibly comma-separated. Retrieval in this case would use a LIKE operator on that column.




  2. I create a separate keywords table with two columns (Id, Keyword), put all the keywords in there, and then have a third table, WebsiteKeywords, that contains the mappings between the websites and keywords tables. To retrieve, we do a three-way join between the websites, WebsiteKeywords, and keywords tables.
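To make the two options concrete, here is a minimal sketch of both schemas using SQLite from Python. The table and column names (`websites_flat`, `website_keywords`, `url`, etc.) are illustrative assumptions, not from the question itself; the point is the shape of the lookup each design forces.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Design 1: denormalized -- all keywords packed into one text column.
cur.execute(
    "CREATE TABLE websites_flat (id INTEGER PRIMARY KEY, url TEXT, keywords TEXT)"
)
cur.execute(
    "INSERT INTO websites_flat VALUES (1, 'example.com', 'seo,marketing,analytics')"
)

# keyword -> website requires LIKE with wildcards on both sides. A plain
# index cannot serve a leading-wildcard pattern, so this scans every row;
# the delimiters are padded on to avoid substring false positives
# (e.g. 'art' matching 'smart').
rows = cur.execute(
    "SELECT url FROM websites_flat WHERE ',' || keywords || ',' LIKE ?",
    ("%,seo,%",),
).fetchall()

# Design 2: normalized -- a junction table; both directions become
# indexed joins.
cur.executescript("""
CREATE TABLE websites (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE website_keywords (
    website_id INTEGER REFERENCES websites(id),
    keyword_id INTEGER REFERENCES keywords(id),
    PRIMARY KEY (website_id, keyword_id)
);
-- Covering index so keyword -> website lookups never touch the heap.
CREATE INDEX ix_wk_keyword ON website_keywords(keyword_id, website_id);

INSERT INTO websites VALUES (1, 'example.com');
INSERT INTO keywords VALUES (1, 'seo'), (2, 'marketing');
INSERT INTO website_keywords VALUES (1, 1), (1, 2);
""")

# keyword -> websites: the three-way join from option 2, driven by the
# unique index on keywords.keyword.
by_keyword = cur.execute("""
    SELECT w.url
    FROM keywords k
    JOIN website_keywords wk ON wk.keyword_id = k.id
    JOIN websites w ON w.id = wk.website_id
    WHERE k.keyword = 'seo'
""").fetchall()
print(by_keyword)  # [('example.com',)]
```

Both queries return the same websites here, but only the normalized version can be satisfied by an index seek; the LIKE version degrades linearly with table size regardless of indexing.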




Retrieval happens via a web service infrastructure, so a single "search" call will trigger multiple look-ups from the middle-tier layer before it has all the data; the middle tier aggregates all the results before returning them to the caller.


What are your suggestions?




