Thursday, August 14, 2014

Content based filtering

In content-based filtering (CBF), the pair of vectors with the smallest distance in the vector space model is the pair with the greatest similarity. Similarity between 2 vectors is measured by the cosine of the angle between them: the smaller the angle, the closer the cosine is to 1 and the more similar the vectors. Both user tastes and item descriptions can be modeled as vectors in a vector space model. In the vector space model (also see Dot product), each keyword is one dimension, and each dimension is weighted using TF-IDF, BM25, etc.
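As a rough illustration of the idea above, here is a minimal sketch of cosine similarity between a user-taste vector and a few item vectors. The keywords, weights, item names, and the helper names `cosine_similarity` and `rank_items` are all made up for this example; the weights are simply assumed to come from some scheme such as TF-IDF or BM25.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (dict: keyword -> weight)."""
    # Dot product over the keywords (dimensions) the two vectors share.
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_a * norm_b)

def rank_items(profile, catalog):
    """Return item names ordered from most to least similar to the profile."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(profile, catalog[name]),
        reverse=True,
    )

# Hypothetical user taste and item descriptions; each keyword is one dimension.
user_profile = {"space": 2.0, "robot": 1.0}
items = {
    "sci_fi_film": {"space": 1.0, "robot": 1.0},
    "cooking_show": {"recipe": 3.0, "kitchen": 1.0},
    "space_documentary": {"space": 4.0, "robot": 2.0},
}

ranking = rank_items(user_profile, items)
```

Because cosine similarity depends only on the angle, an item whose weights are a scaled copy of the user profile scores as a perfect match, and an item that shares no keywords with the profile scores 0.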

Keyword weighting: TF-IDF & BM25 (link1 + Wikipedia definition)
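To make the weighting step concrete, here is a small sketch of one common TF-IDF variant: term frequency normalized by document length, multiplied by log(N / document frequency). Other variants (smoothed IDF, BM25) differ in the details. The function name `tf_idf_vectors` and the toy documents are assumptions for illustration only.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: dict of name -> list of keyword tokens.
    Returns dict of name -> {keyword: TF-IDF weight}."""
    n_docs = len(docs)
    # Document frequency: in how many documents each keyword appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            # (relative term frequency) * (inverse document frequency)
            k: (count / len(tokens)) * math.log(n_docs / df[k])
            for k, count in tf.items()
        }
    return vectors

# Toy item descriptions, already reduced to keywords.
docs = {
    "item_a": ["space", "robot"],
    "item_b": ["space", "recipe"],
}
weights = tf_idf_vectors(docs)
```

With this variant, a keyword that appears in every document (here "space") gets weight 0, so only the distinguishing keywords contribute to the similarity.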