Clustering for Information Retrieval, Recommender Systems


What is Clustering?

Definition: The fundamental task of grouping a set of objects (documents) such that objects in the same group (cluster) are more similar to each other than to those in other groups.

• It is the most prevalent form of unsupervised learning in IR.

Goal: To achieve high intra-cluster similarity (documents inside a cluster are highly similar) and high inter-cluster dissimilarity (documents in different clusters are highly dissimilar).
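
The goal above can be checked numerically. The following is a minimal sketch, assuming documents are represented as small term-count vectors and compared with cosine similarity. The vectors, the cluster names, and the helper functions are all invented for illustration and are not part of the original material.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: each document is a vector of term counts,
# already grouped into two clusters.
clusters = {
    "sports": [[3, 1, 0, 0], [2, 2, 1, 0]],
    "finance": [[0, 1, 2, 3], [0, 0, 3, 1]],
}


def mean_intra_similarity(clusters):
    """Average similarity over pairs of documents in the same cluster."""
    sims = [
        cosine(u, v)
        for docs in clusters.values()
        for u, v in combinations(docs, 2)
    ]
    return sum(sims) / len(sims)


def mean_inter_similarity(clusters):
    """Average similarity over pairs of documents in different clusters."""
    sims = [
        cosine(u, v)
        for docs_a, docs_b in combinations(clusters.values(), 2)
        for u in docs_a
        for v in docs_b
    ]
    return sum(sims) / len(sims)


intra = mean_intra_similarity(clusters)
inter = mean_inter_similarity(clusters)
print(f"mean intra-cluster similarity: {intra:.3f}")
print(f"mean inter-cluster similarity: {inter:.3f}")
```

A good clustering should give a mean intra-cluster similarity well above the mean inter-cluster similarity. If the two values are close, the grouping separates the documents poorly.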

Contrast with Classification:

Classification is supervised: documents are assigned to predefined, known classes (labels).

Clustering is unsupervised: the classes (clusters) are discovered from the data.
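
The contrast can be sketched in code. The example below is illustrative only: it assumes two-dimensional document vectors, uses a nearest-centroid rule as a stand-in for a classifier, and uses a bare-bones k-means loop as a stand-in for a clustering algorithm. Neither technique is named in the notes above, and the data, labels, and function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


# Hypothetical document vectors.
docs = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.8)]

# Supervised: labels are given, so each class centroid is computed from
# the labeled documents and a new document is assigned to a known class.
labels = ["sports", "sports", "finance", "finance"]
class_names = sorted(set(labels))
class_centroids = [
    centroid([d for d, y in zip(docs, labels) if y == name])
    for name in class_names
]
predicted = class_names[nearest((0.8, 0.3), class_centroids)]


# Unsupervised: no labels are used; groups are discovered from the data.
def kmeans(points, k, iterations=10):
    centers = list(points[:k])  # naive initialization, an assumption
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return assignments


assignments = kmeans(docs, k=2)
print("predicted class:", predicted)
print("cluster ids:", assignments)
```

The classifier returns a meaningful class name because the classes were defined in advance. The clustering step returns only arbitrary cluster ids, and interpreting what each cluster means is left to the analyst.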


Recommender Systems