Opinion Mining - Master DMKM
You will use two unsupervised methods to build clusters on the Cora dataset:
- PLSA (content only)
- Louvain (graph based)
Then, evaluate the results: use purity to score each clustering against the known classes, and the Rand index to measure the similarity between the two clusterings.
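NB: to perform the evaluation (purity), for each cluster:
- retrieve the classes of the samples in this cluster
- find the majority (most frequent) class
- count all samples of this class as correctly classified and all other samples in the cluster as misclassified.
Purity is then the total number of correctly classified samples divided by the total number of samples.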
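The purity procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a required implementation: the function name `purity`, its two arguments, and the toy data are assumptions made for the example and do not come from the Cora files.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority class of their cluster.

    cluster_ids[i] is the cluster assigned to sample i,
    class_labels[i] is its ground-truth class.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty sample set")

    # Step 1: for each cluster, count how many samples fall in each class.
    classes_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        classes_by_cluster[cluster][label] += 1

    # Steps 2 and 3: the majority class of each cluster gives the number
    # of samples counted as correctly classified in that cluster.
    correct = sum(max(counts.values()) for counts in classes_by_cluster.values())

    return correct / len(cluster_ids)


# Toy data (made up for illustration, not taken from Cora).
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
toy_classes = ["A", "A", "B", "B", "B", "C", "A", "C"]
toy_purity = purity(toy_clusters, toy_classes)  # 6 of 8 samples match their cluster's majority class
```

Note that purity tends to increase with the number of clusters (one cluster per sample gives a purity of 1), so it is best compared between clusterings with a similar number of clusters.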
NB2: to compute the similarity between the two clusterings (Rand index), refer to Wikipedia: http://en.wikipedia.org/wiki/Rand_index
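As a rough sketch of the Rand index, the function below counts the pairs of samples on which the two clusterings agree (both put the pair in the same cluster, or both separate it) and divides by the total number of pairs. The name `rand_index`, its arguments, and the toy labels are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same samples.

    labels_a[i] and labels_b[i] are the clusters assigned to sample i
    by the first and second method. The result lies in [0, 1].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("the Rand index needs at least two samples")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # The pair counts as an agreement when both clusterings group it
        # together, or when both keep it apart.
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1

    return agreements / total_pairs


# Toy labels (made up for illustration): one list per clustering method.
toy_plsa = [0, 0, 1, 1, 2, 2]
toy_louvain = [1, 1, 0, 0, 0, 2]
toy_rand = rand_index(toy_plsa, toy_louvain)  # 12 agreeing pairs out of 15
```

The Rand index only looks at which samples are grouped together, so the cluster identifiers of the two methods do not need to match. The loop visits every pair of samples, which is quadratic in the number of samples but still manageable for a dataset of a few thousand documents.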
Optionally, you can also explore methods that combine content and graph structure to perform the clustering.