Teaching - DMKM


Opinion Mining - Master DMKM

Useful link: Patrick's slides, pages 25-27 in particular

What was the new idea in PageRank (compared with the other systems in 1998)?
How is it implemented?

Download

Download a small web graph such as hollins.dat and load it into Octave using loaddat.m.

Data and algorithms are here

Preliminary work

Check the loaded data using the whos command.

Then run a quick analysis of the data: compute the top 10 pages by number of incoming links, the mean number of incoming links per page…
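The in-degree analysis can be sketched as follows. This is a minimal illustration in Python rather than Octave, and it assumes the graph is available as a list of (source, target) pairs with pages numbered from 1. The toy edge list and the helper name `in_degree_stats` are hypothetical and are not taken from hollins.dat or loaddat.m.

```python
from collections import Counter

# Hypothetical toy edge list (source, target); NOT the hollins.dat data.
edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 3), (4, 1)]
num_pages = 4


def in_degree_stats(edges, num_pages, k=10):
    """Return (top-k pages by in-degree, mean in-degree per page)."""
    # Count how many links point to each page.
    in_deg = Counter(dst for _, dst in edges)
    # Sort by descending in-degree; ties are broken by page id.
    ranked = sorted(range(1, num_pages + 1), key=lambda p: (-in_deg[p], p))
    top = [(p, in_deg[p]) for p in ranked[:k]]
    # Every edge contributes exactly one incoming link.
    mean = len(edges) / num_pages
    return top, mean


top_pages, mean_in = in_degree_stats(edges, num_pages)
print(top_pages)
print(mean_in)
```

On real data, the same counts can be obtained in Octave by summing the adjacency matrix along the appropriate dimension.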

PageRank

Once this preliminary work is done, apply PageRank to the data. Compare the resulting authorities with the top pages from your preliminary analysis.
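For reference, PageRank is commonly computed by power iteration. The sketch below is one possible Python version, not the implementation provided with the lab. It assumes a damping factor of 0.85, spreads the rank of pages without outgoing links uniformly over all pages, and runs on a hypothetical toy edge list.

```python
# Hypothetical toy edge list (source, target); NOT the hollins.dat data.
edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 3), (4, 1)]
num_pages = 4


def pagerank(edges, num_pages, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; returns a dict page -> score (sums to 1)."""
    out_links = {p: [] for p in range(1, num_pages + 1)}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from the uniform distribution.
    rank = {p: 1.0 / num_pages for p in out_links}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is shared by everyone.
        dangling = sum(rank[p] for p, outs in out_links.items() if not outs)
        base = (1.0 - damping) / num_pages + damping * dangling / num_pages
        new_rank = {p: base for p in out_links}
        # Each page passes an equal share of its rank to every page it links to.
        for p, outs in out_links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
        # Stop once the scores no longer change (L1 distance).
        delta = sum(abs(new_rank[p] - rank[p]) for p in rank)
        rank = new_rank
        if delta < tol:
            break
    return rank


rank = pagerank(edges, num_pages)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 4))
```

Sorting the pages by score gives a ranking that can be placed side by side with the in-degree top 10.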

Write a quick implementation of the HITS algorithm (slide 48 in P. Gallinari’s slides here). Explain the differences between the rankings produced by the different methods.
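A minimal Python sketch of HITS is given below as a starting point. It follows the usual textbook formulation (authority update, then hub update, then L2 normalization), which may differ in detail from the version on the slide. The fixed iteration count and the toy edge list are assumptions made for illustration.

```python
import math

# Hypothetical toy edge list (source, target); NOT the hollins.dat data.
edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 3), (4, 1)]
num_pages = 4


def l2_normalize(scores):
    """Scale a dict of scores to unit Euclidean norm (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(edges, num_pages, iterations=50):
    """Return (hub, authority) score dicts after a fixed number of rounds."""
    pages = range(1, num_pages + 1)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score of a page: sum of the authorities of pages it links to.
        hub = {p: 0.0 for p in pages}
        for src, dst in edges:
            hub[src] += auth[dst]
        auth = l2_normalize(auth)
        hub = l2_normalize(hub)
    return hub, auth


hub, auth = hits(edges, num_pages)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

Unlike PageRank, this produces two scores per page, so a page that links to many good authorities can rank highly as a hub even if nothing links to it.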

Optional work:

  • Define and understand the term web spam. Propose some basic algorithms to filter out web spam pages.
  • Find a new dataset on the web and apply these algorithms to it to extract specific information.
  • Download Gephi and visualize some graph data.