#### Teaching - DMKM

## Opinion Mining - Master DMKM

Useful link: Patrick's slides (pages 25-27 in particular).

What was the new idea in PageRank compared with the other search systems of 1998? How is it implemented?

### Download

Download a small web graph such as `hollins.dat` and load it into Octave using `loaddat.m`.

Data and algorithms are available here.
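The provided `loaddat.m` does the loading in Octave. For readers working outside Octave, here is a minimal Python sketch of a loader. It assumes (this is not confirmed by the page) that the file starts with a line giving the node and edge counts, followed by one `id url` line per node and one `src dst` line per edge, with 1-based ids. `load_graph` is a hypothetical helper name; check the real file layout before relying on it.

```python
import io


def load_graph(stream):
    """Parse a web graph in an ASSUMED hollins.dat-style layout.

    line 1:              "<n_nodes> <n_edges>"
    next n_nodes lines:  "<id> <url>"   (1-based id)
    next n_edges lines:  "<src> <dst>"  (1-based ids)
    Returns (urls, edges) using 0-based node indices.
    """
    n_nodes, n_edges = (int(t) for t in stream.readline().split())
    urls = [None] * n_nodes
    for _ in range(n_nodes):
        node_id, url = stream.readline().split(maxsplit=1)
        urls[int(node_id) - 1] = url.strip()  # drop the trailing newline
    edges = []
    for _ in range(n_edges):
        src, dst = stream.readline().split()
        edges.append((int(src) - 1, int(dst) - 1))
    return urls, edges


# Tiny in-memory example (made-up URLs): a 3-page cycle.
sample = io.StringIO(
    "3 3\n"
    "1 http://a.example/\n"
    "2 http://b.example/\n"
    "3 http://c.example/\n"
    "1 2\n2 3\n3 1\n"
)
urls, edges = load_graph(sample)
print(len(urls), len(edges))  # 3 3
```

To use it on the real data, open the file with `open("hollins.dat")` and pass the file object instead of the `StringIO`.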

### Preliminary work

Check the loaded data using the `whos` command.

Then carry out a quick analysis of the data: for example, find the top 10 pages with the most incoming links, compute the mean number of incoming links per page, etc.
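A minimal Python sketch of this analysis, assuming the graph is held as a list of 0-based `(src, dst)` edges. The helper name `in_degree_stats` is hypothetical. Since every edge contributes exactly one incoming link, the mean in-degree is simply the number of edges divided by the number of pages; duplicate edges and self-loops, if the file contains any, are counted as-is.

```python
from collections import Counter


def in_degree_stats(n_nodes, edges, k=10):
    """Return the k pages with the most incoming links, and the mean in-degree."""
    in_deg = Counter(dst for _, dst in edges)  # missing pages count as 0
    # Sort by decreasing in-degree; ties broken by node index for determinism.
    top = sorted(range(n_nodes), key=lambda v: (-in_deg[v], v))[:k]
    mean = len(edges) / n_nodes if n_nodes else 0.0
    return [(v, in_deg[v]) for v in top], mean


# Toy graph: page 1 has 3 in-links, page 2 has 2, pages 0 and 3 have none.
top, mean = in_degree_stats(4, [(0, 1), (2, 1), (3, 1), (0, 2), (1, 2)], k=2)
print(top, mean)  # [(1, 3), (2, 2)] 1.25
```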

### Pagerank

Once this preliminary work is done, apply PageRank to the data. Compare the resulting scores with the in-link counts from your preliminary analysis.
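For comparison with your Octave version, here is a minimal power-iteration sketch of PageRank in Python. It uses the random-surfer formulation with damping factor `d = 0.85` (the value used in Brin and Page's paper) and spreads the rank of dangling pages (pages with no out-links) uniformly over all pages. That dangling-node treatment is one common choice among several and is an assumption here, as are the tolerance and iteration cap.

```python
def pagerank(n, edges, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a graph given as 0-based (src, dst) edges."""
    out_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly (assumption).
        dangling = sum(rank[v] for v in range(n) if not out_links[v])
        new = [(1.0 - d) / n + d * dangling / n] * n
        # Each page splits d * its rank equally among the pages it links to.
        for v in range(n):
            if out_links[v]:
                share = d * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        delta = sum(abs(a - b) for a, b in zip(new, rank))  # L1 change
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0. Page 2 receives the most rank.
r = pagerank(3, [(0, 1), (1, 2), (2, 0), (0, 2)])
```

The scores always sum to 1. Unlike the raw in-degree count, a link from a highly ranked page is worth more than a link from an obscure one, which is what you should look for when comparing the two rankings.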

Write a quick implementation of the HITS algorithm (slide 48 in P. Gallinari's slides here). Explain how the rankings produced by the different methods differ.
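A minimal sketch of the textbook HITS iteration (Kleinberg's hubs and authorities) in Python: a page's authority score is the sum of the hub scores of the pages linking to it, a page's hub score is the sum of the authority scores of the pages it links to, and both vectors are rescaled to unit Euclidean length after each step. The variant on slide 48 may normalise differently (for example with the L1 or max norm); the L2 choice, tolerance and iteration cap here are assumptions.

```python
import math


def _normalise(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def hits(n, edges, tol=1e-10, max_iter=1000):
    """Return (hub, authority) scores for 0-based (src, dst) edges."""
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(max_iter):
        # Authority update: sum of hub scores of pages linking in.
        new_auth = [0.0] * n
        for src, dst in edges:
            new_auth[dst] += hub[src]
        new_auth = _normalise(new_auth)
        # Hub update: sum of authority scores of pages linked to.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        new_hub = _normalise(new_hub)
        delta = sum(abs(a - b) for a, b in zip(new_auth, auth))
        delta += sum(abs(a - b) for a, b in zip(new_hub, hub))
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return hub, auth


# Star graph: pages 0, 1, 2 all link to page 3.
hub, auth = hits(4, [(0, 3), (1, 3), (2, 3)])
```

In the star example, page 3 becomes the sole authority and pages 0 to 2 share the hub score equally. HITS therefore gives each page two scores, whereas PageRank gives a single one.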

### Optional work

- Define and understand the term **webspam**. Propose some basic algorithms to eliminate webspam pages.
- Find a new dataset on the web and apply these algorithms to it to extract particular information.
- Download Gephi and visualize some graph data.
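As one possible starting point for the webspam question, here is a minimal sketch of the idea behind TrustRank (Gyöngyi, Garcia-Molina and Pedersen, 2004): run a personalised PageRank whose random jumps land only on a small, hand-checked set of trusted seed pages, so trust flows outward along links and pages that good pages never reach (such as an isolated link farm) end up with little or no trust. The seed choice, the redistribution of dangling mass to the seeds, and the threshold rule in `flag_suspects` are all hypothetical choices made for illustration.

```python
def trust_scores(n, edges, seeds, d=0.85, iters=100):
    """Personalised PageRank that teleports only to trusted seed pages."""
    seeds = sorted(set(seeds))
    if not seeds:
        raise ValueError("at least one trusted seed page is required")
    out_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    teleport = [0.0] * n
    for s in seeds:
        teleport[s] = 1.0 / len(seeds)
    trust = teleport[:]
    for _ in range(iters):
        # Random jumps and dangling mass both return to the seeds (assumption).
        dangling = sum(trust[v] for v in range(n) if not out_links[v])
        new = [(1.0 - d + d * dangling) * t for t in teleport]
        for v in range(n):
            if out_links[v]:
                share = d * trust[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        trust = new
    return trust


def flag_suspects(trust, threshold):
    """Hypothetical rule: pages whose trust falls below the threshold are suspects."""
    return [v for v, t in enumerate(trust) if t < threshold]


# Good pages 0-2 form a cycle (seed: page 0); pages 3-5 form a closed link farm.
edges = [(0, 1), (1, 2), (2, 0), (3, 5), (4, 5), (5, 3), (5, 4)]
trust = trust_scores(6, edges, seeds=[0])
print(flag_suspects(trust, 1e-6))  # [3, 4, 5]
```

A spammer who obtains even a few links from trusted pages will inherit some trust, so low trust is best treated as a signal to combine with others rather than as proof of spam.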