Text classification software (link)

(1. Pre-Processing -> 2. Dictionary -> 3. Data Processing -> 4. Learning a model)

Learning step

This step learns a model from a numerical corpus. It relies on a regularized perceptron.
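
To make the idea of a regularized perceptron concrete, here is a minimal sketch of one possible training loop. It is not the code shipped in the jar: the class name RegularizedPerceptronSketch, the method names, the dense double[] representation of documents, the {-1, +1} label encoding, and the exact form of the penalties (lambdaL1 * |w| + lambdaL2 * w^2) are all assumptions made for illustration. The parameters step, lambdaL1, lambdaL2 and nIterations only mirror the command-line options of the same names, and the early stopping mentioned for -nIterations is not reproduced.

```java
// Illustrative sketch only: not the code shipped in classif2012_classif.jar.
final class RegularizedPerceptronSketch {

    /** Dot product between the weight vector and one bag-of-words vector. */
    static double score(double[] w, double[] x) {
        double s = 0.0;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    /**
     * One stochastic (sub)gradient step on a single example (x, y), y in {-1, +1}.
     * useHinge selects the hinge loss max(0, 1 - y*f(x)); otherwise the
     * least-squares loss (f(x) - y)^2 is used.
     */
    static void update(double[] w, double[] x, int y, double step,
                       boolean useHinge, double lambdaL1, double lambdaL2) {
        double f = score(w, x);
        double g; // derivative of the loss with respect to the score f
        if (useHinge) {
            g = (y * f < 1.0) ? -y : 0.0;
        } else {
            g = 2.0 * (f - y);
        }
        for (int j = 0; j < w.length; j++) {
            double grad = g * x[j]
                    + lambdaL1 * Math.signum(w[j])  // subgradient of lambdaL1 * |w_j|
                    + 2.0 * lambdaL2 * w[j];        // gradient of lambdaL2 * w_j^2
            w[j] -= step * grad;
        }
    }

    /** Runs nIterations full passes over the data and returns the learned weights. */
    static double[] train(double[][] X, int[] y, int nIterations, double step,
                          boolean useHinge, double lambdaL1, double lambdaL2) {
        double[] w = new double[X[0].length];
        for (int it = 0; it < nIterations; it++) {
            for (int i = 0; i < X.length; i++) {
                update(w, X[i], y[i], step, useHinge, lambdaL1, lambdaL2);
            }
        }
        return w;
    }
}
```

In this sketch, setting lambdaL1 and lambdaL2 to 0 gives a plain (sub)gradient perceptron, while positive values pull the weights toward zero at every step.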

Usage:

 java -jar bin/classif2012_classif.jar -fileSavePerf=perf.txt 
 -fileSaveModel=model.txt -corpusNumFile=corpusnum.txt 
 -labelNumFile=corpusnumL.txt -nIterations=200 -step=1e-3 -loss=hinge 
 -lambdaL1=0 -lambdaL2=0 -rateLearn=0.9 -adapt=false

Arguments:

INPUTS

  • -corpusNumFile=path to the numerical corpus (BOW)
  • -labelNumFile=path to the labels
  • -nIterations=maximum number of iterations (some early stopping criteria are implemented)
  • -step=gradient step (1e-3 by default)
  • -loss=loss function, either hinge or leastsquare
  • -lambdaL1=, -lambdaL2=weights of the L1 and L2 regularization terms (0 disables the term)
  • -rateLearn=fraction of the data used for training the model (0<x<1)
  • -adapt=boolean (true/false), enables a special regularization for sentiment classification
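
As an illustration of what a training fraction such as -rateLearn could mean in practice, the following sketch keeps the first part of the corpus for training and measures accuracy on the rest. This is only an assumption: the documentation does not say how the tool selects the training documents, whether it shuffles them, or which performance measure it writes out. The class TrainTestSplitSketch and its methods are hypothetical names.

```java
// Illustrative sketch only: the split and the metric used by the tool are assumptions.
final class TrainTestSplitSketch {

    /** Number of leading examples kept for training out of n, for 0 < rateLearn < 1. */
    static int trainSize(int n, double rateLearn) {
        if (rateLearn <= 0.0 || rateLearn >= 1.0) {
            throw new IllegalArgumentException("rateLearn must satisfy 0 < x < 1");
        }
        return (int) Math.floor(rateLearn * n);
    }

    /** Accuracy of sign(w . x) on the held-out examples with index >= trainSize. */
    static double heldOutAccuracy(double[] w, double[][] X, int[] y, int trainSize) {
        int correct = 0;
        int total = X.length - trainSize;
        for (int i = trainSize; i < X.length; i++) {
            double s = 0.0;
            for (int j = 0; j < w.length; j++) {
                s += w[j] * X[i][j];
            }
            int predicted = (s >= 0.0) ? 1 : -1; // labels assumed to be -1 / +1
            if (predicted == y[i]) {
                correct++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

Under this reading, a larger rateLearn leaves fewer held-out documents, so the reported performance becomes noisier.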

OUTPUTS

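  • -fileSaveModel=path of the file where the learned model is saved
  • -fileSavePerf=path of the file where the performance results are saved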