Learning classifiers

Creative Commons License

In [1]:
import pyAgrum.skbn as skbn
import pyAgrum.lib.notebook as gnb

import os
import pandas

skbn is a pyAgrum module that lets Bayesian networks be used as classifiers in the scikit-learn environment.

Initialization of parameters

First, we set the parameters that define the classifier: the structure-learning method, the prior and its weight, the discretization strategy, and so on.

In [2]:
BNTest= skbn.BNClassifier(learningMethod = 'Chow-Liu', prior= 'Smoothing', priorWeight = 0.5,
                          discretizationStrategy = 'quantile', usePR = True, significant_digit = 13)
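
To give an intuition for prior = 'Smoothing' with priorWeight = 0.5, here is a minimal sketch of additive smoothing on a single variable. It assumes the smoothing prior adds the weight to every count before normalizing. It is an illustration only, not pyAgrum's implementation, and the function name and data are hypothetical.

```python
from collections import Counter

def smoothed_distribution(observations, labels, weight=0.5):
    """Estimate P(label) after adding `weight` to the count of every label."""
    counts = Counter(observations)
    total = len(observations) + weight * len(labels)
    return {label: (counts[label] + weight) / total for label in labels}

# Hypothetical sample: the label "2.0" never occurs in the data.
sample = ["0.0"] * 9 + ["1.0"]
dist = smoothed_distribution(sample, labels=["0.0", "1.0", "2.0"])
print(dist)
```

A label that never occurs in the data still receives a small non-zero probability, which avoids zero entries in the learned probability tables.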

Then, we train the classifier. Two kinds of input are accepted: a CSV file or a pair of array-likes.

Learn from csv file

In [3]:
BNTest.fit(data = 'res/creditCardTest.csv', targetName = 'Class')
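
With learningMethod = 'Chow-Liu', the learned structure is a tree. As a rough illustration of the classical Chow-Liu idea (keep the maximum-weight spanning tree over pairwise mutual information), here is a minimal pure-Python sketch. It is not pyAgrum's implementation: the helper names and the toy columns are hypothetical, and the orientation of the arcs is left out.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def chow_liu_edges(columns):
    """Undirected edges of a maximum spanning tree over pairwise mutual information."""
    names = list(columns)
    weighted = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    root = {name: name for name in names}  # simple union-find forest

    def find(name):
        while root[name] != name:
            name = root[name]
        return name

    tree = []
    for _, a, b in weighted:  # Kruskal: take the heaviest edge that creates no cycle
        ra, rb = find(a), find(b)
        if ra != rb:
            root[ra] = rb
            tree.append((a, b))
    return tree

# Hypothetical toy data: B copies A, C is independent of both.
columns = {
    "A": [0, 0, 1, 1, 0, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1, 0, 1],
    "C": [0, 1, 0, 1, 0, 1, 1, 0],
}
tree = chow_liu_edges(columns)
print(tree)
```

The strongly dependent pair (A, B) is always kept, and the tree has one edge fewer than there are variables.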
In [4]:
for i in BNTest.bn.nodes():
    print(BNTest.bn.variable(i))
Class:Labelized({0.0|1.0})
Time:Discretized(<(0;1578[,[1578;3733[,[3733;6982[,[6982;11033[,[11033;170348)>)
V1:Discretized(<(-30.5524;-1.33295[,[-1.33295;-0.654664[,[-0.654664;0.305375[,[0.305375;1.18346[,[1.18346;2.13239)>)
V2:Discretized(<(-25.6405;-0.362408[,[-0.362408;0.104022[,[0.104022;0.582468[,[0.582468;1.12626[,[1.12626;22.0577)>)
V3:Discretized(<(-31.1037;0.107723[,[0.107723;0.675277[,[0.675277;1.14525[,[1.14525;1.73106[,[1.73106;4.10172)>)
V4:Discretized(<(-4.65755;-0.835683[,[-0.835683;0.0334235[,[0.0334235;0.648386[,[0.648386;1.44563[,[1.44563;12.1147)>)
V5:Discretized(<(-22.1055;-0.813666[,[-0.813666;-0.355923[,[-0.355923;0.0329468[,[0.0329468;0.534605[,[0.534605;11.9743)>)
V6:Discretized(<(-7.5748;-0.789778[,[-0.789778;-0.370597[,[-0.370597;0.0353554[,[0.0353554;0.711815[,[0.711815;10.0339)>)
V7:Discretized(<(-43.5572;-0.691953[,[-0.691953;-0.264738[,[-0.264738;0.111993[,[0.111993;0.57616[,[0.57616;12.2192)>)
V8:Discretized(<(-41.0443;-0.248778[,[-0.248778;-0.0618973[,[-0.0618973;0.101159[,[0.101159;0.417327[,[0.417327;20.0072)>)
V9:Discretized(<(-13.4341;-0.258886[,[-0.258886;0.432783[,[0.432783;1.00315[,[1.00315;1.60675[,[1.60675;10.3929)>)
V10:Discretized(<(-24.5883;-0.887242[,[-0.887242;-0.486914[,[-0.486914;-0.17427[,[-0.17427;0.281998[,[0.281998;12.2599)>)
V11:Discretized(<(-2.59533;-0.21685[,[-0.21685;0.467606[,[0.467606;1.06928[,[1.06928;1.89436[,[1.89436;12.0189)>)
V12:Discretized(<(-18.6837;-2.60336[,[-2.60336;-1.98917[,[-1.98917;-1.01028[,[-1.01028;0.297745[,[0.297745;3.77484)>)
V13:Discretized(<(-3.38951;-0.277526[,[-0.277526;0.487335[,[0.487335;1.192[,[1.192;1.87168[,[1.87168;4.46541)>)
V14:Discretized(<(-19.2143;-0.198436[,[-0.198436;0.39438[,[0.39438;1.12921[,[1.12921;1.5604[,[1.5604;5.74873)>)
V15:Discretized(<(-4.49894;-0.898218[,[-0.898218;-0.252119[,[-0.252119;0.228109[,[0.228109;0.673846[,[0.673846;2.53366)>)
V16:Discretized(<(-14.1299;-0.73753[,[-0.73753;-0.191439[,[-0.191439;0.226074[,[0.226074;0.649708[,[0.649708;3.93088)>)
V17:Discretized(<(-25.1628;-0.37327[,[-0.37327;0.0631357[,[0.0631357;0.445363[,[0.445363;0.906548[,[0.906548;7.89339)>)
V18:Discretized(<(-9.49875;-0.642528[,[-0.642528;-0.179343[,[-0.179343;0.16627[,[0.16627;0.556347[,[0.556347;4.11556)>)
V19:Discretized(<(-4.93273;-0.673233[,[-0.673233;-0.228783[,[-0.228783;0.150301[,[0.150301;0.636972[,[0.636972;5.22834)>)
V20:Discretized(<(-13.276;-0.183662[,[-0.183662;-0.0677703[,[-0.0677703;0.0444333[,[0.0444333;0.232763[,[0.232763;11.059)>)
V21:Discretized(<(-22.7976;-0.298193[,[-0.298193;-0.179497[,[-0.179497;-0.0548622[,[-0.0548622;0.105119[,[0.105119;27.2028)>)
V22:Discretized(<(-8.88702;-0.649014[,[-0.649014;-0.291808[,[-0.291808;0.00962799[,[0.00962799;0.351258[,[0.351258;8.36199)>)
V23:Discretized(<(-19.2543;-0.215265[,[-0.215265;-0.0924607[,[-0.0924607;-1.53e-05[,[-1.53e-05;0.122579[,[0.122579;13.8762)>)
V24:Discretized(<(-2.51238;-0.441547[,[-0.441547;-0.0137249[,[-0.0137249;0.248364[,[0.248364;0.468669[,[0.468669;3.2002)>)
V25:Discretized(<(-4.78161;-0.238382[,[-0.238382;0.024469[,[0.024469;0.212094[,[0.212094;0.411607[,[0.411607;5.52509)>)
V26:Discretized(<(-1.33856;-0.390763[,[-0.390763;-0.124995[,[-0.124995;0.128837[,[0.128837;0.66481[,[0.66481;3.51735)>)
V27:Discretized(<(-7.9761;-0.0970824[,[-0.0970824;-0.0255692[,[-0.0255692;0.0347242[,[0.0347242;0.216123[,[0.216123;4.17339)>)
V28:Discretized(<(-3.05408;-0.0430296[,[-0.0430296;0.00633497[,[0.00633497;0.0299365[,[0.0299365;0.111331[,[0.111331;4.86077)>)
Amount:Discretized(<(0;2.78[,[2.78;11.66[,[11.66;25.52[,[25.52;73.5[,[73.5;4002.88)>)
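
Each continuous variable above has been cut into five intervals of the form [a;b[. The sketch below shows one way a 'quantile' strategy can produce such equal-frequency intervals, using only the standard library. The function names and the data are hypothetical, and pyAgrum may compute its cut points differently.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_edges(values, n_bins=5):
    """Return the n_bins - 1 inner cut points that split `values` into equal-frequency bins."""
    return quantiles(values, n=n_bins, method="inclusive")

def bin_index(value, edges):
    """Index of the left-closed, right-open interval that contains `value`."""
    return bisect_right(edges, value)

# Hypothetical data: the values 1.0, 2.0, ..., 100.0
amounts = [float(v) for v in range(1, 101)]
edges = quantile_edges(amounts)

bin_sizes = [0] * 5
for v in amounts:
    bin_sizes[bin_index(v, edges)] += 1

print(edges)
print(bin_sizes)
```

Because the cut points follow the data rather than a fixed step, each interval receives about the same number of rows, even for a skewed variable.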
In [5]:
gnb.sideBySide(BNTest.bn,gnb.getInference(BNTest.bn,size='15!'))
[Figure: the learned Bayesian network (left) and its inference view (right), with one histogram per variable; inference took 7.68ms.]
In [6]:
gnb.showBN(BNTest.MarkovBlanket)
../_images/notebooks_51-Classifier_Learning_11_0.svg
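
The Markov blanket of the target contains its parents, its children, and the other parents of its children. Given these variables, the target is independent of every other variable in the network, so they are the only features that matter for prediction. A minimal sketch on a hypothetical toy structure (not the network learned above, and not pyAgrum's API):

```python
def markov_blanket(arcs, target):
    """Parents, children, and the children's other parents of `target`."""
    parents = {a for a, b in arcs if b == target}
    children = {b for a, b in arcs if a == target}
    spouses = {a for a, b in arcs if b in children and a != target}
    return parents | children | spouses

# Hypothetical directed arcs (parent, child).
arcs = [("A", "Class"), ("Class", "B"), ("C", "B"), ("B", "D"), ("E", "A")]
blanket = sorted(markov_blanket(arcs, "Class"))
print(blanket)  # ['A', 'B', 'C']
```

Here D and E are left out: D is only a grandchild of Class, and E is only a grandparent.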

Learn from array-likes

XYfromCSV splits the CSV file into two array-likes (the features and the target), so that we can train on the same data.

In [7]:
# we now use another method to learn the BN (MIIC)
BNTest= skbn.BNClassifier(learningMethod = 'MIIC', prior= 'Smoothing', priorWeight = 0.5,
                          discretizationStrategy = 'quantile', usePR = True, significant_digit = 13)

xTrain, yTrain = BNTest.XYfromCSV(filename = 'res/creditCardTest.csv', target = 'Class')
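
As a rough picture of what such a split does, the sketch below separates a target column from the feature columns using only the standard library. The function name and the tiny CSV are hypothetical. The real XYfromCSV returns richer objects (the later call xTrain.columns.tolist() shows that the features expose their column names), whereas this sketch uses plain lists.

```python
import csv
import io

def xy_from_csv_text(text, target):
    """Split CSV text into feature rows (list of dicts) and a list of target values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    y = [row[target] for row in rows]
    x = [{name: value for name, value in row.items() if name != target} for row in rows]
    return x, y

# Hypothetical three-row extract with two features and the target.
csv_text = "Time,Amount,Class\n0,149.62,0\n1,2.69,0\n406,0.0,1\n"
x_demo, y_demo = xy_from_csv_text(csv_text, target="Class")
print(list(x_demo[0]), y_demo)
```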
In [8]:
BNTest.fit(xTrain, yTrain)
In [9]:
gnb.showBN(BNTest.bn)
../_images/notebooks_51-Classifier_Learning_16_0.svg
In [10]:
gnb.showBN(BNTest.MarkovBlanket)
../_images/notebooks_51-Classifier_Learning_17_0.svg

Create a classifier from a Bayesian network

If we already have a Bayesian network with learned parameters, we can create a classifier that uses it. In this case we do not have to train the classifier on data, since the Bayesian network is already trained.

In [11]:
ClassfromBN = skbn.BNClassifier(significant_digit = 7)
In [12]:
ClassfromBN.fromTrainedModel(bn = BNTest.bn, targetAttribute = 'Class', targetModality = '1.0',
                             threshold = BNTest.threshold, variableList = xTrain.columns.tolist())
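
The targetModality and threshold arguments describe a binary decision: a row is assigned the target modality when the probability of that modality is high enough. A minimal sketch of such a rule follows, assuming the comparison is "greater than or equal to"; the function name and the probabilities are hypothetical.

```python
def predict_with_threshold(posteriors, threshold, positive="1.0", negative="0.0"):
    """Label each row `positive` when its probability reaches `threshold`."""
    return [positive if p >= threshold else negative for p in posteriors]

# Hypothetical values of P(Class = '1.0' | features) for four rows.
posteriors = [0.02, 0.40, 0.75, 0.10]
strict = predict_with_threshold(posteriors, threshold=0.5)
lenient = predict_with_threshold(posteriors, threshold=0.3)
print(strict)   # ['0.0', '0.0', '1.0', '0.0']
print(lenient)  # ['0.0', '1.0', '1.0', '0.0']
```

Lowering the threshold flags more rows as the target modality, which is why the threshold of the trained classifier is passed along with its network.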
In [13]:
gnb.showBN(ClassfromBN.bn)
../_images/notebooks_51-Classifier_Learning_22_0.svg
In [14]:
gnb.showBN(ClassfromBN.MarkovBlanket)
../_images/notebooks_51-Classifier_Learning_23_0.svg

Then, we can use scikit-learn functions such as score, which can be called either with a CSV file or with two array-likes.

In [15]:
xTest, yTest = ClassfromBN.XYfromCSV(filename = 'res/creditCardTest.csv', target = 'Class')

Prediction for classifier

Prediction with csv file

In [16]:
scoreCSV1 = BNTest.score('res/creditCardTest.csv', y = yTest)
print("{0:.2f}% good predictions".format(100*scoreCSV1))
99.77% good predictions
In [17]:
scoreCSV2 = ClassfromBN.score('res/creditCardTest.csv', y = yTest)
print("{0:.2f}% good predictions".format(100*scoreCSV2))
99.77% good predictions

Prediction with array-like

In [18]:
scoreAR1 = BNTest.score(xTest, yTest)
print("{0:.2f}% good predictions".format(100*scoreAR1))
99.77% good predictions
In [19]:
scoreAR2 = ClassfromBN.score(xTest, yTest)
print("{0:.2f}% good predictions".format(100*scoreAR2))
99.77% good predictions
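
The percentages above are read here as accuracies, assuming score follows the scikit-learn convention for classifiers (the fraction of correctly predicted rows). A minimal sketch of that computation, with hypothetical labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: two positives among eight rows.
y_true = [0, 0, 0, 1, 1, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 0]
always_zero = [0] * len(y_true)

print("{0:.2f}% good predictions".format(100 * accuracy(y_true, y_pred)))       # 87.50%
print("{0:.2f}% good predictions".format(100 * accuracy(y_true, always_zero)))  # 75.00%
```

When one class is rare, a classifier that always predicts the frequent class already obtains a high accuracy, so this number alone says little about how well the rare class is detected.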

ROC and Precision-Recall curves with all methods

In addition, we can use pyAgrum's own tools (from pyAgrum.lib.bn2roc) to draw the ROC and Precision-Recall curves.

In [20]:
BNTest.showROC_PR('res/creditCardTest.csv')
../_images/notebooks_51-Classifier_Learning_34_0.svg
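
Both curves are obtained by sweeping the decision threshold over the predicted probabilities: the ROC curve plots the true positive rate against the false positive rate, and the Precision-Recall curve plots precision against recall. The sketch below computes the points of both curves in pure Python. It is not the code of pyAgrum.lib.bn2roc; the function name, labels and scores are hypothetical.

```python
def roc_pr_points(y_true, scores):
    """One (threshold, fpr, recall, precision) tuple per distinct score, highest threshold first."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for t, p in zip(y_true, predicted) if p and t == 1)
        fp = sum(1 for t, p in zip(y_true, predicted) if p and t == 0)
        # recall is also the true positive rate used by the ROC curve
        points.append((threshold, fp / negatives, tp / positives, tp / (tp + fp)))
    return points

# Hypothetical true labels (1 = positive) and predicted probabilities of the positive class.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
points = roc_pr_points(labels, scores)
for threshold, fpr, recall, precision in points:
    print(f"threshold={threshold:.2f} fpr={fpr:.2f} recall={recall:.2f} precision={precision:.2f}")
```

As the threshold decreases, recall and the false positive rate can only grow, while precision may move in either direction.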