Naive modeling of credit defaults using a Markov Random Field

Creative Commons License

aGrUM

interactive online version

This notebook is the adaptation to pyAgrum of the model proposed by Gautier Marti which is itself inspired by the paper Graphical Models for Correlated Defaults (adaptation by Marvin Lasserre and Pierre-Henri Wuillemin).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.mrf2graph as m2g

Constructing the model

Three sectors co-dependent in their stress state are modelled: (Finance, Energy, Retail). For each of these sectors, we consider a universe of three issuers.

Within Finance

Within Energy

Within Retail

Deutsche Bank

EDF

New Look

Metro Bank

Petrobras

Matalan

Barclays

EnQuest

Marks & Spencer

The probability of default of these issuers is partly idiosyncratic and partly depending on the stress within their respective sectors. Some distressed names such as New Look can have a high marginal probability of default, so high that the state of their sector (normal or stressed) does not matter much. Some other high-yield risky names such as Petrobras may strongly depend on how the whole energy sector is doing. On the other hand, a company such as EDF should be quite robust, and would require a very acute and persistent sector stress to move its default probability significantly higher.

We can encode this basic knowledge in terms of potentials:

image-2.png

Conventions:

Sector variables can be either in the state no stress or in the state under stress while issuer variables can be either in the state no default or in the state default.

The choice of these potentials is not an easy task; It can be driven by historical fundamental or statistical relationships or forward-looking views.

Given a network structure, and a set of potentials, one can mechanically do inference using the pyAgrum library.

We will show how to obtain a distribution of the number of defaults (and the expected number of defaults), a distribution of the losses (and the expected loss). From the joint probability table, we could also extract the default correlation

tl;dr We use here the library pyAgrum to illustrate, on a toy example, how one can use a Markov Random Field (MRF) for modelling the distribution of defaults and losses in a credit portfolio.

Building accurate PGM/MRF models require expert knowledge for considering a relevant graph structure, and attributing useful potentials. For this toy model, we use the following structure:

image.png

The first step consists in creating the model. For that, we use the pyAgrum class pyAgrum.MarkovRandomField.

In [2]:
# building nodes (with types)
mn = gum.MarkovRandomField('Credit default modeling')

# Adding sector variables
sectors = ['Finance', 'Energy', 'Retail']
finance, energy, retail = [mn.add(gum.LabelizedVariable(name, '', ['no stress', 'under stress']))
                           for name in sectors]

# Adding issuer variables
edf, petro, eq = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['EDF','Petrobras','EnQuest']]

mb, db, ba = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['Metro Bank', 'Deutsche Bank', 'Barclays']]

nl, matalan, ms = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['New Look', 'Matalan', 'Marks & Spencer']]
In [3]:
# Adding and filling factors between sectors
mn.addFactor([finance, energy])[:] = [[90, 70],
                                      [60, 10]]
mn.addFactor([finance, retail])[:] = [[80, 10],
                                      [30, 80]]
mn.addFactor([energy, retail])[:]  = [[60,  5],
                                      [70, 95]]

# Adding factors between sector and issuer
mn.addFactor([db, finance])[:] = [[90, 30],
                                  [10, 60]]
mn.addFactor([mb, finance])[:] = [[80, 40],
                                  [ 5, 60]]
mn.addFactor([ba, finance])[:] = [[90, 20],
                                  [20, 50]]

mn.addFactor([edf, energy])[:]   = [[90, 5],
                                    [80, 40]]
mn.addFactor([petro, energy])[:] = [[60, 50],
                                    [ 5, 60]]
mn.addFactor([eq, energy])[:]    = [[80, 20],
                                    [10, 50]]


mn.addFactor([nl, retail])[:]      = [[ 5, 60],
                                      [ 2, 90]]
mn.addFactor([matalan, retail])[:] = [[40, 30],
                                      [20, 70]]
mn.addFactor([ms, retail])[:]       = [[80, 10],
                                      [30, 50]]
In [4]:
nodetypes={n:0.1 if n in sectors else
             0.2 for n in mn.names()}

gnb.sideBySide(gnb.getMRF(mn,view='graph',nodeColor=nodetypes),
               gnb.getMRF(mn,view='factorgraph',nodeColor=nodetypes),
              captions=['The model as a Markov random field','The model as a factor graph'])

gnb.sideBySide(mn.factor({'Energy', 'Finance'}),
               mn.factor({'Finance', 'Retail'}),
               mn.factor({'Energy', 'Retail'}),
               mn.factor({'Deutsche Bank', 'Finance'}),
               mn.factor({'Metro Bank', 'Finance'}),
               mn.factor({'Barclays', 'Finance'}),
               mn.factor({'EDF', 'Energy'}),
               mn.factor({'Petrobras', 'Energy'}),
               mn.factor({'EnQuest', 'Energy'}),
               mn.factor({'New Look', 'Retail'}),
               mn.factor({'Matalan', 'Retail'}),
               mn.factor({'Marks & Spencer', 'Retail'}),
              ncols=3)
G Metro Bank Metro Bank Matalan Matalan Retail Retail Retail--Matalan New Look New Look Retail--New Look Marks & Spencer Marks & Spencer Retail--Marks & Spencer Finance Finance Finance--Metro Bank Finance--Retail Energy Energy Finance--Energy Deutsche Bank Deutsche Bank Finance--Deutsche Bank Barclays Barclays Finance--Barclays Petrobras Petrobras EDF EDF Energy--Retail Energy--Petrobras Energy--EDF EnQuest EnQuest Energy--EnQuest
The model as a Markov random field
G Metro Bank Metro Bank Matalan Matalan Retail Retail New Look New Look Finance Finance Petrobras Petrobras EDF EDF Energy Energy Deutsche Bank Deutsche Bank Marks & Spencer Marks & Spencer EnQuest EnQuest Barclays Barclays f0#1 f0#1--Finance f0#1--Energy f1#2 f1#2--Retail f1#2--Energy f0#7 f0#7--Finance f0#7--Deutsche Bank f1#4 f1#4--Petrobras f1#4--Energy f2#9 f2#9--Retail f2#9--New Look f2#11 f2#11--Retail f2#11--Marks & Spencer f0#2 f0#2--Retail f0#2--Finance f0#6 f0#6--Metro Bank f0#6--Finance f0#8 f0#8--Finance f0#8--Barclays f1#3 f1#3--EDF f1#3--Energy f1#5 f1#5--Energy f1#5--EnQuest f2#10 f2#10--Matalan f2#10--Retail
The model as a factor graph
Finance
Energy
no stress
under stress
no stress
90.000070.0000
under stress
60.000010.0000
Finance
Retail
no stress
under stress
no stress
80.000010.0000
under stress
30.000080.0000
Energy
Retail
no stress
under stress
no stress
60.00005.0000
under stress
70.000095.0000
Deutsche Bank
Finance
no default
default
no stress
90.000030.0000
under stress
10.000060.0000
Metro Bank
Finance
no default
default
no stress
80.000040.0000
under stress
5.000060.0000
Barclays
Finance
no default
default
no stress
90.000020.0000
under stress
20.000050.0000
EDF
Energy
no default
default
no stress
90.00005.0000
under stress
80.000040.0000
Petrobras
Energy
no default
default
no stress
60.000050.0000
under stress
5.000060.0000
EnQuest
Energy
no default
default
no stress
80.000020.0000
under stress
10.000050.0000
New Look
Retail
no default
default
no stress
5.000060.0000
under stress
2.000090.0000
Matalan
Retail
no default
default
no stress
40.000030.0000
under stress
20.000070.0000
Marks & Spencer
Retail
no default
default
no stress
80.000010.0000
under stress
30.000050.0000

Making inferences

Example 1 & 2

The first two examples consists in finding the chance of EDF to default with no evidence and knowing that the energy sector is not under stress. For that, we can use the getPosterior method:

In [5]:
gnb.sideBySide(gum.getPosterior(mn, target='EDF', evs={}),
               gum.getPosterior(mn, target='EDF', evs={'Energy':'no stress'}),
               captions=['$P(EDF)$','$P(EDF|Energy=$under stress$)$'])
EDF
no default
default
0.90720.0928

$P(EDF)$
EDF
no default
default
0.94740.0526

$P(EDF|Energy=$under stress$)$

Hence, if we cannot observe the state of the energy sector, then EDF has a higher chance of defaulting (9%) as it is possible that the energy sector is under stress.

Example 3 & 4

Given that Marks & Spencer (a much safer name) has defaulted, New Look has now an increased chance of defaulting: 97%.

Even if the retail sector is doing ok, New Look is a distressed name with a high probability of default (92%).

In [6]:
gnb.sideBySide(gum.getPosterior(mn, target='New Look', evs={'Marks & Spencer': 'default'}),
               gum.getPosterior(mn, target='New Look', evs={'Retail':'no stress'}),
               captions=['P(New Look|Marks & Spencer=default)',
                         'P(New Look | Retail = no stress)'])
New Look
no default
default
0.02860.9714

P(New Look|Marks & Spencer=default)
New Look
no default
default
0.07690.9231

P(New Look | Retail = no stress)

pyAgrum offers the option to vizualise the marginal probability of each variable using showInference.

In [7]:
#gum.config['factorgraph','edge_length_inference']=1.
gnb.showInference(mn, size='7!',nodeColor=nodetypes)
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_28_0.svg

Impact of defaults on a credit portfolio and the distribution of losses

Amongst the 9 available issuers, let’s consider the following portfolio containing bonds from these 6 issuers:

In [8]:
portfolio_names = ['EDF', 'Petrobras', 'EnQuest', 'Matalan', 'Barclays']
portfolio_exposures = [2e6, 400e3, 300e3, 600e3, 3e6]
LGD = 0.9
losses = {portfolio_names[i]:(portfolio_exposures[i]*LGD) for i in range(len(portfolio_names))}

print('** Total notional: USD {} MM'.format(sum(portfolio_exposures)/1e6))
** Total notional: USD 6.3 MM

We have a USD 6.3MM total exposure to these names (notional).

We assume that in case of default, we recover only 10% (LGD = 90%).

Let’s start using the model:

In [9]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.makeInference()
p=ie.jointPosterior(set(sectors))

print(f'** There is {100*p[{"Retail":0,"Energy":0,"Finance":0}]:.2f}% chance that no sector in under stress.')
print()
p
** There is 42.38% chance that no sector in under stress.

Out[9]:
Retail
Energy
Finance
no stress
under stress
no stress
no stress
0.42380.2999
under stress
0.00830.1251
under stress
no stress
0.01050.1215
under stress
0.00000.0109

If we observe that the finance sector is doing ok, and that Marks & Spencer is still doing business as usual, then this is the current joint probabilities associated to the potential defaults in the portfolio:

In [10]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.addJointTarget(set(portfolio_names))
ie.setEvidence({'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ie.makeInference()
portfolio_posterior = ie.jointPosterior(set(portfolio_names))

Note that the line ie.addJointTarget(set(portfolio_names)) is important here because adding hard evidences removes the corresponding nodes in the graph and then no factor containing the variables in portfolio_names can be found. The method addJointTarget ensure that such a factor exists.

In [11]:
portfolio_posterior
Out[11]:
Barclays
Matalan
Petrobras
EnQuest
EDF
no default
default
no default
no default
no default
no default
0.14950.0332
default
0.00840.0019
default
no default
0.03830.0085
default
0.00260.0006
default
no default
no default
0.12680.0282
default
0.00810.0018
default
no default
0.04310.0096
default
0.00770.0017
default
no default
no default
no default
0.15520.0345
default
0.00880.0020
default
no default
0.04120.0092
default
0.00340.0008
default
no default
no default
0.13500.0300
default
0.01020.0023
default
no default
0.06270.0139
default
0.01700.0038

We now want to compute the distribution of the number of defaults. For that, we create a new Markov random field containing an additional node Number of defaults connected to each issuers:

In [12]:
mn2 = gum.MarkovRandomField(mn)

mn2.add(gum.RangeVariable('Number of defaults', '', 0, len(portfolio_names)))
factor = mn2.addFactor(['Number of defaults',*portfolio_names]).fillFromFunction('+'.join(portfolio_names))
In [13]:
gum.config['factorgraph','edge_length']=1.1
nodetypes={n:0.1 if n in sectors else
             0.3 if "Number of defaults"==n else
             0.2 for n in mn2.names()}
gnb.flow.add(gnb.getMRF(mn2, view="graph",size='7!',nodeColor=nodetypes))
gnb.flow.add(gnb.getMRF(mn2, view="factorgraph",size='7!',nodeColor=nodetypes))
gnb.flow.display()
G Metro Bank Metro Bank Matalan Matalan Number of defaults Number of defaults Matalan--Number of defaults Retail Retail Retail--Matalan New Look New Look Retail--New Look Marks & Spencer Marks & Spencer Retail--Marks & Spencer Finance Finance Finance--Metro Bank Finance--Retail Energy Energy Finance--Energy Deutsche Bank Deutsche Bank Finance--Deutsche Bank Barclays Barclays Finance--Barclays Petrobras Petrobras Petrobras--Matalan Petrobras--Number of defaults EnQuest EnQuest Petrobras--EnQuest Petrobras--Barclays EDF EDF EDF--Matalan EDF--Number of defaults EDF--Petrobras EDF--EnQuest EDF--Barclays Energy--Retail Energy--Petrobras Energy--EDF Energy--EnQuest EnQuest--Matalan EnQuest--Number of defaults EnQuest--Barclays Barclays--Matalan Barclays--Number of defaults
G Metro Bank Metro Bank Matalan Matalan Retail Retail New Look New Look Number of defaults Number of defaults Finance Finance Petrobras Petrobras EDF EDF Energy Energy Deutsche Bank Deutsche Bank Marks & Spencer Marks & Spencer EnQuest EnQuest Barclays Barclays f1#5 f1#5--Energy f1#5--EnQuest f1#3 f1#3--EDF f1#3--Energy f3#4#5#8#10#12 f3#4#5#8#10#12--Matalan f3#4#5#8#10#12--Number of defaults f3#4#5#8#10#12--Petrobras f3#4#5#8#10#12--EDF f3#4#5#8#10#12--EnQuest f3#4#5#8#10#12--Barclays f2#11 f2#11--Retail f2#11--Marks & Spencer f2#9 f2#9--Retail f2#9--New Look f1#4 f1#4--Petrobras f1#4--Energy f0#7 f0#7--Finance f0#7--Deutsche Bank f1#2 f1#2--Retail f1#2--Energy f0#1 f0#1--Finance f0#1--Energy f2#10 f2#10--Matalan f2#10--Retail f0#8 f0#8--Finance f0#8--Barclays f0#6 f0#6--Metro Bank f0#6--Finance f0#2 f0#2--Retail f0#2--Finance

Once this new network is created, we can obtain the distribution of Number of defaults using getPosterior :

In [14]:
ndefault_posterior = gum.getPosterior(mn2, target='Number of defaults',
                                           evs={'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ndefault_posterior
Out[14]:
Number of defaults
0
1
2
3
4
5
0.14950.36200.31190.13710.03570.0038
In [15]:
fig, ax = plt.subplots()
l = ndefault_posterior.tolist()
ax.bar(range(len(l)), l)
ax.set_xlabel('Number of defaults')
ax.set_title('Distribution of the number of defaults in the portfolio')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_45_0.svg
In [16]:
q=gum.Potential(ndefault_posterior).fillWith(range(ndefault_posterior.domainSize()))
print(f'Expected number of defaults in the portfolio: {(q*ndefault_posterior).sum():.1f}')
Expected number of defaults in the portfolio: 1.6

Besides the expected number of defaults, the distribution of losses is even more informative:

In [17]:
pot_losses = gum.Potential(portfolio_posterior)
for i in pot_losses.loopIn():
    pot_losses[i] = sum([losses[key]*i.val(key) for key in losses])
print("Table of losses")
pot_losses
Table of losses
Out[17]:
Barclays
Matalan
Petrobras
EnQuest
EDF
no default
default
no default
no default
no default
no default
0.00002700000.0000
default
1800000.00004500000.0000
default
no default
270000.00002970000.0000
default
2070000.00004770000.0000
default
no default
no default
360000.00003060000.0000
default
2160000.00004860000.0000
default
no default
630000.00003330000.0000
default
2430000.00005130000.0000
default
no default
no default
no default
540000.00003240000.0000
default
2340000.00005040000.0000
default
no default
810000.00003510000.0000
default
2610000.00005310000.0000
default
no default
no default
900000.00003600000.0000
default
2700000.00005400000.0000
default
no default
1170000.00003870000.0000
default
2970000.00005670000.0000
In [18]:
expected_loss = (portfolio_posterior * pot_losses ).sum()

print(f'Expected loss with respect to a par value of USD {sum(portfolio_exposures)/1e6:.1f} MM is USD {expected_loss / 1e6:.1f} MM.')
print(f'Expected loss: {100 * expected_loss / sum(portfolio_exposures):2.1f}% of the notional.')
Expected loss with respect to a par value of USD 6.3 MM is USD 1.2 MM.
Expected loss: 18.6% of the notional.
In [19]:
cuts=list(np.percentile([pot_losses[i] for i in pot_losses.loopIn()],range(0,101,10)))
cuts[0] = cuts[0] - 0.001

loss_bucket_distribution = []
d = gum.Potential(pot_losses)
for j in range(len(cuts) - 1):
    for i in pot_losses.loopIn():
        d[i] = 1 if cuts[j]<pot_losses[i]<=cuts[j+1] else 0
    loss_bucket_distribution.append(((portfolio_posterior*d).sum()))
In [20]:
buckets = [f'({cuts[i]:>10_.0f}, {cuts[i+1]:>10_.0f}]' for i in range(len(cuts)-1)]

fig, ax = plt.subplots()
ax.bar(range(10), height=loss_bucket_distribution)
ax.set_xticks(range(10), labels=buckets, rotation=90)
ax.set_title('Distribution of losses with respect to par value')
ax.set_xlabel('Loss buckets')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_51_0.svg
In [21]:
cumsum = 0.
for i in range(len(loss_bucket_distribution)):
    cumsum += loss_bucket_distribution[i]
    print('{:24} | {:0.6f}'.format(buckets[i], cumsum))
(        -0,    549_000] | 0.469960
(   549_000,    954_000] | 0.689263
(   954_000,  2_097_000] | 0.762897
( 2_097_000,  2_502_000] | 0.787553
( 2_502_000,  2_835_000] | 0.834408
( 2_835_000,  3_168_000] | 0.888119
( 3_168_000,  3_573_000] | 0.941345
( 3_573_000,  4_716_000] | 0.987143
( 4_716_000,  5_121_000] | 0.991483
( 5_121_000,  5_670_000] | 1.000000

Impact of a belief of stress on a sector for the number of defaults in the portfolio

Belief from \([1,0]\) (no stress) to \([1,1]\) (no specific belief) to \([0,1]\) (stress in a sector).

In [22]:
def show_expected_number_of_defaults(sector,q,enod_nobelief):
  v1=[]
  for i in range(101):
     posterior=gum.getPosterior(mn2, target='Number of defaults',evs={sector:[100-i,i]})
     v1.append((posterior*q).sum())

  fig, ax = plt.subplots()
  ax.set_title(f"Expected number of defaults w.r.t stress in {sector}")
  ax.set_xlabel(sector, fontweight='bold')
  ax.set_ylabel('Number of defaults', style='italic')
  ax.plot(np.arange(0,1.01,0.01),v1)
  ax.plot(0.5,enod_nobelief,'x',color="red")
  ax.text(0.5, enod_nobelief-0.05, f' No specific evidence on {sector}')
  return fig

enod_nobelief=(gum.getPosterior(mn2, target='Number of defaults',evs={})*q).sum()

gnb.flow.add(show_expected_number_of_defaults('Retail',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Finance',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Energy',q,enod_nobelief))
gnb.flow.display()
In [ ]:

In [ ]: