Walking Example (p135)

Creative Commons License


Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

This notebook follows the example from “The Book of Why” (Pearl, 2018), chapter 4, page 135.

Confounding

In [1]:
from IPython.display import display, Math, Latex, HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os

In 1998, a study revealed a correlation between physical exercise and longevity among nonsmoking retired men. What we really want to know, of course, is whether exercising more makes men live longer, that is, whether the relationship is causal. The study's measurements are given at the end of this notebook.

Corresponding causal diagram

The corresponding causal diagram is the following:

In [2]:
# We create the causal diagram
we = gum.fastBN("Walking{casual|normal|intense}->Mortality{dead|alive}")

# We fill the CPTs: P(Walking) from the counts 151, 379 and 177 (total 707)
we.cpt("Walking")[:] = [151/707, 379/707, 177/707]
# P(Mortality | Walking), each row given in the order [dead, alive]
we.cpt("Mortality")[{"Walking": "casual"}] = [0.43, 0.57]
we.cpt("Mortality")[{"Walking": "intense"}] = [0.215, 0.785]
we.cpt("Mortality")[{"Walking": "normal"}] = [0.277, 0.723]

# Show the BN, the joint P(Walking)*P(Mortality|Walking), and both CPTs
gnb.sideBySide(we, we.cpt("Walking")*we.cpt("Mortality"), we.cpt("Walking"), we.cpt("Mortality"),
               captions=["the BN", "the joint distribution", "the marginal for $Walking$", "the CPT for $Mortality$"])
[Figure: the BN, Walking -> Mortality]

the joint distribution P(Walking, Mortality)
            casual   normal   intense
  dead      0.0918   0.1485   0.0538
  alive     0.1217   0.3876   0.1965

the marginal for Walking
  casual   normal   intense
  0.2136   0.5361   0.2504

the CPT for Mortality, P(Mortality | Walking)
  Walking   dead     alive
  casual    0.4300   0.5700
  normal    0.2770   0.7230
  intense   0.2150   0.7850
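The joint table is simply the chain rule P(Walking, Mortality) = P(Walking) · P(Mortality | Walking), which is what the product of the two CPTs computes in the cell above. As a minimal stdlib sketch (plain dictionaries stand in for pyAgrum's CPT objects; that representation is an assumption of the sketch, not pyAgrum's internals), the same numbers can be recomputed by hand, together with the overall probability of dying:

```python
# CPT values copied from the cell above
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}
p_dead_given_walking = {"casual": 0.43, "normal": 0.277, "intense": 0.215}

# Chain rule: P(w, m) = P(w) * P(m | w)
joint = {}
for w, pw in p_walking.items():
    joint[(w, "dead")] = pw * p_dead_given_walking[w]
    joint[(w, "alive")] = pw * (1 - p_dead_given_walking[w])

# Marginal P(Mortality = dead): sum the joint over Walking
p_dead = sum(v for (w, m), v in joint.items() if m == "dead")

for w in p_walking:
    print(f"{w:8s} dead={joint[(w, 'dead')]:.4f} alive={joint[(w, 'alive')]:.4f}")
print(f"P(Mortality=dead) = {p_dead:.4f}")
```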

The study showed that after 12 years, 43% of casual walkers had died, while only 21.5% of intense walkers had died.
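These two rates can be summarised as a risk difference and a risk ratio. The short sketch below only does that arithmetic on the figures just quoted; both quantities measure association, and nothing at this stage says the difference is caused by walking.

```python
# Death rates after 12 years, as reported by the study: P(dead | Walking)
p_dead_casual = 0.43
p_dead_intense = 0.215

risk_difference = p_dead_casual - p_dead_intense  # absolute gap in death rate
risk_ratio = p_dead_casual / p_dead_intense       # relative risk, casual vs intense

print(f"risk difference: {risk_difference:.3f}")
print(f"risk ratio: {risk_ratio:.2f}")
```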

Causal effect of walking on mortality in this model:

In [3]:
weModele = csl.CausalModel(we)
cslnb.showCausalImpact(weModele,"Mortality",doing="Walking",values={})
[Figure: Causal Model, Walking -> Mortality]
$$\begin{equation*}P( Mortality \mid \text{do}(Walking)) = P\left(Mortality\mid Walking\right)\end{equation*}$$
Explanation: Do-calculus computations

Impact, P(Mortality | do(Walking))
  Walking   dead     alive
  casual    0.4300   0.5700
  normal    0.2770   0.7230
  intense   0.2150   0.7850
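Why the do-calculus returns the plain conditional here: in this two-node model Walking has no parent, so the truncated factorization (drop the factor P(Walking) of the intervened variable and fix Walking = w) leaves only the CPT of Mortality. A worked derivation:

```latex
P(Walking, Mortality) = P(Walking)\,P(Mortality \mid Walking)
\;\Longrightarrow\;
P(Mortality \mid \text{do}(Walking = w))
  = \sum_{w'} \mathbb{1}[w' = w]\,P(Mortality \mid Walking = w')
  = P(Mortality \mid Walking = w)
```

The equality holds only because the diagram contains no common cause of Walking and Mortality, which is exactly the assumption questioned next.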

Before jumping to conclusions, we should consider possible confounders. We need to ask: what distinguishes intense walkers from casual walkers? Without abandoning the idea of a cause-and-effect relationship between walking and mortality, we introduce a third variable, a “confounder”: a common cause of both variables that could explain the correlation between them. Our aim is to separate the causal effect of walking on mortality (if there is one) from the bias induced by this third variable. For this purpose, we need to adjust for it.
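To see how a common cause alone can create an association, here is a small stdlib sketch with hypothetical numbers (invented for illustration, not taken from the study, and simplified to two age groups and a binary "intense or not" walking variable). Age affects both walking and death, and there is deliberately no arrow from walking to mortality, yet intense walkers still appear to die less. Exact enumeration is used instead of random sampling, so the output is deterministic.

```python
# Hypothetical numbers, invented for illustration (NOT the study's data).
# Structure: Age -> Walking, Age -> Mortality, and NO arrow Walking -> Mortality.
p_age = {"young": 0.5, "old": 0.5}
p_intense_given_age = {"young": 0.7, "old": 0.2}  # younger men walk intensely more often
p_dead_given_age = {"young": 0.1, "old": 0.5}     # older men die more often


def p_dead_given_walking(intense):
    """Observational P(dead | Walking), by exact enumeration over Age."""
    p_walk = 0.0       # P(Walking = w)
    p_walk_dead = 0.0  # P(Walking = w, Mortality = dead)
    for age, pa in p_age.items():
        pw = p_intense_given_age[age] if intense else 1 - p_intense_given_age[age]
        p_walk += pa * pw
        # Mortality depends on Age only: Walking does not appear in this factor
        p_walk_dead += pa * pw * p_dead_given_age[age]
    return p_walk_dead / p_walk


# Adjusting for Age: sum_a P(dead | a, w) * P(a); here P(dead | a, w) = P(dead | a),
# so the adjusted value is the same for every walking level
p_dead_adjusted = sum(pa * p_dead_given_age[age] for age, pa in p_age.items())

print(f"P(dead | intense)     = {p_dead_given_walking(True):.3f}")
print(f"P(dead | not intense) = {p_dead_given_walking(False):.3f}")
print(f"adjusted for Age      = {p_dead_adjusted:.3f} (same for both groups)")
```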

In [4]:
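# Add a latent (unobserved) common cause of Walking and Mortality;
# the last argument (True) keeps the existing arc Walking->Mortality
weModele1 = csl.CausalModel(we, [("confounder", ["Walking", "Mortality"])], True)
gnb.show(weModele1)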
[Figure: the causal model, with a latent confounder of Walking and Mortality]
In [5]:
cslnb.showCausalImpact(weModele1, "Mortality", "Walking",values={"Walking":"intense"})
[Figure: Causal Model, confounder -> Walking, confounder -> Mortality, Walking -> Mortality]
Hedge Error: G={'Mortality', 'Walking'}, G[S]={'Mortality'}
Explanation: Impossible
Impact: No result
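The “Impossible” verdict means that, with the confounder unobserved, P(Mortality | do(Walking)) is not identifiable: two causal models can generate exactly the same observational data yet give different interventional answers. A minimal stdlib sketch with two hypothetical binary models (invented numbers, a binary Walking where 1 means intense, and no pyAgrum involved) makes this concrete:

```python
from itertools import product

P_U = {0: 0.5, 1: 0.5}     # latent confounder U (used by model B only)
P_DEAD = {0: 0.5, 1: 0.1}  # P(M = dead | parent value), shared by both models
KEYS = list(product((0, 1), ("dead", "alive")))


def joint_a(do_w=None):
    """P(W, M) in model A: W -> M, no confounder. do_w forces W to that value."""
    dist = {}
    for w, m in KEYS:
        if do_w is None:
            pw = 0.5                        # W is a fair coin
        else:
            pw = 1.0 if w == do_w else 0.0  # intervention: W = do_w
        pm = P_DEAD[w] if m == "dead" else 1 - P_DEAD[w]
        dist[(w, m)] = pw * pm
    return dist


def joint_b(do_w=None):
    """P(W, M) in model B: U -> W (W copies U), U -> M, NO arrow W -> M.
    do_w cuts the arrow U -> W; M keeps following U."""
    dist = {k: 0.0 for k in KEYS}
    for u, pu in P_U.items():
        w = u if do_w is None else do_w
        dist[(w, "dead")] += pu * P_DEAD[u]
        dist[(w, "alive")] += pu * (1 - P_DEAD[u])
    return dist


def p_dead_given_w(dist, w):
    """P(M = dead | W = w) read off a joint table."""
    return dist[(w, "dead")] / (dist[(w, "dead")] + dist[(w, "alive")])


same_observational = all(abs(joint_a()[k] - joint_b()[k]) < 1e-12 for k in KEYS)
print("same observational distribution:", same_observational)
print(f"model A: P(dead | do(W=1)) = {p_dead_given_w(joint_a(do_w=1), 1):.3f}")
print(f"model B: P(dead | do(W=1)) = {p_dead_given_w(joint_b(do_w=1), 1):.3f}")
```

Since the data cannot tell model A from model B, no formula over observed variables can return the causal effect, which is what the hedge error reports.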

Introducing age as a confounder:

We want to measure the causal effect of walking on mortality. Confounding bias arises when a third variable, called a “confounding variable”, influences both walking and mortality. An obvious confounder is age: younger subjects exercise more and also have more years left to live. (There are other confounders as well.)

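Let’s use fictitious data: no CPT is filled in below, so fastBN generates them at random. The numbers displayed are therefore arbitrary and will change from one run to the next.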

In [6]:
wea = gum.fastBN("Age{cat1|cat2|cat3}->Walking{casual|normal|intense}->Mortality{dead|alive}<-Age{cat1|cat2|cat3}")

gnb.sideBySide(wea,wea.cpt("Age"),wea.cpt("Walking"),wea.cpt("Mortality"),
               captions=["the BN","the marginal for $Age$","the CPT for $Walking$","the CPT for $Mortality$"])
[Figure: the BN, Age -> Walking, Age -> Mortality, Walking -> Mortality]

the marginal for Age
  cat1     cat2     cat3
  0.2203   0.3511   0.4286

the CPT for Walking, P(Walking | Age)
  Age    casual   normal   intense
  cat1   0.2651   0.3963   0.3386
  cat2   0.1098   0.1932   0.6970
  cat3   0.4178   0.3656   0.2166

the CPT for Mortality, P(Mortality | Age, Walking)
  Age    Walking   dead     alive
  cat1   casual    0.4879   0.5121
  cat1   normal    0.4129   0.5871
  cat1   intense   0.3912   0.6088
  cat2   casual    0.9129   0.0871
  cat2   normal    0.5904   0.4096
  cat2   intense   0.2061   0.7939
  cat3   casual    0.0771   0.9229
  cat3   normal    0.1184   0.8816
  cat3   intense   0.6025   0.3975

Causal effect of walking on mortality with age as a confounder:

In [7]:
weModele2 = csl.CausalModel(wea)
cslnb.showCausalImpact(weModele2, "Mortality", "Walking",values={})
[Figure: Causal Model, Age -> Walking, Age -> Mortality, Walking -> Mortality]
$$\begin{equation*}P( Mortality \mid \text{do}(Walking)) = \sum_{Age}{P\left(Mortality\mid Age,Walking\right) \cdot P\left(Age\right)}\end{equation*}$$
Explanation: backdoor ['Age'] found.

Impact, P(Mortality | do(Walking))
  Walking   dead     alive
  casual    0.4611   0.5389
  normal    0.3490   0.6510
  intense   0.4168   0.5832

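We adjusted for Age using the back-door criterion. The only back-door path from Walking to Mortality is Walking <- Age -> Mortality, and conditioning on Age blocks it (Age is not a descendant of Walking). Consequently, within each Age stratum, setting Walking = “intense” and observing Walking = “intense” have the same effect on Mortality, P(Mortality | do(Walking), Age) = P(Mortality | Walking, Age), and the adjustment formula averages these stratum-specific effects over P(Age).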
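The adjustment formula can be checked by hand. The stdlib sketch below plugs the CPT values displayed above (rounded to four decimals, and randomly generated by fastBN, so a fresh run gives other numbers) into the back-door formula, and for contrast into plain conditioning P(Mortality | Walking), which weights each age group by its share among the walkers of that category, P(Age | Walking), instead of by P(Age).

```python
# CPT values as displayed above (rounded, randomly generated by fastBN)
AGES = ("cat1", "cat2", "cat3")
WALKS = ("casual", "normal", "intense")

p_age = {"cat1": 0.2203, "cat2": 0.3511, "cat3": 0.4286}
p_walk_given_age = {
    "cat1": {"casual": 0.2651, "normal": 0.3963, "intense": 0.3386},
    "cat2": {"casual": 0.1098, "normal": 0.1932, "intense": 0.6970},
    "cat3": {"casual": 0.4178, "normal": 0.3656, "intense": 0.2166},
}
p_dead_given_age_walk = {
    ("cat1", "casual"): 0.4879, ("cat1", "normal"): 0.4129, ("cat1", "intense"): 0.3912,
    ("cat2", "casual"): 0.9129, ("cat2", "normal"): 0.5904, ("cat2", "intense"): 0.2061,
    ("cat3", "casual"): 0.0771, ("cat3", "normal"): 0.1184, ("cat3", "intense"): 0.6025,
}


def p_dead_do(walk):
    """Back-door adjustment: sum_a P(dead | a, walk) * P(a)."""
    return sum(p_dead_given_age_walk[(a, walk)] * p_age[a] for a in AGES)


def p_dead_seen(walk):
    """Plain conditioning: sum_a P(dead | a, walk) * P(a | walk), P(a | walk) by Bayes' rule."""
    p_walk = sum(p_walk_given_age[a][walk] * p_age[a] for a in AGES)
    return sum(
        p_dead_given_age_walk[(a, walk)] * p_walk_given_age[a][walk] * p_age[a] / p_walk
        for a in AGES
    )


for w in WALKS:
    print(f"{w:8s} P(dead | do) = {p_dead_do(w):.4f}   P(dead | seen) = {p_dead_seen(w):.4f}")
```

The first column matches the Impact table up to rounding; the gap with the second column is the confounding bias introduced by Age in these random CPTs.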

Conclusion:

After adjusting for age, the study reports that 40.5% of casual walkers died (43% unadjusted), whereas only 23.8% of intense walkers died (21.5% unadjusted). These are the study's own figures (see the measurements below), not the output of the fictitious model above. The part of the association induced by Age is therefore small. Even after adjusting for all plausible confounders, which removes the confounding bias they introduce, Walking is still associated with Mortality. Unless we missed other confounders, in which case the remaining bias depends on how strongly these hidden variables affect both walking and mortality, we can say that intentional walking prolongs life in the studied population.
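How large is the age correction? A quick stdlib computation on the study figures quoted above (no model involved) shows that adjustment moves each death rate by less than 3 percentage points, and that the risk ratio between casual and intense walkers goes from 2.0 to about 1.7 while staying well above 1.

```python
# Study figures quoted in the conclusion: share of men dead after 12 years
unadjusted = {"casual": 0.43, "intense": 0.215}
age_adjusted = {"casual": 0.405, "intense": 0.238}

for group in unadjusted:
    shift = age_adjusted[group] - unadjusted[group]
    print(f"{group:8s} unadjusted={unadjusted[group]:.3f} "
          f"adjusted={age_adjusted[group]:.3f} shift={shift:+.3f}")

# Relative risk of death, casual vs intense walkers, before and after adjustment
rr_unadjusted = unadjusted["casual"] / unadjusted["intense"]
rr_adjusted = age_adjusted["casual"] / age_adjusted["intense"]
print(f"risk ratio: {rr_unadjusted:.2f} unadjusted, {rr_adjusted:.2f} age-adjusted")
```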

In an observational study, adjusting for confounding factors is a systematic step whenever we want to measure the causal effect of a treatment on an outcome.

The study's measurements, both unadjusted and age-adjusted:

[Image: study measurements, unadjusted and age-adjusted]
