# The Table 2 Fallacy

This is based on lecture notes prepared together with Mark Gilthorpe for his module "Advanced Modelling Strategies".

As you know, the covariates in a statistical analysis can play a variety of roles from a causal inference perspective: they can be mediators, confounders, proxy confounders, or competing exposures. If a suitable set of covariates can be identified that removes confounding, we may proceed to estimate our causal effect using a multivariable regression model. In linear regression models, however, there are only two types of variables: the dependent variable (DV) and the independent variables (IVs, or predictors). No further distinction is made between the IVs – in particular, the exposure is by no means a "special" IV and is treated just like any other. There is thus a conceptual mismatch between the causal theory (the DAG) that leads us to formulate a multivariable regression model – which singles out the exposure–outcome relationship and the covariates adjusted for to remove confounding – and the regression model itself, which treats all IVs alike. This conceptual mismatch can easily lead to misinterpretation of the results of a multivariable regression model.

One particularly widespread misconception is known as mutual adjustment, more recently dubbed the ‘Table 2 fallacy’: in most epidemiological articles, the first table describes the study data and the second table reports the results of a multivariable regression model, which is where the erroneous ‘mutually adjusted’ interpretation typically appears. To illustrate the fallacy, let us assume that we wish to estimate the effect of X on Y. We know (e.g. from a DAG) that there is only one confounder, Z, so we run the regression Y~X+Z. If our background knowledge and the statistical assumptions of the regression (e.g. normality) hold, then the coefficient of X estimates the causal effect of X on Y. The ‘Table 2 fallacy’ is the belief that we can also interpret the coefficient of Z as the effect of Z on Y; in larger models, it is the belief that all coefficients can be interpreted in this way with respect to Y.

To see why this is not true, let us look at an example DAG that matches our scenario.

Figure 1
```
digraph G {
X [pos="1,1"]
Y [pos="2,0"]
Z [pos="0,0"]
Z -> Y
Z -> X -> Y
}
```

With respect to the effect of X on Y, adjustment for Z removes all confounding, but what does including X in the model mean for the effect of Z on Y?

As we can see, X mediates the effect of Z on Y, but adjustment for a mediator is erroneous when estimating the total causal effect. Thus, the Z coefficient in our model cannot be interpreted as a total causal effect. Instead, we could interpret it as the direct effect of Z on Y when X is held constant; this could be stronger than, weaker than, or opposite to the total effect (see Simpson's paradox).
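A small simulation makes this concrete. The sketch below generates data from the Figure 1 structure with assumed linear effects (the coefficients 0.8, 0.5 and 0.3 are illustrative, not taken from the text) and fits the regressions by plain least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear data-generating process for Figure 1
# (assumed coefficients): Z -> X -> Y and Z -> Y.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)            # Z -> X
Y = 0.5 * X + 0.3 * Z + rng.normal(size=n)  # X -> Y and Z -> Y

def ols(y, *covs):
    """Ordinary least squares; returns the slopes, dropping the intercept."""
    A = np.column_stack([np.ones_like(y), *covs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

b_x, b_z = ols(Y, X, Z)   # the model Y ~ X + Z
(b_total,) = ols(Y, Z)    # the model Y ~ Z
print(b_x)       # close to 0.5: the total (= direct) effect of X on Y
print(b_z)       # close to 0.3: only the DIRECT effect of Z on Y
print(b_total)   # close to 0.7: the TOTAL effect of Z on Y
```

Under these assumed coefficients, the total effect of Z on Y is 0.3 + 0.8 × 0.5 = 0.7 (the direct path plus the path through X) – and the Z coefficient in Y ~ X + Z does not recover it.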

## The role of confounding

Thus far, it would seem that we can at least interpret every coefficient in a multivariable regression model as either a total or a direct causal effect. To see that even this can fail, let us add another variable to our DAG. We include U, which affects both Z and Y:

Figure 2
```
digraph G {
X [pos="1,1"]
Y [pos="2,0"]
Z [pos="0,0"]
U [pos=".7,-1"]
Z -> Y
U -> Y
U -> Z -> X -> Y
}
```

Despite this new variable, it is still sufficient to adjust for Z alone to unconfound the X→Y effect (can you see why?). Thus, the validity of the X coefficient is unchanged. Upon examining Z in this situation, however, we encounter difficulties.

The new variable U acts as a confounder of the Z→Y relationship, which means that we would have to interpret the Z coefficient as a ‘direct effect that is confounded by U’ – not exactly a helpful interpretation. Indeed, no single multivariable regression model could ever estimate the causal effects of X and Z at the same time: estimating the X effect means we must include X as an IV, but to estimate the Z effect we must not include X. In general, it is impossible to identify multiple causal effects using a single regression model, and we can usually interpret at most one coefficient in such a model as a (total) causal effect.