Covariate Roles in DAGs

This is based on lecture notes prepared together with Mark Gilthorpe for his module "Advanced Modelling Strategies".

Roles of Covariates in DAGs

In empirical studies we often distinguish two variables of interest: the exposure (also called the independent variable or cause) and the outcome (also called the dependent variable or effect). Once these two special variables are selected, all other variables in the study, whether measured or not, are called covariates.

Covariates can be categorized into several roles; some of these roles are mutually exclusive, but not all of them are. We will define four of these roles below, using kinship terminology: a variable A is an ancestor of a variable B, and B is a descendant of A, if a directed path leads from A to B. In all of the following, we assume that X is the exposure and Y is the outcome.
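The kinship terms can be made concrete with a few lines of Python. This is only a minimal sketch, not DAGitty's own implementation: the edge-list representation and the function names `descendants` and `ancestors` are assumptions made for illustration, and a variable is not counted as its own ancestor or descendant here.

```python
from collections import defaultdict

def descendants(edges, node):
    """Return all nodes reachable from `node` along directed edges.

    `edges` is a list of (parent, child) pairs; this representation is an
    assumption for the sketch. `node` itself is not included in the result.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def ancestors(edges, node):
    """Return all nodes from which `node` is reachable (reverse every edge)."""
    return descendants([(child, parent) for parent, child in edges], node)

# A small DAG with edges Z -> X, Z -> Y, X -> Y.
edges = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
print(sorted(ancestors(edges, "Y")))    # ['X', 'Z']
print(sorted(descendants(edges, "Z")))  # ['X', 'Y']
```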


Confounders

A confounder is a variable that is both an ancestor of the exposure and an ancestor of the outcome (along a path that does not include the exposure). For instance, Z is a confounder in the following DAG:

dag { X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] Z [pos="0.500,2.000"] X -> Y Z -> X Z -> Y }

To understand why the definition above needs the restriction "along a path that does not include the exposure", consider the following example:

dag { X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] Z [pos="0.500,2.000"] X -> Y Z -> X }

Here, Z is no longer a confounder, even though it is an ancestor of both the exposure and the outcome. This is because the only path from Z to Y leads through the exposure X.
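The two examples above can be checked mechanically. The sketch below is one possible reading of the definition, not DAGitty's implementation: the helper names `reachable` and `is_confounder` and the edge-list encoding are assumptions. The restriction is handled by searching for a directed path to the outcome while refusing to pass through the exposure.

```python
def reachable(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` by directed paths that avoid `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen

def is_confounder(edges, z, exposure="X", outcome="Y"):
    """True if `z` is an ancestor of the exposure and also an ancestor of
    the outcome along a path that does not include the exposure."""
    if z in (exposure, outcome):
        return False  # the exposure and outcome are not covariates
    causes_exposure = exposure in reachable(edges, z)
    causes_outcome = outcome in reachable(edges, z, blocked={exposure})
    return causes_exposure and causes_outcome

# First example: Z -> X, Z -> Y, X -> Y.
first = [("X", "Y"), ("Z", "X"), ("Z", "Y")]
# Second example: Z -> X, X -> Y (the only path from Z to Y goes through X).
second = [("X", "Y"), ("Z", "X")]

print(is_confounder(first, "Z"))   # True
print(is_confounder(second, "Z"))  # False
```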


Mediators

A mediator is a variable that lies "between" the exposure and the outcome; in other words, it is a descendant of the exposure and an ancestor of the outcome. M is a mediator in the following example:

dag { M [pos="0.500,1.050"] X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] M -> Y X -> M }

A mediator cannot be a confounder. Can you explain why?
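The mediator definition translates almost word for word into code. As before, this is a hedged sketch: `descendants` and `is_mediator` are hypothetical helper names, and the DAG is encoded as a list of (parent, child) pairs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges (excluding `node`)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_mediator(edges, m, exposure="X", outcome="Y"):
    """True if `m` is a descendant of the exposure and an ancestor of the outcome."""
    return m in descendants(edges, exposure) and outcome in descendants(edges, m)

# The example DAG: X -> M -> Y.
dag = [("X", "M"), ("M", "Y")]
print(is_mediator(dag, "M"))  # True
print(is_mediator(dag, "X"))  # False: the exposure is not its own descendant
```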

Proxy Confounders

Proxy confounders are covariates that are not themselves confounders, but lie "between" confounders and the exposure or outcome. In other words, a proxy confounder is a descendant of a confounder and an ancestor of either the exposure or the outcome (but not both; else it would be a confounder).

In the example below, Z is a confounder and A and M are proxy confounders. Note that M is also a mediator; the roles of mediator and proxy confounder are not mutually exclusive.

dag { A [pos="0.700,1.500"] M [pos="0.500,0.000"] X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] Z [pos="0.350,2.000"] A -> Y M -> Y X -> M Z -> A Z -> M Z -> X }
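The example above can be verified with a short script. This is a sketch under stated assumptions: the function names are hypothetical, and the definition is read as "not a confounder, a descendant of some confounder, and an ancestor of the exposure or the outcome". Other readings of the "but not both" clause are possible.

```python
def reachable(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` by directed paths that avoid `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen

def is_confounder(edges, z, exposure="X", outcome="Y"):
    """Ancestor of the exposure, and of the outcome via a path avoiding the exposure."""
    if z in (exposure, outcome):
        return False
    return (exposure in reachable(edges, z)
            and outcome in reachable(edges, z, blocked={exposure}))

def is_proxy_confounder(edges, v, exposure="X", outcome="Y"):
    """Assumed reading: not a confounder, below some confounder, and an
    ancestor of the exposure or the outcome."""
    if v in (exposure, outcome) or is_confounder(edges, v, exposure, outcome):
        return False
    nodes = {n for edge in edges for n in edge}
    below_confounder = any(
        v in reachable(edges, c)
        for c in nodes
        if is_confounder(edges, c, exposure, outcome)
    )
    down = reachable(edges, v)
    return below_confounder and (exposure in down or outcome in down)

# The example DAG: A -> Y, M -> Y, X -> M, Z -> A, Z -> M, Z -> X.
dag = [("A", "Y"), ("M", "Y"), ("X", "M"), ("Z", "A"), ("Z", "M"), ("Z", "X")]
for v in ["A", "M", "Z"]:
    print(v, is_proxy_confounder(dag, v))
# A True
# M True
# Z False  (Z is itself a confounder)
```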

Competing Exposures

Lastly, a competing exposure is an ancestor of the outcome that is not related to the exposure -- i.e., it is neither a confounder, nor a proxy confounder, nor a mediator. Including competing exposures in a regression model does not affect bias, but should improve precision.

In the two examples below, Z is a competing exposure.

dag { X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] Z [pos="0.500,2.000"] X -> Y Z -> Y }
dag { A [pos="0.350,2.000"] X [exposure,pos="0.000,1.000"] Y [outcome,pos="1.000,1.100"] Z [pos="0.700,2.000"] X -> A X -> Y Z -> A Z -> Y }
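Both examples can be checked with the sketch below. It rests on an assumed reading of "not related to the exposure": the covariate is neither an ancestor nor a descendant of the exposure, and it is not a descendant of any confounder. Under that reading it cannot be a confounder, a mediator, or a proxy confounder. The function names are hypothetical.

```python
def reachable(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` by directed paths that avoid `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen

def is_confounder(edges, z, exposure="X", outcome="Y"):
    """Ancestor of the exposure, and of the outcome via a path avoiding the exposure."""
    if z in (exposure, outcome):
        return False
    return (exposure in reachable(edges, z)
            and outcome in reachable(edges, z, blocked={exposure}))

def is_competing_exposure(edges, v, exposure="X", outcome="Y"):
    """Assumed reading: an ancestor of the outcome that is unrelated to the exposure."""
    if v in (exposure, outcome):
        return False
    down = reachable(edges, v)
    if outcome not in down:
        return False                      # not an ancestor of the outcome
    if exposure in down:
        return False                      # ancestor of the exposure
    if v in reachable(edges, exposure):
        return False                      # descendant of the exposure (mediator)
    nodes = {n for edge in edges for n in edge}
    # A descendant of a confounder that causes the outcome is a proxy confounder.
    return not any(
        v in reachable(edges, c)
        for c in nodes
        if is_confounder(edges, c, exposure, outcome)
    )

# First example: X -> Y, Z -> Y.
first = [("X", "Y"), ("Z", "Y")]
# Second example: X -> A, X -> Y, Z -> A, Z -> Y.
second = [("X", "A"), ("X", "Y"), ("Z", "A"), ("Z", "Y")]

print(is_competing_exposure(first, "Z"))   # True
print(is_competing_exposure(second, "Z"))  # True
print(is_competing_exposure(second, "A"))  # False: A is not an ancestor of Y
```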


Do you now feel eager to apply your graphical knowledge to an important statistical concept? Great! Read on about the Table 2 Fallacy.