# Covariate Roles in DAGs

This is based on lecture notes prepared together with Mark Gilthorpe for his module "Advanced Modelling Strategies".

## Roles of Covariates in DAGs

In empirical studies we often distinguish two variables of interest: the
**exposure**, or independent variable, or cause, and the **outcome**,
or dependent variable, or effect. Once these two special variables are
selected, the other variables in the study (whether measured or not measured)
are called **covariates**.

Covariates can play several roles; some of these roles are mutually exclusive, others are not. We define four of these roles below using kinship terminology: a variable is an ancestor of another variable if a directed path leads from the former to the latter, and a descendant if the path runs the other way. Throughout, we assume that X is the exposure and Y is the outcome.

### Confounders

Confounders are variables that are both ancestors of the exposure and ancestors of the outcome (the latter along a path that does not include the exposure). For instance, Z is a confounder in the following DAG:

DAG (X: exposure, Y: outcome): Z → X, Z → Y, X → Y

To see why the definition requires a path to the outcome that does not pass through the exposure, consider the following example:

DAG (X: exposure, Y: outcome): Z → X, X → Y

Here, Z is no longer a confounder, even though it is an ancestor of both the exposure and the outcome. This is because the only path from Z to Y leads through the exposure X.
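The confounder definition can be checked mechanically. The following is a minimal Python sketch and not part of the original notes: the DAG is stored as a dictionary mapping each node to its children, the helper names `descendants` and `is_confounder` are hypothetical, and the two edge lists are our reading of the two example DAGs above.

```python
def descendants(graph, start, avoid=frozenset()):
    """Nodes reachable from `start` along directed edges, never entering `avoid`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen and child not in avoid:
                seen.add(child)
                stack.append(child)
    return seen


def is_confounder(graph, z, x, y):
    """True if z is an ancestor of x and reaches y along a path that avoids x."""
    if z in (x, y):
        return False
    return x in descendants(graph, z) and y in descendants(graph, z, avoid={x})


# Assumed edges of the first example: Z -> X, Z -> Y, X -> Y
first_dag = {"Z": ["X", "Y"], "X": ["Y"]}
# Assumed edges of the second example: Z -> X, X -> Y
second_dag = {"Z": ["X"], "X": ["Y"]}

print(is_confounder(first_dag, "Z", "X", "Y"))   # True
print(is_confounder(second_dag, "Z", "X", "Y"))  # False
```

Removing the exposure from the search (the `avoid` argument) is what encodes the restriction on the path to the outcome.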

### Mediators

A **mediator** is a variable that lies "between" the exposure and
the outcome; in other words, it is a descendant of the exposure and an
ancestor of the outcome. M is a mediator in the following example:

DAG (X: exposure, Y: outcome): X → M, M → Y

A mediator cannot be a confounder. Can you explain why?
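As a rough illustration, the mediator definition translates into two reachability checks. This is a sketch under assumptions: the dictionary representation (node to list of children) and the function names `descendants` and `is_mediator` are ours, and the edge list is our reading of the example DAG above.

```python
def descendants(graph, start):
    """Nodes reachable from `start` along directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_mediator(graph, m, x, y):
    """True if m is a descendant of the exposure x and an ancestor of the outcome y."""
    return m in descendants(graph, x) and y in descendants(graph, m)


# Assumed edges of the mediator example: X -> M, M -> Y
mediator_dag = {"X": ["M"], "M": ["Y"]}
# For contrast, a DAG in which Z is a common cause: Z -> X, Z -> Y, X -> Y
common_cause_dag = {"Z": ["X", "Y"], "X": ["Y"]}

print(is_mediator(mediator_dag, "M", "X", "Y"))      # True
print(is_mediator(common_cause_dag, "Z", "X", "Y"))  # False
```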

### Proxy Confounders

Proxy confounders are covariates that are not themselves confounders, but lie "between" confounders and the exposure or outcome. In other words, a proxy confounder is a descendant of a confounder and an ancestor of either the exposure or the outcome (but not both; else it would be a confounder).

In the example below, Z is a confounder and A and M are proxy confounders. Note that M is also a mediator; the roles of mediator and proxy confounder are not mutually exclusive.

DAG (X: exposure, Y: outcome): X → M, M → Y, Z → M, Z → X, Z → A, A → Y
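The proxy confounder definition can be sketched in the same way. The code below is an illustration only: the helper names are hypothetical, the DAG is stored as a dictionary from each node to its children, and the edge list is our reading of the example above (X → M, M → Y, Z → M, Z → X, Z → A, A → Y).

```python
def descendants(graph, start, avoid=frozenset()):
    """Nodes reachable from `start` along directed edges, never entering `avoid`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen and child not in avoid:
                seen.add(child)
                stack.append(child)
    return seen


def is_confounder(graph, v, x, y):
    """Ancestor of x, and ancestor of y along a path that avoids x."""
    if v in (x, y):
        return False
    return x in descendants(graph, v) and y in descendants(graph, v, avoid={x})


def is_proxy_confounder(graph, v, x, y):
    """Not a confounder itself, but below a confounder and above x or y."""
    if v in (x, y) or is_confounder(graph, v, x, y):
        return False
    nodes = set(graph) | {c for children in graph.values() for c in children}
    below_confounder = any(
        is_confounder(graph, c, x, y) and v in descendants(graph, c)
        for c in nodes
    )
    reach = descendants(graph, v)
    return below_confounder and (x in reach or y in reach)


# Assumed edges of the proxy confounder example
dag = {"X": ["M"], "M": ["Y"], "Z": ["M", "X", "A"], "A": ["Y"]}

proxies = sorted(v for v in "AMZ" if is_proxy_confounder(dag, v, "X", "Y"))
print(proxies)  # ['A', 'M']
```

Z is excluded because it is itself a confounder; A and M qualify because both descend from Z and are ancestors of Y.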

### Competing Exposures

Lastly, a competing exposure is an ancestor of the outcome that is unrelated to the exposure; that is, it is neither a confounder, nor a proxy confounder, nor a mediator. Including competing exposures in a regression model does not affect bias, but should improve precision.

In the two examples below, Z is a competing exposure.

DAG (X: exposure, Y: outcome): X → Y, Z → Y

DAG (X: exposure, Y: outcome): X → Y, Z → A, Z → Y, X → A
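Because a competing exposure is defined by exclusion, a check for it has to combine all three other roles. The sketch below does this in Python; it is an illustration under assumptions, not the authors' code. All function names are hypothetical, the DAG is a dictionary from each node to its children, and the two edge lists are our reading of the examples above (first: X → Y, Z → Y; second: X → Y, Z → A, Z → Y, X → A).

```python
def descendants(graph, start, avoid=frozenset()):
    """Nodes reachable from `start` along directed edges, never entering `avoid`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen and child not in avoid:
                seen.add(child)
                stack.append(child)
    return seen


def is_confounder(graph, v, x, y):
    if v in (x, y):
        return False
    return x in descendants(graph, v) and y in descendants(graph, v, avoid={x})


def is_mediator(graph, v, x, y):
    return v in descendants(graph, x) and y in descendants(graph, v)


def is_proxy_confounder(graph, v, x, y):
    if v in (x, y) or is_confounder(graph, v, x, y):
        return False
    nodes = set(graph) | {c for children in graph.values() for c in children}
    below = any(
        is_confounder(graph, c, x, y) and v in descendants(graph, c)
        for c in nodes
    )
    reach = descendants(graph, v)
    return below and (x in reach or y in reach)


def is_competing_exposure(graph, v, x, y):
    """Ancestor of y that is no confounder, proxy confounder, or mediator."""
    if v in (x, y) or y not in descendants(graph, v):
        return False
    return not (
        is_confounder(graph, v, x, y)
        or is_proxy_confounder(graph, v, x, y)
        or is_mediator(graph, v, x, y)
    )


first_dag = {"X": ["Y"], "Z": ["Y"]}             # assumed edges
second_dag = {"X": ["Y", "A"], "Z": ["A", "Y"]}  # assumed edges

print(is_competing_exposure(first_dag, "Z", "X", "Y"))   # True
print(is_competing_exposure(second_dag, "Z", "X", "Y"))  # True
print(is_competing_exposure(second_dag, "A", "X", "Y"))  # False
```

In the second DAG, A is not a competing exposure under this reading because it is not an ancestor of the outcome.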

## Test your knowledge!

Below you can play a little game to test your knowledge of DAG terminology. How many correct answers can you give in a row?

In each example, X is the **exposure** and Y is the **outcome**!


Do you now feel eager to apply your graphical knowledge to an important statistical concept? Great! Read on about the Table 2 Fallacy.