## Contingency Tables

### Definition

A contingency table lists the frequency distributions of variables from a study and is a convenient way to look at any relationships between variables.

### Table structure

- A 2×2 grid comparing whether 2 different variables are associated with each other
- Each box in the table indicates the number of people in the study who have that specific combination of variables and is labeled with a letter A–D.
- Formulas to calculate different measures of risk use these letters.
- For the formulas to work, the table needs to be set up the same way each time; for clinical trials:
    - The rows refer to whether or not a patient was exposed to the risk factor being tested (e.g., smoking).
    - The columns refer to whether or not a patient developed the outcome being studied (e.g., lung cancer).
    - The “Yes” answer comes 1st; the “No” answer comes 2nd.
- Letter labels:
    - A = A patient was exposed to the risk factor and developed the outcome (e.g., a smoker develops lung cancer).
    - B = A patient was exposed to the risk factor but did not develop the outcome (e.g., a smoker does not develop lung cancer).
    - C = A patient was not exposed to the risk factor but developed the outcome anyway (e.g., a nonsmoker develops lung cancer).
    - D = A patient was not exposed to the risk factor and did not develop the outcome (e.g., a nonsmoker does not develop lung cancer).
- Totals:
    - Each row and column has a subtotal (marginal total), shown in the far-right column and bottom row.
    - N is the total number of people in the study population; this grand total is reported in the bottom-right box.

- Allows for the rapid calculation of several measures of association and risk

### Example

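Below is an example of a 2×2 contingency table. The cells show the frequencies (A, B, C, D) for the different combinations of the two variables (exposure, outcome) in a population of size N.

| | Outcome: Yes | Outcome: No | Total |
|---|---|---|---|
| **Exposure: Yes** | A | B | A + B |
| **Exposure: No** | C | D | C + D |
| **Total** | A + C | B + D | N |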
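The table layout described above can be sketched in code. This is a minimal illustration, not part of the original material: the class name `TwoByTwoTable` and its properties are hypothetical, and the counts are the smoking and lung cancer figures used in the worked examples later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwoTable:
    """Hypothetical container for the four cell counts A-D of a 2x2 table."""

    a: int  # exposed, developed the outcome
    b: int  # exposed, did not develop the outcome
    c: int  # not exposed, developed the outcome
    d: int  # not exposed, did not develop the outcome

    @property
    def row_totals(self) -> tuple:
        """Marginal totals for the exposed and unexposed rows."""
        return (self.a + self.b, self.c + self.d)

    @property
    def column_totals(self) -> tuple:
        """Marginal totals for the outcome and no-outcome columns."""
        return (self.a + self.c, self.b + self.d)

    @property
    def n(self) -> int:
        """Grand total N (bottom-right box)."""
        return self.a + self.b + self.c + self.d


# Counts from the smoking / lung cancer examples in this article.
table = TwoByTwoTable(a=75, b=25, c=10, d=90)
print(table.row_totals)     # (100, 100)
print(table.column_totals)  # (85, 115)
print(table.n)              # 200
```

Keeping the four counts in fixed positions mirrors the rule that the table must be set up the same way each time, so that formulas written in terms of A–D remain valid.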

## Absolute Risk

### Definition

Absolute risk (AR) is the risk of developing a disease or condition after an exposure.

- The AR is also the cumulative incidence rate
- Note: Both absolute risk and attributable risk (discussed below) are frequently abbreviated as AR, which is why absolute risk is often written as incidence (I) instead.

### Calculations of Absolute Risk

The AR is calculated as the number of people who have a particular outcome divided by the total number of people with the same exposure (or the same nonexposure). This risk can be calculated for both exposed and unexposed populations.

**Steps:**

Start by setting up a contingency table:

Using the contingency table, the AR in the exposed group is calculated as:

$$ Absolute\ risk\ of\ the\ exposed\ group = \frac{A}{A + B} $$

where A = the number of patients who were exposed to the risk factor and developed the outcome, and B = the number of patients who were exposed to the risk factor but did not develop the outcome.

Using the contingency table, the AR in the unexposed group is calculated as:

$$ Absolute\ risk\ of\ the\ unexposed\ group = \frac{C}{C + D} $$

where C = the number of patients who were not exposed to the risk factor but developed the outcome anyway, and D = the number of patients who were not exposed to the risk factor and did not develop the outcome.

### Example of calculating AR

**Example 1:** In a population of 100 smokers, 75 developed lung cancer and 25 did not. What is the AR of developing lung cancer if you are a smoker?

- This question is asking about the AR in the exposed group.
- Set up a contingency table with the exposure (smoking) in the rows and the outcome (lung cancer) in the columns; here, A = 75 and B = 25.

**Answer:** AR = A / (A + B) = 75 / (75 + 25) = 75 / 100 = 0.75

**Example 2:** In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. What is the AR of developing lung cancer if you are not a smoker?

- This question is asking about the AR in the unexposed group.
- Set up a contingency table with the exposure (smoking) in the rows and the outcome (lung cancer) in the columns; here, C = 10 and D = 90.

**Answer:** AR = C / (C + D) = 10 / (10 + 90) = 10 / 100 = 0.1
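The two AR calculations above can be checked with a few lines of code. This is a sketch only; the function name `absolute_risk` is a hypothetical helper introduced for illustration.

```python
def absolute_risk(with_outcome: int, without_outcome: int) -> float:
    """Absolute risk (cumulative incidence) within one exposure group.

    Hypothetical helper: outcome count divided by the group total.
    """
    total = with_outcome + without_outcome
    if total == 0:
        raise ValueError("the group must contain at least one person")
    return with_outcome / total


# Example 1: exposed group (smokers), A = 75, B = 25
ar_exposed = absolute_risk(75, 25)
# Example 2: unexposed group (nonsmokers), C = 10, D = 90
ar_unexposed = absolute_risk(10, 90)

print(ar_exposed)    # 0.75
print(ar_unexposed)  # 0.1
```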

### Absolute risk reduction (ARR)/absolute risk increase (ARI)

ARR or ARI is a measure of the reduction or increase in risk of developing a disease or condition as the result of an exposure.

Other ways to conceptualize ARR:

- The difference in AR between exposed and unexposed groups
- The difference in incidence rates

ARR can be interpreted as the health “gained” or “lost” as a result of the exposure. For example, if you don’t smoke, by how much can you reduce your risk of lung cancer?

The ARR between the exposed and unexposed groups can be calculated as:

$$ ARR = I_{Exposed} - I_{Unexposed} $$

where I = incidence rate. Because I is the same as AR, this formula can be calculated from a contingency table:

$$ ARR = \frac{A}{A + B} - \frac{C}{C + D} $$

### Example of calculating ARR

In a population of 100 smokers, 75 developed lung cancer and 25 did not. In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. By how much does not smoking reduce the AR of developing lung cancer?

- This question is asking about the ARR achieved by avoidance of the exposure (smoking).
- Set up a contingency table with the exposure (smoking) in the rows and the outcome (lung cancer) in the columns (A = 75, B = 25, C = 10, D = 90).

**Answer:**

- AR_{Exposed} = A / (A + B) = 75 / (75 + 25) = 75 / 100 = 0.75
- AR_{Unexposed} = C / (C + D) = 10 / (10 + 90) = 10 / 100 = 0.1
- ARR = I_{Exposed} – I_{Unexposed} = AR_{Exposed} – AR_{Unexposed} = 0.75 – 0.1 = 0.65

**Interpretation:** Based on this data set, not smoking reduces a person’s absolute risk of developing lung cancer by 0.65, or 65 percentage points (from 75% to 10%).
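The ARR example can be reproduced directly from the four cell counts. The sketch below assumes the A–D layout described earlier; the function name `absolute_risk_difference` is hypothetical.

```python
def absolute_risk_difference(a: int, b: int, c: int, d: int) -> float:
    """Difference in absolute risk: A / (A + B) minus C / (C + D).

    Hypothetical helper. A positive result means the exposed group
    has the higher risk; a negative result means it has the lower risk.
    """
    return a / (a + b) - c / (c + d)


# Smokers: 75 with lung cancer, 25 without. Nonsmokers: 10 with, 90 without.
arr = absolute_risk_difference(75, 25, 10, 90)
print(f"{arr:.2f}")  # 0.65
```

The result is formatted to 2 decimal places because floating-point subtraction may not yield exactly 0.65.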

### Number needed to treat/harm

These are numbers typically reported when testing new therapeutic options. In these cases:

- The “exposure” is the new medication/procedure.
- The “outcome” is either the benefit of the procedure or a potential adverse effect.

**Number needed to treat (NNT):**

- Represents the number of patients who would need to be treated to achieve 1 additional case of the outcome
- NNT = 1 / ARR (i.e., the inverse of the ARR)

**Number needed to harm (NNH):**

- Typically used when reporting on experimental treatments with potential adverse effects
- Represents the number of patients who would need to be treated for 1 additional patient to experience the harmful outcome
- NNH = 1 / ARI (i.e., the inverse of the ARI)
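As a rough sketch, both NNT and NNH are the reciprocal of an absolute risk difference. The function name `number_needed` is hypothetical, and rounding up to the next whole person is a common reporting convention that is assumed here rather than stated in the text above. The first input reuses the risk difference of 0.65 from the smoking data in this article; the second (0.25) is purely illustrative.

```python
import math


def number_needed(risk_difference: float) -> int:
    """Reciprocal of an absolute risk difference (ARR for NNT, ARI for NNH).

    Hypothetical helper. Rounds up to a whole person, which is an
    assumed convention rather than part of the formula itself.
    """
    if risk_difference == 0:
        raise ValueError("undefined when the two risks are equal")
    return math.ceil(1 / abs(risk_difference))


print(number_needed(0.65))  # 2  (1 / 0.65 is about 1.54, rounded up)
print(number_needed(0.25))  # 4  (illustrative input, not from the article)
```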

## Relative Risk

### Definition

Relative risk (RR) is the risk of a disease or condition occurring in a group or population with a particular exposure relative to a control (unexposed) group.

- Can be equivalently stated as
- RR shows how strongly exposure is associated with the risk of the disease.

### Calculations of RR

RR is typically among the most important numbers calculated. Cohort studies are the only type of observational study that can determine the RR.

The relative risk is calculated as the frequency of a disease or condition in the exposed group (I_{E}) divided by the frequency in an unexposed control group (I_{O}), which is represented by the formula:

$$ RR = \frac{I_{E}}{I_{O}} $$

Again, since the incidence rates are the same as the AR, the RR can be calculated from a contingency table by using the following expanded formula:

$$ RR = \frac{\frac{A}{A + B}}{\frac{C}{C + D}} $$

### Interpretation of RR

- RR = 1: The risk of the outcome for the exposed group and the unexposed group is the same.
- RR > 1: The risk of the exposed group is greater than the risk of the unexposed group; evidence of positive association/possible causal factor.
- RR < 1: The risk of the exposed group is lower than the risk of the unexposed group; evidence of negative association/possible protective factor.

### Example of calculating RR

In a population of 100 smokers, 75 developed lung cancer and 25 did not. In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. What is the risk of getting lung cancer if you smoke compared to the risk of getting lung cancer if you do not smoke?

- This question is asking about the RR of getting lung cancer
- Set up a contingency table from these counts (A = 75, B = 25, C = 10, D = 90).

**Answer:**

- RR = I_{E} ÷ I_{O} = AR_{Exposed} ÷ AR_{Unexposed} = [ A / (A + B) ] ÷ [ C / (C + D) ]
- RR = 0.75 ÷ 0.1 = 7.5

**Interpretation:** Based on this data sample, smokers are 7.5 times as likely to develop lung cancer as nonsmokers.
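The RR calculation and its interpretation rules can be sketched as follows. The function names `relative_risk` and `interpret_ratio` are hypothetical; the wording of the returned labels paraphrases the interpretation list above.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [A / (A + B)] / [C / (C + D)] from the 2x2 cell counts."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    if risk_unexposed == 0:
        raise ValueError("RR is undefined when the unexposed risk is 0")
    return risk_exposed / risk_unexposed


def interpret_ratio(ratio: float) -> str:
    """Map a ratio to the three cases described in the text."""
    if ratio > 1:
        return "positive association (possible causal factor)"
    if ratio < 1:
        return "negative association (possible protective factor)"
    return "no difference in risk between the groups"


rr = relative_risk(75, 25, 10, 90)
print(round(rr, 2))         # 7.5
print(interpret_ratio(rr))  # positive association (possible causal factor)
```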

### Relative risk reduction (RRR)/relative risk increase (RRI)

**Definition:**

- The RRR is defined as the reduction (or increase) in risk of a particular outcome, in a group with a known exposure, relative to the unexposed control group.
- Put another way: RRR is the proportion of baseline risk that is reduced through nonexposure.
- Example: If people don’t smoke, how much less lung cancer can you expect?

**Calculation:**

The RRR is calculated as the difference between the incidence of a disease in an exposed (I_{E}) and an unexposed (I_{O}) group divided by the incidence in the unexposed group, which is calculated with the following formula:

$$ RRR = \frac{I_{E} - I_{O}}{I_{O}} = \frac{ARR}{I_{O}} $$

### Example of calculating RRR

In a population of 100 smokers, 75 developed lung cancer and 25 did not after 10 years in a cohort study. In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. If people didn’t smoke, how much less lung cancer can be expected in the population?

- This question is asking about the RRR achieved by not smoking.
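- Set up a contingency table from these counts (A = 75, B = 25, C = 10, D = 90).

**Answer:** RRR = ARR ÷ I_{O} = 0.65 ÷ 0.1 = 6.5

**Interpretation:** Based on this data set, the risk of lung cancer in smokers is 650% higher than the baseline risk in nonsmokers (equivalently, RR – 1 = 7.5 – 1 = 6.5). Because the exposed group has the higher risk, this value is a relative risk increase; a risk cannot be reduced by more than 100%.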
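The calculation above, and its link to RR, can be sketched in code. The function name `relative_risk_change` is hypothetical; it implements the formula given in this section (difference in incidence divided by the incidence in the unexposed group), which is algebraically equal to RR – 1.

```python
def relative_risk_change(risk_exposed: float, risk_unexposed: float) -> float:
    """(I_E - I_O) / I_O, as defined in this section.

    Hypothetical helper. Positive values indicate a relative increase
    in the exposed group; negative values indicate a relative reduction.
    """
    if risk_unexposed == 0:
        raise ValueError("undefined when the unexposed incidence is 0")
    return (risk_exposed - risk_unexposed) / risk_unexposed


change = relative_risk_change(0.75, 0.1)
rr_minus_one = 0.75 / 0.1 - 1

print(round(change, 2))        # 6.5
print(round(rr_minus_one, 2))  # 6.5
```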

## Attributable Risk

### Definition

The attributable risk is a measure of the risk of developing an outcome associated with a particular exposure.

- That is, how much of the outcome can we attribute to the behavior?
- Attributable risk is sometimes called the excess risk because it represents how much increase in risk a particular behavior will add to the baseline risk of developing a particular outcome.
- The formula for attributable risk is similar to that for ARR, but attributable risk is used in epidemiologic studies.
- 2 kinds of attributable risk:
    - Attributable risk in the exposed group
    - Population attributable risk (PAR)

- Note: Both absolute risk and attributable risk are frequently abbreviated as AR, which is why absolute risk is often described as incidence (I) instead.

### Attributable risk in the exposed group

The attributable risk in the exposed group is the difference in the rate of a disease between the exposed and the unexposed groups. For example, what percentage of lung cancer cases are likely due to smoking?

The attributable risk is calculated by subtracting the incidence in the unexposed group (I_{O}) from the incidence of the exposed group (I_{E}) and dividing by the incidence in the exposed group, which is expressed as:

$$ Attributable\ risk\ in\ the\ exposed\ group = \frac{I_{E} - I_{O}}{I_{E}} $$

**Example:** In a population of 100 smokers, 75 developed lung cancer and 25 did not after 10 years in a cohort study. In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. What percentage of lung cancer cases are likely due to smoking?

- This question is asking about the attributable risk in the exposed group.

**Answer:** (0.75 – 0.1) ÷ 0.75 = 0.65 ÷ 0.75 = 0.867

**Interpretation:** Among smokers, 86.7% of lung cancer cases are attributable to smoking.
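A short sketch of this calculation, using the incidences from the example (0.75 in smokers, 0.1 in nonsmokers). The function name `attributable_fraction_exposed` is a hypothetical label for the formula described in this section.

```python
def attributable_fraction_exposed(risk_exposed: float, risk_unexposed: float) -> float:
    """(I_E - I_O) / I_E: share of the exposed group's risk linked to the exposure."""
    if risk_exposed == 0:
        raise ValueError("undefined when the exposed incidence is 0")
    return (risk_exposed - risk_unexposed) / risk_exposed


fraction = attributable_fraction_exposed(0.75, 0.1)
print(f"{fraction:.1%}")  # 86.7%
```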

### Population attributable risk (PAR)

The PAR is the attributable risk for an entire population. It represents the fraction of cases that would not occur in a population if the exposure was eliminated.

For example, what percentage of lung cancer cases could be prevented if nobody smoked?

The PAR is calculated by subtracting the incidence rate in the unexposed population from the incidence rate in the entire population and dividing the result by the incidence rate in the entire population:

$$ PAR = \frac{I_{T} - I_{O}}{I_{T}} = \frac{\frac{A + C}{N} - \frac{C}{C + D}}{\frac{A + C}{N}} $$

where I_{T} = the incidence in the total population and I_{O} = the incidence in the unexposed group.

**Example:** In a population of 100 smokers, 75 developed lung cancer and 25 did not after 10 years in a cohort study. In a population of 100 nonsmokers, 10 developed lung cancer and 90 did not. What percentage of lung cancer cases could be prevented if nobody smoked?

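- This question is asking about the population attributable risk.

**Answer:**

- I_{T} = (A + C) / N = (75 + 10) / 200 = 0.425
- I_{O} = C / (C + D) = 10 / 100 = 0.1
- PAR = (0.425 – 0.1) ÷ 0.425 = 0.325 ÷ 0.425 = 0.765

**Interpretation:** 76.5% of lung cancers could be prevented if no one smoked.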
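The PAR formula can be evaluated step by step from the cell counts. This is a sketch under the A–D layout used throughout this article; the function name `population_attributable_risk` is hypothetical.

```python
def population_attributable_risk(a: int, b: int, c: int, d: int) -> float:
    """PAR = (I_T - I_O) / I_T, computed from the 2x2 cell counts."""
    n = a + b + c + d
    incidence_total = (a + c) / n        # I_T: incidence in the whole population
    incidence_unexposed = c / (c + d)    # I_O: incidence in the unexposed group
    if incidence_total == 0:
        raise ValueError("undefined when nobody developed the outcome")
    return (incidence_total - incidence_unexposed) / incidence_total


par = population_attributable_risk(75, 25, 10, 90)
print(f"{par:.1%}")  # 76.5%
```

Here I_{T} = 85 / 200 = 0.425 and I_{O} = 0.1, so PAR = 0.325 ÷ 0.425, which rounds to 76.5%.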

## Odds Ratio

### Definition

An odds ratio (OR) is a statistic that quantifies the strength of association between 2 variables or events.

- The OR compares the odds of one variable (A) in the presence or absence of another variable (B).
    - Odds are the probability of a thing happening divided by the probability of that thing not happening.
        - For example, the probability of “heads” on a coin toss is 50%; the odds are 1 (50% ÷ 50%).
- In clinical studies, OR measures the association between an exposure and an outcome.
    - Does not imply causation
    - OR can be used to estimate the RR when incidence rates cannot be calculated, such as in case–control studies.
    - Used when diseases are rare (typically when the prevalence is ≤ about 10%)
    - Also used in more complicated statistical analyses:
        - Can be thought of as a type of RR
        - Used to determine risk factors in studies

### Calculations of the OR

An OR is used as an estimation of RR in case–control studies. OR is calculated by determining the odds of exposure among the diseased divided by the odds of exposure among the undiseased. This is represented as:

$$ OR = \frac{Odds\ of\ exposure\ among\ diseased}{Odds\ of\ exposure\ among\ undiseased} = \frac{A \div C}{B \div D} $$

where (A ÷ C) is the number of exposed cases divided by the number of unexposed cases among those with the disease, and (B ÷ D) is the number of exposed people divided by the number of unexposed people among those without the disease.

Rearranging the formula gives the simplified equation:

$$ OR = \frac{AD}{BC} $$

### Interpreting the OR

OR is interpreted in the same way as RR:

- OR = 1:
    - The risk of the outcome for the exposed group and the unexposed group is the same.
    - No association between the exposure and outcome
- OR > 1:
    - The risk in the exposed group is greater.
    - Evidence of a positive association/possible causal factor
- OR < 1:
    - The risk in the exposed group is lower.
    - Evidence of a negative association/possible protective factor

### Example

In this example, 6 people developed Creutzfeldt-Jakob disease (CJD); 3 of them ate a significant amount of beef and 3 of them did not. These patients were compared to a control population in a case–control study; in the control population of 10 people, 4 of them ate a significant amount of beef and 6 of them did not. What are the odds of developing CJD after eating a significant amount of beef?

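**Answer:** With A = 3 (cases who ate beef), B = 4 (controls who ate beef), C = 3 (cases who did not), and D = 6 (controls who did not): OR = (AD) ÷ (BC) = (3 × 6) ÷ (4 × 3) = 18 ÷ 12 = 1.5

**Interpretation:** The odds of developing CJD are 50% higher in the group that ate beef, which is consistent with eating beef being a possible risk factor for the development of CJD.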
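The OR example can be checked with a short sketch that evaluates both forms of the formula given earlier, (A ÷ C) ÷ (B ÷ D) and (AD) ÷ (BC). The function name `odds_ratio` is hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A * D) / (B * C) from the 2x2 cell counts."""
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when B or C is 0")
    return (a * d) / (b * c)


# Cases (CJD): 3 ate beef (A), 3 did not (C).
# Controls:    4 ate beef (B), 6 did not (D).
a, b, c, d = 3, 4, 3, 6

simplified = odds_ratio(a, b, c, d)
long_form = (a / c) / (b / d)  # odds of exposure in cases / odds in controls

print(simplified)           # 1.5
print(round(long_form, 2))  # 1.5
```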
