Poisson Regression

Author

Julius Ndung’u

Published

June 3, 2024

Poisson regression models the relationship between one or more independent variables and a count dependent variable. It assumes that, given the predictors, the dependent variable follows a Poisson distribution, which suits count data that take non-negative integer values.
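
Written out, the model described above takes the following standard form. The symbols (Y_i for the count, x_ij for the predictors, mu_i for the expected count, beta_j for the coefficients) are notation introduced here for illustration and do not appear elsewhere in this post.

```latex
Y_i \mid x_i \sim \mathrm{Poisson}(\mu_i),
\qquad
P(Y_i = y) = \frac{\mu_i^{y} e^{-\mu_i}}{y!}, \quad y = 0, 1, 2, \dots

\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
```

The log link keeps the expected count positive and is the reason the coefficients are later exponentiated into incidence rate ratios.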

Assumptions of Poisson Regression

Poisson regression relies on several assumptions to ensure the validity of the model:

  • Independence of Observations: The counts are independent of each other.

  • Linearity: The log of the expected count is a linear function of the predictors.

  • Equidispersion: The conditional variance of the counts is equal to the conditional mean.

  • No Overdispersion: The observed variability in the counts does not exceed what the Poisson distribution implies; when it does, standard errors are underestimated.
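
One common way to check the equidispersion assumption is the Pearson dispersion statistic. The sketch below uses simulated data (the variables `x` and `y` are hypothetical stand-ins, not the dataset used later in this post) and is meant only as an illustration of the idea.

```r
# Simulated stand-in data (hypothetical; not the post's dataset)
set.seed(42)
n <- 2000
x <- runif(n, 15, 35)
y <- rpois(n, lambda = exp(2.0 - 0.03 * x))

fit <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom.
# Values near 1 are consistent with equidispersion;
# values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```

If the statistic is well above 1 on real data, a quasi-Poisson or negative binomial model is a commonly used alternative.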

Applications of Poisson Regression

Poisson regression finds applications across various domains, including:

  • Healthcare: Modeling the number of hospital visits or disease occurrences.

  • Finance: Analyzing the frequency of credit card transactions or insurance claims.

  • Traffic Engineering: Predicting the number of traffic accidents or vehicle breakdowns.

  • Environmental Science: Studying the number of species in a given area or pollution incidents.

Performing Poisson Regression in R

Let’s walk through an example of performing Poisson regression in R using a dataset on the number of children ever born to a woman.

Variables in the Data

CEB is the number of children ever born to a woman.

Education_level is the highest education level attained (no education, primary, secondary, or higher).

birth_age is the woman’s age at first birth.

Loading the Data

Code
library(tidyverse)   # data manipulation and the %>% pipe
library(gtsummary)   # tbl_regression()
library(pscl)
library(pROC)
# Read the data and label the education codes 0-3
poisson <- read.csv("poison.csv", header = TRUE)
poisson$Education_level <- factor(poisson$Education_level, levels = c(0, 1, 2, 3),
                                  labels = c("no education", "primary", "secondary", "higher"))
head(poisson)
  CEB Education_level birth_age
1   8    no education        15
2   8    no education        15
3   8    no education        15
4   8    no education        15
5   8    no education        15
6   8    no education        15

Fitting the Model

Code
# Poisson GLM (log link) with CEB as the outcome and all other columns as predictors
p_m <- glm(CEB ~ ., data = poisson, family = "poisson")
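
The `CEB ~ .` shorthand can be easy to misread, so here is a minimal sketch of what it expands to. The data frame `sim` is simulated with the same column names as the post's data; its values are hypothetical and only serve to make the example runnable.

```r
# Simulated stand-in with the same column names as the post's data
# (hypothetical values; the real data come from poison.csv)
set.seed(1)
n <- 500
lev <- c("no education", "primary", "secondary", "higher")
edu <- factor(sample(lev, n, replace = TRUE), levels = lev)
age <- sample(15:35, n, replace = TRUE)
sim <- data.frame(
  CEB = rpois(n, lambda = exp(2 - 0.03 * age)),
  Education_level = edu,
  birth_age = age
)

# `CEB ~ .` uses every other column in the data frame as a predictor
m_dot  <- glm(CEB ~ ., data = sim, family = poisson)
m_full <- glm(CEB ~ Education_level + birth_age, data = sim, family = poisson)

# Both formulas give the same coefficients
all.equal(coef(m_dot), coef(m_full))

# The Poisson family uses the log link by default
family(m_dot)$link
```

Spelling the predictors out is a little more verbose, but it protects the model from silently picking up columns added to the data later.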

Interpreting the Coefficients

Code
tbl_regression(p_m, pvalue_fun = ~ style_pvalue(.x, digits = 3)) %>%
  bold_p() %>%
  as_flex_table()

Characteristic       log(IRR)¹     95% CI¹         p-value
Education_level
    no education     (reference)
    primary          -0.33         -0.34, -0.31    <0.001
    secondary        -0.64         -0.66, -0.61    <0.001
    higher           -0.75         -0.79, -0.71    <0.001
birth_age            -0.03         -0.03, -0.03    <0.001

¹ IRR = Incidence Rate Ratio, CI = Confidence Interval
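
The table above reports coefficients on the log scale. As a quick sketch, exponentiating them recovers the incidence rate ratios; the vector name `log_irr` is introduced here for illustration, and the values are the rounded estimates copied from the table.

```r
# log(IRR) estimates copied (rounded) from the table above
log_irr <- c(primary = -0.33, secondary = -0.64, higher = -0.75, birth_age = -0.03)

# Exponentiate to move from the log scale to incidence rate ratios
irr <- exp(log_irr)
round(irr, 2)

# Percentage change in the expected count relative to the reference
round((irr - 1) * 100)
```

Because the inputs are already rounded to two decimals, the results can differ slightly from IRRs computed directly from the fitted model.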

Code
sjPlot::tab_model(p_m, show.intercept = FALSE, show.reflvl = TRUE)
Profiled confidence intervals may take longer time to compute.
  Use `ci_method="wald"` for faster computation of CIs.
                CEB
Predictors      Incidence Rate Ratios   CI            p
birth_age       0.97                    0.97 – 0.97   <0.001
no education    Reference
primary         0.72                    0.71 – 0.73   <0.001
secondary       0.53                    0.51 – 0.54   <0.001
higher          0.47                    0.45 – 0.49   <0.001

Observations    41392
R2 Nagelkerke   0.304
  • birth_age

The incidence rate ratio (IRR) for birth_age is 0.97: holding education level constant, each additional year of age at first birth lowers the expected number of children ever born by about 3% (since 1 - 0.97 = 0.03). The p-value is below 0.001, so the effect of age at first birth on the number of children ever born is statistically significant.
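
The percentage reading follows from the log link. As a short worked derivation (the symbols mu, beta_0 and beta are notation assumed here, with beta standing for the birth_age coefficient and all other predictors held fixed so that they cancel):

```latex
\frac{\mu(x + 1)}{\mu(x)}
  = \frac{\exp\bigl(\beta_0 + \beta (x + 1)\bigr)}{\exp\bigl(\beta_0 + \beta x\bigr)}
  = e^{\beta},
\qquad
e^{-0.03} \approx 0.97

\frac{\mu(x + k)}{\mu(x)} = e^{k\beta},
\qquad
0.97^{5} \approx 0.86
```

So the effect compounds multiplicatively: a first birth five years later corresponds to roughly 14% fewer expected children, not 15%.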

  • No Education

This is the reference category against which other education levels are compared.

  • Primary Education

IRR: 0.72. Holding age at first birth constant, women with primary education are expected to have 28% fewer children ever born than women with no education (since 1 - 0.72 = 0.28). This effect is statistically significant.

  • Secondary Education

IRR: 0.53. Holding age at first birth constant, women with secondary education are expected to have 47% fewer children ever born than women with no education (since 1 - 0.53 = 0.47). This effect is statistically significant.

  • Higher Education

IRR: 0.47. Holding age at first birth constant, women with higher education are expected to have 53% fewer children ever born than women with no education (since 1 - 0.47 = 0.53). This effect is statistically significant.
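
Because the model is additive on the log scale and has no interaction terms, the IRRs for different predictors multiply. The sketch below combines the rounded IRRs reported above for one hypothetical comparison; the object names and the five-year difference are chosen here purely for illustration.

```r
# IRRs as reported (rounded) in the model output above
irr_education <- c("no education" = 1, primary = 0.72, secondary = 0.53, higher = 0.47)
irr_birth_age <- 0.97  # per additional year of age at first birth

# Hypothetical comparison: a woman with secondary education whose first birth
# came five years later, relative to a woman with no education
relative_count <- irr_education[["secondary"]] * irr_birth_age^5
relative_count
```

Under these assumptions the expected number of children is a little under half that of the comparison woman.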

  • Model Fit

R2 Nagelkerke: 0.304. Nagelkerke's R2 is a pseudo-R2 that measures how much the fitted model improves on an intercept-only model, so the value of 0.304 is best read as a relative measure of fit rather than as the literal share of variance in the number of children ever born explained by the predictors. While not extremely high, this is a reasonable level for social science data.
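
To make the pseudo-R2 idea concrete, here is a minimal sketch of one common log-likelihood-based definition of Nagelkerke's R2, computed by hand on simulated data (the variables `age` and `ceb` are hypothetical). Packages differ in the details of how they compute it, so this is not guaranteed to reproduce the 0.304 reported by `tab_model()`.

```r
# Simulated stand-in data (hypothetical; not the post's dataset)
set.seed(7)
n <- 1000
age <- sample(15:35, n, replace = TRUE)
ceb <- rpois(n, lambda = exp(2 - 0.03 * age))

fit  <- glm(ceb ~ age, family = poisson)  # model with a predictor
null <- glm(ceb ~ 1, family = poisson)    # intercept-only model

ll_fit  <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null))

# Cox-Snell pseudo-R2, then Nagelkerke's rescaling so the maximum is 1
r2_cox_snell  <- 1 - exp((2 / n) * (ll_null - ll_fit))
r2_nagelkerke <- r2_cox_snell / (1 - exp((2 / n) * ll_null))
r2_nagelkerke
```

The statistic compares likelihoods rather than sums of squares, which is why it does not carry the "variance explained" interpretation of R2 in linear regression.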

Summary

Birth Age: The older a woman is when she has her first child, the fewer children she is expected to have overall.

Education Level: Higher levels of education are associated with significantly fewer children ever born. Specifically, women with primary, secondary, and higher education have progressively fewer children compared to women with no education.

Conclusion

Poisson regression is a powerful tool for modeling count data and understanding the influence of various predictors on the number of occurrences of an event. This blog post demonstrated how to perform Poisson regression in R, interpret the results, and assess the model’s performance.