Poisson Regression

Author

Julius Ndung’u

Published

June 3, 2024

Poisson regression models the relationship between one or more independent variables and a dependent variable that is a count. It assumes that the dependent variable follows a Poisson distribution, which is appropriate for count data, that is, non-negative integer values.
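For reference, the model is usually written with a log link. The notation below (Y_i for the count, \mu_i for its mean, x_{ij} for the predictors, \beta_j for the coefficients) is introduced here for illustration and does not come from the original analysis:

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \qquad
\mathrm{E}[Y_i] = \mathrm{Var}(Y_i) = \mu_i
```

Under this form, e^{\beta_j} is the multiplicative change in the expected count for a one-unit increase in x_j, which is the incidence rate ratio (IRR) reported in the regression tables.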

Assumptions of Poisson Regression

Poisson regression relies on several assumptions to ensure the validity of the model:

Independence of Observations: The counts are independent of each other.
Linearity: The relationship between the log mean of counts and the predictors is linear.
Equidispersion: Conditional on the predictors, the variance of the counts is equal to the mean.
No Overdispersion: The observed variability in the counts does not exceed what the Poisson distribution implies; when it does, standard errors are underestimated.
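One common way to check the equidispersion assumption is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below uses simulated data (not the dataset from this post), and the helper name `dispersion` is hypothetical:

```r
# Simulated data for illustration only (not the post's dataset).
set.seed(42)
n  <- 2000
x  <- runif(n, 0, 2)
mu <- exp(0.5 + 0.6 * x)                    # true mean on the count scale

y_pois <- rpois(n, lambda = mu)             # equidispersed counts
y_over <- rnbinom(n, mu = mu, size = 1)     # overdispersed: variance = mu + mu^2

# Hypothetical helper: Pearson chi-square divided by residual degrees of freedom
dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

fit_pois <- glm(y_pois ~ x, family = poisson)
fit_over <- glm(y_over ~ x, family = poisson)

disp_pois <- dispersion(fit_pois)   # expected to be close to 1
disp_over <- dispersion(fit_over)   # expected to be well above 1
print(c(poisson = disp_pois, overdispersed = disp_over))
```

A ratio near 1 is consistent with the Poisson assumption. A ratio well above 1 suggests that a quasi-Poisson or negative binomial model may be worth considering.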

Applications of Poisson Regression

Poisson regression finds applications across various domains, including:

Healthcare: Modeling the number of hospital visits or disease occurrences.
Finance: Analyzing the frequency of credit card transactions or insurance claims.
Traffic Engineering: Predicting the number of traffic accidents or vehicle breakdowns.
Environmental Science: Studying the number of species in a given area or pollution incidents.

Performing Poisson Regression in R

Let’s walk through an example of performing Poisson regression in R using a dataset that records the number of children ever born to each woman.

Variables in the Data

CEB is the number of children ever born to a woman.

Education_level is the education level attained.

birth_age is the woman’s age at first birth.

Loading the Data

Code

# Load the packages used in this post
library(tidyverse)
library(gtsummary)
library(pscl)
library(pROC)

# Read the data and label the education codes 0-3
poisson <- read.csv("poison.csv", header = TRUE)
poisson$Education_level <- factor(poisson$Education_level, levels = c(0, 1, 2, 3),
                                  labels = c("no education", "primary", "secondary", "higher"))
head(poisson)

  CEB Education_level birth_age
1   8    no education        15
2   8    no education        15
3   8    no education        15
4   8    no education        15
5   8    no education        15
6   8    no education        15

Fitting the Model

Code

p_m <- glm(CEB ~ ., data = poisson, family = "poisson")
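In the call above, `CEB ~ .` means "regress CEB on every other column in the data". The sketch below illustrates this on a simulated stand-in for the data; the data frame `sim`, the intercept, and the effect sizes are made up for illustration and are not estimates from the real dataset:

```r
# Simulated stand-in for the post's data; all values here are made up.
set.seed(1)
n <- 5000
sim <- data.frame(
  Education_level = factor(sample(c("no education", "primary"), n, replace = TRUE),
                           levels = c("no education", "primary")),
  birth_age = sample(15:35, n, replace = TRUE)
)
# Hypothetical true model on the log scale
mu <- exp(2.5 - 0.3 * (sim$Education_level == "primary") - 0.03 * sim$birth_age)
sim$CEB <- rpois(n, mu)

# The dot shorthand and the explicit formula describe the same model
fit_dot      <- glm(CEB ~ ., data = sim, family = poisson)
fit_explicit <- glm(CEB ~ Education_level + birth_age, data = sim, family = poisson)

print(all.equal(coef(fit_dot), coef(fit_explicit)))
print(round(exp(coef(fit_dot)), 2))   # coefficients expressed as rate ratios
```

Writing the predictors out explicitly is often safer when the data frame contains columns, such as identifiers, that should not enter the model.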

Interpreting the Coefficients

Code

# Coefficients on the log scale, with p-values shown to 3 digits
tbl_regression(p_m, pvalue_fun = ~style_pvalue(.x, digits = 3)) %>%
  bold_p() %>%
  as_flex_table()

Characteristic	log(IRR)¹	95% CI¹	p-value
Education_level
no education	—	—
primary	-0.33	-0.34, -0.31	<0.001
secondary	-0.64	-0.66, -0.61	<0.001
higher	-0.75	-0.79, -0.71	<0.001
birth_age	-0.03	-0.03, -0.03	<0.001
¹ IRR = Incidence Rate Ratio, CI = Confidence Interval

Code

sjPlot::tab_model(p_m, show.intercept = FALSE, show.reflvl = TRUE)

Profiled confidence intervals may take longer time to compute.
  Use `ci_method="wald"` for faster computation of CIs.

	CEB
Predictors	Incidence Rate Ratios	CI	p
birth_age	0.97	0.97 – 0.97	<0.001
no education	Reference
primary	0.72	0.71 – 0.73	<0.001
secondary	0.53	0.51 – 0.54	<0.001
higher	0.47	0.45 – 0.49	<0.001
Observations	41392
R² Nagelkerke	0.304

Birth Age

The incidence rate ratio (IRR) for birth_age is 0.97, which means that for each additional year of age at first birth, the expected number of children ever born decreases by 3% (since 1 - 0.97 = 0.03 or 3%), holding education level constant. The p-value is less than 0.001, indicating that the effect of age at first birth on the number of children ever born is statistically significant.
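Because the model is linear on the log scale, the per-year effect compounds multiplicatively over larger age differences. A small sketch, assuming the rounded log-scale coefficient of -0.03 from the regression table (the variable names are illustrative):

```r
# Rounded log-scale coefficient for birth_age from the regression table
b_birth_age <- -0.03

# Rate ratio for a k-year difference in age at first birth is exp(k * b)
years      <- c(1, 5, 10)
irr        <- exp(years * b_birth_age)
pct_change <- (irr - 1) * 100

print(data.frame(years, irr = round(irr, 2), pct_change = round(pct_change, 1)))
```

Since the coefficient is rounded to two decimals, the multi-year figures are approximate.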

No Education

This is the reference category against which other education levels are compared.

Primary Education

IRR: 0.72. Women with primary education are expected to have 28% fewer children ever born than those with no education (since 1 - 0.72 = 0.28 or 28%), holding age at first birth constant. This effect is statistically significant.

Secondary Education

IRR: 0.53. Women with secondary education are expected to have 47% fewer children ever born than those with no education (since 1 - 0.53 = 0.47 or 47%), holding age at first birth constant. This effect is statistically significant.

Higher Education

IRR: 0.47. Women with higher education are expected to have 53% fewer children ever born than those with no education (since 1 - 0.47 = 0.53 or 53%), holding age at first birth constant. This effect is statistically significant.
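The IRRs for the education levels are the exponentiated log(IRR) values from the first regression table. A short sketch of that arithmetic, using the rounded coefficients as reported (the vector names are illustrative):

```r
# log(IRR) values for education, as reported (rounded) in the first table
log_irr <- c(primary = -0.33, secondary = -0.64, higher = -0.75)

irr       <- exp(log_irr)        # back-transform to incidence rate ratios
pct_fewer <- (1 - irr) * 100     # percent reduction relative to "no education"

print(round(cbind(irr, pct_fewer), 2))
```

Rounded to two decimals, these reproduce the IRRs of 0.72, 0.53 and 0.47 shown in the second table.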

Model Fit

R2 Nagelkerke: 0.304. The Nagelkerke R2 is a pseudo-R2 that compares the likelihood of the fitted model with that of an intercept-only model, so 0.304 indicates a moderate improvement in fit rather than a literal 30.4% of variance explained. While not extremely high, this is a reasonable level of explanatory power for social science data.
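To make the idea behind this statistic concrete, the sketch below computes the textbook likelihood-based Cox-Snell and Nagelkerke values by hand on simulated data (not this post's dataset). It illustrates the definition only; the exact computation used by sjPlot::tab_model may differ in detail:

```r
# Simulated data for illustration only (not the post's dataset).
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.8 + 0.4 * x))

fit  <- glm(y ~ x, family = poisson)
null <- glm(y ~ 1, family = poisson)   # intercept-only model

ll_fit  <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null))

# Cox-Snell R2, then Nagelkerke's rescaling so that the maximum is 1
r2_cs         <- 1 - exp((2 / n) * (ll_null - ll_fit))
r2_nagelkerke <- r2_cs / (1 - exp((2 / n) * ll_null))
print(r2_nagelkerke)
```

The value grows as the predictors improve the likelihood over the intercept-only model, which is why it is read as a relative measure of fit.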

Summary

Birth Age: The older a woman is when she has her first child, the fewer children she is expected to have overall.

Education Level: Higher levels of education are associated with significantly fewer children ever born. Specifically, women with primary, secondary, and higher education have progressively fewer children compared to women with no education.

Conclusion

Poisson regression is a powerful tool for modeling count data and understanding the influence of various predictors on the number of occurrences of an event. This blog post demonstrated how to perform Poisson regression in R, interpret the results, and assess the model’s performance.