Julius Ndung’u
June 3, 2024
Poisson regression models the relationship between one or more independent variables and a dependent variable that is a count. It assumes that the dependent variable follows a Poisson distribution, which is appropriate for count data taking non-negative integer values.
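In symbols, the model described above can be sketched as follows. The notation ($Y_i$ for the count of observation $i$, $x_{ij}$ for its predictors, $\beta_j$ for the coefficients, $\mu_i$ for the mean) is introduced here for illustration and does not come from the original post.

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \qquad
\mathrm{E}[Y_i] = \mathrm{Var}(Y_i) = \mu_i
```

Because the linear predictor is on the log scale, $\exp(\beta_j)$ is the multiplicative change in the expected count for a one-unit increase in $x_{ij}$, which is what an incidence rate ratio reports.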
Poisson regression relies on several assumptions to ensure the validity of the model:
Independence of Observations: The counts are independent of each other.
Linearity: The relationship between the log of the mean count and the predictors is linear.
Equidispersion: The variance of the counts is equal to the mean.
No Overdispersion: The observed variability in the counts does not exceed what the Poisson distribution implies; when it does, the model's standard errors are too small.
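The equidispersion assumption can be checked informally with the Pearson dispersion statistic, which should be close to 1 under a Poisson model. The following is a minimal sketch on simulated data, not the dataset used in this post; the helper `dispersion()` and all variable names are hypothetical.

```r
# Simulated data for illustration only (not the post's dataset)
set.seed(42)
n  <- 2000
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)                  # log link: log(mu) is linear in x

y_pois <- rpois(n, lambda = mu)           # equidispersed: variance equals the mean
y_over <- rnbinom(n, mu = mu, size = 1)   # overdispersed: variance = mu + mu^2

# Pearson dispersion statistic: sum of squared Pearson residuals / residual df
dispersion <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}

m_pois <- glm(y_pois ~ x, family = poisson())
m_over <- glm(y_over ~ x, family = poisson())

disp_pois <- dispersion(m_pois)   # expected to be close to 1
disp_over <- dispersion(m_over)   # expected to be well above 1
print(c(poisson = disp_pois, overdispersed = disp_over))
```

A value well above 1 suggests overdispersion, in which case a quasi-Poisson or negative binomial model is a common alternative.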
Poisson regression finds applications across various domains, including:
Healthcare: Modeling the number of hospital visits or disease occurrences.
Finance: Analyzing the frequency of credit card transactions or insurance claims.
Traffic Engineering: Predicting the number of traffic accidents or vehicle breakdowns.
Environmental Science: Studying the number of species in a given area or pollution incidents.
Let’s walk through an example of performing Poisson regression in R using a dataset on the number of children ever born to each woman. The variables are:
CEB is the number of children ever born to a woman.
Education_level is the highest education level attained.
birth_age is the woman's age at first birth.
CEB Education_level birth_age
1 8 no education 15
2 8 no education 15
3 8 no education 15
4 8 no education 15
5 8 no education 15
6 8 no education 15
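library(gtsummary)  # tbl_regression(), style_pvalue(), bold_p(), as_flex_table()
library(dplyr)      # provides the %>% pipe
# Poisson GLM (log link) of CEB on all other columns of the data frame `poisson`
p_m <- glm(CEB ~ ., data = poisson, family = "poisson")
# Coefficient table on the log(IRR) scale, p-values to 3 digits, significant ones in bold
tbl_regression(p_m, pvalue_fun = ~style_pvalue(.x, digits = 3)) %>%
  bold_p() %>%
  as_flex_table()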
Characteristic | log(IRR)¹ | 95% CI¹ | p-value |
---|---|---|---|
Education_level | | | |
no education | — | — | |
primary | -0.33 | -0.34, -0.31 | <0.001 |
secondary | -0.64 | -0.66, -0.61 | <0.001 |
higher | -0.75 | -0.79, -0.71 | <0.001 |
birth_age | -0.03 | -0.03, -0.03 | <0.001 |
¹ IRR = Incidence Rate Ratio, CI = Confidence Interval
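# Same model reported as incidence rate ratios, hiding the intercept and showing the reference level
sjPlot::tab_model(p_m, show.intercept = FALSE, show.reflvl = TRUE)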
Profiled confidence intervals may take longer time to compute.
Use `ci_method="wald"` for faster computation of CIs.
CEB | | | |
---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p |
birth_age | 0.97 | 0.97 – 0.97 | <0.001 |
no education | Reference | | |
primary | 0.72 | 0.71 – 0.73 | <0.001 |
secondary | 0.53 | 0.51 – 0.54 | <0.001 |
higher | 0.47 | 0.45 – 0.49 | <0.001 |
Observations | 41392 | | |
R² Nagelkerke | 0.304 | | |
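The same kind of table can be approximated in base R by exponentiating the coefficients and their confidence limits. The sketch below uses simulated data with made-up coefficients (the names `d`, `educ`, `age` and `y` are hypothetical), and it uses Wald intervals via `confint.default()`, whereas the message printed above suggests the table itself was built from profiled intervals.

```r
# Simulated data for illustration only; the true coefficients below are made up
set.seed(1)
n <- 5000
d <- data.frame(
  educ = factor(sample(c("none", "primary"), n, replace = TRUE),
                levels = c("none", "primary")),   # "none" is the reference level
  age  = round(runif(n, 15, 35))
)
d$y <- rpois(n, lambda = exp(2.0 - 0.3 * (d$educ == "primary") - 0.03 * d$age))

m <- glm(y ~ educ + age, data = d, family = poisson())

# Exponentiate log-scale estimates and Wald limits to get IRRs with 95% CIs
irr <- cbind(IRR = exp(coef(m)), exp(confint.default(m)))
print(round(irr, 3))
```

With a sample this large, Wald and profile-likelihood intervals are usually very close, so the choice mostly affects computation time.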
The incidence rate ratio (IRR) for birth_age is 0.97: for each additional year of age at first birth, the expected number of children ever born decreases by 3% (1 - 0.97 = 0.03), holding education level constant. The p-value is below 0.001, so the effect of age at first birth is statistically significant.
No Education: This is the reference category against which the other education levels are compared.
Primary Education: IRR = 0.72. Women with primary education are expected to have 28% fewer children ever born than women with no education (1 - 0.72 = 0.28), holding age at first birth constant. This effect is statistically significant (p < 0.001).
Secondary Education: IRR = 0.53. Women with secondary education are expected to have 47% fewer children ever born than women with no education (1 - 0.53 = 0.47). This effect is statistically significant (p < 0.001).
Higher Education: IRR = 0.47. Women with higher education are expected to have 53% fewer children ever born than women with no education (1 - 0.47 = 0.53). This effect is statistically significant (p < 0.001).
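The two regression tables report the same model on different scales, and the percentages quoted above follow from exponentiating the log(IRR) estimates. A short sketch of that arithmetic, using the rounded values printed in the first table (the object names are hypothetical):

```r
# log(IRR) estimates as printed in the first regression table (rounded to 2 decimals)
log_irr <- c(primary = -0.33, secondary = -0.64, higher = -0.75, birth_age = -0.03)

irr        <- exp(log_irr)        # incidence rate ratios
pct_change <- 100 * (irr - 1)     # percent change in the expected count

print(round(cbind(log_IRR = log_irr, IRR = irr, pct_change = pct_change), 2))
```

Rounded to two decimals, the exponentiated values reproduce the IRRs of 0.72, 0.53, 0.47 and 0.97 shown in the second table.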
Birth Age: The older a woman is when she has her first child, the fewer children she is expected to have overall.
Education Level: Higher levels of education are associated with significantly fewer children ever born. Specifically, women with primary, secondary, and higher education have progressively fewer children compared to women with no education.
Poisson regression is a powerful tool for modeling count data and understanding the influence of various predictors on the number of occurrences of an event. This blog post demonstrated how to perform Poisson regression in R, interpret the results, and assess the model’s performance.