A Machine Learning Approach to Understanding the Determining Factors of the Gender Wage Gap
Sophia G. | Summer 2022
2nd Place at San Diego BROADCOM Science Fair (Senior Division)

By studying the effect of different attributes on the gender wage gap, we can better understand both the scale of this issue and its possible solutions. We therefore explore the question: how does a worker’s marital status, along with other variables, impact the gap in hourly wage between male and female workers? We seek to create a model able to predict the gender wage gap given a set of variables: age, years of education, race, state, and marital status.


Gender inequality is a complex subject consisting of a variety of issues and nuances. In this project, we choose to study gender income inequality, a prevalent issue in current society. Among the many factors that play a role in the gender wage gap, we focus on the effects of marital status, race, geographical location (by state), age, and years of education. By using these variables to create a model able to predict the hourly wage gap between a woman and her equivalent male counterpart, we can analyze the impact of each variable to better understand the role it plays in the income gap. Using income data from the Current Population Survey, we train and test five models: a Linear Regression, Decision Tree Regressor, Random Forest Regressor, KNeighbors Regressor, and MLP Regressor. Our Linear Regression model found a correlation between being a never-married worker and a smaller gender wage gap, as well as between being a married worker with an absent spouse and a greater gender wage gap. In general, though, our models found little correlation between the variables provided and the predicted hourly wage gap.
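The five-model comparison described above could be set up along the following lines. This is only a minimal sketch, not the project's actual code: it assumes scikit-learn, uses randomly generated stand-in data rather than Current Population Survey records, omits race and state for brevity, and the column names, category codes, and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features (NOT real survey data).
age = rng.integers(18, 65, size=n)
education_years = rng.integers(8, 21, size=n)
marital_status = rng.integers(0, 3, size=n)  # three hypothetical category codes

# Hypothetical target: hourly wage gap in dollars, mostly noise by construction.
wage_gap = 2.0 + 0.5 * (marital_status == 1) + rng.normal(0.0, 3.0, size=n)

# One-hot encode marital status, dropping category 0 as the baseline.
marital_onehot = np.eye(3)[marital_status][:, 1:]
X = np.column_stack([age, education_years, marital_onehot])

X_train, X_test, y_train, y_test = train_test_split(
    X, wage_gap, test_size=0.25, random_state=0
)

# The five model families named in the abstract; settings are illustrative.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "KNeighbors Regressor": make_pipeline(
        StandardScaler(), KNeighborsRegressor(n_neighbors=10)
    ),
    "MLP Regressor": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }
    print(f"{name}: MAE={results[name]['mae']:.2f}, R^2={results[name]['r2']:.2f}")

# Linear coefficients: age, education, then marital categories 1 and 2
# relative to the baseline category 0.
linear_coefficients = models["Linear Regression"].coef_
print("Linear coefficients:", linear_coefficients)
```

In a setup like this, the sign and size of the marital-status coefficients in the linear model are what would indicate an association with a smaller or larger predicted gap, while a test-set R^2 near zero across models would correspond to the "little correlation" finding.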


Related Projects

What Factors Correlate with the Relationship Between Gender and Race and Pursuing STEM?

In this paper, I sought to answer the question, “How do races and genders differ in the way they pursue STEM, and what factors correlate with these differences?” Identifying which factors cause this lack of representation is the most important step in fixing the diversity problem.
Saket R. | Summer 2022
Mentored by Bradley Yam
Telling Gunshots and Gunshot-like Sounds Apart

Sounds differ from each other. A strong ML model that can distinguish between sounds and classify them correctly may help with a variety of social problems, such as human and animal well-being (e.g., some sounds cause harm to autistic people and/or animals), safety drills (people with hearing disabilities may not be aware of alarms), and mass shootings. Although reducing the false positive rate (i.e., the rate of other sounds falsely classified as gunshots) would help decrease the number of false alarms, it is more important for safety and security to reduce the false negative rate (i.e., the rate of actual gunshots not recognized as such).
Barbara T. | Winter 2024
Mentored by Ivan Villa-Renteria
Predicting Climate Change Using an Autoregressive Long Short-Term Memory Model

This study aims to create a baseline machine learning model that uses an autoregressive recurrent neural network with a Long Short-Term Memory implementation to predict climate.
Seokhyun C. | Winter 2022
Mentored by Victoria Lloyd