A Machine Learning Approach to Understanding the Determining Factors of the Gender Wage Gap
Sophia G. | Summer 2022
2nd Place at San Diego BROADCOM Science Fair (Senior Division)

By studying the effect of different attributes on the gender wage gap, we can better understand both the scale of this issue and its possible solutions. We therefore ask: how does a worker’s marital status, along with other variables, affect the gap in hourly wage between male and female workers? We seek to build a model that predicts the gender wage gap from a set of variables: age, years of education, race, state, and marital status.


Gender inequality is a complex subject made up of many issues and nuances. In this project, we study gender income inequality, a prevalent issue in current society. Among the many factors that play a role in the gender wage gap, we focus on the effects of marital status, race, geographical location (by state), age, and years of education. By using these variables to build a model that predicts the hourly wage gap between a woman and her equivalent male counterpart, we can analyze the impact of each variable and better understand the role it plays in the income gap. Using income data from the Current Population Survey, we train and test five models: a Linear Regression, Decision Tree Regressor, Random Forest Regressor, KNeighbors Regressor, and MLP Regressor. Our Linear Regression model found a correlation between being a never-married worker and a smaller gender wage gap, and between being a married worker with an absent spouse and a greater gender wage gap. In general, though, our models found little correlation between the variables provided and the predicted hourly wage gap.
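The train-and-test comparison described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes scikit-learn and NumPy, uses randomly generated stand-in data rather than Current Population Survey records, leaves out race and state for brevity, and invents the marital-status codes, the target formula, and every hyperparameter.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features (NOT Current Population Survey data).
age = rng.integers(18, 65, size=n)
education_years = rng.integers(8, 21, size=n)
# Hypothetical codes: 0 = never married, 1 = married, 2 = married, spouse absent.
marital = rng.integers(0, 3, size=n)
marital_onehot = np.eye(3)[marital]  # one-hot encode the categorical variable

X = np.column_stack([age, education_years, marital_onehot])
# Made-up target: hourly wage gap in dollars, dominated by noise on purpose.
y = 2.0 - 0.5 * (marital == 0) + 0.5 * (marital == 2) + rng.normal(0, 3, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The five model families named in the abstract; settings are illustrative.
# The distance-based and neural models get feature scaling first.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "KNeighbors Regressor": make_pipeline(
        StandardScaler(), KNeighborsRegressor(n_neighbors=10)
    ),
    "MLP Regressor": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.2f}")
```

Mean absolute error is used here only because it reads directly in dollars per hour; the metric the project actually reported is not stated in the abstract.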


Related Projects

Exposing Undercounts in the Census through Regression Modelling

Although many community leaders have proposed that language barriers pose significant obstacles to Census outreach, this paper explores the viability of using predictive models to quantify the extent of the role language plays.
Tarun S. | Fall 2022
Mentored by Katie O'Nell
How Does One’s Background Determine Their Mental State?

Mental health issues have become very prevalent in recent times, and treatment through counseling, medication, and other methods has made significant progress. To find the root cause of many mental health issues, however, a correlation has to be drawn between one’s mental state and another factor, such as family history, age, or work environment. It is also beneficial to correlate one’s mental state with many factors at once to see their combined effect.
Sohum T. | Summer 2022
Mentored by Akshay Jagadeesh
Differences in predicted rates of vaginal births after cesarean across racial groups in a ‘race-neutral’ model

A large body of work in machine learning has highlighted that supposedly de-biased systems often re-code sensitive variables like race in terms of proxy variables. To determine whether this was the case for the calculator studied here, we replicated its formula, then found base-rate statistics of all the input variables for three different racial groups: Black, White, and Asian.
Anjali S. | Summer 2022
Mentored by Katie O'Nell