A Machine Learning Approach to Understanding the Determining Factors of the Gender Wage Gap
Sophia G.
2nd Place at San Diego BROADCOM Science Fair (Senior Division)

By studying the effect of different attributes on the gender wage gap, we can better understand both the scale of this issue and its possible solutions. We therefore ask: how does a worker’s marital status, along with other variables, affect the gap in hourly wage between male and female workers? We seek to create a model that can predict the gender wage gap from a set of variables: age, years of education, race, state, and marital status.


Gender inequality is a complex subject consisting of a variety of issues and nuances. In this project, we choose to study gender income inequality, a prevalent issue in current society. Among the many factors that play a role in the gender wage gap, we focus on the effects of marital status, race, geographical location (by state), age, and years of education. By using these variables to create a model that predicts the hourly wage gap between a woman and her equivalent male counterpart, we can analyze the impact of each variable to better understand the role it plays in the income gap. Using income data from the Current Population Survey, we train and test five models: a Linear Regression, Decision Tree Regressor, Random Forest Regressor, KNeighbors Regressor, and MLP Regressor. Our Linear Regression model found that being a never-married worker correlates with a smaller gender wage gap, and that being a married worker with an absent spouse correlates with a larger one. In general, though, our models found little correlation between the variables provided and the predicted hourly wage gap.
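The abstract does not say how the wage gap target was built from the survey records. As a minimal sketch under that caveat, one plausible reading of "a woman and her equivalent male counterpart" is to group workers who share the same attributes and subtract the mean female hourly wage from the mean male hourly wage in each group. The records, the field layout, and the function name `wage_gap_by_group` below are hypothetical and for illustration only; they are not Current Population Survey data or the project's actual code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (sex, marital_status, years_of_education, hourly_wage).
# These values are made up for illustration; they are NOT CPS data.
WORKERS = [
    ("F", "never_married", 16, 24.0),
    ("F", "never_married", 16, 25.0),
    ("M", "never_married", 16, 25.0),
    ("F", "married_spouse_absent", 16, 20.0),
    ("M", "married_spouse_absent", 16, 27.0),
    ("F", "married_spouse_present", 12, 17.0),
    ("M", "married_spouse_present", 12, 21.0),
    ("F", "widowed", 12, 18.0),  # no male counterpart in this toy sample
]


def wage_gap_by_group(workers):
    """Return {(marital_status, education): mean male wage - mean female wage}.

    Assumption: "equivalent counterpart" means sharing every grouping
    attribute. Only two attributes are used here to keep the sketch short.
    """
    wages = defaultdict(lambda: {"M": [], "F": []})
    for sex, marital_status, education, hourly_wage in workers:
        wages[(marital_status, education)][sex].append(hourly_wage)

    gaps = {}
    for group, by_sex in wages.items():
        # A group with only one sex has no counterpart to compare against.
        if by_sex["M"] and by_sex["F"]:
            gaps[group] = mean(by_sex["M"]) - mean(by_sex["F"])
    return gaps


gaps = wage_gap_by_group(WORKERS)
for group, gap in sorted(gaps.items()):
    print(group, gap)
```

A table of per-group gaps like this could then serve as the regression target, with age, race, and state added as further grouping keys. Whether the project matched workers this way or used another definition is not stated in the abstract.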


Related Projects

Identifying Parameters in Water Potability Analysis Through Machine Learning

Predicting whether water is potable can help people who rely on bodies of water by redirecting them to safer options. The algorithm could also be applied in places where sending people out to collect water samples is expensive and inefficient. Over the past couple of decades, researchers have often cited a lack of funding as a source of error in data analysis and in the accuracy of the research.
Molly H.
Mentored by Sharon Chen
AI-Based Image Classification Used to Accurately Distinguish Recyclable Material Versus Non-Recyclable Material

One cause of improper disposal of materials is that it can be difficult to tell whether a material can be recycled. In response, I created a machine learning model that can distinguish recyclable materials from trash through image classification.
Katarina A.
Mentored by Ayush Pandit
DeepSolar Bangladesh: A Novel Convolutional Neural Network (CNN) Architecture for the Detection of Solar Panels from Low Resolution Satellite Imagery in Developing Countries

Due to its environmental benefits and decreasing costs, the supply of solar energy is growing at an accelerating pace globally. However, the decentralised nature of solar makes it difficult to keep track of the different photovoltaic (PV) systems deployed across a country. There is a critical need for highly accurate, comprehensive national databases of solar systems, which would allow policymakers, researchers, and the government to study socioeconomic trends in solar deployment. Manual surveys have been shown to be inaccurate. The 2018 DeepSolar study by Yang et al. developed a deep-learning framework and national solar deployment database for the US using high-quality satellite imagery, which proved to be a much more efficient and accurate approach. However, satellite imagery in developing countries such as Bangladesh is of much lower resolution and quality, and the original DeepSolar model by Yang et al. performed poorly on it. Our study highlights the implementation of a novel convolutional neural network (CNN) for detecting solar panels in low-resolution satellite imagery from the Google Static Maps API.
Khondoker F.
Mentored by Barbie Duckworth