Predicting Fake Job Listings from Real Ones using Machine Learning Models
Ronit M. | Summer 2023



Nowadays, more and more fake job listings are appearing online, whether from companies posting more positions than they actually need or from malicious actors posting positions meant to harvest personal information. This project addresses the problem by building models that predict whether a job listing is real or fake.

The procedure began with finding a dataset to train the models and preprocessing it. Several NLP models were then tested to see which would give the best predictions, and in the end only two worked: a Logistic Regression model and a BERT model. The Logistic Regression model reached an accuracy of 99.48%, and after running the data through Random Forest hyperparameter tuning, the model was left with a mean absolute error of 3.37% (lower values are better). The BERT model reached an accuracy of 99.81%.

Overall, the Logistic Regression model on its own was excellent at separating fake job listings from real ones. The BERT model was even better, improving accuracy by 0.33 percentage points. One possible way to push accuracy further would be to implement an LSTM model, since it can be more accurate than Logistic Regression and as accurate as a BERT model. However, like BERT, an LSTM requires its own preprocessing of the data, separate from the main preprocessing already completed. It would also be beneficial to make the models easily accessible and able to take user input, so that anyone concerned about whether a job listing is real or fake could check it by entering its details into a search box and submitting them. This would give the models more data to improve their accuracy while giving the public peace of mind that their data is not going to be stolen.
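To make the Logistic Regression approach more concrete, here is a minimal sketch of a bag-of-words logistic regression classifier written in plain Python. It is not the author's implementation: the toy listings, the labels (1 = fake, 0 = real), the hyperparameters, and the function names (`train_logreg`, `predict`, `accuracy`, and so on) are all hypothetical, and a real project would more likely use a library such as scikit-learn on the full dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign each distinct training token a column index."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text: str, vocab: dict[str, int]) -> dict[int, int]:
    """Sparse bag-of-words vector {column index: count}; unseen tokens are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=200, lr=0.5, l2=1e-3):
    """Fit weights by full-batch gradient descent on L2-regularized log-loss."""
    vocab = build_vocab(texts)
    X = [featurize(t, vocab) for t in texts]
    w = [0.0] * len(vocab)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, labels):
            p = sigmoid(b + sum(w[j] * v for j, v in x.items()))
            err = p - y  # gradient of the log-loss with respect to the logit
            for j, v in x.items():
                grad_w[j] += err * v
            grad_b += err
        for j in range(len(w)):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return vocab, w, b


def predict(model, text: str) -> int:
    """Return 1 (fake) if the predicted probability of fake is >= 0.5, else 0 (real)."""
    vocab, w, b = model
    x = featurize(text, vocab)
    return int(sigmoid(b + sum(w[j] * v for j, v in x.items())) >= 0.5)


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy listings; 1 = fake, 0 = real.
train_texts = [
    "Work from home, earn $5000 weekly, no experience needed, send your bank details",
    "Urgent hiring! Pay a small registration fee to start immediately",
    "Data entry from home, guaranteed income, provide your social security number",
    "Software engineer to maintain backend services; requires 3 years of Python",
    "Registered nurse for hospital night shift, state license required",
    "Accountant to prepare monthly financial statements and audits",
]
train_labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    model = train_logreg(train_texts, train_labels)
    preds = [predict(model, t) for t in train_texts]
    print("training accuracy:", accuracy(train_labels, preds))
    print("new listing is fake:", predict(model, "Earn from home: send your bank details"))
```

Accuracy in this sketch is measured on the six training listings only; a meaningful accuracy figure should be computed on listings held out from training.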


Hassan Azmat
MS in Mechanical Engineering from CMU, startup co-founder

Related Projects

Using Neural Networks to Predict U.S. Corporate Profits on Electronic Goods

The goal of this project is to train two neural networks, a Multi-Layer Perceptron (MLP) and a Long Short-Term Memory (LSTM) network, to forecast future U.S. corporate profits on electronic goods.
Will K. | Summer 2023
Mentored by Ana Sofia Muñoz Valadez
Analysis of Trending YouTube Videos: Finding Patterns in Viral Content

As the digital world continues to grow, content creators frequently struggle to build a community and produce videos that interest their audience. Especially as creators look to the internet for both recreational and monetary reasons, learning how to build a community is important today. This paper analyzes video performance, revealing patterns in what makes a video successful and viral. By training different models and testing different datasets, we found a correlation between a video's content and its chances of becoming popular. Using the most accurate model, a Random Forest, content creators can see whether they are likely to do well based on patterns found in trending videos.
Vincent P. | Summer 2022
Mentored by Amanda Wang
Predicting Shoe Prices Using Machine Learning Algorithms

This paper investigates the use of machine learning algorithms, particularly Support Vector Regression, to accurately predict shoe prices based on factors like material, color, brand, type, gender, and size, using a dataset of 5000 entries.
Akshar S. | Summer 2024
Mentored by John Basbagill