Fake job postings are increasingly common online, whether they come from companies listing more positions than they actually need or from malicious actors posting jobs designed to harvest personal information. This project addresses the problem by building models that predict whether a job listing is real or fake. The procedure involved finding a dataset suitable for training the models and preprocessing it. After preprocessing, several NLP models were tested to see which gave the best predictions; the only two that ultimately worked were a Logistic Regression model and a BERT model. The Logistic Regression model reached an accuracy of 99.48%, and after running the data through Random Forest hyperparameter tuning, the model was left with a mean absolute error of 3.37% (lower values are better). The BERT model reached an accuracy of 99.81%. Overall, the Logistic Regression model on its own was excellent at separating fake job listings from real ones, and the BERT model was better still, improving accuracy by 0.33 percentage points. One possible way to improve accuracy further would be to implement an LSTM model, which can be more accurate than Logistic Regression and as accurate as a BERT model. Both, however, require their own preprocessing of the data, separate from the main preprocessing already completed. It would also be beneficial to make the models easily accessible and able to accept input, so that anyone concerned about whether a job posting is real or fake could check by entering the information into a search box and submitting it. This would give the models more data with which to improve their accuracy and give the public peace of mind that their data will not be stolen.
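To make the Logistic Regression approach concrete, here is a minimal sketch of a bag-of-words logistic regression classifier trained with stochastic gradient descent, using only the Python standard library. The toy listings, function names, and hyperparameters are all illustrative assumptions; this is not the project's dataset, preprocessing, or actual pipeline.

```python
import math
from collections import Counter

# Hypothetical toy listings (1 = fake, 0 = real); NOT the project's dataset.
TRAIN = [
    ("earn money fast from home no experience send bank details", 1),
    ("wire transfer fee required to start work from home today", 1),
    ("quick cash send your social security number to apply", 1),
    ("pay a fee to unlock unlimited earnings from home", 1),
    ("software engineer needed to build backend services in python", 0),
    ("registered nurse position at regional hospital with benefits", 0),
    ("accountant role preparing quarterly reports for finance team", 0),
    ("marketing analyst to manage campaigns and report results", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()

def build_vocab(examples):
    """Map every token seen in training to a feature index."""
    vocab = {}
    for text, _ in examples:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Sparse bag-of-words counts; tokens outside the vocabulary are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Estimated probability that a listing is fake."""
    return sigmoid(bias + sum(weights[i] * v for i, v in features.items()))

def train(examples, vocab, epochs=200, lr=0.5):
    """Fit weights by per-example gradient descent on the log loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [(featurize(text, vocab), label) for text, label in examples]
    for _ in range(epochs):
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            for i, v in features.items():
                weights[i] -= lr * error * v
            bias -= lr * error
    return weights, bias

def classify(text, vocab, weights, bias):
    """Return 1 (fake) if the predicted probability is at least 0.5, else 0."""
    return int(predict_proba(weights, bias, featurize(text, vocab)) >= 0.5)

vocab = build_vocab(TRAIN)
weights, bias = train(TRAIN, vocab)
train_accuracy = sum(
    classify(text, vocab, weights, bias) == label for text, label in TRAIN
) / len(TRAIN)
print("training accuracy:", train_accuracy)
print(classify("send bank details and pay a fee to work from home", vocab, weights, bias))
```

Accuracy on the training examples says little about real-world performance; a practical evaluation would hold out a separate test split, and a production model would more likely rely on an established library than on a hand-written training loop.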