Stellar Classification based on Numerous Characteristics using Machine Learning
Roberto T. | Fall 2022

The task of stellar classification can be tedious and lengthy when done manually, and it can be expedited by creating an artificial intelligence model to automate the process. As we as a species continue to explore the frontier of the observable universe, we should seek to automate time-intensive problems like stellar classification. The current stellar classification model serves to categorize stars for research on their distribution around the universe, so automating the development of this resource would allow professionals to allocate more time to exploring the bounds of our current understanding of space and the universe. After a dataset containing numerical and categorical features was found and analyzed, a supervised learning approach was used to train different models and test their ability to classify the stars in the test set. A Decision Tree Classifier, Random Forest Classifier, Ridge Classifier, and Support Vector Classifier were trained and tested on the data. The most successful models were the Decision Tree Classifier and the Random Forest Classifier, each scoring about 94 percent across the different accuracy metrics computed on the test data. Despite some drawbacks with regard to the availability of usable data, four models were trained and two proved consistently accurate. Future attempts at developing models for stellar classification should concentrate on gathering more data so that the models can be trained more thoroughly.
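The workflow described above, training four classifiers and comparing them on a test set, could be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: it assumes scikit-learn, uses a synthetic dataset in place of the stellar data, and the split ratio, hyperparameters, and the two metrics shown (accuracy and macro F1) are all assumptions. The original dataset also contained categorical features, which would need to be encoded before being passed to these models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the stellar dataset (assumption: the real features
# and class labels are not reproduced here).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold out a quarter of the samples for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named in the abstract, with default-style settings
# (the hyperparameters here are assumptions).
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Ridge": RidgeClassifier(),
    "Support Vector": SVC(),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and score it on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "macro_f1": f1_score(y_test, y_pred, average="macro"),
        }
    return scores


scores = evaluate(models, X_train, X_test, y_train, y_test)
for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f}, macro F1={s['macro_f1']:.3f}")
```

Scores on this synthetic data say nothing about the stellar results reported above. The sketch only shows how several models can be compared on the same split with more than one metric.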

Published Paper
Roberto T.
Sophia Barton
Computer Science MS from Stanford

Related Projects

Fast and Accurate Gamma-Ray/Hadronic Particle Shower Classification Using Machine Learning

This research applies dimensionality reduction techniques, such as Pearson Correlation and Principal Component Analysis, to machine learning models like Random Forests and Support Vector Classifiers to simplify the prediction of atmospheric gamma-ray particle showers, achieving a modest accuracy increase while reducing the number of features used.
Leonardo V. | Fall 2024
Mentored by Pablo Bonilla
Using Deep Learning to Predict the Half-Lives of Isotopes Given Proton and Neutron Count

This study explores using deep learning regression models to predict superheavy isotopes' half-lives based on proton and neutron counts, finding that additional complexity in input variables and model architecture may be necessary to improve prediction accuracy.
Akilan P. | Summer 2024
Mentored by Erick Ruiz
Impact of Class Weights and Feature Importance in Automated Stroke Detection

In this research paper, we conduct a parametric study of class weighting as a way to address class imbalance during training. We then identify the most important features to consider for stroke prediction.
Avyukth H. | Summer 2022
Mentored by