Is GPT-3 smarter than a sixth-grader?
Anitej S. | Summer 2022

Question answering (QA) and large language models (LLMs) have been a major research focus in artificial intelligence for several years. In 2017, a task called Textbook Question Answering (TQA) was introduced. The task included lessons from a middle school science textbook consisting of text, diagrams, and natural-language questions. Many question-answering models have been attempted on the task, but they reported sub-par accuracies.


I work with the Davinci model in GPT-3, an autoregressive language model, to answer middle-school science questions from the Textbook Question Answering (TQA) dataset, which is split into training, validation, and test sets. I simulate a student taking a test in three different scenarios. First, in the zero-shot experiment, I provide the model with only the questions from a specific lesson; this is the equivalent of a student walking into a test without studying, since the model has gathered no knowledge from the lesson. Second, in the few-shot experiment, I provide the model with the lesson content from the textbook along with the corresponding questions; this equates to a student skimming the lesson before taking the test. Lastly, in the fine-tuning experiment, I fine-tune the Davinci model on some of the textbook questions and then feed it new questions; this is similar to a student doing a thorough review of the material before taking the test. After conducting all three experiments, I compare their accuracies and, in doing so, highlight the “intelligence” and limitations of GPT-3.
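The difference between the zero-shot and few-shot setups comes down to how the prompt is constructed before it is sent to Davinci. The sketch below illustrates this; the function names, lesson text, and question are placeholders for illustration, not actual TQA data, and the commented-out API call assumes OpenAI's legacy completions endpoint.

```python
# Illustrative prompt construction for the zero-shot and few-shot setups.
# The lesson text and question below are placeholders, not TQA data.

def build_zero_shot_prompt(question: str, choices: list[str]) -> str:
    """Question only -- the 'student who did not study' setting."""
    options = "\n".join(f"({chr(97 + i)}) {c}" for i, c in enumerate(choices))
    return f"{question}\n{options}\nAnswer:"

def build_few_shot_prompt(lesson_text: str, question: str, choices: list[str]) -> str:
    """Lesson content prepended -- the 'student who skimmed the lesson' setting."""
    return f"{lesson_text}\n\n{build_zero_shot_prompt(question, choices)}"

prompt = build_few_shot_prompt(
    lesson_text="Photosynthesis converts light energy into chemical energy.",
    question="What does photosynthesis produce?",
    choices=["glucose and oxygen", "carbon dioxide only", "nitrogen"],
)
print(prompt)

# The prompt would then be completed by Davinci, e.g. with the legacy
# OpenAI completions API (requires an API key, so not run here):
#   openai.Completion.create(model="davinci", prompt=prompt, max_tokens=5)
```

The fine-tuning experiment instead trains on question/answer pairs ahead of time, so at test time its prompt looks like the zero-shot one; the knowledge lives in the updated weights rather than in the prompt.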

Mentored by Eric Bradford (Electrical Engineering and Computer Science Masters from MIT, Technical PM at Apple)

Related Projects

Generating Instagram Captions with ViT-GPT2 and GPT3

This paper presents a novel approach for generating Instagram captions based on visual features and language models. Our caption generator combines Vision Transformer-GPT2 and GPT3 to generate descriptive and engaging captions in the style of an Instagram post.
Ariel M. | Summer 2022
Mentored by Roger Jin
SaShiMi: Adapted for Google Colab

In this project, I convert SaShiMi, a music generation software, into something more resource-friendly.
Leo R. | Spring 2022
Mentored by Roger Jin
Generation of Research Paper Titles

Can NLP accurately and effectively generate research paper titles? This research paper evaluates various models and methods for title generation to determine an effective and accurate NLP model.
Christopher G. | Summer 2022
Mentored by Sean Konz