Our team developed a machine learning model to automatically score essays based on the Holistic Scoring Rubric.

Project period: May 2024 - June 2024

Project team:

  • Le Thi Minh Phuong (Team leader)
  • Vo Hoang Hoa Vien
  • Pham Le Tu Nhi
  • Huynh Tri Nhan
  • Hoang Trung Nam
  • Huynh Cao Khoi

Role:

  • Data Processing
  • Modeling and evaluation

Tools: Python, NLTK, SpellChecker, LightGBM (LGBM)

Overview

This project was part of my course “Intelligent Data Analysis”. The course introduced me to the fundamental concepts of data analysis, methods for conducting effective analysis, and how to develop an analytical mindset.

Together with my teammates, I took part in a Kaggle competition in which the goal was to automatically score essays based on the Holistic Scoring Rubric. Our team developed a machine learning model to tackle this challenge.

In this project, my main tasks were:

  • Applying feature engineering and NLP techniques to extract semantic features from essays.
  • Analyzing the baseline's performance and running experiments to improve the model's accuracy.
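
As a rough illustration of the feature engineering step, here is a minimal sketch that turns one essay into a few numeric features. It is not our actual feature set: the function name `extract_features`, the chosen features, and the tiny `KNOWN_WORDS` vocabulary are all assumptions made for this example, and it uses only the Python standard library rather than NLTK or SpellChecker.

```python
import re
from statistics import mean

# Hypothetical tiny vocabulary, for illustration only. A real pipeline
# would rely on a full dictionary or a spell-checking library instead.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat", "it", "was", "happy"}


def extract_features(essay: str) -> dict:
    """Return a few surface-level features for one essay."""
    # Split into sentences on '.', '!' or '?' and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Lowercased runs of letters and apostrophes are treated as words.
    words = re.findall(r"[a-z']+", essay.lower())
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_word_len": mean(len(w) for w in words) if words else 0.0,
        # Share of distinct words, a crude proxy for vocabulary richness.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # Words missing from the vocabulary, a crude proxy for misspellings.
        "n_unknown_words": sum(w not in KNOWN_WORDS for w in words),
    }


features = extract_features("The cat sat on the mat. It was hapy!")
print(features)
```

Feature dictionaries like this one can be stacked into a table, one row per essay, and passed to a gradient-boosted model such as LightGBM together with the essay scores as the target.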

**Conclusion**

In the end, our team built a competitive model that improved overall accuracy and contributed to a higher team ranking on the competition leaderboard.