NUS Datathon 2025

🎯The Challenge

The task was to classify businesses as Global Ultimate vs Domestic Ultimate, a problem where data redundancy and feature imbalance risked reducing model accuracy.

My Contributions

🔧Feature Engineering

Engineered predictors from raw business data, identifying key business metrics and turning them into features that raised model accuracy.

📊Feature Selection

Applied correlation matrix analysis to remove redundant features, then used feature importance ranking to drop low-impact variables. This reduced noise, simplified the model, and improved accuracy.
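A minimal sketch of how these two selection steps could look, not the code used in the datathon. The column names (`revenue`, `revenue_dup`, `employees`, `noise`), the 0.9 correlation threshold, the `top_k` cutoff, and the helper names are all illustrative assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def drop_correlated(df, threshold=0.9):
    """Drop the later column of every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


def keep_top_features(df, y, top_k=2, seed=0):
    """Rank features by Random Forest importance and keep the top_k."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(df, y)
    ranking = pd.Series(forest.feature_importances_, index=df.columns)
    ranking = ranking.sort_values(ascending=False)
    return df[ranking.index[:top_k].tolist()], ranking


# Synthetic stand-in for the business data (hypothetical columns).
rng = np.random.default_rng(0)
n = 400
revenue = rng.normal(size=n)
employees = rng.normal(size=n)
toy = pd.DataFrame({
    "revenue": revenue,
    "revenue_dup": revenue + rng.normal(scale=0.01, size=n),  # near-duplicate
    "employees": employees,
    "noise": rng.normal(size=n),  # unrelated to the label
})
label = (revenue + employees > 0).astype(int)

reduced, dropped = drop_correlated(toy)
selected, ranking = keep_top_features(reduced, label, top_k=2)
print("dropped as redundant:", dropped)
print(ranking)
```

Running the correlation filter first keeps near-duplicate columns from splitting importance between them, which would otherwise make both look less useful than they are.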

🤖Model Implementation

Implemented a Random Forest Classifier in scikit-learn and addressed class imbalance with SMOTE (Synthetic Minority Over-sampling Technique), so the model trained on a balanced set of examples.
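The sketch below illustrates the idea on synthetic data. It is not the datathon code. To stay within scikit-learn it writes out the SMOTE interpolation step by hand in a hypothetical helper, `smote_like`. In practice the `SMOTE` class from the separate imbalanced-learn package is the usual route. The dataset, the neighbour count `k`, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # column 0 is normally the point itself
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[neighbour] - X_min[base])


# Synthetic imbalanced data: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Oversample the training split only, so the test set stays untouched.
minority = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote_like(minority, n_new)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Resampling after the train/test split matters: synthetic points built from test rows would leak information into training and inflate the reported accuracy.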

📈Performance Results

Achieved 82% accuracy for Global Ultimate classification and 80% accuracy for Domestic Ultimate classification on test data, despite the class imbalance.