Motivated by the challenge of classifying businesses as Global Ultimate or Domestic Ultimate, where redundant and imbalanced data risked degrading model accuracy.
Engineered meaningful predictors from raw business data, raising model accuracy by identifying key business metrics and turning them into features.
Applied correlation matrix analysis to remove redundant features and feature importance ranking to drop low-impact variables, reducing noise, simplifying the model, and improving accuracy.
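The correlation-based pruning step above could look roughly like the sketch below. It covers only the correlation filter, not the importance ranking. The 0.9 threshold, the helper name `drop_correlated`, and the column names are illustrative assumptions, not details from the project.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute Pearson correlation
    exceeds `threshold` (hypothetical helper; the threshold is an assumption)."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic demo data: two nearly identical columns plus one independent column.
rng = np.random.default_rng(0)
employees = rng.normal(size=200)
demo = pd.DataFrame({
    "employees_total": employees,
    "employees_site": employees * 2.0 + rng.normal(scale=0.01, size=200),
    "sales": rng.normal(size=200),
})

reduced = drop_correlated(demo)
print(list(reduced.columns))
```

Of each highly correlated pair, the later column is the one dropped, so column order decides which feature survives.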
Implemented a Random Forest classifier in scikit-learn and addressed class imbalance with SMOTE (Synthetic Minority Over-sampling Technique) to make training more robust.
Achieved 82% accuracy for Global Ultimate classification and 80% accuracy for Domestic Ultimate classification on test data despite the class imbalance.