Inspirit AI Project Presentation
2023-08-17
Can we help create a fully automated Blockbuster by classifying new movies into the correct genre section based only on their descriptions?
We’ll use a data set of 54214 movies with their:
Embedding Models Used:
Baseline Models: Logistic Regression
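A minimal sketch of what such a baseline could look like, assuming each description has already been turned into a fixed-length embedding vector. The random data, the three-genre setup, and the 80/20 split are placeholders, not the project's actual data or settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: random vectors standing in for description embeddings.
rng = np.random.default_rng(0)
n_movies, dim, n_genres = 600, 300, 3
y = rng.integers(0, n_genres, size=n_movies)
centers = rng.normal(size=(n_genres, dim))
X = centers[y] + rng.normal(scale=2.0, size=(n_movies, dim))

# Hold out a validation set, keeping genre proportions the same in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the baseline classifier and report accuracy on both splits.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

train_acc = baseline.score(X_train, y_train)
val_acc = baseline.score(X_val, y_val)
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
```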
Accuracy
Next, I tried a series of models of increasing complexity
Note
Simpler, linear models underfit the data
More complex models overfit
Without further tweaking, the simple models won out on validation set accuracy for now
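One way to read the pattern in the notes above is to compare each model's training accuracy with its validation accuracy. The helper below is a rough sketch: `diagnose_fit`, its thresholds, and the example accuracies are all made up for illustration and are not results from this project.

```python
def diagnose_fit(train_acc, val_acc, low_acc=0.6, max_gap=0.1):
    """Label a model's fit from its train and validation accuracy.

    low_acc and max_gap are illustrative thresholds, not project values.
    """
    gap = train_acc - val_acc
    if gap > max_gap:
        return "overfit"      # much better on training data than on held-out data
    if train_acc < low_acc:
        return "underfit"     # struggles even on the data it was trained on
    return "reasonable"


# Made-up accuracies, for illustration only.
examples = {
    "linear model": (0.55, 0.53),
    "complex model": (0.99, 0.50),
}
for name, (train_acc, val_acc) in examples.items():
    print(name, diagnose_fit(train_acc, val_acc))
```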
I recreated the embeddings using vectors of length 100 instead of the original 300…which seems to help increase validation accuracy for some models, such as RandomForest
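The slides do not say how the description embeddings were built. A common approach, assumed here purely for illustration, is to average pre-trained word vectors, in which case the embedding length is simply the length of the word vectors used. `embed_description` and the toy lookup table are hypothetical.

```python
def embed_description(text, word_vectors, dim):
    """Average the vectors of known words; return a zero vector if none match."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension across all words.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 3-dimensional table (hypothetical); a real one would have 100 or 300 dims.
toy_vectors = {
    "space": [1.0, 0.0, 0.0],
    "robot": [0.0, 1.0, 0.0],
    "love": [0.0, 0.0, 1.0],
}
embedding = embed_description("A robot lost in space", toy_vectors, dim=3)
print(embedding)  # [0.5, 0.5, 0.0]
```

Shorter vectors give the downstream model fewer input features to fit, which is one plausible reason a model like RandomForest might generalize better with them.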
Perhaps the overfitting can be overcome by full-scale hyperparameter tuning?
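As a sketch of what that tuning might look like, the snippet below runs a small cross-validated grid search over two RandomForest settings that limit tree complexity. The synthetic data and the parameter grid are assumptions for illustration, not the project's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the embedded movie descriptions and genre labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# Shallower trees and larger leaves both restrict how closely trees fit the data.
param_grid = {
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```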
Big shout out to Anil for being a great small group instructor!