What if Blockbuster Had ML to Help Them Stock Their Shelves…?

Inspirit AI Project Presentation

Jim Rose

2023-08-17

The Problem

Movie Genre Classification

Can we help create a fully automated Blockbuster store by classifying new movies into the correct genre section based only on their descriptions?

We’ll use a data set of 54,214 movies with their:

  • genre (label)
  • descriptions (input)

Custom-trained Word2Vec embeddings performed better than pretrained models

Embedding Models Used:

  • GloVe pre-trained “Wiki-gigaword 300” model (“glove”)
  • spaCy pre-trained word2vec model (“SpCyw2v”)
  • Word2Vec model trained on my movie description data (“myw2v”)
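A classifier needs one fixed-length input vector per description, and a common way to get one from per-word embeddings is to average the word vectors. The slides do not say how descriptions were pooled, so the sketch below is an assumption, not the project's actual code. The function name `embed_description` and the toy 2-dimensional lookup are hypothetical stand-ins for a real embedding model.

```python
from typing import Dict, List


def embed_description(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of all known words; return zeros if none are known."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy lookup standing in for a real embedding model (hypothetical values).
toy_vectors = {"space": [1.0, 0.0], "love": [0.0, 1.0], "war": [1.0, 1.0]}

# "a" and "epic" are not in the lookup, so only "space" and "war" are averaged.
print(embed_description("A space war epic", toy_vectors, dim=2))  # [1.0, 0.5]
```

With a real model, `vectors` would map each vocabulary word to a 300-dimensional (or 100-dimensional) vector, and the averaged vector would be the input to the classifier.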

Baseline Models: Logistic Regression

Validation Accuracy

Over- and underfitting depend on model complexity

Accuracy

Next, I tried a series of models of increasing complexity

Note

Simpler, linear models underfit the data

More complex models overfit

Without further tweaking, the simple models won out on validation set accuracy for now
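One way to read underfitting and overfitting off a set of experiments is to compare each model's training accuracy with its validation accuracy. The sketch below is a rough heuristic under stated assumptions: the helper names `accuracy` and `diagnose` are hypothetical, and the two thresholds are arbitrary values chosen for illustration, not numbers from this project.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def diagnose(train_acc: float, val_acc: float,
             max_gap: float = 0.10, min_train: float = 0.60) -> str:
    """Rough label for a model's fit; both thresholds are arbitrary assumptions."""
    if train_acc - val_acc > max_gap:
        return "overfit"   # much better on training data than on unseen data
    if train_acc < min_train:
        return "underfit"  # cannot even fit the training data well
    return "ok"


# Placeholder numbers for illustration only, not results from the project.
print(diagnose(train_acc=0.99, val_acc=0.45))  # overfit
print(diagnose(train_acc=0.50, val_acc=0.48))  # underfit
```

A large gap between the two accuracies points to a model that memorizes the training set, while low accuracy on both points to a model that is too simple for the data.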

Shortening embedding vector lengths helped some models

I recreated the embeddings using vectors of length 100 instead of the original 300, which seems to help increase validation accuracy for some models, such as RandomForest

What I’d do next:

Perhaps the overfitting can be overcome by full-scale hyperparameter tuning?
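A minimal sketch of what such tuning could look like is an exhaustive grid search: try every combination of candidate settings and keep the one with the best validation score. Everything here is an assumption for illustration: the `grid_search` helper, the parameter names in `grid`, and especially `toy_score`, which is a stand-in for training a model and measuring its validation accuracy.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(grid: Dict[str, List[Any]],
                evaluate: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Score every parameter combination and return the best one with its score."""
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    """Stand-in scorer (hypothetical); a real one would train and validate a model."""
    return -abs(params["max_depth"] - 8) - abs(params["n_estimators"] - 200) / 100


# Hypothetical search space for a tree-ensemble model.
grid = {"max_depth": [4, 8, 16], "n_estimators": [100, 200]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'max_depth': 8, 'n_estimators': 200}
```

For overfitting in particular, the grid would typically include settings that limit model capacity, and the score should come from held-out or cross-validated data rather than the training set.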

Thank you!

Big shout out to Anil for being a great small group instructor!