A project to explore IMDB movie ratings data and quickly show what influences movie ratings. Approach:
- download IMDB data and store it in an AWS S3 bucket
- perform exploratory data analysis (EDA) using only the numerical columns (to save time, since one-hot encoding the categorical columns inflates the process)
- run a simple ML analysis (random forest with variable importance) and show the features and their relationship to ratings
- provide guidelines for better/more accurate approaches
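
The modelling steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it uses a small synthetic table in place of the IMDB data, and every column name (budget, duration, num_votes, genre, rating) is a hypothetical placeholder. It keeps only the numerical columns, fits a random forest regressor, and ranks the features by impurity-based importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the movie table; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
movies = pd.DataFrame({
    "budget": rng.uniform(1, 200, n),
    "duration": rng.uniform(60, 180, n),
    "num_votes": rng.uniform(100, 1e6, n),
    "genre": rng.choice(["Drama", "Comedy", "Action"], n),  # categorical, dropped below
})
# In this toy data the rating depends on duration plus noise.
movies["rating"] = 3 + 0.03 * movies["duration"] + rng.normal(0, 0.3, n)

# Keep only numerical columns, which avoids one-hot encoding the categoricals.
numeric = movies.select_dtypes(include="number")
X = numeric.drop(columns="rating")
y = numeric["rating"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the random forest and rank features by importance.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

importances = pd.Series(rf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
print("R^2 on held-out data:", rf.score(X_test, y_test))
```

Impurity-based importances can favour features with many distinct values, so a more careful analysis might cross-check them with sklearn.inspection.permutation_importance on the held-out split.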
A Jupyter notebook (IMDB_rating_analysis_primer.ipynb) performs the above procedures and provides figures and tables. It relies on the following libraries:
- sklearn
- matplotlib
- numpy
- pandas
- s3fs (install from a notebook cell with: !pip install s3fs)
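
Reading the stored data back from S3 could look like the sketch below. It assumes the data was saved as a CSV file and that AWS credentials are already configured in the environment. The helper names (s3_uri, load_movies) and the bucket and key in the example are hypothetical, not taken from the notebook. With s3fs installed, pandas can read s3:// URLs directly.

```python
import pandas as pd


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def load_movies(bucket: str, key: str) -> pd.DataFrame:
    """Read a CSV object from S3 into a DataFrame.

    pandas hands s3:// URLs to s3fs, which picks up AWS credentials
    from the environment (for example ~/.aws/credentials).
    """
    return pd.read_csv(s3_uri(bucket, key))


# Example call, left commented out because the bucket and key are placeholders:
# movies = load_movies("my-imdb-bucket", "raw/movies.csv")
```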
Please do not directly copy anything without my consent. Feel free to reach out to me at https://www.linkedin.com/in/mulugeta-semework-abebe/ for ways to collaborate or use some components.