Exploring New Techs!

Movie Recommendation System with GUI

 


Hello Technotizers, this article presents a simple Movie Recommendation System with a GUI, developed in Python.

Everybody loves watching movies irrespective of age, gender or geographical location. People from different locations get connected to each other through this amazing medium. Yet what is most astonishing is how unique our movie preferences are. Some people choose movies by genre, such as action, romance or sci-fi, whereas others prefer to watch movies by their favorite actors and directors. Considering all this, it is very difficult to predict whether a given movie will be liked by everyone. Even so, it is observed that similar movies tend to be liked by the same segments of the audience.

So this is where data scientists come into play, extracting behavioral patterns not only from the users but also from the movies themselves. Now let’s jump right into the basics of a recommendation system.

·         What is a Recommendation System?

In simple words, a Recommendation System’s goal is to filter through data, extract patterns in users’ preferences and in the products and services themselves, and use those patterns to recommend similar products and services to users. This is seen very often nowadays on social media and shopping sites. Search for a mobile phone and these sites will recommend various phones and related products to you for the next 1-2 weeks. Watch one action movie and Netflix will start suggesting other similar action movies. Recommendation systems have many advantages and applications, and companies use them extensively to promote their products and services.

·         What are the different filtering strategies?


  • Content-based Filtering

This filtering strategy relies on the data available about the products and services themselves. The algorithm recommends products that are similar to the ones a user has liked previously. The similarity, commonly measured with cosine similarity (a mathematical technique), is computed from the product data together with the user’s past preferences.

For example, if a user likes the movie ‘Joker’, we can recommend other movies starring ‘Joaquin Phoenix’, movies in the ‘Thriller’ genre, or movies directed by ‘Todd Phillips’. Here is what happens: the recommendation system checks the user’s previous preferences or searches and finds the film ‘Joker’. It then uses the information available in the database about that film, such as the cast, the director, the genre and the production house, to find similar movies like ‘American Psycho’ and ‘Entertainment’.
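The idea can be sketched in a few lines of Python. This is a minimal illustration, not the project’s code: the catalogue, its titles and feature tags, and the helper names `cosine_of_sets` and `recommend` are all hypothetical. Each movie is described by a set of tags, and the cosine similarity of two binary feature vectors reduces to the number of shared tags divided by the square root of the product of the set sizes.

```python
import math

# Hypothetical toy catalogue: each movie is described by a set of feature tags
# (cast, director, genre). Titles and tags are illustrative only.
catalogue = {
    "Movie A": {"actor_x", "director_y", "thriller"},
    "Movie B": {"actor_x", "drama"},
    "Movie C": {"director_y", "thriller", "crime"},
    "Movie D": {"comedy", "romance"},
}


def cosine_of_sets(a: set, b: set) -> float:
    """Cosine similarity of two binary feature vectors, expressed with sets:
    |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(liked: str, k: int = 2) -> list[str]:
    """Return the k titles whose feature sets are most similar to `liked`."""
    target = catalogue[liked]
    scores = [
        (cosine_of_sets(target, feats), title)
        for title, feats in catalogue.items()
        if title != liked  # never recommend the movie the user already liked
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [title for _, title in scores[:k]]


print(recommend("Movie A"))
```

Here ‘Movie C’ ranks first because it shares two tags (director and genre) with ‘Movie A’, while ‘Movie B’ shares only one.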

Disadvantages

    1. Users are rarely exposed to products that differ from what they already like.
    2. Businesses have limited room to grow, because only similar products are recommended and users seldom try different types of products.

 

  • Collaborative Filtering

This filtering strategy combines a user’s behavior with a comparison against the behavior of other users in the database. Every user’s history plays an important role in this algorithm. The primary difference between content-based filtering and collaborative filtering is that in the latter, the interactions of all users with the items affect the recommendations, while content-based filtering takes into consideration only the data of the user concerned. There are multiple approaches to implementing collaborative filtering, but the main concept to capture is that the data of many users influences the result of a recommendation; the model does not depend on any single user’s data.
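A minimal user-based collaborative filtering sketch, assuming a tiny made-up ratings matrix in which 0 means ‘not rated’. It computes the cosine similarity between users and predicts a missing rating as the similarity-weighted average of the other users’ ratings for that movie. Treating unrated entries as 0 when computing similarity is a simplification chosen here for brevity, and the names `user_similarity` and `predict` are hypothetical.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies,
# 0 means "not rated yet". Values are made up for illustration.
ratings = np.array([
    [5, 4, 0, 1],  # user 0 has not rated movie 2
    [4, 5, 3, 1],  # user 1
    [1, 1, 5, 4],  # user 2
], dtype=float)


def user_similarity(r: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of users (rows)."""
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    normalized = r / norms
    return normalized @ normalized.T


def predict(r: np.ndarray, user: int, item: int) -> float:
    """Predict a missing rating as the similarity-weighted average
    of the ratings that other users gave to the same item."""
    sims = user_similarity(r)[user]
    # Only users who actually rated the item contribute.
    others = [u for u in range(r.shape[0]) if u != user and r[u, item] > 0]
    weights = sims[others]
    return float(weights @ r[others, item] / weights.sum())


print(round(predict(ratings, user=0, item=2), 2))
```

User 0 is far more similar to user 1 than to user 2, so the prediction for movie 2 lands closer to user 1’s rating of 3 than to user 2’s rating of 5.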

 

·         This is a basic movie recommendation system built with Python that suggests movies based on a movie entered by the user.

·         Libraries used:

It uses the Python libraries pandas, NumPy and Tkinter, along with some modules from scikit-learn (sklearn).

Pandas: Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. 

Numpy: NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Tkinter: Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python interface to the Tk GUI toolkit, and is Python's de facto standard GUI. Tkinter is included with standard Linux, Microsoft Windows and Mac OS X installs of Python. The name Tkinter comes from Tk interface.

Sklearn: Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

This project is a content-based recommender that uses the “movie.csv” data file to recommend related movies. The file includes features such as cast, genres and director, and movies are recommended based on these features. The dataset covers only Bollywood movies, but the same approach works for any dataset once the feature columns are chosen accordingly.
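A sketch of how the selected features might be merged into one text column with pandas. The real column names in “movie.csv” are not given in this article, so `title`, `cast`, `genres`, `director` and `combined_features` are assumptions, and a small in-memory DataFrame stands in for `pd.read_csv("movie.csv")`.

```python
import pandas as pd

# Stand-in for pd.read_csv("movie.csv"); the column names below are assumptions,
# not necessarily the ones used in the original file.
movies = pd.DataFrame({
    "title":    ["Movie A", "Movie B", "Movie C"],
    "cast":     ["Actor X Actor Y", "Actor X", None],
    "genres":   ["Drama Thriller", "Comedy", "Drama"],
    "director": ["Director P", "Director Q", "Director P"],
})

features = ["cast", "genres", "director"]

# Replace missing values with empty strings so the join never sees None/NaN.
movies[features] = movies[features].fillna("")

# Join the selected feature columns into one space-separated text field per movie.
movies["combined_features"] = movies[features].apply(" ".join, axis=1)

print(movies[["title", "combined_features"]])
```

Changing the `features` list is all it takes to experiment with a different set of columns.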

 

The CountVectorizer class from sklearn transforms the text of the combined features into vectors based on the frequency (count) of each word across the entire text. To use textual data for predictive modeling, the text must first be split into individual words, a process called tokenization; common words that carry little meaning (stop words) can optionally be removed at this stage. The words then need to be encoded as integers or floating-point values for use as inputs to machine learning algorithms. This process is called feature extraction (or vectorization).
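To make tokenization and vectorization concrete, here is a hand-rolled bag-of-words sketch in plain Python. It is for illustration only (CountVectorizer does this for you), and the sample documents are made up. The regular expression mirrors the spirit of CountVectorizer’s default token pattern, which keeps tokens of two or more word characters.

```python
import re
from collections import Counter

# Made-up "combined features" strings for two movies.
docs = [
    "action drama thriller",
    "action comedy comedy",
]


def tokenize(text: str) -> list[str]:
    # Tokenization: lowercase, then keep runs of 2+ word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


# Build a shared vocabulary, sorted so the column order is stable.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})

# Vectorization: encode each document as token counts over the vocabulary.
vectors = []
for doc in docs:
    counts = Counter(tokenize(doc))
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
print(vectors)
```

Each vector has one position per vocabulary word, so documents of different lengths become comparable fixed-length inputs.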

Scikit-learn’s CountVectorizer converts a collection of text documents into a matrix of token counts. It also supports pre-processing of the text data, such as lowercasing and stop-word removal, before generating this representation. This flexibility makes it a highly adaptable feature extraction tool for text.

Cosine similarity measures the similarity between two movies based on the matrix produced by the CountVectorizer. In general, it measures the similarity between two vectors of an inner product space: it is the cosine of the angle between the vectors and indicates whether they point in roughly the same direction. Because it depends only on that angle, it judges how similar two entities are irrespective of their size (vector magnitude). It is often used to measure document similarity in text analysis, where the two vectors could be two product descriptions, two article titles or simply two arrays of words.
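The formula is cos(θ) = (A · B) / (‖A‖ ‖B‖). The short sketch below compares a manual NumPy computation with scikit-learn’s `cosine_similarity`, using two made-up count vectors. Scaling one vector by 10 leaves the score unchanged, which is what ‘irrespective of their size’ means in practice.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up count vectors, e.g. two rows of a CountVectorizer matrix.
a = np.array([1, 1, 1])
b = np.array([0, 1, 1])

# cos(theta) = (a . b) / (||a|| * ||b||)
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn returns the similarity of every pair of rows at once.
matrix = cosine_similarity(np.vstack([a, b]))

# Multiplying b by 10 changes its length but not its direction.
scaled = cosine_similarity(a.reshape(1, -1), (10 * b).reshape(1, -1))[0, 0]

print(round(float(manual), 4))
print(matrix.round(4))
print(round(float(scaled), 4))
```

The diagonal of the matrix is always 1, since every vector points in the same direction as itself.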

 

The GUI is created using Tkinter; it accepts the user’s input and displays the recommended movies.
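Putting the pieces together, a minimal end-to-end sketch could look like the following. It is not the author’s implementation: the toy DataFrame stands in for “movie.csv”, and the column names, `recommend` and `run_app` are hypothetical. Tkinter is imported inside `run_app()` so the recommendation logic can be used (and tested) on machines without a display.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for pd.read_csv("movie.csv"); column names are assumptions.
movies = pd.DataFrame({
    "title":    ["Movie A", "Movie B", "Movie C", "Movie D"],
    "cast":     ["ActorX ActorY", "ActorX", "ActorZ", "ActorW"],
    "genres":   ["Drama Thriller", "Drama", "Thriller Crime", "Comedy"],
    "director": ["DirectorP", "DirectorQ", "DirectorP", "DirectorR"],
})
features = ["cast", "genres", "director"]
movies["combined_features"] = movies[features].fillna("").apply(" ".join, axis=1)

# Vectorize once and precompute the movie-by-movie similarity matrix.
count_matrix = CountVectorizer().fit_transform(movies["combined_features"])
similarity = cosine_similarity(count_matrix)


def recommend(title: str, k: int = 3) -> list[str]:
    """Return up to k titles most similar to `title` (case-insensitive match)."""
    lowered = movies["title"].str.lower()
    matches = lowered[lowered == title.strip().lower()].index
    if len(matches) == 0:
        return []
    idx = matches[0]
    # Rank every other movie by its similarity score, highest first.
    ranked = sorted(
        (i for i in range(len(movies)) if i != idx),
        key=lambda i: similarity[idx, i],
        reverse=True,
    )
    return movies["title"].iloc[ranked[:k]].tolist()


def run_app() -> None:
    """Minimal Tkinter front end: an entry box, a button and a result label."""
    import tkinter as tk  # imported here so recommend() works without a display

    root = tk.Tk()
    root.title("Movie Recommender")

    tk.Label(root, text="Enter a movie you like:").pack(padx=10, pady=(10, 0))
    entry = tk.Entry(root, width=40)
    entry.pack(padx=10, pady=5)
    result = tk.Label(root, text="", justify="left")
    result.pack(padx=10, pady=(0, 10))

    def on_click() -> None:
        # Show the recommendations, one per line, or a not-found message.
        titles = recommend(entry.get())
        result.config(text="\n".join(titles) if titles else "Movie not found.")

    tk.Button(root, text="Recommend", command=on_click).pack(pady=5)
    root.mainloop()
```

Call `run_app()` to open the window; typing ‘Movie A’ and clicking Recommend lists ‘Movie B’, ‘Movie C’ and ‘Movie D’, ordered by similarity. Precomputing the full similarity matrix is simple and fast for a small catalogue, but its memory use grows with the square of the number of movies.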

An example of the implementation and its output is given below.



With this, we come to the end of this article. I hope it was helpful. Please share your feedback and ideas in the comments; they would be highly appreciated. See you soon!

Keep coding and exploring new techs!!

 

 
