Hello Technotizers, this article presents a simple Movie Recommendation
System with a GUI, developed in Python.
Everybody loves watching movies, irrespective of age,
gender or geographical location. People from different locations get connected
to each other through this amazing medium. Yet what is most astonishing is
how unique our movie preferences are. Some people choose movies by genre,
such as action, romance or sci-fi, whereas others prefer to watch movies by
their favorite actors and directors. Considering all this, it is very
difficult to predict whether a certain movie will be liked by everyone. Still,
it is observed that similar movies tend to be liked by a specific section of
society.
So this is where data scientists come into play, extracting
behavioral patterns not only from the users but also from the movies
themselves. Now let’s jump right into the basics of a recommendation
system.
·
What is a Recommendation System?
In simple words, a Recommendation System’s goal is to filter data,
extract patterns from user preferences and from the available products and
services, and recommend similar products and services to the users. This is
seen quite often nowadays on social media and shopping sites. Search for a
mobile phone and these sites will recommend various mobiles and related
products to you for almost 1-2 weeks. Watch an action movie once and Netflix
will start notifying you about other similar action movies. A Recommendation
System has many advantages and applications, and such systems are used
extensively by many companies to recommend their products and services.
·
What are the different filtration strategies?
- Content-based Filtering
This filtration strategy is based on the data available about the products
and services. The algorithm is designed to recommend products that are
similar to the ones a user has liked previously. The similarity, usually
measured as cosine similarity (a mathematical technique), is computed from the
data available about the products and services together with the user’s past
preferences.
For example, if a user likes a movie such as ‘Joker’, we can recommend other
movies starring ‘Joaquin Phoenix’, movies in the ‘Thriller’ genre, or movies
directed by ‘Todd Phillips’. What happens here is that the recommendation
system checks the user's previous preferences or searches and finds the film
‘Joker’. It then uses the information available in the database, such as the
cast, the director, the genre and the production house, to find movies similar
to ‘Joker’, like ‘American Psycho’ and ‘Entertainment’.
Disadvantages
- The user gets little exposure to different kinds of products.
- Businesses find it hard to expand, because the user never tries different
types of products when only similar products are recommended.
- Collaborative Filtering
This filtering strategy relies on a user's own behavior combined with a
comparison against the behavior of other users in the database. Every user's
history plays an important role in this algorithm. The primary difference
between content-based filtering and collaborative filtering is that in the
latter, the interactions of all users with the items affect the
recommendation, while in content-based filtering only the data of the user
concerned is taken into consideration. There are multiple approaches to
implementing collaborative filtering, but the main concept to capture is that
the data of multiple users influences the result of the recommendation, and
the model does not depend on any single user's data.
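To make the idea concrete, here is a minimal sketch of one possible approach (user-based collaborative filtering). The ratings matrix, the movie names and the function `recommend_for` are all made up for illustration and are not part of this project, which is content-based.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
ratings = np.array([
    [5, 4, 0, 0],   # user 0, the user we recommend for
    [5, 5, 4, 0],   # user 1, similar taste to user 0
    [0, 1, 0, 5],   # user 2, different taste
], dtype=float)


def recommend_for(user, ratings, top_n=1):
    """Rank unrated movies by the similarity-weighted ratings of other users."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between the target user and every user.
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0                      # ignore the user's own row
    scores = sims @ ratings               # weighted sum of the others' ratings
    scores[ratings[user] > 0] = -np.inf   # drop movies the user already rated
    best = np.argsort(scores)[::-1][:top_n]
    return [int(i) for i in best]


print([movies[i] for i in recommend_for(0, ratings)])
```

Note that the recommendation for user 0 comes mostly from user 1, whose ratings are closest to user 0's. This is the sense in which other users' data shapes the result.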
·
This is a basic movie recommendation system built with Python which suggests
movies based on a movie given by the user.
·
Libraries used:
It uses Python libraries such as pandas, NumPy,
Tkinter and some modules from sklearn (scikit-learn).
Pandas: Pandas is a software library written
for the Python programming language for data manipulation and analysis. In
particular, it offers data structures and operations for manipulating numerical
tables and time series.
NumPy: NumPy is a library for the Python
programming language, adding support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level mathematical functions to
operate on these arrays.
Tkinter: Tkinter is a Python
binding to the Tk GUI toolkit. It is the standard Python interface to the Tk
GUI toolkit, and is Python's de facto standard GUI. Tkinter is included with
standard Linux, Microsoft Windows and Mac OS X installs of Python. The name
Tkinter comes from Tk interface.
Sklearn: Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
This project is a content-based recommender
which uses the “movie.csv” data file to recommend related movies. The file
includes features such as cast, genres and director, and movies are
recommended based on these features. This dataset exclusively covers Bollywood
movies, but the same approach can be used with any dataset by choosing the
features as needed.
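As a rough sketch, the chosen features can be joined into one text column per movie. The rows below are made-up stand-ins for “movie.csv”; the column names `title`, `cast`, `genres` and `director`, and the helper `add_combined_features`, are assumptions for illustration, and the real file may be laid out differently.

```python
import pandas as pd

# Hypothetical rows standing in for "movie.csv"; real column names may differ.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "cast": ["Actor One Actor Two", None],
    "genres": ["Action Thriller", "Romance"],
    "director": ["Director X", "Director Y"],
})

FEATURES = ["cast", "genres", "director"]


def add_combined_features(df, features=FEATURES):
    """Return a copy of df with a 'combined_features' text column."""
    out = df.copy()
    # Missing values become empty strings so the join never fails.
    out[features] = out[features].fillna("")
    # Join the selected columns of each row with single spaces.
    out["combined_features"] = out[features].apply(" ".join, axis=1).str.strip()
    return out


print(add_combined_features(movies)["combined_features"].tolist())
```

Changing the `FEATURES` list is all it should take to try a different dataset, provided the columns hold text.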
The CountVectorizer class from sklearn is used to transform the text in the
combined features into a vector based on the frequency (count) of each
word that occurs in the entire text. To use textual data for predictive
modeling, the text must first be parsed into individual words (tokens); this
process is called tokenization. These words then need to be encoded as
integers or floating-point values for use as inputs to machine learning
algorithms. This process is called feature extraction (or vectorization).
Scikit-learn’s CountVectorizer is used to convert a
collection of text documents to a vector of term/token counts. It also enables
the pre-processing of text data prior to generating the vector representation.
This functionality makes it a highly flexible feature representation module for
text.
Cosine similarity measures
the similarity between two movies based on the matrix produced by the
CountVectorizer. In general, it measures the similarity between two vectors
of an inner product space as the cosine of the angle between them, which
indicates whether the two vectors point in roughly the same direction. Because
it depends on direction and not on magnitude, it tells how similar two
entities are irrespective of their size, and it is often used to measure
document similarity in text analysis. The two vectors could represent two
product descriptions, two article titles or simply two arrays of words.
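For two vectors A and B the measure is cos(θ) = (A · B) / (|A| |B|). The sketch below shows how the count matrix and cosine similarity could fit together in a recommender. The titles, the feature strings and the `recommend` helper are made up for illustration and are not taken from the project's source code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical titles and combined feature strings (cast, genres, director).
titles = ["Movie A", "Movie B", "Movie C"]
combined = [
    "actor_one action thriller director_x",
    "actor_one action drama director_x",
    "actor_two romance comedy director_y",
]

count_matrix = CountVectorizer().fit_transform(combined)
similarity = cosine_similarity(count_matrix)  # shape: (n_movies, n_movies)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title."""
    idx = titles.index(title)
    # Pair every movie index with its similarity score, highest first.
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the query movie itself.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


print(recommend("Movie A", top_n=1))
```

Here "Movie A" and "Movie B" share three of their four words, so their similarity is 3 / (2 × 2) = 0.75, while "Movie A" and "Movie C" share none and score 0.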
The GUI is created using Tkinter; it accepts the movie entered by the user
and displays the resulting recommendations.
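A window of this kind might look roughly like the sketch below. The layout, the widget texts and the function names `format_results` and `launch_gui` are assumptions for illustration; the project's actual GUI may differ.

```python
def format_results(titles):
    """Turn a list of titles into numbered lines for display."""
    if not titles:
        return "No similar movies found."
    return "\n".join(f"{n}. {t}" for n, t in enumerate(titles, start=1))


def launch_gui(recommend):
    """Open a small window; `recommend` maps a title string to a list of titles."""
    import tkinter as tk  # imported here so format_results works without a display

    root = tk.Tk()
    root.title("Movie Recommendation System")

    tk.Label(root, text="Enter a movie title:").pack(padx=10, pady=5)
    entry = tk.Entry(root, width=40)
    entry.pack(padx=10)
    output = tk.Label(root, text="", justify="left")

    def on_click():
        # Read the typed title, get recommendations and show them.
        output.config(text=format_results(recommend(entry.get().strip())))

    tk.Button(root, text="Recommend", command=on_click).pack(pady=5)
    output.pack(padx=10, pady=5)
    root.mainloop()
```

To try it, pass in any function that returns a list of titles, for example `launch_gui(lambda title: ["Movie B", "Movie C"])`.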
An example of the
implementation and its output is given below.
- For the source code and detailed information about the
project, visit Movie Recommendation System.
With this we come to the
end of this article. Hope it was helpful. Do provide your feedback and ideas
through comments; they would be highly appreciated. See you soon!
Keep coding and exploring new techs!!