A Hybridized Recommendation System on Movie Data Using Content-Based and Collaborative Filtering

Filed in Computer Science Project Topics on September 22, 2020

ABSTRACT  

In recent times, the rapid growth of information available on the internet has produced large amounts of data and a growing number of online users. Recommendation systems have been employed to help users make informed and accurate decisions amid this abundance of information. In this research, we propose a hybrid recommender engine that combines content-based and collaborative filtering recommendations.

The study explores how prediction accuracy can be enhanced in existing collaborative filtering frameworks. We investigate whether a recommendation system that combines content-based and collaborative filtering, using the Mahout framework and built on Hadoop, improves recommendation accuracy and alleviates the scalability issues currently experienced when processing large volumes of data to recommend items to users.

We employed the feature augmentation hybrid technique, in which the output of the content-based recommendation is used as an input to collaborative filtering. The well-known MovieLens data was matched with the Internet Movie Database (IMDB) in order to extract user and item content features. The input files generated from the integration of both databases were converted to text files, which serve as input to the collaborative filtering framework in Mahout.
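The feature augmentation idea can be illustrated with a minimal Python sketch. It is not the study's implementation: the toy ratings, the genre sets, the Jaccard-weighted estimate, and the function names are all assumptions made for illustration. The content-based step fills in ratings a user has not given, and the augmented ratings are written as `userID,itemID,preference` lines, the comma-separated text layout commonly used for Mahout's file-based data models.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data (not from the study): observed ratings and item genres.
ratings: Dict[Tuple[int, int], float] = {
    (1, 10): 5.0, (1, 11): 1.0,
    (2, 10): 4.0, (2, 12): 5.0,
}
genres: Dict[int, Set[str]] = {
    10: {"action", "sci-fi"},
    11: {"romance"},
    12: {"action", "sci-fi"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two genre sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based_estimate(user: int, item: int) -> Optional[float]:
    """Estimate a rating as the genre-similarity-weighted mean of the
    user's own ratings; returns None when no rated item is similar."""
    num = den = 0.0
    for (u, rated_item), r in ratings.items():
        if u != user:
            continue
        w = jaccard(genres[item], genres[rated_item])
        num += w * r
        den += w
    return num / den if den > 0 else None


def augmented_lines() -> List[str]:
    """Observed ratings plus content-based estimates, one CSV line each.
    These lines would then be fed to the collaborative filtering stage."""
    lines = []
    for user in sorted({u for u, _ in ratings}):
        for item in sorted(genres):
            r = ratings.get((user, item))
            if r is None:
                r = content_based_estimate(user, item)
            if r is not None:
                lines.append(f"{user},{item},{r:.1f}")
    return lines


if __name__ == "__main__":
    print("\n".join(augmented_lines()))
```

In this sketch, a pair is skipped when the content-based step finds no similar rated item, so the collaborative filtering stage never receives a fabricated preference for it.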

TABLE OF CONTENTS

CERTIFICATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES

CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
1.2 PROBLEM STATEMENT
1.3 AIM AND OBJECTIVES
1.4 SIGNIFICANCE OF THE STUDY
1.6 SYNOPSIS

CHAPTER TWO
LITERATURE REVIEW
2.1 INFORMATION RETRIEVAL AND FILTERING
2.2 RECOMMENDER SYSTEM TYPES AND TECHNIQUES
2.2.1 ENTITIES IN RECOMMENDATION SYSTEMS
2.2.2 COLLABORATIVE FILTERING (CF)
2.2.3 CONTENT-BASED RECOMMENDATION (CBR)
2.2.3.1 THE STRENGTH AND WEAKNESS OF CONTENT-BASED RECOMMENDATION
2.2.4 HYBRID RECOMMENDATION AND APPROACH
2.2.4.1 POSSIBLE COMBINATION OF HYBRID RECOMMENDATION
2.3 APACHE MAHOUT
2.3.1 DEVELOPMENT OF A SIMPLE RECOMMENDER USING MAHOUT LIBRARY
2.4 HADOOP
2.5 RELATED WORK

CHAPTER THREE
RESEARCH METHODOLOGY
3.1 INTRODUCTION
3.2 METHODOLOGY
3.3 CONTENT BASED RECOMMENDATION
3.4 COLLABORATIVE FILTERING USING MAHOUT
3.5 RECAP

CHAPTER FOUR
IMPLEMENTATION, RESULTS, PRESENTATION AND DISCUSSION
4.1 OVERVIEW OF THE IMPLEMENTATION APPROACH
4.2 EXTRACTION OF IMDB DATA
4.2.1 SOFTWARE TOOLS
4.2.1.1 SQLObject
4.2.1.2 PSYCOPG
4.2.1.3 POSTGRESQL
4.3 EXTRACTION OF MOVIELENS DATA
4.3.1 MOVIELENS RATING INFORMATION
4.3.2 MOVIELENS ITEM INFORMATION
4.3.3 EXTRACTING MOVIELENS USER FEATURES
4.4 ITEM FEATURES EXTRACTION AND COMBINATION
4.5 IMPLEMENTATION OF RECOMMENDER ENGINE BY APACHE MAHOUT
4.5.1 CLOUDERA
4.5.2 APACHE MAVEN
4.6 MAHOUT RECOMMENDER COMPONENTS – PARAMETERS OPTIMIZATION
4.6.1 DATASET
4.6.2 SIMILARITY METRICS AND NEIGHBORHOOD CRITERIA
4.7 SYSTEM EVALUATION
4.7.1 PERFORMANCE MEASURE
4.7.2 USER CONTENT FEATURES
4.7.3 ITEM CONTENT FEATURES
4.7.4 COMPARING USER/ITEM CONTENT FEATURES

CHAPTER FIVE
SUMMARY AND CONCLUSIONS
5.1 SUMMARY
5.2 CONCLUSION
5.3 RECOMMENDATION AND FUTURE WORKS

REFERENCES

INTRODUCTION  

The rate at which information is growing on the internet has resulted in large amounts of data and an increase in online users. This explosion of data has flooded users with large volumes of information and poses a great challenge in terms of information overload. As a result, it has become very difficult for people to process such information manually and to find the right information.

Faced with this sheer abundance of information, users often struggle to make informed and accurate decisions. Large internet companies such as Amazon, Google, and Facebook have also had difficulty managing this explosion of information. Recommendation systems have been employed to address this problem intelligently.

The vast increase in online data and users led to the rise of big data, and recommendation systems have received considerable attention in the big data community. Big data has improved the capacity to produce recommendations at large scale. It has also made recommendation systems more important to users, since they predict the right piece of information out of vast amounts of information.

A recommendation system is a particular form of information filtering that exploits a user's past behavior, or the behavior of similar users, to generate a list of information items tailored to that user's preferences. At present, recommendation systems (RSs) are widely used in e-commerce for information filtering, delivering personalized information by predicting users' preferences for particular items.
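As a rough illustration of filtering by the behavior of similar users, the following Python sketch predicts ratings for items a user has not seen from the ratings of like-minded users. The user names, ratings, cosine similarity measure, and `recommend` function are hypothetical choices for this example and are not taken from the study.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical ratings (not from the study): user -> {item: rating}.
prefs: Dict[str, Dict[str, float]] = {
    "ann": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob": {"A": 5.0, "B": 5.0, "D": 4.0},
    "cat": {"A": 1.0, "C": 5.0, "D": 2.0},
}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return up to k unseen items with similarity-weighted predicted ratings,
    highest prediction first."""
    target = prefs[user]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, their in prefs.items():
        if other == user:
            continue
        sim = cosine(target, their)
        if sim <= 0:
            continue
        for item, rating in their.items():
            if item in target:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [(item, round(pred, 2)) for pred, item in ranked[:k]]


if __name__ == "__main__":
    print(recommend("ann"))
```

Here the prediction for an unseen item leans toward the rating given by the most similar user, which is the core intuition behind user-based collaborative filtering.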

CSN Team.
