Data Classification using Various Learning Algorithms : Current School News



ABSTRACT  

Dimensionality reduction provides a compact representation of the original high-dimensional data, which means that the reduced data requires no further processing and retains only the vital information. For this reason, it is an invaluable preprocessing step before applying the many machine learning algorithms that perform poorly on high-dimensional data. In this thesis, the perceptron classification algorithm – an eager learner – is applied to three two-class datasets (the Student, Weather and Ionosphere datasets).

The k-nearest neighbors classification algorithm – a lazy learner – is also applied to the same two-class datasets. Each dataset is then reduced using fifteen different dimensionality reduction techniques. The perceptron and k-nearest neighbors classification algorithms are applied to each reduced dataset, and the dimensionality reduction techniques are compared, using confusion matrices, on how well they preserve the classification of each dataset by the k-nearest neighbors and perceptron algorithms.
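The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's procedure: the toy data is made up, `keep_columns` is a hypothetical stand-in for any one of the fifteen reduction techniques, and the labels predicted on the original data are assumed to serve as the reference against which the labels predicted on the reduced data are tabulated.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(reference, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a two-class problem."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if pred == positive and ref == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif ref == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def keep_columns(rows, columns):
    """Hypothetical stand-in reduction: keep only the listed attribute indices."""
    return [[row[c] for c in columns] for row in rows]


# Toy two-class data (made up for illustration): 3 attributes per observation.
train_X = [[0, 0, 5], [1, 0, 1], [0, 1, 9], [5, 5, 2], [6, 5, 8], [5, 6, 4]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1, 1, 7], [6, 6, 3], [0, 2, 1], [4, 5, 9]]

# Classification on the original data (used here as the reference).
original_pred = [knn_predict(train_X, train_y, q) for q in test_X]

# Classification on the reduced data (third attribute dropped).
reduced_train = keep_columns(train_X, [0, 1])
reduced_test = keep_columns(test_X, [0, 1])
reduced_pred = [knn_predict(reduced_train, train_y, q) for q in reduced_test]

# Fraction of test points whose label survives the reduction.
tp, fp, fn, tn = confusion_matrix(original_pred, reduced_pred)
preserved = (tp + tn) / (tp + fp + fn + tn)
```

On this toy data both runs assign the same labels, so every count falls on the diagonal of the confusion matrix. Swapping `knn_predict` for a trained perceptron would give the corresponding comparison for the eager learner.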

This investigation revealed that the dimensionality reduction techniques implemented in this thesis appear to preserve the k-nearest neighbors classification of the original datasets much better than they preserve the perceptron classification. In general, the dimensionality reduction techniques prove very effective at preserving the classification of both the lazy and the eager learner used in this investigation.

INTRODUCTION  

Data volumes and variety are increasing at an alarming rate, making any attempt to glean useful information from these large datasets very tedious. Extracting or mining useful information and hidden patterns from the data is becoming more and more important, but it can also be very challenging. Much research in domains such as biology, astronomy, engineering, consumer transactions and agriculture deals with extensive sets of observations daily.

Traditional statistical techniques encounter challenges in analyzing these datasets because of their size. The biggest challenge is the number of variables (dimensions) associated with each observation. However, not all dimensions of a high-dimensional dataset are required to understand the phenomenon under investigation; this means that reducing the dimensionality of the dataset can improve the accuracy and efficiency of the analysis.

In other words, it is of great help if we can map a set of n points in d-dimensional space into a p-dimensional space, where p << d, so that the inherent properties of that set of points, such as their inter-point distances and their labels, do not suffer great distortion. This process is known as dimensionality reduction. Many methods exist for reducing the dimensionality of data.
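As a minimal sketch of such a mapping, the snippet below uses a Gaussian random projection, one well-known technique of this kind. It is chosen here only for illustration: this excerpt does not say which techniques the thesis applies, and the sizes, the seed and the function names are all assumptions.

```python
import math
import random


def random_projection_matrix(d, p, rng):
    """d x p matrix with N(0, 1/p) entries, so squared lengths are preserved in expectation."""
    scale = 1.0 / math.sqrt(p)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(p)] for _ in range(d)]


def project(points, matrix):
    """Map each d-dimensional point to p dimensions (point times matrix)."""
    p = len(matrix[0])
    return [
        [sum(x * row[j] for x, row in zip(point, matrix)) for j in range(p)]
        for point in points
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, d, p = 6, 500, 100   # illustrative sizes only, with p smaller than d

points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
reduced = project(points, random_projection_matrix(d, p, rng))

# Ratio of reduced to original distance for every pair of points;
# values near 1 mean the inter-point distance suffered little distortion.
ratios = [
    math.dist(reduced[i], reduced[j]) / math.dist(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(min(ratios), max(ratios))
```

The spread of the printed ratios around 1 gives a rough sense of how much the inter-point distances were distorted; a larger p generally tightens that spread at the cost of a less compact representation.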

There are two categories of these methods; in the first category, each attribute in the reduced dataset is a linear combination of the attributes of the original dataset. In the second category, the set of attributes in the reduced dataset is a subset of the set of attributes in the original dataset. 
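The difference between the two categories can be made concrete with a small sketch. The data, the weights and the function names below are hypothetical and chosen only for illustration; real techniques differ in how they choose the weights or the subset of attributes.

```python
def linear_combination_reduce(rows, weights):
    """First category: each new attribute is a weighted sum of the original attributes."""
    return [
        [sum(w * x for w, x in zip(weight_row, row)) for weight_row in weights]
        for row in rows
    ]


def subset_reduce(rows, kept):
    """Second category: each new attribute is one of the original attributes, unchanged."""
    return [[row[i] for i in kept] for row in rows]


# Two observations with four attributes each (made-up values).
data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

# One weight row per new attribute: here, the averages of attributes 0-1 and 2-3.
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

combined = linear_combination_reduce(data, weights)  # [[1.5, 3.5], [5.5, 7.5]]
selected = subset_reduce(data, [0, 2])               # [[1.0, 3.0], [5.0, 7.0]]
```

Both calls reduce four attributes to two, but only `selected` contains values that appear verbatim in the original data.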


CSN Team.
