1. M. Kothainayaki – Bannari Amman Institute Of Technology, Sathyamangalam, Tamil Nadu, India.
2. P. Thangaraj – Bannari Amman Institute Of Technology, Sathyamangalam, Tamil Nadu, India.
Abstract
The k-means algorithm is well known for its efficiency
in clustering large data sets. However, because it works
only on numeric values, it cannot be used to cluster
real-world data containing categorical values. In this
paper we present the classification of a diabetes data
set and the application of the k-means algorithm to
categorical domains. Before the data set is classified,
it is preprocessed to remove noise. We use a missing
value algorithm to replace the null values in the data
set. This algorithm is also used to improve the
classification rate and to cluster the data set on two
attributes, namely plasma and pregnancy.
Keywords: Classification, Cluster Analysis, Clustering Algorithms, Categorical Data, Pre-processing
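As a rough illustration of the pipeline outlined in the abstract, the following minimal sketch replaces null values and then clusters records on the two attributes. Several things here are assumptions, not details from the paper: nulls are filled with the column mean (the abstract does not say which missing value algorithm is used), the number of clusters is set to 2, the clustering is plain numeric k-means with Euclidean distance, and the six records are made-up values. The function names `impute_mean` and `kmeans` are hypothetical.

```python
import math
import random


def impute_mean(rows):
    """Replace None entries with the mean of their column (assumed strategy)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's iterations) with Euclidean distance."""
    rng = random.Random(seed)
    # Start from k distinct records chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up records: (pregnancy count, plasma glucose); None marks a null value.
records = [[1, 85], [0, None], [2, 90], [8, 183], [None, 170], [10, 168]]

filled = impute_mean(records)
labels, centroids = kmeans(filled, k=2)

for row, lab in zip(filled, labels):
    print(row, "-> cluster", lab)
```

Because plasma glucose has a much larger numeric range than the pregnancy count, it dominates the Euclidean distance in this sketch; scaling the attributes first would be a reasonable refinement.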