
PREDICTING
DENGUE DISEASE

DINKY KHATRI, HARSHIT WADHWA


Department of Computer Science, The NorthCap University

Gurugram, India

 

ABSTRACT

The main objective of this research is to use classification techniques to predict
dengue fever cases in Jhelum district and the surrounding areas. We compare the
performance of several classification algorithms so that readers can identify a
suitable technique for building an accurate predictive model. Naïve Bayes, J48 and
SMO were the most suitable algorithms: each achieved 100% accuracy with 99 correctly
classified instances and the lowest mean absolute errors, and Naïve Bayes reached the
maximum ROC area of 1.

1. INTRODUCTION

Dengue infection is a major disease caused by the dengue virus, which is transmitted
to humans by the bite of a female mosquito [1]. Symptoms include headache,
sudden-onset fever, retro-orbital pain, joint pain, muscle pain and a rash [2].
Dengue is also called "breakbone fever", a name that comes from the associated muscle
and joint pains. It is a widespread disease that puts about 2.5 billion people at
risk worldwide. Every year about 50 million people suffer from this life-threatening
disease globally [1].

According to the World Health Organization, dengue infection is divided into two
major types, type 1 and type 2 [3]. Type 1 is classical dengue, called dengue fever,
and type 2 is dengue hemorrhagic fever (DHF). DHF is further divided into four
categories: DHF1, DHF2, DHF3 and DHF4. DHF begins with a fever that lasts 3 to 7
days, with signs including plasma leakage, shock and a weak pulse.

Different techniques can be designed and used for dengue fever classification, such
as the Naïve Bayes classifier, decision trees, k-nearest neighbours (KNN), multilayer
techniques and support vector machines (SVM) [1, 4, 5]. These techniques are
evaluated on five common data mining measures: accuracy, precision, sensitivity,
specificity and negative rate.
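
The measures listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the evaluation code used in this study: the function name and the example counts are hypothetical, and "negative rate" is assumed here to mean the false negative rate.

```python
def classification_measures(tp, fp, tn, fn):
    """Return common evaluation measures from binary confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Assumption: "negative rate" is read as the false negative rate.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical counts for illustration only; not taken from the dengue dataset.
measures = classification_measures(tp=40, fp=5, tn=50, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and the false negative rate always sum to 1, so reporting both is mainly a matter of convention.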

Several researchers have worked on dengue classification, including Tanner et al. and
Tarig et al. Tanner's team used the decision tree approach, a well-known data mining
algorithm, to classify up to 1200 patient records and identified six important
features [3]. They obtained an accuracy of 84%. Tarig's team used self-organizing
maps (SOM) and multilayer feed-forward neural networks (MFNN). They grouped patients
into two sets and obtained only 70% accuracy, whereas Fatimah Ibrahim et al. used a
multilayer perceptron (MLP) and obtained up to 90% accuracy. Daranee et al. used the
decision tree method to classify dengue patients from two data sets [4]. They
obtained 97.6% and 96.6% accuracy on the first and second data set, respectively.
We use the following algorithms: Naïve Bayes, J48, SMO, REP Tree and Random Tree [5].
WEKA was used as the data mining tool for classification.

Figure 1: Symptoms of dengue disease

 

2. TOOL USED

2.1 WEKA

Waikato Environment for Knowledge Analysis (WEKA) is machine learning software
written in Java and developed at the University of Waikato, New Zealand. It is free
software licensed under the GNU General Public License. WEKA lets users apply
different classification algorithms to a dataset and compare their accuracy [8]. Our
main objective is to identify whether a patient is affected by dengue or not. A set
of patient attributes is used to predict the fever and to compare the performance of
the various classification techniques.

The Explorer interface features several panels providing access to the main
components of the workbench:

· The Preprocess panel has facilities for importing data from a database, a CSV
file, etc., and for pre-processing this data.

· The Classify panel enables applying classification and regression algorithms
(called classifiers in WEKA) to the resulting dataset, to estimate the accuracy of
the resulting predictive model, and to visualize erroneous predictions, receiver
operating characteristic (ROC) curves, etc., or the model itself (if the model is
amenable to visualization, e.g. a decision tree).

3. VARIOUS TERMS

 

1. Correctly Classified Instances

The percentage of test instances that are correctly classified.

2. Incorrectly Classified Instances

The percentage of test instances that are incorrectly classified.

3. Mean Absolute Error

The average magnitude of the difference between predicted and actual values. A lower
value indicates a more accurate classifier.

4. Time

The time required to build the model used to predict the disease.

5. ROC Area

The area under the receiver operating characteristic (ROC) curve summarises the
performance of a diagnostic test. It is commonly graded as: excellent (0.90-1), good
(0.80-0.90), fair (0.70-0.80), poor (0.60-0.70), fail (0.50-0.60).
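
Two of the terms above, mean absolute error and ROC area, can be illustrated with a short computation. This is a sketch under stated assumptions rather than WEKA's own implementation: the predicted values are taken to be the probabilities assigned to the positive class, the ROC area is computed as the fraction of positive/negative pairs that are ranked correctly, and the scores and labels are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def roc_area(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive instance receives
    a higher score than a randomly chosen negative one (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of "dengue positive" and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]

mae = mean_absolute_error(scores, labels)
auc = roc_area(scores, labels)
print(f"MAE = {mae:.4f}, ROC area = {auc:.4f}")
```

The two measures capture different things: the mean absolute error depends on how close the probabilities are to 0 or 1, whereas the ROC area depends only on how the instances are ranked.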

 

4. DATASET USED

The dataset was collected from the District Headquarter Hospital (DHQ), Jhelum.
Several classification techniques were applied to categorise it.

5. DATASET ATTRIBUTES

Figure 2: Attributes of the dengue dataset

6. CLASSIFICATIONS

6.1. NAÏVE BAYES

The Naïve Bayes classifier is based on applying Bayes' theorem. It works as a
probabilistic classifier, i.e. it predicts class membership probabilities. It is not
a single algorithm but a family of algorithms based on a common principle: all naive
Bayes classifiers assume that, given the class, the value of a particular feature is
independent of the value of any other feature. We applied the Naïve Bayes algorithm
to the dataset attributes using 10-fold cross-validation. It produced 100% accuracy,
with 99 correctly classified instances. The mean absolute error was 0.0011, i.e. a
small residual error remained. The time taken to build the model was 0 seconds and
the ROC area was 1, as shown in the figure.
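
The independence assumption described above means that a class score is simply the class prior multiplied by one per-feature likelihood for each attribute. The sketch below illustrates this for nominal attributes with add-one (Laplace) smoothing. It is not WEKA's NaiveBayes implementation, and the two attributes (fever, joint pain) and the six training rows are hypothetical, not drawn from the Jhelum dataset.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    attribute_values = defaultdict(set)   # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(attribute_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rows: (fever, joint_pain).
rows = [("yes", "yes"), ("yes", "yes"), ("yes", "no"),
        ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "yes")))
print(predict(model, ("no", "no")))
```

Working in log space keeps the product of many small probabilities from underflowing when a dataset has many attributes.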

 

6.2. J48 TREE

J48 is a Java implementation of the C4.5 algorithm in the Weka data mining tool. A
J48 tree predicts the target value from the attributes of the dataset. We applied
this algorithm to the dengue prediction dataset using 10-fold cross-validation and
analysed the output statistics. The algorithm correctly classified 100% of a total
of 99 instances. The mean absolute error was exactly 0. The time required to build
the model was 0 seconds and the ROC area was 0.979, as can be seen in the figure.
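
C4.5, the algorithm that J48 implements, chooses the attribute to split on using the gain ratio: the information gain of the split divided by the entropy of the split itself. The following sketch shows only this criterion for nominal attributes. It omits the numeric thresholds, missing-value handling and pruning that a full C4.5 implementation includes, and the attribute values are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting labels by a nominal attribute's values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute columns against the same class labels.
labels = ["pos", "pos", "neg", "neg"]
informative = ["yes", "yes", "no", "no"]    # separates the classes perfectly
uninformative = ["yes", "no", "yes", "no"]  # tells us nothing about the class

print(gain_ratio(informative, labels))
print(gain_ratio(uninformative, labels))
```

Dividing by the split entropy penalises attributes with many distinct values, which plain information gain would otherwise favour.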

 

6.3. SMO

SMO (sequential minimal optimization) is another method used to classify the dengue
prediction dataset. In Weka it trains a support vector classifier, which separates
the classes with a maximum-margin boundary. We ran this algorithm on our dataset
using 10-fold cross-validation in Weka. The output was then analysed and is
summarised in Table 1. We achieved 100% classification accuracy with no error, as
the mean absolute error was 0. The time required to build the model was 0 seconds
and the ROC area obtained was 0.909.
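
All of the classifiers in this section are evaluated with 10-fold cross-validation, in which every instance is used for testing exactly once and for training in the other nine rounds. The sketch below shows how 99 instances could be divided into folds. It is a simplified assumption of the procedure: the function name is hypothetical, and the folds here are neither shuffled nor stratified by class, which evaluation tools typically do.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Instance i is assigned to test fold i % k, so each instance appears in
    exactly one test fold and in the training set of every other fold.
    """
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# 99 instances, as reported for the dengue dataset in this section.
folds = list(k_fold_indices(99, k=10))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)
```

The reported accuracy is then the fraction of all instances classified correctly across the ten test folds, so it is based on predictions for instances the model did not see during training.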

6.4. REP TREE

REP Tree correctly classified 74.7475% of the instances and incorrectly classified
25.2525%. The mean absolute error was 0.3655, the time taken to build the model was
0.02 seconds and the ROC area was 0.547.

 

 

6.5. RANDOM TREE

Random Tree correctly classified 87.8788% of the instances and incorrectly
classified 12.1212%. The mean absolute error was 0.1853, the time taken to build the
model was 0 seconds and the ROC area was 0.881, as reported in the output.
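
The accuracy percentages for REP Tree and Random Tree can be converted back into instance counts as a consistency check. The sketch below assumes the same total of 99 instances reported for Naïve Bayes and J48; the function name is hypothetical.

```python
TOTAL_INSTANCES = 99  # assumption: same total as reported for Naïve Bayes and J48


def correct_count(accuracy_percent, total=TOTAL_INSTANCES):
    """Number of correctly classified instances implied by an accuracy percentage."""
    return round(accuracy_percent / 100 * total)


rep_tree_correct = correct_count(74.7475)
random_tree_correct = correct_count(87.8788)

print(rep_tree_correct, TOTAL_INSTANCES - rep_tree_correct)
print(random_tree_correct, TOTAL_INSTANCES - random_tree_correct)
```

Under this assumption REP Tree misclassifies 25 instances and Random Tree 12, which matches the reported error percentages of 25.2525% and 12.1212%.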

 

 

7. CONCLUSION

 

Naïve Bayes, J48 and SMO each classified 100% of the instances correctly. J48 and
SMO had the minimum mean absolute error of 0, while Naïve Bayes had an error of
0.0011. The maximum ROC area was obtained by Naïve Bayes (1), followed by J48 (0.979)
and SMO (0.909). The time taken to build the model was 0 seconds in all cases except
REP Tree, for which it was 0.02 seconds.

A maximum ROC area indicates excellent prediction performance compared with the
other algorithms. An advantage of using Weka for disease prediction is that it can
support diagnosis even when predictions are needed for a very large number of
patients, or for very large data sets spanning lakhs of patients. Although Weka is a
powerful data mining tool for analysing classification results and visualizing them
in medical applications, other tools such as Matlab could also be used to classify
different data sets.

Table 1: Accuracy prediction table of different algorithms

Algorithm      Correctly Classified (%)   Incorrectly Classified (%)   Mean Absolute Error   ROC Area
Naïve Bayes    100                        0                            0.0011                1
J48 Tree       100                        0                            0                     0.979
SMO            100                        0                            0                     0.909
REP Tree       74.7475                    25.2525                      0.3655                0.547
Random Tree    87.8788                    12.1212                      0.1853                0.881

 

8. REFERENCES

1. Farooqi W, Ali S (2013) A Critical Study of Selected Classification Algorithms
for Dengue Fever and Dengue Hemorrhagic Fever. Frontiers of Information Technology
(FIT), 11th International Conference on, IEEE.

2. Farooqi W, Ali S, Abdul W (2014) Classification of Dengue Fever Using Decision
Tree. VAWKUM Transaction on Computer Sciences 3: 15-22.

3. Rigau-Pérez JG, et al. (1998) Dengue and dengue haemorrhagic fever. The Lancet
19: 971-977.

4. Wikipedia, http://en.m.wikipedia.org/wiki/Dengue_fever, accessed in January 2015.

5. Waikato, http://www.cs.waikato.ac.nz/ml/weka, accessed in January 2015.

6. Kirkby R, Frank E (2004) WEKA Explorer User Guide for version 3-4-3, November
2004.

7. Shakil KA, et al. (2015) Dengue disease prediction using weka data mining tool.
arXiv preprint arXiv:1502.05167.

 

 
