Here, our proposed solution focuses on detecting guava diseases from visual symptoms and then recommending a curative treatment. Several necessary steps, described in the subsections below, are combined to make the recognition process more accurate.
A. Image Acquisition
Most of the images for this work were captured by us, and some were collected from the internet. A Nikon D7200 DSLR camera (24.2 megapixels, ISO 100-25,600, 3.2 in. diagonal TFT-LCD with 1,228,800 dots and a wide viewing angle, 6 frames per second continuous shooting, 24 × 16 mm image sensor, 51-point autofocus system) with an 18-140 mm lens was used to capture these photographs at diverse locations and in diverse situations. To distinguish healthy fruit from diseased ones, one more class containing only images of healthy guavas was added to the dataset. We are therefore performing multiclass classification with four class labels: three for the diseases and one for healthy guava. Some sample images from our dataset are shown in Figure 3.
To obtain better performance from our CNN model, the dataset needs to be divided into training, validation, and test sets using the holdout method. We have collected 10,000 images for our dataset, comprising 2,500 images for each class label. For each class, 1,800 images have been used to train the model, 200 images form the validation set, and 500 images form the test set used to analyze the model's performance. The resolution of the images has been reduced to a square 350 × 350 pixels.
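The per-class holdout split described above can be sketched as follows; the file names and the shuffling seed are illustrative assumptions, since the paper does not specify its file layout.

```python
# Sketch of the 1800/200/500 per-class holdout split described above.
# File names and the random seed are hypothetical examples.
import random

def holdout_split(image_paths, n_train=1800, n_val=200, n_test=500, seed=42):
    """Shuffle one class's image paths and split them into train/val/test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    assert len(paths) >= n_train + n_val + n_test
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Example with placeholder file names for one class of 2500 images:
files = [f"guava_{i:04d}.jpg" for i in range(2500)]
train, val, test = holdout_split(files)
print(len(train), len(val), len(test))  # 1800 200 500
```

Repeating this split for each of the four classes yields the 7,200/800/2,000 overall partition implied by the counts above.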
Some images contain cluttered scenes; hence, we manually zoomed in and cropped to segment out only the desired fruit region.
C. Elimination of Over-fitting
Overfitting causes the model to learn the noise of the training data too well while generalizing poorly, resulting in high accuracy on the training set but much lower accuracy on the test set. To eliminate this barrier, several classical methods have been used in our model to boost accuracy.
1) Image Augmentation: We have applied data augmentation to the training dataset to improve overall prediction accuracy and reduce overfitting. As shown in Figure 4, techniques such as re-scaling, shearing, height shifting, width shifting, zooming, rotation, and horizontal flipping are used to augment the data.
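A few of the listed augmentations (re-scaling, shifting, horizontal flipping) can be sketched in plain NumPy as below; the parameter values are illustrative, not the ones used in the paper, and in practice a library generator would apply these randomly at training time.

```python
# Minimal NumPy sketch of three of the augmentation operations listed
# above. Parameter values are illustrative assumptions.
import numpy as np

def rescale(img, factor=1.0 / 255.0):
    """Rescale pixel intensities, e.g. from [0, 255] to [0, 1]."""
    return img.astype(np.float32) * factor

def horizontal_flip(img):
    """Mirror the image left-to-right."""
    return img[:, ::-1]

def shift(img, dy=0, dx=0):
    """Shift the image by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ys = slice(max(dy, 0), h + min(dy, 0))
    xs = slice(max(dx, 0), w + min(dx, 0))
    yd = slice(max(-dy, 0), h + min(-dy, 0))
    xd = slice(max(-dx, 0), w + min(-dx, 0))
    out[ys, xs] = img[yd, xd]
    return out

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(horizontal_flip(img)[0, 0])   # 3 (first row [0,1,2,3] reversed)
print(shift(img, dy=1).shape)       # (3, 4), top row now zeros
```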
D. Fitting & Matching
CNNs have already proved to be a state-of-the-art approach to image recognition and classification problems. The overall network contains several layers, each of which has one or more planes. The input layer takes the training images as input, and each unit in a plane receives input from the previous layer. The receptive field for each plane is formed using weights, and at every point in the plane these weights are constrained to be equal. Each plane can thus be considered a feature map that detects one fixed feature, and each layer consists of multiple planes to detect multiple features. These layers are known as convolutional layers. Once a feature has been detected, its exact location matters less, so local averaging and subsampling operations are performed in the next layer. The backpropagation gradient-descent procedure is used to train the network, and the number of weights in the network is reduced by a connection strategy. For example, LeCun et al. connect the feature maps in the second convolutional layer only to one or two of the maps in the first subsampling layer (the connection strategy was chosen manually).
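The two core operations described above, a convolutional plane with shared weights followed by local averaging and subsampling, can be sketched in NumPy as below. The 3 × 3 kernel and input size are illustrative assumptions; in the actual model these weights would be learned by backpropagation.

```python
# NumPy sketch of one convolutional plane (one shared kernel slid over
# the whole input) followed by 2x2 local averaging (subsampling), as
# described above. Kernel and input sizes are illustrative.
import numpy as np

def conv2d_valid(img, kernel):
    """2-D valid cross-correlation: the same kernel weights are applied
    at every position, i.e. the weights are shared across the plane."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap, size=2):
    """Local averaging over non-overlapping size x size windows."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    f = fmap[:h, :w]
    return f.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

img = np.random.default_rng(0).random((8, 8))
kernel = np.ones((3, 3)) / 9.0        # simple averaging kernel for the demo
fmap = conv2d_valid(img, kernel)      # 6x6 feature map
print(subsample(fmap).shape)          # (3, 3)
```

A full network would stack several such plane pairs (multiple feature maps per layer) before fully connected layers produce the four class scores.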
IV. EXPERIMENTAL EVALUATION
As shown in Table I, we have devised and deployed three CNN models for the disease detection system. All three models were trained for 50 epochs to obtain the best possible results. The batch size is set to 32, and the optimizer used is Adam with a categorical cross-entropy loss function, since the classification problem is multiclass. The learning rate is a very important factor for deep learning models: a high learning rate yields longer jumps that can escape local optima but may overshoot the solution, whereas a low learning rate can cause slow convergence. Keeping both factors in play, we applied learning rates of 0.01, 0.0001, and 0.001 in the three models respectively, and found that the value 0.001 in the third model produces the best convergence.
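To illustrate how the learning rate scales each parameter jump in Adam, a single update step can be sketched as below; the moment-decay and epsilon values are Adam's common defaults, which the paper does not state explicitly, and only the best-performing learning rate of 0.001 is taken from the text.

```python
# Sketch of one Adam update step, showing how the learning rate bounds
# the size of each parameter jump. b1, b2, eps are the optimizer's
# common defaults (assumed, not stated in the paper).
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moment estimates scale the step."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
m, v = np.zeros_like(w), np.zeros_like(w)
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)  # each weight moves by about lr (0.001) opposite its gradient
```

At t = 1 the bias-corrected step reduces to roughly lr · sign(grad), which makes concrete why 0.01 can overshoot while 0.0001 converges slowly.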