From Getting Dataset To Model in Production — Part 1

Prerna Sharma

a year ago

This series will give you solid steps and understanding, from gathering data to deploying the model in production🚀

What’s the project?

So the project we are building takes a plant's leaf🌿 from the user and predicts whether the leaf is healthy or not😷, using TensorFlow 2.0. Let's get started.
Part 2 here

Part 1

In this part we will discuss things as follows:
  • Getting Dataset
  • Data Exploration
  • Making Training and Testing Dataset
  • Making the Model
  • Training and Testing
  • Exporting the Model

Getting Dataset (Fuel)⛽

In this section, we will discuss where we got the dataset, and how you can get a dataset for your own ideas.
The dataset we are using is from the Kaggle competition Plant Pathology 2020 — FGVC7, whose goal is to identify the category of foliar diseases in apple trees.
At the time of writing this blog, the competition is still running and anyone can participate.
Kaggle is a home for data scientists; there are thousands of different datasets on Kaggle (and much more). Whenever you are searching for a dataset, try Kaggle first.

Data Exploration (Discovery)

In this section, we will explore the dataset (Exploratory Data Analysis) and look at the target class distribution.
The training dataset looks something like this:
The training images are in a separate folder named images, and there are four classes — healthy, multiple_diseases, rust, scab.
There are a total of 1821 training images and 1821 testing images. YES!
These are the images with their label classes; the shape of each image is (1365, 2048).
Data Distribution
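The distribution plot isn't reproduced here, but since the competition's train.csv one-hot-encodes the four classes, the per-class counts can be computed with a few lines of pandas. The column names match the Kaggle competition's label columns; the helper name is my own:

```python
import pandas as pd

# One-hot label columns as they appear in the competition's train.csv
CLASS_COLS = ["healthy", "multiple_diseases", "rust", "scab"]

def class_distribution(df: pd.DataFrame) -> pd.Series:
    """Number of training images per class, largest first."""
    return df[CLASS_COLS].sum().sort_values(ascending=False)

# Usage, assuming the competition files are in the working directory:
# train_df = pd.read_csv("train.csv")
# print(class_distribution(train_df))
```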

Making Training and Testing Dataset (Ready up)

To generate batches of inputs and outputs for the model, and to generate MORE training data (Data Augmentation), we will use the Keras Data Generator ⚙
How Keras Generators Work
A Keras generator produces a batch of inputs and outputs while training the model. The advantages of Keras generators are:
  • They use very little memory.
  • They offer Data Augmentation, a way of generating more training data by randomly flipping images horizontally or vertically, randomly zooming and rotating the images, randomly shifting the images (horizontally or vertically), and much more…
  • They make it very easy to read image data and build inputs and outputs for the model.
Fitting to Training Data
We read the images from the /image folder; the filenames come from x_col="image_id" and the labels from the class columns.
We resize the images to (512, 512), and the rest is pretty standard.
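The generator code itself isn't shown above, but a minimal sketch with Keras' ImageDataGenerator and flow_from_dataframe could look like the following. The specific augmentation values, the class-column names, and the helper name make_train_generator are assumptions; only x_col="image_id", the image folder, and the (512, 512) resize come from the post:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

CLASS_COLS = ["healthy", "multiple_diseases", "rust", "scab"]

def make_train_generator(train_df: pd.DataFrame, image_dir: str, batch_size: int = 16):
    """Yield augmented (image batch, one-hot label batch) pairs for training."""
    datagen = ImageDataGenerator(
        rescale=1.0 / 255,        # scale pixel values to [0, 1]
        horizontal_flip=True,     # random horizontal flips
        vertical_flip=True,       # random vertical flips
        rotation_range=20,        # random rotations up to 20 degrees
        zoom_range=0.2,           # random zoom in/out
        width_shift_range=0.1,    # random horizontal shifts
        height_shift_range=0.1,   # random vertical shifts
    )
    return datagen.flow_from_dataframe(
        train_df,
        directory=image_dir,
        x_col="image_id",         # column holding the image filenames
        y_col=CLASS_COLS,         # labels are already one-hot columns
        class_mode="raw",
        target_size=(512, 512),   # resize every image to (512, 512)
        batch_size=batch_size,
    )
```

Note that the competition's image_id values come without a file extension, so something like train_df["image_id"] + ".jpg" would be needed before calling this helper.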

Making the Model (….)

In this section, we will build our Keras model for training and look at its architecture.
I experimented with many models: Xception, DenseNet121, InceptionResNetV2 ……..
After long experimentation (training different models) and days of GPU time, I finally came to this point.
So this is what I did: after experimenting, I found that the combination of Xception and DenseNet121 (ensembling models) performs best of all…
Architecture of networks
  • Xception
  • DenseNet121
Ensembled Network
  • The input layer is where the image comes
  • The sequential_1 & sequential layers are Xception and DenseNet121, each with an added GlobalAveragePooling2D and output layer.
  • The average layer takes the output from Xception and DenseNet121 and averages them.
Here is the code, Yeeee….
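The embedded code didn't survive extraction, so here is a hedged reconstruction of such an ensemble from the bullet points above. The function name and the weights=None default are my own; the post presumably started from pretrained ImageNet weights, and its optimizer and loss settings are not shown here:

```python
from tensorflow.keras.applications import DenseNet121, Xception
from tensorflow.keras.layers import Average, Dense, GlobalAveragePooling2D, Input
from tensorflow.keras.models import Model, Sequential

def build_ensemble(input_shape=(512, 512, 3), num_classes=4, weights=None):
    """Average the class probabilities of an Xception and a DenseNet121 branch."""
    inputs = Input(shape=input_shape)

    # Branch 1: Xception backbone + GlobalAveragePooling2D + softmax head
    xception_branch = Sequential([
        Xception(include_top=False, weights=weights, input_shape=input_shape),
        GlobalAveragePooling2D(),
        Dense(num_classes, activation="softmax"),
    ])

    # Branch 2: DenseNet121 backbone + GlobalAveragePooling2D + softmax head
    densenet_branch = Sequential([
        DenseNet121(include_top=False, weights=weights, input_shape=input_shape),
        GlobalAveragePooling2D(),
        Dense(num_classes, activation="softmax"),
    ])

    # The Average layer merges the two branches' predictions
    outputs = Average()([xception_branch(inputs), densenet_branch(inputs)])
    return Model(inputs=inputs, outputs=outputs)

# Usage -- the post presumably trained from pretrained ImageNet weights:
# model = build_ensemble(weights="imagenet")
# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```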

Training the Model (Rest time)

In this section, we will set a LearningRateScheduler for our model, train it, and then look at the results and test it.
LearningRateScheduler
The learning rate is a very important hyperparameter in deep learning; it controls how much the model weights change in response to the estimated error during training.
  • A too-low learning rate can slow down the learning process, and the model will take much longer to converge to optimal weights.
  • A too-large learning rate can destabilize the training process.
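The exact schedule used in the post is not reproduced here; below is an illustrative LearningRateScheduler with a linear warm-up followed by exponential decay. Every number in it (START_LR, MAX_LR, WARMUP_EPOCHS, DECAY) is a made-up placeholder, not the post's actual setting:

```python
from tensorflow.keras.callbacks import LearningRateScheduler

# Placeholder values -- not the post's actual settings
START_LR, MAX_LR = 1e-5, 1e-4
WARMUP_EPOCHS, DECAY = 3, 0.8

def lr_schedule(epoch, lr=None):
    """Linear warm-up to MAX_LR, then exponential decay."""
    if epoch < WARMUP_EPOCHS:
        return START_LR + (MAX_LR - START_LR) * epoch / WARMUP_EPOCHS
    return MAX_LR * DECAY ** (epoch - WARMUP_EPOCHS)

lr_callback = LearningRateScheduler(lr_schedule, verbose=1)

# Passed to training alongside the data generator (hypothetical names):
# model.fit(train_generator, epochs=30, callbacks=[lr_callback])
```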
MODEL TRAINING
RESULTS
After ~8 hours of model training:
The performance is pretty amazing; I don't know what to say.
After predicting on the testing dataset and submitting it to the competition …….. 😱

Exporting Model

In this section, we will save our model architecture and model weights into a .h5 file (we will need this in the next part).
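Saving to .h5 is a single call in Keras. The sketch below uses a tiny stand-in model so it runs on its own; in the post this would be the trained ensemble, and the filename plant_disease_model.h5 is my assumption:

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import load_model

# A tiny stand-in model; in the post this would be the trained ensemble.
inputs = Input(shape=(8,))
outputs = Dense(4, activation="softmax")(inputs)
model = Model(inputs, outputs)

# Save architecture + weights into a single HDF5 file
model.save("plant_disease_model.h5")

# Restoring it later (e.g. in the deployment part) is one call:
restored = load_model("plant_disease_model.h5")

# Check that the restored model predicts the same as the original
x = np.random.rand(2, 8).astype("float32")
print(np.allclose(model.predict(x), restored.predict(x)))
```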
Done… too easy, guys.
Things Learned (MOST IMPORTANT)
  • Ensemble models (always)
  • Watch out for OVERFITTING
  • The combination of Xception and DenseNet121 (to try again in the future)
  • …..
Also, if you want to see the whole code and try it yourself, you can find it here (rate it if you like 😁).

Not Important but I still add it

Your Project Ideas 💡
This is the time of Coronavirus🦠, so why not contribute as a Machine Learning Engineer? YES!
The purpose of the project is to classify X-ray images as COVID-19, Pneumonia, or Normal. Pretty interesting, isn't it?
There are many datasets out there, but these are some popular ones:
  • COVID-19 Xray Dataset (recommended)
  • COVID-19 image data collection
  • Chest Xray Images PNEUMONIA and COVID-19
So guys, try the datasets, build models, and deploy them (learn deployment in the next part). And share them with me; I love seeing projects by different people😀.
Learn more about Data Science models InsideAIML.
