Udacity Capstone Proposal

Convolutional Neural Networks (CNN) project
Dog Breed Identification

Ante Krištić
January 2020.

Domain Background

In the last decade, machine learning (ML) has become more popular thanks to powerful computers that can process large amounts of data in a reasonable amount of time. The concept of machine learning was introduced by Arthur Samuel in 1959, so it is not new, but today we can use much more of its potential. Another reason is that a lot of digitized data, which we need to build good ML models, is now available.

The dog breed identification problem is well known in the ML community. We can find it on Kaggle:
Here is a link to an article describing similar academic work, but on flowers:

Once I have built this model, I want to build an Android app and publish it on Google Play.
Another reason for choosing this problem is that I believe it is a good step toward solving a bigger problem: How many calories and carbs does the food in a picture contain? Solving that problem is the main reason I want to learn ML.

Problem Statement

The idea is that we take a picture with a phone and the app tells us which dog breed is in the picture. Additionally, we want the app to tell us which dog breed a human most resembles. We want to answer a few questions here. The main question is: What dog breed is in the picture? The second question is: How would you look if you were a dog? To answer these two questions we first have to answer a third: Is there a human or a dog in the picture?
This is a supervised learning problem, and because our dog images are divided into breed classes we will use classification, more precisely a multi-class classification model.
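
The three questions above suggest a simple decision flow. The following is only a sketch of that flow, not the final implementation: the function name `describe_image` and the three callables passed to it are hypothetical placeholders for the detectors and the breed classifier described later in this proposal.

```python
from typing import Callable


def describe_image(
    image_path: str,
    is_dog: Callable[[str], bool],
    is_human: Callable[[str], bool],
    predict_breed: Callable[[str], str],
) -> str:
    """Answer the questions from the problem statement for one image."""
    # First decide what is in the picture, then ask the breed classifier.
    if is_dog(image_path):
        return f"This dog looks like a {predict_breed(image_path)}."
    if is_human(image_path):
        return f"This human resembles a {predict_breed(image_path)}."
    return "No dog or human was detected in this picture."


if __name__ == "__main__":
    # Stub detectors and classifier, for illustration only.
    print(describe_image("photo.jpg", lambda p: True, lambda p: False, lambda p: "Beagle"))
```

Passing the detectors in as arguments keeps the flow independent of any particular library, so each part can be replaced or tested on its own.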

Datasets and Inputs

Our input data must be images, because we want the user to take a picture of a dog (or a human) with their phone and send it to our server, which returns the dog breed most likely shown in the picture (or the dog breed the human most resembles).
All data for this project is provided by Udacity. We have pictures of dogs and pictures of humans.
All dog pictures are split into train (6,680 images), test (836 images) and valid (835 images) directories, and the images within each of these directories are sorted into breed directories. Every train, test and valid directory contains 133 folders (dog breeds).
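
A directory layout like this (one folder per breed inside each split) can be inspected with a few lines of standard-library Python. This is a minimal sketch: the helper names and the set of accepted file extensions are assumptions, not part of the Udacity material.

```python
from pathlib import Path

# Assumed image formats; adjust if the dataset uses other extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Map each class folder name in split_dir to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def summarize(counts):
    """Return (number of classes, total images, smallest class, largest class)."""
    sizes = list(counts.values())
    return len(sizes), sum(sizes), min(sizes), max(sizes)
```

Calling `summarize(count_images_per_class(path_to_train_dir))` on the train directory should report 133 classes and 6,680 images if the layout matches the description, and the smallest and largest class sizes give a quick view of how unbalanced the breeds are.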

Human pictures are sorted by the name of each person. We have 13,234 files (images) in 5,749 folders (people).
Our data is not balanced, because we have one image of some people and several of others. The same holds for the dog images. (In most cases the difference is between 1 and 9 images.)
Dog images have different sizes and different backgrounds; some show the whole dog and some just the head. Lighting also varies. That is actually fine, because we don’t know what users’ images will look like and we want our model to work on different types of images. Human images are all the same size, 250×250. They have different backgrounds and lighting, are taken from different angles, and sometimes contain several faces. Here are a few samples of our dog and human images:

We can see from the examples above that some images contain more than one human or dog. It might be a good idea to remove those images. I will test this to see how it affects the results.

Solution Statement

We will use a Convolutional Neural Network (CNN) to build the model. CNNs are a class of deep neural networks that are well suited to analyzing images. It would be great if we could combine a CNN with XGBoost to find the best possible model, but that idea needs more research because I don’t know whether it is feasible. To detect whether a picture shows a human we will use the OpenCV model, and to detect whether a dog is in the picture we will use a pre-trained VGG16 model. We will create our CNN model using transfer learning, because it requires far fewer images and can still give great results.

Benchmark Model

For our benchmark model, we will use a Convolutional Neural Network (CNN) created from scratch, with an accuracy of more than 10%. This should be enough to confirm that the model is working, because a random guess would be right for 1 in 133 breeds, which is less than 1%, if we ignore the imbalance in our dog images.
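
The arithmetic behind the random-guess baseline can be checked quickly. The sketch below also includes a majority-class baseline, which is an addition for illustration: it shows how class imbalance could raise the trivial baseline above 1 in 133. The function names are hypothetical.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of a uniform random guess over balanced classes."""
    return 1.0 / num_classes


def majority_class_accuracy(class_counts) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


if __name__ == "__main__":
    print(f"{chance_accuracy(133):.2%}")  # prints 0.75%
```

Either way, a from-scratch model that exceeds 10% accuracy is well above what guessing alone would be expected to achieve.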

Evaluation Metrics

The problem we are trying to solve is a classification problem. Because our data is unbalanced, a simple accuracy score is not a very good metric here. Kaggle uses the multi-class log loss metric to evaluate models. I will also use this metric so that I can compare my results with those on Kaggle. I will also use the F1 score, because it takes both precision and recall into account and its results are easier for me to interpret.
We calculate F1 with the formula:
F1 = 2 * (precision * recall) / (precision + recall)
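
Both metrics can be written out in plain Python to make the definitions concrete. This is a sketch under assumptions: the proposal does not say how per-class F1 scores are combined across the 133 breeds, so macro averaging (an unweighted mean over classes) is assumed here, and the function names are illustrative. In practice a library implementation would likely be used instead.

```python
import math


def f1_score(precision, recall):
    """F1 = 2 * (precision * recall) / (precision + recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1_score(precision, recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for true_label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when the model is fully confident and wrong.
        p = min(max(probs[true_label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Here `y_true` and `y_pred` are lists of integer class indices, and `probabilities` holds one list of per-class probabilities for each sample.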

Project Design

After getting the dataset provided by Udacity, the second thing we want to do is detect humans in images.
We will use the OpenCV model to find faces in the image, which will tell us whether there is a human in the image or not. To do this we will use Haar feature-based cascade classifiers. More about this can be found here: Object Detection using Haar feature-based cascade classifiers. Our workflow for detecting faces:
– initialize the pre-trained face detector
– load the image
– convert the image to grayscale
– find faces in the image
– return true if the number of faces is more than 0, else return false
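
The face-detection workflow above can be sketched with OpenCV's Python bindings. The choice of cascade file (`haarcascade_frontalface_default.xml`, which ships with the `opencv-python` package) is an assumption, since the proposal does not name one, and the function names are hypothetical. OpenCV is imported inside the loader so that the decision logic can be exercised without it.

```python
def load_opencv_face_detector():
    """Build a face detector from one of OpenCV's bundled Haar cascades."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    # Assumed cascade file; other frontal-face cascades could be used instead.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def detect_faces(image_path):
        image = cv2.imread(image_path)  # load the image (BGR channel order)
        if image is None:
            raise FileNotFoundError(image_path)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
        return cascade.detectMultiScale(gray)  # one bounding box per face

    return detect_faces


def contains_human(image_path, detect_faces):
    """Return True if at least one face is found in the image."""
    return len(detect_faces(image_path)) > 0
```

A typical call would be `contains_human("photo.jpg", load_opencv_face_detector())`, where `photo.jpg` stands for any image file on disk.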

Then we detect dogs in images. We will use the pre-trained VGG16 model for this.
– first, we define our VGG16 model
– we will use a GPU for better performance
– load and pre-process the image
– send the image to the VGG16 model
– the model returns an index from 0 to 999 (dog classes are from 151 to 268)
– return true if the index is >=151 and <=268, else return false
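
The last two steps of this workflow reduce to picking the highest-scoring class and checking whether it falls in the dog range 151 to 268. The sketch below shows only that decision logic: it assumes the 1,000 class scores have already been produced by the VGG16 model, and the names used are illustrative.

```python
# Range of class indices that correspond to dog breeds, as stated above.
DOG_CLASS_FIRST = 151
DOG_CLASS_LAST = 268


def predicted_index(scores):
    """Return the index of the highest score (the predicted class)."""
    return max(range(len(scores)), key=scores.__getitem__)


def is_dog(scores):
    """Return True if the predicted class index is one of the dog classes."""
    return DOG_CLASS_FIRST <= predicted_index(scores) <= DOG_CLASS_LAST
```

Keeping this check separate from the model call makes it easy to verify the boundary indices without loading the pre-trained weights.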

Our data is already divided into training, validation and test partitions, so we can now use the training data to build a benchmark model with a Convolutional Neural Network. After creating the model we will evaluate it on the test data. Once we get an accuracy of over 10%, we will proceed to build a new model using transfer learning. With transfer learning, we can build a model that gives better results with less data. We will use the same training data as before. We will then evaluate the model on the same test data as before, but now we expect the accuracy to be over 60%. Then we can experiment with different model parameters to get better results. We will use the F1 score and log loss to evaluate our models.