Transfer Learning with YOLO (Custom Object Detection)

Problem

Transfer learning is a good method to use when you have a small dataset and/or the features you want to detect are similar to those of an existing pretrained model. One of the most famous single-image, multi-object detectors is YOLO, created by PJReddie (Joseph Redmon). By transfer learning from the pretrained weights that PJReddie provides, you can achieve a model with an extremely high IOU using minimal hardware and training time (about 1 hour on a Tesla K80). However, transfer learning with YOLO can be convoluted since it uses PJReddie's self-written neural network library called darknet.

Solution

I recommend using Trieu's darkflow, which is the TensorFlow port of darknet. Since it uses TensorFlow, I find it a bit more stable and easier to use. Note that darkflow uses the Pascal VOC data format, while YOLO uses its own format, so be careful which one you choose. Also note that darkflow currently only works with YOLOv2 and below. If you want to use v3, you have to compile darknet and use that instead (which I will show in a future post).

There are plentiful resources that teach you how to train your own custom objects in YOLO, such as this medium post or this post. However, I found the instructions outdated and was not able to train my own model by following the posts directly. Below are the instructions and advice that have worked for me since November 2018.

To train a custom object, follow these steps:

  1. Install Darkflow
  2. Data annotation
  3. Prepare the configuration files and weights
  4. Train and export model
  5. Use in code

Install Darkflow

Download and install darkflow with the following commands:

pip install Cython
git clone https://github.com/thtrieu/darkflow.git
cd darkflow  
python3 setup.py build_ext --inplace
pip install .
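
To sanity-check the install, you can ask darkflow's command-line tool to print its available options (the flow script should be on your path after the last step above):

flow --h

If this prints the list of flags, darkflow is ready to use.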

Data Annotation

There are a bunch of different tools that people generously provide for free to annotate your data easily. I personally find labelImg to be the best of the bunch. It is stable and fast. You can also toggle between the YOLO and Pascal VOC data formats with the click of a button.

Create a /data/ folder in the darkflow root folder. Within the data folder, create an /annotations/ folder and an /images/ folder. Put all the annotations (.xml files) into the annotations folder and all the images (.jpg files) into the images folder.
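
From the darkflow root, this layout can be created in one line (a small sketch; the paths match the ones used throughout this post):

mkdir -p data/annotations data/images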

Change the labels in labels.txt, located in the root folder of darkflow, to the classes you are training for. You can point to a custom text file as the label file, but I find it easier to just change the default labels. Make sure the labels are exactly the same (case sensitive) as the ones you use in labelImg.
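
For instance, a labels.txt for a hypothetical three-class detector (these class names are placeholders for your own) would contain one label per line:

cat
dog
person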

Prepare the configuration files and weights

I had huge problems using the existing config files and weights suggested in the darkflow repository. In the end, I used the original configs and weights from PJReddie directly.

Choose the model that you want to use. For this example, I used YOLOv2 trained on the VOC data. Save the config file in the /darkflow/cfg/ folder and the weights in the root folder (you can save the weights anywhere you want, but it's a pain to point to that folder later when you train).

Save a copy and rename the file to something you will remember. For instance, I changed my file name to yolov2-voc-3c.cfg since I was training for 3 classes. Change the following in the new configuration file:

  1. Line 244 - Change classes to the number of classes you are training for. This should match the number of labels in your labels.txt.
  2. Line 237 - Change filters to num * (classes + 5). num is preset by the YOLO model to 5 (this does not change), and classes is the same as the number of labels in your labels.txt. For instance, if you have 3 classes, filters should be set to 40 (5 * [3 + 5]). See the snippet after this list.
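
As a sketch, the edited region of the copied config would look something like this for 3 classes (structure recalled from the stock yolov2-voc.cfg; only the filters and classes values change, everything else stays as shipped):

# last convolutional layer before [region] (around line 237)
[convolutional]
size=1
stride=1
pad=1
# filters = num * (classes + 5) = 5 * (3 + 5) = 40
filters=40
activation=linear

[region]
# classes (around line 244) matches the number of labels in labels.txt
classes=3
num=5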

Modify loader.py under darkflow/utils. Change line 121 from self.offset = 16 to self.offset = 20 (this accounts for a difference in the weights file's header size).

Train and export model

You can train your model in two ways. One is to write a Python program with all the proper options and code. The other is to use the terminal interface that darkflow provides. I personally like to use the terminal interface. However, if you want to write a Python program, you can follow Park Chansung's notebook, or see the sketch below.
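
Here is a minimal sketch of the Python route, assuming the file locations used in this post (the option keys mirror darkflow's standard command-line flags):

from darkflow.net.build import TFNet

# Mirror the terminal flags as a dict of darkflow options
options = {
    "model": "cfg/yolov2-voc-3c.cfg",     # the renamed config file
    "load": "yolov2-voc.weights",         # the original PJReddie weights
    "train": True,                        # put darkflow in training mode
    "annotation": "./data/annotations/",  # Pascal VOC .xml files
    "dataset": "./data/images/",          # matching .jpg files
    "epoch": 20,
    "gpu": 0.9,                           # GPU load; keep below 1.0
}

tfnet = TFNet(options)
tfnet.train()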

To train your own model, open up a terminal and navigate to the root of the darkflow folder and run the following:

flow --model cfg/yolov2-voc-3c.cfg --train --dataset "/data/images" --annotation "/data/annotations" --load yolov2-voc.weights --gpu 1.0

Change the previous line to cater to your own model, specifically these parameters:
  1. cfg/yolov2-voc-3c.cfg - Wherever your config file is located. Do not put spaces in any of your folder names or you will run into problems.
  2. /data/images - Where you saved your images
  3. /data/annotations - Where you saved your annotations
  4. yolov2-voc.weights - Where you saved your weights
  5. --gpu 1.0 - The GPU load (as a decimal percentage). If you set it too high you will get training errors, especially if a single card drives the whole computer. I recommend using 0.9 at most.

You can dictate the number of epochs to run by passing --epoch 20 (or any number) in your terminal command. Darkflow saves the progress of your training in ckpt/checkpoint after each run. You can resume from the latest checkpoint by passing --load -1, or replace -1 with a specific checkpoint number (if the checkpoint is yolov2-voc-3c-795, the number is 795).
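
For example, resuming from the latest checkpoint with the same paths as above looks like this:

flow --model cfg/yolov2-voc-3c.cfg --train --dataset "/data/images" --annotation "/data/annotations" --load -1 --gpu 0.9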

You will need the following files to use the model in any Python code:

  1. The four checkpoint files
  2. The new config file you made
  3. The original weights

Using your model

You can reference my code for my Capstone project to see how I use the model here.

To use the model, first import the TFNet class.

from darkflow.net.build import TFNet

Then create an options dict with the same settings you had when you trained the model. The GPU load can be different; it only changes your inference speed. Set "load" to -1 so darkflow picks up your latest training checkpoint instead of the original weights (a Python dict cannot hold two "load" keys, as the original version of this snippet did).

options = {"model": "cfg/yolov2-voc-3c.cfg", "load": -1, "threshold": 0.1, "gpu": 0.5}

tfnet = TFNet(options)

Load the model only once in your code; otherwise you will re-import the model every time you run inference on an image, which will make it unbelievably slow and useless.

Run inference on an image with the return_predict function. The image has to be opened with OpenCV and be in RGB order. If the picture is imported with another library or left in the wrong color order, you will notice poor performance.

predictions = tfnet.return_predict(image)
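
Putting it all together, a minimal inference sketch might look like this (the image path is a placeholder; the cv2.cvtColor call converts OpenCV's default BGR ordering to RGB, per the note above):

import cv2
from darkflow.net.build import TFNet

# Build the network once, outside any loop
options = {"model": "cfg/yolov2-voc-3c.cfg", "load": -1, "threshold": 0.1, "gpu": 0.5}
tfnet = TFNet(options)

# Load an image with OpenCV and convert BGR -> RGB
image = cv2.imread("sample.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Each prediction is a dict with a label, a confidence score,
# and the bounding box corners
predictions = tfnet.return_predict(image)
for p in predictions:
    print(p["label"], p["confidence"], p["topleft"], p["bottomright"])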