Image classification is used to solve several Computer Vision problems; right from medical diagnoses, to surveillance systems, on to monitoring agricultural farms. There are innumerable possibilities to explore using Image Classification.
If you have completed the basic courses on Computer Vision, you are familiar with the tasks and routines involved in Image Classification tasks. Want to know more? Check out https://opencv.org/courses/.
Image Classification tasks follow a standard flow: you pass an image to a deep learning model and it outputs the class or label of the object present.
In this article, you will learn how to build python-based gesture-controlled applications using AI. We will guide you all the way with step-by-step instructions. I’m sure you will have loads of fun and learn many useful concepts following the tutorial.
Specifically, you will learn the following:
How to train a custom Hand Detector with Dlib.
How to cleverly automate the data collection & annotation step with image processing so we don’t have to label anything.
How to convert normal PC applications like Games and Video Players to be controlled via hand gestures.
Here’s a demo of what we’ll be building in this Tutorial:
This is a detailed, hands-on tutorial. Let me highlight what you will learn about the Tensorflow Object Detection API:
A Crystal Clear step by step tutorial on training a custom object detector.
A method to download videos and create a custom dataset out of that.
How to use the custom trained network inside the OpenCV DNN module so you can get rid of the TensorFlow framework.
Plus there are two things you will receive from the provided source code:
A Jupyter Notebook that automatically downloads and installs all the required things for you so you don’t have to step outside of that notebook.
A Colab version of the notebook that runs out of the box, just run the cells and train your own network.
I will stress again that all of the steps are explained in a neat and digestible way. If you've ever planned to do Object Detection then this is one tutorial you don't want to miss.
As mentioned, by downloading the Source Code you will get 2 versions of the notebook: a local version and a colab version.
So first we’re going to see a complete end-to-end pipeline for training a custom object detector on our data and then we will use it in the OpenCV DNN module so we can get rid of the heavy Tensorflow framework for deployment. We have already discussed the advantages of using the final trained model in OpenCV instead of Tensorflow in my previous post.
Today’s post is the 3rd tutorial in our 3 part Deep Learning with OpenCV series. All three posts are titled as:
To follow along and learn the full pipeline of training a custom object detector with TensorFlow, you don't need to read the previous two tutorials. They will help, however, when we reach the last part of this tutorial and use the model in the OpenCV DNN module.
What is Tensorflow Object Detection API (TFOD) :
To train our custom Object Detector we will be using the TensorFlow Object Detection API (TFOD API), a framework built on top of TensorFlow that makes it easy for you to train your own custom models.
The workflow generally goes like this: you take a pre-trained model from this model zoo and then fine-tune it for your own task. Fine-tuning is a transfer learning method that lets you reuse the features a model learned on a different task for your own task. Because of this, you won't require thousands of images to train the network; a few hundred will suffice. If you prefer PyTorch over Tensorflow then you may want to look at Detectron 2.
For this tutorial I will be using TensorFlow Object Detection API version 1. If you want to know why we are using version 1 instead of the recently released version 2, you can read the optional explanation below.
Tensorflow Object Detection API 1
Why are we using Tensorflow Object Detection API Version 1? (OPTIONAL READ)
IGNORE THIS EXPLANATION IF YOU’RE NOT FAMILIAR WITH TENSORFLOW’S FROZEN_GRAPHS
Tensorflow Object Detection API v2 comes with a lot of improvements: the new API contains some new State of the Art (SoTA) models and some pretty good changes, including new binaries for train/eval/export that are eager mode compatible. You can check out this release blog from the Tensorflow Object Detection API developers.
But the thing is, because TF 2 no longer supports sessions, you can't easily export your model to a frozen_inference_graph. Furthermore, TensorFlow deprecates the use of frozen graphs and promotes the saved_model format for future use cases.
For TensorFlow, this is the right move as the saved_model format is an excellent format.
So what’s the issue?
The problem is that OpenCV only works with frozen_inference_graphs and does not support the saved_model format yet. For this reason, if your end goal is to deploy the model in OpenCV, you should use Tensorflow Object Detection API v1. Although you can still generate frozen graphs with TF 2, those graphs produce errors with OpenCV most of the time. We've only tried limited experiments with TF 2, so feel free to carry out your own experiments, and do share if you find something useful.
One great thing about this situation is that the Tensorflow team decided to keep the whole pipeline and code of Tensorflow Object Detection API 2 almost identical to Tensorflow Object Detection API 1, so learning how to use v1 will also teach you how to use v2.
Now let's start with the code.
Code For TF Object Detection Pipeline:
Make sure to download the source code, which also contains the support folder with some helper files that you will need.
Here’s the hierarchy of the source code folder:
    │   Colab Notebook Link.txt
    │   Custom_Object_Detection.ipynb
    │
    └───support
        │   create_tf_record.py
        │   frozen_inference_graph.pb
        │   graph_ours.pbtxt
        │   tf_text_graph_common.py
        │   tf_text_graph_faster_rcnn.py
        │
        │
        ├───labels
        │       _000.xml
        │       _001.xml
        │       _002.xml
        │       ...
        ├───test_images
        │       test1.jpg
        │       test2.jpg
        │       test3.png
        │       ...
Here’s a description of what these folders & files are:
Custom_Object_Detection.ipynb: This is the main notebook which contains all the code.
Colab Notebook Link: This text file contains the link for the colab version of the notebook.
create_tf_record.py: This file will create tf records from the images and labels.
frozen_inference_graph.pb: This is the model we trained, you can try to run this on test images.
graph_ours.pbtxt: This is the graph file we generated for OpenCV, you'll learn to generate your own.
tf_text_graph_faster_rcnn.py: This file creates the above graph.pbtxt file for OpenCV.
tf_text_graph_common.py: This is a helper file used by the tf_text_graph_faster_rcnn.py file.
labels: These are .xml labels for each image
test_images: These are some sample test images to do inference on.
Note: There are some other folders and files which you will generate along the way, I will explain their use later.
Even though I've made the setup really easy, if you don't want to worry about environment setup and installation, you can use the Colab version of the notebook that comes with the source code.
The Colab version doesn't require any configuration; it's all set to go. Just run the cells in order. You should also be able to use the Colab GPU to speed up the training process.
The full code can be broken down into the following parts
Part 1: Environment Setup
Part 2: Installation & TFOD API Setup
Part 3: Data Collection & Annotation
Part 4: Downloading Model & Configuring it
Part 5: Training and Exporting Inference Graph.
Part 6: Generating .pbtxt and using the trained model with just OpenCV.
Part 1: Environment Setup:
First, let's make sure you have correctly set up your environment.
Since we are going to install TensorFlow version 1.15.0, we should use a virtual environment. You can use either virtualenv or the Anaconda distribution; I'm using Anaconda. I will start by creating a virtual environment.
Open up the command prompt and run:
conda create --name tfod1 python==3.7
Now you can move into that environment by activating it:
conda activate tfod1
Make sure there is a (tfod1) at the beginning of each line in your cmd. This means you’re using that environment. Now anything you install will be in that environment and won’t affect your base/root environment.
The first thing you want to do is install Jupyter Notebook in that environment. Otherwise, your environment will use the Jupyter Notebook of the base environment, so do:
pip install jupyter notebook
Now go into the directory/folder that I provided, which contains this notebook, and open up the command prompt there.
First activate the tfod1 environment, then launch the notebook by typing jupyter notebook and hitting enter.
This will launch Jupyter Notebook in your newly created environment. You can now open up the Custom_Object_Detection notebook.
Make sure your Notebook is Opened up in the Correct environment
    import sys
    # Make sure to check you're using your tfod1 environment, you should see that name in the printed output
    print(sys.executable)

c:\users\hp-pc\anaconda3\envs\tfod1\python.exe
Part 2: Installation & Tensorflow Object Detection API Setup:
You can install all the required libraries by running this cell
    # If you can't use ! on windows 10 then you should do conda install posix
    # Alternatively you can also use % instead of ! in Windows.
If you want to install Tensorflow-GPU for version 1 then you can take a look at my tutorial for that here
Note: You would need to change the Cuda Toolkit version and CuDNN version in the above tutorial since you’ll be installing for TF version 1 instead of version 2. You can look up the exact version requirements here
Another Library you will need is pycocotools
    # RUN THIS TO INSTALL IN WINDOWS
    !pip install pycocotools-windows

Alternatively, you can also use this command to install on Windows:
On Linux and macOS, you can use this command instead:
pip install pycocotools
Note: Make sure you have Cython installed first by doing: pip install Cython
Import Libraries
This will also confirm if your installations were successful or not.
    import os
    import shutil
    import math
    import datetime
    import glob
    import urllib
    import tarfile
    import urllib.request
    from urllib.request import urlopen
    from io import BytesIO
    from zipfile import ZipFile
    import re
    import matplotlib.pyplot as plt
    %matplotlib inline
    # This will let you download any video from youtube
    import pafy
    import cv2
    import numpy as np
    import tensorflow as tf
    print("This should be Version 1.15.0, DETECTED VERSION: " + tf.__version__)

This should be Version 1.15.0, DETECTED VERSION: 1.15.0
Clone Tensorflow Object Detection API Model Repository
You need to clone the TF Object Detection API repository, you can either download the zip file and extract it or if you have git installed then you can git clone it.
Option 1: Download with git:
You can run git clone if you have git installed. This is going to take a while since the repository is 600 MB+, so have a coffee or something.
Option 2: Download the zip and extract all: (Only do this if you don’t have git)
You can download the zip by clicking here, after downloading make sure to extract the contents of this zip inside the directory containing this notebook. I’ve already provided you the code that automatically downloads and unzips the repo in this directory.
    # Download the zip file and extract it into the current directory
    with urlopen(URL) as zip_file:
        with ZipFile(BytesIO(zip_file.read())) as zfile:
            zfile.extractall()
    # Rename `models-master` directory to `models`
    os.rename('models-master', 'models')
The models we’ll be using are in the research directory of the above repo. The research directory contains a collection of research model implementations in TensorFlow 1 or 2 by researchers. There are a total of 4 directories in the above repo, you can learn more about them here.
Install Tensorflow Object Detection API & Compile Protos
Download Protobuf Compiler:
TFOD contains some files in .proto format. I'll explain more about this format in a later step; for now, you need to download the protobuf compiler from here. Make sure to download the correct one for your system. For example, I downloaded protoc-3.12.4-win64.zip for my 64-bit Windows machine. For Linux and macOS there are different files.
After downloading, unzip the protoc folder, go to its bin directory, and copy the protoc.exe file. Now paste this protoc.exe inside the models/research directory.
The below script does all of this, but you can choose to do it manually if you want. Make sure to change the URL if you’re using a system other than 64-bit windows.
    # Set the URL, you can copy/paste your target system's URL here.
Now you can install the object detection API and compile the protos. The two operations below must be performed in the models/research directory, otherwise they won't work, especially the protoc command.
    # Move to models/research directory.
    os.chdir('models/research/')
    # Compiles protobuf files in the object_detection/protos folder. Now for every .proto there will be a .py file present there.
    # Move up two directories, this will put you back to your original `TF Object Detection v1` directory.
    os.chdir('../..')
Note: Since I had already installed pycocotools, after running the line cp object_detection/packages/tf1/setup.py . I edited the setup.py file to remove the pycocotools package from the REQUIRED_PACKAGES list, saved the file, and then ran the python -m pip install . command. I did this because I was facing issues installing pycocotools this way, which is why I installed the pycocotools-windows package instead. You probably won't need to do this.
If you wanted to install TFOD API version 2 instead of version 1 then you can just replace tf1 with tf2 in the cp object_detection/packages/tf1/setup.py . command.
You can check your installation of TFOD API by running model_builder_tf1_test.py
Now for this tutorial, I’m going to train a detector to detect the faces of Tom & Jerry. I didn’t want to use the common animal datasets etc. So I went with this.
While I was writing the above sentence I just realized I’m still using a Cat, mouse dataset albeit an animated one so I guess it’s still a unique dataset.
In this tutorial, I’m not only going to show you how to annotate the data but also show you one approach on how to go about collecting data for a new problem.
So what I'll be doing is downloading a video of Tom & Jerry from Youtube, splitting the video into frames to create my dataset, and then annotating each of those frames with bounding boxes. Instead of downloading my Tom & Jerry video you can use any other video and try to detect your own classes.
Alternatively, you can also generate training data from other methods including getting images from Google Images.
To prepare the Data we need to perform these 5 steps:
For more options on how you can download the video take a look at the documentation here
Step 2: Split Video Frames and store them:
Now we're going to split the video into frames and store them in a folder. Most videos have a high FPS (30-60 frames/sec), and we don't need that many frames for two reasons:
If you take a 30 FPS video then for each second of the video you will get 30 images, and most of those images won't be different from each other, so there will be a lot of repeated information.
We're already going to use Transfer Learning with the TFOD API. The benefit of this is that we won't need a lot of images, which is good since we don't want to annotate thousands of images.
So we can do one of two things: skip frames and save every nth frame, or save a frame every nth second of the video. I'm going with the latter approach, although both are valid.
    # Define an output directory
    output_directory = "training/images"
    # Define the time interval after which you'll save each frame.
    sec = 1.5
    # If the output directory does not exist then create it
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    # Initialize the video capture object
    cap = cv2.VideoCapture(video_name)
    # Get the FPS rate of the video, for this video it's 25.0
    fps = cap.get(cv2.CAP_PROP_FPS)
    # Given the FPS rate of the video, calculate the number of frames you need to skip so that `sec` seconds have passed.
    no_of_frames_to_skip = round(sec * fps)
    frame_count = 0
    while True:
        ret, frame = cap.read()
        # Break the loop if the video has ended
        if not ret:
            break
        # Get the current frame number
        frame_Id = int(cap.get(1))
        # Only save the frame when you've skipped the defined number of frames
    print('Done Splitting Video, Total Images saved: {}'.format(frame_count))
    # Release the capture
    cap.release()

Done Splitting Video, Total Images saved: 165
You can go to the directory where the images are saved, manually go through each image, and delete the ones where Tom & Jerry are not visible or hardly visible. This is not a strict requirement, though, since you can easily skip these images in the annotation step.
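To make the frame-skipping idea concrete, here is a minimal sketch of which frame numbers would end up being saved. It assumes the save condition is a simple modulo test on the 1-based frame number; frames_to_save is a hypothetical helper name used only for illustration, not something from the notebook.

```python
def frames_to_save(total_frames, fps, sec):
    """Return the 1-based frame numbers kept when saving one frame every `sec` seconds.

    Assumption: a frame is kept whenever its number is a multiple of
    round(sec * fps), mirroring no_of_frames_to_skip in the tutorial code.
    """
    step = max(1, round(sec * fps))  # never let the step drop to zero
    return [n for n in range(1, total_frames + 1) if n % step == 0]


# A 25 FPS clip with 200 frames, keeping one frame every 2 seconds.
kept = frames_to_save(total_frames=200, fps=25.0, sec=2.0)
print(kept)
```

With a larger `sec` you get fewer, more distinct images, which is usually what you want before a manual annotation pass.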
Step 3: Annotate Images with labelImg
You can watch this video below to understand how to use labelImg to annotate images and export annotations. You can also take a look at the GitHub repo here.
For the current Tom & Jerry problem, I am providing you with a labels folder that already contains the .xml annotation file for each image. If you want to try a different dataset then go ahead; just make sure to put the labels of that dataset in the labels folder.
Note: We are not splitting the images into train and validation folders right now because we'll be doing that automatically at the tfrecord creation step. It would still be a good idea to set aside 10% of the data for proper testing/evaluation of the final trained detector, but since my purpose is to keep this tutorial as simple as possible I won't be doing that today. I already have a test folder with 4-5 images which I will evaluate on.
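If you are curious what is inside one of those .xml files, the sketch below reads the bounding boxes out of an annotation. It assumes the annotations were exported in the Pascal VOC layout (which labelImg can produce); the sample file name, class names and coordinates are made up for illustration, and read_voc_boxes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (values invented for this example).
SAMPLE_XML = """<annotation>
  <filename>frame_000.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>Tom</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>400</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>Jerry</name>
    <bndbox><xmin>600</xmin><ymin>350</ymin><xmax>760</xmax><ymax>520</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


boxes = read_voc_boxes(SAMPLE_XML)
for box in boxes:
    print(box)
```

A quick pass like this over the labels folder is a cheap way to count how many boxes you have per class before training.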
Step 4: Create a label Map file
TensorFlow requires a label map file, which maps each of the class labels to integer values. This label map is used in the training and detection process. This file should be saved in the training directory, which also contains the labels folder.
    # You can add more classes by adding another item and giving them an id of 3 and so on.
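As a rough illustration of what a label map looks like, the sketch below builds the text of a label_map.pbtxt for a list of classes. The class names "Tom" and "Jerry" are assumed spellings (they must match whatever names appear in your annotation files), and build_label_map is a hypothetical helper, not part of the TFOD API.

```python
def build_label_map(class_names):
    """Build label map text with one `item` block per class.

    IDs start at 1, matching the "id of 3 and so on" convention above.
    """
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


# Assumed class names; they must match the names used in the .xml labels.
label_map_text = build_label_map(["Tom", "Jerry"])
print(label_map_text)
```

You would then write this text to the label map file in your training directory.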
Tfrecords are just protocol buffers, they help make the data reading/processing process computationally efficient. The only downside they have is that they are not human-readable.
What are protocol Buffers?
A protocol buffer is a type of serialized structured data. It is more efficient than JSON, XML, pickle, and text storage formats. Google publicly released the Protobuf (protocol buffer) format in 2008, and since then it has been widely used by Google and the community. To use protobuf definitions (.proto files) you first need to compile them with a protobuf compiler. So now you probably understand why we needed to compile those proto files at the beginning.
Here's a nice tutorial by Naveen that explains how you can create a tfrecord for different data types, and here's a more detailed explanation of protocol buffers with an example.
The create_tf_record.py script I’ll be using to convert images/labels to tfrecords is taken from the TensorFlow’s pet example but I’ve modified the script so now it accepts the following 5 arguments:
Directory of images
Directory of labels
Percentage of the data to use for training
Path to label_map.pbtxt file
Path to output tfrecord files
And it returns a train.record and val.record. So it splits the training data into training/validation sets. For this data, I’m using a training set of 70% and validation is 30%.
    # Create the tfrecords directory if it does not exist. This is where tfrecords will be stored.
    tf_records = "training/tfrecords"
    if not os.path.exists(tf_records):
        os.mkdir(tf_records)
    # We are saving the record files in the folder named tfrecords.
    # Change the slashes (i.e. \ ) according to your OS.
    # I'm using my own labels, you can replace them with your labels.
You can ignore these warnings; we already know that we're using the older TF 1.15 based version of the TFOD API, which contains some deprecated functions.
Most of the tfrecord scripts available online will first tell you to convert your xml files to csv and then you will use another script to split the data into a training and validation folder and then another script to convert to tfrecords. The script above is doing all of this.
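The train/validation split itself is simple to reason about. Here is a minimal sketch of a reproducible 70/30 split over a list of file names; it is not the code from create_tf_record.py, and the function name, seed and file names are assumptions made for the example.

```python
import random


def split_train_val(filenames, train_percent=70, seed=42):
    """Shuffle the file names reproducibly and split them into train/val lists."""
    names = sorted(filenames)            # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)   # a fixed seed gives the same split on every run
    n_train = len(names) * train_percent // 100
    return names[:n_train], names[n_train:]


# Hypothetical file names standing in for the extracted frames.
files = ["frame_%03d.jpg" % i for i in range(10)]
train_files, val_files = split_train_val(files)
print(len(train_files), "training images,", len(val_files), "validation images")
```

Fixing the seed matters more than it looks: if the split changes between runs, validation numbers from different training runs are not comparable.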
Part 4: Downloading Model & Configuring it:
You can now go to the Model Zoo, select a model, and download its zip. Now unzip the contents of that folder and put them inside a directory named pretrained_model. The below script does this automatically for a Faster-RCNN-Inception model which is already trained on the COCO dataset. You can change the model name to download a different model.
    # Specify pre-trained model name you want to download
    # Remove the checkpoint file so the model can be trained
    os.remove(model_directory + '/checkpoint')
    print('Model Downloaded')

Model Downloaded
Modify pipeline.config file:
After downloading you will have a number of files present in the pretrained_model folder, I will explain about them later but for now, let’s take a look at the pipeline.config file.
Pipeline.config defines how the whole training process will take place, what optimizers, loss, learning_rate, batch_size will be used. Most of these params are already set by default, it’s up to you if you want to change them or not but there are some paths in the pipeline.config file that you will need to change so that this model can be trained on our data.
So open up pipeline.config with a text editor like Notepad ++ and change these 4 paths:
    # Since we have 2 classes (Tom, Jerry), we set this value to 2.
    s = re.sub('num_classes: 90', 'num_classes: 2', s)
    # Doing a little correction to avoid an error in training.
    s = re.sub('step: 0', 'step: 1', s)
    # I'm also changing the default batch_size of 1 to be 10 for this example
    s = re.sub('batch_size: 1', 'batch_size: 10', s)
    f.write(s)
Notice the correction I made by replacing step: 0 with step: 1. Unfortunately, different models sometimes require corrections like this, but you can usually work out what needs to be changed by pasting the error generated during training into Google. Look for GitHub issues about that error and you'll find a solution.
Note: These issues seem to be mostly present in TFOD API Version 1
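One thing to watch with plain string patterns is partial matches: replacing 'batch_size: 1' in a config that says batch_size: 16 would produce batch_size: 106. The sketch below is one possible, slightly more defensive variant that matches whole numbers instead. It works on an invented config excerpt, patch_pipeline_config is a hypothetical helper, and like the original substitutions it rewrites every occurrence of each key.

```python
import re


def patch_pipeline_config(text, num_classes, batch_size):
    """Return config text with num_classes, batch_size and a zero step rewritten."""
    text = re.sub(r"num_classes:\s*\d+", "num_classes: %d" % num_classes, text)
    text = re.sub(r"batch_size:\s*\d+", "batch_size: %d" % batch_size, text)
    # \b keeps this from touching values such as `step: 900000` or keys like `num_steps`.
    text = re.sub(r"\bstep:\s*0\b", "step: 1", text)
    return text


# Invented excerpt, only meant to resemble the relevant keys of a pipeline.config.
sample = (
    "model {\n  faster_rcnn {\n    num_classes: 90\n  }\n}\n"
    "train_config {\n  batch_size: 16\n"
    "  schedule {\n    step: 0\n  }\n"
    "  schedule {\n    step: 900000\n  }\n}\n"
)
patched = patch_pipeline_config(sample, num_classes=2, batch_size=10)
print(patched)
```

The path entries still have to be edited separately; this only covers the numeric values discussed above.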
Changing Important Params in Pipeline.config File:
Additionally, I've also changed the batch size of the model. Just like batch_size, there are lots of important parameters that you may want to tune, and I would strongly recommend that you change the values according to your problem, since the default values are almost never optimal for a custom use case. Tuning most of these values requires some prior knowledge, but make sure to at least adjust the batch_size according to your system's memory, along with the learning_rate of the model.
Part 5: Training and Exporting Inference Graph:
You can start training the model by calling the model_main.py script from the object_detection folder. We are giving it the following arguments:
num_train_steps: This is the number of times your model weights will be updated using a batch of data.
pipeline_config_path: This is the path to your pipeline.config file.
model_dir: Path to the output directory where the final checkpoint files will be saved.
Now you can run the cell below to start training, but I would recommend that you run it from the command line instead; you can just paste this line:
Note: When you start training you will see a lot of warnings. Just ignore them, as TFOD 1 contains a lot of deprecated functions.
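For reference, a training command built from the three arguments described above could be assembled as in the sketch below. The script location and both paths are assumptions about your folder layout (here the command is assumed to be run from models/research, with the config in a pretrained_model folder); adjust them to wherever your files actually live. build_train_command is a hypothetical helper and it only builds the command, it does not run it.

```python
import shlex


def build_train_command(pipeline_config_path, model_dir, num_train_steps):
    """Assemble the argument list for model_main.py without executing it."""
    return [
        "python",
        "object_detection/model_main.py",  # assumed path relative to models/research
        "--pipeline_config_path=" + pipeline_config_path,
        "--model_dir=" + model_dir,
        "--num_train_steps=%d" % num_train_steps,
    ]


# Assumed paths; replace them with your own.
cmd = build_train_command(
    pipeline_config_path="../../pretrained_model/pipeline.config",
    model_dir="../../pretrained_model",
    num_train_steps=20000,
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command first and pasting it into a terminal keeps the long training run out of the notebook, which is the approach recommended above.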
Once you start training, the network will take some time to initialize and then training will begin. Every few minutes you will see a report of loss values and a global loss. The network is learning if the loss is going down. If you're not familiar with object detection jargon like IOU, just make note of the final global loss after each report.
Ideally you want to set num_train_steps to tens of thousands of steps. You can always end training by pressing CTRL + C in the command prompt once the loss has decreased sufficiently. If training is running in a Jupyter notebook, you can end it by pressing the Stop button at the top.
After training has ended or you've stopped it, there will be some new files in the pretrained_model folder. Among all these files we will only need the checkpoint (ckpt) files.
If you're training for 1000s of steps (which is most likely the case) then I would strongly recommend that you don't use your CPU but utilize a GPU. If you don't have one then it's best to use Google Colab's GPU. I'm already providing you a ready-to-run Colab notebook.
Note: There's another script for training called train.py. This is an older script where you can see the loss value for each step. If you want to use that script, you can find it at models/research/object_detection/legacy/train.py
The best way to monitor training is to use Tensorboard; I will discuss this another time.
Export Frozen Inference Graph:
Now we will use the export_inference_graph.py script to create a frozen_inference_graph from the checkpoint files.
Why are we doing this?
After training our model it is stored in checkpoint format and a saved_model format but in OpenCV, we need the model to be in a frozen_inference_graph format. So we need to generate the frozen_inference_graph using the checkpoint files.
What are these checkpoint files?
Every few minutes of training, TensorFlow outputs some checkpoint (ckpt) files. The number on those files represents how many training steps the model had gone through when they were saved. So during the frozen_inference_graph creation, we only take the latest checkpoint (i.e. the one with the highest number) because it has gone through the most training steps.
Now every time a checkpoint file is saved, it’s split into 3 parts.
For the initial step these files are:
model.ckpt-000.data: This file contains the value of every variable; it's pretty large.
model.ckpt-000.index: This file contains metadata for each tensor, e.g. checksum, auxiliary data etc.
model.ckpt-000.meta: This file stores the graph structure of the model.
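Picking the latest checkpoint boils down to finding the highest step number among the file names. The sketch below shows one way to do that; it assumes names of the form model.ckpt-<step>.meta, the example file list is invented, and latest_checkpoint_prefix is a hypothetical helper rather than the notebook's own code.

```python
import re


def latest_checkpoint_prefix(filenames):
    """Return the prefix (e.g. 'model.ckpt-1200') of the highest-numbered checkpoint, or None."""
    steps = []
    for name in filenames:
        match = re.fullmatch(r"model\.ckpt-(\d+)\.meta", name)
        if match:
            steps.append(int(match.group(1)))  # compare as integers, not strings
    if not steps:
        return None
    return "model.ckpt-%d" % max(steps)


# Invented directory listing for illustration.
files = [
    "checkpoint",
    "model.ckpt-0.meta",
    "model.ckpt-500.meta",
    "model.ckpt-1200.meta",
    "model.ckpt-1200.index",
    "pipeline.config",
]
prefix = latest_checkpoint_prefix(files)
print(prefix)
```

Comparing the step numbers as integers matters: sorted as plain strings, "500" would rank above "1200". The resulting prefix is what you would pass to the export script as the checkpoint to freeze.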
    # Get all the files present in the pretrained_model directory
If you take a look at the fine_tuned_model folder which will be created after running the above command then you’ll find that it contains the same files you got when you downloaded the pre_trained model. This is the final folder.
Now your trained model is in 3 different formats: the saved_model format, the frozen_inference_graph format, and the checkpoint file format. For OpenCV, we only need the frozen inference graph format.
The checkpoint format is ideal for retraining and for inspecting other information about the model. For production and serving, you will need either the frozen_inference_graph or the saved_model format. It's worth mentioning that both of these use the .pb extension.
In TF 2, frozen_inference_graph is deprecated and TF 2 encourages the use of the saved_model format. As said previously, unfortunately we can't use the saved_model format with OpenCV yet.
Run Inference on Trained Model (Bonus Step):
You can optionally run inference using TensorFlow sessions. I'm not going to explain much here, as TF sessions are deprecated and our final goal is to use this model in OpenCV DNN.
Part 6: Generating .pbtxt and using the trained model with just OpenCV
6 a) Export graph.pbtxt with frozen inference graph:
We can use the above generated frozen graph inside the OpenCV DNN module to do detection but most of the time we need another file called a graph.pbtxt file. This file contains a description of the network architecture, it is required by OpenCV to rewire some network layers for Optimization purposes.
This graph.pbtxt can be generated by using one of the 4 scripts provided by OpenCV. These scripts are:
tf_text_graph_ssd.py
tf_text_graph_faster_rcnn.py
tf_text_graph_mask_rcnn.py
tf_text_graph_efficientdet.py
They can be downloaded here, you will also find more information regarding them on that page.
Since the detection architecture we're using is Faster-RCNN (you can tell by looking at the name of the downloaded model), we will use tf_text_graph_faster_rcnn.py to generate the pbtxt file. For .pbtxt generation you will need the frozen_inference_graph.pb file and the pipeline.config file.
Note: When you're done with training you will also see a graph.pbtxt file inside the pretrained folder. This graph.pbtxt is different from the one generated by OpenCV's .pbtxt generator scripts. One major difference is that OpenCV's graph.pbtxt does not contain the model weights but only the graph description, so it will be much smaller in size.
Number of classes: 2
Scales: [0.25, 0.5, 1.0, 2.0]
Aspect ratios: [0.5, 1.0, 2.0]
Width stride: 16.000000
Height stride: 16.000000
Features stride: 16.000000
For model architectures that are not one of the above 4, you will need to convert TensorFlow's .pbtxt file to OpenCV's version. You can find more on how to do that here. But be warned: this conversion is not a smooth process and a lot of low-level issues come up.
6 b) Using the Frozen inference graph along with Pbtxt file in OpenCV:
Now that we have generated the graph.pbtxt file with OpenCV's tf_text_graph script, we can pass this file to cv2.dnn.readNetFromTensorflow() to initialize the network. All of our work is done now. Make sure you're familiar with OpenCV's DNN module; if not, you can read my previous post on it.
Now we will create the following two functions:
Initialization Function: This function will initialize the network using the .pb and .pbtxt files, it will also set the class labels.
Main Function: This function will contain all the rest of the code from preprocessing to postprocessing. It will also have the option to either return the image or display it with matplotlib.
    # We're passing in the paths of pbtxt file (graph description of model) and our actual trained model

        # Return the annotated image if returndata is True
        if returndata:
            return img
        # Otherwise show the full image.
        else:
            plt.figure(figsize=(10, 10))
            plt.imshow(img[:, :, ::-1]); plt.axis("off");
Note: When you call net.forward() you get an output of shape (1,1,100,7). Since we're predicting on a single image instead of a batch of images, you get (1,1) at the start. The remaining (100,7) means that there are 100 detections for that image and each detection contains 7 properties/variables.
There will be 100 detections for each image. This was set in the pipeline.config, and you can choose to change that.
So here are what these 7 properties correspond to:
The index of the image; for a single image it's 0.
The index of the target CLASS.
The score/confidence of that CLASS.
The remaining 4 values are x1, y1, x2, y2. These are used to draw the bounding box of that CLASS object.
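To see how those 7 values turn into boxes you can draw, here is a small sketch that filters detection rows by score and scales the box to pixel coordinates. It assumes the box values are normalized to the 0-1 range (so they are multiplied by the image width and height), it runs on hand-written rows rather than a real net.forward() output, and parse_detections with its 0.5 threshold is a hypothetical choice for illustration.

```python
def parse_detections(rows, img_width, img_height, min_score=0.5):
    """Convert rows of [image_id, class_id, score, x1, y1, x2, y2] into pixel boxes.

    Assumption: x1, y1, x2, y2 are normalized to the 0-1 range.
    """
    results = []
    for _image_id, class_id, score, x1, y1, x2, y2 in rows:
        if score < min_score:
            continue  # drop low-confidence detections
        box = (
            int(x1 * img_width),
            int(y1 * img_height),
            int(x2 * img_width),
            int(y2 * img_height),
        )
        results.append((int(class_id), float(score), box))
    return results


# Two made-up detection rows; with a real network you would iterate over detections[0, 0].
rows = [
    [0.0, 1.0, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0.0, 0.0, 0.2, 0.0, 0.0, 0.5, 0.5],
]
detections = parse_detections(rows, img_width=640, img_height=480)
print(detections)
```

Most of the 100 rows have very low scores, so the threshold is what decides how many boxes actually get drawn.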
Initialize the network
You will just need to call this once to initialize the network
    # You can initialize the model using our provided trained model
Now you can use the main function to perform prediction on different images. The images we will predict on are placed inside a folder named test_images. These images were not in the training dataset.
Python
img = cv2.imread('support/test_images/test1.jpg')
detect_object(img)
Python
img = cv2.imread('support/test_images/test2.jpg')
detect_object(img)
Python
img = cv2.imread('support/test_images/test6.jpg')
detect_object(img)
Python
img = cv2.imread('support/test_images/test3.png')
detect_object(img)
Python
img = cv2.imread('support/test_images/test7.jpg')
detect_object(img)
Summary
Limitations: Our final detector has decent accuracy, but it's not that robust, for 4 reasons:
Transfer learning works best when the dataset you're training on shares some features with the dataset the model was originally trained on. Most models are trained on the ImageNet, COCO, or PASCAL VOC datasets, which are filled with animals and other real-world images. Our dataset consists of cartoon images, which are drastically different from real-world images. We can solve this problem by including more images and training more layers of the model.
Animations of cartoon characters are not consistent; they change a lot between movies. So if you train the model on these pictures and then try to detect random Google images of Tom and Jerry, you won't get good accuracy. We can solve this problem by including images of these characters from different movies so the model learns the features that stay the same throughout the movies.
The images generated from the sample video created an imbalanced dataset: there are more Jerry images than Tom images. There are ways to handle this scenario, but try to get a decent balance of images for both classes to get the best results.
The annotation is poor. The annotation I did was just for the sake of making this tutorial; in reality, you want to set a clear outline and standard for how you'll be annotating. Are you going to annotate the whole head? Are the ears included? Is the neck part of it? You need to answer all these questions ahead of time.
I will stress again that if you're not planning to use OpenCV for the final deployment, then use TFOD API version 2; it's a lot cleaner. However, if the final objective is to use OpenCV, then you could get away with TF 2, but it's a lot of trouble.
Even with TFOD API v1, you can't be sure that your custom trained model will always load in OpenCV correctly. There are times when you need to manually edit the graph.pbtxt file so that you can use the model in OpenCV. If this happens and you're sure you have done everything correctly, then your best bet is to raise an issue here.
Hopefully, OpenCV will catch up and start supporting the TF 2 saved_model format, but it's going to take time. If you enjoyed this tutorial, then please feel free to comment and I'll gladly answer you.
You can reach out to me personally for a 1 on 1 consultation session in AI/computer vision regarding your project. Our talented team of vision engineers will help you every step of the way. Get on a call with me directly here.
Ready to seriously dive into State of the Art AI & Computer Vision? Then Sign up for these premium Courses by Bleed AI
You think of your move and I’ll make mine below this line in 1…2…and 3.
I choose ROCK.
Well? …who won? It doesn't matter, because you probably glanced at the word “ROCK” before thinking about a move, or maybe you didn't pay any heed to my feeble attempt at playing rock, paper, scissors with you in a blog post.
So why am I making some miserable attempts trying to play this game in text with you?
Let’s just say, a couple of months down the road in lockdown you just run out of fun ideas. To be honest I desperately need to socialize and do something fun.
Ideally, I would love to play games with some good friends, …or just friends…or anyone who is willing to play.
Now I’m tired of video games. I want to go for something old fashioned, like something involving other intelligent beings, ideally a human. But because of the lockdown, we’re a bit short on those for close proximity activities. So what’s the next best thing?
AI of course. So yeah why not build an AI that would play with me whenever I want.
Now I don't want to make a dumb AI bot that predicts randomly between rock, paper, and scissors, and I also don't want to use any keyboard or mouse inputs. I just want to play the old-fashioned way.
Did you know that you can actually stream a Live Video wirelessly from your phone’s camera to OpenCV’s cv2.VideoCapture() function in your PC and do all sorts of image processing on the spot like build an intruder detection system?
Cool huh?
In today's post, not only will we do just that, but we will also build a robust intruder detection surveillance system on top of it. It will record video samples whenever someone enters your room and will also send you alert messages via the Twilio API.
This post will serve as your building blocks for making a smart intruder detection system with computer vision. Although I’m making this tutorial for a home surveillance experiment, you can easily take this setup and swap the mobile camera with multiple IP Cams to create a much larger system.
Today’s tutorial can be split into 4 parts:
Accessing the Live stream from your phone to OpenCV.
Learning how to use the Twilio API to send Alert messages.
Building a Motion Detector with Background Subtraction and Contour detection.
Making the Final Application
You can watch the full application demo here
Most people have used the cv2.VideoCapture() function to read from a webcam or a video recording on disk, but only a few people know how easy it is to stream a video from a URL; in most cases this URL is from an IP camera.
By the way, with cv2.VideoCapture() you can also read a sequence of images, so yes, a GIF can be read by this.
So let me list all 4 ways to use the VideoCapture() class, depending on what you pass to it.
1. Using a live camera feed: You pass in an integer, i.e. 0, 1, 2, etc., e.g. cap = cv2.VideoCapture(0), and you will be able to use your webcam's live stream. The number depends on how many USB cams you attach and on which port.
2. Playing a saved video on disk: You pass in the path to the video file, e.g. cap = cv2.VideoCapture(Path_To_video).
3. Live streaming from a URL using an IP camera or similar: You can stream from a URL, e.g. cap = cv2.VideoCapture('protocol://host:port/video'). Note that each video stream or IP camera feed has its own URL scheme.
4. Reading a sequence of images: You can also read sequences of images, e.g. a GIF.
Part 1: Accessing the Live stream from your phone to OpenCV For The Intruder Detection System:
Those of you who have an Android phone can go ahead and install this IP Camera application from the Play Store.
If you want to try a different application, or you want to try this on an iPhone, you can still follow along with this tutorial by installing a similar IP camera application on your phone. One issue you could face is that the URL scheme for each application is different, so you would need to figure that out. Some applications make it really simple, like the one I'm showing you today.
You can also use the same code I'm sharing here to work with an actual IP camera. Again, the only difference will be the URL scheme; different IP cameras have different URL schemes. For our IP Camera, the URL scheme is: protocol://host:port/video
After installing the IP Camera application, open it and scroll all the way down and click start server.
After starting the server the application will start streaming the video to the highlighted URL:
If you paste this URL in the browser of your computer then you would see this:
Note: Your computer and mobile must be connected to the same Network
Click on the Browser or the Flash button and you’ll see a live stream of your video feed:
Below the live feed, you’ll see many options on how to stream your video, you can try changing these options and see effects take place in real-time.
Some important properties to focus on are the video Quality, FPS, and the resolution of the video. All these things determine the latency of the video. You can also change front/back cameras.
Try copying the image Address of the frame:
If you try pasting the address in a new tab then you will only see the video stream. So this is the address that will go inside the VideoCapture function.
As you can see I’m able to stream video from my phone.
Now there are some options you may want to consider. For example, you may want to change the resolution; in my case I have set the resolution to 640x480. Since I'm not using the web interface, I have used the app to change these settings.
There are also other useful settings that you may want to configure, like setting up a username and password so your stream is protected. Setting up a password would, of course, change the URL to something like:
I've also enabled background mode, so even when I'm out of the app or my phone screen is off, the camera keeps recording secretly. Now this is super stealth mode.
Finally here are some other URL Schemes to read this IP Camera stream, with these URLs you can even load audio and images from the stream:
http://192.168.43.1:8080/video is the MJPEG URL.
http://192.168.43.1:8080/shot.jpg fetches the latest frame.
http://192.168.43.1:8080/audio.wav is the audio stream in Wav format.
http://192.168.43.1:8080/audio.aac is the audio stream in AAC format (if supported by hardware).
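Since your phone's address will usually differ from mine, here is a small sketch that builds these URLs from a host and port instead of hardcoding them. The helper name build_stream_urls and the dictionary keys are made up for illustration; the endpoint paths are the ones listed above, and other apps will likely use different ones.

```python
def build_stream_urls(host, port=8080, protocol="http"):
    """Build the stream URLs for an IP camera app that follows the
    protocol://host:port/<endpoint> scheme described above."""
    base = f"{protocol}://{host}:{port}"
    return {
        "video": f"{base}/video",            # MJPEG stream
        "latest_frame": f"{base}/shot.jpg",  # single most recent frame
        "audio_wav": f"{base}/audio.wav",    # audio stream in WAV format
        "audio_aac": f"{base}/audio.aac",    # audio stream in AAC format
    }


urls = build_stream_urls("192.168.43.1")
print(urls["video"])
```

The "video" entry is the one you would pass to cv2.VideoCapture() to read the live stream.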
Part 2: Learning how to use the Twilio API to send Alert messages for the Intruder Detection System:
What is Twilio?
Twilio is an online service that allows us to programmatically make and receive phone calls, send and receive SMS, MMS and even Whatsapp messages, using its web APIs.
Today we’ll just be using it to send an SMS, you won’t need to purchase anything since you get some free credits after you have signed up here.
So go ahead and sign up, after signing up go to the console interface and grab these two keys and your trial Number:
ACCOUNT SID
AUTH TOKEN
After getting these keys, you need to insert them in the credentials.txt file provided in the source code folder. You can download the folder from above.
Make sure to replace INSERT_YOUR_ACCOUNT_SID with your ACCOUNT SID and INSERT_YOUR_AUTH_TOKEN with your AUTH TOKEN.
There are also two other things you need to insert in the text file: the trial number given to you by Twilio and your personal number, where you will receive the messages.
So replace PERSONAL_NUMBER with your number and TRIAL_NUMBER with the Twilio number. Make sure to include the country code for your personal number.
Note: With a trial account, the personal number can't be any random number; it must be a verified number. After you have created the account, you can add verified numbers here.
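It's easy to forget one of these replacements, so here is a hedged sketch of a sanity check you could run on the file's contents before sending anything. It assumes the file holds a Python dictionary literal with the keys account_sid, auth_token, your_num, and trial_num (the keys used by the sending code in this post). The function name load_credentials, the sample values, and the check that numbers start with "+" are my own assumptions.

```python
import ast

REQUIRED_KEYS = ("account_sid", "auth_token", "your_num", "trial_num")
PLACEHOLDERS = ("INSERT_YOUR_ACCOUNT_SID", "INSERT_YOUR_AUTH_TOKEN",
                "PERSONAL_NUMBER", "TRIAL_NUMBER")


def load_credentials(text):
    """Parse the credentials text and fail early on obvious mistakes."""
    # literal_eval only accepts Python literals, so it won't run arbitrary code.
    info = ast.literal_eval(text)
    if not isinstance(info, dict):
        raise ValueError("credentials must be a dictionary")
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in REQUIRED_KEYS:
        if str(info[key]) in PLACEHOLDERS:
            raise ValueError(f"{key} still contains a placeholder")
    # Assumption: numbers are written with a leading '+' and country code.
    for key in ("your_num", "trial_num"):
        if not str(info[key]).startswith("+"):
            raise ValueError(f"{key} should start with '+' and the country code")
    return info


# Fake values, for illustration only.
sample = ("{'account_sid': 'ACxxxx', 'auth_token': 'tok', "
          "'your_num': '+15550001111', 'trial_num': '+15552223333'}")
info_dict = load_credentials(sample)
print(sorted(info_dict))
```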
Before you can use the Twilio API from Python, you first have to install its package by doing:
pip install twilio
Now just run this code to send a message:
Python
import ast
from twilio.rest import Client

# Read text from the credentials file and store in data variable
with open('credentials.txt', 'r') as myfile:
    data = myfile.read()

# Convert data variable into dictionary (literal_eval only parses literals, unlike eval)
info_dict = ast.literal_eval(data)

# Your Account SID from twilio.com/console
account_sid = info_dict['account_sid']

# Your Auth Token from twilio.com/console
auth_token = info_dict['auth_token']

# Set client and send the message
client = Client(account_sid, auth_token)
message = client.messages.create(to=info_dict['your_num'], from_=info_dict['trial_num'], body="What's Up Man")
Check your phone; you should have received a message. Later on we'll properly fill in the body text.
Part 3: Building a Motion Detector with Background Subtraction and Contour detection:
Now in OpenCV, there are multiple ways to detect and track a moving object, but we’re going to go for a simple background subtraction method.
What are Background Subtraction methods?
Basically, these methods separate the background from the foreground in a video. For example, if a person walks into an empty room, the background subtraction algorithm would know there's a disturbance by subtracting the previously stored image of the room (without the person) from the current image (with the person).
So background subtraction can be used as an effective motion detector and even as an object counter, like a people counter that tracks how many people went in or out of a shop.
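The basic idea, subtracting a stored image of the empty room from the current frame, can be sketched in a few lines of plain Python. This is only a toy illustration on tiny grayscale grids (lists of lists); the function names and threshold values are made up, and in practice you would do this with NumPy/OpenCV on real frames.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose brightness differs from the stored background."""
    return [
        [255 if abs(cur - bg) > threshold else 0
         for bg, cur in zip(bg_row, cur_row)]
        for bg_row, cur_row in zip(background, frame)
    ]


def motion_detected(mask, min_pixels=2):
    """Report motion when enough pixels changed (ignores tiny noise)."""
    changed = sum(pixel == 255 for row in mask for pixel in row)
    return changed >= min_pixels


# 3x4 'images': the empty room, and the same room with a bright object in it.
empty_room = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
with_person = [[10, 10, 10, 10],
               [10, 200, 210, 10],
               [10, 190, 12, 10]]

mask = foreground_mask(empty_room, with_person)
print(mask)
print(motion_detected(mask))
```

The pixel that only changed from 10 to 12 stays below the threshold, which is how small lighting flicker gets ignored.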
Now, what I've described above is a very basic approach to background subtraction. In OpenCV, you will find a number of more complex algorithms that use background subtraction to detect motion. In my Computer Vision & Image Processing Course I have talked about background subtraction in detail: I have taught how to construct your own custom background subtraction methods and how to use the built-in OpenCV ones. So make sure to check out the course if you want to study computer vision in depth.
For this tutorial, I will be using a Gaussian Mixture-based Background / Foreground Segmentation Algorithm. It is based on two papers by Z.Zivkovic, “Improved adaptive Gaussian mixture model for background subtraction” in 2004 and “Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction” in 2006
The cv2.createBackgroundSubtractorMOG2() function takes in 3 arguments:
detectShadows: This algorithm can also detect shadows if we pass detectShadows=True to the constructor. The ability to detect and get rid of shadows gives us smoother and more robust results. Enabling shadow detection slightly decreases speed.
history: This is the number of frames used to create the background model. Increase this number if your target object often stops or pauses for a moment.
varThreshold: This threshold helps you filter out noise present in the frame. Increase this number if there are lots of white spots in the frame. We will also use morphological operations like erosion to get rid of the noise.
Now after we have our background subtraction done then we can further refine the results by getting rid of the noise and enlarging our target object.
We can refine our results by using morphological operations like erosion and dilation. After we have cleaned our image then we can apply contour detection to detect those moving big white blobs (people) and then draw bounding boxes over those blobs.
If you don't know about Morphological Operations or Contour Detection, then you should go over this Computer Vision Crash Course post I published a few weeks back.
So in summary, the motion detector performs 4 major steps:
Step 1: We extract moving objects with background subtraction and get rid of the shadows.
Step 2: We apply morphological operations to improve the background subtraction mask.
Step 3: We detect contours and make sure we're not detecting noise by filtering out small contours.
Step 4: Finally, we compute a bounding box over the largest contour, draw the box, and display the image.
Part 4: Creating the Final Intruder Detection System Application:
Finally, we will combine all of the things above. We will also use the cv2.VideoWriter() class to save the frames as a video on disk, and we will alert the user via the Twilio API whenever there is someone in the room.
The function is_person_present() is called on each frame and tells us whether a person is present in the current frame. If one is, we append True to a deque of length 15. Once the detection has occurred 15 times consecutively, we change the room occupied status to True. The reason we don't change the occupied status to True on the first detection is to avoid our system being triggered by false positives. As soon as the room status is True, the VideoWriter is initialized and the video starts recording.
When the person is not detected anymore, we wait for 7 seconds before turning the room status to False. This is because the person may disappear from view for a moment and then reappear, or we may miss detecting the person for a few seconds.
When the person disappears and the 7-second timer ends, we set the room status to False, release the VideoWriter in order to save the video, and then send an alert message to the user via the send_message() function.
I have also designed the code so that our patience timer (the 7-second timer) is not affected by false positives.
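The occupancy logic described above can be sketched as a small state machine. This is my own minimal reading of it, not the code from the source folder: the class name RoomMonitor and the event strings are hypothetical, and I'm assuming "not affected by false positives" means the patience timer only resets after a full window of 15 consecutive detections, never after a single stray one.

```python
from collections import deque


class RoomMonitor:
    def __init__(self, required_detections=15, patience_seconds=7.0):
        # Fixed-length window of the most recent per-frame results.
        self.history = deque([False] * required_detections,
                             maxlen=required_detections)
        self.patience_seconds = patience_seconds
        self.occupied = False
        self.last_confirmed = None  # time of the last full detection streak

    def update(self, detected, now):
        """Feed one frame's result; returns 'entered', 'left', or None."""
        self.history.append(detected)
        confirmed = all(self.history)  # 15 consecutive detections
        if confirmed:
            # Only a confirmed streak resets the patience timer.
            self.last_confirmed = now
            if not self.occupied:
                self.occupied = True
                return "entered"  # this is where recording would start
        elif self.occupied and now - self.last_confirmed >= self.patience_seconds:
            self.occupied = False
            return "left"  # release the writer and send the alert here
        return None


# Simulate one frame per second: 15 detections, then an empty room.
monitor = RoomMonitor()
events = []
for second in range(30):
    person_present = second < 15
    event = monitor.update(person_present, now=second)
    if event:
        events.append((second, event))
print(events)
```

Passing the time in as an argument (instead of calling time.time() inside) keeps the logic easy to test; in the live loop you would pass time.time() and the result of is_person_present().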
Here’s a high level explanation of the demo:
See how I have placed my mobile, while the screen is closed it’s actually recording and sending live feed to my PC. No one would suspect that you have the perfect intruder detection system setup in the room.
Improvements:
Right now your IP Camera has a dynamic IP, so you may be interested in learning how to give your device a static IP address so you don't have to change the address each time you launch your IP Camera.
Another limitation you have right now is that you can only use this setup when your device and your PC are connected to the same network/Wi-Fi, so you may want to learn how to get this setup to run globally.
Both of these issues can be solved with some configuration. All the instructions for that are in a manual, which you can get by downloading the source code for the intruder detection system from above.
Summary:
In this tutorial you learned how to turn your phone into a smart IP Camera and how to work with URL video feeds in general.
After that, we went over how to create a background subtraction based motion detector.
We also learned how to connect the Twilio API to our system to enable alert messages. Right now we are sending an alert message every time there is motion, so you may want to change this and make the API send you a single message each day containing a summary of all movements that happened in the room throughout the day.
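If you want to try the daily summary idea, one possible starting point is to collect the start and end time of every recorded visit and build the message body once a day. The function name build_daily_summary and the message format below are purely illustrative.

```python
from datetime import datetime


def build_daily_summary(events):
    """events: list of (start, end) datetime pairs, one per recorded visit."""
    if not events:
        return "No movement detected today."
    total = sum((end - start).total_seconds() for start, end in events)
    lines = [f"{len(events)} movement event(s), {int(total)} seconds in total:"]
    for start, end in events:
        lines.append(f"- {start:%H:%M:%S} to {end:%H:%M:%S}")
    return "\n".join(lines)


# Made-up example data.
todays_events = [
    (datetime(2024, 1, 1, 9, 0, 0), datetime(2024, 1, 1, 9, 0, 30)),
    (datetime(2024, 1, 1, 18, 15, 0), datetime(2024, 1, 1, 18, 16, 0)),
]
summary = build_daily_summary(todays_events)
print(summary)
```

The resulting string could then be used as the body of a single Twilio message at the end of the day.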
Finally we created a complete application where we also saved the recording snippets of people moving about in the room.
This post was just a basic template for a surveillance system; you can take this and make more enhancements to it. For example, for each person coming into the room, you can check with facial recognition whether it's actually an intruder or a family member. Similarly, there are lots of other things you can do with this.
If you enjoyed this tutorial, then I would love to hear your opinion on it. Please feel free to comment and ask questions; I'll gladly answer them.