Get Started

From zero to mastery in Image Processing, Machine Learning, Deep Learning & Computer Vision: a simplified, detailed, step-by-step learning path for you.

OpenCV 101: How to Get Started with OpenCV & Image Processing

If you want to get into computer vision and image processing, OpenCV is the go-to library, especially for image processing tasks. All over the internet, you’ll find countless applications that people have built using OpenCV. Before you get your hands dirty building those cool apps, you first need to grasp the OpenCV fundamentals and understand the basics of image processing, and that is exactly what this section teaches you. Note that every tutorial comes as both a detailed blog post and a complete video tutorial. So at Bleed AI you get a complete package. You don’t need to wander around: just follow the path I outline in this section and you’ll be an OpenCV ninja in no time.

OpenCV Crash Course


This is a single stand-alone tutorial that serves as a crash course on the basics of the OpenCV library. All you need is a basic knowledge of the Python programming language and you’re all set. Below is the outline for the course; all the parts are included in the link above.


OpenCV Tips and Tricks


This is the tutorial I wish I had when I was starting to learn OpenCV. In it, I share some very interesting information about OpenCV, including tips on where to find the right resources and tutorials for the library. The outline:

  • How to navigate the OpenCV docs to find what you’re looking for.
  • How to get details regarding any OpenCV function.
  • The differences between the C++ and Python versions of OpenCV and which one you should work with.
  • Pip installation of OpenCV vs Source installation.
  • Where to ask questions regarding OpenCV when you’re stuck.

Installing OpenCV from Source


In this tutorial, you will learn how to install OpenCV from source on Windows 10 with Nvidia GPU support and the non-free flags enabled. I only recommend following this tutorial if you’re already comfortable with OpenCV and need additional features that are not available with the default pip installation.

Contour Detection 101: The Basics


If you only had time to learn one OpenCV technique, it should definitely be contours. Using contours, you’ll be able to create systems like vehicle detection, object counters, shape detection, and distance measurement without even using deep learning, so it’s a must-learn technique in OpenCV. In this tutorial, you will learn the basics of contour detection and how to use it, and go over the various preprocessing techniques it requires. Below is the outline of the tutorial.

Contour Detection 101: Contour Manipulation (Pt:2)

Detecting and drawing contours alone won’t be enough to build powerful computer vision applications. The contours need to be processed and manipulated to make them useful for building cool applications. 

So in this tutorial, you will learn how to perform different contour manipulations to extract useful information from the detected contours. Using this information, you can easily build very useful applications. In fact, we have an entire course that will help you master contours for building computer vision applications. Here’s the outline of this tutorial.

Contour Detection 101: Contour Analysis (Pt:3)

In this tutorial, you will learn how to analyze contours, which helps you recognize the object being detected and differentiate one contour from another. We will also explore how you can identify different properties of contours to retrieve useful information. After learning to analyze contours, you will be able to do all sorts of cool things with them. The outline:

Learn To Build Computer Vision Applications in 1 Week


Now that you have learned the basics of contour detection, it is time to combine those concepts, start building cool real-time vision applications, and learn to use contours at an expert level with this extremely affordable course.

In the course, we provide high-quality graded quizzes, applications, case studies, challenges, forums, and a lot more, with no AI or computer vision background required.

Vehicle Detection

The course mentioned above is full of exciting real-world applications. If you are still unsure whether you should join the course, here is a free tutorial in which you can see how useful and simple it is to manipulate contours and build real-world applications like vehicle detection. The outline of the tutorial:

  • Background Subtraction
  • Noise Removal
  • Contour Detection & Manipulation

Computer Vision and Image Processing with OpenCV


Now that you have learned the basics of OpenCV and image processing, your next step should be to join our premium course on Image Processing with OpenCV (with video lectures in Urdu/Hindi). The course covers:

  • Image Processing
  • Classical Computer Vision
  • Computer Vision Projects
  • Machine Learning & Deep Learning

This course gives you everything you need, in detail, to master Computer Vision and Image Processing with OpenCV in Python, with no prerequisites other than some programming experience in any language. I highly recommend that you take the opportunity and don’t miss out.

Nice going! You have completed the first and most important step of your learning journey, so you should be proud of yourself. The next step is to move towards building some interesting applications and to extend your OpenCV knowledge further.

Note: If you have joined and completed my courses, then the next section will be a piece of cake for you.

OpenCV 102: Going beyond the Basics & Building Interesting Applications

Having a lot of knowledge is no use if you don’t know how to apply it, so it’s time to start building some advanced applications using the concepts you learned in the previous section. This will move you forward in your journey and make you more comfortable with the library. Below are some tutorials on building interesting vision apps:

Intruder Detection

Computer vision and machine learning in the security industry, coupled with sensors like CCTV cameras, is a highly in-demand skill these days, and it allows you to derive meaningful information and insights from visual data without any human involvement. This opens the door to a ton of interesting applications like Automatic Number Plate Recognition (ANPR), Anomaly & Violation Detection, and Social Distancing Monitoring.

This tutorial will get you started in the security industry and guide you in building a robust Intruder Detection surveillance system using your phone’s camera. The system will record video samples whenever someone enters your room and will also send you alert messages via the Twilio API. The tutorial can be split into 4 parts:

Virtual Pen

In this tutorial, you will learn to build your own Virtual Pen & also a Virtual Eraser without needing any hardware other than just a normal webcam and a pen. So you will be able to draw and erase virtually on the screen by waving a pen in the air, cool right? So let’s get into it. The outline:

Rock Paper Scissors



It’s important to have fun every once in a while, so how about taking a break to play some games? If you want, you can go play with your friends, or maybe we could build an AI to play games with us. Sounds like a plan, right? This way you will end up having fun and learning several new AI concepts at the same time.

So this tutorial will cover creating a system that will allow you to play Rock, Paper, Scissors with an AI. The system will be capable of recognizing your hand signs and then playing a random move, as the game does not require a strategy to choose the move. The tutorial can be broken down into the following steps:
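Leaving the hand-sign recognition aside, the game logic itself (a random move for the AI, then deciding the winner) can be sketched in a few lines of plain Python. The names MOVES, BEATS, decide_winner, and play_round are my own picks for this illustration, and the player’s move is passed in as a string instead of coming from a camera.

```python
import random

MOVES = ("rock", "paper", "scissors")

# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def decide_winner(player_move, ai_move):
    """Return 'player', 'ai', or 'tie' for a single round."""
    if player_move == ai_move:
        return "tie"
    return "player" if BEATS[player_move] == ai_move else "ai"


def play_round(player_move, rng=random):
    """The AI picks a uniformly random move, then the round is scored."""
    ai_move = rng.choice(MOVES)
    return ai_move, decide_winner(player_move, ai_move)


ai_move, result = play_round("rock")
print("AI played:", ai_move, "| winner:", result)
```

In the full application, the string passed to play_round would be replaced by whatever label the hand-sign classifier predicts for the current frame.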

Well done! You have completed the section and learned to build some very interesting applications that have added great value to your portfolio, so congrats on that.


Mediapipe: Learn to Build Cool Real-World Applications.

“Mediapipe is a cross-platform, open-source tool that allows you to run a variety of cutting-edge ML solutions in real time. It’s designed primarily for facilitating the use of ML in streaming media, and it was built by Google.”

You can expect state-of-the-art performance from this library, as it is backed by Google and the models in Mediapipe are actively used in Google products.

So now, in the tutorials below, you will learn to use this library to build cool real-time vision applications.


Building 4 Applications Using Real-Time Selfie Segmentation in Python

Image segmentation is one of the most important and widely used image processing techniques, as it brings the classification task down to the pixel level and partitions an image into multiple segments (i.e., sets of pixels). This makes it much easier to extract and analyze the objects of interest, and it can be used to build a ton of vision applications.

In the tutorial, you will learn to perform Real-Time Selfie Segmentation using Mediapipe in Python and then build the following 4 applications:

The same segmentation model is also being utilized by Google Hangouts. You will not need a GPU; all of these applications will run smoothly in real time on your CPU, and trust me, there is no catch here. You will also learn about the following image segmentation categories:

Real-Time 3D Pose Detection & Pose Classification with Mediapipe and Python

Pose detection is a must-learn technique in vision, since it has a lot of real-world use cases like controlling HCI applications, monitoring fitness, exercise, or dance, and overlaying virtual clothes or other accessories on the body.


This tutorial will teach you to perform 3D pose detection and pose classification on images, videos, and a webcam feed in real time. Plus, at the end, I’ll also show you how to create a pose classification system for detecting Yoga poses. The tutorial is divided into the following parts:

Controlling Subway Surfers Game with Pose Detection using Mediapipe and Python

You have made it this far, which tells me that you are quite dedicated to becoming a Computer Vision/Machine Learning expert and will succeed soon, but one should also maintain a healthy balance between working and having fun.

Doing physical exercise regularly is a great way to stay healthy. So why not combine exercising, coding, and even gaming, and hit all the birds with a single shot?

For this purpose, I have made a tutorial in which you will learn to control a popular game called “Subway Surfers” using body gestures and movements, exercising and having fun at the same time, with the help of Mediapipe’s pose detection solution and just a webcam.

Here’s a Demo of me controlling the game using the application we will cover in this tutorial:

The application is divided into smaller components and all the steps to create the components and integrate them to produce the final application are described in a very neat manner. The outline:


Real-Time 3D Hands Landmarks Detection & Hands Classification with Mediapipe and Python

Hand landmark detection is one of the most popular computer vision techniques out there and every vision practitioner must learn it to compete in the market. Some of its applications are sign language recognition and controlling HCI applications with hand gestures, etc.

This tutorial will teach you to perform 3D hand landmark detection and hand classification (i.e., whether it is a left or a right hand) on images and videos in real time, using the Mediapipe library in Python, on your CPU at 30+ FPS.

The tutorial will also cover drawing bounding boxes around hands and customized annotation of the hand landmarks using the depth of the landmarks. The tutorial is divided into the following parts:


Real-Time Fingers Counter & Hand Gesture Recognizer with Mediapipe and Python

Gesture Recognition is gonna be one of the most in-demand skills in the near future, as it is a building block of Augmented Reality (AR) applications and can also be used to control simple HCI applications via hand gestures, which kind of sounds much cooler than typing on keys or tapping on a touch screen.

So in this tutorial, you will learn to use the hand landmarks to count fingers, build a Hand Gesture Recognizer capable of identifying multiple gestures, and create a Selfie-Capturing System controlled by hand gestures. The outline:

With the completion of this section, you have obtained a good grip on Mediapipe. If you want, you can check out the other solutions provided by Mediapipe here.
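To make the finger-counting idea concrete, here is a minimal sketch that works on plain (x, y) points, so it runs without a camera or Mediapipe installed. It relies on Mediapipe’s hand landmark numbering (21 points per hand, with fingertips at indices 4, 8, 12, 16, and 20). Everything else is my own assumption for illustration: the hand is upright with y growing downward, the thumb test is a simple heuristic, and the sample coordinates are invented, not real model output.

```python
# Mediapipe hand landmark indices (21 points per hand).
FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)   # the middle (PIP) joint of each of those fingers
THUMB_TIP, THUMB_IP, PINKY_MCP = 4, 3, 17


def count_fingers(landmarks):
    """Count raised fingers from 21 (x, y) points of an upright hand.

    Assumes image coordinates, i.e. y grows downward.
    """
    count = 0

    # A finger counts as raised when its tip is above its middle joint.
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:
            count += 1

    # Thumb heuristic (an assumption, not Mediapipe's rule): the thumb is
    # extended when its tip is horizontally farther from the pinky base
    # than its IP joint is.
    tip_dist = abs(landmarks[THUMB_TIP][0] - landmarks[PINKY_MCP][0])
    ip_dist = abs(landmarks[THUMB_IP][0] - landmarks[PINKY_MCP][0])
    if tip_dist > ip_dist:
        count += 1

    return count


# Invented normalized coordinates for an open palm (not real model output).
open_palm = [(0.5, 0.9) for _ in range(21)]
open_palm[3], open_palm[4] = (0.30, 0.70), (0.20, 0.60)     # thumb IP, tip
open_palm[6], open_palm[8] = (0.40, 0.50), (0.40, 0.30)     # index PIP, tip
open_palm[10], open_palm[12] = (0.50, 0.48), (0.50, 0.25)   # middle PIP, tip
open_palm[14], open_palm[16] = (0.60, 0.50), (0.60, 0.30)   # ring PIP, tip
open_palm[17] = (0.70, 0.65)                                # pinky base (MCP)
open_palm[18], open_palm[20] = (0.70, 0.55), (0.70, 0.40)   # pinky PIP, tip

print(count_fingers(open_palm))  # 5
```

With real detections you would build the list from the x and y attributes of each detected landmark, and a gesture recognizer can then be layered on top by checking which fingers are raised.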

Deep Learning Systems

Deep learning systems are often the best option when you have enough clean data and a high-end machine to train on, as these systems let the data speak for itself and don’t require manual feature engineering, which makes the process easier by reducing human effort.

Deep Learning with OpenCV DNN module.

Now is the time to start learning about the DNN module in OpenCV, which lets you use pre-trained neural networks from popular frameworks like TensorFlow and PyTorch directly in OpenCV, which is quite convenient.


Deep Learning with OpenCV DNN Module, A Comprehensive Guide

In this tutorial, you will learn about various important details of the DNN module that are rarely discussed, like selecting preprocessing parameters correctly, designing pre- and post-processing pipelines for different models, and a lot more that is required to use the module to its fullest extent. This tutorial can be split into 3 sections.

Training a Custom Image Classifier with Tensorflow, and using it in the OpenCV DNN module

In this tutorial, we will first train a custom image classifier with TensorFlow’s Keras API and then use it in the OpenCV DNN module. You can skip the training part for now and focus on learning how to use a custom-trained classifier in OpenCV, along with its advantages. You will learn to:

Training an Object Detector with Tensorflow Object Detection API and using it in the OpenCV dnn module

This tutorial first covers a complete end-to-end pipeline for training a custom object detector and then how to use it in the OpenCV DNN module. For this one too, you can skip the training part for now and focus on learning how to use a custom-trained object detector in OpenCV. You will learn to: 

Introduction to Super Resolution


You may already know that the higher the resolution (i.e., the number of pixels) of an image, the more detailed it is, and so the more information there is to extract from it.

Super Resolution is a very useful technique for generating high-resolution versions of images. It is used in various applications (like medical imaging, surveillance systems, and satellite imagery) across several industries and is a must-learn technique for every computer vision practitioner.

So be sure to check out this tutorial, which covers some theoretical details of what exactly Super-Resolution is, its algorithms and architectures, and how to perform it with just OpenCV. The outline:

Super Resolution, Going from 3x to 8x Resolution

In this tutorial, you will learn to perform Super-Resolution using multiple models, even those that do 8x upscaling. In the previous tutorial, we implemented only a single SR model with 3x upscaling, so this one’s gonna be quite interesting. The four different models that you will learn to use in this tutorial are:

      • EDSR: Enhanced Deep Residual Network 
      • ESPCN: Efficient Subpixel Convolutional Network 
      • FSRCNN: Fast Super-Resolution Convolutional Neural Networks
      • LapSRN: Laplacian Pyramid Super-Resolution Network 

Emotion / Facial Expression Recognition

To achieve effective interaction between humans and machines, it is essential for machines to recognize human emotions, as emotions play a vital role in conveying non-verbal messages that can’t be ignored. Emotion recognition can also be used to create other intelligent systems like smart music players, student mood monitoring systems, and smart advertisement banners.

This tutorial will cover how to perform Facial Expression Recognition AKA Emotion Recognition using the DNN module. The tutorial is structured in the following way:

Nice going! You have completed one more section and learned everything you need to know to use the Deep Neural Network module in OpenCV effectively.

Deep Learning with Tensorflow & Keras

This section covers TensorFlow & Keras, which provide high-level APIs along with pre-trained models and datasets and are among the leading libraries used by researchers and students. It will build up your foundation and get you familiar with training machine learning and deep learning models.

TensorFlow 2.0 GPU Complete Installation Guide

This tutorial covers the TensorFlow 2.0 GPU installation step by step, structured in the neatest and simplest way possible. On top of that, it will also help you set up your Nvidia GPU for the OpenCV source installation, so that you can use your GPU with the OpenCV DNN module. The outline:


Training a Custom Image Classifier with Tensorflow

In this tutorial, we will train an image classifier with TensorFlow’s Keras API and use it in the OpenCV DNN module. You have already learned to use the OpenCV DNN module, so you should focus only on learning how to train a custom classifier with TensorFlow.


Training an Object Detector with Tensorflow Object Detection API

In this tutorial, you will see a complete end-to-end pipeline for training an object detector and then use it in the OpenCV DNN module. For this one too, you have already done the OpenCV DNN part so you can focus on learning how to train an object detector with TFOD API. You will learn to:

Classification with Localization

You may have already worked on an image classifier, maybe the cats and dogs classification problem or digit recognition. This tutorial will take you a step further by covering image classification along with object localization, which means you will learn to train a model, using Keras, that can identify an object as well as locate it in the image.

It’s fine even if you have not worked on a classification system, as it is not a prerequisite for this tutorial and I have explained everything from scratch. The approach used is only recommended if you don’t want to, or cannot, use the TensorFlow Object Detection (TFOD) API.

Introduction to Video Classification and Human Activity Recognition

If you are thinking that video classification is the same as classifying a series of images and that you can skip this tutorial, then you are totally wrong. In videos, the frames are related to one another, which makes video classification a lot more complex, and more fascinating as well.

The tutorial is about how to perform classification on videos. You will learn about several approaches that can be used for video classification, and you will implement a simple but quite effective approach called Single-Frame CNN, using a moving average technique across frames, for human activity recognition. The outline:
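The moving average idea mentioned above can be sketched without any model at all. In this minimal, plain-Python illustration, each frame is represented only by the class probabilities a classifier might output for it; the function name smooth_predictions, the window size, and the probability values are all made up for the example.

```python
from collections import deque


def smooth_predictions(frame_probs, window_size):
    """Return one predicted class index per frame.

    Each prediction is taken from the average of the class probabilities
    of the most recent `window_size` frames.
    """
    window = deque(maxlen=window_size)  # drops the oldest frame automatically
    labels = []
    for probs in frame_probs:
        window.append(probs)
        num_classes = len(probs)
        averaged = [
            sum(p[c] for p in window) / len(window) for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=averaged.__getitem__))
    return labels


# Invented per-frame probabilities for two classes; frame 3 is a one-off glitch.
frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1]]

print(smooth_predictions(frame_probs, 1))  # [0, 0, 1, 0]
print(smooth_predictions(frame_probs, 3))  # [0, 0, 0, 0]
```

With a window of 1 the prediction flickers on the glitchy frame, while averaging over 3 frames keeps the label stable. The trade-off is that a larger window also reacts more slowly when the activity really does change.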

Human Activity Recognition using TensorFlow (CNN + LSTM).

This tutorial will take you a step further by teaching you to implement a much more effective video classification approach for human action recognition, using a Convolutional Neural Network combined with a Long Short-Term Memory network. We’ll be using two different architectures and will also cover some theory on video classification and the different approaches that can be used for it. The outline:

Well done! You are doing great, just one more section to go.

Become a Dlib Expert

The Dlib library is another powerful computer vision library. It is not as extensive as OpenCV, but there is still a lot you can do with it, including facial landmark detection, deep metric learning, object tracking, and more.

It is used in both industry and academia, in domains like robotics and embedded devices. This section will help you master the library with the help of the tutorials listed below.

Dlib Library Crash Course, 101 to Mastery

This tutorial will serve as a dlib crash course for you and will cover most of the prominent features and algorithms present in the library. Sounds impossible to cover everything in a blog post, right?

So I will just provide all the best and the most important tutorials on different aspects of the dlib library out there in a nice hierarchical order as everything is already out there. It will serve as a very well-structured reference guide for dlib library users. The outline:

Training a Custom Object Detector & Making Gesture Controlled Applications

This tutorial will teach you to train a custom Hand Detector with Dlib, cleverly automate the data collection & annotation step with image processing, and control applications like games and video players with hand gestures. Fascinating, right? So let’s dive into it. The tutorial can be split into the following parts:

Playing Chrome’s T-Rex Game with Facial Gestures

In this tutorial, you will learn to control Chrome’s infamous Dino game, which appears in Chrome when the internet is not working, using facial gestures.

The game is normally played to kill spare time when the internet is not working. But why not put your time to more productive use? You can go over this tutorial and try to create a similar application whenever your internet stops working. The tutorial is divided into the following steps:

Congratulations! You have made it to the end and become a proficient practitioner, so you should be proud of yourself. Still, I recommend that you continue your learning journey and keep exploring new technologies so that you can keep up with others, as Computer Vision and Machine Learning are very competitive and rapidly growing fields.


Some Other Resources

A Detailed Road-map on Making a Career in Computer Vision.

It is a free 10-day email course on making a career in computer vision. Whether your goal is landing a job, becoming a researcher, building projects as a hobby, or anything else, this course will help you get there by showing you an ideal path, from start to finish, for mastering the computer vision career roadmap.

You can join the course here.

Computer Vision For Everyone (CVFE) Episodes

It’s a completely free, high-level course that will teach you everything you need to get started with computer vision. The course is designed so that people of all skill levels can benefit from it, so you don’t need any background in computer vision or Artificial Intelligence. The released episodes are:

And that’s not all, as more episodes are on the way.

Resource Guide

It’s a free 25-page Computer Vision Resource Guide in which you will find some of the best resources out there (courses, books, tutorials, blogs, and YouTube channels) for learning computer vision.

You can download it here.