Real-Time Fingers Counter & Hand Gesture Recognizer with Mediapipe and Python

By Taha Anwar and Rizwan Naeem

On August 23, 2021


In last week’s tutorial, we learned to perform real-time hands 3D landmarks detection, hands classification (i.e., either left or right), extraction of bounding box coordinates from the landmarks, and the utilization of the depth (z-coordinates) of the hands to create a customized landmarks annotation.

Yup, that was a whole lot, and we’re not slowing down in this tutorial either 😉

In this week’s tutorial, we’ll learn to utilize the landmarks to count the fingers (that are up) in images and videos and create a real-time finger counter. We will also create a finger recognition and visualization application that will display exactly which fingers are up. This will work for both hands.

Then based on the status (i.e., up/down) of the fingers, we will build a Hand Gesture Recognizer that will be capable of identifying multiple gestures. 

Below are the results on a few sample images but this will also work on camera feed in real-time and on recorded videos as well. 

You will not need an expensive GPU; your CPU will suffice, as the whole code is highly optimized.

And that is not all, in the end, on top of all this, we will build a Selfie-Capturing System that will be controlled using hand gestures to enhance the user experience. So we will be able to capture images and also turn an image filter on/off without even touching our device. The image below shows a visual of what this system will be capable of.

Well 🤔, maybe not exactly that but somewhat similar.

Excited yet? I know I am! Before diving into the implementation, let me tell you that as a child, I was always fascinated with the concept of automating the interaction between people and machines, and that was one of the reasons I got into programming.

To be more precise, I wanted to control my computer with my mind. Yes, I know how this sounds, but I was just a kid back then. Controlling computers via the mind with high fidelity is not feasible yet, but hey, Elon is working on it, so there’s still hope.

But for now, why don’t we utilize the options we have? I have published some other tutorials too on controlling different applications using hand and body gestures.

So I can tell you that using hand gestures to interact with a system is a much better option than using some other part like the mouth since hands are capable of making multiple shapes and gestures without much effort.

Also, during these crucial times of COVID-19, it is very unsafe to touch the devices installed at public places like ATMs. So upgrading these to make them operable via gestures can tremendously reduce infection risk.

Tony Stark, the boy genius, can be seen in movies controlling stuff with his hand gestures, so why let him have all the fun when we can join the party too?

You can also use the techniques you’ll learn in this tutorial to control any other Human-Computer Interaction based application.

The tutorial is divided into small steps with every step explained in detail in the simplest manner possible. 

Outline

  1. Step 1: Perform Hands Landmarks Detection
  2. Step 2: Build the Fingers Counter
  3. Step 3: Visualize the Counted Fingers
  4. Step 4: Build the Hand Gesture Recognizer
  5. Step 5: Build a Selfie-Capturing System controlled by Hand Gestures

Download Code

Alright, so without further ado, let’s get started.

Import the Libraries

First, we will import the required libraries.

Initialize the Hands Landmarks Detection Model

After that, we will need to initialize the mp.solutions.hands module and then set up the mp.solutions.hands.Hands() class with appropriate arguments, and also initialize the mp.solutions.drawing_utils module that is required to visualize the detected landmarks. We will be working with images and videos as well, so we will have to set up mp.solutions.hands.Hands() two times.

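We set it up once with the argument static_image_mode set to True for use with images, and a second time with static_image_mode set to False for use with videos. With static_image_mode set to False, the solution tracks the hands across frames and re-runs detection only when tracking is lost, which speeds up the landmarks detection process; the intuition behind this was explained in detail in the previous post.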
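The two-model setup described above can be sketched roughly as follows. This is a minimal sketch, assuming a mediapipe version that still ships the mp.solutions API; the helper name create_hands_models() and the specific argument values (two hands, 0.5 thresholds) are illustrative assumptions, not necessarily what the downloadable code uses.

```python
# Keyword arguments for the two Hands() instances. The specific values are
# assumptions chosen for illustration; tune them for your own use case.
IMAGE_MODE_KWARGS = dict(static_image_mode=True, max_num_hands=2,
                         min_detection_confidence=0.5)
VIDEO_MODE_KWARGS = dict(static_image_mode=False, max_num_hands=2,
                         min_detection_confidence=0.5,
                         min_tracking_confidence=0.5)


def create_hands_models():
    """Return (hands_for_images, hands_for_videos, drawing_utils).

    Hypothetical helper. mediapipe is imported lazily so that this file can
    be loaded even on a machine where the package is not installed.
    """
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    hands_images = mp_hands.Hands(**IMAGE_MODE_KWARGS)
    hands_videos = mp_hands.Hands(**VIDEO_MODE_KWARGS)
    return hands_images, hands_videos, mp.solutions.drawing_utils
```

Keeping the two argument sets side by side makes the only intended difference (static_image_mode, plus the tracking threshold that only matters for videos) easy to see.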

Step 1: Perform Hands Landmarks Detection

In this step, we will create a function detectHandsLandmarks() that will take an image/frame as input and perform landmarks detection on the hands in the image/frame using the solution provided by Mediapipe, getting twenty-one 3D landmarks for each hand in the image. The function will display or return the results depending upon the passed arguments.

The function is quite similar to the one in the previous post, so if you have read that post, you can skip this step. I could have imported it from a separate .py file, but I didn’t, as I wanted to make this tutorial with the minimal number of prerequisites possible.

Now let’s test the function detectHandsLandmarks() created above to perform hands landmarks detection on a sample image and display the results.

Great! We got the required landmarks, so the function is working accurately.

Step 2: Build the Fingers Counter

Now in this step, we will create a function countFingers() that will take in the results of the landmarks detection returned by the function detectHandsLandmarks() and utilize the landmarks to count the number of fingers that are up on each hand in the image/frame. It will return the count and the status of each finger in the image as well.

How will it work?

To check the status of each finger (i.e., whether it is up or not), we will compare the y-coordinates of the FINGER_TIP landmark and the FINGER_PIP landmark of each finger. Whenever a finger is up, the y-coordinate of its FINGER_TIP landmark will have a lower value than that of its FINGER_PIP landmark, since the y-axis of image coordinates points downward.

But for the thumbs, the scenario will be a little different as we will have to compare the x-coordinates of the THUMB_TIP landmark and THUMB_MCP landmark and the condition will vary depending upon whether the hand is left or right.

For the right hand, whenever the thumb is open, the x-coordinate of the THUMB_TIP landmark will have a lower value than that of the THUMB_MCP landmark, and for the left hand, the x-coordinate of the THUMB_TIP landmark will have a greater value than that of the THUMB_MCP landmark.

Note: You have to face the palm of your hand towards the camera.
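The comparison rules above can be sketched in plain Python, independent of Mediapipe. This is only an illustrative sketch, not the tutorial's countFingers(): the function name count_fingers(), the dictionary-based landmark format, and the sample coordinates are all assumptions made for the example. The landmark names follow Mediapipe's HandLandmark naming.

```python
FINGERS = ("INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY")


def count_fingers(landmarks, hand_label):
    """Return (count, statuses) for one hand.

    landmarks: dict mapping a landmark name to its normalized (x, y) pair.
    hand_label: "LEFT" or "RIGHT", as reported by the hand classifier.
    """
    statuses = {}
    for finger in FINGERS:
        tip_y = landmarks[finger + "_TIP"][1]
        pip_y = landmarks[finger + "_PIP"][1]
        # Image y grows downward, so a raised tip has the smaller y value.
        statuses[finger] = tip_y < pip_y

    tip_x = landmarks["THUMB_TIP"][0]
    mcp_x = landmarks["THUMB_MCP"][0]
    # The thumb is compared along x, and the direction flips with the hand.
    if hand_label == "RIGHT":
        statuses["THUMB"] = tip_x < mcp_x
    else:
        statuses["THUMB"] = tip_x > mcp_x

    return sum(statuses.values()), statuses


# Made-up coordinates: index and middle fingers raised, ring and pinky folded.
sample = {
    "THUMB_TIP": (0.60, 0.70), "THUMB_MCP": (0.55, 0.75),
    "INDEX_FINGER_TIP": (0.45, 0.30), "INDEX_FINGER_PIP": (0.45, 0.50),
    "MIDDLE_FINGER_TIP": (0.50, 0.28), "MIDDLE_FINGER_PIP": (0.50, 0.50),
    "RING_FINGER_TIP": (0.55, 0.62), "RING_FINGER_PIP": (0.55, 0.55),
    "PINKY_TIP": (0.60, 0.66), "PINKY_PIP": (0.60, 0.60),
}
count, statuses = count_fingers(sample, "RIGHT")
print(count, statuses)
```

Passing the same coordinates with the label "LEFT" flips the thumb result, which is why the hand classification from the previous tutorial is needed here.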

Now we will utilize the function countFingers() created above on a real-time webcam feed to count the number of fingers in the frame.

Output Video

Astonishing! The fingers are being counted very fast.

Step 3: Visualize the Counted Fingers

Now that we have built the finger counter, in this step, we will visualize the status (up or down) of each finger in the image/frame in a very appealing way. We will draw left and right handprints on the image and will change the color of the handprints in real-time depending upon the output (i.e., status (up or down) of each finger) from the function countFingers().

  • The hand print will be Red if that particular hand (i.e., either right or left) is not present in the image/frame.
  • The hand print will be Green if the hand is present in the image/frame.
  • The fingers of the hand print that are up will be highlighted with the Orange color, and the fingers that are down will remain Green.

To accomplish this, we will create a function annotate() that will take in the output of the function countFingers() and will utilize it to simply overlay the required hands and fingers prints on the image/frame in the required color.

We have the .png images of the hands and fingers prints in the required colors (red, green, and orange) with transparent backgrounds, so we will only need to select the appropriate images depending upon the hands and fingers statuses and overlay them on the image/frame. You will also get these images with the code when you download it.
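The overlay idea can be sketched with NumPy boolean indexing: only the pixels whose alpha value is 255 are copied onto the frame. This is a minimal sketch, not the tutorial's annotate() function; the helper name overlay_opaque_pixels(), the tiny synthetic arrays, and the assumption that the print fits entirely inside the frame are all illustrative. With OpenCV, a PNG keeps its alpha channel when read with cv2.imread(path, cv2.IMREAD_UNCHANGED).

```python
import numpy as np


def overlay_opaque_pixels(frame, print_bgra, x, y):
    """Copy the fully opaque pixels of a BGRA image onto a BGR frame in place.

    Assumes the BGRA image fits inside the frame at offset (x, y).
    """
    height, width = print_bgra.shape[:2]
    roi = frame[y:y + height, x:x + width]   # a view into `frame`, not a copy
    opaque = print_bgra[:, :, 3] == 255      # boolean mask from the alpha channel
    roi[opaque] = print_bgra[:, :, :3][opaque]
    return frame


# Tiny synthetic example: a black 4x4 frame and a 2x2 "hand print".
frame = np.zeros((4, 4, 3), dtype=np.uint8)
hand_print = np.zeros((2, 2, 4), dtype=np.uint8)
hand_print[0, 0] = (0, 255, 0, 255)   # opaque green pixel, will be copied
hand_print[1, 1] = (0, 0, 255, 0)     # fully transparent pixel, will be skipped
overlay_opaque_pixels(frame, hand_print, 1, 1)
```

Because the region of interest is a view into the frame, writing into it updates the frame directly, so no alpha channel is needed on the frame itself.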

Now we will use the function annotate() created above on a webcam feed in real-time to visualize the results of the fingers counter.

Output Video

Woah! That was cool; the results are delightful.

Step 4: Build the Hand Gesture Recognizer

We will create a function recognizeGestures() in this step, that will use the status (i.e., up or down) of the fingers outputted by the function countFingers() to determine the gesture of the hands in the image. The function will be capable of identifying the following hand gestures:

  • V Hand Gesture ✌️ (i.e., only the index and middle finger up)
  • SPIDERMAN Hand Gesture 🤟 (i.e., the thumb, index, and pinky finger up)
  • HIGH-FIVE Hand Gesture ✋ (i.e., all five fingers up)

For the sake of simplicity, we are only limiting this to three hand gestures. But if you want, you can easily extend this function to make it capable of identifying more gestures just by adding more conditional statements.
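The three conditions listed above can be sketched as simple set comparisons on the finger statuses. This is an illustrative sketch rather than the tutorial's recognizeGestures(); the function name recognize_gesture(), the status dictionary format, and the label strings are assumptions made for the example.

```python
ALL_FINGERS = {"THUMB", "INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY"}


def recognize_gesture(statuses):
    """Map one hand's finger statuses (name -> bool) to a gesture label."""
    up = {finger for finger, is_up in statuses.items() if is_up}
    if up == {"INDEX_FINGER", "MIDDLE_FINGER"}:
        return "V SIGN"
    if up == {"THUMB", "INDEX_FINGER", "PINKY"}:
        return "SPIDERMAN SIGN"
    if up == ALL_FINGERS:
        return "HIGH-FIVE SIGN"
    return "UNKNOWN"


def make_statuses(*raised):
    """Build a status dict with only the named fingers up (example helper)."""
    return {finger: finger in raised for finger in ALL_FINGERS}


print(recognize_gesture(make_statuses("INDEX_FINGER", "MIDDLE_FINGER")))
print(recognize_gesture(make_statuses("THUMB", "INDEX_FINGER", "PINKY")))
print(recognize_gesture(make_statuses(*ALL_FINGERS)))
```

Supporting another gesture is then a matter of adding one more set comparison before the final return.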

Now we will utilize the function recognizeGestures() created above to perform hand gesture recognition on a few sample images and display the results.

Step 5: Build a Selfie-Capturing System controlled by Hand Gestures

In this last step, we will utilize the gesture recognizer we made in the previous step to trigger a few events. Our gesture recognizer can identify only three gestures (i.e., V Hand Gesture (✌️), SPIDERMAN Hand Gesture (🤟), and HIGH-FIVE Hand Gesture (✋)).

So to get the most out of it, we will create a Selfie-Capturing System that will be controlled using hand gestures. We will allow the user to capture and store images to disk using the ✌️ gesture. And to spice things up, we will also implement a filter-applying mechanism in our system that will be controlled by the other two gestures. The 🤟 gesture will be used to apply the filter on the image/frame, and the ✋ gesture will be used to turn the filter off.
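The gesture-to-event mapping can be sketched as a small per-frame state update. This is a hypothetical sketch, not the tutorial's implementation: the function name step(), the state dictionary, and the label strings are assumptions, and so is the choice to capture only on the first frame of a ✌️ gesture so that holding the gesture does not save one image per frame.

```python
def step(state, gesture):
    """Advance the controller by one frame.

    state: dict with keys "filter_on" (bool) and "last_gesture" (str or None).
    Returns True when a selfie should be captured on this frame.
    """
    # Capture only when the V sign first appears (an assumed design choice).
    capture = gesture == "V SIGN" and state["last_gesture"] != "V SIGN"
    if gesture == "SPIDERMAN SIGN":
        state["filter_on"] = True
    elif gesture == "HIGH-FIVE SIGN":
        state["filter_on"] = False
    state["last_gesture"] = gesture
    return capture


# Simulated gestures for five consecutive frames (None means no gesture).
state = {"filter_on": False, "last_gesture": None}
frames = ["SPIDERMAN SIGN", "V SIGN", "V SIGN", None, "HIGH-FIVE SIGN"]
filter_history = []
captures = []
for gesture in frames:
    captures.append(step(state, gesture))
    filter_history.append(state["filter_on"])
print(captures, filter_history)
```

In a real loop, a True return value would be the point at which the current frame is written to disk, and state["filter_on"] would decide whether the filter is drawn on the frame.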

Output Video

As expected, the results are amazing and the system is working very smoothly. If you want, you can extend this system to have multiple filters and introduce another gesture to switch between the filters.


Summary:

In this tutorial, we have learned to perform landmarks detection on the prominent hands in images/videos, to get twenty-one 3D landmarks, and then use those landmarks to extract useful info about each finger of the hands i.e., whether the fingers are up or down. Using this methodology, we have created a finger counter and recognition system and then learned to visualize its results.

We have also built a hand gesture recognizer capable of identifying three different gestures of the hands in images/videos, based on the status (i.e., up or down) of the fingers in real-time, and utilized the recognizer in our Selfie-Capturing System to trigger multiple events.

Now, here are a few limitations of our application that you should know about. For our finger counter to work properly, the user has to face the palm of their hand towards the camera, because the direction of the thumb changes with the orientation of the hand and the approach we are using depends completely upon that direction. See the image below.

But you can easily overcome this limitation by using accumulated angles of joints to check whether each finger is bent or straight. For that, you can check out the tutorial I published on Real-Time 3D Pose Detection, as I used a similar approach in it to classify the poses.
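As a rough illustration of the angle-based idea, the sketch below measures the angle at a single joint from three 2D points and treats a nearly straight angle as a raised finger. It is a simplification of the accumulated-angles approach mentioned above: the function names, the use of only one joint per finger, and the 160-degree threshold are all assumptions made for the example.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("joint_angle needs three distinct points")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def is_finger_straight(mcp, pip, tip, threshold_degrees=160.0):
    """Treat the finger as straight when the PIP joint is almost unbent."""
    return joint_angle(mcp, pip, tip) >= threshold_degrees


straight = is_finger_straight((0.0, 0.0), (0.0, 1.0), (0.0, 2.0))
bent = is_finger_straight((0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
print(straight, bent)
```

Because the angle does not depend on which way the hand is rotated in the image, this kind of check does not require the palm to face the camera the way the coordinate comparison does.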

Another limitation is that we are using the finger counter to determine the gestures of the hands, and unfortunately, different hand gestures can have the same fingers up/down, like the victory hand gesture (✌) and the crossed fingers gesture (🤞). To get around this, you can train a deep learning model on some target gestures.


10 Comments

  1. Abhinav

    Thank you for the amazing demo 🙂 The best approach I’ve come across for solving the finger counting problem with the python mediapipe library. Gestures triggering events are the cherry on top of this fantastically baked tutorial.
    Cheers!

    • Taha Anwar

      Thank you Abhinav for your kind comments, I’m glad you loved it.

  2. Thomas Hu

    One of the best mediapipe tutorials. I downloaded the source code. However, it is from your previous tutorial, “Real-Time 3D Hands Landmarks Detection & Hands Classification with Mediapipe and Python”. Would you please update the link to the source code or e-mail it to me? Thanks!

    • Taha Anwar

      Thank you, Thomas. I will look into this and update the source code in 1-2 days.

      • Diego L. M.

        Hello Taha! Thanks for the course 🙂

        I also tried to download it and got the same files from the previous tutorial.

        Could you update this? Thank you very much!

        • Taha Anwar

          Hi Diego, I think this issue was fixed.

  3. 魚丸

    Could you tell me about this function?
    def annotate(image, results, fingers_statuses, count, display=True):

    What does line 93 mean?
    ROI[alpha_channel==255] = hand_imageBGR[alpha_channel==255]

    I tried to run it, but it did not succeed.
    I also tried to convert ROI (line 88) from [BGR] to [BGRA], because ROI comes from output_image (which is just [BGR]).
    But ROI is only a region, so can a part of output_image ([BGR]) be changed to [BGRA]?
    I am from Taiwan and my English is not good. I tried my best to ask my question in English, and I hope you can understand. Thank you!

    • Rizwan Naeem

      The line ROI[alpha_channel==255] = hand_imageBGR[alpha_channel==255] simply overlays the handprint image on the ROI image by updating the pixel values of the ROI at the indexes where the alpha channel has the value 255.

      ROI is just a cropped part of the output image (which is a copy of the input image). We cannot and do not need to convert the output image into [BGRA], because the output image has no alpha channel; the alpha channel we have is that of the handprint image.

  4. Zixuan Xu

    Great tutorial, sir
    I was confused about the last part, line 122:
    “frame[filter_imageBGRA[:,:,-1]==255] = filter_imageBGR[filter_imageBGRA[:,:,-1]==255]”. What does it mean?
    Could you please explain this a little? Thanks!

    • Rizwan Naeem

      Thanks, Zixuan Xu. This line updates the pixel values of the frame with the pixel values of the filter image at the indexes where the alpha channel of the filter image has the value 255. In simple words, this line will simply overlay the filter image over the webcam frame.
