Develop an integrated architecture, implemented on a microcontroller, that recognizes hand gestures while optimizing model performance.
Vision-based hand gesture recognition is an important part of human-computer interaction (HCI). Technologies such as speech recognition and gesture recognition receive great attention in the field of HCI. The problem was originally tackled by the computer vision community by means of images and videos. More recently, the introduction of low-cost consumer depth cameras has opened the way to several approaches that exploit the depth information acquired by these devices to improve gesture recognition performance.
The literature study gives insight into the many strategies that may be considered and executed to achieve hand gesture recognition. It also assists in comprehending the benefits and drawbacks of the various strategies. The literature review is separated into two parts: the detection module and the camera module.
In the literature, data gloves, hand belts, and cameras are the most commonly used means of gathering user input. In many research articles, gesture recognition relies on input captured with data gloves or with a hand belt equipped with an accelerometer and Bluetooth to read hand motions. For image pre-processing, a variety of approaches have been used, including algorithms for noise removal, edge detection, and smoothing, followed by segmentation techniques for boundary extraction, i.e., separating the foreground from the background. In this project, a standard 2D camera was used for gesture recognition. It was previously thought that a single camera may not be as effective as stereo or depth-aware cameras, but some companies are challenging this assumption. For this reason, we used the Edge Impulse framework to build a software-based gesture recognition system that detects hand gestures robustly with a standard 2D camera. The range of image-based gesture recognition systems may also raise concerns about the technology’s practicality for broad application. For example, an algorithm calibrated for one camera may not work with another device or camera. To address this challenge, one class was taken from an existing dataset during dataset creation, and the approach was validated using the FOMO algorithm.
The objective was to develop an integrated architecture, implemented on a microcontroller, that recognizes hand gestures while optimizing model performance. As the TinyML platform, we chose an OpenMV microcontroller, which acted as the decision unit. The OpenMV (shown in Figure 1) is a small, low-power microcontroller board that enables easy and intuitive implementation of image processing applications. It can be programmed with high-level Python scripts (MicroPython). It is driven by an STM32H743VI ARM Cortex M7 processor running at 480 MHz, which is suitable for most machine vision applications. OpenMV was particularly suitable for our approach because of its low power consumption and because simple algorithms run at 25–50 FPS at QVGA (320x240) resolution and below. It is equipped with a high-performance camera, which we used to collect data for this project.
Figure 1: OpenMV microcontroller
Data collection process
This project was built from scratch on the OpenMV microcontroller. The first step was data collection: creating the data used to train the model.
For this step, the OpenMV was used with its built-in camera, together with OpenMV IDE, to create the dataset. In total, 30 images of my hand showing 3 different gestures were captured and split into 3 folders, each named after its class. All the prepared training images are stored in the dataset folder. In addition, the superb class was taken from an existing dataset, not captured with the microcontroller, to compare the results with the created dataset and the testing data.
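The folder layout described above (one sub-folder per class inside the dataset folder) can be sanity-checked on the laptop before uploading. The following is a minimal sketch in desktop Python, not part of the original workflow; the function name, the accepted file extensions, and the example path are assumptions made for illustration.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever format the IDE saved.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(dataset_dir):
    """Return {class_name: image_count} for each sub-folder of dataset_dir."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore loose files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # "dataset" is a hypothetical path to the folder created with OpenMV IDE.
    if Path("dataset").is_dir():
        for name, n in count_images_per_class("dataset").items():
            print(name, n)
```

A strongly unbalanced count between classes is worth fixing at this stage, since it is cheaper to capture more images now than to retrain later.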
To create a dataset using OpenMV IDE, first connect the OpenMV to the laptop with the USB cable. Click the connect button to connect and load the default data acquisition program. Once connected, you can start taking images of the object; they are saved in the selected class folder.
Figure 2 shows the steps to follow to create a dataset directory, and Figure 3 shows the dataset folder, in which each class consists of images with unique IDs.
Figure 2: Dataset creation using OpenMV IDE
Figure 3: index class representation without label
Once the dataset was created, all the images were uploaded to Edge Impulse for labeling. Figure 4 shows how to upload the data to the Edge Impulse platform for labeling before processing, and Figure 5 shows a labeled image for the horns class.
Figure 4: Uploading data into Edge Impulse platform
Figure 5: Labeling of horns class image
Building and training the model
With the images in the dataset labeled, the FOMO algorithm was used to train a model. FOMO (Faster Objects, More Objects) is a machine learning approach that brings object detection to devices with limited processing power. It can count objects, locate objects in an image, and track multiple objects in real time while consuming up to 30x less computing power and memory than MobileNet SSD or YOLOv5. The dataset visualization and the separability of the classes are presented in Figure 6. Even after rescaling and color conversion, image features have a high dimensionality that prevents suitable visualization. Each image was resized to 48 x 48 pixels, and data augmentation was applied.
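To make the FOMO output more concrete: instead of bounding boxes, FOMO produces a coarse grid of per-class scores, and objects are reported as centroids. The sketch below is an illustration in plain Python, not the Edge Impulse implementation. It assumes an output grid 8x smaller than the input (so a 48 x 48 image gives a 6 x 6 grid), that class index 0 is the background, and a hypothetical threshold of 0.5; it also skips the merging of adjacent cells that a full implementation would perform.

```python
def decode_fomo_grid(grid, labels, input_size=48, threshold=0.5):
    """Turn a grid of per-class scores into (label, x, y, score) centroids.

    grid[row][col] is a list of scores, one per entry in labels;
    labels[0] is assumed to be the background class.
    """
    rows, cols = len(grid), len(grid[0])
    cell_h = input_size / rows
    cell_w = input_size / cols
    detections = []
    for r, row in enumerate(grid):
        for c, scores in enumerate(row):
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == 0 or scores[best] < threshold:
                continue  # background or low-confidence cell
            # Report the centre of the cell in input-image pixel coordinates.
            x = (c + 0.5) * cell_w
            y = (r + 0.5) * cell_h
            detections.append((labels[best], x, y, scores[best]))
    return detections


if __name__ == "__main__":
    labels = ["background", "horns", "index"]  # hypothetical label order
    grid = [[[1.0, 0.0, 0.0] for _ in range(6)] for _ in range(6)]
    grid[2][3] = [0.1, 0.8, 0.1]  # one confident "horns" cell
    print(decode_fomo_grid(grid, labels))
```

Because only cell centres are reported, the localization resolution is limited by the grid size, which is part of what keeps the model small.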
The number of epochs is the number of times the entire dataset is passed through the neural network during training. There is no ideal number of epochs; it depends on the data. In total, the model was trained for 60 epochs with a learning rate of 0.001, and the dataset was split into training, validation, and testing sets.
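Edge Impulse performs the split automatically, but the idea can be sketched in a few lines of Python. The epoch count and learning rate below come from the text; the 60/20/20 split fractions, the seed, and the function name are assumptions chosen only for illustration.

```python
import random

EPOCHS = 60            # from the training configuration above
LEARNING_RATE = 0.001  # from the training configuration above


def split_dataset(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names standing in for the 30 captured images.
    images = ["img_%02d.jpg" % i for i in range(30)]
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))
```

Fixing the seed keeps the test images the same between training runs, so accuracy figures remain comparable.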
After applying dynamic quantization from 32-bit floating point to 8-bit integer, the resulting optimized model showed a significant reduction in size (75.9 KB). The on-board inference time was reduced to 70 ms and RAM usage was limited to 63.9 KB, with an accuracy of 87.8% in post-training validation. The model confusion matrix and on-device performance can be seen in Figure 7.
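The size reduction follows from storage alone: an 8-bit integer takes 1 byte where a 32-bit float takes 4, so quantized weights need roughly a quarter of the space. The sketch below shows generic affine (scale and zero-point) quantization in plain Python. It illustrates the idea only and is not necessarily the exact scheme applied by Edge Impulse; the function names are assumptions.

```python
def quantize_int8(values):
    """Map floats to int8 using a shared scale and zero point."""
    lo = min(min(values), 0.0)  # make sure 0.0 is inside the range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]  # hypothetical weights
    q, scale, zp = quantize_int8(weights)
    print(q, scale, zp)
    print(dequantize(q, scale, zp))
```

The round trip is lossy: each value is recovered only to within about one quantization step, which is why accuracy is re-validated after quantization.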
Figure 6: Dataset visualization and separability of the classes
Figure 7: Confusion Matrix with model accuracy after quantization
Figure 8 shows the block diagram for deploying the model on the microcontroller. The red bounding box marks the steps: the model is first trained on the given data, then converted to a tflite file, and finally deployed on the microcontroller.
In our case, the firmware has to be built using the Edge Impulse platform; Figure 9 shows the steps for OpenMV, marked with a red bounding box. Impulses can be deployed as a C++ library, which you can include in your own application to run the impulse locally. The zip archive from Edge Impulse contains three files: a MicroPython script, a labels file (txt), and a tflite file. Once we have the tflite file, we can deploy it on our microcontroller, in our case the OpenMV. Copy the tflite and labels files from the archive to the OpenMV disk, open the MicroPython script in OpenMV IDE, and start inference. For further details on OpenMV development, refer to the OpenMV documentation.
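The labels file ties the model's numeric outputs back to class names: line i of the file names output i. The sketch below shows that mapping in plain Python so it can be tried on a laptop. It is not the script generated by Edge Impulse; the function names and the example labels are assumptions.

```python
def load_labels(text):
    """Parse the contents of a labels file: one class name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_class(scores, labels):
    """Return (label, score) for the highest-scoring model output."""
    if len(scores) != len(labels):
        raise ValueError("model outputs and labels must have the same length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


if __name__ == "__main__":
    # Hypothetical file contents; on the device this text would be read
    # from the labels file copied to the OpenMV disk.
    labels = load_labels("background\nhorns\nindex\n")
    print(top_class([0.1, 0.7, 0.2], labels))
```

If the labels file and the tflite file come from different exports, the lengths can disagree, which is why the sketch checks them before indexing.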
Figure 8: Block diagram for tflite model deployment on OpenMV
Figure 9: Post quantization model deployment
To test the model, we used the hand gesture images that were set aside as the test split during processing. Live testing on the Edge Impulse website takes an input image and predicts the class it belongs to. Before passing an image to the model, we need to ensure that it has the same dimensions that were used during training; here this is handled by default. Figure 10 shows a live classification result, and Figures 11 and 12 show on-device testing results for two different classes.
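Test results of this kind are usually summarized as a confusion matrix and an accuracy figure. The sketch below shows how both can be computed from lists of true and predicted labels in plain Python; the function names and the example labels are hypothetical and are not the project's actual test results.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    counts = confusion_counts(true_labels, predicted_labels)
    total = sum(counts.values())
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical test labels, for illustration only.
    true = ["horns", "horns", "index", "index"]
    pred = ["horns", "index", "index", "index"]
    print(confusion_counts(true, pred))
    print(accuracy(true, pred))
```

The off-diagonal entries show which gestures are confused with each other, which is more useful for deciding what extra data to collect than the accuracy number alone.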
Figure 10: Live classification result of class finger on Edge Impulse platform
Figure 11: Classification result of class index using OpenMV
Figure 12: Classification result of class horns using OpenMV
In this project, we built a hand gesture recognition model based on the FOMO algorithm. The results show that the accuracy of the proposed approach on a TinyML device reaches 87.8%. However, the effectiveness of the method is limited because the gesture dataset is small. We can therefore improve the accuracy of the recognition and detection steps with more data and more classes.
Public Project Link: Hand Gesture Recognition using TinyML on OpenMV — Hackster.io