Overview
Vision-based hand gesture recognition is an important part of human-computer interaction (HCI). Technologies such as speech recognition and gesture recognition receive great attention in the field of HCI. The problem was originally tackled by the computer vision community by means of images and videos. More recently, the introduction of low-cost consumer depth cameras has opened the way to several approaches that exploit the depth information acquired by these devices to improve gesture recognition performance.
The literature study gives insight into the many strategies that can be used to achieve hand gesture recognition, and it helps in understanding the benefits and drawbacks of each. The literature review is separated into two parts: the detection module and the camera module.
Problem Description
In the literature, data gloves, hand belts, and cameras are the most frequently used means of gathering user input. In many research articles, gesture recognition relies on input captured with data gloves or with a hand belt equipped with an accelerometer and Bluetooth to read hand motions. For image pre-processing, a variety of approaches have been used, including algorithms for noise removal, edge detection, and smoothing, followed by segmentation techniques for boundary extraction, i.e., separating the foreground from the background. In this work, a standard 2D camera was used for gesture recognition. It was previously thought that a single camera may not be as effective as stereo or depth-aware cameras, but some companies are challenging this assumption. For this reason, we used the Edge Impulse [1] framework to build a software-based gesture recognition system that relies on a standard 2D camera and can robustly detect hand gestures. The variety of image-based gesture recognition systems may also raise concerns about the technology's practicality for broad application. For example, an algorithm calibrated for one camera may not function with another device or camera. To cope with this challenge, during dataset creation we took a class from the dataset in [2] and validated the approach using the FOMO [3] algorithm.
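The pre-processing chain mentioned above (noise removal and smoothing, segmentation, boundary extraction) can be illustrated with a minimal pure-Python sketch. It is not the pipeline of any cited work: the 3x3 mean filter, the fixed threshold of 100, the 4-neighbour boundary rule, and all function names (including the synthetic `demo_image`) are assumptions chosen only for illustration.

```python
# Illustrative pre-processing sketch; every name and parameter here is hypothetical.
# A grayscale image is represented as a list of rows of ints in 0..255.

def box_blur(img):
    """Noise removal/smoothing with a 3x3 mean filter (border pixels are clamped)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total // 9
    return out

def threshold(img, t):
    """Segmentation: 1 (foreground) where the pixel exceeds t, else 0 (background)."""
    return [[1 if p > t else 0 for p in row] for row in img]

def boundary(mask):
    """Boundary extraction: foreground pixels that touch background (or the image
    border) through one of their four direct neighbours."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if not (0 <= yy < h and 0 <= xx < w) or not mask[yy][xx]:
                    edges[y][x] = 1
                    break
    return edges

def demo_image():
    """Synthetic 10x10 test image: dark background, a bright 4x4 'hand' blob,
    and a single bright noise pixel."""
    img = [[10] * 10 for _ in range(10)]
    for y in range(3, 7):
        for x in range(3, 7):
            img[y][x] = 200
    img[1][1] = 255  # isolated noise pixel
    return img

if __name__ == "__main__":
    mask = threshold(box_blur(demo_image()), 100)
    for row in boundary(mask):
        print("".join("#" if v else "." for v in row))
```

Thresholding the raw image would keep the isolated noise pixel as foreground. Smoothing first suppresses it, at the cost of rounding off the corners of the blob, which is the usual trade-off of a mean filter.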
Proposed Hardware
The objective was to develop an integrated architecture, implemented on a microcontroller, that recognizes hand gestures while optimizing model performance. As the TinyML platform, we chose an OpenMV [4] microcontroller, which acted as the decision unit. The OpenMV (shown in Figure 1) is a small, low-power microcontroller board that enables easy and intuitive implementation of image processing applications. It can be programmed using high-level Python scripts (MicroPython). It is driven by an STM32H743VI ARM Cortex-M7 processor running at 480 MHz, which is suitable for most machine vision applications. The OpenMV was particularly suitable for our proposed approach because of its low power consumption and because simple algorithms run at 25–50 FPS at QVGA (320x240) resolution and below. It is equipped with a high-performance camera, which we used to collect the data for this work.
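To put the quoted figures in perspective, the following sketch converts the 480 MHz clock and the 25–50 FPS range at QVGA into a rough per-frame and per-pixel processing budget. It is a back-of-the-envelope upper bound only: it assumes every clock cycle is available to the algorithm and ignores memory wait states, camera transfer, and interpreter overhead. The function name `frame_budget` is hypothetical.

```python
CPU_HZ = 480_000_000       # processor clock stated in the text
WIDTH, HEIGHT = 320, 240   # QVGA resolution

def frame_budget(fps, cpu_hz=CPU_HZ, width=WIDTH, height=HEIGHT):
    """Return (milliseconds per frame, CPU cycles per frame, CPU cycles per pixel).

    Idealized estimate: assumes all cycles are spent on image processing.
    """
    ms_per_frame = 1000.0 / fps
    cycles_per_frame = cpu_hz // fps
    cycles_per_pixel = cycles_per_frame / (width * height)
    return ms_per_frame, cycles_per_frame, cycles_per_pixel

if __name__ == "__main__":
    for fps in (25, 50):
        ms, cycles, per_pixel = frame_budget(fps)
        print(f"{fps} FPS: {ms:.0f} ms/frame, {cycles} cycles/frame, "
              f"{per_pixel:.0f} cycles/pixel")
```

Under these assumptions, the budget is 40 ms and at most 250 cycles per pixel at 25 FPS, and 20 ms and at most 125 cycles per pixel at 50 FPS. This is consistent with the statement that only simple algorithms reach the upper end of the frame-rate range.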