Hand gesture recognition algorithm based on depth vision combined with statistical image processing and shallow CNN

3D-AI GROUP

With the rapid development of artificial intelligence technology, hand gesture recognition is becoming one of the most effective and important means of human-computer interaction. This paper studies hand gesture recognition in complex real-world scenes and achieves real-time recognition with high robustness and accuracy by combining depth vision, a statistical gesture segmentation algorithm, and classification with a shallow CNN.

In many large-scale gesture interaction scenes, recognizing a hand gesture first requires locating the hand region of interest (ROI). In our experiments, we found that Kinect can provide human joint information, from which the hand ROI can be extracted according to the joint positions. In complex scenes where people overlap, however, Kinect skeleton extraction fails: two people are mistaken for one, and incorrect joint positions are returned. We therefore use depth information to detect the hand region, generating the ROI for recognition and classification through background removal, point cloud conversion, plane detection and other statistical image processing algorithms that combine 2D pixel information with 3D depth information. In the classification stage, however, recognition accuracy became a bottleneck, and overfitting occurred as the data volume increased, so the feasibility of a CNN-based scheme was studied and demonstrated.
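The depth-based steps named above (background removal, point cloud conversion, plane detection) can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the `max_depth` cut-off, the RANSAC parameters, and the idea of treating the dominant plane as non-hand structure are all assumptions made for illustration.

```python
import random

def depth_to_points(depth, fx, fy, cx, cy, max_depth):
    """Back-project a depth image (list of rows, metres) to 3D points.

    Pixels with depth 0 (invalid) or beyond max_depth (assumed background)
    are dropped, which doubles as a crude background removal step.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if 0.0 < z <= max_depth:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def plane_from_three(p, q, r):
    """Return (unit normal, d) with normal . x + d = 0, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return indices of the points lying on the dominant plane (RANSAC)."""
    if len(points) < 3:
        return []
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        model = plane_from_three(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [i for i, p in enumerate(points)
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Points that survive the depth cut-off but do not belong to the dominant plane are candidates for the hand region; a real pipeline would still need clustering and projection back to 2D pixels to obtain the ROI.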

The feature maps are mapped back to the original image space through deconvolution and unpooling. The features visualized from each layer of the convolutional neural network are shown in the figure on the right. Verification with AlexNet shows that the features extracted by a CNN are not random or unexplainable but are consistent with intuitive expectations: shallow layers learn low-level features such as corners, colors and edges, and as the layers deepen the responses become more abstract, yielding complete, discriminative features that enable classification.
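To make the unpooling step concrete, the sketch below shows one common way to invert max pooling approximately: record the position ("switch") of each maximum during pooling, then place the pooled values back at those positions and fill the rest with zeros. The function names are hypothetical and the 2x2 window is an assumption; the source does not specify how its visualization was implemented.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling that also records where each max came from."""
    h, w = len(fmap), len(fmap[0])
    pooled, switches = [], []
    for i in range(0, h - size + 1, size):
        pooled_row, switch_row = [], []
        for j in range(0, w - size + 1, size):
            window = [(fmap[i + di][j + dj], (i + di, j + dj))
                      for di in range(size) for dj in range(size)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(pos)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches

def unpool(pooled, switches, shape):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    h, w = shape
    out = [[0.0] * w for _ in range(h)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            out[i][j] = value
    return out
```

Because only the maxima are restored, unpooling keeps the spatial location of the strongest activations, which is what lets the projected features line up with structures in the input image.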

In this paper, combining a shallow CNN with statistical image segmentation mitigates overfitting and effectively improves recognition accuracy compared with a traditional SVM. Dynamic time warping (DTW) is then used to match the static gesture results of successive frames against templates, yielding the recognition result for the action sequence. A comparison of the recognition accuracy of the different methods is given below.
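As an aside on the DTW step described above, the following minimal sketch matches a per-frame sequence of static gesture labels against action templates. The 0/1 label cost, the template dictionary, and the gesture names used in the usage note are hypothetical; the source does not state which cost function or templates were used.

```python
def dtw_distance(seq, template, cost=lambda a, b: 0.0 if a == b else 1.0):
    """Classic dynamic time warping distance between two label sequences."""
    n, m = len(seq), len(template)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost aligning seq[:i] with template[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq[i - 1], template[j - 1])
            acc[i][j] = step + min(acc[i - 1][j],      # seq advances alone
                                   acc[i][j - 1],      # template advances alone
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m]

def classify_action(seq, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(seq, templates[name]))
```

For example, with hypothetical templates `{"stop": ["palm", "fist", "palm"], "go": ["point", "palm", "point"]}`, the observed sequence `["palm", "palm", "fist", "fist", "fist", "palm"]` aligns with "stop" at zero cost, because DTW absorbs the repeated frames of each static gesture.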

Because the proposed algorithm combines statistical image processing for gesture segmentation with a shallow convolutional neural network, its computational complexity is greatly reduced compared with end-to-end deep learning algorithms. It runs in real time at 12.5 FPS with a gesture recognition accuracy above 95% on an i5 CPU.

In addition, this work has been applied to gesture recognition for railway train drivers. The demonstration at Guangzhou South Railway Station performed excellently, as shown below.