Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks

  1. Rubén González-Crespo
  2. Elena Verdú
  3. Manju Khari
  4. Aditya Kumar Garg
Journal:
IJIMAI

ISSN: 1989-1660

Year of publication: 2019

Volume: 5

Issue: 7

Pages: 22-27

Type: Article

DOI: 10.9781/IJIMAI.2019.09.002


Abstract

The interaction between humans and computers has always been a fascinating field, and with the rapid development of computer vision, gesture-based recognition systems have become an interesting and diverse research topic. Recognizing human gestures in the form of sign language, however, is a complex and challenging task. Various traditional methods have been used for sign language recognition, but achieving high accuracy remains difficult. This paper proposes an RGB and RGB-D static gesture recognition method based on a fine-tuned VGG19 model. The fine-tuned VGG19 model uses a layer that concatenates features from the RGB and RGB-D images to increase the accuracy of the neural network. The authors implement the proposed model on an American Sign Language (ASL) recognition dataset, achieve a 94.8% recognition rate, and compare the model with other CNN and traditional algorithms on the same dataset.
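The architecture outlined in the abstract (two input modalities fused through a feature-concatenation layer on top of a fine-tuned VGG19) could be sketched roughly as follows. This is a minimal illustration in PyTorch/torchvision, not the authors' code: the two-stream layout, the 4096-dimensional fc7 features, the fused classification head, and the 24-class output (static ASL letters, excluding J and Z) are all assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of a two-stream VGG19 that
# concatenates RGB and depth features before classification. Assumes
# PyTorch/torchvision and 24 static ASL letter classes (J and Z excluded).
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamVGG19(nn.Module):
    def __init__(self, num_classes: int = 24):
        super().__init__()
        # Two ImageNet-pretrained VGG19 backbones, one per modality.
        self.rgb_stream = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.depth_stream = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        # Drop the original 1000-way classifier layer; keep the 4096-d fc7 features.
        self.rgb_stream.classifier = nn.Sequential(
            *list(self.rgb_stream.classifier.children())[:-1])
        self.depth_stream.classifier = nn.Sequential(
            *list(self.depth_stream.classifier.children())[:-1])
        # Feature-concatenation layer followed by a small classification head.
        self.head = nn.Sequential(
            nn.Linear(4096 * 2, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Each stream yields a 4096-d descriptor; concatenate along the feature axis.
        f_rgb = self.rgb_stream(rgb)
        f_depth = self.depth_stream(depth)
        fused = torch.cat([f_rgb, f_depth], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    model = TwoStreamVGG19()
    rgb = torch.randn(2, 3, 224, 224)    # batch of RGB images
    depth = torch.randn(2, 3, 224, 224)  # depth maps replicated to 3 channels (assumption)
    print(model(rgb, depth).shape)       # torch.Size([2, 24])
```

In such a setup, fine-tuning would typically freeze the early convolutional blocks and train the later layers plus the fused head on the ASL dataset; the exact layers unfrozen here are illustrative.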

Bibliographic References

  • W. H. O. (WHO), “Deafness and hearing loss,” 2019. Available at: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss (last visited on 13 June 2019).
  • L. Kin, T. Tian, R. Anuar, Z. Yahya, and A. Yahya, “Sign Language Recognition System using SEMG and Hidden Markov Model,” Conference on Recent Advances in Mathematical Methods, Intelligent Systems and Materials, 2013, pp. 50–53.
  • M. P. Lewis, G. F. Simons, and C. D. Fennig, Ethnologue: Languages of the World, 17th edn. Dallas: SIL International, 2013.
  • E. Verdú, C. Pelayo G-Bustelo, M. A. Martínez and R. González-Crespo, “A System to Generate SignWriting for Video Tracks Enhancing Accessibility of Deaf People,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 6, pp. 109-115, 2017. doi: 10.9781/ijimai.2017.09.002
  • R.E. Mitchell, T.A. Young, B. Bachleda, M.A. Karchmer, “How many people use ASL in the United States? Why estimates need updating,” Sign Language Studies, vol. 6, no. 3, pp. 306–335, 2006.
  • A. Kumar, A. Kumar, S. K. Singh and R. Kala, “Human Activity Recognition in Real-Times Environments using Skeleton Joints,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 3, no. 7, pp. 61-69, 2016. doi: 10.9781/ijimai.2016.379
  • M. Raees and S. Ullah, “EVEN-VE: Eyes Visibility Based Egocentric Navigation for Virtual Environments,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 3, pp. 141-151, 2018. doi: 10.9781/ijimai.2018.08.002
  • I. Rehman, S. Ullah and M. Raees, “Two Hand Gesture Based 3D Navigation in Virtual Environments,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 4, pp. 128-140, 2019. doi: 10.9781/ijimai.2018.07.001
  • D. Aryanie and Y. Heryadi, “American sign language-based finger-spelling recognition using k-Nearest Neighbors classifier,” 2015 3rd International Conference on Information and Communication Technology (ICoICT), Nusa Dua, 2015, pp. 533-536. doi: 10.1109/ICoICT.2015.7231481
  • M. M. Islam, S. Siddiqua and J. Afnan, “Real time Hand Gesture Recognition using different algorithms based on American Sign Language,” 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Dhaka, 2017, pp. 1-6. doi: 10.1109/ICIVPR.2017.7890854
  • C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1-9. doi: 10.1109/CVPR.2015.7298594
  • K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • C. Szegedy, V. Vanhoucke, S. Ioffe et al., “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 2016, pp. 2818-2826.
  • V. Gulshan, L. Peng, M. Coram, et al., “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA, vol. 316, no. 22, pp. 2402–2410, 2016. doi:10.1001/jama.2016.17216
  • B. Xie, X. He, and Y. Li, “RGB-D static gesture recognition based on convolutional neural network,” The Journal of Engineering, vol. 2018, no. 16, pp. 1515-1520, 2018, doi: 10.1049/joe.2018.8327
  • N. Pugeault and R. Bowden, “Spelling it out: Real-time ASL fingerspelling recognition,” 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, 2011, pp. 1114-1119. doi: 10.1109/ICCVW.2011.6130290
  • B. Estrela, G. Cámara-Chávez, M.F. Campos, W.R. Schwartz and E.R. Nascimento, “Sign language recognition using partial least squares and RGB-D information,” In Proceedings of the IX Workshop de Visao Computacional, WVC, 2013.
  • C. Chuan, E. Regina and C. Guardino, “American Sign Language Recognition Using Leap Motion Sensor,” 2014 13th International Conference on Machine Learning and Applications, Detroit, MI, 2014, pp. 541-544. doi: 10.1109/ICMLA.2014.110
  • L. Rioux-Maldague and P. Giguère, “Sign Language Fingerspelling Classification from Depth and Color Images Using a Deep Belief Network,” 2014 Canadian Conference on Computer and Robot Vision, Montreal, QC, 2014, pp. 92-97. doi: 10.1109/CRV.2014.20
  • S. Ameen and S. Vadera, “A convolutional neural network to classify American Sign Language fingerspelling from depth and colour images,” Expert Systems, vol. 34, no. 3, e12197, 2017.

TABLE I. Comparison between Traditional Models and Proposed Model

  Recognition Method    Recognition Rate
  Gabor+RDF             75%
  SIFT+PLS              71.51%
  H3DF+SVM              73.3%
  Our Model             94.8%

TABLE II. Comparison between Other CNN Models and Proposed Model

  Recognition Method    Recognition Rate
  CaffeNet              73.75%
  VGG16                 83.44%
  VGG19                 87.37%
  Inception V3          88.15%
  Our Model             94.8%
  • Q. Dai, J. Hou, P. Yang, X. Li, F. Wang, and X. Zhang, “The Sound of Silence: End-to-End Sign Language Recognition Using SmartWatch,” in Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, 2017, pp. 462-464.
  • W. Tao, M.C. Leu and Z. Yin, “American Sign Language alphabet recognition using Convolutional Neural Networks with multiview augmentation and inference fusion,” Engineering Applications of Artificial Intelligence, vol. 76, pp. 202-213, 2018.
  • T. W. Chong and B.G. Lee, “American sign language recognition using leap motion controller with machine learning approach,” Sensors, vol. 18, no. 10, 3554, 2018.
  • K. M. Lim, A. W. C. Tan, C. P. Lee and S. C. Tan, “Isolated sign language recognition using Convolutional Neural Network hand modelling and Hand Energy Image,” Multimedia Tools and Applications, vol. 78, no. 14, pp. 19917-19944, 2019.
  • J. Hou, X. Y. Li, P. Zhu, Z. Wang, Y. Wang, J. Qian, J. and P. Yang, “SignSpeaker: A Real-time, High-Precision SmartWatch-based Sign Language Translator,” in Proceedings of the 25th Annual International Conference on Mobile Computing and Networking (MobiCom ’19), Los Cabos, Mexico, 2019, article no. 24.
  • A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, Lake Tahoe, Nevada, 2012, pp. 1097-1105.