Real-Time Facial Expression Recognition and Speech Transcripts over an on-premise Video Conference Application

Document Type: Original Article

Authors

Electronics and Communication Engineering Department, Faculty of Engineering, Mansoura University

Abstract

Since the outbreak of the Covid-19 pandemic, organizations and individuals have increasingly had to rely on video conference applications. However, commercial video conference applications are expensive and limited in features. This paper discusses how organizations can host video conference applications on-premise. It then explores how an organization's stakeholders can be assisted in making decisions based on the facial expressions of video conference attendees. Moreover, it transcribes speech into text so that deaf persons can participate in online conferences. The technologies and tools used to address these three challenges are, respectively: (i) the Web Real-Time Communication (WebRTC) project, (ii) the TensorFlow.js library, and (iii) the Web Speech Application Programming Interface (API). The work depends on integrating a collection of technologies, libraries, standards, and protocols, most of which can be managed from a JavaScript framework. Hence, the processing load is distributed across the client-side devices. The proposed on-premise video conference application has been enhanced with facial expression recognition that reaches 66% accuracy and with a speech-to-text feature whose Word Error Rates (WER) are 0 and 0.12 for British English and Egyptian Arabic, respectively.
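For readers unfamiliar with the WER metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognized text, divided by the number of reference words. The function names `tokenize` and `wordErrorRate`, the whitespace tokenization, and the lower-casing step are assumptions made for illustration; they are not taken from the paper, whose exact evaluation procedure is not described here.

```typescript
// Illustrative sketch only: names and text normalization are assumptions,
// not the evaluation code used in the paper.

// Split a transcript into lower-cased words on whitespace.
function tokenize(text: string): string[] {
  return text
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// WER = (substitutions + deletions + insertions) / number of reference words,
// obtained from the word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitutionCost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion of a reference word
          curr[j - 1] + 1, // insertion of a hypothesis word
          prev[j - 1] + substitutionCost // match or substitution
        )
      );
    }
    prev = curr;
  }

  return prev[hyp.length] / ref.length;
}
```

Under this definition, a WER of 0 means the recognized text matches the reference word for word, and a WER of 0.12 corresponds to roughly 12 word errors per 100 reference words.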

Keywords