Speech Communication and Smart Interactions Laboratories | Department of Telecommunications and Media Informatics

Head of Laboratory:

Speech Communication and Smart Interactions Laboratory

The Speech Communication and Smart Interactions Laboratory has the following outstanding competencies:

- Higher education: Fulbright and DAAD scholarships, 1st prize at nationwide student conferences (OTDK)

- R&D for technology and application development in the following areas: infocommunications services, mobile information systems, human-machine/robot/vehicle interactions, rehabilitation and health, speech and language technologies, multimodal interactions, smart devices and platforms (smartphone, smartTV, smartwatch…).

It consists of three co-operating Labs:

a) Speech Technology and Smart Interactions Lab (http://smartlab.tmit.bme.hu/index-en)

The Lab conducts active research on theoretical and applied topics of speech technology and smart interactions. In deep learning, besides the basic architectures (feed-forward, convolutional, recurrent: LSTM, GRU), we study autoencoders, synthetic gradients and other emerging topics. Additional topics include text processing and time series classification and prediction. Our colleague Bálint Gyires-Tóth has been an accredited trainer and academic ambassador of the NVIDIA Deep Learning Institute.

The Lab has developed the following technologies and applications:

- multiple speech synthesis technologies on several computer platforms (Windows, Android, Linux, ...),

- special speech technology applications (e.g. railway announcement system, person and company name synthesis, price list synthesis, speech synthesis for visually and speech impaired people, communication application for stroke and aphasic patients, call center automation, social robot application, multimodal information system for elderly people).

Recent related EU projects: PAELIFE, VUK, DANSPLAT, AI4EU, APH-ALARM. Tasks in the projects: deep learning algorithms, user interface design, system issues.

Project laboratory topics

· Conversational AI applications

· Speech synthesis application for aphasic patients

· Talking mobile applications

· Deep learning-based machine learning

· Silent Speech Interface

· Deep learning based self-driving car methods

· My own radio channel

Description of the lectures

Human-Computer Interaction: http://smartlab.tmit.bme.hu/education-hci, https://portal.vik.bme.hu/kepzes/targyak/VITMMA11/en/

b) Speech Recognition Lab

We research and develop speech-to-text engines for multiple languages (with an emphasis on Hungarian) and for various tasks: real-time, low-latency applications, dictation systems and off-line mass transcription. Automatic speech recognition (ASR), a hot topic in artificial intelligence, is deeply rooted in deep learning and applies its entire arsenal, including GPU/TPU-based gradient calculations and supervised, semi-supervised and unsupervised learning, among others.

Project laboratory topics

· Conversational AI applications

· End-to-end ASR

· Semi-supervised acoustic modeling by the Fairseq toolkit

· Unsupervised acoustic modeling using GAN (Generative Adversarial Network) technology

· Low-latency ASR inference for streaming using GPUs

· Neural Language modeling for ASR

· Open-source ASR engine comparison

Description of the lectures

· Infocommunications

· Media Informatic Systems

· Sound and Speech Recognition Technologies

c) Laboratory of Speech Acoustics

Speech research requires interdisciplinary thinking. Researchers of the Laboratory of Speech Acoustics have expertise in psychoacoustics, linguistics, physics, mathematics and informatics. Research topics: artificial intelligence solutions in healthcare, pathological and mental disorders in speech, forensic voice comparison, basic acoustic-phonetic research, database creation.

Project laboratory topics

· Clinical depression in speech

· Development of Automatic Pathological Speech Recognition and Classification System

· FORENSICspeech: Forensic Voice analysis using Hungarian follow-up voice database

· Detection of Parkinson's Disease Using Speech

· Emotions in speech

Description of the lectures / laboratories that belong to the lab

Bio-inspired Signal Processing and Systems, Diagnostics of Speech and Hearing Disorders

Laboratory groups

Group	Head of group
Speech Technology and Smart Interaction Laboratory (SpeechLab)	Géza Németh, PhD
Laboratory of Speech Recognition (LSR)	Péter Mihajlik, PhD
Laboratory of Speech Acoustics (LSA)	Klára Vicsi, DSc
