The SPECOM 2016 conference proceedings (LNAI 9811) are now available online. You can access the online version here.
Program at a glance
Hours | Tue | Wednesday | Thursday | Friday | Sat
8.00 | | Registration | | |
8.30 | | Opening ceremony | | |
9.00 | | Keynote lecture of Ralf Schlueter | Keynote lecture of Attila Vékony | Keynote lecture of Nick Campbell | Budapest tour
10.00 | | Coffee break | Coffee break | Coffee break |
10.30 | | Speech recognition and understanding | Multimodal human-machine interaction | Natural language processing |
12.30 | | Lunch | Lunch | Lunch |
14.00 | | SPECOM Poster session I | ICR Poster session | SPECOM Poster session II |
16.00 | Registration | Coffee break | Coffee break | Coffee break |
16.30 | | Speech synthesis | Interactive collaborative robotics / Speech signal processing | Speaker and language recognition |
18.30 | Welcome reception 18.30 - 20.00 | | | Closing ceremony |
19.30 | | | Gala dinner on the Danube 19.30 - 21.30 | |
The detailed program can be downloaded here
Detailed Technical Program
Tuesday, August 23rd
16:00-18:00 Registration
18:30-20:00 Welcome Reception
Wednesday, August 24th
08:00-08:30 Registration
08:30-09:00 Opening ceremony
09:00-10:00 Keynote speech: Automatic Speech Recognition based on Neural Networks
Ralf Schlueter, RWTH Aachen University, Germany
Chair: Géza Németh, Budapest University of Technology and Economics, Hungary
10:00-10:30 Coffee break
10:30-12:30 Speech recognition and understanding
Chair: Alexey Karpov, SPIIRAS, Russia
10:30-10:50 Adaptation of DNN Acoustic Models using KL-divergence Regularization and Multi-Task Training
László Tóth and Gábor Gosztolya
10:50-11:10 Improving Automatic Speech Recognition Containing Additive Noise Using Deep Denoising Autoencoders of LSTM Networks
Marvin Coto, John Goddard and Fabiola Martinez
11:10-11:30 Knowledge Transfer for Utterance Classification in Low-Resource Languages
Andrei Smirnov and Valentin Mendelev
11:30-11:50 Designing Syllable Models for an HMM based Speech Recognition System
Kseniya Proenca, Kris Demuynck and Dirk Van Compernolle
11:50-12:10 In-document Adaptation for a Human Guided Automatic Transcription Service
André Mansikkaniemi, Mikko Kurimo and Krister Lindén
12:10-12:30 Automatic Summarization of Highly Spontaneous Speech
András Beke and György Szaszák
12:30-14:00 Lunch
14:00-16:00 SPECOM Poster session I
Chair: Ralf Schlueter, RWTH Aachen University, Germany
P1: Exploring GMM-derived Features for Unsupervised Adaptation of Deep Neural Network Acoustic Models
Natalia Tomashenko, Yuri Khokhlov, Anthony Larcher and Yannick Estève
P2: DNN-based Acoustic Modeling for Russian Speech Recognition Using Kaldi
Irina Kipyatkova and Alexey Karpov
P3: Improving the Quality of Automatic Speech Recognition in Trucks
Maxim Korenevsky, Ivan Medennikov and Vadim Shchemelinin
P4: Feature Space VTS with Phase Term Modeling
Maxim Korenevsky and Aleksei Romanenko
P5: LSTM-based Language Models for Spontaneous Speech Recognition
Ivan Medennikov and Anna Bulusheva
P6: Speaker-dependent bottleneck features for Egyptian Arabic speech recognition
Aleksei Romanenko and Valentin Mendelev
P7: Advances in STC Russian Spontaneous Speech Recognition System
Ivan Medennikov and Alexey Prudnikov
P8: Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech
György Szaszák, Máté Ákos Tündik, Branislav Gerazov and Aleksandar Gjoreski
P9: Improving Recognition of Dysarthric Speech Using Severity Based Tempo Adaptation
Chitralekha Bhat, Bhavik Vachhani and Sunil Kumar Kopparapu
P10: Comparison of Retrieval Approaches and Blind Relevance Feedback Methods within the Czech Speech Information Retrieval
Lucie Skorkovska
P11: A Phonetic Segmentation Procedure Based on Hidden Markov Models
Edvin Pakoci, Branislav Popović, Nikša Jakovljević, Darko Pekar and Fathy Yassa
P12: Stress, arousal, and stress detector trained on acted speech database
Róbert Sabo, Milan Rusko, Andrej Ridzik and Jakub Rajčani
P13: Improvements to Prosodic Variation in Long Short-Term Memory based Intonation Models Using Random Forest
Bálint Pál Tóth, Balázs Szórádi and Géza Németh
P14: Fusing various audio feature sets for detection of Parkinson's disease from sustained voice and speech recordings
Evaldas Vaiciukynas, Antanas Verikas, Adas Gelzinis, Marija Bacauskiene, Kestutis Vaskevicius, Virgilijus Uloza, Evaldas Padervinskis and Jolita Ciceliene
P15: Investigation of Speech Signal Parameters Reflecting the Truth of Transmitted Information
Victor Budkov, Irina Vatamaniuk, Vladimir Basov and Daniyar Volf
P16: Trade-off between speed and accuracy for Noise Variance Minimization (NVM) pitch estimation algorithm
Andrey Barabanov and Aleksandr Melnikov
P17: Study on the improvement of intelligibility for elderly speech using formant frequency shift method
Yuto Tanaka, Mitsunori Mizumachi and Yoshihisa Nakatoh
P18: Quality Assessment of two Fullband Audio Codecs Supporting Real-Time Communication
Michael Maruschke, Oliver Jokisch, Martin Meszaros, Franziska Trojahn and Mario Hoffmann
P19: A Deep Neural Networks (DNN) Based models for a Computer Aided Pronunciation Learning System
Mohamed Elaraby, Mustafa Abdallah, Sherif Abdou and Mohsen Rashwan (in absentia)
P20: Evaluation of Response Times on a Touch Screen using Stereo Panned Speech Command Auditory Feedback
Hunor Nagy and György Wersényi
P21: Speech Enhancement with Microphone Array Using a Multi Beam Adaptive Noise Suppressor
Mikhail Stolbov and Alexander Lavrentyev
P22: Microphone Array Directivity Improvement in Low-Frequency Domain for Speech Processing
Sergei Aleinik and Mikhail Stolbov
P23: Optimization of Zelinski post-filtering calculation
Sergei Aleinik
P24: Assessment of the relation between low-frequency features and velum opening by using real articulatory data
Alexander Sepulveda-Sepulveda and German Castellanos-Dominguez
P25: Evaluation of the speech quality during rehabilitation after surgical treatment of the cancer of oral cavity and oropharynx based on a comparison of the Fourier spectra
Evgeny Kostyuchenko, Roman Mescheryakov, Dariya Ignatieva, Alexander Pyatkov, Evgeny Choynzonov and Lidiya Batatskaya
16:00-16:30 Coffee break
16:30-18:30 Speech synthesis
Chair: Géza Németh, Budapest University of Technology and Economics, Hungary
16:30-16:50 Ensemble Deep Neural Network based Waveform-Driven Stress Model for Speech Synthesis
Bálint Pál Tóth, Kornél István Kiss, György Szaszák and Géza Németh
16:50-17:10 DNN-Based Duration Modeling for Synthesizing Short Sentences
Péter Nagy and Géza Németh
17:10-17:30 Experiments with One-Class Classifier as a Predictor of Spectral Discontinuities in Unit Concatenation
Daniel Tihelka, Martin Grůber and Markéta Jůzová
17:30-17:50 Phonetic Aspects of High Level of Naturalness in Speech Synthesis
Vera Evdokimova, Pavel Skrelin, Andrey Barabanov and Karina Evgrafova
17:50-18:10 An agonist-antagonist pitch production model
Branislav Gerazov and Philip N. Garner
18:10-18:30 An UMP (Universal Melodic Portraits) Model of Pitch Contours Stylization for Analysis and Synthesis of Intonation
Boris Lobanov
Thursday, August 25th
09:00-10:00 Keynote speech: Speech Recognition Challenges in the Car Navigation Industry
Attila Vékony, NNG Software Developing and Commercial Llc., Hungary
Chair: Andrey Ronzhin, SPIIRAS, Russia
10:00-10:30 Coffee break
10:30-12:30 Multimodal human-machine interaction
Chair: Milos Zelezny, University of West Bohemia, Czech Republic
10:30-10:50 Toward Sign Language Motion Capture Dataset Building
Zdeněk Krňoul, Pavel Jedlička, Jakub Kanis and Milos Zelezny
10:50-11:10 Selecting Keypoint Detector and Descriptor Combination for Augmented Reality Application
Lukáš Bureš and Luděk Müller
11:10-11:30 Human-Robot Interaction using Brain-Computer Interface
Lev Stankevich and Konstantin Sonkin
11:30-11:50 Attention Training Game with Aldebaran Robotics NAO and Brain-Computer Interface
Evgeny Shandarov, Stepan Gomilko and Alina Zimina
11:50-12:10 HAVRUS Corpus: High-speed Recordings of Audio-Visual Russian Speech
Vasilisa Verkhodanova, Alexander Ronzhin, Irina Kipyatkova, Denis Ivanko, Alexey Karpov and Milos Zelezny
12:10-12:30 Speech Recognition combining MFCCs and Image Features (Skype)
Stamatis Karlos, Nikos Fazakis, Katerina Karanikola, Sotiris Kotsiantis and Kyriakos Sgarbas
12:30-14:00 Lunch
14:00-16:00 ICR Poster session
Chair: Eugene Larkin, Tula State University, Russia
P1: Decentralized Approach to Control of Robot Groups During Execution of the Task Flow
Igor Kalyaev, Anatoly Kalyaev and Iakov Korovin
P2: A Recovery Method for the Robotic Decentralized Control System with Performance Redundancy
Iakov Korovin, Eduard Melnik and Anna Klimenko
P3: Control Algorithms for Heterogeneous Vehicle Groups Control in Obstructed 2-D Environments
Viacheslav Pshikhopov, Mikhail Medvedev, Anatoly Gaiduk and Aleksandr Kolesnikov
P4: Method of Spheres for Solving 3D Formation Task in a Group of Quadrotors
Donat Ivanov, Sergey Kapustyan and Igor Kalyaev
P5: Multi-Robot Exploration and Mapping Based on the Subdefinite Models
Valery Karpov, Alexander Migalev, Anton Moscowsky, Maxim Rovbo and Vitaly Vorobiev
P6: Simulation of Commands Execution by Mobile Robot
Eugene Larkin, Alexey Ivutin, Vladislav Kotov and Alexander Privalov
P7: The Effectiveness of Rescuing Casualties when Using Robotic Systems
Anna Motienko, Igor Dorozhko, Anatoly Tarasov and Oleg Basov
P8: Distributed Information System for Collaborative Robots and IoT Devices
Siarhei Herasiuta, Uladzislau Sychou and Ryhor Prakapovich
P9: Positioning Method Basing on External Reference Points for Surgical Robots
Ekaterina Sinyavskaya, Elena Shestova, Mikhail Medvedev and Evgenij Kosenko
P10: Hardware-Software Solution for Three-Dimensional Model Control in Volumetric Display Testing Unit for Visualization and Dispatching Applications
Alexander Bolshakov, Arthur Sgibnev, Tatiana Chistyakova, Viktor Glazkov and Dmitry Lachugin
P11: Educational Marine Robotics in SMTU
Mikhail Chemodanov, Vladimir Ryzhov, Nickolay Semenov, Kirill Rozhdestvensky and Igor Kozhemyakin
P12: Designing Simulation Model of Humanoid Robot to Study Servo Control System
Alexander Denisov, Viktor Budkov and Daniil Mikhalchenko
P13: Speech Dialog as a Part of Interactive "Human-Machine" Systems
Rodmonga Potapova
P14: Human-Machine Speech-Based Interfaces with Augmented Reality and Interactive Systems for Controlling Mobile Cranes
Maciej J. Majewski and Wojciech Kacalak
P15: Preprocessing Data for Facial Gestures Classifier on the Basis of the Neural Network Analysis of Biopotentials Muscle Signals
Raisa Budko and Irina Starchenko
P16: Mimic Recognition and Reproduction in Bilateral Human-Robot Speech Communication
Arkady S. Yuschenko, Sergey Vorotnikov, Dmitry Konyshev and Andrey Zhonin
P17: Interactive Collaborative Robotics and Natural Language Interface Based on Multi-Agent Recursive Cognitive Architectures
Murat Anchokov, Zalimkhan Nagoev, Vladimir Denisenko, Boris Tazhev and Zaurbek Sundukov
P18: An Analysis of Visual Faces Datasets
Ivan Gruber, Miroslav Hlaváč, Marek Hrúz, Miloš Železný and Alexey Karpov
P19: Voice Dialogue with a Collaborative Robot Driven by Multimodal Semantics
Alexander Kharlamov and Konstantin Ermishin
P20: Human-Smartphone Interaction for Dangerous Situation Detection & Recommendation Generation while Driving
Alexander Smirnov, Alexey Kashevnik and Igor Lashkov
P21: Conceptual Model of Cyberphysical Environment Based on Collaborative Work of Distributed Means and Mobile Robots
Anton Saveliev, Oleg Basov and Andrey Ronzhin
P22: The Humanoid Robot Assistant for a Preschool Children
Evgeny Shandarov, Alina Zimina, Dmitry Rimer, Evgenia Sokolova and Olga Shandarova
16:00-16:30 Coffee break
16:30-18:30 Interactive collaborative robotics
Chair: Roman Meshcheryakov, TUSUR, Russia
16:30-16:50 Development of Wireless Charging Robot for Indoor Environment based on Probabilistic Roadmap
Yi-Shiun Wu, Chi-Wei Chen and Hooman Samani
16:50-17:10 Mechanical Leg Design of the Anthropomorphic Robot Antares
Nikita Pavluk, Victor Budkov, Andrey Kodyakov and Andrey Ronzhin
17:10-17:30 YuMi, come and play with me! A Collaborative Robot for piecing together a Tangram Puzzle
David Kirschner, Rosemarie Velik, Saeed Yahyanejad, Mathias Brandstötter and Michael Hofbaur
17:30-17:50 A Control Strategy for a Lower Limb Exoskeleton with a Toe Joint
Sergei Savin, Sergey Jatsun and Andrey Yatsun
17:50-18:10 Robot Soccer Team for RoboCup Humanoid KidSize League
Evgeny Shandarov, Stepan Gomilko, Darya Zhulaeva, Dmitry Rimer, Dmitry Yakushin and Roman Meshcheryakov
18:10-18:30 Smart M3-Based Robot Interaction Scenario for Coalition Work
Alexander Smirnov, Alexey Kashevnik, Sergey Mikhailov, Mikhail Mironov and Mikhail Petrov
16:30-18:30 Speech signal processing
Chair: László Tóth, University of Szeged, Hungary
16:30-16:50 Robust Speech Analysis Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition in Noisy Environments
Surasak Boonkla, Masashi Unoki and Stanislav S. Makhanov
16:50-17:10 An Algorithm for Phase Manipulation in a Speech Signal
Darko Pekar, Siniša Suzić, Robert Mak, Meir Friedlander and Milan Sečujski
17:10-17:30 Detecting Laughter and Filler Events by Time Series Smoothing with Genetic Algorithms
Gábor Gosztolya
17:30-17:50 Bio-Inspired Sparse Representation of Speech and Audio Using Psychoacoustic Adaptive Matching Pursuit
Alexey Petrovsky, Vadzim Herasimovich and Alexander Petrovsky
17:50-18:10 Statistical analysis of acoustical parameters in the voice of children with juvenile dysphonia
Miklós Gábriel Tulics, Ferenc Kazinczi and Klára Vicsi
18:10-18:30 Precise estimation of harmonic parameter trend and modification of a speech signal
Andrey Barabanov, Evgenij Vikulov and Valentin Magerkin
19:30-21:30 Gala dinner on the Danube
Friday, August 26th
09:00-10:00 Keynote speech: Machine Processing of Dialogue States; Speculations on Conversational Entropy
Nick Campbell, Trinity College Dublin, Ireland
Chair: Rodmonga Potapova, MSLU, Russia
10:00-10:30 Coffee break
10:30-12:30 Natural language processing
Chair: Rodmonga Potapova, MSLU, Russia
10:30-10:50 Text Classification in the Domain of Applied Linguistics as Part of a Pre-editing Module for Machine Translation Systems
Ksenia Oskina
10:50-11:10 Backchanneling via Twitter Data for Conversational Dialogue Systems
Michimasa Inaba and Kenichi Takahashi
11:10-11:30 Measuring prosodic entrainment in Italian collaborative game-based dialogues
Michelina Savino, Loredana Lapertosa, Alessandro Caffò and Mario Refice
11:30-11:50 A Preliminary Exploration of Group Social Engagement Level Recognition in Multiparty Casual Conversation
Yuyun Huang, Emer Gilmartin, Benjamin R. Cowan and Nick Campbell
11:50-12:10 Interaction Quality as a Human-Human Task-Oriented Conversation Performance (ppsx)
Anastasiia Spirina, Olesia Vaskovskaia, Maxim Sidorov and Alexander Schmitt
12:10-12:30 A comparison of acoustic features of speech of typically developing children and children with autism spectrum disorders
Elena Lyakso, Olga Frolova and Aleksey Grigorev
14:00-16:00 SPECOM Poster session II
Chair: Nick Campbell, Trinity College Dublin, Ireland
P1: Polybasic Attribution of Social Network Discourse
Rodmonga Potapova and Vsevolod Potapov
P2: Detecting Filled Pauses and Lengthenings in Russian Spontaneous Speech using SVM
Vasilisa Verkhodanova and Vladimir Shapranov
P3: Multimodal Perception of Aggressive Behavior
Rodmonga Potapova and Liliya Komalova
P4: Designing High-Coverage Multi-Level Text Corpus for Non-Professional-Voice Conservation
Markéta Jůzová, Daniel Tihelka and Jindřich Matoušek
P5: A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English
Tijana Delić, Branislav Gerazov, Branislav Popović and Milan Sečujski
P6: Emotional speech of 3-years old children: norm-risk-deprivation
Olga Frolova and Elena Lyakso
P7: Profiling a Set of Personality Traits of a Text's Author: a Corpus-Based Approach
Tatiana Litvinova, Olga Zagorovskaya, Olga Litvinova and Pavel Seredin
P8: Unsupervised trained functional discourse parser for e-learning materials scaffolding
Varvara Krayvanova and Svetlana Duka
P9: Low Inter-Annotator Agreement in Sentence Boundary Detection and Personality
Anton Stepikhov and Anastassia Loukina
P10: Modeling Imperative Utterances in Russian Spoken Dialogue: Verb-Central Quantitative Approach
Olga Blinova
P11: An Exploratory Study on Sociolinguistic Variation of Spoken Russian
Natalia Bogdanova-Beglarian, Tatiana Sherstinova, Olga Blinova and Gregory Martynenko
P12: Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian
Tatiana Sherstinova
P13: Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer
Milan Sečujski, Branislav Gerazov, Tamás Gábor Csapó, Vlado Delić, Philip Garner, Aleksandar Gjoreski, David Guennec, Zoran Ivanovski, Aleksandar Melov, Géza Németh, Ana Stojković and György Szaszák
P14: Sociolinguistic Extension of the ORD Corpus of Russian Everyday Speech
Natalia Bogdanova-Beglarian, Tatiana Sherstinova, Olga Blinova, Olga Ermolova, Ekaterina Baeva, Gregory Martynenko and Anastasia Ryko
P15: Detecting state of aggression in sentences using CNN
Denis Gordeev
P16: Tonal Specification of Perceptually Prominent Non-Nuclear Pitch Accents in Russian
Nina Volskaya and Tatiana Kachkovskaia
P17: Lexical Stress in Punjabi and its Representation in PLS
Swaran Lata, Swati Arora and Simerjeet Kaur
P18: Comparative analysis of classifiers for automatic language recognition in spontaneous speech
Konstantin Simonchik, Sergey Novoselov and Galina Lavrentyeva
P19: Semi-automatic Speaker Verification System Based on Analysis of Formant, Durational and Pitch Characteristics
Elena Bulgakova and Aleksei Sholohov
P20: Scores Calibration in Speaker Recognition Systems
Andrey Shulipa, Sergey Novoselov and Yuri Matveev
P21: Speech Features Evaluation for Small Set Automatic Speaker Verification Using GMM-UBM System
Ivan Rakhmanenko and Roman Meshcheryakov
P22: Approaches for Out-of-Domain Adaptation to Improve Speaker Recognition Performance
Andrey Shulipa, Sergey Novoselov and Aleksandr Melnikov
P23: Prosody Analysis of Malay Language Storytelling Corpus
Izzad Ramli, Noraini Seman, Norizah Ardi and Nursuriati Jamil
P24: Finding speaker position under difficult acoustic conditions
Evgeniy Shuranov, Alexander Lavrentyev, Alexey Kozlyaev and Valeriya Volkovaya
P25: Scenarios of Multimodal Information Navigation Services for Users in Cyberphysical Environment
Irina Vatamaniuk, Dmitriy Levonevskiy, Anton Saveliev and Alexander Denisov
16:00-16:30 Coffee break
16:30-18:30 Speaker and language recognition
Chair: Iosif Mporas, University of Hertfordshire, UK
16:30-16:50 Investigation of Segmentation in i-Vector based Speaker Diarization of Telephone Speech
Zbynek Zajic, Marie Kunesova and Vlasta Radova
16:50-17:10 Improving Robustness of Speaker Verification by Fusion of Prompted Text-Dependent & Text-Independent Operation Modalities
Iosif Mporas, Saeid Safavi and Reza Sotudeh
17:10-17:30 Convolutional Neural Network in the Task of Speaker Change Detection
Marek Hruz and Marie Kunesova
17:30-17:50 Online Biometric Identification With Face Analysis in Web Applications
Gerasimos Arvanitis, Konstantinos Moustakas and Nikos Fakotakis
17:50-18:10 Language Identification using Time Delay Neural Network D-Vector on Short Utterances
Maxim Tkachenko, Alexander Yamshinin, Nikolay Luibimov, Mikhail Kotov and Marina Nastasenko
18:10-18:30 On Individual Polyinformativity of Speech and Voice Regarding Speaker's Auditive Attribution (Forensic Phonetic Aspect)
Rodmonga Potapova and Vsevolod Potapov
18:30-18:40 Closing ceremony
Saturday, August 27th
09:00-15:00 Budapest tour