Overview
The Speech Technologies group focuses on voice technologies and their use in advanced services and applications. We create solutions, standards, frameworks, and services that enhance the experience and capabilities offered to users and enterprises. In the embedded area, our activities focus on advanced voice-based and multimodal user interface technologies and on advanced speech-enabled services. We develop state-of-the-art embedded text-to-speech technology that produces high-quality, natural-sounding speech and can be used to deliver information, entertainment, and convenience to mobile users and car drivers. For contact centers, we develop solutions and middleware for speech analytics, including transcription, audio search and retrieval, and emotion detection.
The group's expertise covers a wide spectrum of technologies for speech/audio coding and processing, speech recognition and synthesis, speech enhancement, multimedia processing, and web services and applications.
Activities
Publications
- R. Fernandez, Z. Kons, S. Shechtman, Z. Shuang, R. Hoory, B. Ramabhadran and Y. Qin, "The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge", Blizzard Workshop, Sep. 2008, Brisbane, Australia.
- S. Tiomkin and D. Malah, "Statistical Text-to-Speech Synthesis with Improved Dynamics", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- H. Aronowitz, "Online Vocabulary Adaptation Using Contextual Information and Information Retrieval", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- H. Aronowitz and Y. Solewicz, "Speaker Recognition in Two Wire Test Sessions", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- J. Mamou and B. Ramabhadran, "Phonetic Query Expansion for Spoken Document Retrieval", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- J. Mamou, Y. Mass, B. Ramabhadran and B. Sznajder, "Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search", Search in Spontaneous Conversational Speech Workshop, SIGIR 2008, Singapore.
- A. Geven, M. Tscheligi, A. Sorin and H. Aronowitz, "Presenting a speech-based mobile reminder system", SiMPE 2008, Sep. 2008, Amsterdam, Netherlands.
- V. Mylonakis, J. Soldatos, A. Pnevmatikakis, L. Polymenakos, A. Sorin and H. Aronowitz, "Using Robust Audio and Video Processing Technologies to Alleviate the Elderly Cognitive Decline", PETRA 2008, July 2008, Athens, Greece.
- B. Sznajder, J. Mamou, Y. Mass and M. Shmueli-Scheuer, "Metric inverted - an efficient inverted indexing method for metric spaces", in Proc. Efficiency Issues in Information Retrieval Workshop, ECIR 2008.
- W. Allasia, F. Falchi, F. Gallo, M. Kacimi, A. Kaplan, J. Mamou, Y. Mass and N. Orio, "Audio-visual content analysis in P2P networks: the SAPIR approach", 1st Workshop on Automated Information Extraction in Media Production, AIEMPro'08.
- S. Chu, H. Kuo, L. Mangu, Y. Liu, S. Qin, Q. Shi, S. Zhang and H. Aronowitz, "Recent advances in the IBM GALE Mandarin transcription system", in Proc. ICASSP, Apr. 2008, Las Vegas, USA.
- H. Aronowitz and D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", IEEE Trans. on Audio, Speech & Language Processing, pp. 2033-2043, Sep. 2007.
- S. Shechtman, "Maximum Likelihood Dynamic Intonation Model for Concatenative Text-to-Speech Systems", in Proc. 6th ISCA Workshop on Speech Synthesis, Aug. 2007, Bonn, Germany.
- J. Mamou, B. Ramabhadran and O. Siohan, "Vocabulary Independent Spoken Term Detection", in Proc. SIGIR, July 2007, Amsterdam, Netherlands.
- J. Mamou, Y. Mass, M. Shmueli-Scheuer and B. Sznajder, "A Query Language for Multimedia Content", in Proc. SIGIR 2007 Multimedia Workshop, July 2007, Amsterdam, Netherlands.
- R. Hoory, Z. Kons and A. Sorin, "The future of text-to-speech on mobile clients", ACM workshop on Speech in Mobile and Pervasive Environments, Sep. 2006, Espoo, Finland.
- Z. Shuang, R. Bakis, S. Shechtman, D. Chazan and Y. Qin, "Frequency warping based on mapping formant parameters", in Proc. ICSLP, Sep. 2006, Pittsburgh, PA, USA.
- S. Ben-David, A. Roytman, R. Hoory and Z. Sivan, "Using voice servers for speech analytics", International Conference on Digital Telecommunications (ICDT), Aug. 2006, Cap Esterel, France.
- J. Mamou, D. Carmel and R. Hoory, "Spoken document retrieval from call-center conversations", in Proc. SIGIR, Aug. 2006, Seattle, WA, USA.
- D. Chazan, R. Hoory, A. Sagi, S. Shechtman, A. Sorin, Z. Shuang and R. Bakis, "High quality sinusoidal modeling of wideband speech for the purpose of speech synthesis and modification", in Proc. ICASSP, May 2006, Toulouse, France.
- G. Mishne, D. Carmel, A. Roytman and A. Soffer, "Automatic analysis of call-center conversations", in Proc. 14th ACM International Conference on Information and Knowledge Management (CIKM), Oct. 2005, Bremen, Germany.
- D. Chazan, R. Hoory, Z. Kons, A. Sagi, S. Shechtman and A. Sorin, "Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling", in Proc. Eurospeech, Sep. 2005, Lisbon, Portugal.
- S. Basson, A. Faisman, R. Hoory, D. Kanevsky, M. Picheny, A. Roytman, Z. Sivan and A. Sorin, "Accessibility, Speech Technology, and Human Interventions", AVIOS/SpeechTek 2005.
- A. Sorin, T. Ramabadran, D. Chazan, R. Hoory, M. McLaughlin, D. Pearce, F. Wang and Y. Zhang, "The ETSI Extended Distributed Speech Recognition Standards: Client Side Processing and Tonal Language Recognition Evaluation", in Proc. ICASSP, May 2004, Montreal, Canada.
- T. Ramabadran, A. Sorin, M. McLaughlin, D. Chazan, D. Pearce and R. Hoory, "The ETSI Extended Distributed Speech Recognition Standards: Server Side Speech Reconstruction", in Proc. ICASSP, May 2004, Montreal, Canada.
- K. Y. Kupeev and Z. Sivan, "Selective Enhancement of Contrast Blocks for MPEG/JPEG Image Compression", Visual Communications and Image Processing (VCIP) 2003, Lugano, Switzerland, pp. 1382-1389.
- D. Chazan, R. Hoory, Z. Kons, D. Silberstein and A. Sorin, "Reducing the footprint of the IBM trainable synthesis system", in Proc. 7th Int. Conf. Spoken Language Processing (ICSLP 2002), Sep. 2002, Denver, USA.
- K. Y. Kupeev and Z. Sivan, "New shape representation and similarity measure for fast shape indexing", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2002", Vol. 4676, pp. 116-125, San Jose, USA, 2002.
- D. Cohen-Or, Y. Noimark and T. Zvi, "A Server-based Interactive Remote Walkthrough", in Proc. EGMM 2001.
- D. Chazan, M. Zibulski, R. Hoory and G. Cohen, "Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals", in Proc. Eurospeech 2001.
- K. Y. Kupeev and Z. Sivan, "An algorithm for efficient segmentation and selection of representative frames in video sequences", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2001", Vol. 4315, pp. 253-261, Jan. 2001, San Jose, USA.
- S. H. Maes, G. Cohen, R. Hoory and D. Chazan, "Conversational networking: conversational protocols for transport, coding and control", in Proc. 6th Int. Conf. Spoken Language Processing (ICSLP 2000), Oct. 2000, Beijing, China.
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Low bit rate speech compression for playback in speech recognition systems", in Proc. EUSIPCO, Sep. 2000.
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Speech reconstruction from mel-frequency cepstral coefficients and pitch frequency", in Proc. ICASSP, June 2000.
- Z. Sivan, D. Chazan, G. Cohen, R. Hoory and A. Sorin, "Voice in Pervasive Devices - Serving both Human Listeners and Machine Recognizers", PvCC 2000, Yorktown Heights, USA.
- A. Amir, D. Ponceleon, B. Blanchard, D. Petkovic, S. Srinivasan and G. Cohen, "Using Audio Time Scale Modification for Video Browsing", in collaboration with IBM Almaden, in Proc. HICSS 2000. Received the best paper award in the Digital Documents track.
- Z. Sivan, E. D. Karnin, D. Ramm and R. Cohen, "Performance of a Software-Only H.263 Video Encoder on the PowerPC processor", 19th IEEE Convention in Israel, Jerusalem, Israel, Nov. 1996, pp. 395-398.
- R. Hoory, N. Shaked and D. Chazan, "Building a speech database for the purpose of speaker specific speech synthesis", in Proc. ICSP 1996, pp. 741-744.
- R. Hoory and D. Chazan, "Speech Synthesis for a specific speaker based on a labeled speech database", in Proc. ICPR 1994, pp. C146-148.
- Y. Medan, E. Yair and D. Chazan, "Super resolution pitch determination of speech signals", IEEE Trans. Acoust., Speech and Signal Processing, vol. 39, pp. 40-48, Jan. 1991.
