
Speech corpora for Indian Languages (Hindi, Punjabi & Marathi)

  • Multi-form phonetic data units

  • Syllables

  • Most frequent words

  • Most frequent conjunct words

  • Vocabulary of digits, time, day, months, year, units

  • Sentences of digits, time, day, months, year, units

  • Phonetically rich sentences

  • Prosody-rich sentences

  • Domain-specific text

  • News Text

  • Recording in noise-free, echo-cancelled studio conditions

  • Recording by professional speakers (male and female) to maintain a constant pitch and avoid stress effects

  • Speech samples recorded at a sampling rate of 44.1 kHz (16-bit) in stereo mode

  • Hierarchical annotation of speech units at the sentence, word, and syllable levels

  • Structured storage of the corpora for ease of access

  • Metadata for speaker profile and recording information

  • User-friendly interface for viewing the speech corpora
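
The recording format and the hierarchical annotation listed above can be illustrated with a minimal Python sketch. It is not the corpus's actual tooling or file layout, which this page does not describe. The function `matches_corpus_format`, the `Unit` class, its field names, and `units_at_level` are all hypothetical names chosen for illustration; only the 44.1 kHz, 16-bit, stereo values and the sentence/word/syllable levels come from the list.

```python
import wave
from dataclasses import dataclass, field
from typing import List

# Recording format stated in the list above: 44.1 kHz, 16-bit, stereo.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 2            # stereo


def matches_corpus_format(wav_file) -> bool:
    """Return True if a WAV file (path string or binary file object)
    is 44.1 kHz, 16-bit, stereo."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


@dataclass
class Unit:
    """One annotated speech unit. Field names are assumptions,
    not the corpus's real annotation schema."""
    level: str      # "sentence", "word", or "syllable"
    label: str      # transcription of the unit
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    children: List["Unit"] = field(default_factory=list)


def units_at_level(root: Unit, level: str) -> List[Unit]:
    """Collect all units of the given level, depth-first, in order."""
    found = [root] if root.level == level else []
    for child in root.children:
        found.extend(units_at_level(child, level))
    return found
```

Nesting syllables inside words and words inside sentences keeps each unit's time span next to its parent, so a viewer can move from a sentence down to its syllables without a separate lookup.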

Updated on 23 November 2010


© 2008 C-DAC. All rights reserved | For information: webmaster@cdacnoida.in