語音實驗室 - Phonetics Lab Homepage

 



My research differs from that of most phoneticians mainly in its topics and methods. My collaboration with speech scientists and engineers dates back to 1982 and has led me away from studying limited samples and small numbers of speakers toward multiple speakers, larger chunks of more realistic speech, and larger quantities of data (though modest by the standards of the speech technology community). My research has integrated techniques from engineering and speech technology into experimental acoustic phonetic studies. My investigation of Mandarin Chinese fluent speech prosody beyond the sentence level begins from a macro, top-down perspective that takes intonation units larger than the phrase or sentence into consideration. It has revealed what I believe to be the defining feature of fluent speech prosody: systematic cross-phrase prosodic association, which constitutes prosodic context. This approach contrasts with analyses of discourse intonation based on patterns of individual phrase intonation. Using quantitative evidence, I have developed a hierarchical prosodic framework that models the formation of spoken discourse prosody as the accumulation of multi-layered prosodic contributions. I have also been able to tease apart the contribution that each layer of the prosodic hierarchy makes to cross-phrase prosodic association for a range of acoustic parameters; interestingly, the contributions made by the supra-segmental acoustic correlates have been found to vary. Since 2008, I have also been conducting phonetic comparisons of L1 and L2 English (with a focus on prosody) as a member of AESOP (Asian English Speech cOrpus Project).

 

 

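Since 1999, the convener of the Phonetics Lab has led the research team in developing corpus phonetics methods: designing speech databases, collecting large quantities of speech data, and moving away from the observation of individual samples. The team pioneered a research-question-driven approach to corpus phonetics, using corpus linguistics to investigate supra-segmental speech information. The findings were then verified with computational linguistic methods to build a prosodic model of spoken discourse, leading to the proposal of the hierarchical multi-phrase prosody framework HPG (Hierarchical Prosodic Phrase Grouping) (Tseng et al. 2004; 2005). This work moves beyond the limits of the qualitative observation emphasized in traditional phonetics and gives weight to quantitative computation over speech data. In experimental phonetics, the lab specializes in the higher-level structure of speech prosody, leading phonetics out of a long-standing predicament: a focus on phonetic detail and very small speech units, without the means to handle large-scale speech units, so that results often miss the forest for the trees. In corpus linguistics, the lab has introduced innovations in the design of speech databases, the characteristics of data collection, the development of annotation systems, and tools for achieving annotation consistency. In computational linguistics, corpus-based research necessarily requires large-scale quantitative processing, so the lab has also raised experimental phonetic research from inspecting waveforms and applying off-the-shelf statistical packages to a level where quantitative methods are adjusted to fit the theoretical framework.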

 

The main results can be found on the following websites:

Personal webpage of Research Fellow Chiu-yu Tseng: http://www.ling.sinica.edu.tw/v3-3-1.asp-auserid=22.htm

語料庫 Database

1. 中央研究院口語韻律語料庫暨工具平台 (COSPRO & Toolkit)

COSPRO & Toolkit係中研院語言所特聘研究員鄭秋豫從事語流韻律研究,於1994至2005年收集之國語連續語流語料,及依研究需要所發展的工具平台,可供語音研究、語音合成與語者辨識等多方面應用。
COSPRO包含9個子語料庫,每個子語料庫針對不同的語流韻律現象設計而成:COSPRO 01-08為麥克風朗讀語音,COSPRO 09則為麥克風自發性語音。內容包括不同長度的語料:孤立詞組(1至4字詞)、孤立句(含直述、驚嘆、疑問句)、無意義字串隨機排列句(“Word Salad”),及段落語篇(85至996音節)。

本資料庫共10.5GB,約132小時,共有114人次口語資料(53男61女)。其中7.7 GB的語料已經過處理,並附說明,釋出wav檔案、每位語者的朗讀(轉寫)文本(*.txt)、人工調整音標檔(*.adjusted / *.syl),以及停延韻律標記檔(*.break);其餘未經處理之原始語料,則釋出wav檔案、語者的朗讀(轉寫)文本(*.txt),以及程式處理過後的音標檔(*.phn)。

COSPRO與其他語料庫最大的差異在於:包含 (1)人工調整音標檔(*.adjusted / *.syl):不只是HTK處理過的音段標註檔案(*.phn)。處理完成之語料均以人工方式對齊語音音段邊界,標註子音與母音的時間碼。(2)停延韻律標記檔(*.break):經過訓練之標音員以聽感為基礎標註韻律,並通過標註一致性檢驗。人工感知韻律標註的主要意義在於:以本語料庫所提供的韻律標記做為語音信號分析的標準答案,而非得自文本分析結果,是符合語音事實的韻律單位,目的是突顯語音與文本不完全匹配的事實。

COSPRO Toolkit則為一視窗介面,易操作的語音分析暨合成之工具平台,集合了Adobe Audition、Praat及Speech Viewer等常見語音分析(合成)軟體之特點,其主要功能包括:聲學訊號分析功能、標記口語語流功能以及重新合成語音訊號功能,特別適合作為教學工具。

本資料庫之智慧財產權屬中央研究院,基於學術資源共享之理念、促進語音科學研究與技術能有突破性發展之初衷,授權予中華民國計算語言學學會發行,供國內學術或民間機構使用。申請人需向中華民國計算語言學學會提出申請,簽妥授權使用協議書,並同意確實遵守協議書上之約定條款。

如欲申請使用,請點選此連結:計算語言學會 口語韻律語料庫暨工具平台庫

1. Sinica COSPRO & Toolkit

The Sinica COSPRO (Mandarin Continuous Speech Prosody Corpora) and Toolkit were designed, collected and annotated by Dr. Chiu-yu Tseng and her research group at the Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan (1994-2005) for use in phonetic research, speech synthesis and recognition.
The corpora comprise 9 subsets of both read and spontaneous speech by a total of 114 native speakers of Mandarin (53M, 61F). They total 10.5 GB, or approximately 132 hours of sound files. The reading texts were designed around various prosodic phenomena and include word sequences (1-4 words), sentences (declarative, exclamatory and interrogative), sentences consisting of random words (“Word Salad”), and paragraphs (85-996 syllables).

7.7 GB of the database has been annotated and is released with (1) wav files, (2) a transcription for each speaker (*.txt), (3) human-labeled segmental boundaries (*.adjusted / *.syl), and (4) human-labeled prosodic boundaries (*.break). The remaining part is released with (1) wav files, (2) a transcription for each speaker (*.txt), and (3) segmental boundaries auto-labeled with the HTK toolkit (*.phn).
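As a rough illustration of how the two release tiers could be told apart on disk, the sketch below groups file names by stem and checks which annotation extensions accompany each recording. The extensions come from the description above; the flat directory layout, the shared file stems, and the example file names are assumptions made only for this sketch and are not documented properties of COSPRO.

```python
from collections import defaultdict
from pathlib import Path

# Extension sets taken from the corpus description; the idea that all files
# of one recording share a stem is an assumption, not a documented fact.
ANNOTATED_EXTS = {".adjusted", ".syl", ".break"}
AUTO_EXTS = {".phn"}


def group_by_stem(filenames):
    """Map each file stem to the set of extensions found for it."""
    groups = defaultdict(set)
    for name in filenames:
        path = Path(name)
        groups[path.stem].add(path.suffix.lower())
    return dict(groups)


def classify(extensions):
    """Label one recording by the annotation files that accompany it."""
    if extensions & ANNOTATED_EXTS:
        return "manually annotated"
    if extensions & AUTO_EXTS:
        return "auto-labeled"
    return "unlabeled"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    listing = [
        "f01_001.wav", "f01_001.txt", "f01_001.adjusted", "f01_001.break",
        "m02_007.wav", "m02_007.txt", "m02_007.phn",
    ]
    for stem, exts in sorted(group_by_stem(listing).items()):
        print(stem, classify(exts))
```

In practice the list of names could come from `Path(corpus_dir).iterdir()`, where `corpus_dir` would be wherever the licensed data has been unpacked.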

Except for the HTK forced-aligned segments, all annotation was perception-based and manually tagged by trained transcribers. Both intra- and inter-transcriber consistency are around 90%. As a result, the tagging provides perceived speech units that are independent of syntactic structures and semantic relationships.
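The page does not say how transcriber consistency was measured. One simple possibility, shown here purely as an assumption, is percent agreement: the share of boundary positions at which two labeling passes assign the same label. The label names below are hypothetical placeholders and are not taken from the COSPRO annotation scheme.

```python
def percent_agreement(labels_a, labels_b):
    """Return the percentage of positions where two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must cover the same boundaries")
    if not labels_a:
        raise ValueError("no labels to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels from two passes over the same ten boundaries.
first_pass = ["B1", "B2", "B1", "B3", "B1", "B2", "B4", "B1", "B2", "B5"]
second_pass = ["B1", "B2", "B1", "B3", "B1", "B3", "B4", "B1", "B2", "B5"]

print(f"{percent_agreement(first_pass, second_pass):.1f}% agreement")
```

Comparing two passes by the same transcriber gives an intra-transcriber figure, and comparing passes by different transcribers gives an inter-transcriber figure. Chance-corrected measures such as Cohen's kappa are a common alternative when label frequencies are very uneven.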

The COSPRO Toolkit is a user-friendly, window-based speech analysis tool and interface. It brings together features of commonly used speech analysis software, such as Adobe Audition, Praat, and Speech Viewer, on one common platform, and provides three major functions: (1) performing acoustic analysis, (2) labeling continuous fluent speech, and (3) re-synthesizing speech signals.

The intellectual property of the corpora belongs to Academia Sinica and is therefore subject to the regulations of the Department of Intellectual Property and Technology Transfer, Academia Sinica. The database is available through the Association for Computational Linguistics and Chinese Language Processing (ACLCLP) to applicants who sign the license agreement and comply with its terms.

To apply for access, please go to: ACLCLP COSPRO & Toolkit

2. AESOP-ILAS (Asian English Speech cOrpus Project - Institute of Linguistics, Academia Sinica) 亞洲口音英語跨國語音資料庫—中研院語言所台灣二語英語語料庫

AESOP-ILAS語料庫為「亞洲口音英語跨國語音資料庫AESOP(Asian English Speech cOrpus Project)國際聯盟」的台灣二語英語部分,ILAS為中研院語言所英文名稱Institute of Linguistics, Academia Sinica之縮寫。建置台灣二語英語語料庫之工作由蔣經國國際學術交流基金會同名專題計畫資助,計畫編號DB002-D-08,執行期間民98.07.01∼101.12.31,計畫主持人為中研院語言所特聘研究員兼所長鄭秋豫。
參加AESOP國際聯盟的研究團隊包括日本、香港、台灣、中國、泰國、韓國、印尼及印度。各國團隊同意各自錄製相同的核心語料以便日後交流比較,並視各團隊研究需求各自增加語料。AESOP國際聯盟由日本早稻田大學Yoshinori Sagisaka教授於2008年發起並擔任召集人至2013年,2014年起召集人改由早大Mariko Kondo教授接任。核心語料由中研院語言所特聘研究員兼所長鄭秋豫及陽明大學助理教授魏廷雅Tanya Visceglia負責設計。

本語料庫分二部分:AESOP-ILAS 1之內容為AESOP國際語音聯盟之核心語料,AESOP-ILAS 2為中研院語言所語音實驗室負責人鄭秋豫以口語韻律研究為中心所設計之語料。語料庫之特點不以音段或某特定或個別語音特徵為目標,而強調音節、詞組、片語、短句與多短語語段等各級不同大小的口語層次及單位,以便獲取較全面性、具溝通訊息的韻律現象及語音特徵,進而提供更豐富的語音分析與評量指標。

本語料庫共13.9 GB,約812小時。分為二部分:AESOP-ILAS 1及AESOP-ILAS 2。

AESOP-ILAS 1共8.58GB,約500小時,共有500人次口語資料:計12位美式英語母語者(6男6女),488位台灣國語母語者(231男257女),每位錄音時間約1小時。其中台灣國語母語者學習英文年數最短2年,最長22年,平均10.5年。語料內容包括:朗讀目標詞(Target Word)嵌入短句文本、朗讀短語篇(“The North Wind and the Sun”)、誘發性半自發人機對話,及看圖描述之自發性語篇。

AESOP-ILAS 2共5.32GB,約312小時,共有40人次口語資料:計10位美式英語母語者(5男5女),每位錄音時間約5.25小時;30位台灣國語母語者(15男15女),每位錄音時間約8.7小時。其中台灣國語母語學習英文年數最短7年,最長30年,平均15.3年。語料內容包括:朗讀單詞(自CMU電子辭典高頻詞選取,共5400詞)、朗讀寬/窄焦點短句(獲AESOP中國團隊中國社科院語言所AESOP-CASS同意選取其設計之焦點語料)、朗讀長語篇(“The Cinderella Fairy Tale”)、朗讀中文短語篇(「北風與太陽」),及特定情境設計誘發性對話(Discourse Completion Tasks, DCT:原始文本為早稻田大學Michiko Nakano教授研究團隊所設計,文本中部份情境或內容係依台灣地區的文化民情與中文特性適度修改而成)。

基於學術資源公開共享之理念,於2015年4月釋出AESOP-ILAS語料庫,提供國內外學術研究單位使用,可供英語教學、語音研究、語音建模、語音辨識與合成等多方面學術應用。

本資料庫之智慧財產權屬中央研究院,授權中華民國計算語言學會發行,供國內學術或民間機構非營利使用。申請人需向中華民國計算語言學學會提出申請,簽妥授權使用協議書,並同意確實遵守協議書上之約定條款。

如欲申請非營利使用,請點選此連結:計算語言學會 AESOP-ILAS語料庫;企業商品開發使用,請洽中央研究院公共事務組(聯絡電話:02-2787-2509)。

2. AESOP-ILAS (Asian English Speech cOrpus Project - Institute of Linguistics, Academia Sinica) Corpora

The AESOP-ILAS Corpora are the outcome of work by the Taiwan research team of the multinational consortium AESOP (Asian English Speech cOrpus Project). Initiated by Professor Yoshinori Sagisaka of Waseda University in 2008, the consortium aims to collect common speech data for a better understanding of the L2 English features that are common to Asian English in general, as well as those specific to each participating Asian country.
Participating research teams are committed to (1) recording a common set of core data designed by Dr. Chiu-yu Tseng of Academia Sinica and Dr. Tanya Visceglia of National Yang-Ming University and (2) following the same recording protocols, designed by Professor Helen Meng of the Chinese University of Hong Kong. The AESOP consortium consists of research teams from Japan, Hong Kong, Taiwan, China, Thailand, Korea, Indonesia and India. Participating members have met annually, under Convener Professor Sagisaka (2008-2013) and under Convener Professor Mariko Kondo of Waseda University since 2014, to update data collection and share research findings.

The AESOP Taiwan team is led by Dr. Chiu-yu Tseng, Distinguished Research Fellow and Director of ILAS (the Institute of Linguistics, Academia Sinica), and features L2 English speech by native speakers of Taiwan Mandarin. The AESOP-ILAS research project was funded by the Chiang Ching-kuo Foundation for International Scholarly Exchange (grant DB002-D-08, 1 July 2009 to 31 December 2012). The project mainly aims to investigate a wide range of phonetic and prosodic features in Taiwan L2 English that bear communicative functions at the segmental, lexical, phrasal, and discourse levels, rather than focusing on specific, individual phenomena. The intellectual property of the corpora belongs to Academia Sinica and is therefore subject to the regulations of the Department of Intellectual Property and Technology Transfer, Academia Sinica.

The AESOP-ILAS Corpora are divided into two parts: AESOP-ILAS 1, which contains the AESOP core data, and AESOP-ILAS 2, which contains speech data focused on the prosodic properties studied in research projects led by Dr. Chiu-yu Tseng, PI of the Phonetics Lab, ILAS.

The AESOP-ILAS Corpora total 13.9 GB and contain approximately 812 hours of sound files. AESOP-ILAS 1 is 8.58 GB (500 hours) and includes L1 English speech by 12 native speakers of American English (6M, 6F) and L2 English speech by 488 Taiwan Mandarin speakers (231M, 257F). Each speaker was recorded for approximately 1 hour. The L2 speakers’ English training ranges from 2 to 22 years (average 10.5 years). The data consist of 8 recorded tasks: 6 elicited read speech tasks, including a reading of The North Wind and the Sun passage, 1 fully aided computer-prompted dialogue task, and 1 partially aided picture description task.

AESOP-ILAS 2 is 5.32 GB (312 hours) and includes L1 English speech by 10 American English speakers (5M, 5F) and L2 English speech by 30 Taiwan Mandarin speakers (15M, 15F). Each L1 speaker was recorded for approximately 5.25 hours and each L2 speaker for approximately 8.7 hours. The L2 speakers’ English training ranges from 7 to 30 years (average 15.3 years). The data consist of 5 recorded tasks: 4 elicited read speech tasks (readings of approximately 5400 high-frequency words from the CMU Electronic Dictionary, elicited broad/narrow focus sentences designed by AESOP-CASS (Chinese Academy of Social Sciences), The Cinderella Passage, and one Taiwan Mandarin task) and 1 fully aided computer-prompted Discourse Completion Task (DCT) modified from the Waseda dataset.
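The size and duration figures quoted for the two parts can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers given on this page; the dictionary layout and function name are illustrative choices, and nothing here is measured from the corpus itself.

```python
# Figures copied from the corpus description above; nothing is measured.
# Each group is (number of speakers, approximate recording hours per speaker).
PARTS = {
    "AESOP-ILAS 1": {"size_gb": 8.58, "hours": 500,
                     "groups": [(12, 1.0), (488, 1.0)]},
    "AESOP-ILAS 2": {"size_gb": 5.32, "hours": 312,
                     "groups": [(10, 5.25), (30, 8.7)]},
}


def estimated_hours(groups):
    """Estimate total hours as speakers times per-speaker recording time."""
    return sum(speakers * hours for speakers, hours in groups)


total_gb = round(sum(part["size_gb"] for part in PARTS.values()), 2)
total_hours = sum(part["hours"] for part in PARTS.values())

print("total size (GB):", total_gb)
print("total hours:", total_hours)
for name, part in PARTS.items():
    estimate = estimated_hours(part["groups"])
    print(f"{name}: quoted {part['hours']} h, per-speaker estimate {estimate:.1f} h")
```

The two parts add up to the stated 13.9 GB and 812 hours. For AESOP-ILAS 2 the per-speaker estimate comes to about 313.5 hours, slightly above the quoted 312, which is consistent with the per-speaker times being approximate.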

The AESOP-ILAS Corpora were released in April 2015 through ACLCLP (Association for Computational Linguistics and Chinese Language Processing) for non-commercial academic research use only. The Corpora should be useful for research and development in language teaching, language modeling, phonetic research, and applications to speech synthesis and recognition.

To apply for non-commercial use, please click the following link: ACLCLP AESOP-ILAS Corpora. For commercial applications, please contact the Department of Intellectual Property and Technology Transfer, Academia Sinica (Tel: +886-2-2787-2509).

地址 Address

台北市南港區研究院路2段128號 語言學研究所語音實驗室
Institute of Linguistics, Academia Sinica, No. 128, Sec. 2, Academia Road, Nankang, Taipei 11529 Taiwan R.O.C.

聯繫 Contact

電話(Tel): (886) 02 - 26525000 ext 6143
傳真(Fax): (886) 02 - 26525048
