The SAIL Lab Transcribers' Webpage


Introduction

Welcome to the home of the SAIL Lab's transcribers. This page is meant to make available transcription resources and data and to detail our progress, achievements, and research interests.

News

Aug 4

...as of Jun 6,

The Transcribers

Currently, we are two undergrads and one grad student (Daylen Riggs, Nathan Go, and Abe Kazemzadeh). We bring a broad range of backgrounds and research interests to the task of phonetic transcription of children's speech.

Transcription Tools

Transcriber.tcl

Currently, for broad phonetic transcriptions, we are using "transcriber.tcl". It is a simple program that iterates through a directory of speech recordings and prompts for a transcription of each one in a text box. With "control"+p to replay the clip and "enter" to submit a transcription, the transcribing can be done quickly, leaving maximum attention for listening and transcribing.

Installation

To use transcriber.tcl, you need three things:

transcriber.tcl, the program code itself
tclkit.exe, the Tcl interpreter (this is the Windows version, but Mac and Linux ones should also...let me know if there are any problems)
transcriber.vfs, a virtual filesystem with the snack library (use WinZip to unzip)

Put them all in the same directory (I suggest creating a brand new directory/folder). Before you can start, you also need data, and you need to set your name in the program: put the data in a subfolder, then open the program transcriber.tcl with a text editor like wordpad/notepad/emacs. You'll see the following code:

#!/bin/sh
# the next line restarts using wish \
exec wish "$0" "$@"

set transcriber nobody ;# set your name, Transcriber

set directory test ;#the location of the audio and transcription files

Change "nobody" to your name and "test" to the name of the folder where your data is.

Use

To run the program, right-click on transcriber.tcl, select "Open with..." and then "choose program", and browse to find tclkit.exe. The transcriber window will open.

The count refers to the number of files transcribed since the program was started. The # of files per minute is an average gauge of speed. The play and submit buttons can be used with the mouse, but it is faster to use control-p and return. Please use quit instead of clicking the top right corner "x". For each wav file, the program will save a transcription file in the same directory, and for each transcribing session, a log file containing all the transcriptions will be saved to a directory called "logs". If you make a mistake while transcribing, quit the program and delete the most recent transcription file (sort the files in the data folder by date to find it).
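If sorting by date in the file browser is awkward, a short Tcl script can point out the file instead. This is only a sketch: the proc name newestTranscription is made up for illustration, and because the extension of the transcription files isn't stated here, it assumes that every non-.wav file in the data folder is a transcription. It only reports the file; deleting it is still up to you.

```tcl
# Return the most recently modified non-.wav file in dir, or "" if none.
# Assumption: the transcription files are the only non-.wav files there.
proc newestTranscription {dir} {
    set newest ""
    set newestTime -1
    foreach f [glob -nocomplain -types f -directory $dir *] {
        # Skip the audio files themselves.
        if {[string equal -nocase [file extension $f] ".wav"]} continue
        set t [file mtime $f]
        if {$t > $newestTime} {
            set newest $f
            set newestTime $t
        }
    }
    return $newest
}

# "test" is the default data folder name from the settings shown earlier;
# use whatever folder name you set in transcriber.tcl.
set f [newestTranscription test]
if {$f eq ""} {
    puts "no transcription files found"
} else {
    puts "most recent transcription file: $f"
}
```

The script prints to the console, so it is easiest to try from a Tcl shell such as tclsh, run from the directory that contains the data folder.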


Wavesurfer

Wavesurfer is a good all-purpose sound playback/recording/analysis tool. We will be using this for more detailed transcriptions where the segments are mapped to their beginning and end points. For sentences, these will generally be words, and for words, the segments will be phones.

Getting it

Wavesurfer is free. It can be downloaded from The Royal Institute of Technology in Stockholm. It's very easy to install. Just save the executable program somewhere convenient (Desktop, c:\Program Files, etc.) and click to run it.

How we'll be using it

Since we'll be using wavesurfer to transcribe, when you open wavesurfer, choose the "HTK transcription format". It will either prompt you for this automatically, or you'll have to rightclick->apply configuration->htk transcription. You'll see the spectrogram of the wave, and a bar below (called a "Pane"), which is where you'll type in the transcription. Find the first (leftmost) boundary of a phone or word. Place the cursor there and type "sil". This labels the silence at the beginning. Then go to the end of the segment and click the cursor at that point in the transcription bar/pane. Type the phone/word that represents the segment. Continue until the end, then put another "sil" symbol at the end.

See Daylen's sentence transcriptions for an example.
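If you want to check a saved transcription by eye, it helps to know roughly what the file should contain. In the standard HTK label format, each line holds a start time, an end time, and a label, with the times in units of 100 nanoseconds. The Tcl sketch below prints such lines for a made-up sentence. The words, the times, and the proc name htkLabelLine are all hypothetical, and the files wavesurfer saves for you may differ in detail.

```tcl
# Format one HTK-style label line: "<start> <end> <label>",
# converting times from seconds to units of 100 ns (1 s = 1e7 units).
proc htkLabelLine {startSec endSec label} {
    return [format "%.0f %.0f %s" \
        [expr {$startSec * 1e7}] [expr {$endSec * 1e7}] $label]
}

# Hypothetical word segments (start and end in seconds), with "sil"
# labeling the silence at the beginning and at the end.
set segments {
    {0.00 0.25 sil}
    {0.25 0.40 the}
    {0.40 0.85 cat}
    {0.85 1.10 sil}
}

foreach seg $segments {
    foreach {startSec endSec label} $seg break
    puts [htkLabelLine $startSec $endSec $label]
}
# Prints:
# 0 2500000 sil
# 2500000 4000000 the
# 4000000 8500000 cat
# 8500000 11000000 sil
```

Each segment starts where the previous one ends, which matches the way the boundaries are placed one after another in the transcription pane.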

Data

Speech data can be found at this link. Right now we're doing sentences and the kindergarten wordlist (KWlist).

Resources

Here's an IPA-style chart with our transcription symbols instead of the familiar ones.
Here's a zipped spreadsheet with links to examples and transcribing instructions
Here's a timesheet...be sure to turn it in punctually (Feb 8, 22, Mar 8, 22...)
Here's a dictionary that shows the transcriptions that have been given so far.
2/27/05: Here's the new data to work on for g1wordlist_16k.zip
Last modified: Sun Feb 6 21:45:30 PST 2005