No specific pre-processing was needed when converting this data set to the ChildProject format (all audio files were already in WAVE format).
The child's date of birth (DOB) was taken from the TalkBank page presenting the data set.
@Date extraction
The date of each CHA file was extracted from its @Date header using the following snippet:
find . -type f -name '*.cha' -exec sh -c "echo -n {}' '; grep '^@Date' {} | sed -e 's/^@Date://' | tr -d '\n'; echo" \;
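The same extraction can be sketched in Python, which avoids shell quoting issues with unusual file names. This is only an illustrative equivalent, not the script used for this data set; the function name `extract_dates` is hypothetical, and the code assumes each CHA file carries at most one relevant header line starting with `@Date:`.

```python
from pathlib import Path


def extract_dates(root="."):
    """Map each .cha file under `root` to the value of its first @Date header.

    Files without an @Date header are mapped to None.
    `extract_dates` is a hypothetical helper, not part of ChildProject.
    """
    dates = {}
    for path in sorted(Path(root).rglob("*.cha")):
        value = None
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.startswith("@Date:"):
                    # Drop the header name and surrounding whitespace (tab, newline).
                    value = line[len("@Date:"):].strip()
                    break
        dates[path.name] = value
    return dates
```

Calling `extract_dates(".")` from the data set root returns a dictionary keyed by file name, which can then be written into the recordings metadata.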
Start time was missing and set to NA for all recordings.
Original CHA annotations were imported using ../scripts/import_annotations.py.
The duration of each file was computed using ChildProject's command line tool:
child-project compute-durations .
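To spot-check a computed duration by hand, the length of a WAVE file can be derived from its frame count and sampling rate. The sketch below uses Python's standard `wave` module and is independent of ChildProject's own implementation, which is not shown here; `wav_duration_seconds` is a hypothetical helper name.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAVE file in seconds.

    Hypothetical helper for manual checks; the standard `wave` module
    only reads uncompressed PCM files.
    """
    with wave.open(str(path), "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()
```

Comparing this value against the duration stored in the metadata for a few recordings is a cheap sanity check.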
Audio files:
020225.wav (only the transcription file 020225.cha is available)
030904.wav (only the transcription file 030904.cha is available)
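A list like the one above can be cross-checked by looking for transcription files that have no audio file with the same base name. The following is a minimal sketch under the assumption that each CHA file and its recording share a stem (for example 020225.cha and 020225.wav); `transcripts_without_audio` is a hypothetical helper.

```python
from pathlib import Path


def transcripts_without_audio(root="."):
    """Return the names of .cha files under `root` with no same-stem .wav file.

    Hypothetical helper; assumes transcripts and recordings share a base name.
    """
    root = Path(root)
    wav_stems = {p.stem for p in root.rglob("*.wav")}
    return sorted(p.name for p in root.rglob("*.cha") if p.stem not in wav_stems)
```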