
Child Project importation

Pre-processing & metadata

WAVE files were resampled to 16 kHz (and downmixed to mono) and stored in recordings/raw. The original files (22.05 kHz) are kept in recordings/original. Resampling was done with the following command, run from the recordings directory:

find ./original -type f -name "*.wav" -print0 | while IFS= read -r -d '' f; do ffmpeg -nostdin -i "$f" -ac 1 -ar 16000 "${f/original/raw}"; done

Child's DOB was found on the TalkBank page presenting the data set.

CHA @Date extraction

The recording date of each CHA file was read from its @Date header using the following snippet:

find . -type f -name "*.cha" -exec sh -c 'printf "%s " "$1"; grep "^@Date:" "$1" | sed -e "s/^@Date://" | tr -d "\n"; echo' sh {} \;
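
The extracted dates may still need to be converted before they go into the recordings metadata. The sketch below assumes the @Date values follow the CHAT DD-MMM-YYYY layout (for example 03-JUL-1996) and that an ISO YYYY-MM-DD date is wanted. The function name cha_date_to_iso is a hypothetical helper, not part of the original workflow.

```shell
# Convert a CHAT-style date (DD-MMM-YYYY, assumed layout) to YYYY-MM-DD.
# Hypothetical helper for illustration; POSIX sh compatible.
cha_date_to_iso() {
    cd_day=${1%%-*}          # text before the first dash
    cd_rest=${1#*-}          # text after the first dash
    cd_mon=${cd_rest%%-*}    # month abbreviation
    cd_year=${cd_rest#*-}    # four-digit year

    # Map the month abbreviation (case-insensitive) to its number.
    case "$(printf '%s' "$cd_mon" | tr '[:lower:]' '[:upper:]')" in
        JAN) cd_num=01 ;; FEB) cd_num=02 ;; MAR) cd_num=03 ;;
        APR) cd_num=04 ;; MAY) cd_num=05 ;; JUN) cd_num=06 ;;
        JUL) cd_num=07 ;; AUG) cd_num=08 ;; SEP) cd_num=09 ;;
        OCT) cd_num=10 ;; NOV) cd_num=11 ;; DEC) cd_num=12 ;;
        *) echo "unrecognized month: $cd_mon" >&2; return 1 ;;
    esac

    # Drop one leading zero so printf does not read "08" or "09" as octal.
    cd_day=${cd_day#0}
    printf '%s-%s-%02d\n' "$cd_year" "$cd_num" "$cd_day"
}
```

For instance, cha_date_to_iso 03-JUL-1996 prints 1996-07-03. Any leading tab or space left by the extraction snippet has to be trimmed before the value is passed in.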

Start time was missing and set to NA for all recordings.

Annotations

Original CHA annotations were imported using ../scripts/import_annotations.py.

Duration

The duration of each file was computed using ChildProject's command-line tool:

child-project compute-durations .

Missing files

Audio files:

  • 020225.wav (only the transcription file 020225.cha is available)
  • 030904.wav (only the transcription file 030904.cha is available)
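
A list like the one above can be rebuilt by comparing transcript and audio file names. The sketch below is illustrative only: the function name list_missing_audio and both directory arguments are assumptions, since these notes do not state where the CHA files are stored.

```shell
# Print the expected name of every recording whose transcript exists
# but whose audio file does not.
# $1: directory containing *.cha files (assumed location)
# $2: directory containing *.wav files (assumed location)
list_missing_audio() {
    lm_cha_dir=$1
    lm_wav_dir=$2
    for lm_cha in "$lm_cha_dir"/*.cha; do
        [ -e "$lm_cha" ] || continue              # glob matched nothing
        lm_base=$(basename "$lm_cha" .cha)
        [ -e "$lm_wav_dir/$lm_base.wav" ] || printf '%s.wav\n' "$lm_base"
    done
}
```

Calling it with the transcript directory and recordings/raw would print one missing file name per line.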