ChildProject importation

Pre-processing & metadata

No pre-processing was needed to convert this data set to the ChildProject format, since all audio files were already in WAVE format.

The child's date of birth was taken from the TalkBank page presenting the data set.

CHA @Date extraction

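The recording date of each CHA file was extracted from its @Date header using the following snippet, which prints one line per file (the file path followed by the date):

find . -type f -name '*.cha' -exec sh -c 'printf "%s " "$1"; grep "^@Date:" "$1" | sed -e "s/^@Date:[[:space:]]*//" | tr -d "\n"; echo' sh {} \;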

Start time was missing and set to NA for all recordings.
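The extracted dates may need to be normalized before they go into the metadata. The following is a minimal sketch, assuming that the @Date values follow the CHAT DD-MMM-YYYY convention (for example 25-FEB-1985) and that an ISO YYYY-MM-DD string is wanted. The function name `extract_date_iso` is hypothetical and is not part of ChildProject or of the scripts in this repository.

```python
import re
from datetime import date
from typing import Optional

# Matches a CHAT header line such as "@Date:<tab>25-FEB-1985" (format assumed).
DATE_HEADER = re.compile(r"^@Date:\s*(\d{1,2})-([A-Za-z]{3})-(\d{4})", re.MULTILINE)

# Explicit month table, so parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def extract_date_iso(cha_text: str) -> Optional[str]:
    """Return the first @Date header as YYYY-MM-DD, or None if absent."""
    match = DATE_HEADER.search(cha_text)
    if match is None:
        return None
    day, month, year = match.groups()
    return date(int(year), MONTHS[month.upper()], int(day)).isoformat()
```

A file without a recognizable @Date header yields `None`, which can then be recorded as NA in the same way as the missing start times.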

Annotations

Original CHA annotations were imported using ../scripts/import_annotations.py.

Duration

The duration of each audio file was computed using ChildProject's command-line tool:

child-project compute-durations .
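To spot-check a computed duration, the length of a single WAVE file can be read with Python's standard library. This is only a verification sketch and not how ChildProject computes durations. The function name `wav_duration_seconds` is hypothetical, and the code assumes uncompressed PCM WAVE files, which is all the `wave` module can read.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAVE file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of frames divided by the sampling rate.
        return wav_file.getnframes() / wav_file.getframerate()
```

ChildProject may store durations in a different unit, so convert the result before comparing it with the metadata.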

Missing files

Audio files:

  • 020225.wav (only the transcription file 020225.cha is available)
  • 030904.wav (only the transcription file 030904.cha is available)
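A list like the one above can be reproduced by comparing file stems. The sketch below assumes that each transcript and its recording share the same base name (for example 020225.cha and 020225.wav). The function name `transcripts_without_audio` and the two directory arguments are hypothetical and should be adapted to the actual data set layout.

```python
from pathlib import Path
from typing import List


def transcripts_without_audio(cha_dir: str, wav_dir: str) -> List[str]:
    """Return the names of the expected .wav files that have no recording."""
    # Base names (without extension) of all available recordings.
    wav_stems = {path.stem for path in Path(wav_dir).rglob("*.wav")}
    # A transcript whose base name has no matching recording is reported.
    return sorted(
        path.stem + ".wav"
        for path in Path(cha_dir).rglob("*.cha")
        if path.stem not in wav_stems
    )
```

Both directories are searched recursively, and they may point to the same folder if transcripts and recordings are stored together.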