# Child Project importation

## Pre-processing & metadata

WAVE files were resampled to 16 kHz, downmixed to mono, and stored in `recordings/raw`.
Original files (22.05 kHz) are stored in `recordings/original`.
Resampling was done with the following command, run from the `recordings` directory:

```bash
# -ac 1: mono output, -ar 16000: 16 kHz sampling rate.
# The output path mirrors the input path, with "original" replaced by "raw".
for f in $( find ./original -type f -name "*.wav" ); do
    ffmpeg -i "$f" -ac 1 -ar 16000 "${f/original/raw}"
done
```

The child's date of birth was found on the [TalkBank page presenting the data set](https://childes.talkbank.org/access/Eng-UK/Thomas.html).

## CHA `@Date` extraction

The date of each CHA file was extracted using the following snippet:

```bash
# Print each .cha path followed by the value of its @Date header.
find . -type f -name "*.cha" -exec sh -c 'printf "%s " "$1"; grep "^@Date:" "$1" | sed -e "s/^@Date://" | tr -d "\n"; echo' sh {} \;
```

The start time was missing and was set to NA for all recordings.

## Annotations

Original CHA annotations were imported using [*../scripts/import_annotations.py*](../scripts/import_annotations.py).

## Duration

The duration of each file was computed using ChildProject's command line tool:

```bash
child-project compute-durations .
```

## Missing files

Audio files:

- `020225.wav` (only the transcription file `020225.cha` is available)
- `030904.wav` (only the transcription file `030904.cha` is available)
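
A list like the one above can be re-derived by comparing the base names of the transcription files against the audio files. The following is a minimal sketch, not the procedure that was actually used: `list_missing_audio` is a hypothetical helper, the two directory arguments are assumptions about the local layout, and it assumes the `.wav` files sit directly in the audio directory with no subfolders.

```bash
# Hypothetical helper: print the name of every expected .wav file
# that has a transcription (.cha) but no matching audio file.
#   $1: directory containing the .cha files (searched recursively)
#   $2: directory containing the .wav files (assumed flat)
list_missing_audio() {
    cha_dir=$1
    wav_dir=$2
    find "$cha_dir" -type f -name "*.cha" | sort | while IFS= read -r cha; do
        # Strip the directory and the .cha extension, e.g. 020225
        base=$(basename "$cha" .cha)
        if [ ! -f "$wav_dir/$base.wav" ]; then
            printf '%s.wav\n' "$base"
        fi
    done
}

# Example call; both paths are assumptions, adjust them to the dataset layout:
# list_missing_audio ./annotations ./recordings/raw
```

Run against the transcription and audio directories, this should print one missing file name per line.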