This is a ChildProject version of the Tsay corpus: https://phonbank.talkbank.org/access/Chinese/Taiwanese/Tsay.html
The Tsay corpus was not downloaded exactly as it is presented on the website, because some recordings are missing and some annotations lack information. Missing recordings: 'CEY_020226.wav', 'HBL_031102.wav', 'HBL_040003.wav', 'LMC_030009.wav', 'LMC_030021.wav', 'LMC_030115.wav', 'LMC_030205.wav', 'LMC_030226.wav', 'LMC_030324.wav', 'LMC_030412.wav', 'LMC_030503.wav', 'LMC_030525.wav', 'LMC_030614.wav', 'LMC_030707.wav', 'LMC_030804.wav', 'LMC_030825.wav', 'LMC_030925.wav', 'LMC_031004.wav', 'LMC_031025.wav', 'LMC_031122.wav', 'LMC_040015.wav', 'LMC_040101.wav', 'LMC_040115.wav', 'LMC_040129.wav', 'LMC_040212.wav', 'LMC_040226.wav', 'LMC_040310.wav', 'LMC_040323.wav', 'LMC_040419.wav', 'LMC_040505.wav', 'LMC_040607.wav', 'LMC_040802.wav', 'TWX_010805.wav', 'TWX_020901.wav'.
Time stamps are missing in the following annotations: CEY_020226.cha, HBL_031102.cha, HBL_040003.cha, all LMC files except LMC_050321.cha, LYC_020810.cha, LYC_021124.cha, TWX_010721.cha, TWX_010805.cha, TWX_020901.cha, and all files of the children LJX, WZX, YCX, YJK, YDA, YSW and ZQM. None of these files were downloaded from the corpus website, and they are not included in the LAAC_Tsay corpus.
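A quick way to spot annotations without time stamps is to look for CHAT media bullets. The sketch below assumes that time stamps appear as CLAN-style bullets of the form "\x15<start>_<end>\x15" (milliseconds, delimited by the NAK control character); the helper names are hypothetical and are not part of the project's scripts.

```python
import re
from pathlib import Path
from typing import List

# Assumption: CHAT time stamps are media bullets "\x15<start_ms>_<end_ms>\x15".
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


def has_time_stamps(chat_text: str) -> bool:
    """Return True if at least one media bullet occurs in the CHAT text."""
    return BULLET.search(chat_text) is not None


def files_without_time_stamps(folder: str) -> List[str]:
    """List the .cha files in `folder` (non-recursive) that contain no bullet."""
    missing = []
    for path in sorted(Path(folder).glob("*.cha")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_time_stamps(text):
            missing.append(path.name)
    return missing
```

Running files_without_time_stamps on a folder of downloaded .cha files gives a candidate exclusion list to compare against the one above.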
After preprocessing, the corpus includes data from the children CEY, HBL, HYS, LWJ, LYC and TWX, plus one recording of the child LMC: 490 recordings in total.
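To sanity-check this summary, the recordings can be tallied per child. This minimal sketch assumes file names follow the pattern CHILD_YYMMDD (child code, then age in years, months and days, as in 'LMC_050321.cha'); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def recordings_per_child(filenames: Iterable[str]) -> Dict[str, int]:
    """Count files per child code (the part before the first underscore)."""
    counts = Counter(name.split("_", 1)[0] for name in filenames)
    return dict(sorted(counts.items()))


def age_from_filename(name: str) -> Tuple[int, int, int]:
    """Parse 'LMC_050321.cha' as (years, months, days), assuming YYMMDD."""
    stem = name.rsplit(".", 1)[0]          # drop the extension
    digits = stem.split("_", 1)[1]         # keep the YYMMDD part
    return int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
```

Applied to every recording of the preprocessed dataset, the per-child counts should add up to 490.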
To recreate a ChildProject version of the Tsay corpus, run main.py (in the scripts folder) with two arguments: --corpus, the directory in which the DataLad folder with your corpus should be created, and --url, the link to the corpus on phonbank.talkbank.org:
$ python scripts/main.py --corpus /path/to/dataset --url /link/to/the/corpus
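For reference, the two documented arguments could be parsed as in the sketch below. It is a hypothetical illustration of the command-line interface described above, not the actual contents of main.py, which may define its options differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the --corpus and --url options."""
    parser = argparse.ArgumentParser(
        description="Create a ChildProject version of the Tsay corpus (sketch)."
    )
    parser.add_argument(
        "--corpus", required=True,
        help="directory in which the DataLad dataset will be created",
    )
    parser.add_argument(
        "--url", required=True,
        help="link to the corpus on phonbank.talkbank.org",
    )
    return parser.parse_args(argv)
```

Making both options required means a missing path or URL fails immediately with a usage message instead of partway through the download.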
Then validate the dataset with child-project and add the recording durations:
$ child-project validate /path/to/dataset
$ child-project compute-durations /path/to/dataset
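If you prefer to drive these steps from Python, a small wrapper around the same two child-project commands might look like this. It is a sketch that assumes the child-project executable is on your PATH; the helper names are hypothetical, and dry_run only prints the commands.

```python
import shlex
import subprocess
from typing import List


def childproject_commands(dataset: str) -> List[List[str]]:
    """Validate the dataset, then fill in recording durations."""
    return [
        ["child-project", "validate", dataset],
        ["child-project", "compute-durations", dataset],
    ]


def run_all(dataset: str, dry_run: bool = False) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in childproject_commands(dataset):
        if dry_run:
            print(shlex.join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)
```

Calling run_all("/path/to/dataset", dry_run=True) prints both command lines without executing them.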
Next, generate a dataframe for the bulk import of annotations by running dataframe_for_ann_importation.py with the argument --corpus, the path to your DataLad folder:
$ python dataframe_for_ann_importation.py --corpus /path/to/dataset
Finally, run the bulk import:
$ child-project import-annotations /path/to/dataset --annotations /path/to/dataframe.csv
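The bulk-import dataframe has one row per annotation file. The sketch below builds such a CSV from metadata/recordings.csv. It assumes the columns used by ChildProject's bulk importation (set, recording_filename, time_seek, range_onset, range_offset, raw_filename, format), a duration column in milliseconds filled in by compute-durations, and one .cha file per recording sharing its stem. The set name "cha" and the helper names are hypothetical, and this is not necessarily what dataframe_for_ann_importation.py does.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List

# Columns assumed for `child-project import-annotations`.
COLUMNS = ["set", "recording_filename", "time_seek", "range_onset",
           "range_offset", "raw_filename", "format"]


def import_rows(recordings: Iterable[Dict[str, str]],
                annotation_set: str = "cha") -> List[Dict[str, object]]:
    """One row per recording, covering the whole file [0, duration] ms."""
    rows = []
    for rec in recordings:
        name = rec["recording_filename"]
        rows.append({
            "set": annotation_set,
            "recording_filename": name,
            "time_seek": 0,
            "range_onset": 0,
            "range_offset": int(float(rec["duration"])),
            # Hypothetical: the .cha file shares the recording's stem.
            "raw_filename": Path(name).with_suffix(".cha").name,
            "format": "cha",
        })
    return rows


def write_import_csv(dataset: str, out_csv: str,
                     annotation_set: str = "cha") -> int:
    """Write the import dataframe and return the number of rows."""
    metadata = Path(dataset) / "metadata" / "recordings.csv"
    with open(metadata, newline="", encoding="utf-8") as f:
        rows = import_rows(csv.DictReader(f), annotation_set)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Durations are converted through float because CSV files sometimes store them as "1500.0".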