Towards unsupervised metrics of child language development. Yaya Sy


Towards unsupervised metrics of child language competences

Folder structure

All source code is located in code/.
All datasets are located in datasets/:
- childes_json_corpora/ contains a test corpus for each language. Each test corpus is a JSON file that groups utterances by family, child age, and speaker: {family: {age: {speaker: utterances}}}
- opensubtitles_corpora/ contains a training corpus and a development corpus for each language. Each corpus contains one utterance per line.
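As a minimal sketch of how the nested test-corpus layout described above could be traversed, the snippet below flattens a tiny made-up corpus into rows. The family name, ages, and utterances are invented for illustration, and it assumes that `utterances` is a list of strings; the actual files may store them differently.

```python
import json

# Hypothetical miniature corpus following the documented nesting:
# {family: {age: {speaker: utterances}}}. All names and values are made up.
raw = """
{
  "FamilyA": {
    "24": {
      "Target_Child": ["more juice", "doggy"],
      "Mother": ["do you want more juice ?"]
    },
    "30": {
      "Target_Child": ["I want the ball"]
    }
  }
}
"""


def iter_utterances(corpus):
    """Yield (family, age, speaker, utterance) tuples from the nested dict."""
    for family, ages in corpus.items():
        for age, speakers in ages.items():
            for speaker, utterances in speakers.items():
                for utterance in utterances:
                    yield family, age, speaker, utterance


corpus = json.loads(raw)
rows = list(iter_utterances(corpus))

# Keep only the utterances produced by the target child.
child_rows = [row for row in rows if row[2] == "Target_Child"]
print(len(rows), len(child_rows))
```

Note that JSON object keys are always strings, so ages come back as strings and may need converting before any numeric comparison.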

extra/ contains configuration files. The important ones are:

languages_to_download_informations.yaml details all the information needed to construct the training and test data for each language. This file is organized as follows:

` language:
    1 - identifier for the espeak backend
    2 - full language name
    3 - speakers to consider when creating the CHILDES test corpus (in our case, adults=[Mother, Father] and child=[Target_Child])
    4 - whether to extract the orthography tier or not
    5 - the URLs of the selected corpora for this language`
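To make the five fields above more concrete, here is a rough sketch of what one language entry might look like once loaded into Python, together with a small completeness check. The key names (`espeak_id`, `name`, `speakers`, `extract_orthography`, `urls`), the `en` label, and the URL are all hypothetical placeholders; the actual file may name and arrange these fields differently.

```python
# Hypothetical entry mirroring the five documented fields. The key names
# are illustrative assumptions, not the names used in the actual YAML file.
config = {
    "en": {
        "espeak_id": "en-us",          # 1 - identifier for the espeak backend
        "name": "English",             # 2 - full language name
        "speakers": {                  # 3 - speakers kept for the CHILDES test corpus
            "adults": ["Mother", "Father"],
            "child": ["Target_Child"],
        },
        "extract_orthography": False,  # 4 - whether to extract the orthography tier
        "urls": [                      # 5 - corpus URLs (placeholder value)
            "https://example.org/placeholder-corpus.zip",
        ],
    }
}

REQUIRED_FIELDS = {"espeak_id", "name", "speakers", "extract_orthography", "urls"}


def missing_fields(config):
    """Return {language: [missing field names]} for incomplete entries."""
    report = {}
    for language, fields in config.items():
        missing = sorted(REQUIRED_FIELDS - set(fields))
        if missing:
            report[language] = missing
    return report


print(missing_fields(config))
```

A check of this kind can catch an incomplete entry before any corpus is downloaded.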

markers.json
