Towards unsupervised metrics of child language competences

Folder structure

All source code is located in code/.
All datasets are located in datasets/:
- childes_json_corpora/ contains one test corpus per language. Each test corpus is a JSON file of utterances indexed by family, then by child age, then by speaker: {family : {age : {speaker : utterances} } }
- opensubtitles_corpora/ contains a training corpus and a development corpus for each language. Each corpus contains one utterance per line.
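
As an illustration of the nested test-corpus layout described above, here is a minimal Python sketch that flattens a {family : {age : {speaker : utterances} } } structure into rows. It assumes that the utterances are stored as a list of strings and that ages appear as JSON string keys. The function name `iter_utterances` and the sample data are hypothetical and not part of the repository.

```python
import json


def iter_utterances(corpus):
    """Yield (family, age, speaker, utterance) tuples from a nested corpus dict."""
    for family, ages in corpus.items():
        for age, speakers in ages.items():
            for speaker, utterances in speakers.items():
                # Assumption: utterances is a list of strings.
                for utterance in utterances:
                    yield family, age, speaker, utterance


# Made-up sample standing in for one file of childes_json_corpora/.
sample = json.loads("""
{
  "family_a": {
    "24": {
      "Target_Child": ["more juice"],
      "Mother": ["do you want more juice", "here you go"]
    }
  }
}
""")

rows = list(iter_utterances(sample))
for row in rows:
    print(row)
```

For a real corpus, `json.load` on the opened file would replace the inline sample.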

extra/ contains configuration files. The important ones are:

languages_to_download_informations.yaml details all the information needed to construct the training and test data for each language. This file is organized as follows:

    language:
        1 - identifier for the espeak backend
                
        2 - full language name
                
        3 - Speakers to consider when creating the CHILDES test corpus (in our case, adults=[Mother, Father], and child=[Target_Child])
                
        4 - Whether to extract the orthography tier or not
                
        5 - The urls of the selected corpora for this language
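
To make the five fields listed above concrete, the sketch below models one language entry as a Python dataclass. All field names (`espeak_id`, `full_name`, `speakers`, `extract_orthography`, `urls`) are hypothetical: the actual keys are defined in the YAML file itself and may differ. The example values are placeholders too, and the URL list is left empty rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageEntry:
    """Hypothetical view of one language entry; the real YAML keys may differ."""
    espeak_id: str             # 1 - identifier for the espeak backend
    full_name: str             # 2 - full language name
    speakers: dict             # 3 - speakers to keep in the CHILDES test corpus
    extract_orthography: bool  # 4 - whether to extract the orthography tier
    urls: list = field(default_factory=list)  # 5 - URLs of the selected corpora


# Placeholder values for illustration only.
entry = LanguageEntry(
    espeak_id="en-us",
    full_name="English",
    speakers={"adults": ["Mother", "Father"], "child": ["Target_Child"]},
    extract_orthography=False,
)

# Flatten the speaker groups into the list of roles to keep.
selected_speakers = entry.speakers["adults"] + entry.speakers["child"]
print(selected_speakers)
```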

markers.json
