EL1000

Requesting access to the data
Gaining access to the data
Re-using EL1000 datasets
Data description
Derived datasets
Maintainers
- The EL1000 package
- How to import new datasets

Requesting access to the data

The procedure to request access to the data can be found here.

Gaining access to the data

Once your project has been approved, the technical advisor will grant you access to the datasets. Please note that you may not have access to all of the corpora, either because data donors declined or because you are not a HomeBank member.

Data (including .its files and metadata) have been formatted using the ChildProject package; for an overview of the formatting and structure, see this introduction. We strongly encourage you to build on this structure (i.e. do not move data around or make additional copies), which will keep your work compatible with others' and make it easier to reproduce. For an example of how to set up an analysis that builds on datasets like this one, see this example or this one.

To access the data, you'll need to:

Create an account on https://gin.g-node.org/user/sign_up
Give your username to the technical advisor
Follow the instructions to install the ChildProject package and DataLad
Wait until the technical advisor confirms that you have access, then follow the instructions below.

Re-using EL1000 datasets

Requirements

You will first need to install the ChildProject package as well as DataLad. Instructions to install these packages can be found here.
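Before going further, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch, not part of the official instructions: `check_tools` is a hypothetical helper, and the command names in the usage comment (`git-annex`, `datalad`, `child-project`) are the usual entry points for these packages, assumed here rather than taken from this README.

```shell
# Report which of the given commands are available on PATH.
# Prints one line per command and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumed command names; adjust if your installation differs):
# check_tools git git-annex datalad child-project
```

If any tool is reported as missing, go back to the installation instructions before trying to install the datasets.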

Configuring your SSH key on GIN

This step only needs to be done once.

Copy your SSH public key (usually located in ~/.ssh/id_rsa.pub) to your clipboard. If you don't have one, create one following these instructions.
In your browser, go to GIN > Your parameters > SSH keys.
Click on the blue "Add a key" button, then paste the content of your public key in the Content field, and submit.

Your key should now appear in your list of SSH keys - you can add as many as necessary.
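The steps above assume you know where your public key lives. As a rough sketch, the hypothetical helper `find_pubkey` below looks for a public key under a few common default file names in ~/.ssh; the list of names is an assumption, and your key may be stored elsewhere. The commands in the usage comment are standard OpenSSH commands, but the connection test against GIN is only a suggestion and its exact output is not documented here.

```shell
# Print the path of the first public key found in ~/.ssh, trying a few
# common default file names. Returns non-zero if none is found.
find_pubkey() {
    for name in id_ed25519.pub id_ecdsa.pub id_rsa.pub; do
        if [ -f "$HOME/.ssh/$name" ]; then
            echo "$HOME/.ssh/$name"
            return 0
        fi
    done
    return 1
}

# Usage:
# key=$(find_pubkey) && cat "$key"   # print the key so it can be copied
# ssh-keygen -t ed25519              # create a key pair if none was found
# ssh -T git@gin.g-node.org          # after adding the key on GIN, try connecting
```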

Installing datasets

First, clone the EL1000 superdataset:

datalad install -r git@gin.g-node.org:/LAAC-LSCP/EL1000.git
cd EL1000

To get data from any of the EL1000 datasets (e.g. kidd), cd into it, then run the setup script:

cd kidd
datalad run-procedure setup

If you would also like access to the confidential files, run the following instead (note the --confidential flag):

cd kidd
datalad run-procedure setup --confidential

Note: you may not have access to all of the corpora, either because data donors declined or because you are not a HomeBank member. If you think you should have access to more corpora, please get in touch with the technical advisor.

Getting data

You can get data from a dataset using the datalad get command, e.g.:

datalad get annotations # get all files under annotations/

Or:

datalad get . # get everything

You can download many files in parallel using the -J (or --jobs) option:

datalad get . -J 4 # get everything, with 4 parallel transfers
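If you need several datasets, the setup and download steps can be repeated in a loop. The sketch below only chains the commands already shown in this section; `run` and `fetch_datasets` are hypothetical helper names, and the `DRY_RUN` variable is an illustrative convention for previewing the commands, not a DataLad feature. It assumes you are at the root of the EL1000 superdataset and that you have been granted access to each dataset you list.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fetch_datasets JOBS DATASET...
# For each dataset directory: run the setup procedure, then download
# its annotations with JOBS parallel transfers.
fetch_datasets() {
    jobs=$1
    shift
    for ds in "$@"; do
        # The subshell keeps the cd local to this iteration.
        (
            cd "$ds" &&
            run datalad run-procedure setup &&
            run datalad get annotations -J "$jobs"
        ) || return 1
    done
}

# Usage, from the root of the EL1000 superdataset:
# DRY_RUN=1 fetch_datasets 4 kidd lyon   # preview the commands
# fetch_datasets 4 kidd lyon             # actually run them
```

The loop stops at the first dataset that fails, which is typically a dataset you have not been given access to.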

For more help with using DataLad, please refer to our cheatsheet or DataLad's own cheatsheet. If this is not enough, check DataLad's documentation and Handbook.

Fetching updates

If you are notified of changes to the data, please retrieve them by issuing the following commands:

datalad update --merge
datalad get .

Removing the data

It is important that you delete the data once your project is complete. This can be done with datalad remove:

datalad remove -r path/to/your/dataset

Data description

Data documentation

Datasets are structured according to the ChildProject package standards detailed here.

Participants

The matrix of how many children are exposed to language X in corpus Y can be found in documentation/languages.csv.

Available annotations

Derived datasets

metrics: metrics derived from ACLEW and LENA annotations.
reliability: reliability estimations for ACLEW and LENA annotations based on manual annotations.

Maintainers

The EL1000 package

To maintain EL1000 datasets (e.g. to export metadata from .its annotations, or to import annotations), you need the EL1000 package. It can be installed with pip:

pip install git+ssh://git@gin.g-node.org:/LAAC-LSCP/tools.git --upgrade

How to import new datasets

Instructions to import new datasets can be found here.
