@@ -17,9 +17,11 @@
The present repository showcases the organization
of a Zooniverse campaign using ChildProject and DataLad.
+[Zooniverse](https://www.zooniverse.org/about) is a crowd-sourcing platform
+that may be used for large-scale annotation tasks. The [ExElang book](https://laac-lscp.github.io/exelang-book/humanannotation.html) provides examples of research goals for which Zooniverse may be useful.
-This campaign requires citizens to listen to 500 ms audio clips
-and to perform the following tasks:
+The present campaign requires citizens to listen to 500 ms audio clips
+and then to perform the following tasks:
1. Decide whether they hear speech from either a Baby, a Child, an Adolescent, an Adult, or no speech.
2. Guess the gender of Adolescent or Adult speakers.
@@ -27,20 +29,21 @@ and to perform the following tasks:
### Workflow
-1. We used [DataLad](https://joss.theoj.org/papers/10.21105/joss.03262) to manage this campaign.
+1. We used [DataLad](https://joss.theoj.org/papers/10.21105/joss.03262) to manage this campaign. (See the installation instructions [here](http://handbook.datalad.org/en/latest/intro/installation.html#install).)
2. The primary dataset (containing the audio and the metadata)
-was included in this repository as a subdataset. It was structured according to ChildProject's standards.
-3. [ChildProject](https://childproject.readthedocs.io/en/latest/) was used to generate the samples, to upload the audio chunks to zooniverse, and to retrieve the classifications.
+was included in this repository as a subdataset. It was structured according to [ChildProject's format](https://childproject.readthedocs.io/en/latest/format.html).
+3. [ChildProject](https://childproject.readthedocs.io/en/latest/) was used to generate the samples, to upload the audio chunks to Zooniverse, and to retrieve the classifications. (See the installation instructions [here](https://childproject.readthedocs.io/en/latest/install.html).)
This repository contains all the scripts that we used to implement this workflow.
You are welcome to re-use this code and adapt it to your needs.
+
### Repository structure
- `annotations` contains annotations built from the classifications retrieved from Zooniverse.
- `classifications` contains the classifications retrieved from Zooniverse.
- `samples` contains the samples that were selected as well as the chunks generated from them.
- - `vandam-data` is a subdataset containing VanDam Daylong corpus, structed according to ChildProject's standards.
+ - `vandam-data` is a subdataset containing the VanDam Daylong corpus, structured according to [ChildProject's format](https://childproject.readthedocs.io/en/latest/format.html). This matters because the ChildProject features used in the following steps rely on that format.
## Preparing samples
@@ -90,8 +93,9 @@ See [ChildProject's documentation](https://childproject.readthedocs.io/en/latest
Once the chunks have been extracted, the next step is to upload them to Zooniverse.
Note that due to quotas, it is recommended to upload only a few chunks at a time (e.g. 1000 per day).
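One way to stay under the quota is to cut the chunk listing into fixed-size batches ahead of time and to upload one batch per day. The sketch below is only an illustration: `split_chunks` is a hypothetical helper that is not part of ChildProject, and it assumes that the chunks are listed in a CSV file with a single header row. ChildProject's uploader may already offer its own way to limit the number of chunks per run, so check its documentation first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ChildProject): split a CSV listing of chunks
# into batches of at most N rows each, repeating the header in every batch.
split_chunks() {
    local input="$1"       # CSV listing the chunks (assumed: one header row)
    local batch_size="$2"  # maximum number of chunks per batch, e.g. 1000
    local outdir="$3"      # receives batch_aaa.csv, batch_aab.csv, ...

    mkdir -p "$outdir"
    local header
    header="$(head -n 1 "$input")"

    # Drop the header, then cut the remaining rows into fixed-size pieces.
    tail -n +2 "$input" | split -l "$batch_size" -a 3 - "$outdir/part_"

    # Re-attach the header to every piece and remove the temporary file.
    local part
    for part in "$outdir"/part_*; do
        [ -e "$part" ] || continue   # no rows at all: nothing to do
        { printf '%s\n' "$header"; cat "$part"; } > "$outdir/batch_${part##*_}.csv"
        rm "$part"
    done
}
```

For instance, `split_chunks chunks.csv 1000 batches` (with `chunks.csv` standing in for your own listing) writes `batches/batch_aaa.csv`, `batches/batch_aab.csv`, and so on, each holding at most 1000 chunks.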
-You will need to provide the numerical id of your Zooniverse project;
-you will also need to set Zooniverse credentials as environment variables:
+You will need to provide the numerical ID of your Zooniverse project. Instructions for creating a Zooniverse project are available [here](https://help.zooniverse.org/getting-started/).
+
+You will also need to set Zooniverse credentials as environment variables:
```bash
export ZOONIVERSE_LOGIN=""