Updated instructions

William N. Havard, 1 year ago
parent commit 86713a0b37
1 changed file with 117 additions and 16 deletions
README.md

@@ -16,7 +16,7 @@ mkdir scripts && cd scripts
 git submodule add git@gin.g-node.org:/LAAC-LSCP/URUMETRICS-CODE.git
 ```
 
-and install the necessary dependancies
+and install the necessary dependencies
 ```bash
 pip install -r requirements.txt
 ```
@@ -24,6 +24,7 @@ pip install -r requirements.txt
 ## Repository structure 
 
 * `acoustic_annotations` contains the code that computes the acoustic annotations
+* `turn_annotations` contains the code that computes the conversational annotations
 * `import_data` contains the code that imports the new recordings and new annotations to the data set
 * `compute_metrics` contains the code that computes and save the metrics
 * `generate_messages` is the code that reads the metrics file and generates the messages sent to the families
@@ -32,7 +33,7 @@ pip install -r requirements.txt
 
 ### Running requirements
 
-All the runnable files **should be run from the root of the data set** (i.e. the directory containing the `script` directory). If not, an exception will be raised and the code will stop running.
+All the runnable files **should be run from the root of the data set** (i.e. the directory containing the `scripts` directory). If not, an exception will be raised and the code will stop running.
 
 ### Naming convention: recording file names
 
@@ -51,42 +52,141 @@ Additional information will be stored in the metadata file `metadata/recordings.c
 
 ## How to use?
 
-The following instructions explain how to use this code when it is embedded (as a submodule) in a ChildProject project.
+The following instructions explain how to use this code when it is embedded (as a submodule) in a ChildProject project managed with DataLad.
 
-0. Prepare the data set by running the following command `python -u scripts/URUMETRICS-CODE/import_data/prepare_data_set.py`
+**(0)** Define the following bash variables
+
+```bash
+today=$(date '+%Y%m%d')
+dataset="URUMETRICS"
+```
+
+The value of `dataset` should be the name of the data set you are working on; a data set with that name should exist in the [LAAC-LSCP GIN repository](https://gin.g-node.org/LAAC-LSCP).
+
+**(1)** Run the following commands to install the data set
+
+```bash
+datalad install -r git@gin.g-node.org:/LAAC-LSCP/${dataset}.git
+cd ${dataset}
+datalad get extra/messages/definition
+datalad get extra/metrics && datalad unlock extra/metrics
+datalad get metadata && datalad unlock metadata
+```
+
+**Note that you will only be allowed to clone and install the data set if you are added as a collaborator on GIN.** Please ask [William](mailto:william.havard@gmail.com) or [Alex](mailto:alecristia@gmail.com) for more information.
+
+**(2)** Prepare the data set by running the following command 
+
+```bash
+python -u scripts/URUMETRICS-CODE/import_data/prepare_data_set.py
+```
 
 This will create the necessary directories required by ChildProject if they do not exist.
 
-1. Place the recordings in `dat/in/recordings/raw` and run `python -u scripts/URUMETRICS-CODE/import_data/import_recordings.py --experiment Uruguayan_Chatbot_2022`
+**(3)** Place the new recordings in `dat/in/recordings/raw`
+
+**(4)** Place the VTC, VCM, and ALICE annotation files in their respective folder in `dat/in/annotations/{vtc|vcm|alice}/raw`
+
+Note that the annotation files should have **unique names** (as the acoustic annotations do, for example) and **should by no means overwrite the files already present** in the aforementioned directories.
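+
+As an illustration only (the `~/incoming` staging directory below is hypothetical, not part of the data set), copying the new files could look like this:
+
+```bash
+# -n ("no clobber") refuses to overwrite files that already exist in the target directories
+cp -n ~/incoming/recordings/*.wav dat/in/recordings/raw/
+cp -n ~/incoming/vtc/*.rttm       dat/in/annotations/vtc/raw/
+cp -n ~/incoming/vcm/*.vcm        dat/in/annotations/vcm/raw/
+cp -n ~/incoming/alice/*.txt      dat/in/annotations/alice/raw/
+```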
+
+**(5)** Save the data set and push the new annotations to GIN
+
+```bash
+datalad save recordings -m "Imported new recordings for date ${today}"
+datalad save annotations/*/raw -m "Imported raw annotations for date ${today}"
+datalad push --to origin
+```
+
+This is an important step: it ensures the new recordings and annotations are pushed before running any script that could potentially fail.
+
+**(6)** Import the new recordings
+
+```bash
+python -u scripts/URUMETRICS-CODE/import_data/import_recordings.py --experiment Uruguayan_Chatbot_2022
+```
 
 This command will look at the new recordings found in the `raw` directory and add them to the metadata file `metadata/recordings.csv`. If some recordings belong to previously unknown children, they will be added to the metadata file `metadata/children.csv`.
 
 Note that the recording file names **should comply with the file naming convention described above**!
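+
+As a quick optional check (not part of the official workflow), you can inspect the last lines of the recordings metadata to confirm that the new recordings were added:
+
+```bash
+tail -n 5 metadata/recordings.csv
+```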
 
-2. Place the annotations in their respective folder in `dat/in/annotations/{vtc|vcm|alice}/raw`
+**(7)** Extract the acoustic annotations using the following command 
+```bash
+python -u scripts/URUMETRICS-CODE/acoustic_annotations/compute_acoustic_annotations.py --path-vtc ./annotations/vtc/raw/VTC_FILE_FOR_WHICH_TO_DERIVE_ACOUSTIC_ANNOTATIONS_FOR.rttm --path-recordings ./recordings/raw/ --save-path ./annotations/acoustic/raw
+```
 
-Note that the annotation files should have **unique names** (e.g. like the acoustic annotations) and **should by no means overwrite the files already present** in the aforementioned directories.
+This command will compute acoustic annotations (mean pitch, pitch range) for the VTC file passed as argument. The output file will have the same name as the input VTC file, with the `rttm` extension replaced by `csv`. In the command above, replace `VTC_FILE_FOR_WHICH_TO_DERIVE_ACOUSTIC_ANNOTATIONS_FOR` with the name of the RTTM file for which you want to compute acoustic annotations.
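+
+For instance, with a hypothetical VTC file named `VTC_20220601.rttm` (the name is only an illustration), the call and the resulting output file would be:
+
+```bash
+python -u scripts/URUMETRICS-CODE/acoustic_annotations/compute_acoustic_annotations.py --path-vtc ./annotations/vtc/raw/VTC_20220601.rttm --path-recordings ./recordings/raw/ --save-path ./annotations/acoustic/raw
+# expected output file: ./annotations/acoustic/raw/VTC_20220601.csv
+```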
+
+
+**(8)** Run the following commands to convert and import the annotations to the ChildProject format:
+
+```bash
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type VTC --annotation-file VTC_FILE.rttm
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type VCM --annotation-file VCM_FILE.vcm
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type ALICE --annotation-file ALICE_FILE.txt
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type ACOUSTIC --annotation-file ACOUSTIC_FILE.csv
+```
+This will import the VTC, VCM, ALICE and ACOUSTIC annotations contained in the specified files. **Note that you shouldn't specify the full path to the file, but only its raw file name and extension.**
+
+This script can also take the additional (optional) parameter `--recording`. When used, only the annotations pertaining to the specified recording (`filename.wav`) will be imported. This can be useful when you need to import only the annotations for a specific recording and not all the annotations for all the recordings.
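+
+For example, to restrict the VTC import to a single recording (both file names below are placeholders):
+
+```bash
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type VTC --annotation-file VTC_FILE.rttm --recording RECORDING_FILE.wav
+```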
 
-3. Extract the acoustic annotations using the following command `python -u scripts/URUMETRICS-CODE/acoustic_annotations/compute_acoustic_annotations.py --path-vtc ./annotations/vtc/raw/VTC_FILE_FOR_WHICH_TO_DERIVE_ACOUSTIC_ANNOTATIONS_FOR.rttm --path-recordings ./recordings/raw/ --save-path ./annotations/acoustic/raw`.
+**(9)** Compute the conversational annotations using the following command:
 
-This command will compute acoustic annotations (mean pitch, pitch range) for the VTC file passed as argument. The output file will have the same name as the input VTC file with the `rttm` extension replaced by `csv`.
+```bash
+python -u scripts/URUMETRICS-CODE/turn_annotations/compute_turn_annotations.py --save-path ./annotations/conversations/raw
+```
 
-4. Run the following command `python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py` to convert the annotations to the ChildProject format.
+This command will only compute the conversational annotations for the newly imported VTC files.
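+
+The generated file then needs to be imported (step 10 below); to find its name, you can simply list the output directory:
+
+```bash
+ls ./annotations/conversations/raw
+```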
 
-6. `python ./scripts/URUMETRICS-CODE/turn_annotations/compute_turn_annotations.py --save-path annotations/conversations/raw/`
+**(10)** Import the conversational annotations using the following command
 
-5. Run the following command `python -u scripts/URUMETRICS-CODE/compute_metrics/metrics.py` to compute ACLEW metrics as well as the additional metrics defined in `compute_metrics/metrics_functions.py`
+```bash
+python -u scripts/URUMETRICS-CODE/import_data/import_annotations.py --annotation-type CONVERSATIONS --annotation-file CONVERSATIONS_FILE.csv
+```
 
-This command will generate a file `metrics.csv` which will be stored in `extra/metrics`. If the file already exists, new lines will be added at the end.
+**(11)** Run the following command to compute ACLEW metrics as well as the additional metrics defined in `compute_metrics/metrics_functions.py`:
+
+```bash
+python -u scripts/URUMETRICS-CODE/compute_metrics/metrics.py
+```
 
-Note that the metrics are only computed for newly imported recordings and not for all the files. If not annotations are linked to the new files (e.g. you forgot to import them) the columns will be empty.
+This command will generate a file `metrics.csv` which will be stored in `extra/metrics`. If the file already exists, new lines will be appended.
 
-6. Generate the message using the following command `python -u scripts/URUMETRICS-CODE/generate_messages/messages.py [--date YYYYMMDD]`
+Note that the metrics are only computed for newly imported recordings, not for all the files. If no annotations are linked to the new files (e.g. you forgot to import them), the columns will be empty.
+
+**(12)** Generate the messages using the following command
+
+```bash
+python -u scripts/URUMETRICS-CODE/generate_messages/messages.py [--date YYYYMMDD]
+```
 
 This command will create a file in `extra/messages/generated` with the following name pattern: `messages_YYYYMMDD.csv`.
 
 The file will contain the messages that correspond to each new audio file. The date parameter is used to specify the date for which to generate messages. If the date is before the current date, only recordings available at the specified date will be considered when generating the messages. This makes it possible to re-generate past messages if needed. If no date is specified, the current date is used.
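+
+For example, to re-generate the messages for a past date (the date below is only an illustration):
+
+```bash
+python -u scripts/URUMETRICS-CODE/generate_messages/messages.py --date 20220615
+```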
 
+**Do something with the generated message file**
+
+**(13)** Save the data set and push everything to GIN
+
+```bash
+datalad save annotations/*/raw -m "Imported derived raw annotations for date ${today}"
+datalad save annotations/*/converted -m "Converted annotations for date ${today}"
+datalad save metadata -m "Updated metadata for date ${today}"
+datalad save extra/metrics -m "Computed new metrics for date ${today}"
+datalad save extra/messages/generated -m "Message generated for date ${today}"
+datalad save .
+datalad push --to origin
+```
+
+**(14)** Uninstall the data set
+
+```bash
+git annex dead here
+datalad push --to origin
+cd ..
+datalad remove -d ${dataset}
+```
+
 ## Return codes
 
 Every command returns either a `0` (i.e. no problem) or `1` (i.e. problem) return code. They might print information, warning and error messages to STDERR.
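+
+This makes it possible to chain the steps in a wrapper script that stops at the first failure, for instance (a minimal sketch, not part of the repository):
+
+```bash
+set -e  # abort as soon as any command returns a non-zero code
+python -u scripts/URUMETRICS-CODE/import_data/prepare_data_set.py
+python -u scripts/URUMETRICS-CODE/import_data/import_recordings.py --experiment Uruguayan_Chatbot_2022
+```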
@@ -99,4 +199,5 @@ TO DO!
 
 * VTC: [66f87c2a8cef25c80c9d9b91f4023ab4757413da](https://github.com/MarvinLvn/voice-type-classifier/tree/66f87c2a8cef25c80c9d9b91f4023ab4757413da)
 * VCM: [37e27e75c613ef78f375ff43f1d69940b02d0713](https://github.com/LAAC-LSCP/vcm/tree/37e27e75c613ef78f375ff43f1d69940b02d0713)
-* Alice: [f7962f46615a6a433f0da5398f61282d9961c101](https://github.com/orasanen/ALICE/tree/f7962f46615a6a433f0da5398f61282d9961c101)
+* Alice: [f7962f46615a6a433f0da5398f61282d9961c101](https://github.com/orasanen/ALICE/tree/f7962f46615a6a433f0da5398f61282d9961c101)
+* Conversations: TO BE SPECIFIED