YODA repo to align the VanDam corpus using the Montreal Forced Aligner (MFA).


README.md

Project

Dataset structure

  • All inputs (i.e. building blocks from other sources) are located in inputs/.
  • All custom code is located in code/.

Steps to generate aligned .csv files from the vandam-data .cha annotations

  1. Run code/csv2grid with annotations/cha/converted as input. This converts the original .csv annotations to .TextGrid files.
  2. Run the MFA aligner on the output files of the previous step, using inputs/mfa-models/acoustic and inputs/mfa-models/dictionary as the required acoustic model and pronunciation dictionary.
  3. Run code/grid2csv with the outputs of the previous step as input to convert the aligned .TextGrid files back to .csv.
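The csv2grid and grid2csv scripts themselves are not reproduced here. As a rough illustration of what the two conversions involve, the sketch below writes segments to Praat's long-format TextGrid text (one interval tier per speaker, gaps filled with empty intervals) and parses such a file back. The `Segment` fields (speaker, onset, offset, text) are a hypothetical row layout, not the repo's actual .csv columns, and the sketch assumes segments of the same speaker never overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical row layout: one annotated utterance per row
    speaker: str
    onset: float   # seconds
    offset: float  # seconds
    text: str


def to_textgrid(segments, duration):
    """Write segments as a Praat long-format TextGrid, one IntervalTier per speaker."""
    speakers = sorted({s.speaker for s in segments})
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        f"size = {len(speakers)}",
        "item []:",
    ]
    for i, spk in enumerate(speakers, start=1):
        # A tier must cover [0, duration] without gaps, so pad with empty intervals.
        # Assumes the segments of one speaker do not overlap.
        intervals, cursor = [], 0.0
        for seg in sorted((s for s in segments if s.speaker == spk), key=lambda s: s.onset):
            if seg.onset > cursor:
                intervals.append((cursor, seg.onset, ""))
            intervals.append((seg.onset, seg.offset, seg.text))
            cursor = seg.offset
        if cursor < duration:
            intervals.append((cursor, duration, ""))
        lines += [
            f"    item [{i}]:",
            '        class = "IntervalTier"',
            f'        name = "{spk}"',
            "        xmin = 0",
            f"        xmax = {duration}",
            f"        intervals: size = {len(intervals)}",
        ]
        for j, (xmin, xmax, text) in enumerate(intervals, start=1):
            escaped = text.replace('"', '""')  # quotes are doubled inside TextGrid strings
            lines += [
                f"        intervals [{j}]:",
                f"            xmin = {xmin}",
                f"            xmax = {xmax}",
                f'            text = "{escaped}"',
            ]
    return "\n".join(lines) + "\n"


INTERVAL_RE = re.compile(r'xmin = ([\d.]+)\s+xmax = ([\d.]+)\s+text = "((?:[^"]|"")*)"')
TIER_RE = re.compile(r'name = "((?:[^"]|"")*)"')


def from_textgrid(content):
    """Parse a long-format TextGrid (as written above) back into non-empty segments."""
    segments = []
    # Split on "item [N]:" headers; every chunk after the first is one tier.
    for chunk in re.split(r"\n\s*item \[\d+\]:", content)[1:]:
        name = TIER_RE.search(chunk).group(1).replace('""', '"')
        for xmin, xmax, text in INTERVAL_RE.findall(chunk):
            if text.strip():  # skip the empty padding intervals
                segments.append(Segment(name, float(xmin), float(xmax), text.replace('""', '"')))
    return segments
```

The parser only targets the long format this writer produces; TextGrids written by other tools (including MFA's own output tiers) may need a more general reader.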

Steps to compare the aligned segments with a human annotator

  1. Use the child-project sampler to generate five 1-minute segments with the high-volubility sampler, and write them to outputs/.
  2. Use child-project eaf-builder with the files generated in the previous step and the templates in inputs/eaf_templates.
  3. Annotate the segments by hand in ELAN.
  4. Create a .csv dataframe listing each segment in outputs/fivesegments-eaf.
  5. Import that .csv with child-project import-annotations.
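The idea behind a "high-volubility" selection is to keep the parts of a daylong recording with the most vocal activity. The sketch below is a deliberately simplified stand-in, not ChildProject's sampler: it tiles the recording into fixed, non-overlapping windows, uses the number of vocalization onsets per window as a hypothetical volubility measure, and keeps the top `k` windows (five 1-minute windows by default, matching the step above).

```python
def top_windows(vocalization_onsets, recording_duration, window=60.0, k=5):
    """Return the k tiled windows (start, end), in seconds, with the most vocalization onsets."""
    n_windows = int(recording_duration // window)
    counts = [0] * n_windows
    for t in vocalization_onsets:
        idx = int(t // window)
        if 0 <= idx < n_windows:  # ignore onsets in the incomplete final window
            counts[idx] += 1
    # Rank windows by count (ties go to the earlier window) and keep the best k.
    best = sorted(range(n_windows), key=lambda i: (-counts[i], i))[:k]
    # Return them in chronological order; tiled windows never overlap.
    return [(i * window, (i + 1) * window) for i in sorted(best)]
```

For example, `top_windows(onsets, 300, window=60.0, k=2)` on a 5-minute recording returns the two busiest minutes in time order. The real sampler may use different activity measures and window placement.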
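Once both the MFA-derived and the hand-made annotations cover the same segments, they can be compared. The repo's own comparison code is not shown here; the sketch below illustrates one common approach, a frame-level confusion matrix: each segment is cut into short frames, every frame is labeled with the active speaker (or `SIL`), and the pairs of labels (human, aligned) are counted. The `(speaker, onset, offset)` tuples, the 10 ms default frame step, and the "later segment wins" rule for overlaps are all assumptions for illustration.

```python
from collections import Counter


def frame_labels(segments, start, end, step=0.01):
    """Label each frame in [start, end) with the active speaker, or 'SIL' if none."""
    n = round((end - start) / step)
    labels = ["SIL"] * n
    for speaker, onset, offset in segments:
        # round() rather than int() avoids off-by-one frames from float error
        first = max(0, round((onset - start) / step))
        last = min(n, round((offset - start) / step))
        for i in range(first, last):
            labels[i] = speaker  # on overlap, the later segment wins (simplification)
    return labels


def confusion(reference, hypothesis, start, end, step=0.01):
    """Count (reference label, hypothesis label) pairs over all frames."""
    ref = frame_labels(reference, start, end, step)
    hyp = frame_labels(hypothesis, start, end, step)
    return Counter(zip(ref, hyp))


def agreement(counts):
    """Fraction of frames where both annotations carry the same label."""
    same = sum(v for (r, h), v in counts.items() if r == h)
    return same / sum(counts.values())
```

Usage: `confusion(human, aligned, 0.0, 60.0)` for one 1-minute segment gives counts such as `{("CHI", "CHI"): ..., ("CHI", "SIL"): ...}`, which can be laid out as a speaker-by-speaker matrix.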
datacite.yml
Title: Alignment of Vandam corpus using Montreal Forced Aligner
Authors:
  • FREBOURG, Martin (McGill University)
  • Gautheron, Lucas (École Normale Supérieure - PSL)
  • Cristia, Alejandrina (École Normale Supérieure - PSL)
Description: YODA repo to align vandam corpus using Montreal Forced Aligner (MFA).
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
References:
  • HomeBank VanDam Public Daylong Corpus - Mark VanDam [https://doi.org/10.21415/t5qh5n] (Dataset)
  • McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner, and Morgan Sonderegger (2017). Montreal Forced Aligner [Computer program]. Version 0.9.0, retrieved 17 January 2017 from http://montrealcorpustools.github.io/Montreal-Forced-Aligner/. [http://dx.doi.org/10.21437/Interspeech.2017-1386] (Montreal Forced Aligner)
Funding: (none listed)
Keywords: Neuroscience, Linguistics, Vandam, Montreal Forced Aligner, MFA, annotations
Resource Type: Dataset