@@ -42,33 +42,34 @@ usage: main.py [-h] [--group {corpus,child}] [--chains CHAINS] [--samples SAMPLE
main model described throughout the notes.

optional arguments:
- -h, --help show this help message and exit
- --group {corpus,child}
- --chains CHAINS
- --samples SAMPLES
- --validation VALIDATION
- --output OUTPUT
- ```
+-h, --help show this help message and exit
+--group {corpus,child}
+--chains CHAINS
+--samples SAMPLES
+--validation VALIDATION
+--output OUTPUT
+```
+
+The ``--group`` parameter controls the primary level of the hierarchical model. The model indeed assumes that confusion rates (i.e. confusion probabilities) vary across corpora (``corpus``) or children (``child``).

- The ``--group`` parameter controls the primary level of the hierarchical model. The model indeed assumes that confusion rates (i.e. confusion probabilities) vary across corpora (``corpus``) or children (``child``).
+The ``--chains`` parameter sets the amount of MCMC chains, and ``--samples`` controls the amount of MCMC samples, warmup excluded.

- The ``--chains`` parameter sets the amount of MCMC chains, and ``--samples`` controls the amount of MCMC samples, warmup excluded.
+The ``--validation`` parameter sets the amount of annotation clips used for validation rather than training. Set it to 0 in order to use as much data for training as possible.

- The ``--validation`` parameter sets the amount of annotation clips used for validation rather than training. Set it to 0 in order to use as much data for training as possible.
+The ``--output`` parameter controls the output destination. Training data will be saved to ``output/samples/data_{output}.pickle`` and the MCMC samples are saved as ``output/samples/fit_{output}.parquet``

- The ``--output`` parameter controls the output destination. Training data will be saved to ``output/samples/data_{output}.pickle`` and the MCMC samples are saved as ``output/samples/fit_{output}.parquet``

- ### Confusion probabilities
+### Confusion probabilities

- The marginal posterior distribution of the confusion matrix is shown below:
+The marginal posterior distribution of the confusion matrix is shown below:

- ![](output/fit_vanuatu.png)
+![](output/fit_vanuatu.png)


- ### Speech distribution
+### Speech distribution

- Speech distributions used to generate the simulated "null-hypothesis" corpora are fitted against the training data using Gamma distributions. The code used to fit these distributions is found in ``code/models/speech_distribution``.
+Speech distributions used to generate the simulated "null-hypothesis" corpora are fitted against the training data using Gamma distributions. The code used to fit these distributions is found in ``code/models/speech_distribution``.

- The match between the training data and the Gamma parametrization can be observed in various plots in ``output``. See below for the Key child:
+The match between the training data and the Gamma parametrization can be observed in various plots in ``output``. See below for the Key child:

- ![](output/dist_CHI.png)
+![](output/dist_CHI.png)