
lettering fixed

alecristia 2 months ago
parent
commit
7d5d113ad1
3 changed files with 246 additions and 150 deletions
  1. CODE/SM.Rmd (+31 -25)
  2. CODE/SM.html (+204 -114)
  3. CODE/sessionInfo.txt (+11 -11)

+ 31 - 25
CODE/SM.Rmd

@@ -80,7 +80,7 @@ if(RECALC) source("create-all-rs.R")
 
 ```{r readin}
 
-df.icc.simu <- read.csv("../OUTPUT/df.icc.simu.csv") #365 rows because there are multiple rows per metric as a function of r
+df.icc.simu <- read.csv("../OUTPUT/df.icc.simu.csv") #375 rows because there are multiple rows per metric as a function of r
 
 mydat_aclew <- read.csv(paste0('../data_output/', 'aclew','_metrics_scaled.csv')) 
 mydat_aclew <- mydat_aclew[is.element(mydat_aclew$experiment, corpora),]
@@ -99,7 +99,7 @@ mydat2 <- read.csv("../data_output/dat_sib_ana.csv")
 
 df.icc.age<-read.csv("../OUTPUT/df.icc.age.csv")
 age_levels=c("(0,6]" , "(6,12]",  "(12,18]" ,"(18,24]" ,"(24,30]", "(30,36]" )
-#not present in data: , "(36,42]", "(42,48]", "(48,54]"
+#not present in data because no repeated recs at these ages: , "(36,42]", "(42,48]", "(48,54]"
 df.icc.age$age_bin<-factor(df.icc.age$age_bin,levels=age_levels)
 df.icc.age$Type<-get_type(df.icc.age)
 
@@ -125,7 +125,7 @@ In the simulation, we departed from reality as follows:
 - we did not consider child age, nor variable re-recording periods
 - we had a single pair of recordings (rather than variable number of re-recordings)
 
-We used simstudy, a package created for such simulations, following the vignette https://cran.r-project.org/web/packages/simstudy/vignettes/correlated.html to create correlated data providing a correlation matrix. Results are shown in the Figure below. Each point represents the ICC extracted from a mixed model applied to one metric, combining data from all corpora. It appears that ICC values reflect underlying r values, but underestimating r more the larger r is. 
+We used simstudy, a package created for such simulations, following the vignette https://cran.r-project.org/web/packages/simstudy/vignettes/correlated.html to create correlated data providing a correlation matrix. Results are shown in the Figure below. Each point represents the ICC extracted from a mixed model applied to one metric, combining data from all corpora. **It appears that ICC values reflect underlying r values, but underestimating r more the larger r is.** 
 
 ```{r icc-sim-plot, fig.cap="Results from simulations aimed at helping understand the relationship between underlying r and estimated ICC."}
 
@@ -361,7 +361,7 @@ See table below for results of the Type 3 ANOVA.
 
 ```{r print out anova results rec on cor}
 
-kable(round(reg_anova_cor,2),caption="Type 3 ANOVA on model attempting to explain variation in Child ICC as a function of talker types and pipelines.")
+kable(round(reg_anova_cor,2),caption="Type 3 ANOVA on model attempting to explain variation in Correlations as a function of talker types and pipelines.")
 ```
 
 
@@ -410,7 +410,7 @@ This is still lower than the correlation observed for this same variable, `r rva
 > Out of the `r dim(df.icc.mixed)[1]` fitted models, `r table(df.icc.mixed$formula)["full"]` could be fit with the full model, yielding a measure of Corpus ICC.  For the `r table(df.icc.mixed$formula)["no_exp"]` for which the full model was singular, we fit the data with the No Corpus model, and none was singular then, allowing us to have Child ICC for all `r dim(df.icc.mixed)[1]` metrics. 
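
The fallback described above (try the full model, drop the corpus intercept if the fit is singular) can be sketched as follows. This is a minimal illustration on simulated toy data, not the code behind `df.icc.mixed`: the model formulas, the helper names `fit_with_fallback` and `child_icc`, the column names in `toy`, and the variance-ratio definition of Child ICC are all assumptions made for the example. Only the labels `full`, `no_exp`, and `no_chi_effect` come from the text.

```r
library(lme4)  # lmer(), isSingular(), VarCorr()

set.seed(2)
# Simulated toy data (NOT the real corpora): 3 corpora x 20 children x 2 recordings
toy <- expand.grid(rec = 1:2, child = 1:20, experiment = c("c1", "c2", "c3"))
toy$child_id <- interaction(toy$experiment, toy$child)
child_eff <- rnorm(nlevels(toy$child_id))          # one random offset per child
toy$y <- child_eff[as.integer(toy$child_id)] + rnorm(nrow(toy))

# Hypothetical helper: try the full model first, then the No Corpus model
fit_with_fallback <- function(data) {
  full <- lmer(y ~ 1 + (1 | experiment) + (1 | child_id), data = data)
  if (!isSingular(full)) return(list(formula = "full", model = full))
  no_exp <- lmer(y ~ 1 + (1 | child_id), data = data)
  if (!isSingular(no_exp)) return(list(formula = "no_exp", model = no_exp))
  list(formula = "no_chi_effect", model = NULL)    # no usable Child ICC
}

# Hypothetical helper: share of total variance attributed to child_id
child_icc <- function(model) {
  vc <- as.data.frame(VarCorr(model))
  vc$vcov[vc$grp == "child_id"] / sum(vc$vcov)
}

res <- fit_with_fallback(toy)
res$formula
if (!is.null(res$model)) child_icc(res$model)
```

Tabulating the `formula` label over all fitted metrics would then give counts of the kind reported in the paragraph above.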
 
 
-## SM J: Code to reproduce Table 4
+## SM K: Code to reproduce Table 4
 
 Please see code in the RMarkDown version of the document.
 
@@ -434,7 +434,7 @@ kable(x,row.names = F,digits=2,caption="Table 4 (reproduced). Most commonly used
 
 
 
-## SM K: Code to reproduce text below Table 4 at the beginning of the "Overall reliability" section
+## SM L: Code to reproduce text below Table 4 at the beginning of the "Overall reliability" section
 
 Please see code in the RMarkDown version of the document.
 
@@ -448,9 +448,9 @@ best_metric$icc_child_id=round(best_metric$icc_child_id,2)
 ```
 
 
-> Figure 5 shows the distribution of Child ICC across all `r dim(df.icc.mixed)[1]` metrics, separately for each pipeline. The majority of measures had Child ICCs between .3 and .5. `r sum(df.icc.mixed$icc_child_id > .5)` measures had Child ICCs higher or equal to .5. Surprisingly, the top 6 metrics in terms of Child ICC corresponded to the "other child" category, known to have the worst accuracy according to previous analyses (Cristia et al., 2020). In an analysis fully reported in supplementary materials (SM K), we find some evidence that this may be due to the presence versus absence of siblings. The next metric with the highest Child ICC corresponded to an output measure, namely the total vocalization duration per hour extracted from ACLEW annotations (`r best_metric[best_metric$Type=="Output",c("metric","data_set")]`), with a Child ICC of `r best_metric[best_metric$Type=="Output","icc_child_id"]`. Among adult metrics, the average vocalization duration for female vocalizations for ACLEW (`r best_metric[best_metric$Type=="Female",c("metric","data_set")]`) and the ACLEW equivalent of CTC had the highest Child ICC (`r best_metric[best_metric$Type=="Female","icc_child_id"]` and `r best_metric[best_metric$Type=="Adults","icc_child_id"]`, respectively). 
+> The Child ICC of these metrics is not particularly high or low, given the overall distributions described below and illustrated in Figure 5, which shows the distribution of Child ICC across all `r dim(df.icc.mixed)[1]` metrics, separately for each pipeline. The majority of measures had Child ICCs between .3 and .5. Only `r sum(df.icc.mixed$icc_child_id > .5)` metrics had Child ICCs higher or equal to .5. Surprisingly, the top 6 metrics in terms of Child ICC corresponded to the "other child" category, known to have the worst accuracy according to previous analyses (Cristia et al., 2020). In an analysis fully reported in supplementary materials (SM M), we find some evidence that this may be due to the presence versus absence of siblings. The next metric with the highest Child ICC corresponded to an output measure, namely the total vocalization duration per hour extracted from ACLEW annotations (`r paste(best_metric[best_metric$Type=="Output",c("metric","data_set")])`), with a Child ICC of `r best_metric[best_metric$Type=="Output","icc_child_id"]`. Among adult metrics, the average vocalization duration for female vocalizations for ACLEW (`r paste(best_metric[best_metric$Type=="Female",c("metric","data_set")])`) and the ACLEW equivalent of CTC had the highest Child ICC (`r best_metric[best_metric$Type=="Female","icc_child_id"]` and `r best_metric[best_metric$Type=="Adults","icc_child_id"]`, respectively). 
 
-## SM K: Are high Child ICCs for "other child" measures due to number or presence of siblings? (Exploration)
+## SM M: Are high Child ICCs for "other child" measures due to number or presence of siblings? (Exploration)
 
 ```{r explo-och-sibn}
 
@@ -494,7 +494,7 @@ Perhaps it is not so much the sheer number of siblings that explains variance, b
 As in the sibling number analysis, the full model was singular, so we fitted a No Corpus model to be able to extract a Child ICC. We again verified that sibling presence predicted the outcome, total vocalization duration by other children -- and found that it did: β = `r round(summary(model_sib_presence)$coefficients["sib_presencepresent","Estimate"],2)`, t = `r round(summary(model_sib_presence)$coefficients["sib_presencepresent","t value"],2)`, p < .001. This effect is, as expected, sizable: It means that there is nearly one whole standard deviation increase in this variable when there are any siblings. In addition to sibling presence being a better predictor, the amount of variance allocated to individual children as measured by Child ICC was considerably higher in our original analysis (`r round(df.icc.mixed[df.icc.mixed$metric=="voc_dur_och_ph" & df.icc.mixed$data_set=="aclew","icc_child_id"],2)`) than in this re-analysis including sibling presence (`r round(icc.result.split["icc_child_id"],2)`).
 
 
-## SM L: Are "bad" output measures those coming from VCM? (Exploration)
+## SM N: Are "bad" output measures those coming from VCM? (Exploration)
 
 Among ACLEW measures, a fair number come from VCM, a module that classifies child vocalizations in terms of vocal maturity types into cry, canonical, and non-canonical categories. In unpublished analyses, we have found that VCM labels are inaccurate when compared to human labels of the same vocalizations, relative to other metrics. In this analysis, we checked whether VCM-derived measures had lower Child ICC than other ACLEW measures. As shown in the next Figure, this was not the case: Some output measures from the ACLEW pipeline have lower Child ICC than VCM ones.
 
@@ -516,7 +516,7 @@ panel.background = element_blank(), legend.key=element_blank(), axis.line = elem
 
 ```
 
-## SM M: Code to reproduce Figure 5
+## SM O: Code to reproduce Figure 5
 
 Please see code in the RMarkDown version of the document.
 
@@ -539,7 +539,7 @@ ggsave("fig5.png", plot = fig5, width = 4, height = 3, units = "in")
 ```
 
 
-## SM N: Code to reproduce text below Figure 5
+## SM P: Code to reproduce text below Figure 5
 
 Please see code in the RMarkDown version of the document.
 
@@ -564,18 +564,25 @@ rownames(msds_p)<-msds_p$Group.1
 ```
 
 
-> Next, we explored how similar Child ICCs were across different talker types and pipelines. We fit a linear model with the formula $lm(icc\_child\_id ~ type * pipeline)$, where type indicates whether the measure pertained to the key child, (female/male) adults, other children; and pipeline LENA or ACLEW. The model was overall significant (F(`r round(reg_sum$fstatistic["dendf"],2)`) = `r round(reg_sum$fstatistic["value"],2)`, p < .001). We found an adjusted R-squared of `r round(reg_sum$adj.r.squared*100)`%, suggesting much of the variance across Child ICCs was explained by these factors. A Type 3 ANOVA on this model revealed type was a significant predictor (F(`r reg_anova["Type","Df"]`) = `r round(reg_anova["Type","F value"],1)`, p<.001), as was pipeline (F(`r reg_anova["data_set","Df"]`) = `r round(reg_anova["data_set","F value"],1)`, p = `r round(reg_anova["data_set","Pr(>F)"],3)`); the interaction between type and pipeline was not significant. The main effect of type emerged because output metrics tended to have higher Child ICC (`r msds["Output","x"]`)  than those associated to adults in general (`r msds["Adults","x"]`), females (`r msds["Female","x"]`), and males (`r msds["Male","x"]`); whereas those associated with other children had even higher Child ICCs (`r msds["Other children","x"]`). The main effect of pipeline arose because of slightly higher Child ICCs for the ACLEW metrics (`r msds_p["aclew","x"]`) than for LENA metrics (`r msds_p["lena","x"]`). 
+> Next, we explored how similar Child ICCs were across different talker types and pipelines. We fit a linear model with the formula $lm(icc\_child\_id ~ type * pipeline)$, where type indicates whether the measure pertained to the key child, (female/male) adults, other children; and pipeline LENA or ACLEW. The model was overall significant (F(`r round(reg_sum$fstatistic["dendf"],2)`) = `r round(reg_sum$fstatistic["value"],2)`, p < .001). We found an adjusted R-squared of `r round(reg_sum$adj.r.squared*100)`%, suggesting much of the variance across Child ICCs was explained by these factors. A Type 3 ANOVA on this model revealed type was a significant predictor (F(`r reg_anova["Type","Df"]`) = `r round(reg_anova["Type","F value"],1)`, p<.001), and pipeline was marginal (F(`r reg_anova["data_set","Df"]`) = `r round(reg_anova["data_set","F value"],1)`, p = `r round(reg_anova["data_set","Pr(>F)"],3)`); the interaction between type and pipeline was not significant. The main effect of type emerged because output metrics tended to have higher Child ICC (`r msds["Output","x"]`)  than those associated to adults in general (`r msds["Adults","x"]`), females (`r msds["Female","x"]`), and males (`r msds["Male","x"]`); whereas those associated with other children had even higher Child ICCs (`r msds["Other children","x"]`). The trend for a main effect of pipeline arose because of slightly higher Child ICCs for the ACLEW metrics (`r msds_p["aclew","x"]`) than for LENA metrics (`r msds_p["lena","x"]`). 
+
+See table below for results of the Type 3 ANOVA.
+
+```{r print out anova results}
+
+kable(round(reg_anova,2),caption="Type 3 ANOVA on model attempting to explain variation in Child ICC as a function of talker types and pipelines.")
+```
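
For readers who want to see the shape of such a table outside the full pipeline, here is a minimal sketch of a Type 3 ANOVA on an `lm` with a type by pipeline interaction. The data frame `toy` is simulated for illustration and does not reproduce any reported value; the use of sum-to-zero contrasts and the explicit `type = 3` argument are assumptions of this sketch, not a description of how `reg_anova` was computed.

```r
library(car)  # Anova(); car is listed in sessionInfo.txt

set.seed(1)
# Simulated stand-in for df.icc.mixed (NOT real results): 10 metrics per cell
toy <- expand.grid(Type = c("Output", "Adults", "Other children"),
                   data_set = c("aclew", "lena"),
                   rep = 1:10)
toy$icc_child_id <- 0.4 + 0.1 * (toy$Type == "Other children") +
  rnorm(nrow(toy), sd = 0.05)

# Sum-to-zero contrasts keep Type 3 main-effect tests interpretable
# in the presence of the interaction term
fit <- lm(icc_child_id ~ Type * data_set, data = toy,
          contrasts = list(Type = "contr.sum", data_set = "contr.sum"))

reg_anova_toy <- Anova(fit, type = 3)
print(round(reg_anova_toy, 2))
```

The resulting object has one row per term (plus intercept and residuals), so it can be indexed the same way as in the inline code, for example `reg_anova_toy["Type", "F value"]`.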
 
 
 
-## SM P: Code to reproduce text at the beginning of the "Reliability across age groups" section
+## SM Q: Code to reproduce text at the beginning of the "Reliability across age groups" section
 
 Please see code in the RMarkDown version of the document.
 
 
 > Out of `r dim(df.icc.age)[1]` fitted models (`r dim(df.icc.mixed)[1]` metrics times `r length(levels(factor(df.icc.age$age_bin)))` age bins), `r sum(df.icc.age$formula=="no_chi_effect")` were singular when including a random intercept per child, and therefore they could not be included in these analyses at all. In addition, `r sum(df.icc.age$formula=="no_exp")` were singular when including a random intercept per corpus. The remaining `r sum(df.icc.age$formula=="full")` could be analyzed with the full model.
 
-## SM Q: Code to reproduce Figure 6
+## SM R: Code to reproduce Figure 6
 
 Please see code in the RMarkDown version of the document.
 
@@ -616,7 +623,7 @@ ggsave("fig6.png", plot = fig6, width = 6, height = 10, units = "in")
 
 ```
 
-## SM R: Code to reproduce text below Figure 6
+## SM S: Code to reproduce text below Figure 6
 
 Please see code in the RMarkDown version of the document.
 
@@ -634,8 +641,7 @@ reg_anova_age_icc=Anova(age_icc)
 
 ```
 
-> To interrogate these results statistically, and assess whether Child ICCs tended to be higher or lower in certain age bins, we fit a linear model with the formula $lm(Child\_ICC ~ type * pipeline * age\_bin)$. The model was overall significant (F(`r round(reg_sum_age_icc$fstatistic["dendf"],2)`) = `r round(reg_sum_age_icc$fstatistic["value"],2)`, p < .001). We found an adjusted R-squared of `r round(reg_sum_age_icc$adj.r.squared*100)`%, suggesting this model explained more than a third of the variance in Child ICC.  A Type 3 ANOVA on this model revealed that the interactions between type and pipeline, pipeline and age, and the three-way interaction (type, pipeline, age) were not significant. However, both the type by age bin interaction (F(`r reg_anova_age_icc["Type:age_bin","Df"]`) = `r round(reg_anova_age_icc["Type:age_bin","F value"],1)`, p < .001) and the three main effects were significant (
-type: F(`r reg_anova_age_icc["Type","Df"]`) = `r round(reg_anova_age_icc["Type","F value"],1)`, p < .001; 
+> To interrogate these results statistically, and assess whether Child ICCs tended to be higher or lower in certain age bins, we fit a linear model with the formula $lm(Child\_ICC ~ type * pipeline * age\_bin)$. The model was overall significant (F(`r round(reg_sum_age_icc$fstatistic["dendf"],2)`) = `r round(reg_sum_age_icc$fstatistic["value"],2)`, p < .001). We found an adjusted R-squared of `r round(reg_sum_age_icc$adj.r.squared*100)`%, suggesting this model explained more than a third of the variance in Child ICC.  A Type 3 ANOVA on this model revealed that the interactions between type and pipeline, pipeline and age, and the three-way interaction (type, pipeline, age) were not significant. However, both the type*age bin interaction (F(`r reg_anova_age_icc["Type:age_bin","Df"]`) = `r round(reg_anova_age_icc["Type:age_bin","F value"],1)`, p < .001) and the three main effects were significant (type: F(`r reg_anova_age_icc["Type","Df"]`) = `r round(reg_anova_age_icc["Type","F value"],1)`, p < .001; 
 age: F(`r reg_anova_age_icc["age_bin","Df"]`) = `r round(reg_anova_age_icc["age_bin","F value"],1)`, p < .001; 
 pipeline: F(`r reg_anova_age_icc["data_set","Df"]`) = `r round(reg_anova_age_icc["data_set","F value"],1)`, p = .01).
 
@@ -649,7 +655,7 @@ kable(round(reg_anova_age_icc,2),caption="Type 3 ANOVA on model attempting to ex
 
 
 
-## SM S: Code to reproduce Figure 7
+## SM T: Code to reproduce Figure 7
 
 Please see code in the RMarkDown version of the document.
 
@@ -691,7 +697,7 @@ ggsave("fig7.png", plot = fig7, width = 4, height = 4, units = "in")
 
 
 
-## SM T: Code to reproduce text in the "Reliability within corpus" section
+## SM U: Code to reproduce text in the "Reliability within corpus" section
 
 Please see code in the RMarkDown version of the document.
 
@@ -699,7 +705,7 @@ Please see code in the RMarkDown version of the document.
 > Out of `r dim(df.icc.corpus)[1]` fitted models (`r dim(df.icc.mixed)[1]` metrics times `r length(levels(factor(df.icc.corpus$corpus)))` corpora), `r sum(df.icc.corpus$formula=="no_chi_effect")` were singular when including a random intercept per child, and therefore they could not be included in these analyses at all. (Including a random intercept per corpus is not relevant here, since only data from one corpus is included in each model fit.)
 
 
-## SM U: Code to reproduce Figure 8
+## SM V: Code to reproduce Figure 8
 
 Please see code in the RMarkDown version of the document.
 
@@ -730,7 +736,7 @@ ggsave("fig8.png", plot = fig8, width = 8, height = 4, units = "in")
 
 ```
 
-## SM V: Code to reproduce text below Figure 8
+## SM W: Code to reproduce text below Figure 8
 
 Please see code in the RMarkDown version of the document.
 
@@ -758,7 +764,7 @@ kable(round(reg_anova_cor_icc,2),caption="Type 3 ANOVA on model attempting to ex
 
 ```
 
-## SM W: Code to reproduce Figure 9
+## SM X: Code to reproduce Figure 9
 
 Please see code in the RMarkDown version of the document.
 
@@ -796,7 +802,7 @@ fig9
 ggsave("fig9.png", plot = fig9, width = 8, height = 4, units = "in")
 ```
 
-## SM X: Code to reproduce text in the Discussion section
+## SM Y: Code to reproduce text in the Discussion section
 
 Please see code in the RMarkDown version of the document.
 
@@ -824,11 +830,11 @@ bias_tab$recXcor<-bias_tab$recXcor/sum(bias_tab$recXcor)
 
 > Our data draws mainly from urban (`r round(sum(bias_tab$recXcor[urban])*100)`% of recordings, `r round(sum(bias_tab$chiXcor[urban])*100)`% of the children, `r round(sum(urban)/length(urban)*100)`% of the corpora), English-speaking settings (`r round(sum(bias_tab$recXcor[english])*100)`% of recordings, `r round(sum(bias_tab$chiXcor[english])*100)`% of the children, `r round(sum(english)/length(english)*100)`% of the corpora), largely from North America (`r round(sum(bias_tab$recXcor[northam])*100)`% of recordings, `r round(sum(bias_tab$chiXcor[northam])*100)`% of the children, `r round(sum(northam)/length(northam)*100)`% of the corpora). 
 
-## SM Y: Variability as a function of hardware
+## SM Z: Variability as a function of hardware
 
 Another potential negative contribution to reliability that is currently not discussed is variability in the experimental setup. In a corpus collected in the Solomon Islands, children were wearing two recorders simultaneously. These were USB devices, sourced from two different providers. In this dataset, the duration of the recordings could be very different within the same pair (a ~10% difference was not atypical), which means that what is actually recorded is somewhat random in itself. Even comparing identical audio ranges covered by both recordings of each pair, the corresponding ACLEW metrics differed slightly; they were strongly correlated (R^2 was close to 0.95) but not perfectly correlated. This suggests that randomness in the recorders' properties and their placement may also contribute to decreased reliability. Importantly, this loss of reliability is not due to changing underlying conditions or behaviors, as both recorders picked up the exact same day. It is also not due to algorithmic variation because ACLEW algorithms are deterministic. Thus, this variability is only due to hardware differences and potentially also differences in e.g. USB placement.
 
-## SM Z: References
+## References
 
 
 Alcock, K., Meints, K., & Rowland, C. (2020). The UK Communicative Development Inventories. London, UK: J&R Press. 

File diff suppressed because it is too large
+ 204 - 114
CODE/SM.html


+ 11 - 11
CODE/sessionInfo.txt

@@ -1,6 +1,6 @@
 R version 4.3.0 (2023-04-21)
 Platform: aarch64-apple-darwin20 (64-bit)
-Running under: macOS Ventura 13.6.1
+Running under: macOS Ventura 13.6.3
 
 Matrix products: default
 BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
@@ -18,26 +18,26 @@ attached base packages:
 other attached packages:
  [1] ggbeeswarm_0.7.2      car_3.1-2             carData_3.0-5        
  [4] stringr_1.5.0         tidyr_1.3.0           dplyr_1.1.2          
- [7] psych_2.3.6           kableExtra_1.3.4.9000 ggpubr_0.6.0         
-[10] ggthemes_5.0.0        ggplot2_3.4.2         performance_0.10.4   
-[13] lme4_1.1-33           Matrix_1.5-4.1       
+ [7] psych_2.3.3           kableExtra_1.3.4.9000 ggpubr_0.6.0         
+[10] ggthemes_5.0.0        ggplot2_3.4.2         performance_0.10.8   
+[13] lme4_1.1-33           Matrix_1.5-4         
 
 loaded via a namespace (and not attached):
  [1] beeswarm_0.4.0    gtable_0.3.3      xfun_0.39         bslib_0.5.0      
- [5] insight_0.19.2    rstatix_0.7.2     lattice_0.21-8    vctrs_0.6.3      
+ [5] insight_0.19.7    rstatix_0.7.2     lattice_0.21-8    vctrs_0.6.4      
  [9] tools_4.3.0       generics_0.1.3    parallel_4.3.0    tibble_3.2.1     
 [13] fansi_1.0.4       highr_0.10        pkgconfig_2.0.3   webshot_0.5.5    
 [17] lifecycle_1.0.3   farver_2.1.1      compiler_4.3.0    textshaping_0.3.6
-[21] mnormt_2.1.1      munsell_0.5.0     vipor_0.4.5       htmltools_0.5.5  
-[25] sass_0.4.7        yaml_2.3.7        pillar_1.9.0      nloptr_2.0.3     
-[29] jquerylib_0.1.4   MASS_7.3-60       cachem_1.0.8      boot_1.3-28.1    
+[21] mnormt_2.1.1      munsell_0.5.0     vipor_0.4.7       htmltools_0.5.5  
+[25] sass_0.4.6        yaml_2.3.7        pillar_1.9.0      nloptr_2.0.3     
+[29] jquerylib_0.1.4   MASS_7.3-58.4     cachem_1.0.8      boot_1.3-28.1    
 [33] abind_1.4-5       nlme_3.1-162      tidyselect_1.2.0  rvest_1.0.3      
-[37] digest_0.6.33     stringi_1.7.12    purrr_1.0.1       labeling_0.4.2   
+[37] digest_0.6.31     stringi_1.7.12    purrr_1.0.2       labeling_0.4.2   
 [41] splines_4.3.0     cowplot_1.1.1     fastmap_1.1.1     grid_4.3.0       
 [45] colorspace_2.1-0  cli_3.6.1         magrittr_2.0.3    utf8_1.2.3       
 [49] broom_1.0.5       withr_2.5.0       scales_1.2.1      backports_1.4.1  
 [53] rmarkdown_2.23    httr_1.4.6        gridExtra_2.3     ggsignif_0.6.4   
 [57] ragg_1.2.5        evaluate_0.21     knitr_1.43        viridisLite_0.4.2
-[61] mgcv_1.8-42       rlang_1.1.1       Rcpp_1.0.10       glue_1.6.2       
-[65] xml2_1.3.5        svglite_2.1.1     rstudioapi_0.15.0 minqa_1.2.5      
+[61] mgcv_1.8-42       rlang_1.1.1       Rcpp_1.0.11       glue_1.6.2       
+[65] xml2_1.3.5        svglite_2.1.1     rstudioapi_0.14   minqa_1.2.5      
 [69] jsonlite_1.8.7    R6_2.5.1          systemfonts_1.0.4