 title: Supplementary Materials to Establishing the reliability and validity of measures extracted from long-form recordings
+The majority of measures had ICCs between .3 and .5. `r sum(df.icc.mixed$icc_child_id > .5)` measures had higher ICCs, and surprisingly, `r sum(df.icc.mixed$icc_child_id[grep("och",df.icc.mixed$metric)] > .5)` of them corresponded to the "other child" category, known to have the worst accuracy according to previous analyses (Cristia et al., 2020). 
+### Checking whether high ICCs for other child measures are due to the presence of siblings
 ```{r explo-och-sibn}
-read.csv("../input/quechua_md.csv")->x ##OH this needs to be done!!!
-The majority of measures had ICCs between .3 and .5. `r sum(df.icc.mixed$icc_child_id > .5)` measures had higher ICCs, and surprisingly, `r sum(df.icc.mixed$icc_child_id > .5)` of them
+We reasoned this may be because children in our corpora vary in terms of the number of siblings they have, and that siblings' presence may be stable across recordings. To address this possibility, we fit the full model again to predict number of vocalizations from other children, but this time including sibling number as a fixed effect $lmer(metric~ age + sibling_number + (1|corpus/child))$, so that individual variation that was actually due to sibling number was captured by that fixed effect instead of the random effect for child. We had sibling number data for `r sum(has_n_of_sib[,"TRUE"])` recordings from `r length(levels(factor(mydat2$child_id[!is.na(mydat2$n_of_siblings)])))` children in `r length(levels(factor(mydat2$experiment[!is.na(mydat2$n_of_siblings)])))` corpora (`r corp_w_sib_clean`). We fit this model for the metric with the highest Child ICC, ACLEW's total vocalization duration by other children. Results indicated the full model was singular, so we fitted a No Corpus model to be able to extract a Child ICC. In fact, there was no difference in Child ICC in our original analysis (`r round(df.icc.mixed[df.icc.mixed$metric=="voc_dur_och_ph" & df.icc.mixed$data_set=="aclew","icc_child_id"],2)`) versus this re-analysis including the number of siblings (`r round(icc.result.split["icc_child_id"],2)`).
-Six measures had higher ICCs, and surprisingly, they corresponded to the "other child" category, known to have the worst accuracy according to previous analyses (Cristia et al., 2020). We reasoned this may be because children in our corpora vary in terms of the number of siblings they have, and that siblings' presence may be stable across recordings. To address this possibility, we fit the full model again to predict number of vocalizations from other children, but this time including sibling number as a fixed effect $lmer(metric~ age + sibling_number + (1|corpus/child))$, so that individual variation that was actually due to sibling number was captured by that fixed effect instead of the random effect for child. We did this for the metric with the highest Child ICC, ACLEW's total vocalization duration by other children. Results indicated the full model was singular, a first sign that included variables explained shared variance. When we fitted the No Corpus model, Child ICC was indeed reduced from `r round(df.icc.mixed[df.icc.mixed$metric=="voc_dur_och_ph" & df.icc.mixed$data_set=="aclew","icc_child_id"],2)` to `r round(icc.result.split["icc_child_id"],2)`.
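For readers following the sibling re-analysis, it may help to recall what the Child ICC partitions. Assuming the usual variance-partition definition (an assumption here, since the definition is not restated in this section), a model with random intercepts for corpus and for child nested in corpus gives:

$$ICC_{child} = \frac{\sigma^2_{child}}{\sigma^2_{corpus} + \sigma^2_{child} + \sigma^2_{residual}}$$

In a No Corpus model the $\sigma^2_{corpus}$ term is absent. Under this reading, adding sibling number as a fixed effect would lower the Child ICC only to the extent that it absorbs variance otherwise attributed to the child intercept.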
+### Code to reproduce text before Table 3
 ```{r reg model icc}
 #I moved this chunk here -- check that nothing is broken by it
-Going back to our overarching analyses, we explored how similar Child ICCs were across different talker types and pipelines. We fit a linear model with the formula lm(icc_child_id ~ type * pipeline), where type indicates whether the measure pertained to the key child, (female/male) adults, other children; and pipeline LENA or ACLEW. We found an adjusted R-squared of `r round(reg_sum$adj.r.squared*100)`%, suggesting much of the variance across Child ICCs was explained by this model. A Type 3 ANOVA on this model revealed only type was a signficant (F(`r reg_anova["Type","Df"]`)=`r round(reg_anova["Type","F value"],1)`, p<.001), whereas neither pipeline nor the interaction between type and pipeline were significant.
+Going back to our overarching analyses, we explored how similar Child ICCs were across different talker types and pipelines. We fit a linear model with the formula $lm(icc_child_id ~ type * pipeline)$, where type indicates whether the measure pertained to the key child, (female/male) adults, or other children, and pipeline indicates LENA or ACLEW. We found an adjusted R-squared of `r round(reg_sum$adj.r.squared*100)`%, suggesting much of the variance across Child ICCs was explained by this model. A Type 3 ANOVA on this model revealed that only type was significant (F(`r reg_anova["Type","Df"]`)=`r round(reg_anova["Type","F value"],1)`, p<.001), whereas neither pipeline nor the interaction between type and pipeline was significant.
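The regression described above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the chunk used in the manuscript: the data frame `toy` and its values are hypothetical, the column names `Type` and `data_set` are borrowed from the surrounding chunks, and `drop1()` with sum-to-zero contrasts is used here as a base-R stand-in for a Type 3 ANOVA (the function actually used is not shown in this section).

```r
# Hypothetical data: one row per measure, with its Child ICC, talker type, and pipeline.
set.seed(1)
toy <- expand.grid(
  Type = c("key child", "female adult", "male adult", "other child"),
  data_set = c("aclew", "lena"),
  rep = 1:5
)
toy$icc_child_id <- runif(nrow(toy), min = 0.2, max = 0.7)

# Sum-to-zero contrasts make the marginal (Type 3 style) tests interpretable.
fit <- lm(icc_child_id ~ Type * data_set, data = toy,
          contrasts = list(Type = "contr.sum", data_set = "contr.sum"))

# Adjusted R-squared, reported as a percentage in the text.
adj_r2 <- summary(fit)$adj.r.squared
print(round(adj_r2 * 100))

# Marginal F test for each term, including main effects in the presence of the interaction.
type3 <- drop1(fit, scope = . ~ ., test = "F")
print(type3)
```

Because the simulated ICCs are random noise, the adjusted R-squared should be near zero and no term should be reliably significant; with real data the same calls yield the quantities reported in the paragraph.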
 ```{r reg model cor}
-#bug here probably inherited from above
 lr_cor <- lm(m ~ Type * p, data=rval_tab) 
+cor_t=t.test(rval_tab$m ~ rval_tab$p)
-To see whether correlations in this analysis differed by talker types and pipelines, we fit a linear model with the formula lm(cor ~ type * pipeline), where type indicates whether the measure pertained to the key child, (female/male) adults, other children; and pipeline LENA or ACLEW. We found an adjusted R-squared of `r round(reg_sum_cor$adj.r.squared*100)`%, suggesting this model did not explain a great deal of variance in correlation coefficients. Moreover, a Type 3 ANOVA on this model revealed no significant effects or interactions (all p's > .1). See SMXX for fuller results.
+To see whether correlations in this analysis differed by talker types and pipelines, we fit a linear model with the formula $lm(cor ~ type * pipeline)$, where type indicates whether the measure pertained to the key child, (female/male) adults, other children; and pipeline LENA or ACLEW. We found an adjusted R-squared of `r round(reg_sum_cor$adj.r.squared*100)`%, suggesting this model did not explain a great deal of variance in correlation coefficients. A Type 3 ANOVA on this model revealed a significant effect of pipeline (F = `r round(reg_anova_cor["data_set","F value"],2)`, p = `r round(reg_anova_cor["data_set","Pr(>F)"],2)`), due to higher correlations for ACLEW (m = `r round(cor_t$estimate["mean in group aclew"],2)`) than for LENA metrics (m = `r round(cor_t$estimate["mean in group lena"],2)`). See below for fuller results.
-#bug here probably inherited from above
@@ -538,10 +548,10 @@ df.icc.corpus$Type <- get_type(df.icc.corpus)
-Figure 5A addresses this question, showing the distribution of ICC across our 53 metrics in each of the `r length(levels(factor(df.icc.corpus$corpus)))` included corpora.  Out of `r dim(df.icc.corpus)[1]` fitted models (53 metrics times `r length(levels(factor(df.icc.corpus$corpus)))` corpora), `r sum(df.icc.corpus$formula=="no_chi_effect")` were singular when including a random intercept per child, and therefore they could not be included in these analyses at all, and the remaining `r sum(df.icc.corpus$formula=="no_exp")` were singular when including a random intercept per corpus.
+```{r reg model corpusm}
+The fact that we cannot infer reliability from one corpus based on another one was confirmed statistically: We checked whether Child ICC differed by talker types and pipelines across corpora by fitting a linear model with the formula $lm(Child_ICC ~ type * pipeline * corpus)$, where type indicates whether the measure pertained to the key child, (female/male) adults, other children;  pipeline LENA or ACLEW; and corpus the corpus ID. We found an adjusted R-squared of `r round(reg_sum_cor_icc$adj.r.squared*100)`%, suggesting this model explained nearly half of the variance in Child ICC. A Type 3 ANOVA on this model revealed several significant effects and interactions, including a three-way interaction of type, pipeline, and corpus; a two-way interaction of type and corpus; and a main effect of corpus. See below for more information.
-```{r print out anova results rec on icc by corpus,eval=F}
-# The fact that we cannot infer reliability from one corpus based on another one was confirmed statistically: We checked whether Child ICC differed by talker types and pipelines across corpora by fitting a linear model with the formula $lm(Child_ICC ~ type * pipeline * corpus)$, where type indicates whether the measure pertained to the key child, (female/male) adults, other children;  pipeline LENA or ACLEW; and corpus the corpus ID. We found an adjusted R-squared of `r round(reg_sum_cor_icc$adj.r.squared*100)`%, suggesting this model explained over half of the variance in Child ICC. A Type 3 ANOVA on this model revealed several significant effects and interactions, including a three-way interaction of type, pipeline, and corpus; a two-way interaction of type and corpus; and a main effect of corpus. See the Supplementary Materials for more information.
+```{r print out anova results rec on icc by corpus}
@@ -622,10 +626,10 @@ df.icc.age$age_bin<-factor(df.icc.age$age_bin,levels=age_levels)
-Out of `r dim(df.icc.age)[1]` fitted models (53 metrics times `r length(levels(factor(df.icc.age$age_bin)))` age bins), `r sum(df.icc.age$formula=="no_chi_effect")` were singular when including a random intercept per child, and therefore they could not be included in these analyses at all. In addition, `r sum(df.icc.age$formula=="no_exp")` were singular when including a random intercept per corpus. The remaining `r sum(df.icc.age$formula=="full")` could be analyzed with the full model.
-```{r relBYage-fig6A, echo=F,fig.width=6, fig.height=10,fig.cap="Distribution of ICC attributed to corpus (a) and children (b), when binning children's age."}
 #this complicated section is just to add N of participants in each facet, we first estimate it:
+```{r reg model age}
+age_icc <- lm(icc_child_id ~ Type * data_set * age_bin, data=df.icc.age) 
+#binomial could be used,  diagnostic plots look good
+As we did in the previous section for corpus, we checked whether Child ICC differed by talker types and pipelines across age bins by fitting a linear model with the formula $lm(Child_ICC ~ type * pipeline * age_bin)$. We found an adjusted R-squared of `r round(reg_sum_age_icc$adj.r.squared*100)`%, suggesting this model explained about a third of the variance in Child ICC. However, a Type 3 ANOVA on this model revealed only an interaction of type and age bin, as well as a main effect of age bin, suggesting less complex effects than in the case of corpus. See below for more information.
+```{r print out anova results rec on icc by age}
