# Establishing the reliability of measures extracted from long-form recordings using LENA and the ACLEW pipeline

This repository contains the data and the code necessary to reproduce the results of the paper "Establishing the reliability of measures extracted from long-form recordings using LENA and the ACLEW pipeline".

# Structure

This repository is structured as follows:

* CODE: contains the code necessary to preprocess the data and to replicate the analyses in the paper
* data_output: contains the data which is used for the analyses in the paper
* OUTPUT: contains the derived data (ICCs) computed by the script `all-analyses.R` using the data found in DATA
* input: contains the data sets used in this paper. Access to these data sets is only necessary to replicate the content of DATA (i.e. compilation of all the metrics, children's ages, etc.). If you are not a LAAC member or a trusted member, you cannot access this data. For more information, contact [Alejandrina Cristia](mailto:alecristia@gmail.com).

# Reproducing the paper analyses


Some readers may want to check our materials for reproducibility. To regenerate our supplementary materials, you will need [RStudio](https://www.rstudio.com/). For further information on using Rmd for transparent (knittable) analyses, see [Mike Frank & Chris Hartgerink's tutorial](https://libscie.github.io/rmarkdown-workshop/handout.html).


If you simply want to check the reproducibility of the paper analyses, you can download a zipped version from [our GIN repo](https://gin.g-node.org/laac-lscp/relival), by clicking on the button that looks like a downward pointing arrow, near the top right of the page (under Publications; see to-download-zip.jpg).

1. Unzip the downloaded zip folder.
2. Open the CODE folder and double-click on SM.Rmd to launch RStudio with the correct working directory (or, if RStudio is already running, change the working directory to the unzipped folder).
3. Click on the "knit" button near the top of the RStudio window.
4. If anything fails, the most likely issue is that you are missing a library. For most packages, you can install the package through the GUI menu or by typing `install.packages("LIBRARYNAME")` in the console (near the bottom of the RStudio window), replacing LIBRARYNAME with the package that the system said was not found. If the missing package is papaja, please follow the instructions [here](https://github.com/crsh/papaja).

Dependencies can be quickly installed by issuing the following command in RStudio:

```R
list.of.packages <- c("lme4","performance","ggplot2","ggthemes","ggpubr","kableExtra","psych","dplyr","tidyr","stringr","car","ggbeeswarm")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```

**If anything looks different**, please double-check that you are using the same versions of all packages by looking at the capture of the environment at the very end of the .pdf file.

It is also possible to generate the supplementary materials from the command line (without opening RStudio) with a single command, run from the root of the repository:

```bash
Rscript -e 'library(rmarkdown); rmarkdown::render("CODE/SM.Rmd", "pdf_document", output_file = "SM.pdf")'
```
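
Outside RStudio, rendering depends on tools that RStudio would otherwise provide or locate for you. The sketch below is a minimal pre-flight check, not part of the repository: `missing_tools` is a hypothetical helper, and the list of tools checked (`Rscript` and `pandoc`) is an assumption to adapt to your setup. PDF output also typically needs a LaTeX distribution.

```bash
# Hypothetical helper: print each given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list; adjust it to what your installation needs.
missing=$(missing_tools Rscript pandoc)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Not found on PATH:" $missing >&2
fi
```

If the check prints nothing, the render command above should at least be able to start.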

# Raw Data Access

Raw data access is NOT necessary for you to reproduce the supplementary analyses, and thus the numbers and figures in the manuscript. At present, the raw data is only accessible with additional ethics and security approval.

## Re-using the dataset

### Requirements

You will first need to install the [ChildProject](https://childproject.readthedocs.io/en/latest/) package for Python (optional) as well as DataLad. Instructions to install these packages can be found [here](https://childproject.readthedocs.io/en/latest/install.html).

### Configuring your SSH key on GIN

This step should only be done once:

0. Create an account on [GIN](https://gin.g-node.org/) if you don't have one already.

1. Copy your SSH public key to your clipboard (usually located in `~/.ssh/id_rsa.pub`). If you don't have one, please create one following [these instructions](https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).
2. In your browser, go to [GIN > Your parameters > SSH keys](https://gin.g-node.org/user/settings/ssh)
3. Click on the blue "Add a key" button, then paste the content of your public key in the Content field, and submit

Your key should now appear in your list of SSH keys - you can add as many as necessary.
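
If you are unsure which public key file you have, the following sketch may help locate it from a terminal. `find_pubkey` is a hypothetical helper, not part of any tool mentioned here. It assumes the default OpenSSH file names and, as an arbitrary choice, prefers an ed25519 key over an RSA one.

```bash
# Hypothetical helper: print the path of the first public key found in the
# given directory (default: ~/.ssh), trying ed25519 before RSA.
# Returns 1 if neither file exists.
find_pubkey() {
  dir=${1:-"$HOME/.ssh"}
  for name in id_ed25519.pub id_rsa.pub; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

if key=$(find_pubkey); then
  cat "$key"   # this is the text to paste into the Content field on GIN
else
  echo "No public key found; one can be created with: ssh-keygen -t ed25519" >&2
fi
```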

### Installing the dataset

The next step is to clone the dataset:

```bash
datalad install -r git@gin.g-node.org:/LAAC-LSCP/RELIVAL.git
cd RELIVAL
```

### Getting data

You can get data from a dataset using the `datalad get` command, e.g.:

```bash
datalad get CODE/* # download scripts
datalad get DATA/* # download data
```

Or:

```bash
datalad get . # get everything
```

You can download many files in parallel using the `-J` (or `--jobs`) option:

```bash
datalad get . -J 4 # get everything, with 4 parallel transfers
```

For more help with using DataLad, please refer to our [cheatsheet](https://childproject.readthedocs.io/en/latest/cheatsheet.html) or DataLad's own [cheatsheet](http://handbook.datalad.org/en/latest/basics/101-136-cheatsheet.html). If this is not enough, check DataLad's [documentation](http://docs.datalad.org/en/stable/) and [Handbook](http://handbook.datalad.org/en/latest/).

### Fetching updates

If you are notified of changes to the data, please retrieve them by issuing the following commands:

```bash
datalad update --merge
datalad get .
```
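
If you run these two commands regularly, they could be wrapped in a small function. The sketch below is only one possible arrangement: `run` and `update_dataset` are hypothetical names, and the `DRY_RUN` variable is an assumption introduced here to preview the commands without executing them.

```bash
# Hypothetical wrapper: echo each command, and execute it only when
# DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Merge upstream changes, then fetch the content of new or changed files.
# The second step only runs if the first one succeeds.
update_dataset() {
  run datalad update --merge && run datalad get .
}
```

Calling `update_dataset` from the root of the dataset performs the update. Calling `DRY_RUN=1 update_dataset` only prints the two commands.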

### Removing the data

It is important that you delete the data once your project is complete.
This can be done with `datalad remove`:

```bash
datalad remove -r path/to/your/dataset
```
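
Because removal is destructive, you may want to guard against pointing the command at the wrong directory. The sketch below is a hedged suggestion, not a DataLad feature. `is_datalad_dataset` and `safe_remove` are hypothetical helpers that rely only on the fact that the root of a DataLad dataset contains a `.datalad` directory.

```bash
# Hypothetical check: succeed only if the directory looks like the root of a
# DataLad dataset, i.e. it contains a .datalad directory.
is_datalad_dataset() {
  [ -d "$1/.datalad" ]
}

# Remove the dataset only when the path passes the check above.
safe_remove() {
  if is_datalad_dataset "$1"; then
    datalad remove -r "$1"
  else
    echo "Refusing to remove '$1': not a DataLad dataset root" >&2
    return 1
  fi
}
```

For example, `safe_remove path/to/your/dataset` runs the same command as above, but only after the check succeeds.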