|
@@ -11,6 +11,7 @@
|
|
|
- [Matching classifications back to the metadata](#matching-classifications-back-to-the-metadata)
|
|
|
- [Importing classifications into the source dataset](#importing-classifications-into-the-source-dataset)
|
|
|
- [Comparing Zooniverse annotations with other annotations](#comparing-zooniverse-annotations-with-other-annotations)
|
|
|
+- [Going big](#going-big)
|
|
|
|
|
|
## Summary
|
|
|
|
|
@@ -218,4 +219,14 @@ The [compare](https://gin.g-node.org/LAAC-LSCP/zoo-campaign/src/master/annotatio
|
|
|
|
|
|
Which will output:
|
|
|
|
|
|
- ![Comparing the VTC and Zooniverse classifications](annotations/comparison.png)
|
|
|
+ ![Comparing the VTC and Zooniverse classifications](annotations/comparison.png)
|
|
|
+
|
|
|
+ ## Going big
|
|
|
+
|
|
|
+This example contains only around a hundred subjects, all extracted from a single recording.
|
|
|
+Real-life projects usually involve far more data, typically tens of thousands of subjects.
|
|
|
+To scale up, we recommend the following:
|
|
|
+
|
|
|
+- Ask Zooniverse for an increased subject quota.
|
|
|
+- If you are using a version control system such as git or DataLad, you may not want to commit the audio chunks; appropriate rules in a `.gitignore` file will keep them out of the repository. Versioning too many files within one repository can make its operations much slower. Moreover, as long as the metadata for the selected chunks and the original recordings are properly stored and backed up, the audio chunks can be extracted again at any later time.
|
|
|
+- Some operations, such as sampling or extracting chunks, can be demanding for large datasets. We recommend performing these steps on a cluster using several CPU cores. ChildProject provides a `--threads` option for parallel processing.
|
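
The last two recommendations can be illustrated with a minimal sketch, which is not ChildProject's implementation. It assumes the stored chunk metadata can be read as rows of `(recording, onset_ms, offset_ms)` and that the recordings are WAV files. The function names `extract_chunk` and `extract_all`, the output file naming, and the `threads` parameter are hypothetical and chosen only for illustration.

```python
import wave
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_chunk(recording, onset_ms, offset_ms, destination):
    """Copy the [onset_ms, offset_ms) span of a WAV recording into destination."""
    with wave.open(str(recording), "rb") as source:
        params = source.getparams()
        rate = source.getframerate()
        # Convert millisecond timestamps into frame indices.
        first = onset_ms * rate // 1000
        last = offset_ms * rate // 1000
        source.setpos(first)
        frames = source.readframes(last - first)

    with wave.open(str(destination), "wb") as target:
        # Reuse the source format; the frame count in the header is
        # corrected automatically once the frames have been written.
        target.setparams(params)
        target.writeframes(frames)
    return destination


def extract_all(chunks, output_dir, threads=4):
    """Re-create every chunk listed in the metadata, several at a time.

    chunks: iterable of (recording, onset_ms, offset_ms) rows (assumed layout).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    def job(row):
        recording, onset_ms, offset_ms = row
        # Hypothetical naming scheme: <recording stem>_<onset>_<offset>.wav
        name = f"{Path(recording).stem}_{onset_ms}_{offset_ms}.wav"
        return extract_chunk(recording, onset_ms, offset_ms, output_dir / name)

    # pool.map preserves the order of the input rows.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(job, chunks))
```

Because the chunks can be rebuilt this way from the recordings and the metadata alone, the output directory is a natural candidate for a `.gitignore` rule. Threads are used here because slicing audio files is mostly I/O-bound; a process pool may suit CPU-heavy steps better.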