Dataset: Draft hybrid genome assembly and annotation of Danionella translucida, a transparent fish with the smallest vertebrate brain. This repository contains data linked to Kadobianskyi et al., 2019.
http://doi.org/10.1038/s41597-019-0161-z

Jörg Henninger 1fc4ae1c59 Update 'README.md' 5 years ago
FASTQC.zip 0e3dd9605e added quality reports 6 years ago
LICENSE 159ba3968b LICENSE added 5 years ago
README.md 1fc4ae1c59 Update 'README.md' 5 years ago
datacite.yml bbf02af704 Add 'datacite.yml' 5 years ago
dr.ensembl.exome.gff3 55c1bac3b6 added intron codes and data 5 years ago
dt.all.maker.annotation.functional.pfam.gff ae696b2a5c added the main annotation file 6 years ago
dt.all.maker.gff 26427cb55b added raw MAKER output file 6 years ago
dt.all.maker.proteins.functional.fasta 4e1b293916 added transcript and protein files 6 years ago
dt.all.maker.transcripts.functional.fasta 4e1b293916 added transcript and protein files 6 years ago
dt.exome.gff3 55c1bac3b6 added intron codes and data 5 years ago
final.dt.scf.pilon.fasta c6345b20a6 Upload files to '' 6 years ago
igv_annotation.gff 93e6035bb3 added the annotation track for IGV 6 years ago
npcdna2dt.sorted.bam 870e435167 added nanopore read mapping 6 years ago
npcdna2dt.sorted.bam.bai 870e435167 added nanopore read mapping 6 years ago
orthologs.tsv e9321c5674 added DT-DR ortholog table 6 years ago
orthologs_introns.m cbd64e2997 fixes 5 years ago
orthologs_nocopies.txt 55c1bac3b6 added intron codes and data 5 years ago
rnaseq_3dpf.cov.tdf 313f575a8e added short-read RNA-seq coverage files 6 years ago
rnaseq_adult.cov.tdf 313f575a8e added short-read RNA-seq coverage files 6 years ago

README.md

kadobianskyi2019

Data associated with the manuscript "Draft hybrid genome assembly and annotation of Danionella translucida, a transparent fish with the smallest vertebrate brain" by Kadobianskyi et al., 2019.

List of files in the dataset

Genome assembly

final.dt.scf.pilon.fasta – FASTA sequence file with the Danionella translucida (DT) assembly scaffolds.
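
As a quick sanity check after downloading the assembly, the scaffold count, total length and N50 can be computed with the Python standard library alone. This is a minimal sketch, not part of the original dataset: the function names are illustrative, and it assumes a plain (uncompressed) FASTA file in which the scaffold ID is the first token of each header line.

```python
from typing import Dict, Iterable, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited header token.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(path: str) -> Tuple[int, int, int]:
    """Return (number of scaffolds, total bases, N50) for a FASTA file."""
    with open(path) as handle:
        lengths = read_fasta_lengths(handle)
    return len(lengths), sum(lengths.values()), n50(lengths.values())


# Example call (requires the downloaded file in the working directory):
# print(summarize("final.dt.scf.pilon.fasta"))
```

The file is read line by line, so only the per-scaffold lengths are held in memory, not the sequences themselves.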

Genome annotation

dt.all.maker.gff – GFF3 annotation file with the raw MAKER output after 3 cycles of gene prediction training and re-annotation with the combined short- and long-read RNA-seq and Ensembl proteome evidence. Contains all evidence mapping tracks, as well as repeat sequences, tRNAs and gene models.

dt.all.maker.annotation.functional.pfam.gff – GFF3 annotation file with the MAKER output, with putative protein functions and Pfam domains added.

igv_annotation.gff – Gene annotation track for viewing in genome browser software. Contains the extracted 3'/5'-UTRs and CDSs.

dt.all.maker.transcripts.functional.fasta – FASTA file with functionally annotated MAKER-derived transcript sequences.

dt.all.maker.proteins.functional.fasta – FASTA file with functionally annotated MAKER-derived protein sequences.

rnaseq_3dpf.cov.tdf and rnaseq_adult.cov.tdf – IGV-compatible RNA-seq coverage track files (25 bp sliding window) for the 3 dpf and adult RNA-seq libraries, respectively.

npcdna2dt.sorted.bam and npcdna2dt.sorted.bam.bai – BAM alignment file with Nanopore RNA-seq reads mapped to the DT genome, and its index.
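
To get an overview of what one of the GFF3 files above contains, the feature types in column 3 can be tallied. The following is a minimal sketch rather than an official tool: the function name is illustrative, and the source label "maker" used for filtering in the example is an assumption about how MAKER tags its gene models in column 2. Parsing stops at a `##FASTA` directive, which the GFF3 format allows at the end of a file.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str], source: str = "") -> Counter:
    """Count GFF3 feature types (column 3), optionally for one source (column 2)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if line.startswith("#") or not line.strip():
            continue  # directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature row
        if source and fields[1] != source:
            continue
        counts[fields[2]] += 1
    return counts


# Example call (assumes the file is in the working directory and that
# gene models carry the source label "maker"):
# with open("dt.all.maker.gff") as handle:
#     for feature, n in count_feature_types(handle, source="maker").most_common():
#         print(feature, n)
```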

Data analysis

orthologs.tsv – Tab-separated values file with the orthologous proteins between zebrafish (Danio rerio, DR) and DT, generated with CRB-BLAST.

orthologs_nocopies.txt – Ortholog protein file with duplicate DR hits removed.
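
The README does not state how duplicate DR hits were identified or removed, nor the column layout of the ortholog table. As a hedged starting point for inspecting the table, the sketch below only groups rows by the DR identifier and reports which identifiers are hit more than once. The column index `target_col=1` and the function names are assumptions for illustration; check them against the actual file before use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def group_by_target(rows: Iterable[Sequence[str]],
                    target_col: int = 1) -> Dict[str, List[Sequence[str]]]:
    """Group table rows by the value in target_col (assumed to hold the DR ID)."""
    groups: Dict[str, List[Sequence[str]]] = defaultdict(list)
    for row in rows:
        if len(row) > target_col:  # skip rows that are too short
            groups[row[target_col]].append(row)
    return dict(groups)


def duplicated_targets(rows: Iterable[Sequence[str]],
                       target_col: int = 1) -> Dict[str, List[Sequence[str]]]:
    """Return only those targets that are hit by more than one row."""
    groups = group_by_target(rows, target_col)
    return {target: hits for target, hits in groups.items() if len(hits) > 1}


# Example call (the column index is an assumption, not documented here):
# import csv
# with open("orthologs.tsv", newline="") as handle:
#     dups = duplicated_targets(csv.reader(handle, delimiter="\t"))
#     print(len(dups), "DR identifiers have more than one DT hit")
```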

dr.ensembl.exome.gff3 and dt.exome.gff3 – GFF exon annotation files for DR and DT, respectively.

orthologs_introns.m – MATLAB code to generate intron size distribution plots.
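
For readers without MATLAB, intron sizes can also be derived from the exon annotation files with a few lines of Python. This is not a port of orthologs_introns.m, only a minimal sketch of the general idea: exons are grouped per transcript and the gaps between consecutive exons are taken as introns. It assumes that each exon row carries a `Parent=` attribute naming its transcript, which should be verified for both files.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: exon rows link to their transcript through a Parent attribute.
PARENT_RE = re.compile(r"(?:^|;)Parent=([^;]+)")


def intron_lengths(lines: Iterable[str]) -> List[int]:
    """Return intron lengths inferred from gaps between exons of a transcript."""
    exons: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = PARENT_RE.search(fields[8])
        if not match:
            continue
        start, end = int(fields[3]), int(fields[4])
        # An exon may list several parent transcripts, separated by commas.
        for parent in match.group(1).split(","):
            exons[(fields[0], parent)].append((start, end))

    lengths: List[int] = []
    for spans in exons.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # GFF coordinates are 1-based and inclusive.
            gap = next_start - prev_end - 1
            if gap > 0:
                lengths.append(gap)
    return lengths


# Example call (requires the downloaded file in the working directory):
# with open("dt.exome.gff3") as handle:
#     sizes = intron_lengths(handle)
#     print(len(sizes), "introns")
```

The resulting list can be passed to any plotting library to draw the size distributions for DT and DR side by side.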

Quality control

FASTQC.zip – ZIP archive with FastQC-generated quality reports for the libraries used in the assembly and annotation.

How to view the assembly and annotation

-- Download the IGV viewer for your platform.

-- To load the genome into the viewer, go to "Genomes" -> "Load genome from File..." and load final.dt.scf.pilon.fasta.

-- The coverage, mapping and annotation tracks can be loaded through "File" -> "Load from File...".

-- Every gene can be searched for using its identifier (DTNNNNN-RX), where NNNNN is a number and X is a transcript isoform (A, B, C...), or its putative function (e.g., dnmt1).
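
When working with the annotation files outside IGV, the identifier scheme described above can be split into its gene and isoform parts with a small regular expression. This is a sketch under assumptions: the number of digits is left open because the README only says "a number", and the identifier in the usage comment is a made-up example, not a real gene from the dataset.

```python
import re
from typing import Optional, Tuple

# Pattern for DTNNNNN-RX: "DT" + digits, then "-R" + one isoform letter.
TRANSCRIPT_ID_RE = re.compile(r"(DT\d+)-R([A-Z])")


def parse_transcript_id(identifier: str) -> Optional[Tuple[str, str]]:
    """Split a transcript identifier into (gene_id, isoform) or return None."""
    match = TRANSCRIPT_ID_RE.fullmatch(identifier.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Example call with a hypothetical identifier:
# parse_transcript_id("DT00001-RA")
```

Strings that do not follow the scheme, such as a putative function name, yield `None` and can then be treated as free-text search terms.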

datacite.yml
Title Dataset: Draft hybrid genome assembly and annotation of Danionella translucida, a transparent fish with the smallest vertebrate brain. This repository contains data linked to Kadobianskyi et al., 2019.
Authors Kadobianskyi,Mykola;Charité Universitätsmedizin Berlin and Humboldt University, Einstein Center for Neuroscience, NeuroCure Cluster of Excellence, Berlin
Schulze,Lisanne;Charité Universitätsmedizin Berlin and Humboldt University, Einstein Center for Neuroscience, NeuroCure Cluster of Excellence, Berlin;https://orcid.org/0000-0001-7875-3107
Schuelke,Markus;Charité Universitätsmedizin Berlin and Humboldt University, Einstein Center for Neuroscience, NeuroCure Cluster of Excellence, Berlin
Judkewitz,Benjamin;Charité Universitätsmedizin Berlin and Humboldt University, Einstein Center for Neuroscience, NeuroCure Cluster of Excellence, Berlin;https://orcid.org/0000-0002-8570-3869
Description Original data resources published alongside Kadobianskyi et al., 2019. A detailed description of each file can be found in the README file.
License CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
References
Funding DFG, EXC 257 NeuroCure
EU, ERC-2016-StG-714560
Alfried Krupp von Bohlen und Halbach-Stiftung
Keywords Genome assembly
Genome annotation
Danionella translucida
Resource Type Dataset