kadobianskyi2019
Data associated with the manuscript "Draft hybrid genome assembly and annotation of Danionella translucida, a transparent fish with the smallest vertebrate brain" by Kadobianskyi et al., 2019.
List of files in the dataset
Genome assembly
final.dt.scf.pilon.fasta – FASTA sequence file with the Danionella translucida (DT) assembly scaffolds.
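As a quick sanity check after downloading, the scaffold count, total length and N50 can be computed directly from the FASTA file. The following is a minimal sketch, not part of the dataset: the function names `read_fasta_lengths` and `n50` and the toy scaffold names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep the ID up to the first whitespace of the header line.
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy input standing in for the real file (scaffold names are made up).
example = [">scf1", "ACGTACGTAC", ">scf2", "ACGTAC", ">scf3", "ACGT"]
toy_lengths = read_fasta_lengths(example)
print(len(toy_lengths), sum(toy_lengths.values()), n50(list(toy_lengths.values())))

# With the real assembly, pass an open file handle instead:
# with open("final.dt.scf.pilon.fasta") as handle:
#     lengths = read_fasta_lengths(handle)
```

Reading line by line keeps memory use low, since only the running lengths are stored and not the sequences themselves.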
Genome annotation
dt.all.maker.gff – GFF3 annotation file with the raw MAKER output after 3 cycles of gene prediction training and re-annotation with the combined short- and long-read RNA-seq and Ensembl proteome evidence. Contains all evidence mapping tracks, as well as repeat sequences, tRNAs and gene models.
dt.all.maker.annotation.functional.pfam.gff – GFF3 annotation file with the raw MAKER output, with added putative protein functions and PFAM domains.
igv_annotation.gff – Gene annotation track for viewing in a genome browser such as IGV. Contains extracted 3'- and 5'-UTRs and CDSs.
dt.all.maker.transcripts.functional.fasta – FASTA file with functionally annotated MAKER-derived transcript sequences.
dt.all.maker.proteins.functional.fasta – FASTA file with functionally annotated MAKER-derived protein sequences.
rnaseq_3dpf.cov.tdf and rnaseq_adult.cov.tdf – IGV-compatible RNA-seq coverage track files (25 bp sliding window) for the 3 dpf and adult RNA-seq libraries, respectively.
npcdna2dt.sorted.bam and .bai – Sorted BAM alignment file, with its index (.bai), containing the Nanopore RNA-seq reads mapped to the DT genome.
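Because the raw MAKER GFF3 mixes evidence tracks, repeats, tRNAs and gene models, it can help to tally the records by source and feature type before loading or filtering them. The sketch below relies only on the standard nine-column, tab-separated GFF3 layout; the function name `count_feature_types` and the toy records (including their source labels and coordinates) are illustrative assumptions, not values taken from the dataset.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 records per (source, type) pair."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if present, follows this directive
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[(cols[1], cols[2])] += 1
    return counts


# Toy records standing in for the real file (all values are made up).
example = [
    "##gff-version 3",
    "scf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scf1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1",
    "scf1\tmaker\texon\t100\t300\t.\t+\t.\tID=e1;Parent=g1-RA",
    "scf1\tmaker\texon\t500\t900\t.\t+\t.\tID=e2;Parent=g1-RA",
    "scf1\trepeatmasker\tmatch\t1000\t1200\t.\t+\t.\tID=r1",
    "##FASTA",
    ">scf1",
    "ACGT",
]
toy_counts = count_feature_types(example)
for (source, feature_type), n in sorted(toy_counts.items()):
    print(source, feature_type, n)

# With the real annotation:
# with open("dt.all.maker.gff") as handle:
#     counts = count_feature_types(handle)
```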
Data analysis
orthologs.tsv – Tab-separated values file with the CRB-BLAST-generated orthologous proteins between zebrafish (Danio rerio, DR) and DT.
orthologs_nocopies.txt – Ortholog protein file with duplicate DR hits removed.
dr.ensembl.exome.gff3 and dt.exome.gff3 – GFF exon annotation files for DR and DT, respectively.
orthologs_introns.m – MATLAB code to generate intron size distribution plots.
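For readers without MATLAB, intron sizes can also be derived from the exon GFF files in Python. This sketch is not a port of orthologs_introns.m, whose internals are not described here. It assumes that each exon record carries a `Parent` attribute naming its transcript, and it takes an intron to be the gap between consecutive exons of the same transcript. The function names and toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def intron_sizes(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return {transcript ID: [intron lengths]} from GFF3 exon records."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Assumption: Parent names the transcript; it may list several IDs.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))
    sizes: Dict[str, List[int]] = {}
    for parent, spans in exons.items():
        spans.sort()
        # GFF coordinates are 1-based and inclusive, hence the "- 1".
        gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:])]
        sizes[parent] = [gap for gap in gaps if gap > 0]
    return sizes


# Toy records (made up), deliberately listed out of order.
example = [
    "scf1\tmaker\texon\t1001\t1100\t.\t+\t.\tID=e3;Parent=tx1",
    "scf1\tmaker\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "scf1\tmaker\texon\t301\t400\t.\t+\t.\tID=e2;Parent=tx1",
    "scf2\tmaker\texon\t10\t50\t.\t-\t.\tID=e4;Parent=tx2",
]
toy_introns = intron_sizes(example)
print(toy_introns)

# With the real files:
# with open("dt.exome.gff3") as handle:
#     dt_introns = intron_sizes(handle)
```

The resulting per-transcript lists can be flattened and passed to any plotting library to compare the DR and DT distributions.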
Quality control
FASTQC.zip – ZIP archive with FastQC-generated quality reports for the libraries used in the assembly and annotation.
How to view the assembly and annotation
-- Download the IGV viewer for your platform.
-- To load the genome into the viewer, go to "Genomes" –> "Load genome from File..." and load final.dt.scf.pilon.fasta
-- The coverage, mapping and annotation tracks can be loaded through "File" –> "Load from File..."
-- Every gene can be searched for using its identifier (DTNNNNN-RX), where NNNNN is a number and X is a letter denoting the transcript isoform (A, B, C...), or using its putative function (e.g., dnmt1).
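When working with the annotation files outside IGV, the identifier scheme described above can be split into its gene and isoform parts with a regular expression. This is a minimal sketch: the pattern assumes one or more digits and a single uppercase isoform letter, and the identifier `DT00001-RA` is a made-up example, not necessarily a real entry in the annotation.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: "DT", digits, "-R", then one uppercase isoform letter.
ID_PATTERN = re.compile(r"(DT\d+)-R([A-Z])")


def split_identifier(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (gene part, isoform letter), or None if the pattern does not match."""
    match = ID_PATTERN.fullmatch(identifier.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_identifier("DT00001-RA"))  # hypothetical identifier
print(split_identifier("dnmt1"))       # a function name, not an identifier
```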