Yes, we can use it: A formal test on the accuracy of low-pass nanopore long-read sequencing for mitophylogenomics + barcoding research using the Caribbean spiny lobster Panulirus argus

Whole mitogenomes or short fragments (e.g., 300-700 bp of the cox1 gene) are markers of choice for revealing within- and among-species [...] Protocols for sequencing and assembling mitogenomes include 'primer walking' or [...] PCR followed by Sanger sequencing or low-coverage genome (LC-[...]) sequencing [...]


INTRODUCTION
The aims of this study were threefold. First, I tested whether a mitochondrial genome can be sequenced and assembled exclusively from long read nanopore sequencing data, using both a de novo and a reference-based strategy. Second, I explored the quality (i.e., accuracy) of the long read assemblies by comparing them to a 'gold' standard mitochondrial genome retrieved from the same individual but generated using short read Illumina sequencing data. Sequence accuracy was explored for different long read assembly pipelines with multiple metrics, including completeness, identity, and coverage. Furthermore, a detailed quantitative analysis of error types in the long read assemblies was conducted. Third, I tested whether the de novo and reference-based long read assemblies are useful for mitophylogenomic and barcoding research. Specifically, I assessed whether long read assemblies contain enough phylogenomic information to reliably identify the sequenced specimen and distinguish it from other closely and distantly related species in the same genus, family, and superorder.
To accomplish these goals, I used as a model system the Caribbean spiny lobster Panulirus argus, an ecologically relevant species in shallow water coral reefs and the target of the most lucrative fishery in the greater Caribbean region.

MATERIAL & METHODS
Field collections were approved by FWCC (SAL-11-1319-SR). A single adult specimen of P. argus was collected from Alligator Reef, FL, USA. A small piece of muscle was dissected from a dismembered pereopod and immediately preserved in 95% ethanol. Next, total genomic DNA was extracted from ~15-25 mg of muscle tissue using standard protocols. Approximately one microgram of gDNA extracted from each sample was used for Illumina and MinION (ONT) library construction following standard protocols.
The pipeline NOVOPlasty was used for mitogenome assembly with short reads. For long reads, the mitochondrial genome of P. argus was assembled using the following workflow:
All long read pipelines, with and without 'extra' polishing with Medaka, also assembled and circularized the mitochondrial genome of P. argus.

CONCLUSIONS
In conclusion, using nanopore long read sequencing and various bioinformatics pipelines, this study assembled for the first time a complete and highly accurate mitochondrial genome for the Caribbean spiny lobster Panulirus argus. The assembled genomes were 'imperfect' but permitted reliable identification of the sequenced specimen as P. argus and its differentiation from other closely and distantly related species in the same genus, family, and superorder. This study will facilitate the transfer of genomic technologies to low-income countries in the greater Caribbean, where they can be used to monitor mislabeling of this resource.

Accuracy of long read mitogenome assemblies
Alignment of the different long read assemblies to the reference genome revealed discordance between each of the long read assemblies and the reference assembly that was mostly due to indels at the flanks of homopolymer regions comprising all four nucleotide types.
Single nucleotide homopolymer deletions were by far the most common error detected in all long read assemblies, followed by single nucleotide homopolymer insertions. The main effect of polishing with Medaka, across de novo and reference-based mitochondrial genomes, was a decrease in the number of homopolymer deletions (Fig. 2).
Figure 2. Sequence errors per de novo (Flye and Unicycler) and reference-based (Rebaler) assembler, with and without 'extra' polishing with the software Medaka, for the Caribbean spiny lobster Panulirus argus mitochondrial genome. All long read assemblies were benchmarked against the Illumina short read assembly with a coverage of 720x.
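The error categories described in this section can be tallied from a pairwise alignment of a long read assembly against the short read reference. The following is only a minimal sketch of such a tally, not the procedure used in this study: it assumes the two sequences are already aligned as equal-length strings with '-' marking gaps, and the function names and the `min_run` threshold (the run length at which an indel is counted as homopolymer-associated) are hypothetical choices made for illustration.

```python
from collections import Counter


def run_length(seq: str, i: int) -> int:
    """Length of the run of identical bases around position i, skipping gaps."""
    base = seq[i]
    length = 1
    for step in (-1, 1):
        j = i + step
        while 0 <= j < len(seq) and seq[j] in (base, "-"):
            if seq[j] == base:
                length += 1
            j += step
    return length


def classify_errors(ref_aln: str, asm_aln: str, min_run: int = 3) -> Counter:
    """Tally substitutions and single-column indels in a gapped alignment.

    ref_aln: aligned reference (e.g., the short read assembly).
    asm_aln: aligned long read assembly.
    min_run: assumed threshold; an indel is labeled 'homopolymer' when the
             affected base sits in a run of at least this many identical bases.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have equal length")
    ref, asm = ref_aln.upper(), asm_aln.upper()
    counts = Counter()
    for i, (r, a) in enumerate(zip(ref, asm)):
        if r == a:
            continue
        if r != "-" and a != "-":
            counts["substitution"] += 1
            continue
        # The base exists in only one sequence: a deletion if the assembly
        # lacks it, an insertion if the reference lacks it.
        kind, source = ("deletion", ref) if a == "-" else ("insertion", asm)
        label = "homopolymer" if run_length(source, i) >= min_run else "other"
        counts[f"{label}_{kind}"] += 1
    return counts


# Toy alignment (made-up sequences, not data from this study).
toy_ref = "GATTTTTCAGC-CTA"
toy_asm = "GATTTT-CTGCCCT-"
toy_counts = classify_errors(toy_ref, toy_asm)
```

In the toy alignment, the missing T falls inside a run of five Ts in the reference and is counted as a homopolymer deletion, whereas the missing terminal A is counted as a non-homopolymer deletion. Running the same tally before and after polishing would give per-category counts of the kind summarized in Fig. 2.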

Annotation of long read mitogenome assemblies
Annotation of the long read mitochondrial genome assemblies, either de novo or reference-based and with or without extra polishing with Medaka, indicated that gene number and synteny were identical to those of the reference genome of P. argus. Importantly, all but 1-2 of the genes had at least one, and usually more than one, internal stop codon that interrupted the open reading frame. Although the assemblies were highly accurate, the errors in each long read assembled mitochondrial genome precluded a reliable in silico annotation (Fig. 3).
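A single-base indel shifts the reading frame of every downstream codon, which explains why a handful of homopolymer errors can introduce internal stop codons in nearly every gene. The sketch below illustrates this effect under stated assumptions: it uses the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are the only stop codons, it reads the coding sequence in frame from its first base, and the function name and toy sequences are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI table 5).
INVERT_MITO_STOPS = frozenset({"TAA", "TAG"})


def internal_stop_codons(cds: str, stops=INVERT_MITO_STOPS) -> list:
    """Return 0-based codon indices of stop codons before the last full codon.

    The last complete codon is excluded because it is expected to be the
    legitimate stop; trailing bases that do not fill a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [idx for idx, codon in enumerate(codons[:-1]) if codon in stops]


# Toy coding sequence (made up for illustration): ATG TTT TTA GCC TAA.
intact = "ATGTTTTTAGCCTAA"
# Same sequence with one T dropped from the five-T homopolymer run,
# which now reads ATG TTT TAG CCT (+ AA left over).
with_deletion = "ATGTTTTAGCCTAA"

stops_intact = internal_stop_codons(intact)
stops_deleted = internal_stop_codons(with_deletion)
```

For the intact toy sequence no internal stop is found, while the single homopolymer deletion creates a premature TAG at the third codon. Mitochondrial genes that end in an incomplete stop codon (T or TA) would need additional handling that this sketch does not attempt.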

Mitophylogenomics using long read assemblies
In a maximum likelihood phylogenetic analysis using a short 500 bp fragment of the cox1 gene (1899 terminals retrieved from GenBank), long and short read assembled mitogenomes clustered together into a single well supported monophyletic clade (bootstrap value [bv] = 100). Importantly, in the same analysis, the long read assembled mitogenomes, the short read reference assembly, and a total of 340 other sequences belonging to P. argus obtained from GenBank clustered together into a strongly supported monophyletic clade (bv = 98) (Fig. 4).
Figure 4. Barcoding analysis of the order Achelata using a 500 bp fragment of the cox1 gene, including cox1 gene fragments retrieved from mitochondrial genomes of the Caribbean spiny lobster Panulirus argus assembled with long reads alone and short reads + 1,884 other species belonging to the order Achelata retrieved from GenBank (arrow indicates clade comprising long and short read mitogenomes obtained during this study).