DDBJ Sequence Read Archive Handbook

    Created: April 3, 2014;  Last updated: July 27, 2017

    DDBJ Sequence Read Archive

    DDBJ Sequence Read Archive (DRA) is an archive database for output data generated by next-generation sequencing machines, including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System and others. DRA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and archives the data in close collaboration with the NCBI Sequence Read Archive (SRA) and the EBI Sequence Read Archive (ERA).

    The three INSDC partners regularly exchange all data except Analysis objects.

    DRA accepts sequencing data from capillary sequencing platforms in fastq/bam format. To submit sequencing chromatograms in addition to bases and qualities, please submit data to the DDBJ Trace Archive.

    Metadata

    Metadata objects

    The metadata describe how the associated data were obtained. The metadata are composed of six objects: Submission, BioProject, BioSample, Experiment, Run and Analysis. Each object is defined by its XML schema, and the objects are related to each other. Multiple Experiments can "point" to a single BioSample, but not vice versa.

    Accession numbers with distinct prefixes are assigned to each object. The metadata and the accession number system are common to DRA, ERA and SRA. Experiment, Run and Analysis are SRA objects, while BioProject and BioSample are objects of external databases.

    For details, please see the DRA XML schema.

    Submission

    A container object only for grouping objects to be submitted.

    BioProject

    An overall description of a single research initiative; a project will typically relate to multiple samples and datasets.

    BioSample

    Description of biological source material; each physically unique specimen should be registered as a single BioSample with a unique set of attributes.

    Experiment

    A description of a sample-specific sequencing library and sequencing method. An Experiment references one BioProject and one BioSample. Multiple Experiments can "point" to a single BioSample, but not vice versa.
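    The references between metadata objects described above can be sketched as a small object model. This is a hypothetical Python illustration, not DDBJ software: the class names and the placeholder BioProject accession are assumptions, and the BioSample ID is borrowed from the title example in this handbook.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes illustrating how DRA metadata objects reference each other.
@dataclass(frozen=True)
class BioProject:
    accession: str


@dataclass(frozen=True)
class BioSample:
    accession: str


@dataclass(frozen=True)
class Experiment:
    alias: str
    bioproject: BioProject  # an Experiment references exactly one BioProject
    biosample: BioSample    # ... and exactly one BioSample


@dataclass
class Run:
    alias: str
    experiment: Experiment  # a Run belongs to one Experiment
    files: List[str] = field(default_factory=list)


def experiments_for_sample(experiments: List[Experiment], sample: BioSample) -> List[Experiment]:
    """Return every Experiment that points to the given BioSample (many-to-one)."""
    return [e for e in experiments if e.biosample == sample]


if __name__ == "__main__":
    project = BioProject("PRJDB_EXAMPLE")  # placeholder accession
    sample = BioSample("SAMD00025741")
    libraries = [Experiment("exp_1", project, sample), Experiment("exp_2", project, sample)]
    print(len(experiments_for_sample(libraries, sample)))  # 2
```

    Because each Experiment holds a single BioSample field, the model cannot express one Experiment pointing to several samples, which mirrors the "not vice versa" rule.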

    Run

    Runs describe the files that belong to previously created Experiments. They specify the data files for a specific sample to be processed by DRA. Note that all data files listed in a Run are merged into a single SRA archive file, so files from different samples or replicates should not be grouped in the same Run. Conversely, paired-end data files MUST be listed in a single Run so that the two files are correctly processed as paired-end.
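    When preparing many files, one way to respect these grouping rules is sketched below. It assumes a hypothetical naming convention in which <library>_1 and <library>_2 hold the two mates of one paired-end library; adjust the pattern to your own file names.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical naming convention: <library>_1.fastq[.gz|.bz2] and
# <library>_2.fastq[.gz|.bz2] are the two mates of one paired-end library.
PAIR_PATTERN = re.compile(r"^(?P<lib>.+)_(?P<mate>[12])\.(?:fastq|fq)(?:\.gz|\.bz2)?$")


def group_into_runs(filenames: List[str]) -> Dict[str, List[str]]:
    """Group data files so that both mates of a library share one Run.

    Files that do not match the pattern (e.g. single-read files) each get
    a Run of their own.
    """
    runs: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        key = match.group("lib") if match else name
        runs[key].append(name)
    # Sort so that mate 1 is listed before mate 2 within each Run.
    return {lib: sorted(files) for lib, files in runs.items()}


if __name__ == "__main__":
    files = ["strainA_1.fastq.gz", "strainA_2.fastq.gz", "strainB_1.fastq.gz", "strainB_2.fastq.gz"]
    for lib, members in group_into_runs(files).items():
        print(lib, members)
```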

    Analysis

    Packages data associated with sequence read objects that are intended for downstream use or that otherwise need an archival home. Submit alignment data as BAM files to Run. Please contact the DRA team to request mirroring of analysis data. Analysis files are provided on the DDBJ FTP site and are not indexed by DRASearch.

    Data model

    Organization of metadata objects

    The following are examples of metadata organization. Submitters can organize metadata objects flexibly.

    Simplest case


    Comparative genome sequencing of three strains (paired-end)

    Include both files of a paired-end library in the same Run.


    Technical and biological replicates

    Related FAQ: How many samples do I need for my DRA submission?


    Related sequencing data are reported in two publications.


    Items in each metadata object

    Items marked with * are required or conditionally required.

    Submission

    Center Name

    Enter the submitter's organization.

    Center Name*

    The submitter's center name (see the Center Name List). A center name abbreviation is required to submit data to DRA.

    In the metadata creation tool, the center name is filled in automatically from the account information.

    The Center Name is an abbreviation used operationally by SRA and does not indicate ownership of the submission. The submitters listed under Submitter hold ownership of the submission.

    Lab Name*
    Laboratory name within the submitting institution. The Lab Name is pre-filled with the "Lab/Group", "Department (2)", "Department (1)" and "Organization" fields of the D-way account. The text can be edited.

    Hold Until

    Specify how to release the data.

    Hold Until*
    Direct the DRA to release the record on or after the specified date. The submitter can set the hold date up to a maximum of 2 years and can change the date before the record is released.
    Immediate Release*
    Direct the DRA to release the record immediately after submission is processed.

    Submitter

    The DRA contacts the listed address(es) by e-mail regarding the submission. Include contact information for the PI and for the non-PI member(s) who actually submit the data. The contact information is not made public. If you want to display contact information, enter it in the BioProject.

    Name*
    Name of submitter.
    E-mail*
    E-mail of submitter.

    BioProject

    BioProject ID*
    Select a project registered in BioProject or submit a new project. For submission to BioProject, please refer to the BioProject Handbook.

    BioSample

    BioSample ID*
    Select samples registered in BioSample or create and submit new samples. For submission to BioSample, please refer to the BioSample Handbook.

    Experiment

    Alias
    Name of the experiment designated by the archive. This alias is used to reference metadata objects without accession numbers.
    BioSample Used*
    Select the BioSample this experiment uses.
    Title*
    Short text that can be used to call out experiment records in searches or displays. A title in the form "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (for example, "Illumina HiSeq 2000 paired end sequencing of SAMD00025741") is constructed automatically. To enter user-defined titles, download the Experiment metadata as a tab-delimited text file, edit the title values and upload the file.
    Library Name
    The submitter's name for this library.
    Library Source*
    The Library Source specifies the type of source material that is being sequenced.
    GENOMIC: Genomic DNA (includes PCR products from genomic DNA).
    TRANSCRIPTOMIC: Transcription products or non-genomic DNA (EST, cDNA, RT-PCR, screened libraries).
    METAGENOMIC: Mixed material from a metagenome.
    METATRANSCRIPTOMIC: Transcription products from community targets.
    SYNTHETIC: Synthetic DNA.
    VIRAL RNA: Viral RNA.
    OTHER: Other, unspecified, or unknown library source material.
    Library Selection*
    Whether any method was used to select and/or enrich the material being sequenced.
    RANDOM: Random shearing only.
    PCR: Source material was selected by designed primers.
    RANDOM PCR: Source material was selected by randomly generated primers.
    RT-PCR: Source material was selected by reverse transcription PCR.
    HMPR: Hypo-methylated partial restriction digest.
    MF: Methyl Filtrated.
    repeat fractionation: Selection for less repetitive (and more gene-rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics.
    size fractionation: Physical selection of size-appropriate targets.
    MSLL: Methylation Spanning Linking Library.
    cDNA: Complementary DNA.
    cDNA_randomPriming
    cDNA_oligo_dT
    PolyA: PolyA selection or enrichment for messenger RNA (mRNA); should replace the cDNA enumeration.
    Oligo-dT: Enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT.
    Inverse rRNA: Depletion of ribosomal RNA by oligo hybridization.
    ChIP: Chromatin immunoprecipitation.
    MNase: Micrococcal Nuclease (MNase) digestion.
    DNAse: Deoxyribonuclease (DNase) digestion.
    Hybrid Selection: Selection by hybridization in array or solution.
    Reduced Representation: Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling.
    Restriction Digest: DNA fractionation using restriction enzymes.
    5-methylcytidine antibody: Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C).
    MBD2 protein methyl-CpG binding domain: Enrichment by methyl-CpG binding domain.
    CAGE: Cap-analysis gene expression.
    RACE: Rapid Amplification of cDNA Ends.
    MDA: Multiple displacement amplification.
    padlock probes capture method: Padlock probes capture strategy to be used in conjunction with Bisulfite-Seq.
    other: Other library enrichment, screening, or selection process.
    unspecified: Library enrichment, screening, or selection is not specified.
    Library Strategy*
    Sequencing technique intended for this library.
    WGS: Whole genome shotgun.
    WGA: Whole genome amplification.
    WXS: Random sequencing of exonic regions selected from the genome.
    RNA-Seq: Random sequencing of whole transcriptome.
    miRNA-Seq: Micro RNA and other small non-coding RNA sequencing.
    ncRNA-Seq: Capture of other non-coding RNA types, including post-translation modification types such as snRNA (small nuclear RNA) or snoRNA (small nucleolar RNA), or expression regulation types such as siRNA (small interfering RNA) or piRNA/piwi/RNA (piwi-interacting RNA).
    ssRNA-seq: Strand-specific RNA sequencing.
    WCS: Whole chromosome (or other replicon) shotgun.
    CLONE: Genomic clone based (hierarchical) sequencing.
    POOLCLONE: Shotgun of pooled clones (usually BACs and Fosmids).
    AMPLICON: Sequencing of overlapping or distinct PCR or RT-PCR products.
    CLONEEND: Clone end (5', 3', or both) sequencing.
    FINISHING: Sequencing intended to finish (close) gaps in existing coverage.
    RAD-Seq: Restriction Site Associated DNA Sequence.
    ChIP-Seq: Direct sequencing of chromatin immunoprecipitates.
    MNase-Seq: Direct sequencing following MNase digestion.
    DNase-Hypersensitivity: Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI.
    Bisulfite-Seq: Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status.
    EST: Single pass sequencing of cDNA templates.
    FL-cDNA: Full-length sequencing of cDNA templates.
    CTS: Concatenated Tag Sequencing.
    MRE-Seq: Methylation-Sensitive Restriction Enzyme Sequencing strategy.
    MeDIP-Seq: Methylated DNA Immunoprecipitation Sequencing strategy.
    MBD-Seq: Direct sequencing of methylated fractions sequencing strategy.
    Tn-Seq: Gene fitness determination through transposon seeding.
    FAIRE-seq: Formaldehyde Assisted Isolation of Regulatory Elements.
    SELEX: Systematic Evolution of Ligands by EXponential enrichment.
    RIP-Seq: Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP).
    ChIA-PET: Direct sequencing of proximity-ligated chromatin immunoprecipitates.
    Hi-C: Chromosome Conformation Capture technique where a biotin-labeled nucleotide is incorporated at the ligation junction, enabling selective purification of chimeric DNA ligation junctions followed by deep sequencing.
    ATAC-seq: Assay for Transposase-Accessible Chromatin (ATAC) strategy, used to study genome-wide chromatin accessibility. An alternative method to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and to integrate primer DNA sequences into the cleaved genomic DNA.
    Targeted-Capture
    Tethered Chromatin Conformation Capture
    Synthetic-Long-Read: Binning and barcoding of large DNA fragments to facilitate assembly of the fragment.
    Other: Library strategy not listed.
    Library Construction Protocol

    Free form text describing the protocol by which the sequencing library was constructed. Please include protocols of DNA fragmentation, ligation and enrichment. If a library preparation kit is used, include the name and version (if any) of the kit (for example, Illumina Nextera DNA Library Preparation Kit).

    Reference: Alnasir J, Shanahan HP. Investigation into the annotation of protocol sequencing steps in the sequence read archive. Gigascience. 2015 May 9;4:23. doi: 10.1186/s13742-015-0064-7. eCollection 2015. PMID: 25960871 (Open Access)

    Instrument*
    Select a sequencing instrument model.
    Instrument Model
    454 GS
    454 GS 20
    454 GS FLX
    454 GS FLX+
    454 GS FLX Titanium
    454 GS Junior
    Illumina Genome Analyzer
    Illumina Genome Analyzer II
    Illumina Genome Analyzer IIx
    Illumina HiSeq 1000
    Illumina HiSeq 1500
    Illumina HiSeq 2000
    Illumina HiSeq 2500
    Illumina HiSeq 3000
    Illumina HiSeq 4000
    Illumina MiSeq
    Illumina MiniSeq
    Illumina HiScanSQ
    HiSeq X Five
    HiSeq X Ten
    NextSeq 500
    NextSeq 550
    Helicos HeliScope
    AB SOLiD System
    AB SOLiD System 2.0
    AB SOLiD System 3.0
    AB SOLiD 3 Plus System
    AB SOLiD 4 System
    AB SOLiD 4hq System
    AB SOLiD PI System
    AB 5500 Genetic Analyzer
    AB 5500xl Genetic Analyzer
    AB 5500xl-W Genetic Analysis System
    Complete Genomics
    MinION
    GridION
    PromethION
    PacBio RS
    PacBio RS II
    Sequel
    Ion Torrent PGM
    Ion Torrent Proton
    Ion Torrent S5
    Ion Torrent S5 XL
    AB 310 Genetic Analyzer
    AB 3130 Genetic Analyzer
    AB 3130xL Genetic Analyzer
    AB 3500 Genetic Analyzer
    AB 3500xL Genetic Analyzer
    AB 3730 Genetic Analyzer
    AB 3730xL Genetic Analyzer
    Spot Type*
    Select a layout of reads in sequencing data files.
    single: Single read.
    paired (FF): Paired reads in the same direction.
    paired (FR): Paired reads in opposite directions.
    Nominal Length*
    Insert size of paired reads.
    Nominal Sdev
    Standard deviation of insert size.
    Spot Length*

    The read length in the submitted sequencing files. For mate pairs, this number includes both mates but does not include the gap length.

    • When the spot length is constant, enter a constant value.
    • For 454 platforms producing reads with variable length, enter a constant flow count.
    • For fastq files with variable length, enter an average length.
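    For fastq files, a Spot Length value can be estimated from the data. The sketch below is one possible approach, assuming plain, gzip- or bzip2-compressed fastq files with unwrapped four-line records; for paired files it adds the average read lengths of the two mates and leaves out the gap, as described above.

```python
import bz2
import gzip
from statistics import mean
from typing import IO, Iterator, List


def open_fastq(path: str) -> IO[str]:
    """Open a plain, gzip- or bzip2-compressed fastq file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def read_lengths(path: str) -> Iterator[int]:
    """Yield the length of every sequence line (line 2 of each 4-line record)."""
    with open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield len(line.rstrip("\r\n"))


def spot_length(paths: List[str]) -> int:
    """Suggested Spot Length: sum over files of the (average) read length.

    Pass one file for single reads, or both files of a pair for paired reads.
    """
    return round(sum(mean(read_lengths(p)) for p in paths))
```

    For constant-length data the average equals the constant value, so the same function covers the first and third bullets; it does not apply to 454 flow counts.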

    Run

    Alias
    Name of the run designated by the archive. This alias is used to reference metadata objects without accession numbers.
    Title*
    Short text that can be used to call out run records in searches or displays. A title in the form "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (for example, "Illumina HiSeq 2000 paired end sequencing of SAMD00025741") is constructed automatically. To enter user-defined titles, download the Run metadata as a tab-delimited text file, edit the title values and upload the file.
    Experiment Referenced*
    Select the experiment this run belongs to.
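    The automatic title pattern used for Experiments and Runs can be reproduced as follows, for example when preparing user-defined titles in the downloaded tab-delimited file. This is a sketch; it assumes that the "[paired end]" part is simply omitted for single reads.

```python
def default_title(instrument_model: str, biosample_id: str, paired: bool) -> str:
    """Build "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]".

    Assumption: the "paired end" part is omitted for single reads.
    """
    words = [instrument_model]
    if paired:
        words.append("paired end")
    words.append(f"sequencing of {biosample_id}")
    return " ".join(words)


if __name__ == "__main__":
    print(default_title("Illumina HiSeq 2000", "SAMD00025741", paired=True))
    # Illumina HiSeq 2000 paired end sequencing of SAMD00025741
```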

    Data files for Run

    Select data files for a Run.

    Run/Analysis
    Specify whether a data file belongs to the Run or the Analysis. In the web submission form, this field cannot be edited and is filled in automatically according to the selected Run or Analysis. When uploading metadata as a TSV file, this field must be specified manually.
    File Name*
    The name of a sequence data file. Uploaded filenames are automatically filled in.
    Run/Analysis contains files*
    Select a Run to which the data file belongs.
    File Type*
    The sequence data file format. For the fastq files with variable read length, select 'generic_fastq'. For the fastq files with constant read length, select 'fastq'.

    generic_fastq: fastq files with variable read length.
    fastq: fastq files with constant read length.
    sff: 454 Standard Flowgram Format file.
    hdf5: PacBio HDF5 format file.
    bam: Binary SAM format, for use by loaders that combine alignment and sequencing data.
    tab: A tab-delimited table that maps the SN in the @SQ lines of the BAM header to the reference sequences (accession numbers or sequence names in the reference fasta file).
    reference_fasta: Reference sequence file in single fasta format, used to construct the SRA archive file. The filename must end with ".fa".
    MD5 Checksum*
    MD5 checksum of the sequence data file (see How to obtain the MD5 checksum values).
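    As one way to obtain the MD5 checksum values, the sketch below uses Python's standard hashlib module and reads files in chunks so that large sequencing files do not have to fit in memory. The script name in the usage comment is hypothetical.

```python
import hashlib
import sys


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical script name): python md5_files.py strainA_1.fastq.gz strainA_2.fastq.gz
    for name in sys.argv[1:]:
        print(f"{md5sum(name)}  {name}")
```

    The digest is computed over the file exactly as uploaded (for example, the compressed .gz file), and it should match the value printed by md5sum on Linux.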

    Analysis

    Alias
    Name of the analysis designated by the archive. This alias is used to reference metadata objects without accession numbers.
    Title*
    Title of the analysis object.
    Description*
    Describes the contents of the analysis.
    Analysis Type*
    Select an Analysis type. Submit alignment data to Run in bam format.
    De Novo Assembly: A placement of sequences, including trace, SRA and GI records, into a multiple alignment from which a consensus is computed.
    Sequence Annotation: Per-sequence annotation of named attributes and values.
        Example: Processed sequencing data for submission to dbEST without assembly.
        Reads have already been submitted to one of the sequence read archives in raw form.
        The fasta data submitted under this analysis object result from the following treatments, which may serve to filter reads from the raw dataset:
            - sequencing adapter removal
            - low quality trimming
            - poly-A tail removal
            - strand orientation
            - contaminant removal.
    Abundance Measurement: Identify the tools and processing steps used to produce the abundance measurements (coverage tracks).

    Data files for Analysis

    Select data files for an Analysis.

    Run/Analysis
    Specify whether a data file belongs to the Run or the Analysis. In the web submission form, this field cannot be edited and is filled in automatically according to the selected Run or Analysis. When uploading metadata as a TSV file, this field must be specified manually.
    File Name*
    The name of an analysis file.
    Run/Analysis contains files*
    Select an Analysis to which the data file belongs.
    File Type*
    The analysis data file format.
    bam: Binary form of the Sequence Alignment/Map format for read placements, from the SAMtools project. See http://sourceforge.net/projects/samtools/.
    tab: A tab-delimited text file that can be viewed as a spreadsheet. The first line should contain column headers.
    ace: Multiple alignment file output from the phrap assembler and similar programs. See http://www.phrap.org/consed/distributions/README.16.0.txt for a description of the ACE file format.
    fasta: Sequence data format indicating sequence base calls. The format is simple: a header line starting with the > character, followed by data lines with base calls.
    wig: The wiggle (WIG) format allows display of continuous-valued data in track format. This display type is useful for GC percent, probability scores, and transcriptome data. See http://genome.ucsc.edu/goldenPath/help/wiggle.html for a description of the Wiggle Track format.
    BED: BED format provides a flexible way to define the data lines that are displayed in an annotation track. See http://genome.ucsc.edu/FAQ/FAQformat#format1 for a description of the BED format.
    VCF: Variant Call Format. See http://www.1000genomes.org/wiki/analysis/variant%20call%20format/vcf-variant-call-format-version-41 for a description of the VCF format.
    MAF: Mutation Annotation Format.
    GFF: General Feature Format.
    csv
    tsv
    MD5 Checksum*
    MD5 checksum of an analysis data file (see How to obtain the MD5 checksum values).

    Run data files

    • The DRA does NOT accept fasta-only datasets. The minimum submission level for SRA is base calls (or color calls) with quality scores.
    • Barcoded data files should be demultiplexed prior to submission and a unique BioSample should be created for each barcoded sample; in other words, each BioSample must be linked to one or more unique data files.
    • For fastq files, submit paired reads in two separate files. For BAM and SFF files, paired reads must be contained in a single file.
    • Upload data files directly into the submission directory. Submitted archive files should NOT contain any directory structure.
    • Binary data formats, including BAM, SFF and HDF5 should be submitted without applying any additional compression.

    Formats of sequencing data files

    The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that exist in the DRA XML schema but not in the submission tool, submitters need to create the metadata as XML files.

    Generic formats

    Format                   Platform             Recommended
    BAM                      all platforms        Yes
    fastq                    all platforms        Yes

    Platform specific formats

    Format                   Platform             Recommended
    SFF                      454 and Ion Torrent  Yes
    PacBio HDF               PacBio               Yes
    SOLiD csfasta/qual       SOLiD                No (please convert to fastq/bam)
    Illumina qseq and scarf  Illumina             No (please convert to fastq/bam)

    BAM file

    When submitting alignment data, you need to submit the BAM file, the references (INSDC/RefSeq accession numbers OR a reference multi-fasta file) and an SN-reference mapping table. Submit one BAM file per Run.

    When submitting a BAM file to Analysis instead of Run, the mapping table is unnecessary. However, please consider submitting a BAM file that includes the unaligned reads to Run as primary data.

    When submitting unmapped BAM files (without @SQ header lines) from PacBio or Ion Torrent, the mapping table and reference sequences are not necessary.

    If only BAM alignment files are submitted, please make sure that the BAM files also contain the unaligned reads. This is critical to enable primary re-analysis and re-alignment of the dataset with new tools or future genome assemblies.

    mapping between bam and reference sequences

    BAM

    Alignment data can be submitted in BAM format. The BAM files should be readable by SAMtools and Picard. BAM files are nearly optimal in terms of compression and should be submitted without additional compression.

    Specify reference by INSDC/RefSeq accession number

    If the references are found in the list, they can be specified by their accession.version numbers (for example, NC_000001.11). The version number is required. Accession numbers for references can be searched in NCBI Assembly.

    Specify reference by supplying multi-fasta

    If the references are not found in the list, submit a reference file in multi-fasta format and select "reference_fasta" as the Run file type. The mapping table links each reference name (SN) in the BAM header to a fasta defline. If the sequence length in @SQ LN differs from the length of the corresponding sequence in the multi-fasta file, a warning is raised.

    Specify reference by both INSDC/RefSeq accession number and multi-fasta

    If only some of the references are found in the list, those references can be specified by their accession.version numbers (for example, NC_000001.11). The remaining references must be supplied by uploading a multi-fasta file. In the SN-reference mapping table, list both the accession.version numbers and the sequence names of the multi-fasta deflines.

    SN-reference mapping table

    A tab-delimited text file describing the mapping between the SN in the @SQ lines of the BAM header and an accession number or a sequence name in the fasta file. Select "tab" as the Run file type.

    BAM header
    @HD VN:1.0 GO:none SO:coordinate
    @SQ SN:chr1 LN:249698942
    @SQ SN:chr2 LN:242508799
    @SQ SN:chr3 LN:198450956
    ...
    
    SN-fasta mapping table. In this example, the reference named ref1 in the multi-fasta file corresponds to SN:chr1.
    chr1 ref1
    chr2 ref2
    chr3 ref3
    ...
    
    Reference multi-fasta.
    >ref1
    CGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGGTGTCT
    ...
    >ref2
    TCCACCAACGTTAGAAGGCCTTGGCCCCCAGAGAGCCAATTTCACAATCCAGAAGTCCCC
    ...
    >ref3
    GTGTGTGACCAGGGAGGTCCCCGGCCCAGCTCCCATCCCAGAACCCAGCTCACCTACCTT
    ...
    
    SN-accession mapping table. In this example, the reference "NC_000001.11" corresponds to SN:chr1.
    chr1 NC_000001.11
    chr2 NC_000002.12
    chr3 NC_000003.12
    ...
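    The consistency checks described above (every SN listed in the mapping table, @SQ LN matching the fasta length) can be run locally before upload. The sketch below is an illustration under stated assumptions, not the DRA validator: it takes the BAM header as text (for example, the output of `samtools view -H file.bam`), treats any mapped name that is not a fasta defline as an accession.version number, and reports problems rather than warnings.

```python
import re
from typing import Dict, List

ACCESSION_VERSION = re.compile(r"\S+\.\d+")  # e.g. NC_000001.11; the version is required


def parse_sq_lines(header: str) -> Dict[str, int]:
    """Map SN -> LN from the @SQ lines of a BAM header given as text."""
    refs: Dict[str, int] = {}
    for line in header.splitlines():
        if line.startswith("@SQ"):
            tags = dict(tag.split(":", 1) for tag in line.split()[1:])
            refs[tags["SN"]] = int(tags["LN"])
    return refs


def parse_fasta_lengths(fasta: str) -> Dict[str, int]:
    """Map each defline name (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def parse_mapping_table(table: str) -> Dict[str, str]:
    """Map SN -> accession.version or fasta sequence name (tab-delimited)."""
    mapping: Dict[str, str] = {}
    for line in table.splitlines():
        if line.strip():
            sn, ref = line.split("\t")[:2]
            mapping[sn.strip()] = ref.strip()
    return mapping


def check_mapping(header: str, table: str, fasta: str = "") -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    sq = parse_sq_lines(header)
    if not sq:
        return []  # unmapped BAM without @SQ lines: no table or references needed
    mapping = parse_mapping_table(table)
    fasta_lengths = parse_fasta_lengths(fasta)
    problems: List[str] = []
    for sn, ln in sq.items():
        ref = mapping.get(sn)
        if ref is None:
            problems.append(f"{sn}: not listed in the mapping table")
        elif ref in fasta_lengths:
            if fasta_lengths[ref] != ln:
                problems.append(f"{sn}: LN {ln} differs from length {fasta_lengths[ref]} of {ref}")
        elif not ACCESSION_VERSION.fullmatch(ref):
            problems.append(f"{sn}: {ref} is neither in the fasta nor an accession.version")
    return problems
```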
    

    fastq

    The Run file type depends on whether the read length is constant.

    • For fastq files with constant read length, select 'fastq' as the Run file type. Paired reads should appear in the same order in both files of a pair.
    • For fastq files with variable read length, select 'generic_fastq' as the Run file type.
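    To decide between the two file types, the read lengths in a file can be compared. The sketch below assumes plain, gzip- or bzip2-compressed fastq files with unwrapped four-line records and stops at the first read whose length differs from the first one.

```python
import bz2
import gzip
from typing import IO, Optional


def open_fastq(path: str) -> IO[str]:
    """Open a plain, gzip- or bzip2-compressed fastq file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def run_file_type(path: str) -> str:
    """Return 'fastq' if all reads have the same length, else 'generic_fastq'."""
    first_length: Optional[int] = None
    with open_fastq(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 != 1:  # only the second line of each record holds the sequence
                continue
            length = len(line.rstrip("\r\n"))
            if first_length is None:
                first_length = length
            elif length != first_length:
                return "generic_fastq"
    return "fastq"
```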

    For details of the fastq format, please see the NCBI website.

    • Quality values must be on the Phred scale. By default, an offset of 33 (!) is assumed for Phred quality values. If the offset is 64 (@), set the ascii_offset of the Run XML to 'ascii_offset="@"'.
    • No technical reads (adapters, linkers, barcodes) are allowed.
    • Paired reads must be split into two fastq files. The read names must have a suffix identifying the first and second read of the pair, for example '/1' and '/2'.
    • The first line for each read must start with '@'.
    • The base calls and quality scores must be separated by a line starting with '+'.
    • The Fastq files must be compressed using gzip or bzip2.
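    Some of the rules above can be checked locally before upload. The sketch below is not the DRA validator; it assumes unwrapped four-line records, checks the mate suffix only at the end of the first word of the read name, and guesses the quality offset with a heuristic that cannot always decide (uniformly high-quality offset-33 data look like offset-64 data).

```python
from typing import List, Optional


def check_fastq_records(lines: List[str], mate_suffix: Optional[str] = None) -> List[str]:
    """Check unwrapped 4-line fastq records against the rules above.

    mate_suffix: expected read-name suffix such as "/1" or "/2"; None skips the check.
    Returns a list of problems (empty if none were found).
    """
    problems: List[str] = []
    if len(lines) % 4 != 0:
        problems.append("number of lines is not a multiple of 4")
    complete = len(lines) - len(lines) % 4
    for start in range(0, complete, 4):
        name, seq, plus, qual = (ln.rstrip("\r\n") for ln in lines[start:start + 4])
        record = start // 4 + 1
        if not name.startswith("@"):
            problems.append(f"record {record}: first line does not start with '@'")
        elif mate_suffix and not name.split()[0].endswith(mate_suffix):
            problems.append(f"record {record}: read name lacks suffix {mate_suffix}")
        if not plus.startswith("+"):
            problems.append(f"record {record}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record}: sequence and quality lengths differ")
    return problems


def guess_ascii_offset(quality_lines: List[str]) -> Optional[int]:
    """Heuristically guess the quality offset (33 or 64); None if undecidable.

    Characters below ';' (ASCII 59) cannot occur with offset 64, so they imply 33.
    If the lowest character is '@' (64) or higher, offset 64 is likely.
    """
    lowest = min((min(q) for q in quality_lines if q), default=None)
    if lowest is None:
        return None
    if ord(lowest) < 59:
        return 33
    if ord(lowest) >= 64:
        return 64
    return None
```

    Compression (gzip or bzip2) is not checked here; decompress or open the file with the matching module before passing its lines in.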

    454

    The DRA accepts sequencing run data from the 454 platform in SFF and fastq/bam formats. These files should reflect the sequencing run setup. If an SFF file contains data derived from more than one sample, split the resulting fastq file into files that each contain data from only one sample.

    The read names in the SFF file are meaningful: they reflect the addressing scheme of the picotitre plate as well as a globally unique run ID. Please do not rewrite these names in the SFF file, as this addressing information would be lost. The SFF format is nearly optimal in terms of footprint, so there is little to be gained by further compression. Therefore, please provide SFF files uncompressed.

    Illumina Genome Analyzer

    Illumina pipeline v1.4 and later

    DRA does not accept qseq files. Please convert qseq to fastq/bam.

    SOLiD

    SOLiD Native Format

    DRA does not accept SOLiD native files. Please convert the native files to fastq/bam.

    Ion Torrent

    Submit Ion Torrent data in the sff or fastq/bam format.

    Helicos Heliscope

    Submit Helicos data in the SMS (helicos_native) format, or in the fastq/bam format created with the fixed quality value "14".

    Complete Genomics

    Submit Complete Genomics data in the fastq/bam format.

    Pacific Biosciences

    Pacific Biosciences uses HDF5, a container file format with a directory-like structure, to store raw data. The DRA accepts both bas.h5 and bax.h5 file submissions. Note that for data from the RS II instrument, each Run must consist of one *.bas.h5 file and three *.bax.h5 files. Do not rename the files.

    Do NOT include files other than HDF5 in a Run.
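    The file composition required for an RS II Run can be checked with a few lines of code. This is a sketch that looks only at filename suffixes; the example filenames in the usage line are hypothetical.

```python
from typing import List


def check_rs2_run(filenames: List[str]) -> List[str]:
    """Check that an RS II Run lists one *.bas.h5 and three *.bax.h5 files
    and nothing else. Returns a list of problems (empty if none were found)."""
    bas = [f for f in filenames if f.endswith(".bas.h5")]
    bax = [f for f in filenames if f.endswith(".bax.h5")]
    others = [f for f in filenames if f not in bas and f not in bax]
    problems: List[str] = []
    if len(bas) != 1:
        problems.append(f"expected 1 .bas.h5 file, found {len(bas)}")
    if len(bax) != 3:
        problems.append(f"expected 3 .bax.h5 files, found {len(bax)}")
    if others:
        problems.append("unexpected files in Run: " + ", ".join(others))
    return problems


if __name__ == "__main__":
    # Hypothetical filenames for one RS II movie
    print(check_rs2_run(["m1.bas.h5", "m1.1.bax.h5", "m1.2.bax.h5", "m1.3.bax.h5"]))  # []
```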

    HDF5 files contain information that is not stored in the resulting SRA/fastq archive file. Because HDF5 files are valuable for re-use, the DRA adds the submitted HDF5 files as an Analysis and provides them in addition to the Run.

    The DRA also accepts Pacific Biosciences data in the fastq/bam format. Because the read length varies, select "generic_fastq" as the Run file type.

    Oxford Nanopore

    Submit Oxford Nanopore data in the fastq/bam format.

    Capillary sequencing platform

    Submit capillary sequencing data in the fastq/bam format.

    Analysis data files

    PacBio Base Modification Files

    PacBio sequence data also permits the analysis of methylated bases within the sequence, which can be extremely helpful to the scientific community. For example, the precise positions of those modified bases can be used to determine the specificity of the DNA methyltransferases that produced them. The PacBio analysis suite contains an analysis workflow (RS_Modification_and_Motif_Analysis) to extract these sequences and produce several files:

    • motif_summary.csv
    • modifications.csv
    • modifications.gff
    • motifs.gff

    It would be beneficial to the scientific community if you could perform this analysis and submit at least the motif_summary.csv file for prokaryotes as a DRA Analysis object. Please submit these files as data files of an Analysis of the Sequence Annotation type, in addition to the sequencing reads in the Run. For assistance, contact us.

    NCBI guidelines of PacBio Base Modification Files

    Submission to the DRA

    Never submit data without the permission of the principal investigator.
    Submission of research data from human subjects
    For submitting data from human subjects (human data) to the databases of the DDBJ center, it is the submitter's responsibility to ensure that the dignity and rights of human subjects are protected in accordance with all applicable laws, ordinances, guidelines and policies of the submitter's institution. In principle, make sure to remove any direct personal identifiers of human subjects from the data to be submitted. Before submitting human data, read "Submission of research data from human subjects".
    Submission of Patent Related Sequences
    Please read "Submission of Patent Related Sequences" and "Patent Priority and Other Priority" before submitting patent related sequences.

    Metadata and sequence data are required for submission to the DRA.

    Please submit assembled sequence data to DDBJ. The DDBJ Mass Submission System (MSS) accepts genomic or other large volumes of sequence data generated by massively parallel sequencing platforms.

    Data submission to DRA

    1. Obtain a submission account

    2. Create a DRA submission and upload data files

    • Create a new DRA submission (Add DRA submission functionality to your account)
      All sequencing data in a single submission will be released at the same time.
    • Upload data files by scp before submitting BioProject, BioSample, Experiment and Run

    3. Submit project and sample information

    BioProject

    • A description of the research effort
    • "Why" you sequenced your samples

    BioSample

    • A description of biologically or physically unique specimens
    • "What" you sequenced

    Metadata can be submitted as a tab-delimited text file.

    4. Submit Experiment and Run

    DRA Experiment

    • A description of a sample-specific sequencing library
    • "How" you performed the sequencing
    • Multiple Experiments “point” to a single Sample, but not vice-versa.

    DRA Run

    • Validate data files after submitting Experiment and Run
    • All files linked to a Run are “merged” into a single SRA file format

    5. Validate sequencing data files

    • Sequencing data files are converted into an SRA file for archiving.
    • Submissions that pass the validation step are reviewed and accessioned.

    How to submit data to the DRA

    Submission to BioProject, BioSample and DRA

    Submission Account

    At the DNA Data Bank of Japan (DDBJ) center, BioProject, BioSample, and DRA submissions are managed in user's account.

    Following the Submission Account Handbook, obtain a submission account and enable DRA submission in the account.

    Organize data

    For examples of how metadata objects can be organized, see "Organization of metadata objects". Only one BioProject can be registered in a single submission, while multiple BioSample, Experiment and Run objects can be registered. To organize your data into a submission easily, first decide on the number of BioSamples.

    In this chapter, the submission steps are explained using an example submission, "paired-end genome sequencing of three bacterial strains".

    Genome sequencing data of three bacterial strains

    Create a new submission

    Log in to D-way (https://trace.ddbj.nig.ac.jp/D-way); the top page is displayed. Go to the DRA submission site from the “DRA” menu at the top.

    Create a new submission by clicking [New submission]. A corresponding subdirectory is then created under the submitter’s home directory on the DRA file server (dradata.ddbj.nig.ac.jp). Upload sequence data files to this subdirectory.

    All data in a submission are released at the same time. To release data at different times, split them into separate submissions.

    Create a new submission

    The submission statuses are listed below. The DRA team reviews submissions whose status is "submission_validated" or "data_error".

    List of submission statuses

    Status                 Explanation
    New                    Metadata has not been submitted.
    metadata_submitted     Metadata has been submitted.
    data_validating        Data files are being validated.
    data_error             An error occurred during data validation.
    submission_validated   Metadata and data have been validated.
    completed              Accession numbers have been issued.
    confidential           Archive files have been created and the submission is kept private.
    Public                 Released to the public.

    Upload sequence data

    Sequence data files must be uploaded before metadata can be created. To create metadata first, upload at least one file.

    Transfer sequence data by using terminal (Linux/Mac OS X)

    Transfer the files by executing,

    $ scp <Your Files> <D-way Login ID>@dradata.ddbj.nig.ac.jp:~/<DRA Submission ID>
    • <Your Files> Files to be transferred. Ex: file1 file2 (file1 and file2), file* (all files whose filenames start with “file”)
    • <D-way Login ID> D-way Login ID (ex. test07)
    • <DRA Submission ID> DRA Submission ID (ex. test07-0018)
    • command example: scp strainA_1.fastq test07@dradata.ddbj.nig.ac.jp:~/test07-0018

    Enter the passphrase set for the keys.

    Enter passphrase for key '/home/you/.ssh/id_rsa':
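    When a submission has many files, such as the paired-end files of the three example strains, the scp command above can be wrapped in a small function. This is a sketch, not a DDBJ tool: the function name `dra_upload` and the `DRY_RUN` switch are hypothetical, and key-based login is assumed to be set up already.

```shell
#!/bin/sh
# dra_upload LOGIN_ID SUBMISSION_ID FILE...
# Hypothetical helper: copies FILE... into ~/SUBMISSION_ID on dradata.ddbj.nig.ac.jp
# using the same scp destination format as the command above.
# Set DRY_RUN=1 to print the scp command instead of running it.
dra_upload() {
  if [ "$#" -lt 3 ]; then
    echo "usage: dra_upload LOGIN_ID SUBMISSION_ID FILE..." >&2
    return 2
  fi
  login_id=$1
  submission_id=$2
  shift 2
  dest="${login_id}@dradata.ddbj.nig.ac.jp:~/${submission_id}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo scp "$@" "$dest"
  else
    scp "$@" "$dest"
  fi
}
```

    For example, `DRY_RUN=1 dra_upload test07 test07-0018 strainA_1.fastq strainA_2.fastq` prints the scp command without transferring anything; drop `DRY_RUN=1` to run it.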

    You can manage the transferred files directly by logging in to the server. Log in to the server via SSH by executing:

    $ ssh <D-way Login ID>@dradata.ddbj.nig.ac.jp

    Enter the passphrase set for the keys.

    Enter passphrase for key '/home/you/.ssh/id_rsa':

    After logging in successfully, the following prompt is displayed.

    [test07@dradata ~]$

    The login environment is private to the submitter; other users cannot access the data. Only the following commands can be executed. Users can delete unnecessary files.

    ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

    Transfer sequence data by using WinSCP (Windows)

    Submission to DRA ~upload data files (Windows)~

    Install and run WinSCP (http://winscp.net/eng/download.php).

    Set items as below and click the [Advanced...] button.

    Be sure to select the "binary mode" for file transfer. Do NOT select the "text mode".

    • File protocol: SFTP
    • Host name: dradata.ddbj.nig.ac.jp
    • Port number: 22
    • User name: (D-way Login ID)
    • Password: (Leave empty)
    Generate private key 1

    Please select the private key, which you created beforehand, from "Private key file" in "Authentication".

    Generate private key 2

    Finally, click the [Login] button at the bottom center.

    Login to the WinSCP

    On first login, a warning message is displayed; select “Yes” (this message will not be displayed again). Next, enter the passphrase set for the keys.

    After logging in successfully, a folder on your PC is displayed on the left and your private directory on the server is displayed on the right. Select files in the left window and drag and drop them into the right window to transfer them to the server.

    Transfer files by using the WinSCP

    You can delete the transferred files by selecting the files and clicking the [Delete] button.

    Transfer sequence data by using Cyberduck (Mac OS X)

    Submission to DRA ~upload data files (Mac) ~

    Download and install Cyberduck (http://cyberduck.ch).

    Run Cyberduck and click the [Open Connection] button in the Cyberduck menu.

    Open connection in Cyberduck

    Select “SFTP (SSH File Transfer Protocol)”.

    SFTP in Cyberduck

    Set the following items and check “Use Public Key Authentication” in More Options.

    • Server: dradata.ddbj.nig.ac.jp
    • Port: 22
    • Username: (D-way Login ID)
    • Password: (Leave empty)
    • Add to Keychain: (Check)
    Key authentication in Cyberduck

    By default, the private key is created in “User’s home folder > .ssh folder (invisible in Finder) > id_rsa”.

    Private key in Mac OS X

    On first login, a warning message is displayed; select “Always” (this message will not be displayed again).

    After logging in successfully, your private directory on the server is displayed in the window. Select files on your PC and drag and drop them into the window to transfer them to the server.

    Transfer files by using Cyberduck

    Users can log in to the dradata.ddbj.nig.ac.jp server via SSH using their private key. Only the following commands can be executed. Users can delete unnecessary files.
    ls cd cp mv rm more mkdir tar gzip gunzip bzip2 bunzip2 zip unzip

    When sending submission files too large for e-mail attachment, submitters can upload the files for the DDBJ Mass Submission System (MSS) by using the DRA file server. After contacting the MSS team, upload the files to the /submission/[submitter ID]/mass directory.

    Create metadata by using the tool

    Move to the submission detail page by clicking the submission ID.

    Move to the submission page

    Click the [Enter / Update metadata] button to run the DRA metadata creation tool.

    run the DRA metadata creation tool

    If no file has been uploaded to the submission directory, the following message is displayed. Upload data files before submitting metadata.

    To submit metadata first, upload some files (for example, an empty text file).

    when no data file is uploaded

    The metadata are composed of the Submission, BioProject, BioSample, Experiment, Run and Analysis (optional) objects. In the metadata creation tool, fill in the tabs from left to right.

    Required items are marked with *.

    The entered content is checked when submitters click the [Save] button or move to another tab. If error messages are displayed, revise the content.

    Submission

    Set the hold date within two years. In Submitter, include the principal investigator(s) and the submitter(s) who actually submit the data. The DRA does not disclose submitter information to the public.

    All data in a submission are released at the same time. To release data at different times, split them into separate submissions.

    Enter metadata in the tool

    Study

    Submit a new project by clicking [New submission], or select a project registered in the account.

    Only one project can be submitted. To reference a project registered under another account, please contact the DRA team.

    Submit a new BioProject or select submitted one

    To submit a BioProject, fill in the tabs from left to right. The second panel is for BioProject submission. Submitter information is copied from the DRA submission.

    For BioProject metadata, please see the BioProject Handbook.

    BioProject submission

    To submit genome assemblies to DDBJ, a unique locus tag prefix is required.

    Locus tag prefix generation box will appear when [Project data type="Genome Sequencing" or "Metagenome"] AND [Capture="Whole"] AND [Objective="Sequence" or "Annotation" or "Assembly"]. Registration of a unique locus tag prefix is required for studies that result in genome assemblies.

    The locus_tag prefix can contain only alphanumeric characters and must be at least 3 characters long. It should start with a letter; numerals can appear in the 2nd position or later (e.g., A1C). The prefix must not contain symbols such as -, _ or *. The locus_tag prefix is separated from the tag value by an underscore ‘_’, e.g., A1C_00001.
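    The prefix rules above can be checked locally before entering a prefix. This sketch encodes only the rules stated in this handbook (NCBI may apply additional ones), and the five-digit numbering in `locus_tag` simply mirrors the A1C_00001 example. Both function names are hypothetical.

```shell
#!/bin/sh
# is_valid_locus_tag_prefix PREFIX
# Succeeds only if PREFIX starts with a letter, contains only letters and
# digits, and is at least 3 characters long (the rules stated above).
is_valid_locus_tag_prefix() {
  case $1 in
    # first character not a letter, or any non-alphanumeric character
    [!A-Za-z]*|*[!A-Za-z0-9]*) return 1 ;;
  esac
  [ "${#1}" -ge 3 ]
}

# locus_tag PREFIX NUMBER -> PREFIX_NNNNN (five digits, as in A1C_00001)
locus_tag() {
  printf '%s_%05d\n' "$1" "$2"
}
```

    For example, `is_valid_locus_tag_prefix A1C` succeeds, while `1AC`, `AB` and `A-C` are rejected; `locus_tag A1C 1` prints `A1C_00001`.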

    Leave the prefix box empty if a prefix is not necessary, for example for a WGS-only submission.

    Locus tag prefixes are managed by NCBI. When a project is submitted, our system attempts to reserve the prefix with NCBI. If the prefix has already been reserved, an error message is displayed; enter a different prefix and submit again.

    When multiple prefixes are necessary, please contact us.

    Reserve locus tag prefix

    Check the content in "OVERVIEW" and submit a project by clicking [Submit BioProject].

    Submit BioProject

    After submission, the submitted project is selected in Study.

    Submitted project is selected

    Sample

    Submit new samples by clicking [New submission], or select samples submitted in the account.

    To select a range of samples, check the first checkbox, then hold down Shift and click the last checkbox. To filter samples, enter text in the upper box; click [Select filtered BioSamples] to select all filtered samples.

    To reference samples registered under another account, please contact us.

    Submit new samples or select submitted ones

    To submit a BioSample, fill in the tabs from left to right. The second panel is for BioSample submission. Submitter information is copied from the DRA submission.

    Biological and technical replicates are represented by separate BioSamples. For the number of samples needed for a sequence submission, see "FAQ: How many samples do I need for my DRA submission?"

    For BioSample metadata, please see the BioSample Handbook.

    BioSample submission

    Select a sample type in the "SAMPLE TYPE". For genome samples, minimum sample attributes are defined by MIxS.

    For the Sample type, please see the BioSample Handbook.

    Select a sample type

    Download the template text file for the selected sample type and enter the sample attributes in it.

    The main step of sample submission is to describe each sample with required, optional and user-defined attributes.

    Download a text file for entering sample attributes

    The template is a tab-separated text file that can be opened and edited in spreadsheet software (e.g., Excel®). Attribute names are in the header line. Attributes marked with "*" are required.

    From the second line on, enter one sample per line. For a project without a PRJD accession number, enter its PSUB submission ID in bioproject_id. For attributes without measured values, enter "missing" or "not applicable".

    BioSample attribute list. User-defined attributes can be added at rightmost column.

    Enter sample attributes by using spreadsheet software
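    Before uploading the attribute file, a quick local check can catch rows with the wrong number of columns or empty required values. A sketch assuming, as described above, that required attribute names in the header start with "*"; `check_biosample_tsv` is a hypothetical helper and does not replace the validation performed by the BioSample system.

```shell
#!/bin/sh
# check_biosample_tsv FILE
# Reports rows whose column count differs from the header, and rows where a
# required attribute (header name starting with "*") is empty.
# Exits non-zero if any problem is found.
check_biosample_tsv() {
  awk -F '\t' '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    NR == 1 {
      ncol = NF
      for (i = 1; i <= NF; i++) if ($i ~ /^\*/) req[i] = $i
      next
    }
    NF != ncol {
      printf "line %d: %d columns (header has %d)\n", NR, NF, ncol
      bad = 1
    }
    {
      for (i in req) if ($i == "") {
        printf "line %d: required attribute %s is empty\n", NR, req[i]
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

    Remember that attributes without measured values should contain "missing" or "not applicable" rather than being left empty.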

    Check the content in the final "OVERVIEW" and submit the samples. The submitted sample attribute file can be downloaded from the "ATTRIBUTES" area.

    Submit BioSample

    After submission, the submitted BioSamples are selected in the "Sample" tab.

    Submitted BioSamples are selected

    Experiment

    When the Experiment tab is first displayed, one Experiment and one Run are automatically created for each selected BioSample. Each Experiment references its BioSample, and each Run references its Experiment.

    BioProject - BioSample (1) - Experiment (1) - Run (1)
               - BioSample (2) - Experiment (2) - Run (2)
               - BioSample (3) - Experiment (3) - Run (3)

    In this example, 3 Experiments are created and each Experiment references a unique BioSample.

    Add an Experiment by clicking [Add new Experiment(s)] and delete one by clicking [Delete]. An Experiment referenced by a Run cannot be deleted.

    Experiment referencing selected BioSample, is automatically created

    Experiments can be submitted in a tab-delimited text file. First, click [Save] to save and fix the aliases (e.g., test07-0040_Experiment_0001 - 0003). The alias is used as the name until accession numbers are issued.

    Download content into a tab-delimited text file by clicking the [Download TSV file].

    Save, fix aliases and download as a tab-delimited text file

    Metadata can be edited in spreadsheet software (e.g., Excel®).

    If "Title" values are empty, titles are automatically constructed as "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]" (e.g., "Illumina HiSeq 2000 paired end sequencing of SAMD00025741"). Submitters can provide user-defined text in the "Title".

    Reference samples in "BioSample Used" by "SSUB BioSample Submission ID" : "Sample name" (example, SSUB003746 : Genome bacteria strain A). Spaces around ":" are ignored.

    Experiment template file
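    The default title and the "BioSample Used" value follow fixed patterns, so they can be generated when filling in many rows. A sketch: both function names are hypothetical, and the single-end form (a title without "paired end") is an assumption read from the bracketed "[paired end]" in the pattern above.

```shell
#!/bin/sh
# default_experiment_title MODEL LAYOUT BIOSAMPLE_ID
# Builds "[Sequencing Instrument Model] [paired end] sequencing of [BioSample ID]".
# LAYOUT "paired" adds "paired end"; any other value omits it (assumption).
default_experiment_title() {
  if [ "$2" = "paired" ]; then
    printf '%s paired end sequencing of %s\n' "$1" "$3"
  else
    printf '%s sequencing of %s\n' "$1" "$3"
  fi
}

# biosample_used SSUB_ID SAMPLE_NAME -> "SSUB_ID : SAMPLE_NAME"
biosample_used() {
  printf '%s : %s\n' "$1" "$2"
}
```

    For example, `default_experiment_title "Illumina HiSeq 2000" paired SAMD00025741` reproduces the title example above, and `biosample_used SSUB003746 "Genome bacteria strain A"` prints `SSUB003746 : Genome bacteria strain A`.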

    Save the edited content as a tab-delimited text file, then select and upload it by clicking [Upload TSV file].

    Upload Experiment in a tab-delimited text file

    Upload a tab-delimited text file, NOT a spreadsheet-specific format (e.g., .xlsx).

    Run

    One Experiment and one Run are automatically created for each selected BioSample. Each Run references a unique Experiment.

    In this example, three Runs are created and each Run references a unique Experiment.

    Add a Run by clicking [Add another Run(s)] and delete one by clicking [Delete]. A Run linked to files cannot be deleted.

    Save and fix Aliases

    After fixing aliases by clicking [Save], the Run content can be downloaded into a tab-delimited text file. To mark data files as belonging to a Run, enter "Run" in the leftmost "Run/Analysis" column.

    Click the [Select data files for Run] and link uploaded files to Run.

    Move to next site to link files to Run

    All files uploaded to the submission directory are shown. Associate a file with a Run by selecting a Run alias in "Run/Analysis contains files".

    Enter the file type and MD5 checksum for each file. File attributes can also be entered by uploading a tab-delimited text file.
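    The MD5 values entered here must match those of the uploaded files. A minimal sketch that lists each file with its MD5 value for copying into the Run file list; the `md5_table` name and the two-column layout are assumptions, not the DRA TSV template. It uses `md5sum` (Linux) and falls back to `md5 -q` (Mac OS X).

```shell
#!/bin/sh
# md5_table FILE...
# Prints "<file name><TAB><md5>" for each FILE.
md5_table() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "no such file: $f" >&2
      return 1
    fi
    if command -v md5sum >/dev/null 2>&1; then
      sum=$(md5sum < "$f" | cut -d ' ' -f 1)   # Linux: "<md5>  -"
    else
      sum=$(md5 -q "$f")                       # Mac OS X: prints only the hash
    fi
    printf '%s\t%s\n' "$(basename "$f")" "$sum"
  done
}
```

    For example, `md5_table strainA_1.fastq strainA_2.fastq` prints one line per file.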

    Note that all data files listed in a Run will be merged into a single SRA archive file, so files from different samples or replicates should not be grouped in the same Run. Paired-end data files, conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end.

    For fastq with variable read length, select "generic_fastq" for filetype.
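    Whether a fastq file has variable read lengths can be checked locally before choosing the file type. A sketch assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); `read_length_range` is a hypothetical helper.

```shell
#!/bin/sh
# read_length_range FASTQ
# Prints "min max" read lengths. If min != max, reads have variable length.
read_length_range() {
  awk '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    NR % 4 == 2 {                      # 2nd line of each record is the sequence
      len = length($0)
      if (min == "" || len < min) min = len
      if (len > max) max = len
    }
    END { if (min == "") exit 1; print min, max }
  ' "$1"
}
```

    For example, output such as `35 100` indicates variable read length, while identical numbers indicate fixed-length reads.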

    Enter file attributes and link files to Run

    If no Analysis (optional) is needed, submit the metadata by clicking [Submit/Update DRA metadata].

    Submit DRA metadata

    After submitting DRA metadata, start validation of data files. Click the link "Validate uploaded data files to finish this submission".

    Go to data validation after submitting metadata

    Analysis (optional)

    Create as many Analysis objects as required and enter the content of each. Unnecessary Analysis objects can be deleted by clicking [Delete].

    Click the [Select data files for Analysis] and link files to Analysis.

    Enter Analysis content

    Enter file attributes and associate them with Analysis. When submitting the file attributes by uploading the tab-delimited text file, to distinguish the data files for Analysis, enter "Analysis" in the leftmost "Run/Analysis" column.

    Enter file attributes and link files to Analysis

    Submit the DRA metadata by clicking [Submit/Update DRA metadata] and proceed to data validation. Only the MD5 values of Analysis files are checked during validation.

    Create metadata in XML files

    The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that are in the DRA XML schema but not in the submission tool, submitters need to create or edit metadata in XML files.

    • Create a new DRA submission.

    • Prepare the Submission, Experiment, Run and Analysis (optional) XML files.

    • Un-accessioned BioProject and BioSample can be referenced in Experiment XML as follows.

      <STUDY_REF>
        <IDENTIFIERS>
          <PRIMARY_ID label="BioProject Submission ID">PSUB004220</PRIMARY_ID>
        </IDENTIFIERS>
      </STUDY_REF>
      
      <SAMPLE_DESCRIPTOR>
        <IDENTIFIERS>
          <PRIMARY_ID label="BioSample Submission ID">SSUB003742 : sample name</PRIMARY_ID>
        </IDENTIFIERS>
      </SAMPLE_DESCRIPTOR>
      

    • Validate the XML files against the XSD schemas with the following Unix commands. XML files containing errors cannot be uploaded.

      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.submission.xsd?view=co" test07-0018.Submission.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.experiment.xsd?view=co" test07-0018.Experiment.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.run.xsd?view=co" test07-0018.Run.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.analysis.xsd?view=co" test07-0018.Analysis.xml
      

    • Upload validated XML files. Select the Submission, Experiment, Run and Analysis (optional) XML files and upload them at once.

      Uploaded XML files are validated against the SRA schema, and the relationships between XML objects are checked. If errors are displayed, modify and re-upload the XML files.

    • Upload modified XML files
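    When the four xmllint checks above are run repeatedly, they can be wrapped in a loop. A sketch under these assumptions: files follow the `<DRA Submission ID>.<Object>.xml` naming used above, `validate_dra_xml` is a hypothetical name, and the schema URLs are the ones given above. If your xmllint cannot fetch remote schemas over HTTP, download the .xsd files and pass local paths instead.

```shell
#!/bin/sh
# validate_dra_xml SUBMISSION_ID
# Validates SUBMISSION_ID.{Submission,Experiment,Run}.xml and, if present,
# SUBMISSION_ID.Analysis.xml. Returns non-zero if any file is missing or invalid.
validate_dra_xml() {
  base="http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA"
  status=0
  for obj in Submission Experiment Run Analysis; do
    file="$1.$obj.xml"
    if [ ! -f "$file" ]; then
      # Analysis is optional; the other three objects are required.
      if [ "$obj" != Analysis ]; then
        echo "missing: $file" >&2
        status=1
      fi
      continue
    fi
    # Schema file names are lower case, e.g. SRA.experiment.xsd
    xsd=$(printf '%s' "$obj" | tr '[:upper:]' '[:lower:]')
    xmllint --noout --schema "$base/SRA.$xsd.xsd?view=co" "$file" || status=1
  done
  return "$status"
}
```

    Run it as `validate_dra_xml test07-0018` in the directory that contains the XML files.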

    Edit metadata in XML files

    The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). To submit raw data containing technical reads, or to use metadata elements that are in the DRA XML schema but not in the submission tool, submitters need to create or edit metadata in XML files.

    • Create and submit metadata by using the web-based tool.

    • Download the Submission, Experiment, Run and Analysis (optional) XML files of the submission with status "metadata_submitted".

    • Create metadata by using the submission tool and download them in XML files.
    • Edit the downloaded XML files. For how to describe technical reads, please see the example page. For available metadata elements, please see the explanation in DRA XML schema.

    • Un-accessioned BioProject and BioSample can be referenced in Experiment XML as follows.

      <STUDY_REF>
        <IDENTIFIERS>
          <PRIMARY_ID label="BioProject Submission ID">PSUB004220</PRIMARY_ID>
        </IDENTIFIERS>
      </STUDY_REF>
      
      <SAMPLE_DESCRIPTOR>
        <IDENTIFIERS>
          <PRIMARY_ID label="BioSample Submission ID">SSUB003742 : sample name</PRIMARY_ID>
        </IDENTIFIERS>
      </SAMPLE_DESCRIPTOR>
      

    • Validate the XML files against the XSD schemas with the following Unix commands. XML files containing errors cannot be uploaded.

      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.submission.xsd?view=co" test07-0018.Submission.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.experiment.xsd?view=co" test07-0018.Experiment.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.run.xsd?view=co" test07-0018.Run.xml
      xmllint --noout --schema "http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA/SRA.analysis.xsd?view=co" test07-0018.Analysis.xml
      

    • Upload modified XML files. Select the Submission, Experiment, Run and Analysis (optional) XML files and upload them at once.

      Uploaded XML files are validated against the SRA schema, and the relationships between XML objects are checked. If errors are displayed, modify and re-upload the XML files.

    • Upload modified XML files

    Validation of data files

    Submitted data files are converted to SRA files for archiving. During this conversion, the MD5 values, the file format and the consistency between files and metadata are validated.

    The “Data Files” section displays the file names and MD5 values registered in the Run and Analysis, together with the MD5 values of the uploaded files.

    Click the [Validate data files] and validate uploaded data files.

    Start validation of data files

    The files are validated in the following order.

    FAQ: How to deal with validation errors?

    MD5 Check

    The MD5 values in the metadata are compared with those of the uploaded files; any mismatch causes an error. When MD5 errors occur, revise the metadata and/or re-upload the files.

    Data Check

    Submitted data files are converted to SRA files for archiving. During this conversion, the MD5 values, the file format and the consistency between files and metadata are validated. When errors occur, revise the metadata and/or re-upload the files. Validation of large files takes time.

    If no errors occur, the submission status becomes "submission_validated" and the validated files are moved to a separate directory.

    The DRA staff review submissions with status "submission_validated". Please do not modify the submission until the DRA staff contact you.

    Revise a submission with "data_error"

    Any error during validation changes the submission status to "data_error". Stop the validation by clicking the [Stop validation] button, then revise the metadata and/or re-upload data files. After revision, click the [Validate data files] button to start validation again.

    FAQ: How to deal with validation errors?

    Stop validation

    The submission status reverts to "metadata_submitted". Revise and re-submit the metadata or re-upload the data files.

    Revise submission

    Accession numbers

    When both the metadata and sequence data have been validated (status “submission_validated”), accession numbers with the prefix DR (Submission (DRA), Experiment (DRX), Run (DRR), Analysis (DRZ)) are assigned ("acc_issued", "complete" or "private"). Accession numbers are displayed in the “Component”.
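    The three-letter prefix identifies the object type of an accession. A minimal sketch of that mapping; `dra_object_type` is a hypothetical helper and recognizes only the four DR prefixes listed above.

```shell
#!/bin/sh
# dra_object_type ACCESSION -> Submission | Experiment | Run | Analysis
dra_object_type() {
  case $1 in
    DRA[0-9]*) echo Submission ;;
    DRX[0-9]*) echo Experiment ;;
    DRR[0-9]*) echo Run ;;
    DRZ[0-9]*) echo Analysis ;;
    *) echo unknown; return 1 ;;
  esac
}
```

    For example, `dra_object_type DRR000001` prints `Run`.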

    Limited-time access to archived fastq/SRA files

    To allow submitters to download and check the archived fastq/SRA files, the files are copied to the following directories on the dradata.ddbj.nig.ac.jp server. To save disk space, the copies are automatically deleted after one month.

    If available disk space decreases unexpectedly, the copied fastq/SRA files may be deleted within one month or the copy service may be suspended. We will announce this on the website as far in advance as possible; however, the announcement could come immediately before the deletion or service suspension.

    • (submitter's home)/report/dra/(DRA submission accession)/fastq/
    • (submitter's home)/report/dra/(DRA submission accession)/sra/

    • submitter/report/dra/DRA000001/fastq/DRR000001.fastq.bz2
    • submitter/report/dra/DRA000001/fastq/DRR000002.fastq.bz2
    • submitter/report/dra/DRA000001/fastq/DRR000002_1.fastq.bz2
    • submitter/report/dra/DRA000001/fastq/DRR000002_2.fastq.bz2
    • submitter/report/dra/DRA000001/sra/DRR000001.sra
    • submitter/report/dra/DRA000001/sra/DRR000002.sra
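    After downloading a copied fastq file, you may want to confirm that it is intact and count its reads. A sketch assuming bzip2-compressed FASTQ with four lines per record, as in the example file names above; `count_fastq_reads` is a hypothetical helper.

```shell
#!/bin/sh
# count_fastq_reads FILE.fastq.bz2
# Tests bzip2 integrity, then prints the number of reads (lines / 4).
# Fails if the archive is damaged or the line count is not a multiple of 4.
count_fastq_reads() {
  bzip2 -t "$1" || return 1
  bzip2 -dc "$1" | awk 'END { if (NR % 4) exit 1; print NR / 4 }'
}
```

    For example, `count_fastq_reads DRR000002_1.fastq.bz2` and `count_fastq_reads DRR000002_2.fastq.bz2` should print the same number for the two files of a paired-end Run.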

    Data release

    After the registered data are loaded into the database, the status becomes “complete (private)” and the submission is kept private until one of the following conditions is met.

    All data in a submission are released at the same time. To release data at different times, split them into separate submissions.

    1. Submitter requests to release their data.
    2. Submitter has published their accession number(s) and it has been confirmed.
      Data are not released if the accession number(s) were published in error by someone other than the submitter.
      "publish" means to disclose accession number(s) to the public through paper, thesis, academic meeting, internet, press report etc.
    3. Specified hold-date has come.
    4. DDBJ/EMBL-Bank/GenBank records (e.g., TSA, WGS, CON etc.) citing DRA Run (DRR) accession number(s) have been made public.

    Data are released without permission from submitters in cases 2, 3 and 4. In case 4, the entire DRA submission containing the cited DRR Run(s) is made public.

    FAQ: How are linked BioProject/BioSample/sequence data released?

    Within a few days of release, the data become searchable in DRASearch and are mirrored to the NCBI SRA.

    The list of available fastq files at the DRA file server: fastqlist

    Update submission

    Update in each database

    Change hold date

    The hold date can be set up to a maximum of 2 years ahead and can be changed. To change the hold date, click the [Change] button in Hold Date to move to the hold date change page.

    Change the hold date

    To release the submission immediately, click [Release Now]. The submission is released during the night; the data files then become available via FTP, and the metadata are indexed by the DRA search system within a few days.

    Update metadata

    Update metadata by clicking the [Enter / Update metadata] button. Some fields cannot be edited. After editing your metadata, be sure to click the [Submit/Update DRA metadata] button to apply the updates to the DRA server.

    Add data files

    Data files cannot be directly added to the archived Run. In another DRA submission, create new Experiment-Run objects referencing existing BioProject and BioSample records to add data files.

    As with Runs, data files cannot be added directly to an archived Analysis. To replace an archived Analysis, please contact the DRA team.

    Log in to D-way and create a new submission by clicking [New submission]. Select the BioProject and BioSample IDs to which the data are to be added. Next, add the DRA Experiment and Run objects.

    • To add a new sample, share a BioProject ID and create a BioSample - Experiment - Run in a new DRA submission.
    • To add data files to existing sample, share BioProject and BioSample IDs and create an Experiment - Run in a new DRA submission.

    Submit metadata and validate the appended data files. Accession numbers will be issued to the appended Experiment/Run objects.

    The BioProject ID remains the same, but a different DRA submission number is assigned.

    Add data files
    Add data files to existing sample

    To add data files to the existing DRA submission, please contact us.

    Withdraw archived objects

    To withdraw archived Experiment, Run and Analysis objects, please contact us.

    Supplement: MD5

    MD5 (Message Digest Algorithm 5) is a hash function that calculates a hash value (MD5 number, 32 hexadecimal digits) for a given file. Because a damaged file almost certainly has an MD5 number different from that of the original, comparing the numbers before and after a transfer shows whether the transferred file is intact.

    Obtain MD5 number (Linux)

    Obtain the MD5 numbers of the files by executing,

    $ md5sum file1 file2
    9f6e6800cfae7749eb6c486619254b9c  file1
    b636e0063e29709b6082f324c76d0911  file2
    
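    GNU md5sum can also record checksums in a file and verify them later with `-c`, which is convenient for comparing copies on your own machines before and after a transfer (md5sum is not among the commands allowed on the DRA file server). A sketch; the function names and the list file name are hypothetical.

```shell
#!/bin/sh
# make_checksum_list LIST FILE...
# Writes "<md5>  <file>" lines for each FILE into LIST.
make_checksum_list() {
  list=$1
  shift
  md5sum "$@" > "$list"
}

# verify_checksum_list LIST
# md5sum -c prints "<file>: OK" or "<file>: FAILED" per file and exits
# non-zero if any file does not match.
verify_checksum_list() {
  md5sum -c "$1"
}
```

    For example, run `make_checksum_list checksums.md5 strainA_1.fastq strainA_2.fastq` before copying, then `verify_checksum_list checksums.md5` in the destination directory.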

    Obtain MD5 number (Mac OS X)

    Obtain the MD5 numbers of the files by executing,

    $ md5 file1 file2
    MD5 (file1) = 9f6e6800cfae7749eb6c486619254b9c
    MD5 (file2) = b636e0063e29709b6082f324c76d0911
    

    Obtain MD5 number (Windows)

    Install and run Fsum Frontend (http://sourceforge.net/projects/fsumfe/).
    First, check "md5".

    Generate md5 in the tool 1

    After clicking the [+] button, open the sequence data files that you need. You can select multiple files at the same time.

    Generate md5 in the tool 2

    Click the [Calculate hashes] button. The MD5 numbers of the files are displayed.
    By clicking the [Export] button, you can save the list of MD5 numbers as an HTML, CSV or XML file.

    Generate md5 in the tool 3