FAQ: 30

Are there any required variables/phenotypes that need to be included?

In the JGA submission, fields including the Subject ID and Gender are required. In addition, the main variable (e.g., heart disease) and covariates (e.g., age, weight) used in the analysis should be submitted to JGA. The goal is to include the data that another researcher would need to reproduce the published analysis.

Database: Japanese Genotype-phenotype Archive
Created: May 31, 2016

Do I have to register a separate BioProject/BioSample for each genome I am sequencing?

If multiple cultured genomes are part of the same research effort, then they can belong to the same BioProject. However, each culture must be registered as a separate BioSample.

For metagenomic assemblies, where multiple genomes are assembled with high confidence from a single metagenomic sample, register a BioProject for the metagenomic assembly project and a BioSample for each sample of metagenomic assembly.

Database: BioProject, Biosample, Sequence Read Archive
Created: February 12, 2015; Last updated: December 13, 2016

Do I need to make a separate BioProject for every type of data?

No, you do not. You should organize your BioProjects in the way most appropriate for your research effort.

From 12 November 2014, multiple Project data types can be selected for a project in the submission system.

To merge genome sequencing and transcriptome analysis projects, select both 'Genome Sequencing' and 'Transcriptome or Gene Expression' for the Project data type. Only one value can be selected for the Material, so select 'Other'.

Another way is to register 'Genome Sequencing' and 'Transcriptome or Gene Expression' as separate projects and unite them by an Umbrella BioProject.

Database: BioProject
Created: November 20, 2014; Last updated: October 13, 2015

What should be provided when information is unavailable?

Please see the Missing value reporting.
Database: Biosample
Created: September 2, 2014; Last updated: October 28, 2015

What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?

BioSample is descriptive information about the biological source materials, or samples, used to generate experimental data in any of the primary data archives. Biological and technical replicates need to be registered as separate BioSamples distinguished by the "replicate" attribute, with values such as "biological replicate 1" and "biological replicate 2".

Each SRA Experiment is a unique sequencing library for a specific sample. Importantly, much of the descriptive information that is displayed in the public record of your data is captured at the level of the DRA Experiment.

SRA Runs are simply a manifest of data file(s) that should be linked to a given sequencing library – no information present in the Run is displayed on the public record of your project. Note that all data files listed in a Run will be merged into a single SRA archive file (and fastq file for distribution), so files from different samples should not be grouped in the same Run. Paired-end data files (forward/reverse), conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end. Do not divide a sample for a paired-end library (for example, forward and reverse).

Database: Biosample, Sequence Read Archive
Created: June 4, 2014; Last updated: January 4, 2017

How do I import a BioProject or BioSample accession into the DRA?

BioProject and BioSample submissions must be made through the Submission Portal D-way. Once you begin a BioProject or BioSample submission, it will be assigned a temporary tracking ID (PSUB/SSUB[number], respectively) – this is not the final accession! Once a BioProject is complete, it is assigned an accession like PRJDB[number]. Once a BioSample submission is complete, each sample will receive an accession like SAMD[number]. When creating DRA experiments, please specify the PSUB ID or PRJDB[number] accession as your BioProject, and SSUB ID or SAMD[number] as your BioSample. Note that a given data file can be linked to a single BioSample only.

When sample preparation and sequencing are carried out by different research groups, the DRA Experiment being submitted can refer to BioProject and BioSample IDs obtained in another submission account. If you need to refer to external BioProject and BioSample IDs, contact the DRA team. When referencing external objects, please be aware of how data release is triggered among BioProject, BioSample and DRA submissions.
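The distinction between temporary tracking IDs and final accessions can be sketched as a small lookup. This is only an illustration: the patterns are built from the ID shapes named above (PSUB/SSUB[number], PRJDB[number], SAMD[number]), and the function name describe_id is hypothetical, not part of any DDBJ tool.

```python
import re

# ID shapes taken from the answer above. The exact number of digits is not
# specified there, so any run of digits is accepted (an assumption).
ID_PATTERNS = [
    (re.compile(r"PSUB\d+"), ("BioProject", "temporary submission ID")),
    (re.compile(r"SSUB\d+"), ("BioSample", "temporary submission ID")),
    (re.compile(r"PRJDB\d+"), ("BioProject", "accession")),
    (re.compile(r"SAMD\d+"), ("BioSample", "accession")),
]


def describe_id(identifier):
    """Return (database, kind) for a recognized ID, or None otherwise."""
    for pattern, description in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return description
    return None
```

For example, describe_id("PSUB000001") reports a temporary BioProject submission ID, which can be used in a DRA Experiment but is not the final accession.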

Database: BioProject, Biosample, Sequence Read Archive
Created: June 4, 2014; Last updated: December 13, 2016

How many samples do I need for my DRA submission?

BioSample is descriptive information about the biological source materials, or samples, used to generate experimental data in any of the primary data archives. Biological and technical replicates are represented by separate BioSamples with a distinct 'replicate' attribute, e.g., 'replicate = biological replicate 1'.

For environmental samples, each physical isolate should be considered a BioSample, whereas uniquely attributable reads within an isolate are not. Note that a given DRA data file can be linked to a single BioSample only.

Basic guidance for BioSample registration is:
  • Register a separate BioSample for each unique source, e.g., RNA from the wings is a separate BioSample from RNA from the legs if those two sources were sequenced independently.
  • A genome assembly can have only one BioSample. For a genome assembled from reads of multiple BioSamples, register a new BioSample and indicate which other BioSamples were used to generate the assembly. For example, if the reads from a male and from a female were submitted to DRA separately but the reads were combined to assemble the genome, register a new BioSample for the male plus the female, providing the accessions of the male and the female BioSamples in the new BioSample registration. Example genome entry.
  • Endosymbionts: Because sequences are annotated by genome, one would need separate BioSamples for an insect and its endosymbiont. In the insect genome assembly submission, we recommend indicating that the endosymbiont’s BioSample is separate and references the insect BioSample.
Examples:
  • 23,000 unique 16S amplicons from a single seawater collection point - 1 BioSample (1 sample was collected and then analyzed to deduce 16S diversity)
  • 3 "identical" transgenic mice treated with the same drug as part of an experiment - 3 BioSamples (biological and technical replicates are represented by separate BioSamples)
  • To examine gene expression profiles, CHO cells infected with a virus and sampled at 0, 2, 4, and 8 hours post infection - 4 BioSamples (4 time points)
  • To analyze differences in gene expression levels, RNA-seq data from a single male anteater taken from the brain, heart, lungs, testes, and liver - 5 BioSamples (5 different tissues isolated)
Database: Biosample, Sequence Read Archive
Created: June 4, 2014; Last updated: December 13, 2016

How should I describe a pooled sample distinguished by barcode sequences in metadata?

Divide the sequence data files per sample and submit each file as a single BioSample-Experiment-Run set. If you need to describe the relationship between barcode sequences and samples, please describe it as free text in the Library Construction Protocol of the Experiment.

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: December 13, 2016

Since May 12, 2014, the DDBJ SRA has used the BioProject instead of the SRA Study. Please select the BioProject accession in the DRA submission system.

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: October 13, 2015

Is there an appropriate way to submit submissions containing many metadata objects?

When there are many Experiment and Run objects, these can be submitted in tab-delimited text files generated with a spreadsheet editor (for example, Excel). Please read the DRA Handbook.

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: December 13, 2016

How can I activate the "Validate data files" button?

When all sequencing data files listed in the Run metadata are uploaded to the DRA server, the "Validate data files" button becomes clickable and users are able to start the validation process. If the button remains inactive after submitting metadata ("metadata_submitted"), check the following points.
  • Not all data files listed in the Run metadata have been uploaded yet.
  • Files whose names contain spaces are not recognized.
  • Files uploaded into a sub-directory are not recognized.
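The three points above can be checked locally before uploading. The following is a minimal sketch, assuming you have the list of filenames entered in the Run metadata and the list of paths you uploaded; the function name and the returned keys are hypothetical and do not correspond to anything in the DRA system.

```python
from pathlib import PurePosixPath


def upload_name_problems(listed_in_run, uploaded_paths):
    """Group uploaded paths by the problems listed above.

    listed_in_run  -- filenames entered in the Run metadata
    uploaded_paths -- paths relative to the submission directory
    """
    problems = {"missing": [], "has_space": [], "in_subdirectory": []}
    top_level = set()
    for path in uploaded_paths:
        p = PurePosixPath(path)
        # A filename containing whitespace is not recognized.
        if any(ch.isspace() for ch in p.name):
            problems["has_space"].append(path)
        # A file placed inside a sub-directory is not recognized.
        if len(p.parts) > 1:
            problems["in_subdirectory"].append(path)
        else:
            top_level.add(p.name)
    # Every file listed in the Run must be present at the top level.
    for name in listed_in_run:
        if name not in top_level:
            problems["missing"].append(name)
    return problems
```

If all three lists come back empty, the button staying inactive is probably due to something else, and contacting the DRA team is the safer next step.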
Database: Sequence Read Archive
Created: October 5, 2015

How are my data files processed?

Uploaded data files are processed per Run. All files under a Run are merged into a single binary SRA file by using the SRA toolkit. During this conversion, the length and format of all reads are checked.

Read names are edited and identifiers (DRR accession number + serial number) are automatically inserted (example: DRR000001). Original read names should be unique in a Run. The DRR accession number is used as the filename. If "generic_fastq" is selected for the filetype, read names are replaced with the DRR accession number + serial number (example: DRR030615).

Example of read names:

@DRR000001.1 3060N:7:1:1116:340 length=36
GATGGTAAGATAGAAGCAGTTGAAGTTTACAAACCG
+DRR000001.1 3060N:7:1:1116:340 length=36
IIIII%IIIIIIIIII7IHII26:C6EI)+,9,%%*
@DRR000001.2 3060N:7:1:1114:186 length=36
GATATTGGCCTGCAGAAGTTCTTCCTGAAAGATGAT
+DRR000001.2 3060N:7:1:1114:186 length=36
IIIIIIIIIIIIIGI8IIDI6II;?:,+9+>.A1,I
@DRR000001.3 3060N:7:1:945:361 length=36
GTCAGGATCGGTCTCGCCTTTTAATAGAGGGAGATA
+DRR000001.3 3060N:7:1:945:361 length=36
IIIIIIIIIIIIIIII=3IIII>>I;-52/./+.I,
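The renaming shown in the example can be imitated with a few lines of Python. This is a sketch that reproduces the header layout visible above (accession, serial number, original name, length); it is not the SRA toolkit's implementation, it assumes plain 4-line FASTQ records, and the function name is hypothetical.

```python
def rename_fastq_records(lines, accession):
    """Return FASTQ lines with '<accession>.<serial>' put before each read name.

    Assumes well-formed 4-line records: '@name', sequence, '+', quality.
    """
    out = []
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    for serial, (header, seq, plus, qual) in enumerate(records, start=1):
        original = header[1:]  # drop the leading '@'
        label = f"{accession}.{serial} {original} length={len(seq)}"
        # The same label is written on both the '@' and the '+' line,
        # as in the example above.
        out += ["@" + label, seq, "+" + label, qual]
    return out
```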

When "PAIRED" is selected in Experiment, paired reads are grouped in a Run.

DRA generates fastq files from SRA files by using the SRA toolkit and provides sequencing data in both file formats.

Two or more fastq files are provided for paired reads. Paired reads are divided into a file with "_1" (example, DRR000001_1.fastq.bz2) and a file with "_2" (example, DRR000001_2.fastq.bz2). Reads without a pair are provided in a file with neither "_1" nor "_2" (example, DRR000001.fastq.bz2).

Database: Sequence Read Archive
Created: December 25, 2014; Last updated: December 25, 2015

I cannot transfer my files by scp.

First, confirm the following basic points.

  • Authentication uses an SSH key, not a password.
  • The private key is the pair of the public key registered in your D-way submission account.
  • The private key file has read permission.
  • The passphrase for the private key is entered correctly.

When transferring data files by using a private key generated on another operating system, please check the format of the private key (see Convert private key).

In Unix/Mac OS X: Convert a key in the Windows PuTTY file format into the OpenSSH format.

In Windows WinSCP: Convert a key in the Unix/Mac OS X OpenSSH file format into the Windows PuTTY format.

When these are correct, please confirm with your system administrators whether scp (port 22) is allowed.

Database: Sequence Read Archive
Created: November 19, 2014; Last updated: February 12, 2015

What is an MD5 checksum and how do I compute it?

MD5 checksums are used by the DRA to verify the integrity of transmitted data. An MD5 checksum is a 32-character alphanumeric string like the following:

bf4ac50dcd58bd2860dfac48c7fca348

Please refer to the manual.
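As one possible way to compute the value locally, the sketch below uses Python's standard hashlib module and reads the file in chunks so that large sequencing files do not need to fit in memory. The function name and chunk size are illustrative choices; the manual remains the reference for the procedure the DRA expects.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared with the value entered in the Run metadata before starting validation.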

Database: Sequence Read Archive
Created: June 4, 2014; Last updated: June 6, 2014

How to deal with validation errors?

data excessive while validating formatter within short read archive module - cummulative length of reads data in file(s): 152 is greater than spot length declared in experiment: 76 in spot 'xxxx'

The Spot length value in the Experiment differs from the actual read length. For a paired library, enter the sum of the paired read lengths as the Spot length (for example, 152 for two reads of 76 bases each).

fastq-load err: data inconsistent while validating formatter within short read archive module - cummulative length of reads data in file(s): 70 is less than spot length declared in experiment: 152, most probably mate-pair is absent in spot 'xxxx'

When 'fastq' is selected for the filetype in Run, "read length should be constant" and "paired reads must appear in the same order in the paired files". If the fastq files do not meet these conditions, validation errors occur. Revise the filetype from 'fastq' to 'generic_fastq'.

constraint violated while executing function within virtual database module

Read names may not be unique within the Run.

path not found while accessing directory within file system module - no message text available

Files are not recognized. This error occurs in the following cases: "filename contains whitespace", "files are in sub-directories" and "fastq files are tar archived".

CheckSum Error

The MD5 value in the Run differs from the actual MD5 value of the file. Check that the files are not corrupted and that the MD5 values in the Run are correct.
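Several of the errors above concern the content of the fastq files and can be screened for before uploading. The sketch below checks the conditions quoted for the 'fastq' filetype (constant read length, same read order in paired files) and the uniqueness of read names. It assumes plain 4-line FASTQ records already read into a list of lines, treats the first whitespace-separated token of the header as the read name, and ignores a trailing "/1" or "/2" when comparing pairs; these are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter


def read_names(lines):
    """Return the read name (first token after '@') of each 4-line record."""
    return [lines[i][1:].split()[0] for i in range(0, len(lines), 4)]


def check_single_file(lines):
    """Report whether read length is constant and which names are repeated."""
    lengths = {len(lines[i]) for i in range(1, len(lines), 4)}
    counts = Counter(read_names(lines))
    return {
        "constant_read_length": len(lengths) <= 1,
        "duplicate_read_names": sorted(n for n, c in counts.items() if c > 1),
    }


def same_pair_order(lines_1, lines_2):
    """Return True if both paired files list the same reads in the same order."""
    def base(name):
        # Ignore a trailing '/1' or '/2' mate marker (an assumed convention).
        return name[:-2] if name.endswith(("/1", "/2")) else name

    return [base(n) for n in read_names(lines_1)] == [base(n) for n in read_names(lines_2)]
```

If constant_read_length is False or same_pair_order returns False, switching the filetype from 'fastq' to 'generic_fastq', as described above, is the documented remedy.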

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: February 2, 2015

How do I update my BioSample?

At this time, it is necessary for submitters to contact the BioSample team to request updates and withdrawals as necessary. Please note that when BioSamples are updated, the submission overview page in the D-way submission portal will not reflect this change. That page is only a record of the initial submission, and does not display changes made in the BioSample database.

Database: Biosample
Created: October 13, 2015

How do I update my BioProject?

At this time, it is necessary for submitters to contact the BioProject team to request updates and withdrawals as necessary. Please note that when BioProjects are updated, the submission overview page in the D-way submission portal will not reflect this change. That page is only a record of the initial submission, and does not display changes made in the BioProject database.

Database: BioProject
Created: October 13, 2015

How do I change hold date?

Please login to the submission system and change the date. You can set the hold date for a maximum of 2 years, and this date may be brought forward or pushed back at any time.

hold_date

We will send you an e-mail reminder 30 days before the scheduled release date, inviting you to postpone the release date as necessary.
Please see the video tutorial.

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: October 9, 2015

How do I add reference information?

DDBJ Sequence Database

See the relevant item in Data Updates/Corrections and contact us through this form with "Our paper was published" in [Subject].

DRA

Add publication information to the BioProject referenced by the relevant DRA submission. Contact the BioProject team to add the publication.

BioProject

Contact the BioProject team to add publication information. In general, citing the BioProject accession is not recommended.

BioSample

When sequencing data derived from relevant samples are deposited in DDBJ Sequence Database and DRA, please add publication information as described above.

For a publication about isolation and growth condition specifications of the organism/material, add the PubMed ID, etc., to isol_growth_condt. For a primary genome report, please add the relevant PubMed ID, etc., to ref_biomaterial.

If you want to add other types of publication, please contact the BioSample team.

Database: BioProject, Biosample, Sequence Read Archive
Created: January 23, 2014; Last updated: September 5, 2016

Should I cite BioSample accession numbers in my manuscript?

Typically, it is appropriate to cite the accession numbers that are assigned to your data submissions, e.g. the DDBJ, WGS or DRA accession numbers. If individual BioSamples do need to be referenced, state that "BioSample metadata are available in the DDBJ BioSample database (http://trace.ddbj.nig.ac.jp/biosample/index_e.html) under accession number SAMDxxxxxxxx".

Database: Biosample
Created: October 13, 2015

Should I cite BioProject accession numbers in my manuscript?

No, typically, you should cite the accession numbers that are assigned to your data submissions, e.g. the DDBJ, WGS or DRA accession numbers. If individual BioProjects do need to be referenced, state that "The data have been deposited with links to BioProject accession number PRJDBxxxxxx in the DDBJ BioProject database (http://trace.ddbj.nig.ac.jp/bioproject/index_e.html)."

Database: BioProject
Created: October 13, 2015

Which accession numbers should be cited in publication?

A DRA submission is composed of the following objects, each with a unique prefix (see the Prefix Letter List).

  • Submission : DRA
  • BioProject (Study) : PRJD
  • Experiment : DRX
  • BioSample (Sample) : SAMD
  • Run : DRR
  • Analysis : DRZ
Metadata objects

Please cite the accession number(s) of the objects you want to refer to in your publication.

In general, do not cite the BioProject accession number.
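To illustrate how the prefixes listed above identify the object type, here is a small lookup sketch. The prefix table is copied from the list above; the function name is hypothetical, and the lookup does not check that the rest of the accession is well formed.

```python
# Prefix table taken from the list of DRA submission objects above.
PREFIXES = {
    "DRA": "Submission",
    "PRJD": "BioProject (Study)",
    "DRX": "Experiment",
    "SAMD": "BioSample (Sample)",
    "DRR": "Run",
    "DRZ": "Analysis",
}


def object_type(accession):
    """Return the object type for an accession, or None if the prefix is unknown."""
    # Try longer prefixes first so a longer match is never shadowed.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if accession.startswith(prefix):
            return PREFIXES[prefix]
    return None
```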

Database: Sequence Read Archive
Created: April 2, 2015; Last updated: October 13, 2015

I have not received accession numbers yet - is something wrong?

Please login to the submission system and check the status of your submission.

  • If the status is "metadata_submitted", you need to validate your data files by clicking the [Validate data files] button.
  • If the status is "data_error", please check the error messages of data validation and modify metadata, re-upload data files as necessary.
  • If the status is "data_validating", the DRA system is validating your data files. Validation of large files may take time.
  • The DRA team is reviewing the submissions.

Please contact the DRA team when necessary.

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: June 4, 2014

How do I download files?

Download files from the DDBJ FTP server at ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq.

wget

wget is a convenient way to download files over FTP.

wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/DRA000/DRA000001/DRX000001/DRR000001.fastq.bz2
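When downloading many Runs, the URL can be assembled from the accessions. The sketch below is an assumption based only on the single example path above: it supposes that the first directory is the first six characters of the submission accession (DRA000 for DRA000001), which may not hold for every submission. The function name is hypothetical.

```python
FASTQ_ROOT = "ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq"


def fastq_url(submission, experiment, run, suffix=""):
    """Build a fastq URL following the layout of the example above.

    suffix -- "" for a single file, or "_1" / "_2" for paired read files.
    The directory rule (first six characters of the submission accession)
    is inferred from one example and is an assumption.
    """
    group = submission[:6]
    return f"{FASTQ_ROOT}/{group}/{submission}/{experiment}/{run}{suffix}.fastq.bz2"
```

Checking the resulting path in the FTP directory listing before scripting a bulk download is advisable.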

ascp

The Aspera ascp command line client can be downloaded here. Please select the correct operating system. The ascp command line client is distributed as part of the Aspera connect high-performance transfer browser plug-in.

Your command should look similar to this:

ascp -i <aspera connect SSH key> <option> -P 33001 anonftp@ascp.ddbj.nig.ac.jp:<file or files to download> <download location>

Examples:

ascp -i <aspera connect SSH key> -QT -l 300m -P 33001 anonftp@ascp.ddbj.nig.ac.jp:/ddbj_database/dra/fastq/DRA000/DRA000001/DRX000001/DRR000001.fastq.bz2 .

Database: Sequence Read Archive
Created: January 23, 2014; Last updated: June 4, 2014

Why is the number of reads in the fastq file less than that in the SRA file?

The DRA generates fastq files from the raw data SRA files by using fastq-dump in the NCBI SRA Toolkit with the following options.

fastq-dump -M 25 -E --skip-technical --split-3 -W <SRA file>

  • -M 25: Minimum read length to output is 25 (default is 25)
  • -E: No sequences starting or ending with >= 10N
  • --skip-technical: Dump only biological reads
  • --split-3: Legacy 3-file splitting for mate-pairs: first and second biological reads satisfying dumping conditions are placed in files *_1.fastq and *_2.fastq, respectively. If only one biological read is present, it is placed in *.fastq.
  • -W: Apply left and right clips

Because reads are filtered and trimmed according to the dumping conditions above, the number of reads in the fastq file is generally less than that in the SRA file. Users can generate unfiltered and untrimmed fastq files by using the following fastq-dump options.

fastq-dump -M 1 --split-3 <SRA file>
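To see why reads drop out, the two sequence filters described above (-M 25 and -E) can be approximated in a few lines. This is a rough sketch of the documented conditions only, not fastq-dump's actual implementation; clipping (-W) and technical reads (--skip-technical) are not modelled, and the function and parameter names are hypothetical.

```python
def passes_dump_filters(sequence, min_length=25, edge_n_limit=10):
    """Approximate the -M and -E conditions for one read sequence.

    min_length   -- reads shorter than this are dropped (-M 25)
    edge_n_limit -- reads starting or ending with this many N or more
                    are dropped (-E)
    """
    if len(sequence) < min_length:
        return False
    leading_n = len(sequence) - len(sequence.lstrip("N"))
    trailing_n = len(sequence) - len(sequence.rstrip("N"))
    return leading_n < edge_n_limit and trailing_n < edge_n_limit
```

Counting the reads for which this returns False gives a rough idea of how many reads the default dump would omit.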

Database: Sequence Read Archive
Created: January 23, 2014

What is the difference between env_biome, env_feature and env_material?

These three sample attributes describe environmental systems that influence living organisms.

env_biome

In the Environment Ontology (ENVO), the biome [ENVO_00000428] classes are subclasses of environmental system. The env_biome represents environmental systems to which resident ecological communities have evolved adaptations. Thus, an env_biome may be thought of as a community-centric ecosystem, whose extent is defined by the presence of the communities adapted to it. This requires that an env_biome possesses a degree of spatial and temporal stability that has allowed at least some of its constituent communities to adapt. Classes such as tundra biome [ENVO_01000180] and coniferous forest biome [ENVO_01000196] are included in ENVO. Currently, the biome branch of the ontology makes no commitment to a specific spatial or temporal scale.

env_feature

The biomes described above are useful in ecological settings; however, environments are often described by referencing a single entity that has a strong causal influence on its surrounding space. For example, a coral reef environment is determined by the presence and influence of a coral reef [ENVO_00000150]. Similarly, the human gut environment is determined by the human gut. Removal of either the coral reef or the human gut would cause the associated environmental system to collapse. Environmental systems of this kind make no specific reference to ecological communities or populations (as do biomes), but to some central, supporting ‘feature’. Entities that act in this way as the causal ‘hubs’ or supports of a given environmental system are referenced by classes in ENVO’s top-level environmental feature [ENVO_00002297] hierarchy. For example, the environmental feature seamount [ENVO_00000264] would support a seamount environment, i.e. an environmental system which is supported by, and whose properties are determined by, the presence of a seamount.

env_material

In contrast to the classes above, which identify countable entities, the subclasses of the top-level environmental material [ENVO_00010483] class refer to masses, volumes, or other portions of some medium included in an environmental system. A portion of environmental material is understood to be more complex and variable in composition than a simple collection of material entities (e.g. a collection of silicate particles). For example, the environmental material soil [ENVO_00001998] typically contains aggregates of fine rock particles, sand grains, clay particles, silt particles, communities of animals, plants, fungi and microbes, small parts of organisms, organic matter, water inclusions, and airspaces.

Database: Biosample
Created: July 24, 2014; Last updated: November 19, 2014

How to transfer data files from the NIG supercomputer to my DRA directory?

If the private key was generated on Unix/Mac OS X

Transfer your private key to the NIG supercomputer (Linux). Then transfer the files by executing the following command.

scp <Your Files> <D-way Login ID>@dradata.ddbj.nig.ac.jp:~/<Submission ID>

  • <Your Files> Files to be transferred.
    Ex: file1 file2 (file1 and file2), file* (all files whose filenames start with “file”)
  • <D-way Login ID> D-way Login ID (ex. drauser)
  • <Submission ID> Submission ID (ex. drauser-0003)

If the private key was generated on Windows PC

After the conversion of the key into the OpenSSH format used in Linux, transfer the private key to the supercomputer. Then, specify the private key using -i option of scp.

scp -i <Private Key> <Your Files> <D-way Login ID>@dradata.ddbj.nig.ac.jp:~/<Submission ID>

  • <Private Key> The private key file path (ex. /home/mishima/id.rsa) 
Database: Sequence Read Archive
Created: December 12, 2014; Last updated: January 20, 2015

How are linked BioProject/BioSample/sequence data released?

Linked BioProject, BioSample, DDBJ and DRA data are released as follows.

  • Release of BioProject records does NOT trigger release of the other linked data.
  • Release of BioSample records does NOT trigger release of the other linked data; however, it DOES trigger release of the referencing BioProject.
  • Release of DDBJ and DRA nucleotide sequence data DOES trigger release of the linked BioProject and BioSample records.

All metadata and sequencing data in a DRA submission are released at once.
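The release rules above can be summarized as a small lookup table. This is only a restatement of the three rules in code form, with hypothetical names; it does not model hold dates or any other behaviour of the release system.

```python
# For each record type that is released, the set of record types that become
# public as a result (including itself), following the three rules above.
RELEASE_CASCADE = {
    "BioProject": {"BioProject"},
    "BioSample": {"BioSample", "BioProject"},
    "DDBJ": {"DDBJ", "BioProject", "BioSample"},
    "DRA": {"DRA", "BioProject", "BioSample"},
}


def records_released(trigger):
    """Return the record types released when `trigger` is released."""
    return RELEASE_CASCADE[trigger]
```

For example, records_released("BioSample") includes the referencing BioProject but no sequence data, which matches the second rule.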

Release of linked BioProject/BioSample/sequence records

DRA Handbook: Release of DRA
BioProject Handbook: Release of BioProject
BioSample Handbook: Release of BioSample

Database: BioProject, Biosample, Sequence Read Archive
Created: December 15, 2014; Last updated: February 16, 2017

Do DDBJ JGA/NCBI dbGaP/EBI EGA exchange data?

DDBJ JGA/NCBI dbGaP/EBI EGA do not exchange restricted-access individual-level data. dbGaP and EGA exchange summary metadata for cross indexing.

Database: Japanese Genotype-phenotype Archive
Created: July 1, 2016

How can I contact you when the form is not available?

When the contact form for BioProject/BioSample/DRA/D-way/JGA is not available, please send an e-mail with the following information.

*Required

Name *
E-mail address *
Title *
D-way account
Accession number/Submission ID
Message *

Contacts (click the service name to send an e-mail)