Array design

Overview*

The aim of the ADF component is to describe a microarray design in a spreadsheet or, for complex cases, a set of spreadsheets. Conceptually, microarray designs are devised to measure the presence and/or abundance of genomic sequence entities in biological samples. Genomic sequences of interest are represented by one or more synthetic sequences, which are in turn arranged in one or more physical locations in the two-dimensional space of a microarray surface. Therefore, to fully describe a microarray layout, information about genomic sequences, synthetic sequences, physical positions on the array, and the relationships (mappings) between them must be captured. The same array design can be used in many different hybridizations across many different experiments.

This page is about submitting microarray designs (array layout and annotation) to DOR.

Use the diagram below to decide if you need to submit an array design to us.

Diagram to decide if you need to submit an array design to DOR.

Checking if an array design is already registered*

If the array design you used has already been described in ArrayExpress/DOR then you do not need to submit it. Many commercial and academic array designs from organizations such as Affymetrix, Agilent, Illumina, Nimblegen and Sanger are already loaded into ArrayExpress/DOR.

To search for registered array designs, use the ArrayExpress platform designs query interface.

If your array design is already in ArrayExpress/DOR then you can use it in your MAGE-TAB experiment submission by entering the accession number of the array design in the 'Array Design REF' column in the SDRF section of your spreadsheet.

New commercial array designs*

If you used a commercial catalogue array design and you cannot find it in ArrayExpress/DOR, please contact the DOR team and tell us the exact name of the array design you used and its manufacturer.

Creating an array design format (ADF) file*

An array design format (ADF) file is simply a table with standardized column names describing what was printed/synthesized at each position on a microarray. The ADF file can be created in any spreadsheet application but must be saved as a tab delimited text file.
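If you generate your ADF with a script rather than a spreadsheet application, the same tab-delimited layout can be produced directly. The following is a minimal sketch, assuming Python and its standard csv module; the function name `adf_to_tsv` and the example rows are illustrative only and are not part of the ADF specification.

```python
import csv
import io


def adf_to_tsv(header, rows):
    """Serialize column names and data rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative values only; a real ADF needs one line per feature.
header = ["Block Column", "Block Row", "Column", "Row", "Reporter Name"]
rows = [[1, 1, 1, 1, "R1"], [1, 1, 1, 2, "R2"]]
adf_text = adf_to_tsv(header, rows)
```

To save the result, write `adf_text` to a plain text file rather than to a spreadsheet format.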

Each ADF file may start with an optional header section providing top-level information about the array design. As described for the IDF, optional "Comment[]" rows may be used to provide any extra information needed. Additional rows providing Term Source information are included in the ADF header so that array design information can be fully encoded in the absence of any investigation-level detail. These Term Source rows are treated in the same way as for the IDF, and indicate the source databases or files used for sequence database accessions and ontology terms. As many Term Sources may be used as needed, listed horizontally in columns as for the IDF. See the table below for a list of ADF header row types. All tags are optional, and a tag can have at most one value. The tags (rows) can appear in any order, except that associated attributes must immediately follow the object they are associated with.

ADF Information in header
Tag                                           Value
Array Design Name                             Text
Version                                       Text
Provider                                      Text
Printing Protocol                             Text
Technology Type                               Ontology term
Technology Type Term Source REF               Term Source Name
Technology Type Term Accession Number         Term Accession Number
Surface Type                                  Ontology term
Surface Type Term Source REF                  Term Source Name
Surface Type Term Accession Number            Term Accession Number
Substrate Type                                Ontology term
Substrate Type Term Source REF                Term Source Name
Substrate Type Term Accession Number          Term Accession Number
Sequence Polymer Type                         Ontology term
Sequence Polymer Type Term Source REF         Term Source Name
Sequence Polymer Type Term Accession Number   Term Accession Number
Term Source Name                              Text tag as used in main ADF table
Term Source File                              URI
Term Source Version                           Text
Comment[]
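Because every "Term Source REF" row should point at a name declared in the "Term Source Name" row, the header can be checked mechanically. The sketch below assumes each header row is a tag followed by tab-separated values; the function names and the example values (including the ontology term and the "MO" source name) are placeholders chosen for illustration, not values required by the format.

```python
def parse_adf_header(lines):
    """Map each header tag to its list of tab-separated values."""
    header = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank rows
        tag, *values = line.rstrip("\n").split("\t")
        header[tag.strip()] = [v.strip() for v in values if v.strip()]
    return header


def undeclared_term_sources(header):
    """Return Term Source REF values that are not listed under Term Source Name."""
    declared = set(header.get("Term Source Name", []))
    missing = set()
    for tag, values in header.items():
        if tag.endswith("Term Source REF"):
            missing.update(v for v in values if v not in declared)
    return sorted(missing)


# Placeholder header rows for illustration only.
example = [
    "Array Design Name\tExample array",
    "Technology Type\tin_situ_oligo_features",
    "Technology Type Term Source REF\tMO",
    "Term Source Name\tMO",
]
header = parse_adf_header(example)
problems = undeclared_term_sources(header)
```

An empty `problems` list means every referenced Term Source is declared in the header.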

Spot Location: The concept of Feature*

Each spot on the array is called a feature. The position of each feature is described by 4 coordinates: Block Column, Block Row, Column, Row. These 4 columns are mandatory in the ADF and each line in your ADF will correspond to one feature. Features cannot be duplicated on an array as each spot can occur only once, but reporters can be printed at several different locations. All the features that appear in your raw data files must be included in the ADF even if there is nothing spotted there.

image of the Block Column and Block Row coordinate
ADF Feature coordinate columns
Block Column  Block Row  Column  Row
1             1          1       1
1             1          1       2
1             1          1       3
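Since each feature may occur only once, it can be useful to look for repeated coordinates before submitting. This is a minimal sketch, assuming the ADF lines have already been read into dictionaries keyed by column name; `duplicate_features` is a hypothetical helper, not part of any ArrayExpress tool.

```python
from collections import Counter

COORDINATE_COLUMNS = ("Block Column", "Block Row", "Column", "Row")


def duplicate_features(rows):
    """Return coordinate tuples that appear on more than one ADF line."""
    counts = Counter(tuple(row[name] for name in COORDINATE_COLUMNS) for row in rows)
    return [coords for coords, n in counts.items() if n > 1]


# Illustrative rows: the third line repeats the coordinates of the first.
rows = [
    {"Block Column": "1", "Block Row": "1", "Column": "1", "Row": "1"},
    {"Block Column": "1", "Block Row": "1", "Column": "1", "Row": "2"},
    {"Block Column": "1", "Block Row": "1", "Column": "1", "Row": "1"},
]
duplicates = duplicate_features(rows)
```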

Spot Content/Spot Sequence: The concept of Reporter*

Synthetic sequences, used as proxies for genomic entities, can be deposited in one or more spot locations and array designs. These elements correspond to Reporters, and it is a MIAME requirement to publish the actual sequences physically present on the array. Therefore, a Reporter is uniquely defined by its ID and its sequence. Additional information is also required by the model, such as the role (experimental or control) and, where appropriate, the kind of control it represents.

The Reporter Name entered should be the same as the one you use for the reporter in final gene expression matrices and other normalized data files. We use the reporter name values in the array design files and data files to link array annotation to measurement values in data files.

ADF General case for oligonucleotide-based microarrays
Reporter Name  Reporter Sequence  Reporter Group[role]  Control Type
R1             ATGGTTGGTTACGTGT   Experimental
R2             CCGCGTTGCCCCGCC    Experimental
R3             TCCCTTCCGTTGTCCT   Control               control_spike_calibration

General case for PCR-based microarrays
Reporter Name  Reporter Database Entry[flybase]  Reporter Group[role]  Control Type
R1             Fb2353                            Experimental
R2             Fb2354                            Experimental
R3             Fb2345                            Control               control_spike_calibration

We need database entries or actual sequence to describe the sequences on your array. We need to know which database these accession numbers are from and we ask you to supply a database code inside the [square brackets] in the header row. You can find a complete list of allowed databases here (use the values in the 'Name' column). A short list of common ones is below.

Short list of common database names for Database Entry column.
catma flybase nasc refseq trembl
ddbj image omim rgd unigene
ensembl locus plasmodb tair wormbase
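The database code is whatever appears between the square brackets of a Database Entry column heading. The sketch below shows one way to extract it, assuming Python's standard re module; the pattern and the function name `database_code` are illustrative, and the set holds only the short list above, so a code that is absent from it may still be on the complete list of allowed databases.

```python
import re

# Short list from the table above; not the complete list of allowed databases.
COMMON_DATABASES = {
    "catma", "flybase", "nasc", "refseq", "trembl",
    "ddbj", "image", "omim", "rgd", "unigene",
    "ensembl", "locus", "plasmodb", "tair", "wormbase",
}

HEADING = re.compile(
    r"^(?:Reporter|Composite Element) Database Entry\s*\[\s*([^\]]+?)\s*\]$"
)


def database_code(heading):
    """Return the code inside the brackets, or None for other headings."""
    match = HEADING.match(heading.strip())
    return match.group(1) if match else None


code = database_code("Reporter Database Entry[flybase]")
is_common = code in COMMON_DATABASES
```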

Describe what type of controls were used in the "Control Type" column. If the spot is not a control then do not fill in anything in this column. The allowed values for this column are:

  • control_biosequence - for example a spike
  • control_buffer - buffer spotted on the array
  • control_empty - nothing spotted on the array
  • control_genomic_DNA - e.g. salmon sperm DNA
  • control_label - landing lights
  • control_reporter_size - size standard
  • control_spike_calibration - spike at varying concentrations
  • control_unknown_type
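The allowed values above lend themselves to a simple check of the "Control Type" column. This is a minimal sketch, assuming the column values have been read into a list of strings; `invalid_control_types` is a hypothetical helper name.

```python
ALLOWED_CONTROL_TYPES = {
    "control_biosequence",
    "control_buffer",
    "control_empty",
    "control_genomic_DNA",
    "control_label",
    "control_reporter_size",
    "control_spike_calibration",
    "control_unknown_type",
}


def invalid_control_types(values):
    """Return non-empty values that are not in the allowed list.

    Empty cells are accepted because non-control spots leave the column blank.
    """
    return [v for v in values if v.strip() and v.strip() not in ALLOWED_CONTROL_TYPES]


# Illustrative column content: one blank cell, one valid value, one invalid value.
bad_values = invalid_control_types(["", "control_buffer", "spike"])
```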

Genomic Entities of interest: The concept of Composite Element*

This section addresses the description of the biological sequence of interest which is interrogated by the synthetic probe (Reporter) sequences. For simple microarray designs, spot location, spot sequence and genomic sequences are directly associated in a one-to-one relationship. Interpretation is straightforward: one location, one probe, one gene or biological entity. For these cases, all layers can be combined in a single spreadsheet, and the ADF can be considered completely and unequivocally represented. In more elaborate microarray designs, hybridization signals observed from series of spot sequences can be combined to provide measure estimates about surveyed genomic sequences. The format proposed here is designed to encode simple cases where there is a one-to-one or many-to-one mapping from Reporters (probe sequences) to Composite Elements (biologically relevant sequences).

Each feature, reporter or composite element can be annotated with "Comment[<category>]" columns which allow users to provide information that is additional to the usual Database Entry annotations.

Two Reporters map to one Composite Element.
Reporter Name  Composite Element Name  Composite Element Database Entry[tair]  Comment[Chromosome]
R1             HSP18.2                 AT5G59720.1                             5
R2             HSP18.2                 AT5G59720.1                             5
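Because only one-to-one and many-to-one mappings from Reporters to Composite Elements are supported, a Reporter that points at more than one Composite Element indicates a problem. The sketch below assumes rows have been read into dictionaries keyed by column name; the function name is hypothetical.

```python
from collections import defaultdict


def reporters_with_multiple_composites(rows):
    """Return reporters that map to more than one Composite Element Name."""
    mapping = defaultdict(set)
    for row in rows:
        composite = row.get("Composite Element Name", "").strip()
        if composite:
            mapping[row["Reporter Name"]].add(composite)
    return {rep: sorted(names) for rep, names in mapping.items() if len(names) > 1}


# The two rows from the table above: a valid many-to-one mapping.
rows = [
    {"Reporter Name": "R1", "Composite Element Name": "HSP18.2"},
    {"Reporter Name": "R2", "Composite Element Name": "HSP18.2"},
]
conflicts = reporters_with_multiple_composites(rows)
```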

ADF Use Cases*

Case 1: Absence of technical replicates, direct association between representative sequences and genomic sequences:

ADF Case 1

Case 2: Technical replicates, and direct association between representative sequences and genomic sequences. Description of Composite Element is not required, and the relevant Composite Element columns may be omitted from the ADF:

ADF Case 2

Case 3: Absence of technical replicates, and any genomic sequence being represented by more than one representative sequence. This use-case requires extra columns to describe the Composite Elements, and is only supported for cases where many Reporters map to one Composite Element:

ADF Case 3

ADF Sections*

There are two ADF sections:

  • An optional header section, with top-level information.
  • The main ADF table itself; supported column headings are given in the table below. This table should be preceded by a "[main]" header (section delimiter), which is case-insensitive.
ADF Summary of column headings
Object                                Associated attributes
(Feature)                             Block Column, Block Row, Column, Row, Comment[]
Block Column
Block Row
Column
Row
Reporter Name                         Reporter Database Entry[], Reporter Sequence, Reporter Group, Control Type, Comment[]
Reporter Database Entry[]
Reporter Sequence
Reporter Group[]                      Reporter Group Term Source REF
Reporter Group Term Source REF        Reporter Group Term Accession Number
Reporter Group Term Accession Number
Control Type                          Control Type Term Source REF
Control Type Term Source REF          Control Type Term Accession Number
Control Type Term Accession Number
Composite Element Name                Composite Element Database Entry[], Comment[]
Composite Element Database Entry[]    Comment[]
Comment[]
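The two sections can be separated by looking for the "[main]" delimiter without regard to case. The following is a minimal sketch, assuming the whole ADF has been read into one string; treating a file with no delimiter as consisting only of the main table is an assumption made here for illustration, and both function names are hypothetical.

```python
import csv


def split_adf(text):
    """Split ADF text into (header_lines, main_table_lines) at the [main] delimiter."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().lower() == "[main]":
            return lines[:index], lines[index + 1:]
    # Assumption: with no delimiter, everything is treated as the main table.
    return [], lines


def read_main_table(main_lines):
    """Parse the main table into one dictionary per feature line."""
    return list(csv.DictReader(main_lines, delimiter="\t"))


# Illustrative ADF text with a one-row header and a one-row main table.
adf_text = (
    "Array Design Name\tDemo\n"
    "[MAIN]\n"
    "Block Column\tBlock Row\tColumn\tRow\tReporter Name\n"
    "1\t1\t1\t1\tR1\n"
)
header_lines, main_lines = split_adf(adf_text)
features = read_main_table(main_lines)
```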

The "Reporter Group[]" ADF heading may be used to describe a variety of different group types; typical examples would be "role" (with values "experimental" and "control") or "species" for multi-species arrays. Enter the group type (e.g., "role" or "species") as free text inside the square brackets of the column heading.

Example ADF files for download*

ADF Examples

Checking your ADF before submission*

ArrayExpress provides a tool that checks your ADF for common formatting errors. The tool reports any problems it finds in the ADF; please fix as many as you can before submitting, as this will speed up the processing of your array design submission:
ADF format checking tool (ArrayExpress) »

ADF checklist

  • File is in tab-delimited text format (not Excel)
  • Feature coordinates are in Block Column, Block Row, Column, Row format
  • The following columns are included:
    • Reporter Name
    • Reporter Sequence and/or Reporter Database Entry[]
    • Reporter Group[role]
    • Control Type
  • If it is an oligo array, the Reporter Sequence column is included
  • One or more Reporter Database Entry[] columns are included (if sequence not included)
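The column-related items of this checklist can also be tested in a script before you run the ArrayExpress checking tool. The sketch below is not that tool and does not reproduce its behaviour; it assumes you already have the list of column headings from your main ADF table, and the function name and message wording are illustrative.

```python
COORDINATE_COLUMNS = ["Block Column", "Block Row", "Column", "Row"]
REQUIRED_COLUMNS = COORDINATE_COLUMNS + [
    "Reporter Name",
    "Reporter Group[role]",
    "Control Type",
]


def checklist_problems(columns, is_oligo_array=False):
    """Return a list of checklist items that the column headings fail."""
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in columns]
    has_sequence = "Reporter Sequence" in columns
    has_database_entry = any(c.startswith("Reporter Database Entry[") for c in columns)
    if is_oligo_array and not has_sequence:
        problems.append("oligo array without a Reporter Sequence column")
    elif not (has_sequence or has_database_entry):
        problems.append("neither Reporter Sequence nor Reporter Database Entry[] is present")
    return problems


# Illustrative headings for a PCR-based array annotated with database entries.
columns = COORDINATE_COLUMNS + [
    "Reporter Name",
    "Reporter Database Entry[flybase]",
    "Reporter Group[role]",
    "Control Type",
]
problems = checklist_problems(columns)
```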