Chain Format
The chain format describes a pairwise alignment that allows gaps in both sequences simultaneously.
Each set of chain alignments starts with a header line, contains one or more alignment data lines,
and terminates with a blank line. The format is deliberately quite dense.
Example:
chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1
9 1 0
10 0 5
61 4 0
16 0 4
42 3 0
16 0 8
14 1 0
3 7 0
48

chain 4900 chrY 58368225 + 25985406 25985566 chr5 151006098 - 43549808 43549970 2
16 0 2
60 4 0
10 0 4
70
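As a rough illustration of the layout described above (a header line, then data lines, then a blank line), the following Python sketch splits chain text into blocks. The function name split_chain_blocks is hypothetical and not part of any UCSC tool. The sketch also assumes that a new "chain" line may close the previous block if the blank separator is missing.

```python
def split_chain_blocks(text):
    """Yield (header_line, data_lines) for each chain block in `text`.

    Hypothetical helper for illustration only. A block is closed by a
    blank line, by the next "chain" header, or by the end of the input.
    """
    header, data = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("chain "):
            # A new header: emit any block that was still open.
            if header is not None:
                yield header, data
            header, data = line, []
        elif not line:
            # Blank line: terminates the current block, if any.
            if header is not None:
                yield header, data
            header, data = None, []
        elif header is not None:
            # Non-blank line inside a block: an alignment data line.
            data.append(line)
    if header is not None:
        yield header, data
```

Each yielded pair keeps the lines as strings, so the header and data lines can be parsed separately.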
Header Lines
chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
The initial header line starts with the keyword chain, followed by 11 required attribute values.
The attributes include:
- score -- chain score
- tName -- chromosome (reference sequence)
- tSize -- chromosome size (reference sequence)
- tStrand -- strand (reference sequence)
- tStart -- alignment start position (reference sequence)
- tEnd -- alignment end position (reference sequence)
- qName -- chromosome (query sequence)
- qSize -- chromosome size (query sequence)
- qStrand -- strand (query sequence)
- qStart -- alignment start position (query sequence)
- qEnd -- alignment end position (query sequence)
- id -- chain ID
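The header fields listed above can be read into a small record. The Python sketch below is one possible way to do this. The names ChainHeader and parse_chain_header are hypothetical. It assumes that the score is written as an integer and that the id field is present, as in the examples in this document.

```python
from typing import NamedTuple


class ChainHeader(NamedTuple):
    """Hypothetical container for the fields of one chain header line."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line):
    """Parse a header line: the keyword 'chain' plus 12 fields (assumed)."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a 13-field chain header: {line!r}")
    (score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields[1:]
    for strand in (t_strand, q_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand value: {strand!r}")
    return ChainHeader(int(score), t_name, int(t_size), t_strand,
                       int(t_start), int(t_end), q_name, int(q_size),
                       q_strand, int(q_start), int(q_end), chain_id)


header = parse_chain_header(
    "chain 4900 chrY 58368225 + 25985403 25985638 "
    "chr5 151006098 - 43257292 43257528 1"
)
print(header.t_end - header.t_start)  # 235 bases spanned on the reference
print(header.q_end - header.q_start)  # 236 bases spanned on the query
```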
The alignment start and end positions are represented as zero-based half-open intervals. For
example, the first 100 bases of a sequence would be represented with start position = 0 and end
position = 100, and the next 100 bases would be represented as start position = 100 and end
position = 200. When the strand value is "-", position coordinates are listed in terms of
the reverse-complemented sequence.
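To relate a "-" strand interval to forward-strand coordinates, the interval can be mirrored against the sequence size. Because the intervals are zero-based and half-open, the start and end swap roles. The Python sketch below shows this arithmetic. The function name to_forward_strand is hypothetical.

```python
def to_forward_strand(start, end, size, strand):
    """Return the zero-based half-open interval on the forward strand.

    Hypothetical helper: `start` and `end` are given on `strand`, and
    `size` is the full sequence length (tSize or qSize).
    """
    if strand == "+":
        return start, end
    if strand == "-":
        # Mirror the interval: the old end becomes the new start.
        return size - end, size - start
    raise ValueError(f"unexpected strand value: {strand!r}")


# Query interval of the first example chain (chr5, "-" strand).
print(to_forward_strand(43257292, 43257528, 151006098, "-"))
# (107748570, 107748806)
```

The interval length is unchanged by the conversion (236 bases in this case).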
Alignment Data Lines
Alignment data lines contain three required attribute values:
size dt dq
- size -- the size of the ungapped alignment
- dt -- the difference between the end of this block and the beginning of the next block (reference sequence)
- dq -- the difference between the end of this block and the beginning of the next block (query sequence)
NOTE: The last line of the alignment section contains only one number: the ungapped alignment size
of the last block.
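The data lines can be turned into explicit block coordinates by starting at tStart and qStart and advancing by size, then by dt and dq. The Python sketch below does this for the second example chain. The function name chain_blocks is hypothetical, and the coordinates it returns are on the strands given in the header.

```python
def chain_blocks(t_start, q_start, data_lines):
    """Return (t_start, t_end, q_start, q_end) for each ungapped block.

    Hypothetical helper: each data line is "size dt dq", except the last
    line of a chain, which holds only "size".
    """
    blocks = []
    t, q = t_start, q_start
    for line in data_lines:
        fields = [int(x) for x in line.split()]
        if len(fields) not in (1, 3):
            raise ValueError(f"unexpected data line: {line!r}")
        size = fields[0]
        blocks.append((t, t + size, q, q + size))
        # Move past the aligned block on both sequences.
        t += size
        q += size
        if len(fields) == 3:
            # Skip the gap before the next block.
            t += fields[1]
            q += fields[2]
    return blocks


blocks = chain_blocks(25985406, 43549808, ["16 0 2", "60 4 0", "10 0 4", "70"])
print(blocks[-1])  # (25985496, 25985566, 43549900, 43549970)
```

The last block ends at 25985566 and 43549970, which match tEnd and qEnd in the header of that chain. This makes a convenient consistency check when reading a file.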