The SFF file consists of a global header section followed by pairs of read-header and read-data sections with one pair for each sequence. Values are stored in big-endian order.

Global header section:

Field | Type | Description |
---|---|---|
magic | uint32 | 0x2E736666 (".sff") |
version | uint32 | Format version; 1 at the time of writing |
index_offset | uint64 | Not used (value is 0) |
index_length | uint32 | Not used (value is 0) |
n_reads | uint32 | The number of reads in the file |
gheader_length | uint16 | Number of bytes in this global header (includes eight-byte padding) |
key_length | uint16 | Length of the key sequence used with these reads |
flow_length | uint16 | Number of nucleotide flows used in the experiment |
flowgram_format | uint8 | Specifies how signal values are encoded. Only one encoding is currently defined, so the only legal value is 1. Important note: the files we use for tests do not follow this convention. For all of the test cases, assume that the value of flowgram_format is 0. |
flow_sequence | char[flow_length] | A character array whose i-th entry specifies the i-th nucleotide flowed. |
key_sequence | char[key_length] | A character array whose i-th entry specifies the i-th nucleotide of the key sequence. |
eight_byte_pad | uint8[?] | If the number of bytes in the global header is not divisible by 8, zero-valued bytes are appended until it is. |
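
The global header layout above can be read with Python's `struct` module. The following is a minimal sketch, not a reference implementation: the function name `parse_global_header` and the returned dictionary are illustrative choices, and the sketch accepts any `flowgram_format` value because the test files store 0 rather than 1.

```python
import struct

# Fixed-size part of the global header, big-endian, no alignment padding:
# magic, version, index_offset, index_length, n_reads,
# gheader_length, key_length, flow_length, flowgram_format
GLOBAL_FIXED = struct.Struct(">IIQIIHHHB")  # 31 bytes

SFF_MAGIC = 0x2E736666  # the bytes ".sff"


def parse_global_header(buf):
    """Parse the global header found at the start of `buf` (a bytes object)."""
    (magic, version, index_offset, index_length, n_reads,
     gheader_length, key_length, flow_length,
     flowgram_format) = GLOBAL_FIXED.unpack_from(buf, 0)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file: bad magic number")

    pos = GLOBAL_FIXED.size
    flow_sequence = buf[pos:pos + flow_length].decode("ascii")
    pos += flow_length
    key_sequence = buf[pos:pos + key_length].decode("ascii")
    pos += key_length

    # The stored length includes the zero padding up to a multiple of 8.
    if (pos + 7) // 8 * 8 != gheader_length:
        raise ValueError("gheader_length does not match the field sizes")

    return {
        "version": version,
        "index_offset": index_offset,
        "index_length": index_length,
        "n_reads": n_reads,
        "gheader_length": gheader_length,
        "key_length": key_length,
        "flow_length": flow_length,
        "flowgram_format": flowgram_format,
        "flow_sequence": flow_sequence,
        "key_sequence": key_sequence,
    }
```

The fixed fields occupy 31 bytes, so with, for example, `flow_length = 400` and `key_length = 4` the unpadded size is 435 bytes and `gheader_length` would be 440. The first read header starts at offset `gheader_length`.
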

Read header section:

Field | Type | Description |
---|---|---|
rheader_length | uint16 | Number of bytes in this read header (includes eight-byte padding) |
name_length | uint16 | Number of characters in the name of the read. Note the read name is not null-terminated. |
n_bases | uint32 | Number of bases in the read |
clip_qual_left | uint16 | If any clipping has been applied, the 1-indexed coordinate of the first base after the clipped region is stored here. If no clipping has been applied this field will be zero. |
clip_qual_right | uint16 | If any clipping at the end of the read has been applied, this field contains the 1-indexed coordinate of the last base before the clipped region. If no end-clipping is applied this field is zero. |
clip_adapter_left | uint16 | Similar to clip_qual_left - identifies adapter clipping, if applied. |
clip_adapter_right | uint16 | Similar to clip_qual_right - identifies adapter clipping, if applied. |
name | char[name_length] | The read name. This string is not null-terminated. |
eight_byte_pad | uint8[?] | If the number of bytes in the read header is not divisible by 8, zero-valued bytes are appended until it is. |
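
A read header can be decoded in the same way. This is a sketch under the assumption that the whole file is held in memory as a `bytes` object; `parse_read_header` and its return shape are hypothetical names chosen for illustration.

```python
import struct

# Fixed-size part of the read header, big-endian:
# rheader_length, name_length, n_bases,
# clip_qual_left, clip_qual_right, clip_adapter_left, clip_adapter_right
READ_FIXED = struct.Struct(">HHIHHHH")  # 16 bytes


def parse_read_header(buf, offset=0):
    """Parse the read header at `offset`; return (header, offset of read data)."""
    (rheader_length, name_length, n_bases,
     clip_qual_left, clip_qual_right,
     clip_adapter_left, clip_adapter_right) = READ_FIXED.unpack_from(buf, offset)

    # The name is not null-terminated, so name_length is the only delimiter.
    start = offset + READ_FIXED.size
    name = buf[start:start + name_length].decode("ascii")

    # The stored length includes the zero padding up to a multiple of 8.
    unpadded = READ_FIXED.size + name_length
    if (unpadded + 7) // 8 * 8 != rheader_length:
        raise ValueError("rheader_length does not match the field sizes")

    header = {
        "name": name,
        "n_bases": n_bases,
        "clip_qual_left": clip_qual_left,
        "clip_qual_right": clip_qual_right,
        "clip_adapter_left": clip_adapter_left,
        "clip_adapter_right": clip_adapter_right,
    }
    return header, offset + rheader_length
```

Returning the offset of the following read data section lets a caller walk the file pair by pair without recomputing the padding.
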

Read data section:

Field | Type | Description |
---|---|---|
flowgram | uint?[flow_length] | The element type depends on the flowgram_format value in the global header. Only one code (1) is currently defined; it uses uint16 elements and stores each signal value as round(signal * 100). |
flow_index | uint8[n_bases] | For each called base, the flow that resulted in the base call. Flow positions are 1-indexed, and each entry is stored relative to the previous entry, so the absolute flow of a base is the running sum of the entries up to and including it. |
bases | char[n_bases] | The called bases. |
quality | uint8[n_bases] | The quality score of each base call, on the -10*log10(error probability) scale. |
eight_byte_pad | uint8[?] | If the number of bytes in the read data section is not divisible by 8, zero-valued bytes are appended until it is. |
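
The read data section can be decoded once `flow_length` (from the global header) and `n_bases` (from the read header) are known. The sketch below assumes uint16 flowgram values, which is the encoding defined for format code 1 and is assumed here to apply to the test files that store 0 as well. The names `parse_read_data` and `apply_clip` are hypothetical helpers, not part of the format.

```python
import struct
from itertools import accumulate


def parse_read_data(buf, offset, flow_length, n_bases):
    """Parse the read data section at `offset`; return (read, next offset)."""
    pos = offset
    # Signals are stored as round(signal * 100) in big-endian uint16.
    raw_signal = struct.unpack_from(">%dH" % flow_length, buf, pos)
    pos += 2 * flow_length
    flow_index = list(buf[pos:pos + n_bases])
    pos += n_bases
    bases = buf[pos:pos + n_bases].decode("ascii")
    pos += n_bases
    quality = list(buf[pos:pos + n_bases])
    pos += n_bases

    # Skip the zero padding that rounds the section up to a multiple of 8.
    section_length = pos - offset
    next_offset = offset + (section_length + 7) // 8 * 8

    read = {
        "flowgram": [v / 100.0 for v in raw_signal],
        # flow_index holds increments; the running sum gives the
        # 1-indexed flow that produced each base.
        "flow_of_base": list(accumulate(flow_index)),
        "bases": bases,
        "quality": quality,
    }
    return read, next_offset


def apply_clip(bases, clip_left, clip_right):
    """Trim `bases` using one pair of 1-indexed clip fields (0 = no clipping)."""
    start = clip_left - 1 if clip_left else 0
    end = clip_right if clip_right else len(bases)
    return bases[start:end]
```

A `flow_index` entry of 0 means the base came from the same flow as the previous base, which is how a homopolymer run is recorded. `apply_clip` handles a single clip pair, for example `clip_qual_left` and `clip_qual_right`; how quality and adapter clipping are combined is left to the caller.
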