The letters for nucleotide sequences are A, C, G, U,
T and - to indicate gaps. The letters are always to be specified to the
The MSF format is used as output by a number of multiple sequence alignment
programs. However, as MSF output can vary between programs, we feel
that it is necessary to state what this program expects from the format.
First, any number of lines are skipped until a line is encountered whose
fields (separated by whitespace) match one of the following:
first is "MSF:" and third is "Type:", or
second is "MSF:" and fourth is "Type:",
and, in either case:
last is "..",
last-2 is "Check:"
Then the sequence length is read from field 2 or 3, depending on whether field
1 or 2 contains the "MSF:" string.
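The header rule above can be sketched in Python as follows. This is only an
illustration, not the program's actual code: the function name
parse_msf_header is hypothetical, and treating the length field as a decimal
integer is an assumption.

```python
def parse_msf_header(line):
    """Return the sequence length if `line` matches the header rule, else None.

    Hypothetical helper; the length is assumed to be a decimal integer.
    """
    fields = line.split()  # split on runs of whitespace
    # The last field must be ".." and the field two before it "Check:".
    if len(fields) < 3 or fields[-1] != ".." or fields[-3] != "Check:":
        return None
    for pos in (0, 1):  # "MSF:" may be field 1 or field 2 (0-based: 0 or 1)
        if (len(fields) > pos + 2
                and fields[pos] == "MSF:"
                and fields[pos + 2] == "Type:"):
            try:
                return int(fields[pos + 1])  # length follows "MSF:"
            except ValueError:
                return None
    return None
```

A caller would apply this to each input line in turn and stop at the first
line for which the result is not None.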
Then lines are skipped until "//" is encountered on a separate line.
While skipping, every line whose first field is "Name:" increases the number
of registered sequences by one.
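A minimal sketch of this counting step, assuming the input is already
available as a list of lines. The name count_sequences is hypothetical, and
reading "on a separate line" as "the line's only field is //" is an
assumption.

```python
def count_sequences(lines):
    """Return (number of "Name:" lines, index of the first line after "//").

    Hypothetical helper; `lines` holds the text that follows the header line.
    """
    count = 0
    for i, line in enumerate(lines):
        fields = line.split()
        if fields == ["//"]:  # assumed meaning of '"//" on a separate line'
            return count, i + 1
        if fields and fields[0] == "Name:":
            count += 1
    return count, len(lines)  # no "//" found: everything was skipped
```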
While input remains, the following steps are repeated:
Skip 1 line.
Skip any number of lines (possibly none) that start with at least 10
spaces " ".
The next lines (as many as there are registered sequences) have their sequences
concatenated to the result. In each of these lines the first field is assumed
to be a name, and the remaining fields are assumed to be sequence data.
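The reading loop described above might look like the following sketch. It is
not the program's actual implementation: the name read_msf_body is
hypothetical, the input is assumed to be the list of lines that follow "//",
and joining the sequence fields of a line without separators is an assumption.

```python
def read_msf_body(lines, n_seqs):
    """Return a list of (name, sequence) pairs, one per registered sequence.

    Hypothetical helper; `lines` are the lines after "//", `n_seqs` is the
    number of "Name:" lines counted earlier.
    """
    names = [None] * n_seqs
    chunks = [[] for _ in range(n_seqs)]
    i = 0
    while i < len(lines):  # while input remains
        i += 1  # skip 1 line
        # Skip optional lines that start with at least 10 spaces.
        while i < len(lines) and lines[i].startswith(" " * 10):
            i += 1
        # Read one line per registered sequence.
        for k in range(n_seqs):
            if i >= len(lines):
                break
            fields = lines[i].split()
            i += 1
            if not fields:
                continue
            names[k] = fields[0]  # first field: name
            chunks[k].append("".join(fields[1:]))  # rest: sequence data
    return [(n, "".join(c)) for n, c in zip(names, chunks) if n is not None]
```

The sketch keeps the sequences in file order and uses the name seen in the
most recent block, so it does not check that names agree between blocks.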