General introduction to C
A general introduction to C can be found on the net. One such example is
C Programming Notes by Steve Summit
Another good place to look for help using C is
Catalog of C routines and functions
Also, refer to the set of C programming notes by Tom Macke, available in the course curriculum.
C routines
Below follows a list of subroutines and functions you might find useful. Some are standard C routines and are
described in detail by the man command (or at the link above); others are
implemented by me and are documented in the
utils directory. If you come across a function, say strlen, whose purpose you do not know, try typing
man strlen
If this is a standard C routine, you will get a detailed description. If not, look in the utils
directory. Say you want to find out how the routine strpos works; then type
cd
cd src/utils
grep strpos *.h
This will return
strutils.h:extern int strpos(char *s, char c);
to the screen. The routine strpos is hence defined in the strutils.c file. Open this file
in your favorite editor and search for the definition of strpos
int strpos(char *s, char c)
{
    int i, l = strlen(s);

    for (i = 0; i < l; i++)
        if (c == s[i])
            return (i);
    return (-1);
}
You see that the routine strpos takes two arguments. The first argument is a pointer
to a char (an efficient way to pass a string), and the second argument is a character. The
routine returns the position of the first occurrence of the character c in the string s, or the value -1 if the character is
not found in s. Note that the first character in s is at position 0.
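For comparison, the same result can be obtained with the standard C routine strchr (see man strchr), which returns a pointer to the first match instead of an index. The sketch below is a hypothetical alternative, strpos_std, not part of the utils directory. Note one subtle difference it has to handle: strchr treats the terminating '\0' as part of the string, whereas strpos never finds it.

```c
#include <string.h>

/* Hypothetical alternative to strpos() built on the standard strchr().
   Returns the 0-based index of the first c in s, or -1 if c is absent. */
int strpos_std(const char *s, char c)
{
    const char *p;

    if (c == '\0')             /* strchr would match the terminator; strpos does not */
        return -1;
    p = strchr(s, c);
    return p ? (int)(p - s) : -1;   /* pointer difference = position */
}
```

For example, strpos_std("ARNDC", 'D') gives 3, just as strpos("ARNDC", 'D') does.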
Strings
Here are some type definitions and routines dealing with strings.
WORD: String of size 56 characters.
FILENAME: String of size 256 characters.
LINE: String of size 1024 characters.
Example
WORD name;
is equivalent to
char name[56];
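A minimal sketch of how such types can be declared with typedef, assuming definitions consistent with the sizes listed above (the actual declarations are in the utils headers and may differ). The hypothetical helper word_copy shows one pitfall: inside a function, a WORD parameter decays to a plain char pointer, so sizeof(WORD) must be used rather than sizeof(dst).

```c
#include <string.h>

/* Assumed definitions matching the sizes listed above. */
typedef char WORD[56];
typedef char FILENAME[256];
typedef char LINE[1024];

/* Hypothetical helper: copy src into a WORD without overflowing it.
   At most 55 characters are copied and the result is always NUL-terminated.
   sizeof(dst) would give the size of a pointer here, hence sizeof(WORD). */
void word_copy(WORD dst, const char *src)
{
    strncpy(dst, src, sizeof(WORD) - 1);
    dst[sizeof(WORD) - 1] = '\0';
}
```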
Vectors and matrices
int *ivector(int l, int h);
void ivector_free(int *v, int l, int h);
int **imatrix(int rl, int rh, int cl, int ch);
void imatrix_free(int **v, int rl, int rh, int cl, int ch);
float *fvector(int l, int h);
void fvector_free(float *v, int l, int h);
float **fmatrix(int rl, int rh, int cl, int ch);
void fmatrix_free(float **v, int rl, int rh, int cl, int ch);
char *cvector(int l, int h);
void cvector_free(char *v, int l, int h);
char **cmatrix(int rl, int rh, int cl, int ch);
void cmatrix_free(char **v, int rl, int rh, int cl, int ch);
You can either allocate the vectors and matrices as fixed-size variables in your code, or allocate them
dynamically when the program is executed. For example, both
float mat[20][20];
and
float **mat;
mat = fmatrix( 0, 19, 0, 19 );
allocate a 20*20 matrix of float numbers. If you need an integer vector with indices running from 3 to 25, use
int *vec;
vec = ivector( 3, 25 );
Note that it is always good programming practice to free the memory taken up by dynamically allocated
variables once you no longer use them. This is done with the matching _free routine. Example
ivector_free( vec, 3, 25 );
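To give an idea of what such allocation routines do, here is a sketch with hypothetical stand-ins my_ivector and my_fmatrix; this is not the utils code, whose internals may differ. To stay within standard C, my_ivector allocates h + 1 elements so that indices l..h are valid (slots below l are simply unused), which assumes 0 <= l <= h. my_fmatrix is simplified to zero-based rows and columns and assumes nr >= 1: it allocates one contiguous block of floats plus an array of row pointers, so m[i][j] can be used like float m[nr][nc].

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ivector(l, h); assumes 0 <= l <= h.
   Allocates h + 1 zero-initialised ints so that v[l]..v[h] are valid. */
int *my_ivector(int l, int h)
{
    int *v = calloc((size_t)h + 1, sizeof *v);

    if (v == NULL) {
        printf("Error. Cannot allocate ivector(%d, %d)\n", l, h);
        exit(1);
    }
    return v;
}

void my_ivector_free(int *v, int l, int h)
{
    (void)l; (void)h;          /* kept only to mirror the ivector_free signature */
    free(v);
}

/* Hypothetical zero-based matrix: rows 0..nr-1, columns 0..nc-1, nr >= 1.
   One contiguous block of floats plus an array of row pointers. */
float **my_fmatrix(int nr, int nc)
{
    float **m = malloc((size_t)nr * sizeof *m);
    float *block = calloc((size_t)nr * nc, sizeof *block);
    int i;

    if (m == NULL || block == NULL) {
        printf("Error. Cannot allocate fmatrix(%d, %d)\n", nr, nc);
        exit(1);
    }
    for (i = 0; i < nr; i++)
        m[i] = block + (size_t)i * nc;   /* row i starts i*nc floats into the block */
    return m;
}

void my_fmatrix_free(float **m)
{
    free(m[0]);                /* the contiguous block of floats */
    free(m);                   /* the array of row pointers */
}
```

The contiguous block means the whole matrix is released with two calls to free, in the reverse order of allocation.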
Lists
Reading input from standard input or a file can be tedious. Simple code that reads lines from a file
could look as follows
FILE *fp;
LINE line;

if ( ( fp = fopen( filename, "r" ) ) == NULL ) {
    printf( "Error. Cannot read from file %s. Exit\n",
        filename );
    exit( 1 );
}
while ( fgets( line, sizeof line, fp ) != NULL ) {
    /* Write code that works on the content of the variable line */
}
fclose( fp );
This is code you will need over and over again, so it is convenient to put it in a subroutine.
The routine
linelist = linelist_read( filename );
does this. The linelist_read routine reads from the file called filename and returns a
linked list of the lines in the file. Each element in the list is defined as
typedef struct linelist {
    struct linelist *next;
    LINE line;
    int nw;
    char **wvec;
} LINELIST;
Here the essential elements are line and next. The variable line contains the text of one
line of the file, and next is a pointer to the next element in the list. This structure can be drawn as
            ------------      ------------      ------------
            |  line1   |      |  line2   |      |  line3   |
linelist -> ------------  /-> ------------  /-> ------------  /-> NULL
            |      ----|-/    |      ----|-/    |      ----|-/
            ------------      ------------      ------------
You can access the fields of a linelist element as
linelist->line
linelist->next
Once you have read the file into a linked list with linelist = linelist_read( filename );, you
can access each element in the list, and hence each line in the input file, using a for loop
for ( ln = linelist; ln; ln = ln->next ) {
    /* Do some stuff on the line stored in ln->line */
}
Here the first part of the for statement (ln = linelist) makes the variable ln point
to where the variable linelist is pointing, that is, to the beginning of the list, i.e. the list
element containing the first line of the file. The second part of the for statement tells the program to keep
looping as long as ln is true (non-NULL). This is a compact way of writing ln != NULL, i.e.
loop until the end of the linked list is reached. Finally, the last part of the for statement moves the ln pointer to the
next element of the linked list.
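To show how a routine like linelist_read can build such a list, here is a minimal sketch. The names MY_LINELIST, my_linelist_read_fp, my_linelist_read and my_linelist_free are hypothetical, the struct keeps only the two essential fields (next and line), and the trailing newline is stripped from each line; the real linelist_read may behave differently. New elements are appended at the tail so the list keeps the file order. Lines longer than 1023 characters would be split over several elements.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef char LINE[1024];          /* same size as the LINE type described above */

/* Simplified list element: only the essential fields next and line. */
typedef struct my_linelist {
    struct my_linelist *next;
    LINE line;
} MY_LINELIST;

/* Read every line from an open stream into a linked list, in file order. */
MY_LINELIST *my_linelist_read_fp(FILE *fp)
{
    MY_LINELIST *first = NULL, *last = NULL, *n;
    LINE buf;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';        /* strip the newline, if any */
        if ((n = malloc(sizeof *n)) == NULL) {
            printf("Error. Out of memory. Exit\n");
            exit(1);
        }
        strcpy(n->line, buf);
        n->next = NULL;
        if (last == NULL)                       /* first element of the list */
            first = n;
        else
            last->next = n;                     /* append at the tail */
        last = n;
    }
    return first;
}

/* Open filename, read it into a list, and close it again. */
MY_LINELIST *my_linelist_read(const char *filename)
{
    FILE *fp;
    MY_LINELIST *list;

    if ((fp = fopen(filename, "r")) == NULL) {
        printf("Error. Cannot read from file %s. Exit\n", filename);
        exit(1);
    }
    list = my_linelist_read_fp(fp);
    fclose(fp);
    return list;
}

/* Free all elements; next must be saved before the current node is freed. */
void my_linelist_free(MY_LINELIST *list)
{
    MY_LINELIST *next;

    for (; list; list = next) {
        next = list->next;
        free(list);
    }
}
```

Once read, the list is traversed with exactly the for loop shown above, e.g. for ( ln = list; ln; ln = ln->next ) printf( "%s\n", ln->line );.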
FASTA
Similar linked list structures are available for dealing with FASTA files.
fsalist = fsalist_read( filename );
The fsalist_read routine reads from the file called filename and returns a linked list of the FASTA entries
in the file. Each FASTA entry in the list is defined as
typedef struct fsalist {
    struct fsalist *next;
    char *seq;
    char name[255];
    int len;
    float score;
    int *i;
} FSALIST;
Here the essential parts are the variables seq containing the FASTA sequence, name containing the FASTA
name, len containing the length of the FASTA sequence, and next a pointer to the next element in the list.
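As an illustration of what reading FASTA entries into such a list involves, here is a minimal sketch; it is not the fsalist_read implementation. The struct MY_FSALIST keeps only next, seq, name and len, and the routine names are hypothetical. A line starting with '>' starts a new entry whose name is taken as the first word after '>'; the following lines are appended to the sequence, which grows with realloc. The sketch assumes header lines shorter than 1024 characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified FASTA entry with the essential fields of FSALIST. */
typedef struct my_fsalist {
    struct my_fsalist *next;
    char *seq;                 /* sequence, grown with realloc */
    char name[255];            /* first word after '>' */
    int len;                   /* number of residues in seq */
} MY_FSALIST;

static void *xrealloc(void *p, size_t n)
{
    void *q = realloc(p, n);   /* realloc(NULL, n) behaves like malloc(n) */

    if (q == NULL) {
        printf("Error. Out of memory. Exit\n");
        exit(1);
    }
    return q;
}

/* Read all FASTA entries from fp into a linked list in file order. */
MY_FSALIST *my_fsalist_read_fp(FILE *fp)
{
    MY_FSALIST *first = NULL, *last = NULL, *cur = NULL;
    char line[1024];
    size_t i, n;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '>') {                         /* header: new entry */
            cur = xrealloc(NULL, sizeof *cur);
            cur->next = NULL;
            cur->seq = xrealloc(NULL, 1);
            cur->seq[0] = '\0';
            cur->len = 0;
            cur->name[0] = '\0';
            sscanf(line + 1, "%254s", cur->name);     /* keep the first word only */
            if (last == NULL)
                first = cur;
            else
                last->next = cur;
            last = cur;
        } else if (cur != NULL) {                     /* sequence line */
            n = strlen(line);
            cur->seq = xrealloc(cur->seq, cur->len + n + 1);
            for (i = 0; i < n; i++)
                if (line[i] != ' ' && line[i] != '\t')
                    cur->seq[cur->len++] = line[i];
            cur->seq[cur->len] = '\0';
        }
    }
    return first;
}

/* Free every entry together with its sequence. */
void my_fsalist_free(MY_FSALIST *list)
{
    MY_FSALIST *next;

    for (; list; list = next) {
        next = list->next;
        free(list->seq);
        free(list);
    }
}
```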
There are several routines available for dealing with FASTA format files. Here are a few (they are documented
in the file fsalist.c in the utils directory):
void fsalist_free( FSALIST *fsa );
FSALIST *fsalist_alloc();
FSALIST *fsalist_read_single( FILE *fp );
FSALIST *fsalist_find( char *name, FSALIST *list, char *pattern );
void fsa_print( FSALIST *l );
void fsalist_iassign_profile_order( FSALIST *fsalist );
In particular, the last routine, fsalist_iassign_profile_order, will be useful. It fills the integer vector i of the
fsalist structure with the position of each amino acid of the FASTA sequence in the BLOSUM alphabet. The routine
is implemented as
for ( fsa = fsalist; fsa; fsa = fsa->next ) {
    n = fsa->len;
    fsa->i = ivector( 0, n - 1 );
    for ( i = 0; i < n; i++ )
        fsa->i[i] = strpos( PROFILE_ORDER, fsa->seq[i] );
}
where PROFILE_ORDER = "ARNDCQEGHILKMFPSTWYVX". This routine is thus similar to the function used in the exercise for
calculating the scoring matrix between two protein sequences.
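The same mapping for a single sequence can be tried out in isolation. The sketch below copies strpos from earlier in these notes and adds a hypothetical helper, profile_index, that returns a newly allocated vector of alphabet positions (the caller frees it with free). A residue that is not in PROFILE_ORDER, for example a lowercase letter, gets the value -1, so check for that before using the index to look up a 21 x 21 scoring table.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PROFILE_ORDER "ARNDCQEGHILKMFPSTWYVX"

/* Same logic as strpos() earlier in these notes. */
int strpos(char *s, char c)
{
    int i, l = strlen(s);

    for (i = 0; i < l; i++)
        if (c == s[i])
            return i;
    return -1;
}

/* Hypothetical helper (not part of utils): element k of the returned vector
   is the position of seq[k] in PROFILE_ORDER, or -1 if not in the alphabet. */
int *profile_index(char *seq)
{
    int k, n = strlen(seq);
    int *idx = malloc((size_t)(n > 0 ? n : 1) * sizeof *idx);

    if (idx == NULL) {
        printf("Error. Cannot allocate index vector. Exit\n");
        exit(1);
    }
    for (k = 0; k < n; k++)
        idx[k] = strpos(PROFILE_ORDER, seq[k]);
    return idx;
}
```

For example, profile_index("ARNX") gives the positions 0, 1, 2 and 20.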
In C, arguments are passed to functions by value while other languages may pass variables by reference.
This means that the receiving function gets copies of the values and has no direct way of altering the original
variables. For a function to alter a variable passed from another function, the caller must pass its address
(a pointer to it).
Here is an example showing that an argument passed by value is NOT modified when the function returns
#include <stdio.h>

void dummy( int i )
{
    i = i + 1;    /* changes only the local copy */
}

int main( int argc, char *argv[] )
{
    int i;

    i = 3;
    dummy( i );
    printf( "I: %i\n", i );    /* prints I: 3 */
    return 0;
}
Next, an example where the address of the variable is passed to the function, so the value is modified when
the function returns
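#include <stdio.h>

void dummy_wpointer( int *i )
{
    (*i) = (*i) + 1;    /* changes the caller's variable through the pointer */
}

int main( int argc, char *argv[] )
{
    int i;

    i = 3;
    dummy_wpointer( &i );
    printf( "I: %i\n", i );    /* prints I: 4 */
    return 0;
}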
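A common reason to pass pointers is to let one function hand back more than one result. The hypothetical routine int_minmax below (not part of utils) finds both the smallest and the largest element of v[0..n-1] and stores them through the pointers min and max; it assumes n >= 1.

```c
/* Hypothetical example: return two results through pointer arguments.
   Assumes n >= 1. */
void int_minmax(const int *v, int n, int *min, int *max)
{
    int k;

    *min = *max = v[0];          /* start from the first element */
    for (k = 1; k < n; k++) {
        if (v[k] < *min)
            *min = v[k];
        if (v[k] > *max)
            *max = v[k];
    }
}
```

It is called with the addresses of two ordinary variables, e.g. int lo, hi; int_minmax( v, n, &lo, &hi );, exactly as &i is passed to dummy_wpointer above.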
This is all for now.