
General introduction to C

A general introduction to C can be found on the net. One such example is

C Programming Notes by Steve Summit

Another good place to look for help using C is

Catalog of C routines and functions

Also, refer to the set of C programming notes by Tom Macke, available in the course curriculum.


C routines

Below follows a list of subroutines and functions you might find useful. Some are standard C routines and are described in detail by the man command (or via the link above); others are implemented by me and are documented in the utils directory. If you see a command, say strlen, whose function you do not know, try typing

man strlen

If this is a standard C routine, you will get a detailed description. If not, look in the utils directory. For example, to find out how the routine strpos works, type

cd 
cd src/utils
grep strpos *.h

This will return

strutils.h:extern int strpos(char *s, char c);

to the screen. The routine strpos is hence declared in strutils.h and defined in the strutils.c file. Open this file in your favorite editor and search for the definition of strpos

int strpos(char *s, char c)

{
        int     i, l = strlen(s);
        for (i = 0; i < l; i++)
                if (c == s[i]) 
                        return (i);
        return (-1);
}

You see that the routine strpos takes two arguments as input. The first argument is a pointer to a char (an efficient way to pass a string), and the second argument is a character. The routine returns the position of the first occurrence of the character c in the string s, or -1 if the character is not found in s. Note that the first character in s is at position 0.
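
As a small illustration of how the return value is typically used, here is a sketch that repeats the strpos definition and adds a helper called key_length. The helper is hypothetical and not part of the utils directory. It returns the number of characters before a separator, and treats a return value of -1 as "separator not found".

```c
#include <string.h>

/* Same routine as in strutils.c: position of c in s, or -1 */
int strpos(char *s, char c)
{
        int     i, l = strlen(s);
        for (i = 0; i < l; i++)
                if (c == s[i])
                        return (i);
        return (-1);
}

/* Hypothetical helper: number of characters before sep,
   or the full length of s if sep does not occur in s */
int key_length(char *s, char sep)
{
        int     pos = strpos(s, sep);
        if (pos == -1)
                return (int) strlen(s);
        return pos;
}
```

With these definitions, key_length("name=value", '=') gives 4, since the '=' is at position 4.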


Strings

Here are some type definitions and routines dealing with strings.

WORD: String of size 56 characters.
FILENAME: String of size 256 characters.
LINE: String of size 1024 characters.

Examples

WORD	name;

is equivalent to

char	name[56];
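
How these types are defined in the utils directory is not shown here. A plausible sketch, assuming they are plain typedefs of char arrays, could look as follows. The helper word_set is hypothetical and only illustrates how to copy text into a WORD without writing past its end.

```c
#include <stdio.h>

/* Assumed definitions; the actual ones live in the utils directory */
typedef char WORD[56];
typedef char FILENAME[256];
typedef char LINE[1024];

/* Hypothetical helper: copy src into a WORD, truncating if it is too long.
   snprintf always null-terminates when the size is non-zero. */
void word_set(WORD dst, const char *src)
{
        snprintf(dst, sizeof(WORD), "%s", src);
}
```

Note that one of the 56 characters is needed for the terminating null character, so under this assumption a WORD holds at most 55 visible characters.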

Vectors and matrices

int *ivector(int l, int h);
void ivector_free(int *v, int l, int h);
int **imatrix(int rl, int rh, int cl, int ch);
void imatrix_free(int **v, int rl, int rh, int cl, int ch);

float *fvector(int l, int h);
void fvector_free(float *v, int l, int h);
float **fmatrix(int rl, int rh, int cl, int ch);
void fmatrix_free(float **v, int rl, int rh, int cl, int ch);

char *cvector(int l, int h);
void cvector_free(char *v, int l, int h);
char **cmatrix(int rl, int rh, int cl, int ch);
void cmatrix_free(char **v, int rl, int rh, int cl, int ch);

You can either declare vectors and matrices as fixed-size variables in your code, or allocate them dynamically when the program is executed. For example, the fixed-size declaration

float	mat[20][20];

and the dynamic allocation

float	**mat;
mat = fmatrix( 0, 19, 0, 19 );

both give a 20*20 matrix of float numbers. If you need a vector indexed from 3 to 25, use

vec = ivector( 3, 25 );

Note that it is always good programming practice to free the memory taken up by dynamically allocated variables once you no longer use them. This is done using the matching free routine. Example

ivector_free( vec, 3, 25 );
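
To give an idea of what happens behind a dynamic matrix allocation, here is a minimal sketch. It is not the implementation from the utils directory: the names sketch_fmatrix and sketch_fmatrix_free are hypothetical, and for simplicity the rows and columns always start at index 0.

```c
#include <stdlib.h>

/* Hypothetical zero-based variant: rows x cols matrix of floats,
   with rows >= 1 and cols >= 1. Returns NULL if memory runs out. */
float **sketch_fmatrix(int rows, int cols)
{
        int     r;
        float   **m = malloc(rows * sizeof *m);
        if (m == NULL)
                return NULL;
        /* one contiguous block for all the numbers, initialized to zero */
        m[0] = calloc((size_t) rows * cols, sizeof **m);
        if (m[0] == NULL) {
                free(m);
                return NULL;
        }
        /* let each row pointer point into the block */
        for (r = 1; r < rows; r++)
                m[r] = m[0] + (size_t) r * cols;
        return m;
}

/* Release both the block of numbers and the row pointers */
void sketch_fmatrix_free(float **m)
{
        if (m != NULL) {
                free(m[0]);
                free(m);
        }
}
```

After mat = sketch_fmatrix( 20, 20 ); the elements are accessed as mat[i][j], exactly as for the fixed-size matrix.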

Lists

Reading input from standard input or a file is always troublesome. A simple code reading lines from a file could look as follows

if ( ( fp = fopen( filename, "r" )) == NULL ) {
	printf( "Error. Cannot read from file %s. Exit\n",
        	filename );
	exit( 1 );
}

while ( fgets(line, sizeof line, fp) != NULL ) {

/* Write some code that works on the content of the variable line */
	
}

fclose( fp );
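
One detail to keep in mind with a loop like this is that fgets keeps the newline character at the end of each line it reads. A small helper for removing it could look as follows. The name chomp is hypothetical and not part of the utils directory.

```c
#include <string.h>

/* Hypothetical helper: cut the line at the first newline, if there is one.
   strcspn returns the number of characters before the first '\n',
   or the length of the string if it contains no '\n'. */
void chomp(char *line)
{
        line[strcspn(line, "\n")] = '\0';
}
```

Calling chomp( line ); as the first statement inside the while loop leaves only the text of the line in the variable line.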

This is code that you will need over and over again, so it is convenient to put it in a subroutine. The routine

linelist = linelist_read( filename );
does this. The linelist_read routine reads from the file called filename and returns a linked list of the lines in the file. Each element of the list is defined as
typedef struct linelist {
        struct linelist *next;
        LINE    line;
        int     nw;
        char    **wvec;
} LINELIST;
Here the essential elements are line and next. The variable line contains the text of one line of the file, and next is a pointer to the next element in the list. This structure can be pictured as
            ------------       ------------       ------------
            |  line1   |       |  line2   |       |  line3   |
linelist -> ------------   /-> ------------   /-> ------------   /-> NULL
            |      ----|--/    |      ----|--/    |      ----|--/
            ------------       ------------       ------------

You can access each variable of the linelist element as

linelist->line
linelist->next

Once you have read the linked list using the command linelist = linelist_read( filename ); you can access each element in the list, and hence each line in the input file, using a for loop

for ( ln = linelist; ln; ln=ln->next ) {

/* Do some stuff on the line stored in ln->line */

}

Here the first part of the for command (ln = linelist) makes the variable ln point to where the variable linelist is pointing, that is, to the beginning of the list, i.e. the list element containing the first line of the file. The next part of the for tells the program to keep looping while the variable ln is true. This is a compact way of writing ln != NULL, i.e. until the end of the linked list. Finally, the last part of the for moves the ln pointer to the next element of the linked list.
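
The same loop pattern can be wrapped in small helpers. The sketch below repeats the LINELIST definition and assumes that LINE is a typedef of a char array. The two helpers, linelist_count and linelist_free_all, are hypothetical and not part of the utils directory. Note that when freeing the list, the pointer to the next element must be saved before the current element is freed.

```c
#include <stdlib.h>

typedef char LINE[1024];        /* assumed definition of LINE */

typedef struct linelist {
        struct linelist *next;
        LINE    line;
        int     nw;
        char    **wvec;
} LINELIST;

/* Hypothetical helper: number of elements in the list */
int linelist_count(LINELIST *linelist)
{
        LINELIST *ln;
        int     n = 0;
        for (ln = linelist; ln; ln = ln->next)
                n++;
        return n;
}

/* Hypothetical helper: free every element of the list (wvec is not touched).
   ln->next is saved first, since ln cannot be used after free( ln ). */
void linelist_free_all(LINELIST *linelist)
{
        LINELIST *ln, *next;
        for (ln = linelist; ln; ln = next) {
                next = ln->next;
                free(ln);
        }
}
```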


FASTA

Similar linked list structures are available for dealing with FASTA files.
fsalist = fsalist_read( filename );
The fsalist_read routine reads from the file called filename and returns a linked list of the FASTA entries in the file. Each FASTA entry in the list is defined as
typedef struct fsalist  {
        struct  fsalist *next;
        char    *seq;
        char    name[255];
        int     len;
        float   score;
        int     *i;
} FSALIST;

Here the essential parts are the variables seq containing the FASTA sequence, name containing the FASTA name, len containing the length of the FASTA sequence, and next a pointer to the next element in the list.
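
As an illustration of how these variables are used together, the following sketch walks through a list and returns the entry with the longest sequence. The helper fsalist_longest is hypothetical and not one of the routines in the utils directory.

```c
#include <stddef.h>

typedef struct fsalist  {
        struct  fsalist *next;
        char    *seq;
        char    name[255];
        int     len;
        float   score;
        int     *i;
} FSALIST;

/* Hypothetical helper: entry with the longest sequence,
   or NULL if the list is empty */
FSALIST *fsalist_longest(FSALIST *fsalist)
{
        FSALIST *fsa, *best = NULL;
        for (fsa = fsalist; fsa; fsa = fsa->next)
                if (best == NULL || fsa->len > best->len)
                        best = fsa;
        return best;
}
```

The name and length of the result can then be read as best->name and best->len.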

There are several routines available for dealing with FASTA format files. Here follow a few (they are documented in fsalist.c in the utils directory).

void    fsalist_free( FSALIST *fsa );
FSALIST	*fsalist_alloc();
FSALIST	*fsalist_read_single( FILE *fp );
FSALIST *fsalist_find( char *name, FSALIST *list, char *pattern );
void    fsa_print( FSALIST *l );
void    fsalist_iassign_profile_order( FSALIST *fsalist );

In particular, the last routine, fsalist_iassign_profile_order, will be useful. It fills the vector element i of the fsalist structure with the position of each amino acid of the FASTA sequence in the BLOSUM alphabet. The routine is implemented as

for ( fsa = fsalist; fsa; fsa = fsa->next ) {
        n = fsa->len;
        fsa->i = ivector( 0, n - 1 );
        for ( i = 0; i < n; i++ )
                fsa->i[i] = strpos( PROFILE_ORDER, fsa->seq[i] );
}
where PROFILE_ORDER = "ARNDCQEGHILKMFPSTWYVX". This function is thus like the function used in the exercise for calculating the scoring matrix between two protein sequences.
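
To make the encoding concrete, here is a stand-alone sketch that does the same translation for a single sequence without the utils routines. The function encode_sequence and the variable profile_order are hypothetical names, and the standard routine strchr is used in place of strpos.

```c
#include <string.h>

static const char profile_order[] = "ARNDCQEGHILKMFPSTWYVX";

/* Hypothetical stand-alone variant: write the alphabet position of each
   residue of seq into out, which must have room for strlen(seq) ints.
   Residues not in the alphabet get -1. Returns the sequence length. */
int encode_sequence(const char *seq, int *out)
{
        int     i, n = (int) strlen(seq);
        for (i = 0; i < n; i++) {
                const char *p = strchr(profile_order, seq[i]);
                out[i] = (p != NULL) ? (int) (p - profile_order) : -1;
        }
        return n;
}
```

For the sequence "ARX" this gives the positions 0, 1 and 20, since A is the first and X the last letter of the alphabet.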

In C, arguments are passed to functions by value while other languages may pass variables by reference. This means that the receiving function gets copies of the values and has no direct way of altering the original variables. For a function to alter a variable passed from another function, the caller must pass its address (a pointer to it).

Here is an example of how an argument that is passed by value is NOT modified when returning from the function call

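#include <stdio.h>

/* i is a local copy; changing it does not affect the caller */
void    dummy( int i )
{
        i = i + 1;
}

int main(int argc, char *argv[])
{
        int     i;

        i = 3;
        dummy( i );
        printf( "I: %i\n", i );         /* prints I: 3 */
        return 0;
}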

and next an example of how the variable is passed to the function as a pointer, so that its value is modified when returning from the function call

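#include <stdio.h>

/* i points to the caller's variable, so (*i) changes the original */
void    dummy_wpointer( int *i )
{
        (*i) = (*i) + 1;
}

int main(int argc, char *argv[])
{
        int     i;

        i = 3;
        dummy_wpointer( &i );
        printf( "I: %i\n", i );         /* prints I: 4 */
        return 0;
}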
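
A classic use of this technique is a routine that exchanges the values of two variables, which is not possible with arguments passed by value. The sketch below uses the hypothetical name swap_ints.

```c
/* Exchange the values of the two variables that a and b point to */
void swap_ints(int *a, int *b)
{
        int     tmp = *a;
        *a = *b;
        *b = tmp;
}
```

It is called with the addresses of the two variables, as in swap_ints( &x, &y );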

This is all for now.