A. Common Errors in C – C Programming Essentials

Appendix A. Common Errors in C

Types of Errors

The errors in a program can be broadly classified as follows.

Programming Errors

These are generated when the programmer makes typographical errors while entering the source code.

Compile Errors

These are errors detected by the compiler which make the program impossible to compile. Every language has syntax rules. These rules determine which statements are legal in the language and which are not. Compilers are designed to enforce these rules. When a rule is broken, the compiler reports an error message and an object file is not created. Most compilers will continue scanning the source file after an error and report any other errors they find. However, once an error has been found, the compiler may be confused by later, perfectly legal statements and report them as errors too.

Link Error

These are errors generated when the executable of the program cannot be generated. This may be due to wrong function prototyping, incorrect header files, etc. The linker is a program that links object files (which contain the compiled machine code from a single source file) and libraries (which are files that are collections of object files) together to create an executable program. The linker matches up functions and global variables used in object files to their definitions in other object files. The linker uses the name (often, the term symbol is used) of the function or global variable to perform the match. The most common type of linker error is an unresolved symbol or name. This error occurs when a function or global variable is used but the linker cannot find a match for the name.

Execution Error

These errors occur at the time of execution when the execution of a program cannot proceed. Loop-termination conditions and arithmetic errors are the general causes for such happenings.

Logical Errors

These errors depend solely on the logical thinking of the programmer. Such errors are easy to detect if we follow the line of execution and determine why the program takes that path of execution, provided the programmer uses structured programming. Otherwise, it may take a long time to detect such errors.

Common Instances of Errors

A variety of common errors that may appear in a program is listed below.

There are some errors that a novice programmer often makes. The following examples will highlight the basic mistakes that may occur during the writing of C programs at the initial stage.

Remember to start and end comments correctly

Comment begin: /*
Comment end: */

The assignment operator ‘=’ is often mistaken for the test operator ‘==’

This mistake is easy to make since many other programming languages use ‘=’ as a test operator. The compiler may not discover this type of error since the assignment has a value and, therefore, is legal in an expression. For example, in the following code segment:

if (number = 9)
     number = 0;

the variable number always ends up with the value 0, since the assignment 'number = 9' yields the value 9, which is non-zero and is therefore treated as true.

The operator ‘=’ is used exclusively for assignment and returns the value assigned. The == operator is used exclusively for comparison and returns an integer value (0 for false, 1 for true). Because of these return values, the C compiler does not flag an error when the operator ‘=’ is used by mistake in place of the comparison operator ‘==’.

Errors with missing parentheses

Missing parentheses around assignments cause errors that the compiler may not be able to detect. For example, in the line:

while (ch = getchar() != EOF)

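Because != has higher precedence than =, the comparison getchar() != EOF is evaluated first and its result is assigned to 'ch'. Thus 'ch' is assigned the value 1 (TRUE) until 'getchar()' returns EOF, at which point 'ch' is assigned the value 0 (FALSE). The correct form is: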

while ((ch = getchar()) != EOF)

Use of operators may have side effects

The use of auto increment/decrement operators may lead to unwanted side effects in a program. For example, the statement:

a[n] = n++;

is equivalent either to:

a[n] = n;
n = n + 1;

or to:

a[n + 1] = n;
n = n + 1;

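Which case holds depends on the compiler. Strictly speaking, the C standard leaves the behaviour of this statement undefined, because n is modified and also read without an intervening sequence point, so statements of this kind should be avoided.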

Non-terminated comment

A non-terminated comment is one that is accidentally terminated by some subsequent comment, with the code in between swallowed.

a=b; /* this is a bug
c=d; /* c=d will never happen */

Accidental assignment/accidental booleans

The statement: if (a=b) c; /* a always equals b, but c will be executed if b!=0 */

Depending on your viewpoint, the bug in the language is that the assignment operator is too easy to confuse with the equality operator; or, maybe the bug is that C does not much care what constitutes a boolean expression: (a=b) is not a boolean expression!

Closely related to this lack of rigour in booleans, consider this construction:

if (0 < a < 5) c; /* this "boolean" is always true! */

The expression (0<a<5) is always true because it is evaluated as ((0<a)<5): (0<a) yields either 0 or 1, and that result is then compared with 5, which it is always less than.

Consider this:
    if(a =! b) c; /* this is compiled as (a = !b), an assignment, rather than
                                   (a != b) or (a == !b) */

Switch statement

Remember to use a break in the switch construct carefully. If you forget a break, the program will continue until it meets another break or the end of the switch block has been reached.

Consider the following switch statement, in which the break statements are missing. In this case, C does not leave the switch statement after case 2 but falls through into case 3.

#include <stdio.h>
int main(void)
{
      int x = 2;
      switch(x) {
                  case 2: printf("Two\n");   /* no break: falls through */
                  case 3: printf("Three\n");
      }
      return 0;
}

The output will be:

Two
Three

Scanf errors

There are two types of common errors while scanf() is used in a program.

  1. Forgetting to put an address_of operator (&) on arguments. scanf() must have the address of the variable to store input into. This means that the address_of operator (&) is often required to compute the addresses. Here is an example:

    int x;
    char *st = malloc(31);
    scanf("%d", &x);  /* & required to pass address to scanf()    */
    scanf("%30s", st); /* NO & here, st itself points to variable!   */

    The last line above shows that an argument without an ampersand (&) is sometimes correct, because the variable st itself holds an address.

  2. Using the wrong conversion character for an operand. C compilers are not required to check whether the conversion characters are correctly used for the arguments of a scanf() call, although many can warn about mismatches. The most common errors are using the %f format for doubles (which must use the %lf format), and mixing up %c and %s for characters and strings.

Size of arrays

Arrays in C always start at index 0. This means that an array of 10 integers defined as:

int a[10];

has valid indices from 0 to 9, not 10. It is very common for students to go too far in an array. This can lead to unpredictable behaviour of the program. For example, an array reference a[21] will usually not generate a compiler error. It refers to the location at address (a+21), i.e., *(a+21), which lies outside the array, so the behaviour is undefined. Note that in most expressions the array name a is converted to a pointer to its first element, &a[0]. So, one should be careful when an array reference is made.

Integer division

Unlike Pascal, C uses the ‘/’ operator for both real and integer division. It is important to understand how C determines which one to perform. If both operands are of an integer type, integer division is used; otherwise, real division is used.

For example:

double half = 1/2;

This code sets half to 0 and not 0.5, since 1 and 2 are integer constants. To make this error-free, change at least one of them to a real constant as shown below:

double half = 1.0/2;

If both operands are integer variables and real division is desired, cast one of the variables to double (or float).

int x = 5, y = 2;
double d = ((double)(x))/y;

Loop errors

In C, a loop repeats the very next statement after the loop statement, if the loop condition is fulfilled. Notice that the code:

int  x = 5;
while(x > 0);
    x--;

leads to an infinite loop. Why? The semicolon after the ‘while’ defines the statement to repeat as the null statement (which does nothing). So, remove the semicolon and the loop works as expected.

Another common loop error is to iterate one time too many or one time too few. Check loop conditions carefully.

Prototype

Prototypes tell the compiler important features of a function: the return type and the parameters of the function. If no prototype is given, an older (C89) compiler assumes that the function returns an int and can take any number of parameters of any type; C99 and later standards require a declaration before a function is called. One important reason to use prototypes is to let the compiler check for errors in the argument lists of function calls. In addition, a prototype must be used if the function does not return an int. For example, the sqrt() function returns a double, not an int. The following code:

double x = sqrt(2);

will not work correctly if a prototype:

double sqrt(double);

does not appear above it.

Without a prototype, the C compiler assumes that sqrt() returns an int. Since the returned value is stored in a double variable, the compiler inserts code to convert the value to a double. This conversion is not needed and will result in the wrong value. The solution to this problem is to include the correct C header file that contains the sqrt() prototype, math.h. For functions that one writes oneself, one must either place the prototype at the top of the source file or create a header file and include it.

Remember brackets with pointers and increment/decrement operators

Consider the following lines of code:

char line[80], *lp;
lp = line;
ch = *lp++; /* the same as *(lp++) */

In the example given above, ch is assigned what lp is pointing at, after which lp is incremented. If we change the last line as:

ch = (*lp)++;

then ch is assigned what lp is pointing at, but in this case it is the value that lp is pointing at which gets incremented by one.

Errors caused by using a pointer as argument to a function

If a function prototype is given as:

void convert(int *px);

where the argument is a pointer to an integer, the function has to be called with a pointer as its argument. For example:

int result;
convert(&result);

The typical error is caused by omission of the address_of operator ‘&’.

Non-initialized pointers

Anytime a pointer is in use, it should point to something specific. That is, a question can be raised: What does this pointer point to? If the answer to this question is not known, it is likely that it does not point to anything. Here is an example of this type of error:

#include <string.h>
int main()
{
      char  *st;   /* defines a pointer to a char or char array */
      strcpy(st, "abc"); /* what char array does st point to?? */
      return 0;
}

How to do this correctly? The answer is, either use an array or dynamically allocate an array. So, the code should be like:

#include <string.h>
int main()
{
  char   st[20];    /* defines a char array */
  strcpy(st, "abc"); /* st points to char array */
  return 0;
}

An alternative way is as in the following:

#include <string.h>
#include <stdlib.h>
int main()
{
    char *st;
    st = (char *)malloc(20); /* st points to dynamically allocated
                                memory of 20 contiguous bytes */
    strcpy(st, "abc");
    free(st);  /* don't forget to deallocate when done! */
    return 0;
}

String- and character-constant confusions

C considers character and string constants as very different things. Character constants are enclosed in single quotes and string constants are enclosed in double quotes. String constants act as a pointer to the actual string. Consider the following code:

char     ch = 'A';     /* correct */
char     ch = "A";     /* error   */

The second line attempts to assign the address of a string constant to the character variable ch. This should generate a compiler diagnostic (an error or at least a warning). The same should happen if a character constant is assigned to a string pointer, as shown below:

const char *st = "A";  /* correct */
const char *st = 'A';  /* error   */

Comparing strings with relational operators

Never use the relational operator ‘==’ (or others like <, >, <=, >=, etc.) to compare the values of strings. Strings are character arrays. The name of a character array acts like a pointer to the string. Consider the following code:

char st1[] = "abc";
char st2[] = "abc";
if (st1 == st2)
      printf("Yes");
else
      printf("No");

This code prints out No because the == operator compares the addresses of st1 and st2, not the data they point to. The correct way to compare string values is to use the strcmp() library function (be sure to include string.h). If the code above is replaced with the following:

if (strcmp(st1,st2) == 0)
        printf("Yes");
else
        printf("No");

the code will print out Yes.

Strings without NULL terminator

C assumes that a string is a character array with a terminating null character. This character has the value 0 and can be written as just 0 or ‘\0’; it should not be confused with the NULL pointer. It marks the end of the meaningful data in the string. If it is missing, string functions will keep processing data past the end of the meaningful data, and often past the end of the character array itself, until they happen to find a zero byte in memory. Most C library string functions that create strings will always properly terminate them with a ‘\0’. Some do not (e.g., strncpy()). Be sure to read their descriptions carefully.

Not leaving room for the NULL terminator

A C string must have a null terminator at the end of the meaningful data in the string. A common mistake is not to allocate room for this extra character. For example, the string defined below,

char    str[30];

has room for only 29 (not 30) actual data characters, since a null must appear after the last data character. This can also be a problem with dynamic allocation. Below is the correct way to allocate a string of the exact size needed to hold a copy of another:

char    *copy_str = (char *)malloc(strlen(orig_str) + 1);
strcpy(copy_str, orig_str);

The common mistake is to forget to add one to the return value of strlen(). The strlen() function returns a count of the data characters that does not include the null terminator. This type of error can be very hard to detect. It might not cause any problems or cause problems only in extreme cases. In the case of dynamic allocation, it might corrupt the heap (the area of the program’s memory used for dynamic allocation) and cause the next heap operation (malloc(), free(), etc.) to fail.

Input/Output errors

The fgetc(), getc(), and getchar() functions all return an integer value. For example, the prototype of fgetc() is:

int    fgetc(FILE *);

Sometimes, this integer value is really a simple character, but there is one very important case where the return value is not a character.

EOF

A common misconception is that files have a special EOF character at the end. EOF is an integer error code returned by a function. Here is an incorrect way to use fgetc():

int count_line_size(FILE *fp)
{
      char  ch;
      int cnt = 0;
      while ((ch = fgetc(fp)) != EOF && ch != '\n')
            cnt++;
      return cnt;
}

The problem occurs in the condition of the while loop. For illustration, the loop is rewritten to show what C will do behind the scenes:

while ((int) (ch = (char) fgetc(fp)) != EOF && ch != '\n')
   cnt++;

The return value of fgetc(fp) is cast to char to store the result into ch. Then, the value of ch must be cast back to an int to compare it with EOF. So, what next? Casting an int value to a char and then back to an int may not give back the original int value. This means, in the example above if fgetc() returns the EOF value, the casting may change the value so that the comparison later with EOF would be false.

What is the solution? Make the ch variable an int, as shown below:

int count_line_size (FILE *fp)
{
       int ch;
       int cnt = 0;
       while((ch = fgetc(fp)) != EOF && ch != '\n')
             cnt++;
       return cnt;
}

Now, the only hidden cast is in the second comparison:

while((ch = fgetc(fp)) != EOF &&  ch != ((int) '\n'))
      cnt++;

This cast has no harmful effects at all. So, the moral of all this is: always use an int variable to store the result of fgetc(), getc(), and getchar().

Another widespread misunderstanding is regarding how the C function feof() works. Here is an example of a misuse of the function feof():

#include <stdio.h>
int    main()
{
       FILE  *fp = fopen("test.txt", "r");
       char  line[100];
       while(! feof(fp))
       {
            fgets(line, sizeof(line), fp);
            fputs(line, stdout);
       }
       fclose(fp);
       return 0;
}

This program will print out the last line of the input file twice. Why? After the last line is read in and printed out, feof() will still return 0 (false) and the loop will continue. The next fgets() fails and, so, the line variable holding the contents of the last line is not changed and is printed out again. After this, feof() will return true (since fgets() failed) and the loop ends.

How should this be fixed? One way is the following:

#include <stdio.h>
int    main()
{
       FILE  *fp = fopen("test.txt", "r");
       char line[100];
       while(1)
       {
            fgets(line, sizeof(line), fp);
            if (feof(fp)) /* check for EOF right after fgets()*/
                  break;
            fputs(line, stdout);
       }
       fclose(fp);
       return 0;
}

However, this is not the best way. There is really no reason to use feof() at all. C input functions return values that can be used to check for EOF. For example, fgets returns the NULL pointer on EOF. Here’s a better version of the program:

#include <stdio.h>
int main()
{
       FILE  *fp = fopen("test.txt", "r");
       char line[100];
       while( fgets(line, sizeof(line), fp) != NULL)
            fputs(line, stdout);
       fclose(fp);
       return 0;
}

Leaving characters in the input buffer

C input (and output) functions buffer data. Buffering stores data in memory and only reads (or writes) the data from (or to) I/O devices when needed. Reading and writing data in big chunks is much more efficient than a byte (or character) at a time. Often, the buffering has no effect on programming.

One place where buffering is visible is input using scanf(). The keyboard is usually line-buffered. This means that each line input is stored in a buffer. Problems can arise when a program does not process all the data in a line, before it wants to process the next line of input. For example, consider the following code segment:

int     x;
char    st[31];
printf("Enter an integer: ");
scanf("%d", &x);
printf("Enter a line of text: ");
fgets(st, 30, stdin);

The fgets() will not read the line of text that is typed in. Instead, it will probably just read an empty line. In fact, the program will not even wait for an input for the fgets() call. Why? The scanf() call reads the characters that represent the integer, but it leaves the ‘\n’ in the input buffer. The fgets() then starts reading data from the input buffer. It finds a ‘\n’ and reads it without needing any additional keyboard input. What’s the solution? One simple method is to read and discard all the characters in the input buffer, up to and including the ‘\n’, after the scanf() call. Since this is something that might be used in lots of places, it makes sense to use a function call for this. Here is a function that does just this:

/* function dump_line
*  This function reads and throws away the extra characters in the
*  current input buffer, up to and including the next '\n'
*  Parameter: fp - pointer to a FILE to read characters from
*/
 void dump_line(FILE * fp)
 {
   int ch;
   while((ch = fgetc(fp)) != EOF && ch != '\n')
   ; /* null body */
 }

The following code shows the correct use of the above function:

int x;
char st[31];
printf("Enter an integer: ");
scanf("%d", &x);
dump_line(stdin);
printf("Enter a line of text: ");
fgets(st, 30, stdin);

One incorrect solution is to use the function fflush(stdin);

This will compile, but its behaviour is undefined by the ANSI C standard. The fflush() function is only meant to be used on streams open for output, not input.

Returning local array

Consider the following code fragment, a function to multiply two 3×3 matrices and return the result:

int *mat_multiply (int mat1[3][3], int mat2[3][3])
{
       int prod[3][3];
       int i, j, k;
       for (i=0; i<3; ++i)
              for (j=0; j<3; ++j) {
                     prod[i][j] = 0;   /* clear each element before accumulating */
                     for (k=0; k<3; ++k)
                         prod[i][j] += mat1[i][k] * mat2[k][j];
              }
       return prod;
}

The problem with this program is that it returns a local array. The local array prod[3][3] is created on the stack when the function mat_multiply is called. When the function ends, the lifetime of the local array ends too. Thus, there is no guarantee that the location pointed to by prod will still be valid after the function returns. The correct way is to use dynamic memory, which is created on the heap and, thus, has a longer lifetime. So, the correct code is:

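#include <stdlib.h>
int *mat_multiply (int mat1[3][3], int mat2[3][3])
{
       int *prod = (int *) malloc(3*3*sizeof(int));
       int i, j, k;
       if (prod == NULL)     /* allocation failed */
              return NULL;
       for (i=0; i<3; ++i)
              for (j=0; j<3; ++j) {
                     *(prod+3*i+j) = 0;   /* clear each element before accumulating */
                     for (k=0; k<3; ++k)
                            *(prod+3*i+j) += mat1[i][k] * mat2[k][j];
              }
       return prod;          /* the caller must free() the result */
}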