C Programming - Standard Library Functions

5.
What standard functions are available to manipulate strings?

Short answer: the functions in <string.h>.

C doesn't have a built-in string type. Instead, C programs use char arrays, terminated by the NUL ('\0') character.

C programs (and C programmers) are responsible for ensuring that the arrays are big enough to hold all that will be put in them. There are three approaches:

1. Set aside a lot of room, assume that it will be big enough, and don't worry what happens if it's not big enough (efficient, but this method can cause big problems if there's not enough room).

2. Always allocate and reallocate the necessary amount of room (not too inefficient if done with realloc; this method can take lots of code and lots of runtime).

3. Set aside what should be enough room, and stop before going beyond it (efficient and safe, but you might lose data).

There are two sets of functions for C string programming. One set (strcpy, strcat, and so on) works with the first and second approaches. This set copies or uses as much as it's asked to, and there had better be room for it all, or the program will write past the end of the array (undefined behavior, and a classic source of crashes and security holes). Those are the functions most C programmers use. The other set (strncpy, strncat, and so on) takes the third approach. This set needs to know how much room there is, and it never goes beyond that, ignoring everything that doesn't fit.

The "n" (third) argument means different things to these two functions:

To strncpy, it means there is room for only "n" characters, including any NUL character at the end. strncpy copies exactly "n" characters. If the second argument doesn't have that many, strncpy copies extra NUL characters. If the second argument has more characters than that, strncpy stops before it copies any NUL character. That means, when using strncpy, you should always put a NUL character at the end of the string yourself; don't count on strncpy to do it for you.

To strncat, it means to copy up to "n" characters, plus a NUL character if necessary. Because what you really know is how many characters the destination can store, you usually need to use strlen to calculate how many characters you can copy.

The difference between strncpy and strncat is "historical." (That's a technical term meaning "It made sense to somebody, once, and it might be the right way to do things, but it's not obvious why right now.")

Here is an example of the "string-n" functions:

#include <stdio.h>
#include <string.h>
/*
Normally, a constant like MAXBUF would be very large, to
help ensure that the buffer doesn't overflow.  Here, it's very
small, to show how the "string-n" functions prevent it from
ever overflowing.
*/
#define MAXBUF 16
int
main(int argc, char** argv)
{
        char buf[MAXBUF];
        int i;
        buf[MAXBUF - 1] = '\0';
        strncpy(buf, argv[0], MAXBUF-1);
        for (i = 1; i < argc; ++i) {
                strncat(buf, " ",
                  MAXBUF - 1 - strlen(buf));
                strncat(buf, argv[i],
                  MAXBUF - 1 - strlen(buf));
        }
        puts(buf);
        return 0;
}

strcpy and strncpy copy a string from one array to another. The value on the right is copied to the value on the left; think of the order as being the same as that for assignment.

strcat and strncat "concatenate" one string onto the end of another. For example, if a1 is an array that holds "dog" (and has room for at least eight characters) and a2 is an array that holds "wood", after calling strcat(a1, a2), a1 would hold "dogwood".

strcmp and strncmp compare two strings. The return value is negative if the left argument is less than the right, zero if they're the same, and positive if the left argument is greater than the right. There are two common idioms for equality and inequality:

if (strcmp(s1, s2)) {
    /* s1 != s2 */
}

and

if (! strcmp(s1, s2)) {
    /* s1 == s2 */
}

This code is not incredibly readable, perhaps, but it's perfectly valid C code and quite common; learn to recognize it. If you need to take into account the current locale when comparing strings, use strcoll.

A number of functions search in a string. (In all cases, it's the "left" or first argument being searched in.) strchr and strrchr look for (respectively) the first and last occurrence of a character in a string. (Standard C has no "n" versions of strchr and strrchr; memchr is the closest thing to one for strchr. memrchr, its counterpart for strrchr, is an extension available on some systems, not part of Standard C.) strspn, strcspn (the "c" stands for "complement"), and strpbrk look for substrings consisting of certain characters or separated by certain characters:

n = strspn("Iowa", "AEIOUaeiou");
/* n = 2; "Iowa" starts with 2 vowels */
n = strcspn("Hello world", " \t");
/* n = 5; white space after 5 characters */
p = strpbrk("Hello world", " \t");
/* p points to blank */

strstr looks for one string in another:

p = strstr("Hello world", "or");
/* p points to the second "o" */

strtok breaks a string into tokens, which are separated by characters given in the second argument. strtok is "destructive"; it sticks NUL characters in the original string. (If the original string should not be changed, it should be copied, and the copy should be passed to strtok.) Also, strtok is not "reentrant": it "remembers" its position in the string between calls, so it can't safely be called from a signal-handling function or used to tokenize two strings at the same time. strtok is an odd function, but very useful for pulling apart data separated by commas or white space.

The following program uses strtok to break up the words in a sentence.

/* An example of using strtok. */
#include <stdio.h>
#include <string.h>
static char buf[] = "Now is the time for all good men ...";
int main()
{
        char* p;
        p = strtok(buf, " ");
        while (p) {
                printf("%s\n", p);
                p = strtok(NULL, " ");
        }
        return 0;
}


6.
How do I determine whether a character is numeric, alphabetic, and so on?

The header file ctype.h defines various functions for determining what class a character belongs to. These consist of the following functions:

Function     Character Class          Returns Nonzero for Characters
isdigit()    Decimal digits           0-9
isxdigit()   Hexadecimal digits       0-9, a-f, or A-F
isalnum()    Alphanumerics            0-9, a-z, or A-Z
isalpha()    Alphabetics              a-z or A-Z
islower()    Lowercase alphabetics    a-z
isupper()    Uppercase alphabetics    A-Z
isspace()    Whitespace               Space, tab, vertical tab, newline, form feed, or carriage return
isgraph()    Nonblank characters      Any character that appears nonblank when printed (ASCII 0x21 through 0x7E)
isprint()    Printable characters     All the isgraph() characters, plus space
ispunct()    Punctuation              Any character in isgraph() that is not in isalnum()
iscntrl()    Control characters       Any character not in isprint() (ASCII 0x00 through 0x1F plus 0x7F)

There are three very good reasons for calling these functions (which are often implemented as macros) instead of writing your own tests for character classes. They are pretty much the same reasons for using standard library functions in the first place. First, these macros are fast. Because they are generally implemented as a table lookup with some bit-masking magic, even a relatively complicated test such as isalnum() takes a single lookup rather than a chain of comparisons against the value of the character.

Second, these macros are correct. It's all too easy to make an error in logic or typing and include a wrong character (or exclude a right one) from a test.

Third, these macros are portable. Believe it or not, not everyone uses the same ASCII character set with PC extensions. You might not care today, but when you discover that your next computer uses Unicode rather than ASCII, you'll be glad you wrote code that didn't assume the values of characters in the character set.

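The header file ctype.h also defines two functions to convert characters between upper- and lowercase alphabetics. These are toupper() and tolower(). In Standard C, toupper() and tolower() return their argument unchanged if it is not a lowercase or uppercase alphabetic character, respectively, so there is no need to check with islower() or isupper() first. (Some older, pre-ANSI libraries left that case undefined, which is why older code often makes the check.) The behavior is undefined only if the argument is neither EOF nor a value representable as an unsigned char, so cast plain char values to unsigned char before passing them to any of the ctype.h functions.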


7.
What is a "locale"?

A locale is a description of certain conventions your program might be expected to follow under certain circumstances. It's mostly helpful for internationalizing your program.

If you were going to print an amount of money, would you always use a dollar sign? Not if your program was going to run in the United Kingdom; there, you'd use a pound sign. In some countries, the currency symbol goes before the number; in some, it goes after. Where does the sign go for a negative number? How about the decimal point? A number that would be printed 1,234.56 in the United States should appear as 1.234,56 in some other countries. Same value, different convention. How are times and dates displayed? The only short answer is, differently. These are some of the technical reasons why some programmers whose programs have to run all over the world have so many headaches.

Good news: Some of the differences have been standardized. C compilers support different "locales," different conventions for how a program acts in different places. For example, the strcoll (string collate) function is like the simpler strcmp, but it reflects how different countries and languages sort and order (collate) string values. The setlocale and localeconv functions provide this support.

Bad news: There's no standardized list of interesting locales. The only one your compiler is guaranteed to support is the "C" locale, which is a generic, American English convention that works best with ASCII characters between 32 and 127. Even so, if you need to get code that looks right, no matter where around the world it will run, thinking in terms of locales is a good first step. (Getting several locales your compiler supports, or getting your compiler to accept locales you define, is a good second step.)


8.
Is there a way to jump out of a function or functions?

The standard library functions setjmp() and longjmp() are used to provide a goto that can jump out of a function or functions, in the rare cases in which this action is useful. To use setjmp() and longjmp() correctly, you must meet several conditions.

You must #include the header file setjmp.h. This file provides the prototypes for setjmp() and longjmp(), and it defines the type jmp_buf. You need a variable of type jmp_buf to pass as an argument to both setjmp() and longjmp(). This variable will contain the information needed to make the jump occur.

You must call setjmp() to initialize the jmp_buf variable. If setjmp() returns 0, you have just initialized the jmp_buf. If setjmp() returns anything else, your program just jumped to that point via a call to longjmp(). In that case, the return value is whatever your program passed to longjmp().

Conceptually, longjmp() works as if when it is called, the currently executing function returns. Then the function that called it returns, and so on, until the function containing the call to setjmp() is executing. Then execution jumps to where setjmp() was called from, and execution continues from the return of setjmp(), but with the return value of setjmp() set to whatever argument was passed to longjmp(). In other words, if function f() calls setjmp() and later calls function g(), and function g() calls function h(), which calls longjmp(), the program behaves as if h() returned immediately, then g() returned immediately, then f() executed a goto back to the setjmp() call.

What this means is that for a call to longjmp() to work properly, the program must already have called setjmp() and must not have returned from the function that called setjmp(). If these conditions are not fulfilled, the operation of longjmp() is undefined (meaning your program will probably crash). The following program illustrates the use of setjmp() and longjmp(). It is obviously contrived, because it would be simpler to write this program without using setjmp() and longjmp(). In general, when you are tempted to use setjmp() and longjmp(), try to find a way to write the program without them, because they are easy to misuse and can make a program difficult to read and maintain.

/* An example of using setjmp() and longjmp(). */
#include        <setjmp.h>
#include        <stdio.h>
#include        <string.h>
#include        <stdlib.h>
#define RETRY_PROCESS 1
#define QUIT_PROCESS  2
jmp_buf env;
int nitems;
int procItem()
{
    char    buf[256];
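    /* fgets() is bounded, unlike gets(), which can overflow buf */
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline */
    if (strcmp(buf, "done"))
    {
        if (strcmp(buf, "quit") == 0)
                longjmp(env, QUIT_PROCESS);
        if (strcmp(buf, "restart") == 0)
                longjmp(env, RETRY_PROCESS);
        nitems++;
        return 1;
    }
    return 0;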
}
void process()
{
    printf("Enter items, followed by 'done'.\n");
    printf("At any time, you can type 'quit' to exit\n");
    printf("or 'restart' to start over again\n");
    nitems = 0;
    while (procItem());
}
int main(void)
{
    for ( ; ; ) 
    {
        switch (setjmp(env)) 
        {
            case 0:
            case RETRY_PROCESS:
                    process();
                    printf("You typed in %d items.\n",
                            nitems);
                    break;
            case QUIT_PROCESS:
            default:
                    exit(0);
        }
    }
}