Lesson #10

The Library Class String

Overview

In this lesson I will cover the following topics:

  1. Why you want to know about standard strings.

  2. What are the important operations that can be performed on standard strings.

Body

std::string.

I have been talking about how you can dynamically allocate memory in C++ and about how you can use constructors, assignment operators, and destructors to ensure that dynamically allocated memory is handled properly. This is fine, but the reality is that most of the time, when writing real programs, you rarely have to manage dynamic memory yourself. The reason is that the C++ standard library comes with a number of very useful classes (they are really templates) that, among other things, do all the dynamic memory handling internally. These library components come with all the necessary constructors, destructors, and assignment operators so that they take care of themselves. You just use them.

Yet I think there was still value in talking about all those details earlier. Now you have a good appreciation for the issues and for how C++ deals with them. When I talk about how standard strings can "grow and shrink dynamically" and about how you don't have to do anything special to manage that, you'll understand how that was accomplished. The library classes don't do magic. They were created using the same C++ that you can use.

Let's talk about strings!

In C you need to use an array of characters to represent a string of text. There are many problems with that.

  1. Arrays are normally statically sized. You must decide ahead of time how much space you might need. If you actually need more space, then you are out of luck.

  2. You can't assign one string to another with an equal sign. In C, this is an error:

          char string_1[128], string_2[128];
          string_1 = string_2;
        
  3. You can't compare strings with ==. In C, this compiles, but it doesn't do what you want:

          char string_1[128], string_2[128];
          if( string_1 == string_2 ) { ...
        

    This compiles because the compiler regards the array names as the addresses of the arrays. Thus, you are really comparing two addresses here and not the material stored in the two arrays.

The C++ standard library contains a real string type. It does what you want.

  1. It can hold strings of any length. It will grow and shrink dynamically to accommodate whatever you put into it (provided there is enough system memory).

  2. You can assign one string to another.

          #include <string>
    
          std::string string_1, string_2;
          string_1 = string_2;
            // Works. The object string_2 is properly copied into string_1.
        
  3. You can compare strings with == and it does what you want.

          #include <string>
    
          std::string string_1, string_2;
          if( string_1 == string_2 ) { ...
        

Naturally this is accomplished using operator overloading, dynamic memory allocation, and so forth. However, as a user of std::string, you don't need to worry about that.
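To make the two points about assignment and comparison concrete, here is a small sketch. The function names copy_is_independent and same_text are my own inventions for illustration; they are not part of the standard library.

```cpp
#include <string>

// Hypothetical helper (not a library function): copies one string into
// another, changes the copy, and reports whether the original was left
// untouched.
bool copy_is_independent( )
{
    std::string original( "Hello" );
    std::string copy;

    copy = original;
      // The characters are copied. The two objects do not share storage.

    copy += ", World";
      // Only the copy grows.

    return original == "Hello" && copy == "Hello, World";
}

// Hypothetical helper: operator== compares the text, not the addresses
// of the two objects.
bool same_text( const std::string &left, const std::string &right )
{
    return left == right;
}
```

Calling same_text with two separately constructed strings that hold the same characters gives true, which is exactly what the C array comparison fails to do.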

Let's take a closer look at what std::string offers.

Basic operations.

The easiest way to demonstrate what you can do with std::string is to show you a few short programs that use them. These programs are artificial. They don't do anything useful. They just show you strings in a proper context.

Here are the various ways you can construct a string.

#include <string>

using std::string;
  // This keeps me from having to put std:: in front of string. It
  // informs the compiler that I will be using the name "string" from
  // namespace std and that it should handle that name as if I had
  // declared it at this scope.

int main( )
{
    string object_1;
      // Default constructor builds an empty string.

    string object_2( object_1 );
      // Copy constructor initializes a string from another string.

    string object_3( "Hello, World" );
      // Initializing a string from a C-style array of characters.

    string object_4( object_3, 1, 3 );
      // Specifying a subrange of a string using a start position
      // and a character count. String indices start at zero so
      // in this example object_4 gets "ell".

    string object_5( 128, 'C' );
      // Creates a string of 128 'C' characters.

    string object_6( "\000\001\002\003", 4 );
      // Initializes object_6 to four characters from the char*
      // provided. Notice that std::strings can hold embedded null
      // characters.


    // Assignments follow the same pattern as above.

    object_1 = object_2;
    object_1 = "Hello, World";
    object_1 = 'C';

    object_1.assign(object_3, 1, 3);
    object_1.assign(128, 'C');
    object_1.assign("\000\001\002\003", 4);

    return 0;
}

There are several different, overloaded assignment operators and constructors making it easy to create a string. The assign member functions do assignment too (meaning they properly clean up the target object before giving it a new value), but they take more than one parameter. You can't use an equal sign for that because there is no suitable syntax. Notice that you can't initialize a string with a single character, but that you can assign a single character to a string.
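If you do need to start a string off with a single character, the count-and-character constructor shown above will do it. Here is a minimal sketch; the name from_char is hypothetical and exists only for this example.

```cpp
#include <string>

// Hypothetical helper (not a library function): builds a string that
// holds exactly one character.
std::string from_char( char ch )
{
    // There is no constructor taking only a char, so ask for one copy
    // of the character instead.
    std::string result( 1, ch );
    return result;
}

// Hypothetical helper: assignment from a single character is allowed,
// and it replaces whatever the string held before.
std::string assign_char( std::string text, char ch )
{
    text = ch;
    return text;
}
```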

The standard strings can contain embedded null characters. You can use std::string to hold completely general binary data. This is something the C standard library does not support very well.
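The explicit character count is what makes embedded nulls work. The sketch below contrasts the two constructors; the function names are hypothetical and chosen just for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: with an explicit count, all four characters are
// stored, null characters included.
std::size_t length_with_count( )
{
    std::string binary( "\000\001\002\003", 4 );
    return binary.size( );
}

// Hypothetical helper: without a count, the constructor treats its
// argument as a C-style string and stops at the first null character.
std::size_t length_without_count( )
{
    std::string text( "\000\001\002\003" );
    return text.size( );
}
```

The first function returns 4 and the second returns 0, so when you are handling binary data always pass the length.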

The functions that require a start position must have that position in range, or you will get an exception. However, substring lengths can be over the end of the string and that is taken to mean the rest of the string. The special value std::string::npos is the largest possible value for a string index or length, and it is often used to indicate "all" or "error" depending on the context.
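Here is a sketch of those rules using substr. The helper piece_or_marker and the marker text it returns are my own assumptions for the example. The exception type, std::out_of_range, is the one the library throws for a bad start position.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library function): returns the requested
// piece of 'text', or a marker string if the start position is out of
// range.
std::string piece_or_marker( const std::string    &text,
                             std::string::size_type start,
                             std::string::size_type count )
{
    try {
        return text.substr( start, count );
          // A count that runs past the end just means "the rest".
    }
    catch( const std::out_of_range & ) {
        return "<out of range>";
          // Only a start position beyond the end gets here.
    }
}
```

With the text "Hello", a start of 3 and a count of 100 (or std::string::npos) gives "lo". A start of 5 is still legal and gives an empty string. A start of 6 throws, so the marker comes back.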

This little program demonstrates string I/O operations. Naturally there are overloaded inserter and extractor operators for strings.

#include <iostream>
#include <string>

int main( )
{
    std::string text( "Hello, World" );

    std::cout << "I say: " << text << "\n";
      // Writes the string to the output stream.

    std::cin >> text;
      // Gets a whitespace delimited word from the input stream.
      // The word can be of any length.

    std::getline( std::cin, text );
      // Gets a line of text from the input stream. The line can
      // be of any length and can contain embedded white space.
      // The '\n' character is removed from the stream but not
      // included in the string.

    std::getline( std::cin, text, '+' );
      // Gets a line of text from the input stream using '+' as
      // the delimiter between lines. The delimiter character
      // is removed from the stream but not included in the
      // string.

    return 0;
}

Notice how the extractor operator gets individual words. If you want to get entire lines, use the getline function. Since strings can expand to any size, this getline function can read lines of any length. This is why the standard does not need the get_long_line function I wrote for an earlier lesson.

std::string line;

// Read the input one line at a time. Support lines of any length.
while( std::getline( std::cin, line ) ) {
  // Process line
}

The getline function returns a reference to the stream it has been given. In a boolean context, a stream will be "false" once a read has failed, which is what happens when getline reaches the end of the file with nothing left to read (or when an error occurs). Thus, the above loop does just what you want.
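To try the loop without typing at the keyboard, you can hand getline any input stream. The sketch below assumes a std::istringstream as a stand-in for std::cin. The names count_lines and count_lines_in are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a library function): counts the lines that
// can be read from any input stream.
std::size_t count_lines( std::istream &input )
{
    std::string line;
    std::size_t count = 0;

    // The loop ends when getline fails to read anything more.
    while( std::getline( input, line ) ) {
        ++count;
    }
    return count;
}

// Hypothetical convenience wrapper: treats a string as if it were the
// contents of a file.
std::size_t count_lines_in( const std::string &text )
{
    std::istringstream input( text );
    return count_lines( input );
}
```

Notice that a final line with no '\n' at the end is still counted. The read that picks it up succeeds, and the loop stops on the following read, which finds nothing.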

Here is a program that demonstrates the various ways you might access the individual characters of a string.

#include <string>

int main( )
{
    std::string object( "abcdefghijklmnopqrstuvwxyz" );
    char ch;

    ch = object[2];
      // This assigns 'c' to ch. The indexing operator is not
      // range checked for performance reasons. If you try
      // to access a character off the end of the string
      // you will get undefined behavior.

    object[2] = 'C';
      // The indexing operator returns a reference allowing
      // individual characters in the string to be modified.

    ch = object.at(2);
      // The at() function does range checking on the index. It
      // throws an exception if the given index is out of range.

    object.at(2) = 'C';
      // The at() function also returns a reference, allowing it to be
      // used on the left side of an assignment operation.

    const char *p1 = object.c_str( );
      // Get a pointer to the internal array. This function will
      // ensure that the array is null terminated.

    const char *p2 = object.data( );
      // Like above. Since C++11 the array data( ) returns is also
      // null terminated (before C++11 that was not guaranteed).
      // If you are using strings to hold binary information, rely
      // on size( ) and not the terminator to find the end.

    std::string substring = object.substr( 3, 4 );
      // Starting at index 3, take 4 characters from 'object' and make
      // a new string out of them. In this case 'substring' gets
      // "defg".

    return 0;
}

Strings provide an overloaded version of [] to make accessing the individual characters look like accessing an array (C programmers like this). To enhance speed, the [] operator does no checking. If you try to access a character that is out of range, the result is undefined. This is also just like C arrays. However, if you want to be more careful, you can use the at method. It works just like [] except that if you go out of range it throws an exception. Presumably if you are using at, you have prepared yourself to handle such exceptions and thus avoid serious problems.
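Being prepared might look something like the following sketch. The helper char_at_or and its fallback parameter are hypothetical. The exception thrown by at is std::out_of_range.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library function): returns the character
// at 'index', or 'fallback' if the index is out of range.
char char_at_or( const std::string     &text,
                 std::string::size_type index,
                 char                   fallback )
{
    try {
        return text.at( index );
          // Range checked. An index equal to the size is already
          // out of range.
    }
    catch( const std::out_of_range & ) {
        return fallback;
    }
}
```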

Since C code expects "old style" pointers to null-terminated arrays of characters (so-called "c-strings"), the standard string class provides a method that will give you such a pointer. The result of object.c_str() is a pointer to a null terminated array of characters containing the same text as the string. You can use this pointer with C functions that you want to call.

Be aware, however, that any operation on the string that causes it to change its size is likely to invalidate the pointer c_str (or data) returns. This is because the string object might have to move the string in memory as part of a reallocation operation. The address c_str returns would then be left pointing off into space. If you can avoid mixing new and old style strings, you should.
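As a sketch of both points, here is c_str being handed to the C library function strlen. The wrapper names are hypothetical. The safe habit the second function shows is to ask for the pointer only after you have finished changing the string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a library function): measures a string the
// C way, by counting characters up to the first null.
std::size_t c_length( const std::string &text )
{
    return std::strlen( text.c_str( ) );
}

// Hypothetical helper: changes the string first and fetches the pointer
// afterward, so the pointer can't have been invalidated.
std::size_t c_length_after_append( std::string text, const std::string &extra )
{
    text += extra;
      // This may move the characters to a new place in memory.

    const char *fresh = text.c_str( );
      // Obtained after the change, so it is safe to use.

    return std::strlen( fresh );
}
```

Keep in mind that C functions stop at the first null character. For a std::string holding the five characters "ab\0cd", c_length reports 2 even though size( ) reports 5.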

You can also compare strings with the usual comparison operators. Standard strings also provide comparison member functions for directly comparing substrings. This was done for efficiency reasons. Although you could extract a substring as a separate operation, it is generally faster to compare substrings "in place".
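Here is a minimal sketch of an "in place" comparison using the compare member function. The helper name starts_with_text is my own; it is not a library function.

```cpp
#include <string>

// Hypothetical helper (not a library function): reports whether 'text'
// begins with 'prefix' without building a temporary substring.
bool starts_with_text( const std::string &text, const std::string &prefix )
{
    // compare( position, count, other ) compares the indicated piece
    // of 'text' against 'other' and returns zero when they are equal.
    // A count that runs off the end is trimmed to the rest of 'text'.
    return text.compare( 0, prefix.size( ), prefix ) == 0;
}
```

The same test could be written as text.substr( 0, prefix.size( ) ) == prefix, but that version creates a new string just to throw it away.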

You can use += to append strings, C-style strings, or characters to the end of a string. You can also use infix +. Here's how that might look:

#include <string>

int main( )
{
    std::string object_1( "Hello" );
    std::string object_2( "World!" );

    object_1 += ' ';
    object_1 += object_2;
    object_1 += ' ';
    object_1 += "So there!";
      // object_1 now contains "Hello World! So there!"

    object_1 = object_1 + ' ' + object_2 + ' ' + "So there!";
      // object_1 now contains "Hello World! So there! World! So there!"

    return 0;
}

Note that in general the use of += will be faster than the use of the infix operators. However, that is often not a major issue and the infix operators are too nice to give up.

Searching and modifying.

In addition to appending one string onto the end of another there are a number of insert methods.

#include <string>

int main( )
{
    std::string object_1( "Hello, World" );
    std::string object_2( "WOW" );

    object_1.insert( 2, "Hi" );
      // Inserts "Hi" before position 2. object_1 becomes "HeHillo, World".

    object_1.insert( 5, object_2 );
      // Same idea. object_1 is now "HeHilWOWlo, World".

    object_1.insert( 10, 5, 'x' );
      // Same idea. Inserts 5 copies of 'x' at the indicated position.
      // object_1 is now "HeHilWOWloxxxxx, World".

    return 0;
}

Notice that the insertions occur before the indicated position. Thus, a position of zero causes the new material to be inserted at the beginning of the string.
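A short sketch of the two extreme positions follows. The helper names are hypothetical. Inserting at position zero puts the new text in front, and inserting at a position equal to the size of the string is also allowed and behaves like appending.

```cpp
#include <string>

// Hypothetical helper (not a library function): puts 'prefix' in front
// of 'text'.
std::string with_prefix( std::string text, const std::string &prefix )
{
    text.insert( 0, prefix );
    return text;
}

// Hypothetical helper: inserting at the very end is the same as
// appending.
std::string with_suffix( std::string text, const std::string &suffix )
{
    text.insert( text.size( ), suffix );
    return text;
}
```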

There are many ways to search a string.

#include <string>

int main( )
{
    std::string object_1( "abcdefghijklmnopqrstuvwxyz" );
    std::string object_2( "jkl" );
    std::string::size_type index;
      // find returns this unsigned type. An int can't hold npos.

    index = object_1.find( "cde" );
      // Index is 2 because "cde" starts at position 2 in the string.

    index = object_1.find( object_2 );
      // Index is 9 because "jkl" starts at position 9 in object_1.

    index = object_1.find( 'a' );
      // Index is 0.

    index = object_1.find( 'a', 16 );
      // Index is std::string::npos (meaning "not found") because there
      // is no 'a' in the string starting at position 16 and beyond.

    return 0;
}

There is a version of find that takes a starting position for the c-string and std::string arguments too. The quantity std::string::npos is a special value that is used to indicate "not found" in this context. It is a value that could never be a legitimate position.
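The starting position is what lets you find every occurrence and not just the first. Here is a sketch that counts non-overlapping matches. The function count_occurrences is hypothetical, and ignoring an empty pattern is my own design choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a library function): counts how many times
// 'pattern' appears in 'text', not counting overlapping matches.
std::size_t count_occurrences( const std::string &text,
                               const std::string &pattern )
{
    if( pattern.empty( ) ) return 0;
      // An empty pattern "matches" everywhere, so rule it out.

    std::size_t count = 0;
    std::string::size_type position = text.find( pattern );

    while( position != std::string::npos ) {
        ++count;
        position = text.find( pattern, position + pattern.size( ) );
          // Resume the search just past the match that was found.
    }
    return count;
}
```

Always hold the result of find in a std::string::size_type (or std::size_t) and compare it against std::string::npos. That is the only reliable test for "not found".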

There are many more find methods. The rfind family searches the string backwards. The find_first_of family searches for one of a set of characters all at once. The find_first_not_of family searches for a character not in a given set of characters. There are also find_last_of and find_last_not_of methods. For the most part you don't have to worry about these exotic options.
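In case you do run into them, here is a sketch of three of these methods at work. The helper names and the particular character sets are assumptions made for the example.

```cpp
#include <string>

// Hypothetical helper (not a library function): position of the first
// lowercase vowel, or npos if there is none.
std::string::size_type first_vowel( const std::string &text )
{
    return text.find_first_of( "aeiou" );
}

// Hypothetical helper: drops leading spaces and tabs.
std::string skip_leading_blanks( const std::string &text )
{
    std::string::size_type start = text.find_first_not_of( " \t" );
    if( start == std::string::npos ) return std::string( );
      // Nothing but blanks (or nothing at all).

    return text.substr( start );
}

// Hypothetical helper: position of the last '/' in a path, or npos.
std::string::size_type last_slash( const std::string &path )
{
    return path.rfind( '/' );
}
```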

You can also replace substrings with other strings. In the interests of brevity, I will not present any of those methods right now. However, I will mention the erase method. It is too useful to skip.

#include <string>

int main( )
{
    std::string object_1( "Hello, World!" );

    object_1.erase( 2, 3 );
      // Erases three characters starting at position 2. Collapses the
      // string accordingly. object_1 is now "He, World!"

    object_1.erase( 5 );
      // Erases everything after (and including) position 5. object_1 is
      // now "He, W".

    object_1.erase( );
      // Erases the entire string. object_1 is now empty.

    return 0;
}
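The erase method combines nicely with find. The following sketch removes every occurrence of a character from a string. The function without_char is hypothetical and is only one of several ways this could be done.

```cpp
#include <string>

// Hypothetical helper (not a library function): returns a copy of
// 'text' with every occurrence of 'unwanted' removed.
std::string without_char( std::string text, char unwanted )
{
    std::string::size_type position = text.find( unwanted );

    while( position != std::string::npos ) {
        text.erase( position, 1 );
          // Erase the one character. The rest of the string slides
          // down to fill the gap.

        position = text.find( unwanted, position );
          // Search again from the same position, since a new
          // character now sits there.
    }
    return text;
}
```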

Do I ever need arrays of characters?

Now that you know something about C++'s standard library string class you might be wondering why any C++ program would use an array of characters to represent a string. In fact, there aren't many good reasons. From now on I encourage you to use std::string for your string handling needs. C-style arrays of characters are a throwback to the dark ages! With std::string you get fully dynamic strings that can be copied and compared in a natural way. All the memory management is done for you and the std::string destructor cleans up all the memory associated with the string.

The only time you really need to fall back on C style arrays of characters is when you are trying to interface with a C library.

Summary

  1. The C++ standard library comes with a real string type that can be copied and used in a natural way. Standard strings can be arbitrarily long and support completely general binary data.

  2. In the lesson above I showed you a number of operations that can be applied to standard strings. Although there are many more methods defined in class std::string than I'm showing, the operations above are some of the more commonly needed ones.

  3. You can ask a standard string to give you a pointer to a null-terminated array of characters using the c_str method. In this way, you can use standard strings with C-style string handling functions if necessary. Be aware, however, that any operation you perform on a standard string that causes its size to change may cause any past pointer returned by c_str to become invalid. Furthermore, you can't modify the array pointed at by the pointer returned by c_str without causing undefined effects.

  4. The special value std::string::npos is used to indicate "error" or "all" in the standard string handling functions, depending on the context.

© Copyright 2023 by Peter Chapin.
Last Revised: July 19, 2023