Lesson #11

The Library Vector Template

Overview

In this lesson I will cover the following topics:

What are templates and what is a container type.
The C++ standard vector template and the basic operations that can be applied to it.
Value vs reference semantics.

Body

Templates

The C++ language provides a very powerful feature called templates. This feature allows you to define generic classes and functions and then have the compiler fill in the specific types involved later. When a template is defined, one or more of the types used are just placeholders. When you use a template, you must specify which types to substitute for those placeholders. The compiler then "specializes" the template using the specific types you want.

Right now you don't need to worry about how to create a template. But you should be aware, at least, that a large part of the C++ standard library is really a library of templates. This makes it much more general and much more powerful. It also means that you need to know something about how to use templates in order to use the C++ standard library.

Here is a little history: Long ago, in ancient times, C++ did not have a template feature. In those days there were, naturally, no templates in the library (and the library was much smaller and simpler than it is today).

Modern C++ has supported templates for some time, but the heavily templated library it currently has was added during the standardization of C++ 1998. It was first developed by Alexander Stepanov at Hewlett-Packard as part of a research project in generic programming. When C++ was standardized, his library, known as the "Standard Template Library" or STL, was adopted as part of the C++ standard. Technically, STL and the C++ standard library templates are slightly different because some changes were made to Stepanov's work when the library was incorporated into the standard. However, most people use "STL" whenever they talk about the standard library templates anyway. The two are very close and for most practical purposes identical.

Standard containers

Okay, sounds good so far. But before I can show you some concrete examples, I need to talk a little about the standard containers in general terms. So far most of the objects you have used are of types like int, long, pointer to character and so forth. Such objects just hold a single value. In real programs, however, you often need to use objects that hold many values. Such objects are, in general, called containers.

You are already familiar with some containers. In C, you can use an array to hold many different values. An array is a kind of container. The problem is that arrays in C (and C++) are rather primitive. The C++ standard library contains several, much more powerful containers.

Each container in the standard library has its own special features, advantages, and disadvantages. They are organized differently internally, and as a result have different efficiency characteristics. We could spend a lot of time talking about the standard library containers, but in this course I will focus on only one: vector.

Vectors

What is a vector? A vector is a powerful replacement for simple C-style arrays. The word "vector" is being used here in its pure mathematical sense. Physicists and engineers use the word a bit differently, so don't be confused! Here is a short program to illustrate how to create and use a vector.

#include <vector>
  // You must include this header so that the compiler sees the vector
  // template definition.

int main( )
{
    std::vector<int> my_array( 100 );
      // Declares my_array to be a vector of 100 integers.

    for( int i = 0; i < 100; i++ ) {
      my_array[i] = ...;
    }

    // etc...
    return 0;
}

The declaration of my_array probably looks a little strange. Compare it to the declaration of a normal integer.

std::vector<int> my_array(100);
int              i;

The first part is the type: vector<int>. This notation means that you are asking the compiler to specialize the vector template using the type int to fill in the placeholder type used when the template was defined. The compiler understands this and generates the requested type on the fly. A vector of integers is like an array of integers. It is a container that holds integers in a contiguous sequence.

You can ask the compiler to generate any kind of vector you want. For example:

std::vector<int>               my_array1;
std::vector<double>            my_array2;
std::vector<char *>            my_array3;
std::vector<std::string>       my_array4;
std::vector<std::vector<int>>  my_array5;

Each of the vectors above is of a different type. The type "vector of int" is not the same as the type "vector of double." The compiler uses the template to "stamp out" each of these types on demand. That is the specialization process. Notice that I can create a vector of any type I like: including the library string type. In fact, I can create a vector of vectors!

In the declaration:

std::vector<int> my_array( 100 );

I'm passing 100 to the vector's constructor using the function style initialization syntax. The constructor allocates enough memory to hold the specified number of (in this case) integers. Since the constructor only takes one parameter, I could have written this declaration using the C-style syntax:

std::vector<int> my_array = 100;

But I think that looks misleading. It makes it look like I'm putting 100 into the vector and that is not what is happening at all. To prevent confusion, I always use the function style syntax when I construct my vectors.

Once the vector has been created, you can use it just like an ordinary array. In particular, you can apply square brackets to it to access its individual elements. This works thanks to an overloaded operator[] in the vector template.

my_array[i] = ...;

Because operator[] for vectors returns a reference to the appropriate element, there is no problem using it on the left-hand side of an equals sign.

Just like ordinary C arrays, if you try to access an element that doesn't exist, you will get undefined behaivor. This allows operator[] to be very fast. In fact, it is reasonable for you to expect accessing a vector element to be essentially as fast as accessing an array element. Vectors are the fastest containers in the C++ library. You should use them unless you have some specific reason to do otherwise.

Okay, but so what?

So far you might wonder what advantage there is to using vectors over C style arrays. I've made it sound as if the two containers are basically identical. In fact, there are many significant advantages to vector. For one thing, like std::string, vectors can grow and shrink dynamically. The simplest way to expand the size of a vector is using the push_back method. Here is how it might look:

#include <string>
#include <vector>

int main( )
{
    std::vector<std::string> text;
      // No size specified. The vector initially has no elements.

    text.push_back( "Hello, World!" );
      // Puts a "Hello, World" at "the back" of the vector. Currently the only element.

    text.push_back( "Goodbye World!" );
      // Puts "Goodbye World!" in the next slot after the "Hello, World!"

    return 0;
}

After the two push_back operations, text[0] contains "Hello, World!" and text[1] contains "Goodbye World!" The "back" of the vector is the end with the highest index. The zeroth element of the vector is called the "front".

Let's put this to work. Here is a program that reads the standard input device a line at a time and then, when it comes to the end of the file, it prints out all the lines in the reverse order.

#include <iostream>
#include <string>
#include <vector>

int main( )
{
    std::vector<std::string> text;
    std::string              line;

    // Read the input a line at a time and store in the vector text.
    while( std::getline( std::cin, line ) ) text.push_back( line );

    // Output the lines in reverse order.
    for( int i = text.size( ) - 1; i >= 0; --i )
        std::cout << text[i] << "\n";

    return 0;
}

The size method returns the number of elements currently in the vector. In this example, each vector element is a std::string so text[i] is being output to std::cout using the operator<< that applies to class string.

Because both vector and string are dynamic, this little program will correctly process a file of any size, containing lines of any length---as long as the system doesn't run out of memory.

You can also change the size of a vector with the resize method. It works the way you might expect.

std::vector<std::string> text;
. . .
text.resize( 1024 );

The argument to resize specifies the new number of elements. If this expands the vector, the new elements will be initialized with the default constructor (meaning empty strings in the case of a vector of strings). If this contracts the vector, the elements that are removed will be destroyed with the destructor. This is just what you want. As always, C++ ensures that objects are initialized and cleaned up in accordance to the methods defined in the object's class.

Vectors hold copies

It is important to realize that vectors, like C arrays, keep copies of the objects they are given. When you push_back a string onto the end of a vector of strings, a copy of that string is made (using the copy constructor), and the copy goes into the vector. If you change the original string, the copy in the vector is not affected. We say that vectors have "value semantics." In fact, all the C++ standard containers have value semantics. It is what C programmers tend to expect, and it is consistent with the very value-oriented approach C and C++ have in other areas as well.

One disadvantage to value semantics is that you can't keep the same object in two different containers. Take a look at this example:

#include <string>
#include <vector>

int main( )
{
    std::string              message;
    std::vector<std::string> text1, text2;  // I'm declaring two vectors here.

    message = "Hello, World!";
    text1.push_back( message );     // A copy is made for the vector.
    text2.push_back( message );     // Another copy is made.

    message = "Goodbye, World!";    // The vector elements are not changed.
    text1[0] = "Whatever!";         // This does not change text2[0].
    return 0;
}

Often the behavior above is exactly what you want. However, there are cases where you want one message to be somehow shared by both vectors so that when it is changed in one place, it automatically changes everywhere else. If vectors acted that way (and they don't), we would say they had "reference semantics".

You might think you could get the sharing effect by creating vectors of references.

std::vector<std::string &> text1, text2;

But you can't do this. References are not objects and thus can't be put in containers (you can't make an array of references either). However, pointers are objects. There is no problem creating a vector of pointers.

std::string                message;
std::vector<std::string *> text1, text2;

message = "Hello, World!";
text1.push_back( &message );    // Store the address of message.
text2.push_back( &message );    // Store the address again.

// There is only one copy of the message. The vectors just hold
// pointers to it.

message = "Goodbye, World!";
  // The message has changed for both vectors.

The problem with this is that you have to be sure message exists for as long as the vectors do. If message gets destroyed before the vectors are destroyed, the vectors will be holding an invalid pointer. Getting this right is error-prone. Nevertheless, in C and C++, which have comprehensive pointers, value semantics are more general. If you need reference semantics (and sometimes you do), you can use pointer values to get it. In other languages without explicit pointers, such as Java, the situation is different. Think about this issue when you learn those languages.

Other vector operations

Let me show you a few other useful vector methods.

#include <string>
#include <vector>

int main()
{
    std::vector<std::string> text1( 128 );
      // Vector of 128 strings. Initially all The strings are constructed
      // by the default constructor in class string. This sets them to
      // valid, empty strings.

    std::string message = "Hello, World!";

    text1.push_back( message ); // Adds a new element to the end of the vector.
    text1.pop_back( );          // Erases the last element in the vector.

    // The pop_back method contracts the vector by one element. It
    // doesn't just set the last element to a blank string.

    text1[10] = "This is text";                // Accesses an element.
    text1.at( 10 ) = "This is different text"; // Accesses an element.

    // The at method does bounds checking and throws an exception if
    // the index is out of range. The operator[] is likely to be faster.

    std::vector<std::string> text2;

    text2 = text1;
      // You can assign (and copy) one vector to another. All the elements
      // are copied just like you would want. The destination vector is
      // made to be the same size as the source vector.

    if( text1 == text2 ) {
      ...
    }
      // You can compare two vectors with ==. The result is true only if
      // the vectors are the same size and if == returns true when
      // applied to all the corresponding elements.

    std::swap(text1, text2);
      // This exchanges the contents of the two vectors. There is no
      // problem if they have different sizes.

    return 0;
}

The swap method is surprisingly fast. It works by swapping pointers inside the two vectors. The actual elements in the vectors are not copied. As a result, it runs quickly and in a time that is independent of the vector's sizes. Compare:

std::vector<std::string> text1, text2;

std::swap( text1, text2 );
  // Even if the vectors have millions of elements this is almost
  // instantaneous.

std::vector<std::string> temp = text1;
text1 = text2;
text2 = temp;
  // This also swaps text1 and text2 but it is FAR slower since it has
  // to copy vector elements around.

What is the lesson to be learned here? Use the library functions if you can. If you try to write out an operation yourself, it is likely to be much less efficient. This is a good general rule to follow everywhere in the standard template library.

There are also insert and erase methods that you can apply to a vector. The insert methods allow you to inject a new element into the middle of a vector. They expand the vector accordingly. The erase methods allow you to remove an element from the vector and contract the vector accordingly. These operations move vector elements around, and they can be time-consuming to execute on long vectors. Using these methods also requires that you know something about iterators, and we aren't ready to talk about that subject yet.

However, it is useful to know that there is a clear method that erases the entire vector.

text.clear( );
  // After this, text has no elements.

Do I ever need arrays?

Just as std::string largely replaces the need for C-style arrays of characters, std::vector largely replaces the need for C-style arrays of other types. In general, if you are writing a C++ program and find yourself declaring an array, ask yourself: why not use a vector? Unlike C-style arrays, vectors are dynamic and (just as with std::strings) they manage their memory automatically for you. Furthermore, a good implementation of the standard library should provide you with vectors that are nearly as fast as C-style arrays. You should not have to suffer a significant performance penalty for using vectors.

Summary

A template is a class (or a function) where the compiler can later fill in some types involved. For example, the vector template allows you to specify the type of object in the vector when you declare the vector. The compiler uses this information to specialize the template when it compiles your program.

A container is an object that holds other objects. The C-style array is a container, although a simple one. The C++ standard library contains several, more elaborate containers all implemented as templates (so they can contain any type of object you wish).
The C++ standard vector is a replacement for C-style arrays. It allows you to access individual elements using the [] operator just like arrays. However, unlike arrays, vectors also support the ability to grow and shrink dynamically. Vectors can also be assigned and compared in their entirety using the usual operators.
Vectors, and for that matter all the standard C++ containers, have value semantics. This means that they hold copies of the objects you place in them. After you have added an object to a vector, you can change (or destroy) the original object without affecting the copy in the vector. A container with reference semantics would not own the objects it contains but rather just have some way of getting at the objects. Such containers would allow you to effectively share objects between containers. Although C++ standard containers don't have reference semantics, you can get much the same effect, when you need it, by creating a container of pointers.