Lesson #4

The Concept of a Class

Overview

In this lesson I will cover the following topics:

The syntax of a class definition and how to access class members and methods.
Using a class to hide information and why that is important.
How to organize the source files containing a class.

Body

What is a class?

Much of what C++ programmers do is define and use classes. Understanding what classes are, and the basic syntax involved in using them, is of the utmost importance. This information also applies to some extent to other object-oriented languages that are inspired by C++. For example, Java's concept of a class is very similar to that of C++.

A class is basically a new type. C++ has the same built-in types as C: int, long, double, etc. Also like C, there are pointer types and array types ("pointer to char" or "array of 10 double"). However, C++ allows you to define entirely new types using classes. Thus, much C++ programming revolves around creating libraries of types called class libraries. C actually has a similar facility, although not nearly as advanced, in its structures. C++ classes are essentially just greatly enhanced structures.

Why would you want to create your own types? Why aren't the built-in types good enough? The built-in types are quite primitive because they are close to the machine. They represent the types of data the computer is good at manipulating. The built-in types are very efficient because of this. However, the built-in types might not have much to do with the issues of your application. It is much easier to write your program in terms of abstract types that define concepts that make sense to your problem. For example, you might define a class that represents a window. Then you could declare objects to be windows in your program as easily as you declare integers.

int    i, j, k;			// Three integers.
window primary, help, output;	// Three windows.

There is no window type in C++, but you can create your own---with suitable behavior---using classes.

As another example, suppose you wrote a program that needed to keep track of large quantities of money. The problem with money is that you need to be accurate to one cent while still being able to handle billions of dollars. The need for accuracy requires an integral type, but the required range overflows 32-bit integers. There are various ways you can handle this, but the cleanest is to create a new type, "Money", that deals with those issues internally. Then you can just declare and use Money objects in your program.

Information Hiding

Another important aspect of classes is information hiding. The internal structure of your new types should be hidden from the user of those types. For example, when you use int you don't (normally) have to worry about how integers are stored in memory. You just perform operations on them, like addition and subtraction, and you let the compiler worry about how that works. If you move your program to a machine that stores integers differently, the compiler on that system will translate your addition and subtraction operations appropriately so that they have the same meaning.

It is the same with user-defined types. The internal workings of a class should be hidden from the user of the class so that the internal workings can be changed without affecting the programs that use that class. Such changes are often necessary to fix bugs, improve performance, or add features. By hiding the internal structure of a class, such changes can be made without introducing incompatibilities. This is a very important idea.

Sounds great. So how does it work?

In this lesson I will show you the basic syntax of defining and using classes. In the next lesson I will discuss in more detail the issues around creating abstract types.

Classes have two parts: a definition and an implementation. Here is a possible definition for a class that represents money. This definition isn't complete. I'll expand on it as I go.

class Money {
private:
    long dollars;
    int  cents;
};

Basically a class is like a plain C structure. It has a name (Money) and it has members. In the example above the members are dollars and cents. Each member has its own type. I'm using long integers to represent dollars because on some machines regular integers are only 16 bits and that would only allow me to handle, at most, $32,767. Long integers are at least 32 bits and will therefore handle over two billion dollars portably (i.e., on all platforms).

The two members I've defined so far are private members. What this means is that users of class Money can't access them. For example:

int main( )
{
    Money  account_balance;

    // Try to set account_balance to $173.53
    account_balance.dollars = 173;  // Error! Member 'dollars' is private.
    account_balance.cents   = 53;   // Error! Member 'cents' is private.

    // etc...
}

The lines marked with "Error" are wrong because I can't access the private members there. My use of the dot operator makes sense because the members of a class are like the members of a structure. However, in this case, the members are private to the class and can't be touched by the class's users. This is information hiding. A future version of Money might represent money differently. The members dollars and cents might not exist in the future. A program that tries to use them now would be incompatible with such a change. To prevent such incompatibilities, I keep programs from "knowing" about the existence of those members by declaring them as private in the class definition.

Notice how I was able to declare account_balance to be of type Money in a natural way. This is because every class is like a new type. Once the compiler has seen the class definition it understands Money to be a type just like int or char is a type.

This is fine, but if dollars and cents are private, then how can they be given values? The answer is with methods.

class Money {
private:
    long dollars;
    int  cents;

public:
    void set( long D, int C );
};

This version of Money defines a public method named set. The set method takes two parameters and returns nothing. Because it is public, it can be used by anyone. Here is how:

int main( )
{
    Money  account_balance;

    account_balance.set( 173, 53 );
    // etc...
}

Here I apply the set operation to the account_balance object using the dot operator. Since set takes two parameters, I have to give it arguments. In my example, I give it 173 for D and 53 for C. Notice that I don't do this:

Money.set(173, 53);
  // Error! I can't apply an operation to a type; only to an object.

If I had multiple Money objects in my program I could apply set to each one independently.

int main( )
{
    Money account_balance;
    Money deposit;

    account_balance.set(173, 53);
    deposit.set(20, 97);
    // etc...
}

Each object is independent and has its own copy of the internal members. Setting the account_balance object does not in any way modify the values stored in deposit.

It might seem silly to have a set function that just sets the members. Why not make dollars and cents public members (you could do this) and access them directly? Yet doing things with private members is better. If the internal structure of Money changes, the set method could be modified so that it still works the same way. Thus programs that use set would still be correct while programs that directly accessed the members would be incompatible.

How might the set function be written? Here is how:

void Money::set( long D, int C )
{
    dollars = D;
    cents   = C;
}

This is the implementation of Money's set method. You can tell this is a method of class Money because of the Money:: in front of its name. Other than that, it looks like any other function. It has a return type mentioned first, and then the parameters are declared later.

All this method does is copy its parameters, D and C, into the members dollars and cents. It can access the private members because it is a method. Only methods have access to the private section of a class. Yet this definition still looks confusing. Whose dollars and cents are we updating here? When the caller does:

account_balance.set( 173, 53 );
deposit.set( 20, 97 );

the first time we want to set the dollars and cents members inside account_balance, and the second time we want to set the dollars and cents members inside deposit. This is, in fact, exactly what happens. When you invoke a method on a particular object, the compiler passes the address of that object into the method as a hidden parameter. When you define a method, the compiler understands that there is an extra pointer parameter being given to it that you did not write. Inside the method any time you use a member name, the compiler accesses that member via the pointer it was given. In other words the calls to set above are sort of like this:

Money__set( &account_balance, 173, 53 );
Money__set( &deposit, 20, 97 );

and the implementation of the method is sort of like this:

void Money__set( Money *this, long D, int C )
{
    this->dollars = D;
    this->cents   = C;
}

In fact, this is exactly the way most C++ compilers process methods, although the precise manner in which the methods are ultimately named is unspecified by the C++ standard. If you were trying to do this with plain C, this is how you would set it up too. C++ just makes it easier.

Organizing files

I haven't been too specific yet on just where a class's definition and implementation should go. Now I want to talk about that.

Typically the definition of a class goes into a header file. Although I could put Money into the global namespace, I will use a vtsu (Vermont State University) namespace for my examples. When writing C++ programs, it is generally a good idea to organize your code into various namespaces of your own creation. Thus:

// ----- Money.hpp -----

namespace vtsu {

    class Money {
    private:
        long dollars;
        int  cents;

    public:
        void set( long D, int C );
    };

}

Next I need to create a file to hold the implementations of the methods. I will call it Money.cpp.

// ----- Money.cpp -----

#include "Money.hpp"
  // Include the header file so that the compiler knows what "vtsu::Money" is.

namespace vtsu {

    void Money::set( long D, int C )
    {
        dollars = D;
        cents   = C;
    }

}

Inside namespace vtsu, I don't need to put vtsu:: in front of Money. The compiler understands that it should look in the vtsu namespace to resolve names because that function is already inside that namespace. This is like using a relative path on a filename.

Here is an example of how I might use my money class. I'll put this in a file called prog.cpp.

// ----- prog.cpp -----

#include "Money.hpp"
  // Include the header file so that the compiler knows what "vtsu::Money" is.

int main( )
{
    vtsu::Money account_balance;
      // Declare an object of type vtsu::Money.

    account_balance.set( 173, 53 );
      // ... and give it a value.

    // etc...
}

I don't need any namespace qualifiers on account_balance because it's a local object, or on set because the compiler can see that it's a member of vtsu::Money. In practice, the namespace qualifiers are really not much of a problem. They usually only appear on the type names used to declare objects. Of course, I could have also put a using namespace vtsu at the top of my program instead. Then I wouldn't have needed to use the vtsu:: qualifier anywhere.

This program consists of three files:

Money.hpp: The file containing the definition of class vtsu::Money.
Money.cpp: The file containing the implementation of class vtsu::Money (or more specifically, the implementation of that class's methods).
prog.cpp: An application program that wishes to use vtsu::Money.

In a more realistic situation the application program would be huge and consist of hundreds of source files. Any source file that needed to use vtsu::Money would just #include "Money.hpp" and be on its way. To compile this small program, I would use a command such as

$ g++ -o prog prog.cpp Money.cpp

This instructs g++ to compile prog.cpp and Money.cpp and put the combined result into the executable file named prog. There is no need to compile Money.hpp. That file is #included into the other two files as needed.

Changing the implementation

But wait! Suppose our system has 64-bit long integers. Then I can represent Money a bit differently. Instead of using two integers, I can just use a single long integer to hold the total number of cents. This isn't feasible on a 32-bit machine because 2 billion cents (roughly the largest value a 32-bit integer can hold) is only about 20 million dollars, and that's not enough. Nevertheless, on a 64-bit system I can use this version:

class Money {
private:
    long total_cents;

public:
    void set( long D, int C );
};

Notice how I have not changed the public method in any way. However, I do have to change how set works to take into account the modified internal structure.

void Money::set( long D, int C )
{
    total_cents = ( 100 * D ) + C;
}

Since I have not changed either the declarations or the meanings of the public methods, application programs that use Money will still work correctly. They just need to be recompiled. This is what information hiding is all about.

Other operations

My current definition of Money only allows me to give it a value. That is pretty limited. Since the users have no direct access to the members, I, as the class designer, need to create methods for all operations I desire to support. This requires some thought. Creating class libraries is about specifying the operations to be supported by each class type, and then figuring out how to implement them effectively. Typically, the private section of a class is left blank while the public methods are being specified. In fact, to emphasize the idea that the private section is secondary, many programmers put the public section first in the class definition.

I will show a more detailed example of how this looks in the next lesson.

Summary

Classes are a lot like structures. Each class defines a new type and has data members that are accessed with the '.' operator. Classes can also have methods. A method is a function that is, conceptually, a member of the class (in fact, they are often called member functions in the C++ community). The methods have names that include the class name followed by a '::'. However, methods are usually invoked on a particular object using the same '.' (or '->') syntax as is used for data members.

Note: In C++, structures can also have methods. The only real difference between a class and a structure is that a structure's members are all public by default. A class's members are all private by default.

Normally only the methods of a class can access the private members. This means that you can change the internal structure of a class and only have to update the methods. You should not have to update any of the programs that use the class since those programs never directly access the members. This is an example of information hiding, and it is a very important concept.

Normally classes are defined in header files so that any program that wants to use a class can #include an appropriate header. The method bodies are normally defined in a separate .cpp file that is compiled once and linked into the final executable.