Lesson #26

Structures: Defining, Accessing, Initializing

Overview

In this lesson I will cover the following topics:

What a structure is.
Accessing structure members.
Initializing structures.

Body

What is a structure?

Except for the slight digression about file I/O, the last several lessons have really all been about arrays. It is true that a lot of what we've covered has been about pointers, but in C arrays and pointers are very closely related. We are now ready to talk about the other major way that you can group related data items: structures.

Let me first review the essential aspects of an array.

All the items in an array have the same type.
Although the array has a name, each item by itself does not. Instead you access the array's items using an integer index.
The items are called "elements."

A structure is basically the opposite of this.

The items in a structure can have different types.
Each item in a structure has its own name that you can use to refer to that item.
The items are called "members."

Before you can use any structures in your program you need to tell the compiler what they are going to contain. The "definition" of a structure tells the compiler what that structure's members are going to be and assigns a name to the structure overall. Once the compiler has seen the definition of a particular type of structure you can then start creating variables of that structure type. Here is an example.

#include <string.h>   // Needed for strcpy.

/* This structure type holds together all the interesting bits of
   information about a customer account. */
struct account_information {
  char last_name[32];        // The customer's last name as a string.
  char first_name[32];       // The customer's first name as a string.
  int  account_number;       // A unique account number.
  int  account_balance;      // The account balance in cents.
  int  preferred_customer;   // =1 if this is a preferred customer.
};

void f(void)
{
  // Declare a variable of type "struct account_information".
  struct account_information my_account;

  // Give that variable a value.
  strcpy(my_account.last_name, "Chapin");
  strcpy(my_account.first_name, "Peter");
  my_account.account_number     = 12345678;
  my_account.account_balance    = 0;
  my_account.preferred_customer = 1;

  ...

The definition of structure account_information is first. When the compiler sees that definition it merely takes note of the details. No memory is allocated because there are no account_information variables created (yet). My example structure is intended to contain all the information pertaining to a customer account. Each member is like a field of a database.

Notice that the members of account_information each have their own name. Notice also how they can be of different types. In fact, members can even be arrays. You can put any type of variable into a structure: even other structures. This makes it possible to define rather complicated variables that can hold large quantities of data in a well organized manner. The whole point of structures is to make it easy for you to keep related data in one place. Your program will be much easier to understand if all pieces of information that go together are, in fact, "under the same roof".

In function f I create a variable of type struct account_information. Each time you define a structure you are effectively introducing a new type into the language. Just as you can declare variables of type int or double, you can also declare variables to be of any structure type that has been defined. The variable my_account actually contains its own copy of the structure members. If I had declared other variables of type struct account_information they would have had independent copies of the members. In other words, the first_name member of my_account could hold a different value than the first_name member of some other struct account_information.
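
To make the idea of independent copies concrete, here is a minimal sketch. The function name copies_are_independent and the sample values are hypothetical; they are there only so the result can be checked.

```c
#include <string.h>

struct account_information {
  char last_name[32];
  char first_name[32];
  int  account_number;
  int  account_balance;
  int  preferred_customer;
};

// Hypothetical helper: returns 1 if changing one variable's members
// leaves the other variable's members untouched.
int copies_are_independent(void)
{
  struct account_information a;
  struct account_information b;

  strcpy(a.first_name, "Peter");
  strcpy(b.first_name, "Jill");
  a.account_balance = 100;
  b.account_balance = 250;

  // Change a's members. b has its own copies, so it is not affected.
  strcpy(a.first_name, "Pete");
  a.account_balance = 0;

  return b.account_balance == 250 && strcmp(b.first_name, "Jill") == 0;
}
```

Calling copies_are_independent() returns 1 because a and b each occupy their own memory.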

After I declare my_account I write some statements that put values into my_account's members. Notice how I use a dot ('.') to separate the name of the structure variable from the name of the member. The combined name refers to that particular member of that particular structure variable. For example, my_account.account_balance represents the value of the account_balance member of the my_account variable. It is an integer. The '.' operator is often called the "member selection operator". Notice that when you use the '.' operator you must provide it with the name of a structure variable on the left. Do not provide it with the name of a structure type! The following is illegal:

account_information.account_balance = 1;

It doesn't make sense. Just where is that value of 1 being put? There is no variable named account_information. That is the name of a structure type. The statement above would be similar to something like int = 1 and hopefully you can see right away the problem with that.

In my example above, last_name and first_name are names of arrays. Thus the expression my_account.last_name is the name of an array without an index. The compiler regards this, as usual, as a pointer to the first element of the array. The fact that this array happens to be a member of the my_account variable does not change that. Thus the statement

strcpy(my_account.last_name, "Chapin");

is perfectly reasonable. It copies the literal string "Chapin" into the last_name array that is a member of the my_account variable.
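
One caution: strcpy does not check that the destination array is large enough. The following is a sketch of one way to guard against that, assuming you are willing to truncate names that are too long. The helper copy_name and the function demo_long_name are hypothetical names, not part of any library.

```c
#include <stddef.h>
#include <string.h>

struct account_information {
  char last_name[32];
  char first_name[32];
  int  account_number;
  int  account_balance;
  int  preferred_customer;
};

// Hypothetical helper: copy source into destination (an array of the
// given size), truncating if necessary and always null terminating.
void copy_name(char *destination, size_t size, const char *source)
{
  if (size == 0) return;
  strncpy(destination, source, size - 1);
  destination[size - 1] = '\0';
}

// Returns the length of the name actually stored.
size_t demo_long_name(void)
{
  struct account_information my_account;

  // my_account.last_name is treated as a pointer to its first element when
  // passed to the function, but sizeof still sees the whole 32 byte array.
  copy_name(my_account.last_name, sizeof my_account.last_name,
            "A-last-name-that-is-much-too-long-to-fit-in-thirty-two-bytes");
  return strlen(my_account.last_name);
}
```

Here the stored name is cut off at 31 characters, leaving room for the null character in the 32 byte array.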

Structure memory layout

It might be helpful in your understanding of structures to know something about how the compiler organizes them in memory. Below is a diagram showing how the my_account variable might be laid out. Keep in mind that the precise sizes of the members depend on the system you are using, but the sizes I indicate below are typical.

Offset
        +-----+--
 0      |     | Start of the last_name array of characters
        +-----+
        |     |

          ...

        |     |
        +-----+--
32      |     | Start of the first_name array of characters.
        +-----+
        |     |

          ...

        |     |
        +-----+--
64      |     | Start of the account_number integer.
        +-----+
        |     |
        +-----+
        |     |
        +-----+
        |     |
        +-----+--
68      |     | Start of the account_balance integer.
        +-----+
        |     |
        +-----+
        |     |
        +-----+
        |     |
        +-----+--
72      |     | Start of the preferred_customer integer.
        +-----+
        |     |
        +-----+
        |     |
        +-----+
        |     |
        +-----+

The my_account variable requires 76 bytes of memory. The first 32 bytes are set aside for the last_name array. The next 32 bytes are set aside for the first_name array. The next twelve bytes are set aside for the three integers account_number, account_balance, and preferred_customer. To make things clearer I put the offset into the structure down the side of the picture. Although I don't know the actual memory address that will be used for my_account, I can say that the first_name array, for example, will be at an address 32 bytes past the address of the beginning of the structure.

Actually, it is not always clear how the compiler will lay out your structures. The ANSI C standard does require certain things. The first member will definitely be at offset zero. The members will be laid out in the same order in which you declare them. However, the compiler is allowed to put "holes" (usually called padding) between members or at the end of your structure if it wants. It might do this to cause certain members to line up at certain offsets. Some processors can access data faster if it is aligned in this manner.
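
If you are curious what your own compiler does, the standard offsetof macro (from <stddef.h>) and the sizeof operator will tell you. The sketch below prints the layout on whatever system it is compiled for. The structure padded_example and the function print_layout are hypothetical names used only for illustration, and the numbers printed will vary from system to system.

```c
#include <stddef.h>   // offsetof, size_t
#include <stdio.h>

struct account_information {
  char last_name[32];
  char first_name[32];
  int  account_number;
  int  account_balance;
  int  preferred_customer;
};

// Hypothetical structure that is likely (but not certain) to contain a hole.
struct padded_example {
  char flag;    // One byte.
  int  value;   // Often placed at an offset that is a multiple of sizeof(int).
};

// Print where the compiler actually put each member on this system.
void print_layout(void)
{
  printf("last_name          at offset %zu\n",
         offsetof(struct account_information, last_name));
  printf("first_name         at offset %zu\n",
         offsetof(struct account_information, first_name));
  printf("account_number     at offset %zu\n",
         offsetof(struct account_information, account_number));
  printf("account_balance    at offset %zu\n",
         offsetof(struct account_information, account_balance));
  printf("preferred_customer at offset %zu\n",
         offsetof(struct account_information, preferred_customer));
  printf("whole structure    is %zu bytes\n",
         sizeof(struct account_information));

  printf("padded_example: value at offset %zu, whole structure is %zu bytes\n",
         offsetof(struct padded_example, value),
         sizeof(struct padded_example));
}
```

If the offset of value in padded_example comes out larger than 1, the bytes in between are a hole the compiler inserted.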

For most applications you don't really need to worry about structure layout. I'm presenting it here mostly just to give you a better idea of what a structure is. Remember: every variable of a structure type has its own copy of the members. If I declared another variable of type struct account_information, that other variable would also take 76 bytes and have its own memory areas for last_name, first_name, and so forth.

Structures are "first class" variables

Arrays are not really "first class" variables in the C language. You can't compare two arrays with the == operator and you can't assign one array to another with the assignment operator. Structures, however, fare better.

struct account_information first, second;
  // Declare two account_information structures.

...

first = second;
  // Copy all the members of second into first.

You can assign one structure to another and it does what you would expect. It copies all the members of the source structure into the corresponding members of the destination structure. Notice that in this case struct account_information contains arrays as two of its members. This does not cause a problem. The arrays are copied along with all the other members as desired. Thus even though you can't do first.last_name = second.last_name to copy the last_name member only (you would have to use strcpy), you can use the assignment operator to copy the entire structure just fine.
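
Here is a small sketch showing that the member arrays really are copied by the assignment. The function name assignment_copies_arrays and the sample values are hypothetical.

```c
#include <string.h>

struct account_information {
  char last_name[32];
  char first_name[32];
  int  account_number;
  int  account_balance;
  int  preferred_customer;
};

// Hypothetical helper: returns 1 if first received its own copy of
// everything in second, including the arrays.
int assignment_copies_arrays(void)
{
  struct account_information first, second;

  strcpy(second.last_name, "Chapin");
  strcpy(second.first_name, "Peter");
  second.account_number     = 12345678;
  second.account_balance    = 0;
  second.preferred_customer = 1;

  first = second;   // Copy all the members of second into first.

  // Changing second afterward does not disturb the copy held in first.
  strcpy(second.last_name, "Smith");

  return strcmp(first.last_name, "Chapin") == 0 &&
         strcmp(first.first_name, "Peter") == 0 &&
         first.account_number == 12345678;
}
```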

You might also think that you can use the == operator to compare two structures.

if (first == second) {
  // The two account informations are the same.
}

However, that doesn't work. You might guess that such an operation would be true only if all the members are the same. However, the compiler doesn't assume that meaning. Instead, attempts to compare structures with the usual comparison operators are errors. If you want to do something like this, you would have to write a function that compares the members one at a time.
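
A comparison function for struct account_information might look like the sketch below. The name accounts_equal is hypothetical, and the choice to treat two accounts as equal only when every member matches is an assumption; your program might care about the account number alone.

```c
#include <string.h>

struct account_information {
  char last_name[32];
  char first_name[32];
  int  account_number;
  int  account_balance;
  int  preferred_customer;
};

// Hypothetical helper: returns 1 if every member of a matches the
// corresponding member of b, and 0 otherwise.
int accounts_equal(struct account_information a, struct account_information b)
{
  return strcmp(a.last_name,  b.last_name)  == 0 &&   // Compare as strings.
         strcmp(a.first_name, b.first_name) == 0 &&
         a.account_number     == b.account_number  &&
         a.account_balance    == b.account_balance &&
         a.preferred_customer == b.preferred_customer;
}
```

With this in place you can write if (accounts_equal(first, second)) { ... }. Comparing the raw bytes with memcmp is tempting but unreliable, because any holes in the structure and the unused bytes after each string's null character may hold anything.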

I will say more about how structures work with functions in the next lesson.

More accessing information

Since structures can contain arrays you might be wondering how you would refer to a single element of a member array. Here is an example.

struct account_information my_account;

...

if (my_account.first_name[0] == 'P') { ... }

The expression my_account.first_name[0] refers to the zeroth character of the first_name array inside the my_account variable. To understand why this is the correct interpretation, consult an operator precedence chart. Notice that structure member reference and array element reference have the same precedence. Notice also that they associate from left to right. Thus without any additional parentheses the expression becomes

(my_account.first_name)[0]

The first_name member of my_account is accessed and then the array reference is made. This is just what you would want. My point here is that structure access can be analyzed using the same rules as any other operation. Let's take a look at another example.

struct time {
  int seconds;
  int minutes;
  int hours;
};

struct example {
  char *p;
  struct time now;
};

struct example var;

This example first defines a struct time that holds the data required to represent time as three integer members. It next defines a silly structure example that contains a pointer, and a variable of type struct time. Finally it creates a variable of type struct example named var. Now consider the following statements.

*var.p = 'x';

This is legal. The member selection operator has higher precedence than the indirection operator. As a result the above is really

*(var.p) = 'x';

Since the p member of var is a pointer, it makes sense to apply the indirection operator to that pointer. This statement stores the character 'x' into the memory location pointed at by the p member of var. I certainly hope that pointer has been given a meaningful value first!
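
Here is a sketch of what giving the pointer a meaningful value first might look like. The function name store_through_member and the local array buffer are hypothetical.

```c
struct time {
  int seconds;
  int minutes;
  int hours;
};

struct example {
  char *p;
  struct time now;
};

// Hypothetical helper: returns 1 if the stores through var.p landed
// where expected.
int store_through_member(void)
{
  char buffer[4] = "abc";
  struct example var;

  var.p = buffer;       // Point var.p at buffer[0].
  *var.p = 'x';         // Same as *(var.p) = 'x'; buffer is now "xbc".

  var.p = &buffer[2];   // Point var.p at buffer[2].
  *var.p = 'z';         // buffer is now "xbz".

  return buffer[0] == 'x' && buffer[1] == 'b' && buffer[2] == 'z';
}
```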

This is also legal.

if (var.now.seconds == 0) { ... }

The member selection operator associates from left to right so the above is really

if ((var.now).seconds == 0) { ... }

But var.now is a variable of type struct time. Thus it makes sense to access the seconds member of that variable. Since the seconds member is an integer, it makes sense to compare it to zero.
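
As a slightly larger sketch of nested member access, the hypothetical function below converts the time stored in a struct example into a count of seconds. It assumes the hours member counts hours since midnight.

```c
struct time {
  int seconds;
  int minutes;
  int hours;
};

struct example {
  char *p;
  struct time now;
};

// Hypothetical helper: each var.now.xxx expression selects the now member
// of var first and then a member of that struct time.
int seconds_since_midnight(struct example var)
{
  return (var.now.hours * 60 + var.now.minutes) * 60 + var.now.seconds;
}
```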

Initializing structures

Just as you can initialize an integer variable or an array when you declare it, you can also initialize a structure. The syntax is very similar to the array initialization syntax. Here is an example using struct account_information again.

struct account_information my_account = {
  "Chapin", "Peter", 12345678, 0, 1
};

As with arrays, the initializers for the members go between braces. The difference is that with a structure each initializer is potentially of a different type. Notice that the initializers must come in the same order as that in which the members were declared. This example initializes my_account to the same values that were put into my_account in my very first example. Notice also that I'm using the special method of initializing character arrays here. I'm using a string literal to initialize the last_name and first_name members. This saves me from having to write

struct account_information my_account = {
  { 'C', 'h', 'a', 'p', 'i', 'n', '\0' },
  { 'P', 'e', 't', 'e', 'r', '\0' },
  12345678, 0, 1
};

However, the above is also legal. You can initialize a member array by using a brace-enclosed list nested inside the braces used to initialize the structure itself. You can use the same method to initialize a member that is also a structure.
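
For instance, the struct example type from earlier could be initialized as in the sketch below. The variable names var and partial and the particular values are hypothetical.

```c
#include <stddef.h>   // NULL

struct time {
  int seconds;
  int minutes;
  int hours;
};

struct example {
  char *p;
  struct time now;
};

// The inner braces initialize the now member. Its initializers come in the
// order the members of struct time were declared: seconds, minutes, hours.
struct example var = { NULL, { 30, 15, 9 } };

// If you supply fewer initializers than there are members, the remaining
// members are set to zero. Here minutes and hours both start at 0.
struct example partial = { NULL, { 30 } };
```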

Summary

A structure variable contains many other variables, called members, inside of it. The members each have their own name and type. To use structures, you first have to provide the compiler with a structure definition. Then you can declare variables to be of that structure type. Each structure variable you declare has its own independent copy of the members.
You access the members of a structure using the '.' operator. To the left of the dot you put the name of the structure variable. To the right of the dot you put the name of the member inside the structure variable that you wish to access. If the member is an array or another structure you can apply the [] operator or the '.' operator to that member as well.
Structures can be initialized using a brace-enclosed list of initializers, very much like the way arrays are initialized.