Unix Overview


Table of Contents

1. Basic Unix Concepts
Introduction
Terminals, Hosts, and All That
Logging In
Special Characters
The File System
Process Tree
Users and Groups
File Permissions
The Superuser
Set-UID and Set-GID Programs
2. Basic Unix Commands
On-Line Documentation. The Unix Manual
A Tour of the File System
Conventions
File Manipulation
Directory Manipulation
Security
Communicating with Users
Programming
Manipulating Text Files
3. The Bourne Shell
Command Line vs Shell Scripts
Shell and Environment Variables
Quoting
Wildcards
I/O Redirection
Multiple Commands, Sub-commands and Background Commands
Control Flow
Arithmetic
Shell Functions
Examples

List of Tables

1.1. Special Control Characters
1.2. Directory Permissions
2.1. Unix Manual Sections
3.1. Predefined Shell Variables

A legacy Unix system assumes it will be supporting many users via serial terminals. Such terminals are normally dumb devices that provide a text mode display and a simple keyboard. A Unix system may have many such devices connected to it simultaneously. Some of the connections may be direct serial connections while other connections may be via modems.

Since Unix supports dumb terminals, the standard Unix utility programs are strictly command line oriented. They do not use graphics or full screen menus. Furthermore, they typically do not understand "function keys" or other extended keys since such keys are not available on all terminals. In addition most of the standard Unix utility programs are not very wordy. In part this is the Unix tradition, but in an environment where slow serial connections are being used, it also helps make the system feel more responsive.

The rather drab user interface presented by the standard Unix utilities is often seen as a negative aspect of using Unix. On the positive side, however, the utilities can be very quick to use once you know them, and they will work from almost any kind of terminal. In fact, some special "escape sequences" have been standardized to allow programs to do rudimentary graphics on a wide variety of terminal devices. Such programs work as well for dial in users as for direct users.

More complex terminal based applications typically consult a database of terminal characteristics (termcap or terminfo) stored on the system so that they can take maximum advantage of whatever features the terminal supports. Many terminals do have function keys and arrow keys, for example, and some applications will be able to use them if they are available.

It is important to understand the difference between a Unix system supporting multiple terminal sessions and a file server on a local area network. When a file server is used, the workstations are fully functional computers. The central machine merely serves as a glorified disk. All the programs one runs actually execute on the workstations. In contrast, with a Unix system supporting terminal sessions, the terminals are dumb. They do not run any of the programs. All the programs are run on the one central machine. Only the screen output and the keyboard input are exchanged with the terminals.

To make matters more confusing, Unix systems can also be run as file servers in addition to their traditional role of supporting terminals. A common file serving protocol used in the Unix world is called the Network File System (NFS). However, Unix software exists that allows a Unix machine to appear as a Windows server in a network of Windows clients as well.

Many people today use personal computers as if they were terminals. Thus most people who connect to a serial host like a Unix system do so by running a terminal emulator program on their machine. There are several advantages to using a personal computer as a terminal rather than using a real terminal. First, terminal emulator programs can typically emulate several kinds of terminals. Also terminal emulator programs usually have facilities for recording the data sent to or received from the serial host, and for transferring files. In addition, some terminal programs allow you to interact with several different hosts at once (or the same host as several different users).

Of course most Unix machines have a keyboard and monitor connected directly to them. This is what a personal computer user would expect. However, the traditional multi-user serial host does not normally need any kind of built in keyboard or monitor. All interaction done with such a system is done via a dumb terminal. The advantage of using a display device that's directly connected to the machine is that it offers very high speed display capability. In general, however, only one person can use it. Nevertheless, when high speed displays exist, real graphics become possible. In contrast, graphics cannot normally be done with slow serial connections. There's just too much information to transfer to the display.

Some time ago, MIT devised a standard way of sending graphics information to specialized "smart" terminals. Instead of sending data about every bit to draw, programs send graphics commands. In essence, these graphics commands are glorified escape sequences. Of course, the terminal has to be prepared to interpret these commands by running software called an "X server". As a result the terminal requires a powerful CPU of its own with lots of memory. The MIT system is called the "X Window System". It has become a standard in the Unix environment.

If you have an X terminal, your terminal can understand the graphics information sent to it via "X client programs" running on the host. The X terminal correctly handles the output from many client programs at once. Thus you can run multiple client programs on your host and display their output simultaneously on your X terminal. In fact, if your X terminal is on a network, you can run client programs on several hosts at once and display the output of all of them on the same X terminal. Hosts that support the X Window System have a client program that will act like a simple serial terminal. Thus you can still run old style programs on your fancy X terminal simply by establishing one or more of these pseudo terminal sessions.

Just as personal computers can run serial terminal emulators, you can also buy X server software for personal computers. X servers are available for Windows, MacOS, and OS/2. Some are commercial products, but free X servers can also be downloaded.

As with any multi-user system, Unix requires that you log in before you can do anything. When you log in, the login program starts a "shell program" for you. It's this shell program that prints the prompt and interprets your commands. The login program also sets your working directory to your home directory. Each user's home directory might be named something like /home/username where "username" is that user's username.

Although it's not necessary, it is traditional to make a user's home directory have the same name as the user. However the exact location of the home directory structure varies from system to system. On some systems it might be under the /usr directory (this is a rather old-fashioned configuration). On other systems it might be under /home. On very large systems, there might be a directory for each letter of the alphabet with home directories under each as appropriate. For example, jjones's home directory might be /users/j/jjones. This approach keeps the number of home directories in any one directory relatively small.

Also when you log in, your shell program will execute one or more scripts of commands. The traditional Bourne Shell first runs the script /etc/profile. Then it runs the script .profile located in your home directory. The system administrator can set global parameters in /etc/profile, and you can customize your environment by editing .profile.
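
For example, a simple .profile might look something like this (a sketch; the particular settings are only illustrations):

# $HOME/.profile -- read by the Bourne Shell at login.
PATH=$PATH:$HOME/bin    # Add a personal program directory to the search path.
EDITOR=vi               # A preferred editor, for programs that want one.
export PATH EDITOR
umask 027               # New files get no public access by default.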

It is important to understand that multiple shell programs (or command processors) exist and are in wide usage. On Linux systems the "Bourne Again Shell" (bash) is commonly the default. It is largely upwardly compatible with the traditional Bourne Shell. However the "C Shell" (csh) and "Korn Shell" (ksh) have significant user communities. The commands accepted by these shells are mostly the same since most commands are actually separate programs that are executed by the shell. However, many details, such as the location(s) of the log in scripts, vary from shell to shell.

When you are done using the system, you must log out. Type exit at the shell prompt to terminate your shell program. In general if you forget to log out, the next person who uses your terminal will find that you are still logged in. They will have total access to your files.

In more modern environments it is common to interact with a Unix system by way of a "Secure Shell" (SSH) connection. SSH refers to a network protocol that encrypts the data between the client program (a terminal emulator that "speaks" the SSH protocol) and the SSH server on the Unix host. Many different SSH client programs exist and each has its own way of logging in and its own manner of configuration. However, once you have logged into the Unix host, your experience of using the system is largely the same as I described above. You interact with a shell program such as bash, typing commands that execute on the Unix host.
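
For example, using the widely available OpenSSH client (the username and hostname here are hypothetical):

$ ssh jjones@unix.example.edu
jjones@unix.example.edu's password:
$

After authenticating you are given a shell prompt on the remote host, just as if you had logged in on a directly connected terminal.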

When you use a Unix system, there are several control characters that you should know about. Precisely which characters are used is a characteristic of your terminal configuration. You can change these mappings using the stty program. Type stty at the prompt with no arguments to view your current mappings. In the table below the '^' character means you need to press and hold the control key (often labeled as "ctrl") while pressing the following letter.

Table 1.1. Special Control Characters

^C   (intr)    Interrupts (kills) the currently running program.
^\   (quit)    Like ^C, but may also produce a core dump.
^D   (eof)     Signals end-of-file at the terminal.
^H   (erase)   Erases the character to the left of the cursor.
^U   (kill)    Erases the entire input line.
^S   (stop)    Pauses output to the terminal.
^Q   (start)   Resumes output to the terminal.
^Z   (susp)    Suspends the current program (on systems with job control).

If you are using the Korn Shell, and if your EDITOR environment variable is set to "emacs", the shell will accept emacs commands for editing the command line. In particular, ^F and ^B will move the cursor on the current command. ^P and ^N will let you step through your command history. Depending on the terminal you are using, you may be able to use the arrow keys for these functions as well.
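
For example, you might request this editing mode by setting EDITOR in your login script (a sketch):

$ EDITOR=emacs
$ export EDITOR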

One issue that causes a lot of confusion relates to the DEL and BACKSPACE characters. People normally want the character to the left of the cursor to be deleted when they press the backspace key on their keyboard. However, some systems will execute a destructive delete only when they receive a DEL character (ASCII 127). This means that you may need to configure your terminal or terminal emulation software so that it sends a DEL character when you press the backspace key and not the backspace character (ASCII 8). As an alternative, it may be possible to configure your host to do a destructive backspace when it receives the backspace character. You can use the stty command to configure this.

$ stty erase '^?'

This will cause a DEL character to do a destructive backspace.

$ stty erase '^h'

This will cause a backspace character to do a destructive backspace.

As with Windows, the Unix file system is arranged as a hierarchy of files and directories. Unix uses a forward slash (/) character instead of Windows's back slash (\) character as a delimiter between the components of a path. Compare

Windows: \SUBDIR\SUBDIR\FILE.TXT
Unix   : /subdir/subdir/file.txt

Unix has no drive specifiers (no A:, C:, F:, etc). Instead all files on all disks appear to be under one directory tree. For example, files under the /floppy directory might actually be on the floppy disk. Thus instead of using A:... to refer to a file on the floppy disk, you use /floppy/... In other words, all files appear to be in the same file system. As another example, manual files on a CD ROM might appear under a directory such as /archives/lib/manuals.

This is usually much easier for users to understand. Instead of having to remember both in which directory and on which disk a file is located, the user must just remember the directory name. Also, as the system grows, files can be moved to new disks without changing the logical arrangement of the files. In fact, the entire file system can be arranged on the disks differently without the users even knowing what has happened.

Unix file names are case sensitive. That is, the names xyz, XYZ, and Xyz all refer to different files. By tradition, most file names are in lower case. The maximum number of characters allowed in a file name varies from system to system. However it is typically 255 or more. Technically, all characters are legal in a file name except for the slash character and the null character. A file could be named \<*>!. File names can even contain spaces and control characters. In practice, dealing with unusual file names is a hassle; you should avoid it.

Unix does not assign any special significance to file extensions. In contrast, Windows requires, for example, that all executable files have a .exe or a .com extension (this isn't entirely true for modern Windows systems). Unix has no similar requirement; most executable files have no extension. Note that some programs require certain extensions in the names of their data files.

If a file name starts with a '.' character, the Unix file listing program, ls, will not display it unless explicitly directed to do so. Such files are said to be "hidden" files. They are not really hidden. You can manipulate them exactly like other files. However to see them in a directory listing, you must invoke ls with the -a option. Also most programs that scan over directories skip the "dot files" unless told otherwise. Hidden dot files are useful for configuration files and other files that have to be around, but that you don't want cluttering up your directory listing.
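
For example (the file names shown are only illustrations):

$ ls
notes.txt  progs
$ ls -a
.  ..  .exrc  .profile  notes.txt  progs

The second listing reveals the hidden configuration files, along with the special . and .. entries.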

In general, file names are specified by giving a path from the root directory to the file itself. For example

/                         The root directory itself
/subdir                   A subdirectory under /
/subdir/subdir            A subdirectory under /subdir
/subdir/subdir/file.txt   A file located in /subdir/subdir

These fully elaborated names (that contain all the directories, etc) are called "absolute pathnames".

At any given moment, a program has a working or default directory. You can specify files more easily by using relative pathnames that start at the working directory. Relative pathnames never start with a slash; absolute pathnames always do. For example

afile.txt                In the working directory.
subdir/afile.txt         In a dir below the working dir.

When you use relative pathnames, you can use the special name .. to refer to the directory immediately above the working directory. For example

../afile.txt             In the parent directory.
../subdir/afile.txt      In a sibling directory.
../../afile.txt          In a grandparent directory.

You can also use the special name . to refer to the working directory. This is useful when a command requires a directory name and you want to use the working directory, but don't want to bother typing its absolute pathname. For example

$ cp ../*.txt .

copies all files matching *.txt from the parent directory to the working directory.

In general, you use an absolute pathname whenever you aren't sure what the working directory is. For example, absolute pathnames are common in shell scripts since the author of the script can never be sure from which directory the script will be launched. You use relative pathnames more often interactively since they are usually less typing.

As with Windows, there is no obvious way to determine if a name is a name of a subdirectory or the name of a file. For example, faced with a name such as /abc/def/ghi you can't tell by looking at it if ghi is a file or a directory. Some commands operate differently depending on whether or not they've been given the name of a file or of a directory. These commands must refer to the disk to determine this information. They cannot tell by looking at the name itself.

There is a difference between a program and a process. A program is a static entity—the machine instructions that are to be executed. A process is a dynamic entity. A process includes not only the instructions of the program, but also its data and stack, its open files, its file locks, its semaphores, etc. Thus a process is what a program becomes once it is in the machine and executing. A process is not just a program. It is also all the resources allocated to the running program by the operating system.

The naive user believes that they start processes. In fact, only processes start other processes. When the user types a command such as

$ emacs afile.txt

the user's shell spawns emacs. We say that emacs is a child process of the user's shell. Emacs may spawn children of its own. These processes become the grandchildren of the user's shell.

Normally the shell waits for the child to terminate before it continues. However, since Unix is multi-tasking, it is possible for a process to continue right away after spawning a child. In this way a parent process can spawn other children before the first child has died. In turn the children can spawn multiple children of their own. Thus a user's shell might be the root of a tree of processes. Since many programs spawn processes to perform their functions, a user might not even be aware of all the processes that are executing on his/her behalf.
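
You can see part of this process tree with the ps command. For example (a sketch; the exact output format varies from system to system):

$ ps -f
UID        PID   PPID  TTY     CMD
pchapin    2210  2208  pts/0   -sh
pchapin    2254  2210  pts/0   ps -f

The PPID column gives the process ID of each process's parent. Here the ps command itself is a child of the login shell.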

What of the user's shell? In fact, it is a child of a special process called init. The init process is the first process that runs on a Unix machine. It is started by the kernel when the system is booted. It is the only process ever started by the kernel. Init reads certain files to configure itself and then it spawns a shell to execute some scripts that are typically stored in the /etc directory. These scripts perform the bulk of the system startup activities.

Once the startup scripts have been processed, init spawns a getty process for every serial terminal on the system. Information about all these terminals is stored in a text file. The getty process displays "login:" on the terminal and waits for a user to come along.

When a user responds to the login prompt, getty overlays itself with the login program. The login program (officially still a child of init) then prompts for the password and verifies that it is correct. The login program then overlays itself with the shell program specified in the user's /etc/passwd entry.

The startup scripts typically also start inetd, the Internet daemon program. This program listens to the network waiting for various types of requests. If a remote user runs a telnet program, inetd picks up the request and spawns telnetd to handle it. The program telnetd then constructs a virtual terminal session, accepts the user's login, and attaches the appropriate shell to the virtual terminal. The inetd process is also responsible for spawning ftpd, fingerd, and other daemon programs as necessary depending on the type of requests that arrive from the network.

When the user exits their shell, init is awakened (it sleeps once all the getty processes have been started). Seeing that one of its children has died, init spawns a new getty and goes back to sleep.

Traditionally Unix uses the file /etc/passwd to define who has an account on the system. The file /etc/passwd is a plain text file with one line for each user. The lines have several colon delimited fields. Here is the format:

username:e-password:UID:GID:info:directory:program

The username is the user's login name. The e-password is the user's password encrypted (actually the password is "hashed" not encrypted but this difference does not concern us here). The UID is the user's ID number. The GID is the ID number of the user's primary group. Info is any arbitrary information about the user that the system administrator wants to record. Typically, the user's full name and office phone extension go here. Directory is the user's home directory—the working directory the user will have when they first log in. Program is the shell program invoked for the user.

For example, here is a typical /etc/passwd entry:

pchapin:fEeww9j4mODeI:202:20:Peter Chapin:/u/pchapin:/bin/ksh

Here is the /etc/passwd entry of the "user" bin. Bin is not a real user, but bin does own many files on the system. It is common for a multi-user system to contain several pseudo users for special purposes.

bin:*:2:2::/bin:/bin/sh

The '*' in the e-password field means that it's impossible to log in as bin. An account without a password would have an empty e-password field.

The /etc/passwd file is readable by everybody. Several Unix utility programs use the information in /etc/passwd. This is because the system normally deals with user ID numbers and most utilities use /etc/passwd to convert those numbers into usernames when possible.

The GID specified in the /etc/passwd file is the user's primary group. However, users can be in several other groups as well. The groups that exist on the system are defined by the file /etc/group. Here is its format

group-name:e-password:GID:logname-list

The group-name is the name of the group. The e-password is not used. The GID is the ID number of the group. The logname-list is a list of lognames that represent the membership of the group.

Every process that runs on a Unix system has two UID numbers associated with it. The "real" UID is the ID number of the user that launched the process (directly or indirectly). The "effective" UID is the ID number of the user for which the process has the same security rights. Normally, the real and effective UIDs of a process are the same. I will describe the situation where they are different a little later.

Every file and directory has an owner. Generally, the owner of a file is the same as the effective UID of the process that created the file.

In addition, every process has a real GID number associated with it. This GID is the number associated with the real UID in the /etc/passwd file. Every process also has an effective GID that is normally the same as the real GID. In addition, every process has an associated group access list that defines the GIDs of all the groups of which the process is a member. Normally the group access list is taken from the /etc/group file at login time.
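
You can see your own UID, GID, and group access list with the id command. For example (the output shown is only illustrative):

$ id
uid=202(pchapin) gid=20(faculty) groups=20(faculty),104(www)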

Every file and directory has a group association. This group is usually the effective GID of the process that created the file. However, under some circumstances the GID of a file may be totally unrelated to the file's UID.

Each file and directory has a set of nine permission bits associated with it. You can see these bits by doing an ls -l command. The nine bits are grouped into three sets of three. The first three pertain to the user who owns the file. The middle three pertain to users in the same group as the file (but who do not own the file). The final three pertain to all other users. The possible permissions are read (r), write (w), and execute (x). Thus a file's permissions might be:

rwxr-x---

This means that the owner can read, write, and execute the file. People in the same group as the file can read and execute it, but not write to it. Everybody else has no access to the file.

When you talk about a file's permissions (or "mode" as it's often called), you often use a three digit octal number to represent the permission bits. For example

rwxr-x---    Mode = 750 (111,101,000)

Here are some common permissions as applied to files.

rwxrwxrwx         (777) Everybody can do anything.
rwx------         (700) The owner can do anything.
rwxr-x---         (750) People in the group have access.
r--r--r--         (444) Read only.
rw-------         (600) Not a program. Owner only.
rw-r-----         (640) Not a program. Group has access.

For directories, the meaning of the bits is slightly different.

Table 1.2. Directory Permissions

r   Allows you to list the names in the directory (with ls, for example).
w   Allows you to create, delete, and rename files in the directory.
x   Allows you to look up names in the directory, access the files those
    names refer to, and make the directory your working directory.

For example suppose you did cp /home/me/afile.txt /home/you. For this to work, you need (x) access to the root directory so you can look up the name "home". You need (x) access to the /home directory so you can look up the name "me". You need (x) access to /home/me so you can look up the name "afile.txt". You need (r) access to /home/me/afile.txt so you can open the file for reading. You need (w) access to /home/you so you can create a directory entry for the new file.

Since normal users cannot write to /etc/passwd, you might wonder how a normal user can change their password. The passwd program used to change passwords has to write to /etc/passwd after all.

When the passwd program executes, its real UID is that of the user who invoked it. However, its effective UID is that of the superuser. Thus, while passwd is executing, it has the privilege of the superuser. This is because passwd is a "set-UID" program owned by root. When a set-UID program runs, its effective UID is set to that of its owner and not to that of its invoker.

Many system programs are set-UID. For example, the login program has to have superuser privilege. It must set the real and effective UID of the user's new shell to the correct values. However, only processes running with superuser privilege can change their real UID.

You can tell a program is set-UID because an (s) appears in its permission list where the (x) would appear. For example

Permissions    Owner     Group    Name
r-sr-xr-x      root      other    passwd

The (s) in the execute position for the owner means that passwd is a set-UID program. Notice that passwd is executable by everyone.

The set-UID feature is useful for non-superusers as well. For example, suppose I were to write a grade reading program that my students could execute to check my grade book. I don't want to grant read access to my grade book to just anyone. Thus I write a program that will only read the appropriate grades, and I make it set-UID owned by me. Thus when a student executes the program, they have the rights of me, and the program can read my grade book file on their behalf.
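
To mark a program as set-UID, the owner turns on the set-UID bit with chmod. A sketch (the program name here is hypothetical):

$ chmod u+s gradereader
$ ls -l gradereader
-rwsr-xr-x   1 pchapin  faculty  24576 Mar  3 09:14 gradereader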

Similarly, programs can be set-GID. In that case, when they start executing, their effective GID is set to the group associated with the program and not the real GID of the invoker. For example, VTC3 could write a set-GID program that has the rights of the club but that could be executed by any user.

Set-GID programs are not as common as set-UID programs.

This document gives a very brief overview of the basic Unix commands. Please refer to a book or the on-line documentation for more information. Don't be afraid to experiment with a command to figure out what it does. There is a fair amount of variation from one Unix system to another. Experimentation is usually the best way to understand how your system works.

Warning

Many Unix commands will do nasty things without asking for confirmation. Thus if you have no idea what a command does, you should read about the command first. This simple practice will save you a lot of grief. Don't trust Unix utilities to ask first. They don't.

The Unix manual is, of course, available in printed form. In fact, you can buy it at a good computer bookstore. It's rather large—many volumes. Most Unix systems have the manual on-line. You can access it via the man command. For example, if you want to read the manual page on the cp command, type

$ man cp

at the prompt (the '$' above represents the prompt; your prompt might look different). Typically the man command will automatically pipe the output through the more filter for easy reading. If you want to keep a manual page around to read later, try redirecting the output of man to a file.

$ man cp >cp.man

The Unix manual comes in eight sections. Here is what is covered in each section.

Table 2.1. Unix Manual Sections

1   User commands.
2   System calls.
3   Library functions.
4   Special files (devices).
5   File formats.
6   Games.
7   Miscellaneous information.
8   System administration and maintenance commands.

The man command lets you look things up in any section. For example

$ man fopen

gives you information on the C library fopen function.

In some cases a name appears in more than one manual section. For example if you do

$ man chmod

you get information on the chmod utility program. If you want information on the chmod system call, you must do

$ man 2 chmod

since the system calls are in section 2.

Unix manual pages have a standard format. Many people use the same format for their own programs. You will get used to it once you've looked at enough manual pages.
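
For example, most manual pages are divided into sections with standard names along these lines:

NAME          The name of the item and a one line summary.
SYNOPSIS      The command line syntax or the function prototype.
DESCRIPTION   A full description of what the item does.
OPTIONS       Details of the command line options, if applicable.
FILES         Files used by or related to the item.
SEE ALSO      Related manual pages.
BUGS          Known problems or limitations.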

All Unix systems have certain standard directories. Although the underlying operating system doesn't care about the precise directory structure, many of the standard Unix utilities have specific directory names hard coded into their source code. Here is a list of some of the more interesting directories and what you might find in them.

There is an effort in progress to standardize the organization of the Unix file system in order to facilitate the creation of portable scripts and administrative tools. For more information see the Filesystem Hierarchy Standard at http://www.pathname.com/fhs/.

Common Unix Directories

/bin, /usr/bin

This is where all the utility programs are stored. The most commonly used programs are put into /bin. The less commonly used programs are put into /usr/bin. On some systems, /bin may be the only directory of the two available when the system is booted into single user mode.

/tmp, /var/tmp

Many programs write temporary files into /tmp. These directories are wide open to all users. Files stored in /tmp should not stay there very long.

/lib, /usr/lib

Here is where compiled library files such as the C library are stored. Also certain programs that are not normally executed directly by the user are stored here. For example, the C compiler has two passes. They are stored in /lib. The dictionary for the spell program, the macro packages for nroff, and many other things are stored in /usr/lib.

/usr/share/man

Here is where the online manual is stored. This directory contains subdirectories for each manual section. For example, /usr/share/man/man1 contains all the manual pages for section 1 in compressed form.

/usr/include

Here is where the header files used with C programming are stored. This directory is the standard place where the C compiler looks when it sees header files included with the <...> notation. Notice that there are several subdirectories under /usr/include as well.

/usr/local

This directory contains locally produced items. For example, /usr/local/bin contains programs that are not part of the normal Unix distribution, but that are of interest to all users at a particular site. Also under /usr/local are directories for local administrative commands, local manual pages, and so forth.

/var

The /var directory is where "variable" files are stored. These files tend to be highly volatile and may change from minute to minute. Incoming mail, log files, and other such things are often stored here.

/var/adm

On many systems this is where various administrative files are stored. For example, account files, the shutdown log file, and the file that records information about telnet connections are stored here. On most Linux systems, /var/log contains all the log files.

/var/spool

This directory is used for spooling operations of various types. The print daemon, uucp, cron, mail and various other Unix programs that do things automatically use this directory for temporary files and work files.

/home

On some old systems, user home directories are all kept under /usr. However since there are often many home directories and since they often need to be managed with a different policy than that used by the rest of the system, most current systems store user home directories under /home. Note that in Unix each ordinary user has a personal directory for storing his/her files. Many Unix applications assume such a directory exists and expect to store all of a user's personal configuration information there. Unix does not have a "registry" in the sense of Windows.

/etc

This directory contains files and programs of interest to the system administrator. Many system configuration files are in /etc.

/dev

This directory contains the special device files. Unix treats all I/O devices as if they were files. By tradition, all such files appear in /dev.

ls

This program produces directory listings. If you provide no arguments, it assumes you want to see everything in the working directory. If you give it a filename argument, it produces information on just that file. If you give it a directory name argument, it produces information on all the files in that directory.

For example:

$ ls
$ ls afile.txt
$ ls subdir
$ ls *.c
cp

This program copies files. If the last argument is the name of a directory, cp will copy all the specified files into the specified directory. If the last argument is the name of a file, the first (and only other) argument is the name of the source file.

For example:

$ cp afile.txt bfile.txt
$ cp afile.txt bfile.txt subdir
mv

This program moves files. Rather than copying the file and then deleting the source, this program just updates the directories. The file itself is not touched. This is a much better way to move a file.

Its command syntax is similar to that of cp. If the last argument is not the name of a directory, mv assumes you are trying to rename the file.

For example:

$ mv afile.txt bfile.txt      # Renames afile.txt
$ mv afile.txt subdir         # Moves afile.txt
ln

This program links a file into a directory. Unix allows a file to appear in several subdirectories at once. You can use ln to create such additional directory entries for a file.

Its command syntax is similar to that of mv. If the last argument is not the name of a directory, ln assumes you are linking the file into a new name.

For example:

$ ln afile.txt bfile.txt      # bfile.txt is afile.txt
$ ln afile.txt subdir         # Link to subdir/afile.txt
rm

This program deletes files. Actually, rm removes a link. If the link removed was the last link to the file, the file is deleted from the disk. Thus if you've linked a file into several directories, it must be removed from all such directories before it is actually removed from the disk.

For example:

$ rm afile.txt bfile.txt
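
If a file has several links, rm only removes the name you give it. For example (a sketch):

$ ln afile.txt bfile.txt     # afile.txt now has a second name.
$ rm afile.txt               # Remove the original name.
$ cat bfile.txt              # The contents are still intact.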
cat

This program displays the contents of a file to the terminal. It is similar to the Windows type command. You can also use it to create small files without bothering with an editor.

For example:

$ cat afile.txt               # Look at afile.txt
$ cat >afile.txt              # Create afile.txt
more

This program also allows you to look at the contents of a file. However, unlike cat, it will page the display instead of just scrolling it as quickly as possible. The more program is a much better choice when you want to read something.

The more program is often used at the far end of a pipe to make looking at the output of the previous commands easier.

For example:

$ more afile.txt              # Look at afile.txt
$ ls -l | more                # Look at ls -l's output.

The more program has many interesting features. Type the 'h' key when it is paused to see its help screen.

Most modern systems (and all Linux systems) have a related program named less (think: "less is more"). The less program is a more powerful version of more with many additional features. Some people use less exclusively and never use more.

chmod

This program lets you adjust the permissions on a file or directory. The first way you can use this program is to specify the entire mode of the file(s) you want to change. For example:

$ chmod 640 afile.txt bfile.txt cfile.txt

This changes the named files to rw-r----- permissions. Whatever permissions they had before are replaced.

The second way you can use this program is to specify the changes you want to make. For example

$ chmod u+x afile.txt    # Give user (owner) (x) permission
$ chmod o+r afile.txt    # Give other (public) (r) permission
$ chmod ug+rw afile.txt  # Give user & group (r) and (w)

You can only change the permissions on files you own.

chown

This program lets you change the ownership of a file. You can only change the ownership of files you own. Once you've changed the ownership, you can't change the ownership back. In a system where disk quotas are enforced, chown will probably not be executable by normal users. The syntax of this program's command line is:

$ chown newowner afile.txt bfile.txt ...
chgrp

This program lets you change the group association of a file. You can only change the group association of a file you own, and only to a group of which you are a member. The syntax of this program's command line is similar to that of chown.

umask

This is actually a built-in shell command and not a utility program. (You will find information about it in the sh entry of the manual, not in a separate umask entry.) You use umask to define the permission bits that are OFF BY DEFAULT whenever a new file is created. For example

$ umask 022    # rwxr-xr-x will be the default permissions.
$ umask 027    # rwxr-x--- are the defaults.
$ umask 077    # rwx------ are the defaults.

Normally, you execute a umask command in your shell's login script ($HOME/.profile for the Bourne and Korn shells).

In any multi-user system there are ways to communicate with other users. There are two categories of communications programs. Some programs let you communicate in real time with other users who are currently logged in (chat). Other programs let you send messages that users can read later, even if they are not currently logged in (mail).

write

This is a primitive Unix chat utility. If you'd like to chat with another user type a command such as

$ write logname

Everything you now type is sent to the other user's terminal. The other user can execute a similar command to send text to your terminal.

Be aware that the material you write to a user will appear on their terminal in the middle of whatever else they were doing. Don't be disconcerted if this happens to you.

To terminate a write session use the Unix end-of-file character: ^D.

mesg

This program enables or disables the ability of people to write to you. Use the command mesg n to disable writing and mesg y to enable it. If you just type mesg alone, the program will report on the writable state of your terminal.

mail , mailx , pine , elm

These programs are all mail programs. They all read and send the same mail files so it doesn't matter which you use. The mail program is the simplest. The pine and elm programs are the most sophisticated. The program named mail is available on every Unix system in the universe.

Discussion about these programs is beyond the scope of this document. I refer you to the online documentation for more information.

Unix offers an extensive set of programming tools as standard. Third parties do sell programming tools to supplement (or replace) the standard ones. However, normally Unix comes right out of the box with support for serious C programming.

In particular, the system offers a C compiler, a debugger, a profiling system, a source code control system (for managing different versions of a program), and a make utility (for managing the construction of large applications). For the sake of brevity, I will just point out three programs here.

cc

The C compiler. To compile the program afile.c, use the command cc afile.c. The executable result will be put into the file a.out. You can rename a.out with the mv command.

The cc command takes many options. Refer to the manual for more information.
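
For example, a complete compile, rename, and run sequence might look like this:

$ cc afile.c          # The executable is written to a.out.
$ mv a.out afile      # Give it a more meaningful name.
$ ./afile             # Run the program.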

vi

One of the standard Unix text editors. Vi was designed to work on any kind of terminal. When vi is in command mode, every letter you type is a command. The "i" command puts vi into insert mode. You must be in insert mode to actually enter text into the file. To leave insert mode, you press the ESC key.

If you are entering text and you want to move the cursor to a new location in the file, you must first leave insert mode (ESC) and then type the appropriate command letters to move the cursor (j,k,h, and l).

It's actually not that bad once you get used to it.
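
For example, here are a few of the most important vi commands to get you started (all typed in command mode):

i         Enter insert mode before the cursor.
ESC       Leave insert mode.
h j k l   Move the cursor left, down, up, and right.
x         Delete the character under the cursor.
dd        Delete the current line.
:wq       Write the file to disk and quit.
:q!       Quit without saving changes.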

emacs

Emacs is a common text editor (but not standard). It's a little easier to use than vi. Instead of having two modes of operation, emacs uses control characters for the commands. There are many different combinations of control characters to be remembered. However, you don't have to worry about changing modes all the time.

This section describes the basics of using the command processor called the Bourne Shell (hereafter simply called "the shell"). Note that although the Bourne Shell is the standard command processor on many Unix systems, it is possible to use other shells under Unix. Three of the more popular alternatives are the C Shell (csh), the Korn Shell (ksh), and the Bourne Again Shell (bash). Of these, the Korn Shell and the Bourne Again Shell are upwardly compatible with the Bourne Shell. It is also possible to use the Bourne Shell under other operating systems. For example, Windows implementations of the Bourne Shell exist. This document assumes you are using the shell under Unix.

The shell allows you to execute any command interactively. Since some commands require more than one line, the shell uses a secondary prompt to accept the additional lines of a multi-line command. For example

$ for FILE in *
> do
>   cp $FILE $HOME/backups
>   echo "Saved $FILE on `date`" >>$HOME/.backupinfo
> done
$

This command was typed interactively. However, since it involves a loop, the shell could not act on the command until it saw the "done" keyword. Thus the shell responded with the secondary prompt (a >) while it collected the entire command.

There is no limit to the size of the command or the number of lines or the number of nested control structures that the shell can accept interactively. However, you can only edit the current line. Once you've typed ENTER, that line is committed. You won't want to enter huge commands interactively unless you are a shell virtuoso.

For more complex operations, you will want to create a shell script. This is a text file that contains the commands you want executed. The first line of the file should be

#!/bin/sh

This line informs the operating system which program will handle the script. For example, if the first line looked like #!/bin/csh, the C Shell would be used to handle the script. As a result you can run scripts for other shells than the one you are currently using.

In a shell script, all material on a line after a # character is ignored (except for the first line, as described above). Use this feature to include comments in your script.
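
For example, here is a small but complete script (a sketch; the backup directory is only an illustration and must already exist):

#!/bin/sh
# Copy every .txt file in the working directory to a backup directory.
for FILE in *.txt
do
    cp $FILE $HOME/backups
done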

Before the shell can execute a script, you must have execute permission on the file containing the script. You can grant yourself execute permission using the chmod program. The easiest way is to use chmod's symbolic mode:

$ chmod u+x scriptfile

The "u+x" grants (+) execute permission (x) to the user (u) who owns the file. No other permissions are affected. You can also grant execute permission for people in the same group as the file with

$ chmod g+x scriptfile

or even

$ chmod ug+x scriptfile

to do both the user and the group at the same time.

Once you have execute permission to a file, you don't normally need to set it again after editing or copying the file.

The shell allows you to store information in shell variables. Shell variables are all strings. There is no notion of "type" within the shell. Traditionally shell variables are named in all uppercase. However, in usual Unix style, both upper and lower case letters are allowed and the names are case sensitive. Many people use lowercase letters exclusively for the names of their shell variables.

To set a shell variable use an = sign immediately after the name. For example

$ NAME=Peter

If you put a space before the equals sign, the shell will think you are trying to run a program.

$ NAME =Peter
NAME: not found
$

You can get a list of all the shell variables currently defined with the set command. To remove the definition of a shell variable use the unset command. For example:

$ unset NAME
$

If you wish to use a shell variable (either in a script or at the command prompt) you must precede its name with a $. The $ causes the shell to expand the variable into its value. For example

$ NAME=Peter
$ echo NAME
NAME
$ echo $NAME
Peter
$

You can put other characters next to the name of a shell variable when you expand it provided the other characters could not possibly be part of the shell variable's name. For example

$ BACKUP=/home/pchapin/progs/backup.dir
$ cd $BACKUP/prog1
$

Here the shell knows the $ only applies to the text BACKUP. The / character must terminate the name of the shell variable.

If you want to put the expansion of a shell variable right next to other letters, you can enclose the name of the shell variable inside a {...} pair. For example

$ P=prog
$ cd ${P}1
$ pwd
/home/pchapin/prog1
$

The shell has numerous other features regarding the use of {...} in this context. For the sake of brevity, I will not discuss them here.

Normally when you run a program, the shell variables you've defined in your login shell are not present in the child program. Try defining a distinctive shell variable. Type set to verify its existence. Next, type sh at the prompt to run another shell. Type set to see that the shell variable you previously created is not present. Type exit to terminate the second shell.

This feature means that shell variables you create in scripts will not be present when the script terminates. This is because scripts are actually run in a child shell (exception: if you use the dot command, the script runs in the current shell. For example

$ . scriptfile

will cause shell variables introduced in scriptfile to exist after the script terminates).

In Unix, every process has an environment. The environment of a parent process is inherited by the child process. Thus information placed into the environment by the parent can be referenced by the child (but not vice versa). The shell allows you to name certain shell variables as environment variables. This is done with the export command.

$ NAME=Peter
$ export NAME
$

Now the shell variable NAME is part of the environment. Its value is accessible in child processes. To see what is part of your environment, type export alone at the prompt.

$ export
export TERM
export PATH
export EDITOR
$

Normally the PATH variable is part of the environment. This allows child shells to find commands the same way as the parent does. The TERM variable usually contains the name of the terminal you are using. This is used by child programs to manage the display. The EDITOR variable is often used by programs to locate a text editor. In this way you can use the editor of your choice with programs that need editor support.
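
For example, to add a personal program directory to your search path and make the change visible to child processes, you might put something like this in your .profile (a sketch):

PATH=$PATH:$HOME/bin
export PATH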

There are several predefined shell variables that are set when you login or at other times. You can define any others you would like to use in your profile shell script (executed within the login shell before it prints the first prompt). Your profile login script is in the file .profile in your home directory.

Here is a list of the important predefined shell variables.

Table 3.1. Predefined Shell Variables

HOME      The absolute pathname of your home directory.
PATH      The list of directories searched when looking for commands.
PS1       The primary prompt string (normally "$ ").
PS2       The secondary prompt string (normally "> ").
IFS       The characters that separate words on the command line.
$0        The name of the shell or script being executed.
$1 - $9   The command line arguments given to a script.
$#        The number of command line arguments.
$*        All the command line arguments.
$?        The exit status of the last command executed.
$$        The process ID of the current shell.

This list is not complete.

Note that there exists a shift command that is very useful within scripts. It causes the value of $1 to be forgotten. The value of $2 is put into $1, the value of $3 is put into $2, and so forth. The value of $# is adjusted. Using this feature, a script can loop over all its command line arguments without concern for the exact number of such arguments. For example

# Inside a script file...
while [ $# -gt 0 ]
do
  # Process $1...
  shift
done

Even though the commands inside the loop process just $1, each argument is brought into $1 one at a time due to the shift.

Another technique for doing this is

# Inside a script file...
for ARG in $*
do
  # Process $ARG...
done

The shell regards many characters as special. Sometimes it is necessary to pass strings containing those special characters into a program or script as arguments. To prevent the shell from interpreting a special character, its special meaning can be temporarily disabled by quoting the character. There are several ways to do this.

A single character can be quoted by preceding its name with a backslash. For example

$ rm The\ File

deletes the file named The File. The space character is part of the first argument to rm. The usual special meaning of space as a delimiter between arguments was disabled for that one character.

A newline can also be preceded by a backslash, but the effect is slightly different: the shell removes a backslash-newline pair entirely, so the command simply continues on the next line. For example

$ rm The\
> File

deletes the file named TheFile, not a name with an embedded newline. To embed an actual newline in an argument, use single quotes as described below.

Notice that the shell used the secondary prompt to collect the next line of the command.

If many characters need to be quoted, it is easier to use the '...' matched single quotes. For example

$ rm '|<*>|'

deletes the file named |<*>|. The special meaning associated with '>', '<', '|', and '*' have been disabled within the single quotes.

Single quotes have one limitation, however: a single quote cannot appear inside single quoted material. Inside single quotes every character, including the backslash, stands for itself, so there is no way to quote the closing quote. To create an argument containing a single quote, use double quotes or a backslash instead. For example

$ rm "I'm done"

deletes the file named I'm done. Alternatively, each special character can be quoted individually with backslashes: rm I\'m\ done.

Note that single quotes do disable the special meaning of the newline character. Thus

$ echo 'This is a multi-line echo command.
> This stuff is still part of the argument to echo.
> The newline character is embedded in this argument.
> The output appears below.'
This is a multi-line echo command.
This stuff is still part of the argument to echo.
The newline character is embedded in this argument.
The output appears below.
$

Double quotes can be used instead of single quotes. However, double quotes do not disable the special meaning of $ or of the backslash. Since $ is still handled within double quoted strings, you can cause shell variable expansions to occur in quoted material. For example

$ NAME=Peter
$ rm "$NAME's File"

deletes the file named Peter's File. Note that the NAME shell variable is expanded. Note also that the single quote is not special inside a double quoted string.

Quoting is very important when using programs that process commands containing the very same characters treated in a special way by the shell. For example, the grep command expects its first argument to be a regular expression. Regular expressions use many unusual characters. Thus it is commonly necessary to quote the first argument to grep. It is often done even when not necessary just out of habit. For example

$ grep '^A.*Z$' afile.txt

will search afile.txt for all lines that start with an 'A' and end with a 'Z'. Note that '*' and '$' are not treated specially by the shell because of the enclosing single quotes. Thus these characters are passed to grep as is.

$ grep 'Peter' afile.txt

will search afile.txt for all lines that contain "Peter." In this case, the quoting is not necessary. However, there is no harm in quoting a character that does not need quoting; it is sometimes done anyway.

It is also possible to place the standard output of a command onto the command line. This is done using back quotes. For example

$ cat afile.txt
user1 user2 user3 user4
$ mail `cat afile.txt`

When the shell sees the back quotes it will execute the contained text (in a subshell) and place the standard output of the command back onto the command line to replace the back quoted material. In the example above, the mail command gets run with user1... user4 as arguments.

When the replacement of text is done, newlines in the standard output of the command are replaced with spaces. Thus if the command produces multiline output, all the text on the various lines gets folded into a single command line.
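
For example, suppose a (hypothetical) file users.txt contains one username per line:

$ cat users.txt
user1
user2
user3
$ echo `cat users.txt`
user1 user2 user3

The three lines of output from cat were folded into a single command line for echo.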

Since the shell executes a subshell to process the command, there is no limit to what can be accomplished inside the back quotes. In particular I/O redirection can be done. For example

$ for DIR in `echo $PATH | sed 's/:/ /g'`
> do
>   echo $DIR
> done
$

The subshell expands $PATH, processes the I/O redirection and honors the single quotes. The resulting text forms a list of names over which the loop operates.

Command substitution can even be nested. This is possible because the backslash is still processed by the shell as a special character even inside back quoted text. However, the backslash is removed before the subshell sees the back quoted text. For example

$ prog1 `prog2 \`prog3 arg\``

Here a subshell is created and told to run the command

prog2 `prog3 arg`

Of course to run this command the subshell must create still another subshell to run

prog3 arg

Back quotes are still honored inside of double quotes, but not inside of single quotes. For example

$ h
Hello world
$ TEST="`h` I'm happy"
$ echo $TEST
Hello world I'm happy
$ TEST='`h` is taken literally'
$ echo $TEST
`h` is taken literally
$

Note that in all the cases above TEST contains a single string. Consider

$ TEST=Hello world
world: not found
$ TEST="Hello world"
$ echo $TEST
Hello world

Without the quotes, the shell treats the second word as the name of a command to execute (with TEST=Hello placed temporarily in that command's environment); the variable is not set in the current shell at all. With the quotes, the entire string is assigned to TEST.

Many Unix utility programs accept a list of filenames on the command line. For example

$ rm afile.txt bfile.txt cfile.txt

To facilitate producing such lists, the shell has a wildcard expansion feature. In particular the following characters are used:

*       Matches zero or more of any character in a name.
?       Matches exactly one character.
[...]   Matches exactly one character in the set.

For example, to remove everything in the current directory use

$ rm *

If you are used to Windows, you might be tempted to use rm *.* This, however, only removes files that contain a dot character in their name. Not all files will match that specification.

Be sure you realize that the '*' in the example above is interpreted by the shell. That is, the shell finds all matching files and replaces the '*' on the command line with a list of the names. The rm program does not realize that you used a wildcard.

Thus if you want to handle wildcards in your own programs, all you need to do is be sure your program can process a list of file names off the command line. The shell will take it from there.

Suppose you wanted to remove the files memo1, memo2, and memo3. You might use

$ rm memo?

The shell will produce a list of file names that start with memo and have one additional character at the end of their name. Thus the file memo_list would not be matched while memox would be matched.

You could also do

$ rm memo[123]

Here the shell will only match the names memo1, memo2, or memo3 (if they exist). The file memox would be spared.

Finally you could do

$ rm memo[1-3]

The '-' character inside the square brackets implies a range. This is easier than typing all the possibilities if the range is large. For example

$ rm doc[A-Z].rev.[0-9]

Normally you use directories to partition your files into manageable groups. However, if you also use sensible naming conventions for your files (distinctive extensions or prefixes), you can refer to different file sets within one directory using wildcard characters. For example, suppose I gave all my memos a .mmo extension. Then I could extract all the memos from a directory and move them into a $HOME/memos directory with a command such as

$ mv *.mmo $HOME/memos

I don't need to worry about accidentally moving some other type of file out of the working directory.

One of the shell's most important features is its ability to redirect the standard input and standard output of any program. Programs that ordinarily write to the screen can have that output sent to a file. Programs that ordinarily read from the keyboard can have their input redirected from a file. For example

$ prog >afile.txt

Now afile.txt contains the text that prog would have displayed on the screen.

$ prog <bfile.txt

Now prog reads its input from bfile.txt rather than from the keyboard.

Both the input and the output can be redirected.

$ prog <bfile.txt >afile.txt

The I/O redirection part of the command line is not sent to the program. As far as prog is concerned in the examples above, it is being executed with no arguments. If prog wants arguments, they can be put on the command line anywhere relative to the I/O redirection operators. For example

$ prog <bfile.txt arg1 >afile.txt
$ prog arg1 >afile.txt <bfile.txt
$ prog > afile.txt < bfile.txt arg1

all do the same thing.

Ordinarily when output redirection is used, the shell truncates the output file to zero size first if it already exists. Commonly you want to append the new output to the end of the file. This can be done with the '>>' operator. For example

$ echo "Finished processing on `date`" >>$HOME/results

This saves the current date (and time) into the file results in the home directory. It adds a new record onto the end of any existing records.

Many Unix programs that process the data inside files will accept the name of the file(s) to process on the command line. If there are no such names presented, a typical Unix program will process whatever it finds at its standard input. Consider the commands below. They both search the file afile.txt looking for lines that contain the string "Peter."

$ grep 'Peter'   afile.txt
$ grep 'Peter' < afile.txt

In the second case, the grep command does not see any file name on the command line. It thus reads its standard input (which happens to be the same file as named in the first command).

This behavior allows you to test a program without creating a special data file. Just run it and type at it. For example

$ grep 'Peter'
Hello.
Anybody there?
My name is Peter.
My name is Peter.
What's your name?
Is it Peter also?
Is it Peter also?
^D
$

The '^D' (control+D) causes the terminal driver to signal end-of-file to the program. Notice how grep printed out lines that contained the string 'Peter.'

The shell allows you to direct the standard output of one program into the standard input of another. This is done with the pipe operator '|.' This ability, coupled with the behavior described above, makes the Unix system powerful. For example

$ ls -l | grep '^d'

This command displays all lines in the output of the ls -l command that start with a 'd'. Such lines are entries for subdirectories.

$ who | wc -l

This command sends the output of who to the word count program to count the number of lines in who's output. The result is a count of the number of users logged into the system.

$ prog args | mail pchapin

This command mails the output of the prog command to pchapin. This works because the mail program accepts the message to be mailed at its standard input.

It is possible to redirect or pipe the output of entire loops. For example

$ for FILE in *
> do
>   process $FILE
>   echo "Done with $FILE on `date`"
> done > $HOME/results
$

Note the redirection done after the 'done' keyword. All the output generated within the loop (except for material explicitly redirected elsewhere) is put into $HOME/results. This is faster and cleaner than redirecting each command within the loop with the append operator.

Pipes are also possible.

$ for FILE in *
> do
>   process $FILE
> done | sort > $HOME/results
$

When you redirect the output of a loop, the loop is run in a subshell. This may be significant if the point of the loop is to set shell variables.
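The same rule applies when a loop is part of a pipeline, and the effect is easy to demonstrate. In this sketch the loop counts directory entries, but the count is lost when the loop's subshell exits:

$ COUNT=0
$ ls | while read FILE
> do
>   COUNT=`expr $COUNT + 1`
> done
$ echo $COUNT
0
$

Even in a directory full of files the final echo prints 0, because only the subshell's copy of COUNT was ever incremented.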

The shell allows you to run several commands on one command line. Each command must be separated with a semicolon. For example

$ cd $HOME; ls -l | more; dostuff

Often after changing the permissions on a file, you want to do a directory listing to see if you did things right.

$ chmod u+x script; ls -l script

You can arrange things so that a second command executes only if the first command returns a successful status code (zero). For example

$ prog1 && prog2

Here prog1 is executed first. If prog1 returns a status code of zero, prog2 is executed. On the other hand if prog1 fails, prog2 is not attempted. The overall command stops after the first failed program.

You can also do

$ prog1 || prog2

Here prog2 is executed only if prog1 fails. That is, the command stops after the first successful program.
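These operators make handy one-line guards. For example (the directory name backup here is just illustrative)

$ mkdir backup && cp *.mmo backup
$ grep -s 'Peter' afile.txt || echo "No sign of Peter"

The first line copies the memos only if the directory was created successfully. The second line prints its message only when grep fails to find a match.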

You can force commands to be executed in a subshell by enclosing them in parentheses. Since a subshell has its own working directory and shell variables, this can be useful in some situations. Compare

$ pwd
/home/pchapin
$ echo $NAME
Peter
$ cd ..; NAME=PHOO
$ pwd
/home
$ echo $NAME
PHOO
$

with

$ pwd
/home/pchapin
$ echo $NAME
Peter
$ (cd ..; NAME=PHOO)
$ pwd
/home/pchapin
$ echo $NAME
Peter
$

The output of a command executed in a subshell can be redirected as a group. Compare

$ prog1; prog2 > afile.txt
$ (prog1; prog2) > afile.txt

In the first case only the output of prog2 is redirected. In the second case the output of both commands is redirected.

Since Unix is multitasking, it can run programs in the background while you enter additional commands. This is done by putting a '&' character at the end of the command line. For example:

$ prog args &
$

Notice that in the example above, the standard output of the background command is still the terminal. Thus any output produced by prog will be written directly on the terminal right over whatever else you are doing. Since this is usually not desirable, you should normally redirect the output of a background command.

$ prog1 | prog2 > afile.txt 2> errors.txt &
$

As you can see, pipelines can be run in the background. Also note that the standard error file has been redirected with the '2>' operator. Thus error messages are also prevented from interrupting your foreground work.

It is possible to run multiple commands in the background by specifying a subshell, or to run loops in the background (in which case a subshell is implicitly used). For example

$ (prog1; prog2) &
$ for FILE in *
> do
>   process $FILE
> done &
$

To write interesting scripts you need to know about branching, looping, etc. First, you should realize that every command returns an exit status value to the shell. This value is accessible via the $? shell variable.
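For example (here assuming afile.txt exists and contains the string, so grep succeeds)

$ grep -s 'Peter' afile.txt
$ echo $?
0
$

Normally, however, you do not use $? directly. Instead you use control flow commands that automatically check $? for you. Here is the format of the 'if' command.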

$ if command
> then
>   # Do this if 'command' returns a zero status (TRUE).
> fi
$

Note that 'fi' is used to terminate an 'if' block. 'Fi' is 'if' spelled backwards.

The basic idea is simple. The command designated by command is run. If that command returns a status value of zero, the commands past the 'then' get executed; otherwise they are skipped. The 'then' keyword plays the role of a command. Since multiple commands can be placed on the command line, another way to format the example above is as

$ if command; then
>   # Do this if 'command' returns a zero status (TRUE).
> fi
$

The syntax of the 'if' command allows for an else. Here are some possibilities.

$ if command1; then
>   # Do this if 'command1' returns a zero status.
> else
>   # Do this if 'command1' returns a nonzero status.
> fi
$

or

$ if command1; then
>   # 'command1' was true.
> elif command2; then
>   # 'command2' was true.
> else
>   # Neither 'command1' nor 'command2' were true.
> fi
$

Many commands return a status code based on what they do. For example, the grep command returns success if it finds at least one match. Grep has an option which suppresses the printing of all matches (silent mode); traditionally this is -s, although on many modern systems the same effect is spelled -q (quiet). At first such an option seems silly. However, it allows grep to be used in an 'if' command cleanly. For example

$ if grep -s 'Peter' afile.txt; then
>   # The string 'Peter' was found. Haven't told the user yet.
> fi
$
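Here is a complete sketch with an else branch added (the output shown assumes afile.txt does contain the string):

$ if grep -s 'Peter' afile.txt; then
>   echo "Peter is mentioned in afile.txt"
> else
>   echo "No sign of Peter"
> fi
Peter is mentioned in afile.txt
$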

By far, the program most often used with the 'if' command is a program named test. Test simply makes a test and returns with an appropriate status. Test has many options.

$ test -f afile.txt # TRUE if afile.txt is a file that exists.
$ test -s afile.txt # TRUE if afile.txt exists and is not empty.
$ test -d afile.txt # TRUE if afile.txt is a directory.
$ test -x afile.txt # TRUE if afile.txt is executable.
$ test -r afile.txt # TRUE if afile.txt is readable.
$ test -w afile.txt # TRUE if afile.txt is writable.

$ test $X -eq $Y
  # TRUE if X and Y are equal (when converted to ints).
$ test $X -ne $Y    # X != Y
$ test $X -gt $Y    # X > Y
$ test $X -lt $Y    # X < Y
$ test $X -ge $Y    # X >= Y
$ test $X -le $Y    # X <= Y

$ test $X = $Y
  # TRUE if X and Y are equal as strings.
$ test $X != $Y     # X not the same as Y

The difference between strings and integers is very significant. For example

$ FIRST=Hello
$ SECOND=There
$ if test $FIRST -eq $SECOND; then
>  echo They are the same.
> fi
They are the same.
$ if test $FIRST != $SECOND; then
>  echo They are different.
> fi
They are different.
$

The first test succeeded because the string "Hello" was converted to the integer zero by test, and similarly the string "There" became zero; zero equals zero. (Some implementations of test report an error instead when a non-numeric argument is given to -eq.)

Note that spaces are important. Test is a program like any other, and its arguments must be separated by whitespace just like any other program's arguments.

You can use the -o (OR) and the -a (AND) options of test to create more complex tests. For example

$ if test $X -eq $Y -a $A -eq $B; then
>  echo X == Y and A == B
> fi
X == Y and A == B
$

You can use parentheses to create even more complex tests. However, since parentheses are special to the shell, you must quote them to prevent the shell from handling them; the shell must pass the parentheses to the test program as arguments in their own right. For example

$ if test \( $X = $Y \) -o \( $A != $B \); then
... etc.

Since test is used so much, the shell has a special syntax for invoking it. For example

$ if [ $X -eq $Y ]; then ....

$ if test $X -eq $Y; then ....

The two commands above are identical. The square bracket syntax is much easier to read, and I will use it from now on. Note that there must be a space immediately after the open square bracket. For example

$ X=prog
$ if [$X = $Y]; then ....
[prog: not found
$

The shell attempted to run the command [prog. The square bracket alone signals the special test syntax. A space is likewise required before the trailing ']' so that it is seen as a separate argument.
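With the required spaces in place the command behaves as intended.

$ if [ $X = $Y ]; then ....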

The while loop follows many of the same rules and concepts developed above. Here is the basic syntax

$ while command
> do
>   # Do this as long as 'command' returns a TRUE status.
> done
$

Normally the command is an invocation of test with the special syntax. For example

$ while [ $X -lt $Y ]; do ....
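Here is a complete sketch of a counting loop, using the expr program for the arithmetic:

$ X=0
$ while [ $X -lt 3 ]; do
>   echo "X is now $X"
>   X=`expr $X + 1`
> done
X is now 0
X is now 1
X is now 2
$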

The shell also provides an until loop. Note, however, that unlike what you might expect, this loop still performs its test before the loop body is entered. For example in

$ until command
> do
>   # Do this until 'command' returns a TRUE status.
> done
$

the loop body is skipped if command returns TRUE immediately.
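For example, this sketch waits for a (hypothetical) file /tmp/flag to appear, checking every five seconds. If the file already exists when the loop starts, the body never runs at all:

$ until [ -f /tmp/flag ]; do
>   sleep 5
> done
$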

Finally, the shell provides a case statement (similar in concept to C's switch statement). Here is its syntax

case test-string in
  pattern-1)
    commands;;
  pattern-2)
    commands;;
  pattern-3)
    commands;;
esac

The 'test-string' is an arbitrary string of characters. The shell tries to match each of the patterns to the test-string. When it finds a matching pattern, it executes commands up to the ';;'. The patterns are tested in the order they appear. Furthermore, the shell uses the wildcard matching syntax within the patterns. For example suppose the following appeared in a script:

echo "Enter your response: \c"
read Response junk

case $Response in
  [yY]*)
    echo "You said YES!";;
  [nN]*)
    echo "You said NO!";;
  *)
    echo "I don't know what you're talking about.";;
esac

The script would assume you said "yes" to any response that started with a 'y' (upper or lower case). If your response started with an 'n', the script would assume "no." Otherwise it would print an error message.

In addition, you can join several patterns with a logical OR operator (|). For example:

case $Response in
  yes | Yes)
    echo "You said YES!";;
  *)
    echo "Ok, ok, I won't do it.";;
esac

The shell allows you to define functions. A shell function is loaded when the shell sees the definition. However, the text of the function is not executed at the time the function is loaded. Instead, a shell function is called like a script. Parameters can be passed to the function and used in the function under the names $1, $2, etc.

For example

$ showparams(){
>   echo My first parameter is $1
>   echo My second parameter is $2
> }
$ showparams one two
My first parameter is one
My second parameter is two
$

Unlike in C, you do not specify parameters inside the (). The () syntax just informs the shell that this is a function definition.

If a shell function is used inside a script, the values of $1, $2, etc in the function refer to the function's parameters and not the script's parameters. They are local to the function. Any other shell variable introduced or used in a function, however, is global. The shell does not really support local variables within functions.
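This sketch shows both behaviors at once: $1 belongs to the function, while COUNT changes in the calling shell:

$ COUNT=0
$ bump(){
>   COUNT=`expr $COUNT + 1`  # COUNT is global.
>   echo "My parameter is $1"
> }
$ bump hello
My parameter is hello
$ echo $COUNT
1
$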

Many scripts consist of several functions together with a small main body. Remember that shell functions are not executed when they are loaded. For example

Initialize(){
  #...
  #...
}

Print_Message(){
  #...
  #...
}

Do_Work(){
  #...
  #...
}

Clean_Up(){
  #...
  #...
}

Initialize
Print_Message "Hello There"
Do_Work
Clean_Up

You can use shell functions to simulate aliases or commands from other operating systems.

cls(){
  echo "\033[2J"
}

dir(){
  ls -l $1
}

copy(){
  cp $1 $2
}

sendall(){
  mail `cat mailing.list` < $1
}

If you load these functions in your .profile script, they will be available to you whenever you are logged in. Note that although the examples above are simple, there is no limit to the complexity of a shell function. Large functions with many nested control structures are possible and realistic.
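As a slightly larger sketch, here is a function (the name backup is hypothetical) with a test and a loop, suitable for loading from .profile:

backup(){
  # Copy every file named on the command line into $HOME/backup,
  # creating that directory first if necessary.
  if [ ! -d $HOME/backup ]; then
    mkdir $HOME/backup
  fi
  for FILE in "$@" ; do
    cp $FILE $HOME/backup
  done
}

After loading this function you could say, for example, backup *.mmo.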

To really understand how the shell works, you need to see it in action. What follows is a simple example of a real shell script. See if you can understand how it works.

#!/bin/sh
#############################################################################
# FILE        : process
# AUTHOR      : Peter Chapin
# LAST REVISED: September 1999
# SUBJECT     : Sample shell script.
#
#      The following shell script accepts a list of filenames on the command
# line and presents the user with a menu of possible actions for each file in
# that list. Once the user has decided about the disposition of a file, the
# next file is presented.
#############################################################################

if [ ! -w . ] ; then
  echo "No write access to the current directory: Can't continue"
else

  # Loop over all the files specified on the command line.
  for FILENAME in $* ; do

    # Assume that we will stay on this file. Set to "no" below if otherwise.
    RETRY=yes

    # Ignore directories, etc.
    if [ ! -f $FILENAME ] ; then
      echo "$FILENAME is not a plain file: ignoring."
      echo "Strike RETURN to continue...\c"
      read junk
    else

      # Keep presenting the menu until user does something with the file.
      while [ $RETRY = "yes" ] ; do
        echo " "
        echo "\033[2J\033[1;1HFILE: $FILENAME\n"
        echo "  0) No action"
        echo "  T) Type"
        echo "  V) View"
        echo "  N) reName"
        echo "  R) Remove"
        echo "  C) make a new Copy"
        echo "  D) copy to a Directory"
        echo "  Q) Quit"
        echo "Enter command digit: \c"
        read RESPONSE junk

        # Handle each command.
        case $RESPONSE in

          0) # No action.
             RETRY=no;;

        T|t) # Type of file.
             file $FILENAME;;

        V|v) # View file.
             more $FILENAME;;

        N|n) # Rename file. Be sure new name is free for use.
             echo "Enter the new name: \c"
             read NAME junk
             if [ -f $NAME -o -d $NAME ] ; then
               echo "$NAME already exists"
             else
               mv $FILENAME $NAME
               RETRY=no
             fi;;

        R|r) # Remove file (no questions asked).
             rm $FILENAME
             RETRY=no;;

        C|c) # Copy file. Be sure new name is free for use.
             if [ ! -r $FILENAME ] ; then
               echo "Can't read $FILENAME"
             else
               echo "Enter the name of the new copy: \c"
               read NAME junk
               if [ -f $NAME -o -d $NAME ] ; then
                 echo "$NAME already exists"
               else
                 cp $FILENAME $NAME
                 RETRY=no
               fi
             fi;;

        D|d) # Copy to a directory. Be sure name is really that of a dir.
             if [ ! -r $FILENAME ] ; then
               echo "Can't read $FILENAME"
             else
               echo "Enter the name of the destination directory: \c"
               read NAME junk
               if [ ! -d $NAME ] ; then
                 echo "$NAME is not a directory"
               elif [ ! -w $NAME ] ; then
                 echo "Can't write in the directory $NAME"
               else
                 cp $FILENAME $NAME
                 RETRY=no
               fi
             fi;;

        Q|q) exit;;
          *) echo "I don't understand $RESPONSE"
        esac

      echo "Strike RETURN to continue\c"
      read junk

      done  # End of while... loop which waits for RETRY to become "no"
    fi      # End of if...else... which ignores special files.
  done      # End of for... loop which processes all the filenames.
fi          # End of if...else... which checks for writability to .