Due: Friday, February 10, 2023
Reading: Refer to the full definition of the VariableStars database to see examples of several CREATE TABLE commands being used. The PostgreSQL documentation on CREATE TABLE is very complete, although a bit daunting. Similarly, the PostgreSQL documentation on supported data types is very complete.
In the text, section 7.2 on page 149 talks about the SQL Data Definition language (of which CREATE TABLE is one aspect. The text discusses CREATE TABLE in more detail in section 7.2.2 on page 151. Section 7.2.3 on referential integrity is also good to look at. That discusses the foreign key constraints I talked about in class.
In this lab assignment you will do a "version 1.0" design of a database for peakbaggers. A peak bagger is a hiker or climber who sets out to ascend all the mountains on some checklist of mountains. There are proponents and opponents to this activity; discussing the pros and cons of peak bagging is outside the scope of this assignment. Ultimately the database you set up here could potentially be used by a peak bagger website, similar to the one linked above, that allows hikers to find mountains to climb and record information about their ascents.
Before you start, create a new database in your server instance by using the createdb command as you did in Lab #1. Use the name xyzPeakBaggers, replacing xyz with your initials as before.
At this point you have two ways of working. You can choose to either...
Log into Lemuria (creating two sessions may be convenient) and create a text file to hold the SQL needed to create your database. You can use a text editor such as nano (or others) on Lemuria. Then in psql use the \i command to execute the SQL script you created. Be sure you are in your cis-3010 folder when you do this (for backup purposes).
Use DBeaver to create a new PeakBaggers project. Add a connection to that project that connects to your PeakBaggers database, and then open a new SQL script in that project. Use this script to create the database with appropriate commands.
This lab is intentionally open-ended in certain respects. However, you will want to be sure your database can accept the mountain data that I provide. There are also various requirements as described below.
For our purposes we will only consider mountains in the United States. The significance of this decision is twofold. First, it will be meaningful to talk about which state each mountain is in and to use a two letter abbreviation for the states. Second, we will record the mountain elevations in feet (rather than meters, as would be appropriate for a more international system). We also won't worry about "international" character sets being used for the names of mountains.
Your goal is to create an SQL script containing four CREATE TABLE statements to define the base tables of the database. These tables are:
state. This table should contain a mapping between the full name of a state and its two-letter abbreviation. Use the abbreviation as the primary key.
The script state-data.sql provides information about the 50 states in the US that you can use to populate your table.
mountain. This table contains information about the mountains themselves. It should contain at least: the name of the mountain, the state and town in which the mountain is located, the longitude and latitude of the mountain's summit, and its elevation. Use an artificial id number for the key (why?). I suggest calling the key attribute id and have the database system automatically assign values to it (see the more complete variable star example for how to do this).
The script mountain-data.sql provides information about some mountains in New England that you can use to seed your mountain table.
climber. This table contains information about the people doing the climbs. It should contain at least: the name of each person (separate the first and last name so queries against just the last name are easier to express), an email address as contact information, and the climber's birthdate. Again, use an artificial id number for the key (why?).
The script climber-data.sql provides information about some climbers that you can use to seed your climber table.
ascent. This table contains information about the actual climbs done (sometimes called "ascents"). It should contain at least: a reference to the climber who did the climb, a reference to the mountain climbed, the date of the climb, and a field for a URL pointing to a web location for the climber's comments about the climb (if any). For example, the climber might have a blog and the URL attribute would specify the location of the blog entry for the climb. Again, use an artificial id number for the key (why?)
The script ascent-data.sql provides information about some ascents that you can use to seed your ascent table.
Feel free to include other fields that seem reasonable. It is my intention to leave the specification of this database somewhat open-ended to encourage you to reflect on the various possibilities. In a realistic application you would have long conversations with other stakeholders about the requirements of the database, and that would lead to specific choices about the data that needs to be stored and the data types used to represent that data.
As I mentioned above, it is my intention to let you finalize some details of this database. However, here are some things to consider:
In general try to use the most appropriate data type for each field. Consult the PostgreSQL documentation for the data types supported. Using the best data type will help the database ensure data consistency and will make complex queries easier since specialized types often have specialized operators that can be used with them. Often the best type to use for a field is not immediately obvious. Don't just make every field a string!
Do not attempt to store the rank of a mountain on any particular list (the tallest, etc.). This database is intended to support many peak bagging lists. It should be possible to generate a list by using an appropriate query (for example: "find all mountains in the New England states, ordered by descending elevation, and take the top 100 of them"). You will get more practice with queries in the next lab.
Consider which fields must have a value ("NOT NULL") and which fields can be empty of a value. Specify the non-optional fields appropriately.
Consider any additional constraints that might apply to a field other than its data type and consider using a "CHECK" constraint on that field as appropriate. For example, elevations can't be negative (or can they?). The idea is to allow the database to do as much checking of the data as practical to validate the data. This relieves the front-end applications from doing those checks, and it ensures consistency even when used by multiple front-ends written by different groups.
Keep in mind that the VARCHAR data type is likely slower to process than fixed width CHAR fields. Use VARCHAR when appropriate, but use CHAR when possible. It is reasonable to use VARCHAR when the values being stored tend to be long and tend to vary in length considerably from record to record.
Consider adding an attribute to the mountain table that specifies if the mountain has an officially maintained trail to its summit. This attribute should probably have an enumeration type since 'y', 'n', and 'h' (for "herd path") would all be reasonable values. A herd path is a defacto trail created by climbers following the same route, even though it is not officially named and maintained. To create an enumeration type use a CHECK constraint that specifies the allowed set of values.
Submit your SQL script with the four CREATE TABLE statements to Canvas. Include, for example in comments, some of your reasons behind whatever choices you made. Also be sure to include your name in the script at the top (20 points).
Last Revised: 2023-02-01
© Copyright 2023 by Peter C. Chapin <pchapin@vtc.edu>