INF1343

Data Modeling and Database Design

Winter 2012

All materials (slides, exercises, answers to quizzes) and posted under the corresponding weeks.

A note on the final exam

The final exam will be held during the last class in Room 538 in Bissel. (Security code is required to access the 5th floor.)

The questions on the exam will be similar to those asked in quizzes and exercises, with an addition of free-form questions that may cover topics from the most recent weeks. A list of sample questions can be found here.

The best way to prepare for the exam would be by going through this semester's exercises and quizzes as well as through the sample "miscellaneous" questions mentioned above. However, if you want, you can also find last year's practice exam here. Please note, though, that the structure of the exam is going to be different this year (fewer but more comprehensive questions), and this year's quizzes would be a better guide than last year's practice exam. Unfortunately, I cannot provide the answers to the practice exam.


Click here to go to the current week.


Communication Policy

All questions about course material or software should be asked through the Q&A site at http://inf1343.ischool.utoronto.ca/. Before asking a question, please check if the requested information is available in answers to earlier questions, in the course outline, in handouts, or in emails from the instructor. If you ask a question and then find an answer to it, please post the answer. Questions on the content of the course that are likely to require lengthy explanations should ideally be asked in class or brought to office hours.

Students are also encouraged to answer questions posted on the site. Students who consistently provide useful answers will receive extra credit (at the discretion of the instructor).

The content of the Q&A site will be visible to your fellow students as well as the world at large. If you do not want to reveal your identity to others, please pick a username that you do not use on other sites. Please use your Utormail address, however. (Your email will not be shown to other users but will be visible to the instructor.) Please note that if your email is enrolled with Gravatar, then the site will use your Gravatar icon, which may indirectly identify you. If you do not want this to happen, change your Gravatar icon to something else. If you have further privacy concerns in regards to the Q&A website, please discuss them with the instructor.

Email should only be used for questions that cannot be asked on the Q&A site (see below) due to privacy or security considerations (e.g., questions about your grades or absences or your passwords). Please expect a response within 2 business days. (If you do not hear back within 2 days, please resend your message.)

The instructor may send course-related announcements to the students by email. Students are responsible for making sure they receive those announcements and read them.

Students who want to contact the instructor by email should send their messages from their Utormail addresses and include the course code (“INF1343”) in the subject line. Please expect a response within 2 business days. (If you do not hear back within 2 days, please resend your message.)

Students are strongly encouraged to make use of office hours.

Catalogue Description

“The purpose of this course is to provide an introduction to databases by analyzing their structure, content and measurement and by applying principles governing data modeling, database design and production with an emphasis on modeling, design and representation of content, decisions and tradeoffs involved in modeling, design and creation, and issues of standardization, security and emerging trends.”

Some Introductory Notes

This course will stress practical skills and hands-on experience with database design. The rationale for this is two-fold. First, the course will aim equip students to engage in actual database development upon completion. Second, thinking about databases abstractly requires building up intuition about how they work. The best way to acquire such intuition is through practical experience. Therefore, please read the section “software” below.

Materials and Resources

Required Materials

Required readings will be primarily from two books:

Those two books are referred to in the readings as “SQL” and “RDD” respectively.

Both of the books are currently available in the University of Toronto Bookstore (as of Dec. 27) and can be purchased either individually or together as a bundle. (Buying a bundle will save you 10%.) Students are encouraged to purchase paper copies. However, the University of Toronto library also provides access to electronic version of the first book (“SQL”, click here) and, until recently, provided electronic access to the second book (“RDD”) as well. We will be looking into whether anything could be done to restore access to RDD.

Update: Electronic access to RDD via the library has been restored as of Jan. 6.

Students who choose to use the electronic copies are advised to download and open the required chapters ahead of time. Unavailability of the electronic resources at the necessary time or software problems related to their use will not be accepted as valid reasons for late assignments.

Some of the required readings will come from other sources. Those are all available over the Internet, either openly or through the University Library’s MyAccess service.

Students may find the following additional resources useful:

SQL

We will be using a MySQL 5.5 database server for examples, exercises, and the final project. MySQL’s implementation of SQL is sometimes different from from the official standard described by Harrington. MySQL 5.5 Reference Manual does a good job describing MySQL’s implementation. It is also searcheable. Students are strongly encouraged to learn how to make use of this resources.

SQL in a Nutshell provides a summary of SQL commands with notes on what is supported in different implementation.

Python

We will be doing some very basic Python programming. This class does not assume prior familiarity with programming and the necessary concepts will be taught in class and through assigned readings. However, students who need additional resources may want to look at the following:

Please note that we be covering the following topics: variables, numbers and strings, use of functions, conditions, loops. We will not need the following topics: lists, function definitions, methods, classes.

HTML

The final project will make use of HTML. The following resources may be useful:

Unix / Linux tools

When making use of the server, some familiarity with Unix commands can save you a lot of time.

Software

Hands-on use of database software is absolutely essential to your success in this class. Each student will get an account on a database server which will allow them to make queries against databases provided by the instructor. Each student will also have their own database on the database server in which they will be able to create their own tables and insert their own data.

Students are expected to spend substantial time interacting with the database server. There is simply no way to learn the material presented in the class without spending a lot of time interacting with the software. The amount of time a particular student would need to spend may vary depending on their background. However, for most students this time will be measured in hours rather than minutes per week and should at the very least include trying all the examples presented in class and doing all of the required exercises. This should really be considered the bare minimum, though. Successful students would likely spend a lot of additional time trying variations on the examples and exercises.

Some students might be surprised at the level of precision that is necessary for writing successful SQL queries and the time that may be required to get a query right. A missing quotation mark or comma can really make a difference and it can sometimes take you an hour (or even several hours) to figure out that your query was failing because of a misplaced comma. This, however, is an inherent part of SQL programming and the only way out of it is practice.

In addition to the database software students will need to learn to use several other software application tools, also described below. Students will be shown how to use those tools in class (see the schedule).

A Database Client

During the first class students will learn to use a database “client” to connect to the database. The simplest method, which will be shown in class, is to use MySQL Workbench, a free / open source software client for MySQL that runs on Mac OS X, Linux, and Windows. MySQL Workbench is installed in the classroom and students will be able to use it during or after class. (There are 20 computers available in the classroom. Please share those computers if necessary.) Students who want to use their own computers can download MySQL Workbench for free and install it on their computers. Ideally, they should do so before the first class, since we will be making use of MySQL Workbench on the first day.

Please note that MySQL Workbench and some of the other database clients provide a wide range of features which we will not be learning or using in class. Our use of those tools will focus strictly on entering SQL statements.

An SCP Client

Later in the semester students will learn how to use an SCP client to transfer files from their computer to the server. We will be using WinSCP, which is a Windows application. There are many alternatives for other platforms.

An SSH Client

Later in the semester students will need be shown how to use an SSH client to connect to the server and to execute programs on it. Students can use the following software for this:

A Programmer’s Editor

Later in the semester students would want to use a simple programer’s text editor for editing their files. Those are not hard to use — they work like simple plain text editors (like Notepad) but add nice features that help you spot mistakes.

Diagramming Software

Students will need to use diagramming software for drawing ER diagrams. This can be done by using OpenOffice.org’s Draw with this template that provides Crow’s Feet arrows. This is the software that will be shown in class. There are many alternatives diagramming programs, and you can use them if you want. However, many of such programs have an important downside from a pedagogical standpoint: they blur the boundary between the ER model and the relational model, which can leave students confused. Additionally, such tools usually require more work to create a diagram because the user is expected to provide all the information for the later step of creating a database schema. This can save time in the long term if you are willing to rely on the schema generated by such a tool. However, in this course the students will be specifically asked to not use automatically generated schemas.

Students should not use hand-drawn ER diagrams in their preliminary or final project submissions.

Python

The final project will involve designing and implementing an information system with a database backend and a web interface. The web interface will be written in Python. We will be using only a subset of Python and the students are not expected to have programming experience prior to the class. (This class will not aim to teach you to become a Python programmer, but we will cover those very basic aspects of Python that will be necessary for completing the project.) You do not need to install Python on your computer — it is provided on the server and you will access it using SSH (see above).

Software Problems

Students who have problems installing or using software should ask their questions on the Q&A site as described in the course communication policy.

The Course at a Glance

Most of the course will be dedicated to relational databases, using MySQL 5.5 as an example. We will start by understanding the basics of the relational model and learning how to do queries against a database using one table at a time. We will then look at how to model data structures using ER diagrams as a step towards understanding multi-table databases. In week 5 we will learn how to do queries that use multiple tables. In week 6 we will look at normalization — finding and fixing common problems in database design. At that point students will be equipped to design and implement databases and will start working on the final project. The two weeks after that will look at how a database would be used in context of a larger information system. At this point students will learn how to build a web interface to a database and proceed with that aspect of the final project. The four classes after that will look at somewhat more advanced topics and will be somewhat more theoretical in focus.

The last course is reserved for the final exam.

The deadlines stated in the table below are for the standard grading system. Students who opt for the project option will have different deadlines.

  Date Topic Quizzes
(25% of the grade)
Things Due Weight
1 January 9 Introduction      
2 January 16 One-Table SQL  
3 January 23 The ER Model quiz #1
4 January 30 Converting an ER Design into a Relational Form  
5 February 6 Queries Using Multiple Tables quiz #2
6 February 13 Normalization quiz #3
  February 21
No class - reading week
7 February 27 Catch up. Dynamic SQL quiz #4    
8 March 5 Building Database-Driven Applications no Preliminary Design 10%
9 March 13 Advanced Queries no    
10 March 19 Relational Databases and Documents quiz #5
11 March 26 Database Security no
12 April 2 Storage, Structure, and Performance no Final Project Submission 25%
13 April 9 --   Final Exam (in class) 40%

The Detailed Schedule and Readings

This chart might help you visualize the order in which we will be covering the two books.

Week 1 | January 9

Introducing Databases and SQL

Overview of the course and ground rules. Databases in the context of modern information systems. Database servers and clients. SQL. Connecting to a database with a GUI client. The relational data model. Basic data types. Basic one-table SELECT. ORDER BY and LIMIT.

Exercises.

Slides: PDF, ODP.

Week 2 | January 16

More One-Table SQL

Complex conditions. Arithmetic, functions and system values. More data types. CREATE TABLE. INSERT and LOAD DATA. UPDATE and DELETE.

This week’s quiz. Exercises. Exercises with the results (this file shows the results you should be getting from the database, but not the queries) and answers.

Slides: PDF, ODP.

Week 3 | January 23

The ER Model

Requirements analysis and data modeling. The ER model. ER diagrams. Cardinality and optionality. Weak entities and belonging.

Exercises and answers .

Slides: PDF, ODP.

Week 4 | January 30

Converting an ER Design into a Relational Form

Primary and foreign keys. Natural and surrogate keys. The conversion steps. Implementing PK-FK constraints in SQL. CASE tools.

Exercises and answers. (Please try to complete the exercises before looking at the answers.)

Slides: PDF, ODP.

The diagram we designed during the class: ODG, PDF.

Solution to last week's quiz: PDF.

Week 5 | February 6

Queries Using Multiple Tables

Queries using multiple tables. Cartesian product of tables. JOIN. ON and USING. Inner and outer joins. Natural joins. “Traditional” joins.

Slides: PDF, ODP.

This week's quiz and the answers to it.

Exercises, partial answers (just the values, no queries), and full answers.

Week 6 | February 13

Normalization

Normalization. Normal forms. Fixing tables with ALTER TABLE.

Solution to last week's quiz are posted under week 5.

The final project handout is here. (Updated on after class on February 13 to correct due date and delivery format.)

Slides: PDF, ODP.

This week's quiz and the answers to it.

Exercises and answers.

Week 7 | February 27

Catchup. Embedded SQL.

Catching up in case we fall behind. Some examples. Python as an example of a procedural programming language. Making database queries from within a Python program. Dynamic SQL.

Please install PuTTY and WinSCP (or alternative SSH and SCP clients) before this class.

Slides: PDF, ODP.

Examples from the class: week-07-examples.zip.

Week 8 | March 5

Building Database-Driven Web Applications

Initial design for the final project due before the beginning of the class.

Python and Dynamic SQL recap. SQL in a web application.

Slides: PDF, ODP.

Solution to last week's quiz: PDF.

Examples from the class: week-08-examples.zip. Those files are a somewhat more developed version of what we did in class. Feel free to use this as a starting point for your own projects. You can see those scripts in action here.

No exercises this week - start working on your project.

Week 9 | March 12

Advanced Queries

LOAD DATA. Aggregation with GROUP BY and HAVING. Variables. Subqueries. Casting values.

Slides: PDF, ODP.

Exercises and answers.

Week 10 | March 19

Relational Databases and Documents

Full-text search. Markup languages. XML, YAML, JSON. Markup languages as input or output format. XML support in databases. Exporting and importing XML with --xml and LOAD XML. “Nosql” databases.

Slides: PDF, ODP.

Final project data is here.

Week 11 | March 26

Database Security

Users and access rights. Views. Firewalls. SQL injection attacks. Denial of service attacks.

Slides: PDF, ODP.

Solutions to last week's quiz: PDF.

Week 12 | April 2

Storage, Structure, and Performance

A paper copy of the project report and the digital submission are due before the beginning of the class.

Representing tables. Indexing. Storage Engines. Transactions. Backup. Replication.

Slides: PDF, ODP.

Week 13 | April 9

Final Exam

The final exam will be held during the last class in Room 538 in Bissel. (Security code is required to access the 5th floor.)

Assignments and Grading

The grading scheme is subject to revision until the first day of class.

Students are offered a choice of two grading systems in this course. The “standard” system, described below, is recommended for most students and applies automatically, unless the student requests the individual project option and such request is approved by the instructor.

Quizzes and Exercises

25% of the grade will be based on in-class quizzes which will be given in weeks 2 through 12. (The timing of the quizzes will not be announced beforehand.) There will be a total of 5 quizzes. Your grade will be based on your best four quiz results. In other words, your worst quiz result will be dropped. This also means you can miss one quiz without a penalty. (Your grade will be based on the four quizzes that you did take.)

To help you prepare for quizzes and the final exam, you will receive study problems and questions for each week of the semester. You are expected to work on those questions and problems. They are not an optional component of the course. However, you do not need to turn in your results, since you will not be graded on this component directly. (Rather, your work on the study problems will be graded through the quizzes.) The students are encouraged to work on study problems and question in groups. Students who want to get feedback on their solutions to study problems should bring them to office hours.

Final Project

For the final project you will design and implement an information system with a MySQL database back-end and a simple web interface. The overall purpose and functionality of the system will be specified in a project handout that will be distributed later in the course. Students who want to develop a system serving a different purpose can do so with instructor’s permission, but must discuss this with the instructor early in the semester. (Yes, this does mean that you will need to discuss this before you know what the standard final project is going to be.) The final project will consist of two submissions. The first submission is an initial design, worth 10% of the grade. The second submission is the actual implementation and a report, worth 25% of the grade.

The projects can be done individually, in pairs, or in teams of three. Students who want to work in groups are responsible for forming the groups. The groups will need to be formed before the due date of the first submission.

Final Exam

The final exam, worth 40% of the grade, will test your understanding of the material from the whole semester. You will be asked to write SQL statements, draw ER diagrams, and answer conceptual questions about database design. The final exam questions will closely resemble study problems and questions and/or those used in the quizzes.

Individual Project Option

Students who are already familiar with the material covered in the class and want to focus their time on expanding their knowledge beyond this can apply for the individual project option. Students who are approved for the individual project option will be exempt from quizzes. Instead, 60% of their grade will be based on an individual project (10% for the preliminary design and 50% for the final deliverable), with the other 40% based on the final exam. Students will have more freedom in chosing what to work on for their individual project but such projects will be evaluated in a more demanding way than the final projects under the “standard” grading option.

Students who wish to do so need to submit the following materials:

The student should also arrange to talk to the instructor during 2nd or 3rd week of class.

If a student’s application for the individual project is approved, the student will be expected to submit a revised preliminary design by February 6 and the final submission by April 2nd.

Late Assignments and Extensions

Assignment not submitted by the exact time when they are due will be considered late. (E.g., “17:00” means “17:00”, not “17:05.”) There will be a penalty for lateness of half a grade (e.g. from A to A–) for any assignment not submitted on time, with an an additional half-grade penalty deducted for any further 24 hours that the assignment is late. Work that is not handed in one week (168 hours) after it was due will not be accepted.

Graduating students are advised that late submission of the final project can make it impossible for the final grade to be issued in time for graduation.

Extensions will only be granted for documented medical reasons or in extra-ordinary circumstances.

Special Needs

Students whose circumstances require special accommodations should approach the instructor at the earliest opportunity (ideally during the first week of the semester) to discuss the situation and the possible accommodations.