CMU 15-799 - Spring 2022

Special Topics: Self-Driving Database Management Systems

Project #2 - Self-Driving Infrastructure


Overview

The main portion of a student's grade in this course is the group project on building self-driving database infrastructure. Students will organize into groups of three and choose a project that (1) is relevant to the materials discussed in class, (2) requires a significant programming effort from all team members, and (3) is unique (i.e., two groups may not choose the same project topic). The projects will vary in both scope and topic, but they must satisfy these criteria. We will discuss this in more depth during class, though students are encouraged to begin thinking early on about projects that interest them. If a group is unable to come up with their own project idea, the instructor will provide suggestions on interesting topics.

Each project consists of four tasks that are due at different times during the semester:

  • Proposal Presentation: Each group will provide a proposal of their project topic and present it to the class.
  • Status Meeting: Each group will meet with the instructor to discuss their plans for the project update presentation.
  • Project Update Presentation: Each group will provide a brief update to the class about the current status of their project.
  • Project Design Document: Each group will write a design document that describes their project implementation.

All projects must be implemented in CMU's NoisePage Pilot project for PostgreSQL. At a high level, each project consists of three implementation tasks. The first is the actual implementation of the proposed idea in the system. The second is the set of unit and regression tests that the group will use to check whether their implementation is correct. The final piece is the evaluation of their implementation to determine how well the system performs with it.

Each group must use a single GitHub repository for all development. Everyone will be provided with an account on the CMU Database Group development servers and additional Amazon AWS credits.

  • Release Date: Feb 28, 2022
  • Due Date: Apr 20, 2022 @ 11:59pm

Proposal Presentation (Due Date: March 14th, 2022)

Each group will give a 10-minute presentation about their proposed project topic to the class. This proposal should contain the following information:

  • An overview of what work must be done and how it will be divided amongst the group.
  • The tests that you will write to validate that your project is correct and the experiments that you will use to measure its performance.
  • The resources you will need to complete the project. This includes software, hardware, data sets, or workloads.

Your proposal should also provide three types of goals: 75% goals, 100% goals, and 125% goals. Think of these as the equivalent of a B grade, an A grade, and a "wow!" grade. Each goal can either depend on or be independent of the prior goals. Each group can meet individually with the instructor afterwards for additional discussion and clarification of the project idea.

Each group should email the instructor a PDF version of their presentation before class.

Status Meeting (Due Date: April 4-8, 2022)

Each group will meet with the instructor in private and discuss the current status of the project. This will be a preview of the group's status update presentation in the subsequent class. Students should bring up any unexpected challenges or issues with their project implementation.

Project Update Presentation (Due Date: April 20th, 2022)

Each group will provide a brief update to the class halfway through the project on the current status of their implementation. The update presentation should contain the following information:

  • An overview of the development status of their project as related to the goals discussed in the initial proposal.
  • Any information about whether the group's original plans have changed and an explanation as to why.
  • A measurement of the current code coverage of the tests for your implementation.
  • Color commentary about any surprises or unexpected issues that the group encountered during coding.

The goal of this exercise is to make sure that everyone in the class is aware of what the other groups are working on and how far along they are in the process. That way, if one group has already worked on a part of the system that another group still needs to investigate, the two groups can talk to each other and share knowledge.

Project Design Document (Due Date: April 20th, 2022)

As part of the status update, each group must provide a design document that describes their project implementation. This document should contain the following information:

  • Overview: A description of the problem that you are trying to solve with your project. That is, what are the high-level goals of the code that you are adding to NoisePage?

  • Scope: Describe which parts of the system this feature will rely on or modify. For each new component that you are adding, describe where it "lives" in NoisePage's architecture and how it will interface/interact with other parts of the system.

  • Architectural Design: An in-depth overview of how you will implement your project. Explain the input and output of the component, describe its interactions, and break it down into smaller components, if any. You should also describe what (if any) configuration knobs your component will need.

  • Design Rationale: An explanation of why you chose the given design. Your justification should discuss issues related to (1) correctness, (2) performance, (3) engineering complexity / maintainability, and (4) testing. It should also include a brief discussion of the other implementations that you considered and why they were deemed inferior.

  • Testing Plan: A detailed description of how you are going to determine that your implementation is both (1) correct and (2) performant. You should describe both the short unit tests and the long-running regression tests.

  • Trade-offs and Potential Problems: Describe any conscious trade-offs you made in your implementation that could be problematic in the future, or any problems discovered during the design process that remain unaddressed (technical debt).

  • Future Work: List any future enhancements or optimizations to your project that you think are worth pursuing after the semester is over. You should provide a rough approximation of the difficulty in the implementation and the expected benefit in terms of either software engineering .

  • Glossary (Optional): List any new concepts or unintuitive/non-standard names that you have added to the system.

This part of the project is meant to encourage each group to think through their implementation. The design documents will also serve as guides for future students, helping them understand what you did after you have left CMU and are potentially dead.

Each group should maintain their design document throughout the semester.

External Code & Libraries

Before a group can use third-party source code or libraries for their project implementation, they must first get approval from the instructor.

In general, a group is only allowed to incorporate external source code into NoisePage's code base if (1) it is not provided as a Debian package and (2) it is Apache Software License compatible (e.g., BSD, MIT license). GPL code is not allowed.

Collaboration Policy

  • Everyone has to work in a team of three people for this assignment.
  • Groups are allowed, and strongly encouraged, to discuss the details of the project with others.

WARNING: All of the code for the core portion of your project must be your own. You may not copy source code from other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.