Educational Objectives
Upon successful completion of this course you should be able to:
- Understand the state-of-the-art in database management systems and distributed systems.
- Interpret and critically analyze research papers on solving problems with data-intensive workloads and applications.
- Apply database concepts to solve high-velocity and high-volume data problems.
Course Layout
This is a seminar course, so there are no exams. Instead, you will be graded on your reading reviews, class participation, presentations, and programming project. Readings will be assigned for each class. Each class will have one or more presenters whose job is to lead the class discussion of those papers. The non-presenters should prepare a one-page position paper on the topic. In addition to these discussions, a single student will be assigned to give a short "lightning" talk about a new database system or technology.
Your final grade for the course will be based on the following weights:
- 20% — Reading Reviews + Class Participation
- 25% — Paper Presentations
- 5% — Lightning Talk Presentation
- 50% — Programming Project
Reading Reviews
At the beginning of each class, each student (except for those who are presenting that day) will need to turn in a one-page review of the assigned readings for that day. Note that this review only needs to cover the mandatory readings, but students are encouraged to peruse the supplemental readings. Be sure to include your name and SCS login at the top of the paper.
Students are allowed to miss the reading review submissions for two classes during the semester. Note that here a "submission" means all of the reviews for a single day of class. For example, if there are two papers being read for a class that you miss, that will count as only one missed submission. Late submissions will not be accepted without prior approval from the instructor.
Each review must include the following information:
- An overview of the main idea and contributions. (One paragraph)
- Three positive comments about the paper. (One sentence each)
- Three negative comments about the paper. (One sentence each)
- Technical or research questions about the paper that remain unanswered, for discussion.
WARNING: These weekly reviews must be your own writing. You may not copy from the papers or from other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.
Paper Presentations
Each student will choose at least two dates from the schedule and present the papers assigned on those days to the class. This talk is supposed to be an in-depth description and analysis of the papers. It should be 60 minutes long (approximately 30 minutes per paper), followed by 15-20 minutes for questions. The format of the talk should be similar to a conference presentation. Because it is the presenter's responsibility to teach the class about the papers, he/she will be expected to know and understand all aspects of the material. Thus, it is important to be prepared, which may require additional background reading. If you have questions regarding the content of your assigned papers, you should arrange to meet with the instructor well in advance of your talk date.
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations. See CMU's Policy on Academic Integrity for additional information.
Lightning Talks
Each student will be expected to give a five-minute lightning talk on an existing "Big Data" system or technology relevant to the class. The goal of the lightning talk is simply to introduce the system to the class and cover its main points and features. It is not intended to be an in-depth review of the system; rather, it is meant to introduce a large number of systems in a short amount of time. We want the presenter to be able to pull out only the relevant information and present it.
The main points to include in your lightning talk are:
- What is a one-sentence description of the system?
- What problem is this system trying to solve? Think about both the data and the workloads and possibly give an example (no demos!).
- What makes this system different from previous/other systems (if anything)?
- What are the high-level key architectural characteristics?
Again, the purpose of a lightning talk is to introduce the system, and thus only five minutes will be allocated for each presentation. Make sure you practice your presentation and finish right at the five-minute mark, as it will be a hard cutoff.
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations. See CMU's Policy on Academic Integrity for additional information.
Projects
The main component of this course will be the project. Students will organize into groups and choose to implement a project that (1) is relevant to the material discussed in class and (2) requires a significant programming effort from all team members. All of the projects will involve "Big Data" management and/or analysis. The projects will vary greatly in both scope and topic, depending on several factors, including group size, group background, and topic. We will discuss this in more depth during class, though you are encouraged to start thinking now about projects that interest you. If you are unable to come up with your own project idea, the instructor can suggest interesting topics.
Each project group will present its proposal to the class to get feedback from their peers. Each group will then meet separately with the instructor for additional discussion and clarification of the project idea.
See the Project Overview page for a more detailed explanation of what is expected.