Final Project Showcase
AutoHint: Automatic SQL Hint Injection in Postgres
Students: Song Fu, Xueqi Li, Bobby Norwood
Source Code: https://github.com/bobby060/pg-autohint
AutoHint implements an automatic hinting engine for Postgres. It uses pg_hint_plan to implement heuristic, rules-based improvements to Postgres query plans without modifying Postgres source code. The framework is fully extensible and includes to ready-made rules: cardinality injection and an intelligent Nested Loop Join to HashJoin conversion.
Verified LLM SQL Query Rewrite
Students: Ruiqi Wang, Melody Hu, Prashanth Duvvada
Source Code: https://github.com/s-wangru/Verified-SQL-query-rewrite
This project implements a pipeline for SQL query optimization using a LLM. It supplies the LLM with table schemas and statistics, prompting it to generate a new optimized query. It then verifies the semantic equivalence using semantic prover (QED) and runs both queries on synthetic datasets to check for identical outputs. The data distribution of these synthetic datasets can be modified to improve the robustness of equivalence checking.
optd: A Cascades-Style Query Optimizer
Students: Yuchen Liang, Sarvesh Tandon
Source Code: https://github.com/cmu-db/optd
This project implements a transformation-based optimizer that improves upon the Cascades framework. Compared to Cascades, our design enables on-demand exploration of the search space based on rule specification. We also provides a complete solution for tracking equivalent expression groups, addressing the "group merging" problem where existing implementations either miss optimization opportunities or wasting time doing redundant computations. Combined with a declarative language for expressing rules and the cost model, our system aims to provide a Query Optimizer as a Service (QOaaS) to many execution engines.
Query Optimizer Tester & Debugger
Students: David Freifeld, Yu Liu, Jiaying Li
Source Code: https://github.com/LYZJU2019/15799-p2
optdbg is an end-to-end tool built on top of Apache DataFusion designed to detect and explain errors in query optimizer cardinality estimation and cost modeling. Building on the methodologies of existing tools for evaluating query optimizers like TAQO and AutoDI, it samples alternative query plans for queries using a variety of novel methods, benchmarks plans and subplans within said sample, then compares the measured results to predicted cardinalities/costs to find mispredictions. It then reports high-level metrics about query optimizer quality and employs static analysis to flag potential bugs in implementations.
German Subquery Unnesting
Students: Sirui Huang, Roland Liu, Harivallabha Rangarajan
Source Code: —
This attempts to implement the 2025 unnesting algorithm in DuckDB, with initial support for scalar subqueries. It improves query handling by unnesting child subqueries first and using equivalence columns to avoid redundant joins. The implementation passes our correctness tests, and a PR is planned for upstream contribution.
Hypostats: Injecting Hypothetical Statistics into PostgreSQL
Students: Roy Huang, Ethan Kwong
Source Code: https://github.com/rgyhuang/HypoStats
Hypostats is a PostgreSQL extension that enables the importing and exporting of statistical data stored in the 'pg_statistic' table. This allows engineers to use statistics without running ANALYZE with the underlying dataset, which can be costly when migrating and restoring databases from backups. Users can also modify statistics and run EXPLAIN queries, allowing for lightweight query plan optimization experimentation. This project also implements extensible API endpoints and UI elements for statistic import/export and modification in a production setting.
Multiple Query Optimization
Students: Yuttapichai Kerdcharoen, Yizhou Chen, Frank Chen
Source Code: https://github.com/ychen884/multi_query_opt
Data transformation tools (like dbt) allow users to provide multiple queries as a dependency graph to perform their tasks. These queries often contain redundant computations, resulting in unnecessary execution time. Our project implements an optimizer that rewrites the dependency graph to enhance performance. This optimizer currently employs three optimization heuristics: predicate pushdown, common CTE elimination, and projection pushdown. Additionally, we experimentally incorporate statistics from databases to determine whether to apply these heuristics. To evaluate our optimizer, we synthesize dependency graphs from TPC-H databases. The preliminary results show that the optimizer can significantly improve performance.
PostgreSQL Statistics Injection
Students: Zhiping Liao, Ethan Lin
Source Code: https://github.com/ArArgon/hypostats
This project builds a Rust extension to export Postgres statistics and catalog to JSON format and re-import these statistics. The project then explores different methods that use statistical tools and analyze these statistics to discover column correlations from joint distributions.