Final Project Showcase | CMU 15-799 :: Special Topics in Databases: Query Optimization (Spring 2025)

Last Updated: May 08, 2025

AutoHint: Automatic SQL Hint Injection in Postgres

Students: Song Fu, Xueqi Li, Bobby Norwood

Source Code: https://github.com/bobby060/pg-autohint

AutoHint implements an automatic hinting engine for Postgres. It uses pg_hint_plan to implement heuristic, rules-based improvements to Postgres query plans without modifying Postgres source code. The framework is fully extensible and includes to ready-made rules: cardinality injection and an intelligent Nested Loop Join to HashJoin conversion.

Verified LLM SQL Query Rewrite

Students: Ruiqi Wang, Melody Hu, Prashanth Duvvada

Source Code: https://github.com/s-wangru/Verified-SQL-query-rewrite

This project implements a pipeline for SQL query optimization using a LLM. It supplies the LLM with table schemas and statistics, prompting it to generate a new optimized query. It then verifies the semantic equivalence using semantic prover (QED) and runs both queries on synthetic datasets to check for identical outputs. The data distribution of these synthetic datasets can be modified to improve the robustness of equivalence checking.

optd: A Cascades-Style Query Optimizer

Students: Yuchen Liang, Sarvesh Tandon

Source Code: https://github.com/cmu-db/optd

This project implements a transformation-based optimizer that improves upon the Cascades framework. Compared to Cascades, our design enables on-demand exploration of the search space based on rule specification. We also provides a complete solution for tracking equivalent expression groups, addressing the "group merging" problem where existing implementations either miss optimization opportunities or wasting time doing redundant computations. Combined with a declarative language for expressing rules and the cost model, our system aims to provide a Query Optimizer as a Service (QOaaS) to many execution engines.

Query Optimizer Tester & Debugger

Students: David Freifeld, Yu Liu, Jiaying Li

Source Code: https://github.com/LYZJU2019/15799-p2

optdbg is an end-to-end tool built on top of Apache DataFusion designed to detect and explain errors in query optimizer cardinality estimation and cost modeling. Building on the methodologies of existing tools for evaluating query optimizers like TAQO and AutoDI, it samples alternative query plans for queries using a variety of novel methods, benchmarks plans and subplans within said sample, then compares the measured results to predicted cardinalities/costs to find mispredictions. It then reports high-level metrics about query optimizer quality and employs static analysis to flag potential bugs in implementations.

German Subquery Unnesting

Students: Sirui Huang, Roland Liu, Harivallabha Rangarajan

Source Code: —

This attempts to implement the 2025 unnesting algorithm in DuckDB, with initial support for scalar subqueries. It improves query handling by unnesting child subqueries first and using equivalence columns to avoid redundant joins. The implementation passes our correctness tests, and a PR is planned for upstream contribution.

Hypostats: Injecting Hypothetical Statistics into PostgreSQL

Students: Roy Huang, Ethan Kwong

Source Code: https://github.com/rgyhuang/HypoStats

Hypostats is a PostgreSQL extension that enables the importing and exporting of statistical data stored in the 'pg_statistic' table. This allows engineers to use statistics without running ANALYZE with the underlying dataset, which can be costly when migrating and restoring databases from backups. Users can also modify statistics and run EXPLAIN queries, allowing for lightweight query plan optimization experimentation. This project also implements extensible API endpoints and UI elements for statistic import/export and modification in a production setting.

Multiple Query Optimization

Students: Yuttapichai Kerdcharoen, Yizhou Chen, Frank Chen

Source Code: https://github.com/ychen884/multi_query_opt

Data transformation tools (like dbt) allow users to provide multiple queries as a dependency graph to perform their tasks. These queries often contain redundant computations, resulting in unnecessary execution time. Our project implements an optimizer that rewrites the dependency graph to enhance performance. This optimizer currently employs three optimization heuristics: predicate pushdown, common CTE elimination, and projection pushdown. Additionally, we experimentally incorporate statistics from databases to determine whether to apply these heuristics. To evaluate our optimizer, we synthesize dependency graphs from TPC-H databases. The preliminary results show that the optimizer can significantly improve performance.

PostgreSQL Statistics Injection

Students: Zhiping Liao, Ethan Lin

Source Code: https://github.com/ArArgon/hypostats

This project builds a Rust extension to export Postgres statistics and catalog to JSON format and re-import these statistics. The project then explores different methods that use statistical tools and analyze these statistics to discover column correlations from joint distributions.

Final Project Showcase Last Updated: May 08, 2025