FastPDB

Active Students: Aaron Huber

Supported By: NSF Award #IIS-1956149

Probabilistic databases allow users to track uncertainty in data and to better understand its effects on the outcome of queries. These are key requirements for accurate decision-making and rigorous data science over noisy data. Unfortunately, probabilistic databases have historically been slow and hard to use. The FastPDB system, part of the overall Uncertainty4U project, is an effort to overcome the performance limitations of probabilistic databases.

Past efforts to make probabilistic databases efficient have focused on so-called set-probabilistic databases, where the underlying data model is one of sets, and the primary objective is to compute the probability of a specific outcome. Several database systems, including Pip and MCDB, instead adopt the more common and more efficient bag semantics, and focus on computing expectations. Although expectations are more efficient to compute than probabilities, one of our key findings is that even computing expectations must be asymptotically slower than evaluating the analogous non-probabilistic queries.
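To make the distinction concrete, here is a minimal sketch in Python. It assumes a tuple-independent model (each row exists independently with a given probability) and a made-up three-row relation; the names `ROWS`, `possible_worlds`, `set_probability`, and `expected_multiplicity` are purely illustrative and do not come from Pip, MCDB, or FastPDB. It enumerates every possible world by brute force, which is only feasible for tiny inputs, and compares the set-style question (the probability that a value appears at all) with the bag-style question (the expected number of copies).

```python
from itertools import product

# Hypothetical tuple-independent relation: (city, probability the row exists).
ROWS = [("Buffalo", 0.5), ("Buffalo", 0.5), ("Chicago", 0.25)]


def possible_worlds(rows):
    """Yield (world, probability) for every subset of the input rows."""
    for mask in product([False, True], repeat=len(rows)):
        prob = 1.0
        world = []
        for (value, p), present in zip(rows, mask):
            # Rows are independent, so the world's probability is a product.
            prob *= p if present else 1.0 - p
            if present:
                world.append(value)
        yield world, prob


def set_probability(rows, value):
    """Set semantics: probability that `value` appears at least once."""
    return sum(prob for world, prob in possible_worlds(rows) if value in world)


def expected_multiplicity(rows, value):
    """Bag semantics: expected number of copies of `value`."""
    return sum(prob * world.count(value) for world, prob in possible_worlds(rows))


if __name__ == "__main__":
    print(set_probability(ROWS, "Buffalo"))        # 1 - (0.5 * 0.5)
    print(expected_multiplicity(ROWS, "Buffalo"))  # 0.5 + 0.5
```

In this simple case the expectation is just the sum of the row probabilities (linearity of expectation), while the probability requires reasoning about how the rows combine.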

Our key insight is that standard processes for sampling from query results can be inlined into standard query evaluation. We are developing the FastPDB system as an extension of the XDB Approximate Query Processing System and the GProm provenance tracking tool. XDB approximates query results in a fraction of the time of a normal database system. As our SIGMOD 2025 paper shows, the resulting system can approximate the expectation of a bag-probabilistic query result in a fraction of the time required to produce a deterministic result.
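As a rough illustration of what approximating an expectation by sampling means, the sketch below estimates the expected size of a join over two small tuple-independent relations. It is not the FastPDB algorithm and does not use XDB or GProm: it simply draws possible worlds at random and re-runs the join on each one, under the assumption that rows exist independently. The relations `R` and `S` and all function names are hypothetical.

```python
import random

# Hypothetical tuple-independent relations: (join key, probability the row exists).
R = [(1, 0.5), (2, 0.8)]
S = [(1, 0.5), (1, 0.4), (2, 0.5)]


def exact_expected_join_size(r, s):
    """Exact expectation: each matching pair contributes p_r * p_s."""
    return sum(pr * ps for kr, pr in r for ks, ps in s if kr == ks)


def sample_world(rows, rng):
    """Draw one possible world by keeping each row with its probability."""
    return [key for key, p in rows if rng.random() < p]


def join_size(r_keys, s_keys):
    """Bag-semantics size of the equi-join of two lists of keys."""
    return sum(1 for kr in r_keys for ks in s_keys if kr == ks)


def estimate_expected_join_size(r, s, samples, seed=0):
    """Monte Carlo estimate: average the join size over sampled worlds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += join_size(sample_world(r, rng), sample_world(s, rng))
    return total / samples


if __name__ == "__main__":
    print(exact_expected_join_size(R, S))
    print(estimate_expected_join_size(R, S, samples=20000))
```

The estimate converges toward the exact value as the number of samples grows, at the cost of evaluating the join once per sampled world.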

(The FastPDB project is being developed in collaboration with Boris Glavic, Atri Rudra, and Zhuoyue Zhao.)


Software

To Be Released

Publications

FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds
Aaron Huber, Oliver Kennedy, Atri Rudra, Zhuoyue Zhao, Su Feng, Boris Glavic

This page last updated 2025-03-04 15:41:55 -0500