Built For Data Exploration

There's no one right way to explore or transform your data. Vizier is a multi-modal data exploration and analysis tool that lets you seamlessly jump between Python, SQL, a spreadsheet-style interface, and more. Not just another notebook, Vizier keeps track of everything that you've done (and haven't done) with your data. When something turns out funny - and it will - Vizier makes it easy to pinpoint exactly why.



Multi-Modal Interface

Spreadsheets, Python, SQL, D3, all working together seamlessly. Use whichever tool is right for the job.

Best-Effort Data Ingest

Vizier's powerful, automated data cleaning and ingest tools get you up and running with your data as fast as possible.

Data Debugger

Vizier's notebook keeps track of everything you do with your data, making it easy to track down bugs in your data processing pipeline.

Versioned Notebook

Vizier's provenance-backed versioned notebook lets you easily branch, checkpoint, and time travel through your notebooks.

Collaboration

Share and release versioned snapshots of notebooks with full edit history, and easily document problems and to-dos through annotations that travel along with the data.

Scalable and Compatible

Vizier is built on Spark, so you get great scale-out and compatibility with all sorts of data sources.


Multi-Modal Interface

Python

Unleash the full power of Python to transform and visualize your data.

Spreadsheets

Safely edit your data directly, with all of your changes tracked and available for undo.

Graphs

Vizier includes simple, easy-to-use data visualizations right out of the box.


Provenance for Curation and Exploration

Vizier enables worry-free exploration. A simple notebook interface mirrors a spreadsheet view of your data, tracking the provenance of your edits. Provenance is at the heart of Vizier: it makes it easy to undo and redo actions, and it allows Vizier to suggest new curation steps and visualizations or to make guesses about your data. Finally, provenance allows you to develop curation workflows on small datasets and then seamlessly deploy them to larger datasets (e.g., via Spark or Hadoop).
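
To make the idea concrete, here is a minimal sketch of how a provenance log can support undo, redo, and replaying a workflow on a larger dataset. It is not Vizier's implementation or API: the `Step` and `Workflow` classes, their methods, and the sample rows are all hypothetical names invented for this illustration, and it uses plain Python lists where a real deployment might use Spark.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Step:
    """One recorded curation step (hypothetical): a label plus a per-row transformation."""
    label: str
    apply: Callable[[Row], Row]


@dataclass
class Workflow:
    """Hypothetical provenance log: an ordered list of steps with undo and redo."""
    steps: List[Step] = field(default_factory=list)
    undone: List[Step] = field(default_factory=list)

    def record(self, label: str, apply: Callable[[Row], Row]) -> None:
        self.steps.append(Step(label, apply))
        self.undone.clear()  # a new edit invalidates the redo stack

    def undo(self) -> None:
        if self.steps:
            self.undone.append(self.steps.pop())

    def redo(self) -> None:
        if self.undone:
            self.steps.append(self.undone.pop())

    def run(self, rows: List[Row]) -> List[Row]:
        """Replay every recorded step, in order, on a copy of the input rows."""
        out = [dict(r) for r in rows]
        for step in self.steps:
            out = [step.apply(r) for r in out]
        return out


# Develop the workflow on a small sample...
wf = Workflow()
wf.record("strip whitespace from name",
          lambda r: {**r, "name": r["name"].strip()})
wf.record("fill missing age with 0",
          lambda r: {**r, "age": r["age"] if r["age"] is not None else 0})

sample = [{"name": " Ada ", "age": None}]
cleaned = wf.run(sample)

# ...undo and redo the last step without touching the source data...
wf.undo()
after_undo = wf.run(sample)
wf.redo()

# ...then replay the same recorded steps on a larger dataset.
full = [{"name": " Ada ", "age": None}, {"name": "Grace", "age": 45}]
replayed = wf.run(full)
```

Because the log stores the steps rather than the edited data, the original rows are never modified, and the same workflow can be rerun on any dataset with the same columns.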


The Team

  • Mike Brachmann (UB)
  • Heiko Mueller (NYU)
  • Sonia Castelo (NYU)
  • Carlos Bautista (NYU)
  • Juliana Freire (NYU)
  • Boris Glavic (IIT)
  • Oliver Kennedy (UB)

Publications

    Communicating Data Quality in On-Demand Curation   QDB 2016   ( paper )
    The Exception That Improves The Rule   HILDA 2016   ( paper )
    Provenance-aware Versioned Dataworkspaces   TaPP 2016   ( paper )