What is BigDAWG?

BigDAWG Overview Video

BigDAWG is an open-source project from researchers within the Intel Science and Technology Center for Big Data (ISTC). It is a reference implementation of a polystore database: a database management system (DBMS) built on top of multiple, heterogeneous, integrated storage engines.

The current release supports three database engines: PostgreSQL, SciDB and Accumulo. Users can download Docker containers that bundle the middleware, the databases and pre-loaded data, along with example queries.

For the most part, we hope that you will download the release, experiment with the data we have distributed and create your own queries. If you have bigger goals in mind, please reach out to us; we are happy to help you navigate.

BigDAWG V 0.1 (Initial Release) Architecture


Why use BigDAWG?

BigDAWG should be of interest to anyone seeking a simpler way to use data that spans multiple data models and data stores, such as research analysts, data scientists and database administrators. To make it easy for anyone to try, we have released the code in a set of Docker containers that automatically run a cluster of three different database engines.


What have we used BigDAWG for?

In the course of creating BigDAWG, we worked with two real-life use cases involving complex, multimodal datasets:

  • MIMIC II, an openly available health data set developed by the MIT Lab for Computational Physiology, comprising de-identified health data associated with ~40,000 critical care patients. It includes demographics, vital signs, laboratory tests, medications, and more.

We hope that you will use BigDAWG islands and database engines to develop new applications such as data visualizations and deep analytics.

Getting Started

Check out our documentation and download code from the following links:

Read the Docs | PDF | GitHub





This release of BigDAWG is distributed under the BSD License.
Please note that this license applies to the middleware components only.
External database management systems and other software are distributed under their respective license agreements.