{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "skip" } }, "source": [ "# How to render this notebook:\n", "\n", "* Rendering slides and serving webpage: `jupyter nbconvert HandsOn.ipynb --to slides --post serve`. Then open browser to [http://127.0.0.1:8000/HandsOn.slides.html#/](http://127.0.0.1:8000/HandsOn.slides.html#/)\n", "* Render pdf: `jupyter nbconvert HandsOn.ipynb --to pdf`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Hands-On Overview:\n", "\n", "* Demo cluster architecture\n", "* Working with the demo cluster: Docker basics\n", " * List the running containers\n", " * Logging into a container\n", "* Explore the Catalog and view the Postgres tables\n", "* BigDAWG Use cases: \n", " * Postgres to Postgres with BigDAWG\n", " * SciDB to Postgres with BigDAWG\n", " * Postgres to Accumulo with BigDAWG" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Demo Cluster Architecture\n", "\n", "## Docker Container Networking Diagram\n", "\n", "\"Smiley" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Docker Container Descriptions\n", "\n", "\"Smiley\n", "\n", "\"Smiley\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Working with the demo cluster: Docker basics\n", "\n", "![docker-whale](img/docker-whale.png)\n", "![docker-vs-vm](img/docker-vs-virtual-machine.png)\n", "\n", "Virtual Machines:\n", "\n", "* Each application includes separate OS and resources that are not shared.\n", "\n", "Docker: \n", "\n", "* Each application is run as an isolated processes on the host OS and shares its resources.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Open a Docker Quickstart Terminal\n", "\n", "Open a new Docker Quickstart Terminal. This is required to run any `docker` commands.\n", "\n", "![docker-quickstart-terminal](img/docker-quickstart-term.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## List the containers running on Docker\n", "\n", "A list of the running containers is shown. Containers can be referred to by container ID or name. The ports show which port local to a container is forwarded to the docker host port.\n", "\n", "Docker command:\n", "\n", "```bash\n", "> docker ps\n", "```\n", "\n", "Output:\n", "\n", "```\n", "docker ps\n", "CONTAINER ID IMAGE COMMAND CREATED\n", "9a041c769cfb bigdawg/accumulo \"/usr/bin/supervisord\" 2 hours ago\n", "59e91c8a95f1 bigdawg/accumulo \"/usr/bin/supervisord\" 2 hours ago\n", "1aa16f0d88a1 bigdawg/accumulo \"/usr/bin/supervisord\" 2 hours ago\n", "52f26c00d069 bigdawg/accumulo \"/usr/bin/supervisord\" 2 hours ago\n", "b782323ba115 bigdawg/accumulo \"/usr/bin/supervisord\" 2 hours ago\n", "5f3a08ff00c4 bigdawg/scidb \"/bin/sh -c /start_se\" 2 hours ago\n", "a44b212568da bigdawg/postgres \"/bin/sh -c /start_se\" 2 hours ago\n", "c5e6b9d51bc9 bigdawg/postgres \"/bin/sh -c /start_se\" 2 hours ago\n", "94d7cdcdeace bigdawg/postgres \"/bin/sh -c /start_se\" 2 hours ago\n", "```\n", "(continued)\n", "```\n", "STATUS PORTS NAMES\n", "Up 2 hours 0.0.0.0:42424->42424/tcp bigdawg-accumulo-proxy\n", "Up 2 hours 0.0.0.0:9999->9999/tcp, \n", " 0.0.0.0:50095->50095/tcp bigdawg-accumulo-master\n", "Up 2 hours 0.0.0.0:9997->9997/tcp bigdawg-accumulo-tserver0\n", "Up 2 hours 0.0.0.0:2181->2181/tcp bigdawg-accumulo-zookeeper\n", "Up 2 hours 0.0.0.0:1239->1239/tcp bigdawg-scidb-data\n", "Up 2 hours 0.0.0.0:5402->5402/tcp bigdawg-postgres-data2\n", "Up 2 hours 0.0.0.0:5401->5401/tcp bigdawg-postgres-data1\n", "Up 2 hours 0.0.0.0:5400->5400/tcp, \n", " 0.0.0.0:8080->8080/tcp bigdawg-postgres-catalog\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Log into a docker container:\n", "\n", "Docker command:\n", "\n", "```bash\n", "> docker exec -it bigdawg-postgres-catalog bash\n", "```\n", "\n", "Output:\n", "\n", "```\n", "postgres@bigdawg-postgres-catalog:\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Explore the Catalog and view the Postgres tables\n", "\n", "After logging into the `bigdawg-postgres-catalog` container, use `psql` to navigate the `catalog` database view the tables. \n", "\n", "PSQL Command:\n", "\n", "```psql\n", "> psql bigdawg_catalog\n", "```\n", "\n", "Output:\n", "\n", "```\n", "psql (9.4.10)\n", "Type \"help\" for help.\n", "\n", "bigdawg_catalog=# \n", "```\n", "\n", "You are logged in as user `postgres` with password `postgres`.\n", "\n", "## Verify connection info:\n", "\n", "PSQL Command:\n", "\n", "```psql\n", "> \\conninfo\n", "```\n", "\n", "```\n", "bigdawg_catalog-# \\conninfo\n", "You are connected to database \"bigdawg_catalog\" as user \"postgres\" via socket in \"/var/run/postgresql\" at port \"5400\".\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Show list of relations in `catalog` schema:\n", "\n", "The Catalog stores metadata about the database engines and data objects under its control. This metadata is stored in the following tables.\n", "\n", "PSQL Command: \n", "\n", "```psql\n", "> \\dt catalog.\n", "```\n", "\n", "Output:\n", "\n", "```\n", "bigdawg_catalog=# \\dt catalog.\n", " List of relations\n", " Schema | Name | Type | Owner \n", "---------+---------------+-------+----------\n", " catalog | casts | table | postgres\n", " catalog | databases | table | postgres\n", " catalog | engines | table | postgres\n", " catalog | islands | table | postgres\n", " catalog | objects | table | postgres\n", " catalog | scidbbinpaths | table | postgres\n", " catalog | shims | table | postgres\n", "(7 rows)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## View rows of data from a table:\n", "\n", "SQL Command: \n", "\n", "```sql\n", "> SELECT * FROM catalog.engines;\n", "```\n", "\n", "Output:\n", "\n", "**`catalog.engines`**\n", "\n", "```\n", "bigdawg_catalog=# SELECT * FROM catalog.engines;\n", " eid | name | host | port | connection_properties \n", "-----+---------------+--------------------------+------+-----------------------\n", " 0 | postgres0 | bigdawg-postgres-catalog | 5400 | PostgreSQL 9.4.5\n", " 1 | postgres1 | bigdawg-postgres-data1 | 5401 | PostgreSQL 9.4.5\n", " 2 | postgres2 | bigdawg-postgres-data2 | 5402 | PostgreSQL 9.4.5\n", " 3 | scidb_local | bigdawg-scidb-data | 1239 | SciDB 14.12\n", " 4 | saw ZooKeeper | zookeeper.docker.local | 2181 | Accumulo 1.6\n", "(5 rows)\n", "```\n", "\n", "**`catalog.databases`**\n", "\n", "```\n", "bigdawg_catalog=# SELECT * FROM catalog.databases;\n", " dbid | engine_id | name | userid | password \n", "------+-----------+-----------------+----------+----------\n", " 0 | 0 | bigdawg_catalog | postgres | test\n", " 1 | 0 | bigdawg_schemas | postgres | test\n", " 2 | 1 | mimic2 | postgres | test\n", " 3 | 2 | mimic2_copy | postgres | test\n", " 4 | 0 | tpch | postgres | test\n", " 5 | 1 | tpch | postgres | test\n", " 6 | 3 | scidb_local | scidb | scidb123\n", " 7 | 4 | accumulo | bigdawg | bigdawg\n", "(8 rows)\n", "```\n", "\n", "**`catalog.islands`**\n", "\n", "```\n", "bigdawg_catalog=# SELECT * FROM catalog.islands;\n", " iid | scope_name | access_method \n", "-----+------------+---------------\n", " 0 | RELATIONAL | PSQL\n", " 1 | ARRAY | AFL\n", " 2 | TEXT | JSON\n", "(3 rows)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Postgres to Postgres Query Example\n", "\n", "Problem: Data is entered into tables on two separate Postgres engines, and we need to perfrom a join across those tables. \n", "\n", "1. `admissions`: Times when patients were admitted\n", "* `additives`: Medication types, dosage, and administration times\n", "\n", "We need to know when patients were admitted from `admissions` and when their medication was delivered from `additives` so we can see how long they waited until the first medication was administered." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Table 1: `admissions` table\n", "\n", "Contains the admit and discharge times for each patient.\n", "\n", "BigDAWG Syntax:\n", "\n", "```sql\n", "bdrel(SELECT subject_id, admit_dt, disch_dt \n", " FROM mimic2v26.admissions \n", " LIMIT 5);\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdrel(SELECT subject_id, admit_dt, disch_dt FROM mimic2v26.admissions LIMIT 5)\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", "subject_id\tadmit_dt\tdisch_dt\n", "56\t2644-01-17 00:00:00.0\t2644-01-23 00:00:00.0\n", "78\t2778-03-24 00:00:00.0\t2778-03-27 00:00:00.0\n", "3\t2682-09-07 00:00:00.0\t2682-09-18 00:00:00.0\n", "12\t2875-09-26 00:00:00.0\t2875-10-09 00:00:00.0\n", "26\t3079-03-03 00:00:00.0\t3079-03-10 00:00:00.0\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Table 2: `additives` table\n", "\n", "Contains the medications delivered to each patient.\n", "\n", "BigDAWG Syntax:\n", "\n", "```sql\n", "bdrel(SELECT subject_id, charttime, amount, route \n", " FROM mimic2v26.additives \n", " LIMIT 10;)\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdrel(SELECT subject_id, charttime, amount, route FROM mimic2v26.additives LIMIT 10;)\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", "subject_id\tcharttime\tamount\troute\n", "78\t2778-03-24 04:00:00.0\t1.0\tIV Drip\n", "31\t2678-08-22 13:00:00.0\t40.0\tIntravenous Push\n", "31\t2678-08-22 13:00:00.0\t60.0\tIV Drip\n", "31\t2678-08-22 15:00:00.0\t1250.0\tIV Drip\n", "31\t2678-08-23 15:00:00.0\t20.0\tIntravenous Push\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Location Transparency\n", "\n", "The Catalog knows on which engine the tables reside:\n", "\n", "* `admissions` table is on `logical_db==2` and `physical_db==2`\n", "* `additives` table is on `logical_db==2` and `physical_db==3`\n", "\n", "Entry from `catalog.objects`:\n", "\n", "```\n", "oid\tname\t fields logical_db\tphysical_db\n", "3 \"mimic2v26.additives\" \"subject_id,icustay_id,...\" 2 3\n", "4 \"mimic2v26.admissions\" \"hadm_id,subject_id,...\" 2 2\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Postgres to Postgres BigDAWG Query\n", "\n", "Solution: Query the Relational island for data stored in different tables.\n", "\n", "BigDAWG Syntax:\n", "\n", "```sql\n", "bdrel(SELECT mimic2v26.additives.subject_id, \n", " mimic2v26.admissions.admit_dt, \n", " mimic2v26.additives.charttime, \n", " mimic2v26.additives.route \n", " FROM mimic2v26.additives, mimic2v26.admissions \n", " WHERE mimic2v26.additives.subject_id=mimic2v26.admissions.subject_id \n", " LIMIT 15;)\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdrel(SELECT mimic2v26.additives.subject_id, mimic2v26.admissions.admit_dt, mimic2v26.additives.charttime, mimic2v26.additives.route FROM mimic2v26.additives,mimic2v26.admissions WHERE mimic2v26.additives.subject_id=mimic2v26.admissions.subject_id LIMIT 15;)\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", "subject_id admit_dt charttime\t route\n", "3 2682-09-07 00:00:00.0\t2682-09-08 03:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 03:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 03:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 17:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 16:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 13:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-08 06:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-07 22:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-07 22:00:00.0\tIV Drip\n", "3\t 2682-09-07 00:00:00.0\t2682-09-07 22:00:00.0\tIV Drip\n", "12\t 2875-09-26 00:00:00.0\t2875-09-28 01:00:00.0\tIntravenous Push\n", "12\t 2875-09-26 00:00:00.0\t2875-09-27 04:00:00.0\tIV Drip\n", "12\t 2875-09-26 00:00:00.0\t2875-09-27 09:00:00.0\tIV Drip\n", "21\t 3139-03-19 00:00:00.0\t3138-10-30 16:00:00.0\tIntravenous Push\n", "21\t 3138-10-29 00:00:00.0\t3138-10-30 16:00:00.0\tIntravenous Push\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SciDB and Postgres Example\n", "\n", "Problem: Waveworm data exists in SciDB and we want to move peak values to Postgres where we can join it later with other records.\n", "\n", "1. SciDB: `myarray` contains timeseries of electrocardiogram (ECG) data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# View SciDB Data\n", "\n", "Log into scidb container and start iquery:\n", "\n", "```\n", "> docker exec -it bigdawg-scidb-data bash\n", "> su scidb\n", "> cd /home/scidb\n", "> iquery\n", "\n", "AQL% \n", "```\n", "\n", "List all arrays:\n", "\n", "```\n", "> AQL% select * from list('arrays');\n", "\n", "{No} name,uaid,aid,schema,availability,temporary\n", "{0} 'myarray',1,1,'myarray [i=0:*,1000000,0]',true,false\n", "{1} 'test_array',4,4,'test_array [i=0:*,100000,0,j=0:*,100000,0]',true,false\n", "{2} 'test_array_flat',3,3,'test_array_flat [i_=0:*,100000,0]',true,false\n", "```\n", "\n", "Select `myarray`:\n", "\n", "```\n", "> AQL% select * from myarray; \n", "\n", "{i} dim1,dim2\n", "{0} 14.1818,0\n", "{1} -81.4545,0\n", "{2} 14.1818,0\n", "{3} -128,0\n", "{4} 14,0\n", "{5} -151.273,0\n", "{6} 13.8182,0\n", "{7} -151.273,0\n", "{8} 13.9091,0\n", "{9} -128,0\n", "{10} 14,0\n", "...\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# SciDB to Postgres Query\n", "\n", "BigDAWG Query Syntax:\n", "\n", "```sql\n", "bdrel(SELECT * FROM # bdrel(): relational island query\n", " bdcast( # bdcast(): cast syntax\n", " bdarray(filter(myarray,dim1>150)), # bdarray(): array island syntax\n", " res, # name of intermediate result\n", " '(i bigint, dim1 real, dim2 real)', # destination schema\n", " relational # destination island type\n", " )\n", ")\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdrel(SELECT * FROM bdcast( bdarray(filter(myarray,dim1>150)), res, '(i bigint, dim1 real, dim2 real)', relational))\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", "i\tdim1\tdim2\n", "37\t151.273\t0.0\n", "47\t151.273\t0.0\n", "115\t174.545\t0.0\n", "159\t151.273\t0.0\n", "165\t174.545\t0.0\n", "183\t174.545\t0.0\n", "191\t151.273\t0.0\n", "193\t174.545\t0.0\n", "201\t174.545\t0.0\n", "223\t151.273\t0.0\n", "225\t174.545\t0.0\n", "283\t151.273\t0.0\n", "...\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Accumulo and Postgres Example\n", "\n", "Problem: Clinical reports are in Postgres and we need to persist some of it into Accumulo to organize it with the rest of the text data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# View the Postgres Data\n", "\n", "BigDAWG Query Syntax:\n", "\n", "```sql\n", "bdrel(\n", " SELECT text \n", " FROM mimic2v26.noteevents \n", " LIMIT 1\n", ")\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdrel(SELECT text FROM mimic2v26.noteevents LIMIT 1)\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", " DATE: [**2976-11-28**] 9:17 PM\n", " CHEST (PORTABLE AP) Clip # [**Clip Number (Radiology) 7295**]\n", " Reason: SOB, decreased BS on R \n", " ______________________________________________________________________________\n", " UNDERLYING MEDICAL CONDITION:\n", " 73 year old man with \n", " REASON FOR THIS EXAMINATION:\n", " SOB, decreased BS on R \n", " ______________________________________________________________________________\n", " FINAL REPORT\n", " HISTORY: 73 year old man with shortness of breath and decreased breath sounds\n", " on the right.\n", " \n", " COMPARISON: [**2974-3-8**].\n", " \n", " PORTABLE CHEST RADIOGRAPH: Again seen is cardiomegaly unchanged. The aorta\n", " is tortuous, unchanged. The pulmonary vascularity is normal in appearance\n", " without failure. The lungs are clear without focal consolidations or\n", " effusions. There is some minimal linear atelectasis at the left base.\n", " \n", " IMPRESSION: No congestive heart failure or focal consolidations.\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Postgres to Accumulo Query\n", "\n", "Query Postgres data for text and cast to accumulo.\n", "\n", "BigDAWG Query Syntax:\n", "\n", "\n", "```sql\n", "bdtext( # bdtext(): text island scope\n", " {'op': 'scan', # accumulo syntax\n", " 'table': 'bdcast( # bdcast() within the accumulo query\n", " bdrel( # bdrel(): relational island scope\n", " SELECT text FROM mimic2v26.noteevents LIMIT 1\n", " ), \n", " res, # intermediate CAST object name\n", " '', # destination island schema\n", " text)' # CAST destination type\n", " }\n", ")\n", "\n", "```\n", "\n", "Output:\n", "\n", "```\n", "> curl -X POST -d \"bdtext({ 'op' : 'scan', 'table' : 'bdcast(bdrel(select text from mimic2v26.noteevents limit 1), res, '', text)'})\" http://192.168.99.100:8080/bigdawg/query/\n", "\n", " DATE: [**2976-11-28**] 9:17 PM\n", " CHEST (PORTABLE AP) Clip # [**Clip Number (Radiology) 7295**]\n", " Reason: SOB, decreased BS on R \n", " ______________________________________________________________________________\n", " UNDERLYING MEDICAL CONDITION:\n", " 73 year old man with \n", " REASON FOR THIS EXAMINATION:\n", " SOB, decreased BS on R \n", " ______________________________________________________________________________\n", " FINAL REPORT\n", " HISTORY: 73 year old man with shortness of breath and decreased breath sounds\n", " on the right.\n", "\n", " COMPARISON: [**2974-3-8**].\n", "\n", " PORTABLE CHEST RADIOGRAPH: Again seen is cardiomegaly unchanged. The aorta\n", " is tortuous, unchanged. The pulmonary vascularity is normal in appearance\n", " without failure. The lungs are clear without focal consolidations or\n", " effusions. There is some minimal linear atelectasis at the left base.\n", "\n", " IMPRESSION: No congestive heart failure or focal consolidations.\n", " \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Admnistrative Interface\n", "\n", "The Administrative Interface is a Python Flask webapp that interacts with Docker and the Postgres database running the Catalog.\n", "\n", "On the \"Cluster Status\" tab you can start and stop components of the Demo Cluster:\n", "\n", "![Status](img/admin_ui_status.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Admnistrative Interface\n", "\n", "The Administrative Interface is a Python Flask webapp that interacts with Docker and the Postgres database running the Catalog.\n", "\n", "On the \"Data Catalog\" tab you can view the Catalog tables:\n", "\n", "![Catalog](img/admin_ui_catalog.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Shutdown the Demo Cluster\n", "\n", "When finished, stop and remove the containers with this script:\n", "\n", "`> ./cleanup_containers.sh`\n", "\n", "Additionaly, if you’re using Docker Toolbox, you can stop the VM running Docker with the following command:\n", "\n", "`> docker-machine stop default`" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }