Skip to content
This repository was archived by the owner on Aug 2, 2022. It is now read-only.

Add doc for ODFE SQL demo #826

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 213 additions & 0 deletions docs/user/presentations/111620-sql-demo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@

## Agenda

Presenter: Chen Dai

Contact info: https://www.linkedin.com/in/chen-dai/

Date: Nov 16, 2020

Topics:

1. A Brief Introduction to SQL and related projects
2. What’re already supported in the SQL engine
3. What’re new improvements in the SQL engine


---
## Introduction

Here is a great blog that covers main features of SQL, Query Workbench, SQL-CLI, JDBC/ODBC driver: https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/06/An-overview-of-the-SQL-Engine-in-Open-Distro-for-Elasticsearch/

Now we can start the demo. Note that you can follow the instructions in Resources (II) Demo Cluster to set up Elasticsearch and Kibana cluster locally.

## Current SQL Engine

Demonstrate the support for basic SQL queries, complex queries and Elasticsearch functions.

```
#Queries
##Basic query
SELECT Carrier, AVG(FlightDelayMin)
FROM kibana_sample_data_flights
WHERE OriginWeather = 'Sunny'
GROUP BY Carrier
HAVING AVG(FlightDelayMin) > 40
ORDER BY Carrier
LIMIT 3

##Complex query
SELECT f1.FlightNum, f2.Carrier
FROM kibana_sample_data_flights f1
JOIN kibana_sample_data_flights f2
ON f1.FlightNum = f2.FlightNum
WHERE f1.OriginWeather = 'Sunny'

#Elasticsearch
##Full text search
SELECT customer_full_name
FROM kibana_sample_data_ecommerce
WHERE MATCH_QUERY(customer_full_name, 'King')

##Nested field query
SELECT * FROM employees LIMIT 10

SELECT p.name, p.started_year
FROM employees e,
e.projects p
WHERE p.name LIKE '%Redshift%'

```

## Improvements on SQL Engine

Architecture changes: https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/dev/Architecture.md

```
#New query planner: explain this to see
SELECT Carrier, AVG(FlightDelayMin)
FROM kibana_sample_data_flights
WHERE OriginWeather = 'Sunny'
GROUP BY Carrier

#Extensibility example: ranking window function
SELECT
Carrier, FlightDelayMin,
RANK() OVER(
PARTITION BY Carrier
ORDER BY FlightDelayMin DESC
) AS rnk
FROM kibana_sample_data_flights
WHERE FlightDelayMin > 0

#New expression system
SELECT
SUBSTRING(Carrier, 1, 2) AS sub,
AVG(ABS(FlightDelayMin * -10))
FROM kibana_sample_data_flights
WHERE OriginWeather = 'Sunny'
GROUP BY sub
```

## Features in Development

1. SQL standard
1. SQL Functions
2. Complex queries: JOINs
2. Elasticsearch
1. More nested field support
2. More metrics and bucket functions

Contribution: https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/developing.rst

---
## Resources

### (I) Documentations

1. SQL reference manual
1. https://opendistro.github.io/for-elasticsearch-docs/docs/sql/
2. https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/user/index.rst
2. Setting for enabling new SQL engine: https://github.com/opendistro-for-elasticsearch/sql/blob/master/docs/user/admin/settings.rst#opendistro-sql-engine-new-enabled

### (II) Demo Cluster Setup

1. Create a Docker compose file as below
2. Start docker container: `docker-compose up`
3. Visit Kibana on http://localhost:5601 (default username and password are both `admin`)
4. Load Kibana sample flights and ecommerce indices and `employees` test index as below
5. Start playing with our SQL plugin


`docker-compose.yml` sample:

```
version: '3'
services:
odfe-node1:
image: amazon/opendistro-for-elasticsearch:1.11.0
container_name: odfe-node1
environment:
- cluster.name=odfe-cluster
- discovery.type=single-node
- "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
ports:
- 9200:9200
- 9600:9600 # required for Performance Analyzer
networks:
- odfe-net
kibana:
image: amazon/opendistro-for-elasticsearch-kibana:1.11.0
container_name: odfe-kibana
ports:
- 5601:5601
expose:
- "5601"
environment:
ELASTICSEARCH_URL: https://odfe-node1:9200
ELASTICSEARCH_HOSTS: https://odfe-node1:9200
networks:
- odfe-net

networks:
odfe-net:
```

Index mapping for `employees`:

```
PUT employees
{
"mappings": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"projects": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"fielddata": true
},
"started_year": {
"type": "long"
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}

POST employees/_bulk
{"index":{"_id":"1"}}
{"id":3,"name":"Bob Smith","title":null,"projects":[{"name":"AWS Redshift Spectrum querying","started_year":1990},{"name":"AWS Redshift security","started_year":1999},{"name":"AWS Aurora security","started_year":2015}]}
{"index":{"_id":"2"}}
{"id":4,"name":"Susan Smith","title":"Dev Mgr","projects":[]}
{"index":{"_id":"3"}}
{"id":6,"name":"Jane Smith","title":"Software Eng 2","projects":[{"name":"AWS Redshift security","started_year":1998},{"name":"AWS Hello security","started_year":2015,"address":[{"city":"Dallas","state":"TX"}]}]}
```