
Commit 1e2f11d

Merge branch 'master' into doc_remove_kubeflow_pipelines

2 parents 72e6fd9 + c55115f

17 files changed: +477 −44 lines

README.md (+1)

@@ -24,6 +24,7 @@ The AAW includes tools that allow data science users to open almost any file. Th
- sqlite
- many others... just ask :-)
+
### How much does the AAW cost?

#### CPU Only
docs/en/1-Experiments/Kubeflow.md (+49 −43)
@@ -52,9 +52,11 @@ for your team.

## Image

- You will need to choose an image. There are JupyterLab, RStudio, and Ubuntu remote
- desktop images available. Select the drop down menu to select additional options
- within these (for instance, CPU, PyTorch, and TensorFlow images for JupyterLab).
+ You will need to choose an image. There are JupyterLab, RStudio, Ubuntu remote
+ desktop, and SAS images available. The SAS image is only available for StatCan
+ employees (due to license limitations); the others are available for everyone.
+ Select the drop down menu to select additional options within these (for
+ instance, CPU, PyTorch, and TensorFlow images for JupyterLab).

Check the name of the images and choose one that matches what you want to do. Don't know
which one to choose? Check out your options [here](./Selecting-an-Image.md).
@@ -63,33 +65,43 @@ which one to choose? Check out your options [here](./Selecting-an-Image.md).

## CPU and Memory

At the time of writing (December 23, 2021) there are two types of computers in
the cluster:

- **CPU:** `D16s v3` (16 CPU cores, 64 GiB memory; for user use, 15 CPU cores
  and 48 GiB memory are available; 1 CPU core and 16 GiB memory are reserved for
  system use).
- **GPU:** `NC6s_v3` (6 CPU cores, 112 GiB memory, 1 GPU; for user use, 96 GiB
  memory are available; 16 GiB memory are reserved for system use). The available
  GPU is the NVIDIA Tesla V100 GPU with specs
  [here](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf).

When creating a notebook server, the system will limit you to the maximum
specifications above. For CPU notebook servers, you can specify the exact
amount of CPU and memory that you require. This allows you to meet your
compute needs while minimising cost. For a GPU notebook server, you will
always get the full server (6 CPU cores, 96 GiB accessible memory, and 1 GPU).
See the section on GPUs below for information on how to select a GPU server.

- In the future there may be larger machines available, so you may have looser
- restrictions.
+ In the advanced options, you can select a limit higher than the number of CPU
+ cores and amount of RAM requested. The amount requested is guaranteed to be
+ available to your notebook server, and you will always pay for at least this
+ much. If the limit is higher than the amount requested and additional RAM and
+ CPU cores are available on that shared server in the cluster, your notebook
+ server can use them as needed. One use case for this is a job that usually
+ needs only one CPU core but can benefit from multithreading to speed up
+ certain operations. By requesting one CPU core but a higher limit, you can pay
+ much less for the notebook server while allowing it to use spare unused CPU
+ cores as needed to speed up computations.
+
+ ![Select CPU and RAM](../images/cpu-ram-select.png)
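For instance (a generic Python sketch, not AAW-specific code), much numerical work in Python runs on a single core except for brief bursts inside libraries such as NumPy, whose BLAS-backed operations can fan out across whatever spare cores the limit allows:

```python
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# Most of a typical script runs on one core, but this matrix product is
# dispatched to a multithreaded BLAS and can briefly use spare CPU cores
# up to the notebook server's limit when they are free.
c = a @ b
print(c.shape)
```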

## GPUs

If you want a GPU server, select `1` as the number of GPUs and `NVIDIA` as the GPU
vendor (the create button will be greyed out until the GPU vendor is selected if
- you have a GPU specified). Multi-GPU servers are not currently supported on the
- AAW system.
+ you have a GPU specified). Multi-GPU servers are currently supported on the AAW
+ system only on a special on-request basis; please contact the AAW maintainers if
+ you would like a multi-GPU server.

![GPU Configuration](../images/kubeflow_gpu_selection.jpg)

@@ -110,10 +122,6 @@ are various configuration options available:

- You can specify the size of the workspace volume, from 4 GiB to 32 GiB.

- - You can choose the option to not use persistent storage for home, in which case the
-   home folder will be deleted as soon as the notebook server is closed. Otherwise the
-   home folder will remain and can be used again for a new notebook server in the future.

![Create a Workspace Volume](../images/workspace-volume.PNG)

<!-- prettier-ignore -->
@@ -124,17 +132,23 @@

## Data Volumes

You can also create data volumes that can be used to store additional data. Multiple
- data volumes can be created. Click the add volume button to create a new volume and specify
- its configuration. There are the following configuration parameters as for data volumes:
-
- - **Type**: Create a new volume or use an existing volume.
+ data volumes can be created. Click the add new volume button to create a new volume and
+ specify its configuration. Click the attach existing volume button to mount an existing
+ data volume to the notebook server. There are the following configuration parameters for
+ data volumes:

- **Name**: Name of the volume.

- **Size in GiB**: From 4 GiB to 512 GiB.

- - **Mount Point**: Path where the data volume can be accessed on the notebook server, by
-   default `/home/jovyan/<volume name>`.
+ - **Mount path**: Path where the data volume can be accessed on the notebook server, by
+   default `/home/jovyan/vol-1`, `/home/jovyan/vol-2`, etc. (an incrementing counter per
+   data volume mounted).
+
+ When mounting an existing data volume, the name option becomes a drop-down list of the
+ existing data volumes. Only a volume not currently mounted to an existing notebook server
+ can be used. The mount path option remains user-configurable with the same defaults as
+ when creating a new volume.

The garbage can icon on the right can be used to delete an existing or accidentally created
data volume.
@@ -152,14 +166,6 @@ There are currently three checkbox options available here:

access to any Protected B resources. Protected B notebook servers run with many
security restrictions and have access to separate MinIO instances specifically
designed for Protected B data.

- - **Allow access to Kubeflow Pipelines**: This will allow the notebook server to
-   create and manage Kubeflow pipelines. Enable this if you want to use Kubeflow
-   pipelines.
-
- ## Affinity / Tolerations
-
- <!-- prettier-ignore -->
- !!! note "This section needs to be filled in."

## Miscellaneous Settings

@@ -1,5 +1,10 @@

# Overview

+ <!-- prettier-ignore -->
+ !!! warning "Kubeflow pipelines are in the process of being removed from AAW."
+     No new development should use Kubeflow pipelines. If you have questions
+     about this removal, please speak with the AAW maintainers.
+
[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview/) are in the process of being removed from AAW.

They will be replaced by [Argo Workflows](https://argoproj.github.io/argo-workflows/), which will be implemented on AAW soon.
@@ -0,0 +1,190 @@

# Geospatial Analytical Environment (GAE) - Cross Platform Access

<!-- prettier-ignore -->
??? danger "Unprotected data only; SSI coming soon:"
    At this time, our Geospatial server can only host and provide access to non-sensitive statistical information.

## Getting Started

<!-- prettier-ignore -->
??? success "Prerequisites"
    1. An onboarded project with access to the DAS GAE ArcGIS Portal
    2. An ArcGIS Portal Client Id (API Key)
The ArcGIS Enterprise Portal can be accessed in either the AAW or the CAE using the API, from any service that leverages the Python programming language.

For example, in the AAW this includes [Jupyter Notebooks](https://statcan.github.io/daaas/en/1-Experiments/Jupyter/), and in the CAE it includes [Databricks](https://statcan.github.io/cae-eac/en/DataBricks/), DataFactory, etc.

[The DAS GAE ArcGIS Enterprise Portal can be accessed directly here](https://geoanalytics.cloud.statcan.ca/portal)

[For help with self-registering as a DAS Geospatial Portal user](https://statcan.github.io/daaas-dads-geo/english/portal/)

<hr>
## Using the ArcGIS API for Python

### Connecting to the ArcGIS Enterprise Portal using the ArcGIS API

1. Install packages:

```
conda install -c esri arcgis
```

or using Artifactory:

```
conda install -c https://jfrog.aaw.cloud.statcan.ca/artifactory/api/conda/esri-remote arcgis
```

2. Import the necessary libraries that you will need in the Notebook.

```python
from arcgis.gis import GIS
from arcgis.gis import Item
```

3. Access the Portal

Your project group will be provided with a Client ID upon onboarding. Paste the Client ID in between the quotations: `client_id='######'`.

```python
gis = GIS("https://geoanalytics.cloud.statcan.ca/portal", client_id=' ')
print("Successfully logged in as: " + gis.properties.user.username)
```

4. The output will redirect you to a login Portal.
   - Use the StatCan Azure Login option, and your Cloud ID.
   - After a successful login, you will receive a code to sign in using SAML.
   - Paste this code into the output.

![OAuth2 Approval](../images/OAuth2Key.png)

<hr>

### Display user information

Using the `me` property, we can display various information about the logged-in user.

```python
me = gis.users.me
username = me.username
description = me.description
display(me)
```
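
Other attributes of the returned `User` object can be read the same way (a short sketch; which attributes are populated can vary with portal configuration):

```python
# A few commonly available User attributes.
print(me.username)
print(me.role)       # e.g. org_admin, org_publisher, org_user
print(me.lastLogin)  # epoch time in milliseconds
```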

<hr>

### Search for Content

Search for the content you have hosted on the DAaaS Geo Portal. Using the `me` property, we can search for all of the hosted content on the account. There are multiple ways to search for content. Two different methods are outlined below.

**Search all of your hosted items in the DAaaS Geo Portal.**

```python
my_content = me.items()
my_content
```

**Search for specific content you own in the DAaaS Geo Portal.**

This is similar to the example above; however, if you know the title of the layer you want to use, you can save it to a variable.

```python
my_items = me.items()
for item in my_items:
    print(item.title, " | ", item.type)
    if item.title == "Flood in Sorel-Tracy":
        flood_item = item

print(flood_item)
```
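
Equivalently (a minimal sketch using the same `me.items()` call as above), the lookup can be collapsed into a single expression:

```python
# Take the first item whose title matches; None if nothing matches.
flood_item = next(
    (item for item in me.items() if item.title == "Flood in Sorel-Tracy"),
    None,
)
```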

**Search all content you have access to, not just your own.**

```python
# content.search returns a list of matching Items.
flood_items = gis.content.search("tags: flood", item_type="Feature Service")
flood_items
```

<hr>

### Get Content

We need to get the item from the DAaaS Geo Portal in order to use it in the Jupyter Notebook. This is done by providing the unique identification number of the item you want to use. Three examples are outlined below, all accessing the identical layer.

```python
item1 = gis.content.get(my_content[5].id)  # from searching your content above
display(item1)

item2 = gis.content.get(flood_item.id)  # from the example above - searching for specific content
display(item2)

item3 = gis.content.get('edebfe03764b497f90cda5f0bfe727e2')  # the actual content ID
display(item3)
```
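
To work with the data behind an item, a hosted feature layer item exposes its layers as a list (a brief sketch; this assumes `item3` is a Feature Layer item as in the example above):

```python
# A feature service item contains one or more FeatureLayer objects.
flood_layer = item3.layers[0]
print(flood_layer.properties.name)
```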

<hr>

### Perform Analysis

Once the layers are brought into the Jupyter notebook, we are able to perform the same types of analysis you would expect to find in GIS software such as ArcGIS. There are many modules, each containing submodules that can perform many types of analyses.

Using the arcgis.features module, import the use_proximity submodule: `from arcgis.features import use_proximity`. This submodule allows us to `.create_buffers` - areas of equal distance from features. Here, we specify the layer we want to use, the distance, the units, and the output name (you may also specify other characteristics such as field, ring type, end type, and others). By specifying an output name, after running the buffer command a new layer will be automatically uploaded into the DAaaS GEO Portal containing the new feature you just created.

```python
from arcgis.features import use_proximity

buffer_lyr = use_proximity.create_buffers(item1, distances=[1],
                                          units="Kilometers",
                                          output_name='item1_buffer')

display(buffer_lyr)  # the layer item returned by create_buffers
```
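
The result can also be previewed directly in the notebook (a short sketch; `gis.map()` renders an interactive map widget in Jupyter, and the location string is illustrative):

```python
# Center a map widget on the area of interest and overlay the buffers.
m = gis.map("Sorel-Tracy, QC")
m.add_layer(buffer_lyr)
m
```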

Some users prefer to work with open-source packages. Translating from ArcGIS to Spatially Enabled DataFrames is simple.

```python
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers the pandas .spatial accessor

# create a Spatially Enabled DataFrame object
# (feature_layer is a FeatureLayer, e.g. item3.layers[0])
sdf = pd.DataFrame.spatial.from_layer(feature_layer)
```
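
From here, ordinary pandas operations apply (a minimal sketch; `sdf` is the DataFrame created above, with geometry stored in its `SHAPE` column):

```python
# Inspect the converted layer like any other pandas DataFrame.
print(sdf.shape)   # (rows, columns)
print(sdf.head())  # first records, including the SHAPE geometry column
```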

<hr>

### Update Items

By getting the item as in the examples above, we can use the `.update` method to update an existing item within the DAaaS GEO Portal. We can update item properties, data, thumbnails, and metadata.

```python
item1_buffer = gis.content.get('c60c7e57bdb846dnbd7c8226c80414d2')
item1_buffer.update(item_properties={'title': 'Enter Title',
                                     'tags': 'tag1, tag2, tag3, tag4',
                                     'description': 'Enter description of item'})
```
153+
154+
<hr>
155+
156+
### Visualize Your Data on an Interactive Map
157+
158+
**Example: MatplotLib Library**
159+
In the code below, we create an ax object, which is a map style plot. We then plot our data ('Population Change') change column on the axes
160+
```python
161+
import matplotlib.pyplot as plt
162+
ax = sdf.boundary.plot(figsize=(10, 5))
163+
shape.plot(ax=ax, column='Population Change', legend=True)
164+
plt.show()
165+
```

**Example: ipyleaflet Library**

In this example we will use the 'ipyleaflet' library to create an interactive map centered around Toronto, ON.

Begin by running `conda install -c conda-forge ipyleaflet` to install the ipyleaflet library in your Python environment.

Import the necessary libraries.

```python
from ipyleaflet import Map, basemaps, LayersControl, ScaleControl
```

Now that we have imported the ipyleaflet module, we can create a simple map by specifying the latitude and longitude of the location we want, the zoom level, and the basemap [(more basemaps)](https://ipyleaflet.readthedocs.io/en/latest/map_and_basemaps/basemaps.html). Extra controls have been added, such as layers and scale.

```python
toronto_map = Map(center=[43.69, -79.35], zoom=11, basemap=basemaps.Esri.WorldStreetMap)

toronto_map.add_control(LayersControl(position='topright'))
toronto_map.add_control(ScaleControl(position='bottomleft'))
toronto_map
```
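
From here the map can be built up layer by layer (a small sketch; the marker coordinates are illustrative):

```python
from ipyleaflet import Marker

# Drop a marker near downtown Toronto on the existing map.
marker = Marker(location=(43.6532, -79.3832), draggable=False)
toronto_map.add_layer(marker)
toronto_map
```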

## Learn More about the ArcGIS API for Python

[Full documentation for the ArcGIS API can be located here](https://developers.arcgis.com/python/)

## Learn More about the DAS Geospatial Analytical Environment (GAE) and Services

[GAE Help Guide](https://statcan.github.io/daaas-dads-geo/)

docs/en/5-Storage/MinIO.md (+3)

@@ -16,6 +16,7 @@ S3 storage). Buckets are good at three things:

## MinIO Mounted Folders on a Notebook Server

+ <!-- prettier-ignore -->
!!! warning "MinIO mounts are not currently working on Protected B servers."

Your MinIO storage are mounted as directories if you select the option

@@ -104,6 +105,7 @@ This lets you browse, upload/download, delete, or share files.

## Browse Datasets

+ <!-- prettier-ignore -->
!!! warning "The link below is not currently working."

Browse some [datasets](https://datasets.covid.cloud.statcan.ca) here. These data

@@ -218,6 +220,7 @@ send to a collaborator!

## Get MinIO Credentials

+ <!-- prettier-ignore -->
!!! warning "The methods below have not been tested recently, since certain MinIO changes. These may require adjustment."

<!-- prettier-ignore -->
