Commit 60e2497

Server documentation #63 (#66)
Added server setup documentation and notes on optimization, and updated requirements.txt.

Co-authored-by: etiennegaucher <[email protected]>
1 parent 7c0b10b commit 60e2497

2 files changed

+299
-3
lines changed

README.md

+298-1
@@ -23,6 +23,10 @@ Requirements
2323

2424
Development setup
2525
-------------------
26+
This is for setting up and running Learn Pathology locally on your computer
27+
for development purposes. For deployment of the system on a web server, see the
28+
deployment instructions below.
29+
2630
1. Clone project
2731
```bash
2832
git clone https://github.com/AICAN-Research/learn-pathology.git
@@ -59,5 +63,298 @@ Open your web browser at http://localhost:8000
5963
- Go to admin page
6064
- Press slide, and add some slides to the database
6165

62-
Deployment
66+
Deployment & Server optimization
6367
-------------------
68+
69+
This guide is for Ubuntu Linux.
70+
To set up Learn Pathology for deployment on a server, use apache2 and mod_wsgi.
71+
72+
**1. First install packages**
73+
```bash
74+
sudo apt install python3-pip apache2 libapache2-mod-wsgi-py3
75+
```
76+
77+
Also make sure you have all [requirements for FAST installed](https://fast.eriksmistad.no/install-ubuntu-linux.html):
78+
```bash
79+
sudo apt install libgl1 libopengl0 libopenslide0 libusb-1.0-0 libxcb-xinerama0
80+
```
81+
You also need OpenCL. To install OpenCL on Linux, download an implementation that matches the CPU/GPU you have.
82+
83+
84+
**2. Then clone the repo on the server**, for instance to /var/www/
85+
```bash
86+
cd /var/www/
87+
git clone https://github.com/AICAN-Research/learn-pathology
88+
```
89+
90+
**3. Set up a virtual environment on the server**
91+
```bash
92+
cd learn-pathology
93+
virtualenv -ppython3 environment
94+
source environment/bin/activate
95+
```
96+
97+
**4. Install requirements**
98+
```bash
99+
pip install --upgrade pip # Make sure pip is up to date first
100+
pip install -r requirements.txt
101+
```
102+
103+
**5. Create a secret key and configure settings**
104+
Generate a secret key and add it to settings.py
105+
```bash
106+
python manage.py shell -c 'from django.core.management import utils; print(utils.get_random_secret_key())'
107+
```
108+
Edit the file learnpathology/settings.py.
109+
Uncomment and set the SECRET_KEY to the output of the python command above.
110+
**Remember to keep this key secret, and do not push it to git/GitHub.** You may change it even while the system is in use, but note that users may be logged out.
111+
See here for more info: https://medium.com/@bayraktar.eralp/changing-rotating-django-secret-key-without-logging-users-out-804a29d3ea65
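
One way to keep the key out of git is to store it in a separate file that is not tracked and read it from settings.py. The following is a minimal sketch under stated assumptions: the helper names and the file name `secret_key.txt` are hypothetical and not part of Learn Pathology, and the character set is assumed to match what Django's `get_random_secret_key()` uses.

```python
import secrets
from pathlib import Path

# Character set assumed to match Django's get_random_secret_key().
ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length: int = 50) -> str:
    """Return a random key built with a cryptographically secure generator."""
    return "".join(secrets.choice(ALLOWED_CHARS) for _ in range(length))


def load_or_create_secret_key(path: Path) -> str:
    """Read the key from `path`, creating the file on first use."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    key = generate_secret_key()
    path.write_text(key, encoding="utf-8")
    return key


# Possible use in learnpathology/settings.py (hypothetical file name):
# SECRET_KEY = load_or_create_secret_key(
#     Path(__file__).resolve().parent / "secret_key.txt"
# )
```

If you try this approach, add the key file to `.gitignore` and make sure the user Apache runs as can read it.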
112+
113+
Disable debug mode by setting `DEBUG = False`.
114+
For security reasons this should always be off on a production server. Only turn it on if you actually need to debug.
115+
116+
Add your domain to ALLOWED_HOSTS, for example like so:
117+
```ALLOWED_HOSTS = ['learnpathology.no', 'www.learnpathology.no']```
118+
119+
120+
**6. Initialize database**
121+
```bash
122+
./manage.py makemigrations
123+
./manage.py migrate
124+
```
125+
126+
**7. Create super user**
127+
```bash
128+
./manage.py createsuperuser
129+
```
130+
131+
132+
**8. Collect static files**
133+
```bash
134+
./manage.py collectstatic
135+
```
136+
137+
**9. Fix user permissions**
138+
Apache needs write access to the database.
139+
Apache runs as the user www-data, so give this user write
140+
access to the root folder and the database file db.sqlite3
141+
```bash
142+
cd ..
143+
sudo chown :www-data learn-pathology
144+
sudo chmod g+w learn-pathology
145+
cd learn-pathology
146+
sudo chown www-data db.sqlite3
147+
sudo chmod g+w db.sqlite3
148+
```
149+
150+
**10. Create an apache config**
151+
```bash
152+
sudo nano /etc/apache2/sites-available/learnpathology.conf
153+
```
154+
You should always use HTTPS and SSL encryption.
155+
If you are not using HTTPS, everything on the webpage, including login passwords and images, is transferred unencrypted over the internet!
156+
To use HTTPS/SSL encryption you need an SSL certificate. You can buy one cheaply from services like [namecheap.com](https://www.namecheap.com) or get one for free from [Let's Encrypt](https://letsencrypt.org/).
157+
Store the certificate, the key, and the CA certificate files on the server, e.g. in folder /var/www/learn-pathology/ssl/.
158+
The config with SSL/HTTPS encryption will then look something like this:
159+
```
160+
# Redirect to secure site
161+
<VirtualHost *:80>
162+
ServerName learnpathology.no
163+
ServerAdmin [email protected]
164+
Redirect permanent / https://learnpathology.no/
165+
</VirtualHost>
166+
167+
<VirtualHost *:443>
168+
# Common stuff
169+
ServerName learnpathology.no
170+
ServerAdmin [email protected]
171+
DocumentRoot /var/www/learn-pathology/
172+
173+
# SSL stuff
174+
SSLEngine on
175+
SSLCertificateFile "/var/www/learn-pathology/ssl/certificate.crt"
176+
SSLCertificateKeyFile "/var/www/learn-pathology/ssl/certificate.key"
177+
SSLCACertificateFile "/var/www/learn-pathology/ssl/certificate.ca.crt"
178+
179+
Alias /static /var/www/learn-pathology/static
180+
<Directory /var/www/learn-pathology/static>
181+
Require all granted
182+
</Directory>
183+
184+
<Directory /var/www/learn-pathology/learnpathology>
185+
<Files wsgi.py>
186+
Require all granted
187+
</Files>
188+
</Directory>
189+
190+
# Modify this to fit your python version:
191+
WSGIDaemonProcess learnpathologywsgi python-path=/var/www/learn-pathology/:/var/www/learn-pathology/environment/lib/python3.10/site-packages processes=32 threads=32
192+
# This setting is needed for FAST to run properly:
193+
WSGIApplicationGroup %{GLOBAL}
194+
WSGIProcessGroup learnpathologywsgi
195+
WSGIScriptAlias / /var/www/learn-pathology/learnpathology/wsgi.py
196+
197+
ErrorLog ${APACHE_LOG_DIR}/learnpathology.error.log
198+
CustomLog ${APACHE_LOG_DIR}/learnpathology.access.log combined
199+
</VirtualHost>
200+
```
201+
202+
**11. Enable website and test**
203+
```bash
204+
sudo a2ensite learnpathology
205+
sudo systemctl reload apache2
206+
```
207+
Open your browser and check that the webpage works.
208+
A common cause of errors when enabling the website is a syntax error in the configuration file. In that case, run `apache2ctl configtest` to check the file.
209+
210+
**12. Server Optimizations**
211+
212+
**Use mpm_worker to enable multi-processing and multi-threading**
213+
214+
By default, apache will use mpm_prefork to handle multiple requests, which does not use any multi-threading.
215+
If you plan to serve hundreds of simultaneous users, you should consider using mpm_worker instead.
216+
217+
To do so, first enable mpm_worker:
218+
```bash
219+
sudo a2dismod mpm_prefork
220+
sudo a2enmod mpm_worker
221+
sudo service apache2 restart
222+
```
223+
Then open the mpm_worker config: `sudo nano /etc/apache2/mods-enabled/mpm_worker.conf`
224+
225+
The following config has worked well for us:
226+
```
227+
<IfModule mpm_worker_module>
228+
ServerLimit 32
229+
StartServers 16
230+
MinSpareThreads 50
231+
MaxSpareThreads 100
232+
ThreadLimit 64
233+
ThreadsPerChild 50
234+
MaxRequestWorkers 512
235+
MaxConnectionsPerChild 0
236+
</IfModule>
237+
```
238+
Restart apache after changes:
239+
```bash
240+
sudo service apache2 restart
241+
```
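
The worker settings above depend on each other, so it can help to work out what Apache will effectively use. The sketch below assumes the rules described in the Apache worker MPM documentation: `ThreadsPerChild` is capped by `ThreadLimit`, and `MaxRequestWorkers` is capped by `ServerLimit * ThreadsPerChild` and lowered to the nearest multiple of `ThreadsPerChild`. The function name is hypothetical and this is only an illustration, not something Apache or Learn Pathology provides.

```python
def effective_worker_limits(server_limit: int, thread_limit: int,
                            threads_per_child: int,
                            max_request_workers: int) -> dict:
    """Estimate the limits Apache's worker MPM ends up using (assumed rules)."""
    # ThreadsPerChild cannot exceed ThreadLimit.
    threads = min(threads_per_child, thread_limit)
    # MaxRequestWorkers cannot exceed ServerLimit * ThreadsPerChild.
    workers = min(max_request_workers, server_limit * threads)
    # MaxRequestWorkers is lowered to a multiple of ThreadsPerChild.
    workers -= workers % threads
    return {
        "threads_per_child": threads,
        "max_request_workers": workers,
        "max_processes": workers // threads,
    }


# Values taken from the mpm_worker config in this guide.
limits = effective_worker_limits(
    server_limit=32, thread_limit=64,
    threads_per_child=50, max_request_workers=512,
)
print(limits)
```

Under these assumptions the config above would serve at most 500 simultaneous requests spread over 10 processes, since 512 is not a multiple of 50.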
242+
243+
**Enable turbojpeg**
244+
245+
Requested image tiles have to be compressed with JPEG before they are sent to the users.
246+
By default, PIL is used for compression, which is slow. TurboJPEG is a faster option.
247+
248+
To use TurboJPEG, first install it:
249+
```bash
250+
sudo apt install libturbojpeg
251+
```
252+
and install it in your python environment:
253+
```bash
254+
pip install PyTurboJPEG==1.7.*
255+
```
256+
then enable it in learnpathology/settings.py:
257+
```python
258+
USE_TURBOJPEG = True
259+
```
260+
261+
Reload apache after changes:
262+
```bash
263+
sudo service apache2 reload
264+
```
265+
266+
**Cache tiles in memory using memcached**
267+
268+
In an educational setting, you might have several hundred students who
269+
access the same few images at the same time during class.
270+
Since reading images from the hard drive is one of the slowest operations, you
271+
can considerably improve performance by keeping the images in memory instead.
272+
In this case, every student except the first one to access an image will read
273+
it directly from memory instead of from the slow hard drive.
274+
275+
We recommend using memcached for this purpose. You can install it like this:
276+
```bash
277+
sudo apt update
278+
sudo apt install memcached libmemcached-tools
279+
sudo service memcached start
280+
```
281+
282+
Modify the memcached config so that it can use a lot of memory and store large enough objects:
283+
`sudo nano /etc/memcached.conf`:
284+
```
285+
# Depending on how much RAM you have, set the limit of how much
286+
# memory memcached should be allowed to use. Here we have set it
287+
# to 100 GB = 100*1024 = 102400 MB:
288+
-m 102400
289+
# Allow item sizes up to 3 MB
290+
-I 3m
291+
```
292+
293+
Restart after modifying the config:
294+
```bash
295+
sudo service memcached restart
296+
```
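
To pick a memory limit for your own server, the arithmetic from the config comment can be sketched as below. The average tile size of 50 KB is purely an assumption for illustration; measure your own tiles before relying on the estimate. The function names are hypothetical.

```python
def memcached_memory_mb(gigabytes: int) -> int:
    """Convert a RAM budget in GB to the MB value passed to memcached's -m."""
    return gigabytes * 1024


def estimated_cache_size_gb(max_entries: int, avg_tile_kb: float) -> float:
    """Rough size of a full cache, ignoring memcached's own overhead."""
    return max_entries * avg_tile_kb / (1024 * 1024)


# 100 GB, as used in the config above.
print(memcached_memory_mb(100))

# 100 000 cached tiles at an assumed (hypothetical) 50 KB each.
print(round(estimated_cache_size_gb(100_000, 50), 2))
```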
297+
298+
Install the pymemcache binding in your python environment:
299+
```bash
300+
pip install pymemcache==4.*
301+
```
302+
303+
Enable the tile caching in learnpathology/settings.py:
304+
```python
305+
USE_TILE_CACHE = True
306+
```
307+
308+
Reload apache after modifying settings:
309+
```bash
310+
sudo service apache2 reload
311+
```
312+
313+
You can check if memcached is working by looking at its statistics (how many items are stored, number of cache hits/misses etc.):
314+
```bash
315+
memcstat --servers="127.0.0.1"
316+
```
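
If you prefer to check the cache from Python, the same statistics can be read over memcached's text protocol. This is a minimal sketch using only the standard library, assuming memcached listens on 127.0.0.1:11211 as in this guide. The function names are hypothetical, and the hit ratio is derived from the `get_hits` and `get_misses` counters.

```python
import socket


def parse_stats(raw: bytes) -> dict:
    """Parse a memcached 'stats' response into a name -> value dict."""
    stats = {}
    for line in raw.decode("ascii").splitlines():
        parts = line.split(" ", 2)
        # Each statistic is sent as: STAT <name> <value>
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_ratio(stats: dict) -> float:
    """Fraction of get requests answered from the cache."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


def fetch_stats(host: str = "127.0.0.1", port: int = 11211,
                timeout: float = 2.0) -> dict:
    """Send 'stats' to memcached and read until the END marker."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        raw = b""
        while not raw.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_stats(raw)


# Example use on the server:
# stats = fetch_stats()
# print(stats.get("curr_items"), hit_ratio(stats))
```

A hit ratio that stays close to zero while students are browsing slides would suggest that tiles are not being cached.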
317+
318+
The images are stored for 30 minutes in memory as defined in slide/views.py:
319+
```python
320+
@cache_page(60 * 30)
321+
def tile(request, slide_id, osd_level, x, y):
322+
    ...
323+
```
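
To illustrate what a 30 minute timeout means in practice, here is a simplified time-to-live cache decorator. It is not how Django's `cache_page` is implemented: Django caches the whole HTTP response in the configured cache backend, while this sketch keeps return values in a local dict keyed by the function arguments. The names `ttl_cache` and `render_tile` are hypothetical.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Cache return values per argument tuple for `seconds` seconds."""
    def decorator(func):
        store = {}  # maps args -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            if entry is not None and entry[0] > now:
                return entry[1]  # still fresh: skip the slow call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value

        return wrapper
    return decorator


@ttl_cache(60 * 30)  # 30 minutes, the same timeout as in slide/views.py
def render_tile(slide_id, osd_level, x, y):
    # Stand-in for the slow work of reading and compressing a tile.
    return f"tile {slide_id}/{osd_level}/{x}/{y}"
```

The first request for a tile pays the full cost, and every request for the same tile within the next 30 minutes is answered from the cache.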
324+
325+
Learn Pathology is set to store a maximum of 100 000 images. Here are the settings we have used (defined in learnpathology/settings.py):
326+
```python
327+
if USE_TILE_CACHE:
328+
    CACHES = {
329+
        'default': {
330+
            'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
331+
            'LOCATION': '127.0.0.1:11211',
332+
            'MAX_ENTRIES': 100000,
333+
            'OPTIONS': {
334+
                'no_delay': True,
335+
                'ignore_exc': True,
336+
                'max_pool_size': 4,
337+
                'use_pooling': True,
338+
            }
339+
        }
340+
    }
341+
```
342+
343+
**13. Setting up FEIDE login**
344+
345+
To enable users to log in using FEIDE/Dataporten, make sure you have obtained your client ID and secret key from your
346+
local FEIDE administration.
347+
348+
Then enable FEIDE login in learnpathology/settings.py:
349+
```python
350+
USE_FEIDE_LOGIN = True
351+
```
352+
353+
Reload apache after modifying settings:
354+
```bash
355+
sudo service apache2 reload
356+
```
357+
358+
Then go to the admin page in your browser and create a new item under "Social applications".
359+
Set the provider to *Dataporten* and enter your client ID and secret key. Also remember to add a site
360+
with your domain and select it.

requirements.txt

+1-2
@@ -1,8 +1,7 @@
11
django==3.2.*
2-
pyfast==4.5.0
2+
pyfast==4.9.*
33
pillow==8.*
44
pyyaml==6.*
5-
65
numpy~=1.23.5
76
django-ckeditor~=6.5.1
87
django-allauth==0.59.*
