Commit cfdf916 — Merge pull request #3 from danielskatz/patch-1: minor changes to paper
2 parents c19f65c + 5363de0

1 file changed: joss/paper.md (+19 -19)
```diff
@@ -70,14 +70,14 @@ Slurm [@slurm] is an open-source cluster management and job
 scheduling system for Linux-based compute clusters and is widely used
 for High Performance Computing (HPC). It offers command line tools to
 export and analyze cluster use and various applications have been
-developed to monitor the current state of the cluster (e.g. live
-dashboards using Grafana [@grafanadb]). A feature rich tool for the
-analysis of cluster performance is Open XDMoD [@xdmod] that supports
-various schedulers and metrics. Open XDMoD uses 3^rd^ party software
+developed to monitor the current state of the cluster (e.g., live
+dashboards using Grafana [@grafanadb]). A feature-rich tool for the
+analysis of cluster performance is Open XDMoD [@xdmod], which supports
+various schedulers and metrics. Open XDMoD uses 3rd-party software
 libraries that are not free for commercial use. Open OnDemand
 [@Hudak2018] allows users to access a HPC cluster using a web portal,
 it provides various apps to facilitate HPC usage and can integrate the
-Open XDMoD for usage statistics. Both, Open XDMoD and Open OnDemand
+Open XDMoD for usage statistics. Both Open XDMoD and Open OnDemand
 require continuous support and extensive configurations and therefore,
 intuitive, responsive, easy-to-install and easy-to-use applications that
 enable HPC administrators and managers to analyze and visualize cluster
```
```diff
@@ -99,15 +99,15 @@ using docker containers and a docker-compose implementation.
 ## SCAS Dashboard overview
 
 The SCAS dashboard architecture consists of a nginx web server as a
-router (reverse proxy), a frontend based on R-Shiny [@shiny; @R;
-@shinydashboard], a backend based on Python using the Django REST
-framework to provide an API, and a PostgreSQL database as backend (see
-\autoref{fig:fig1}). The dashboard is intended for the HPC stakeholders
-and therefore includes secure user authentication. The frontend is a
+router (reverse proxy), a front end based on R-Shiny [@shiny; @R;
+@shinydashboard], a back end based on Python using the Django REST
+framework to provide an API, and a PostgreSQL database as back end (see
+\autoref{fig:fig1}). The dashboard is intended for HPC stakeholders
+and therefore includes secure user authentication. The front end is a
 user-friendly interface for filtering and visualizing the Slurm data.
-The backend provides an admin interface via Python Django Admin and a
-web API that is used by both the frontend and a script for uploading new
-data. Additionally, the backend creates a daily index of the data,
+The back end provides an admin interface via Python Django Admin and a
+web API that is used by both the front end and a script for uploading new
+data. Additionally, the back end creates a daily index of the data,
 enabling the software to maintain a low memory footprint while being
 fast and responsive. Furthermore, a presentation can be generated
 automatically and viewed by various stakeholders, including the cluster
```
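The daily-index idea in this hunk can be sketched in a few lines of Python. This is a hypothetical illustration, not the SCAS back end's actual code; the record field names (`date`, `partition`, `cpu_hours`) are assumptions:

```python
from collections import defaultdict

def build_daily_index(jobs):
    """Aggregate raw Slurm job records into per-day, per-partition
    statistics, so the API can later serve precomputed numbers.

    Each job is a dict with (assumed) keys: 'date' (YYYY-MM-DD),
    'partition', and 'cpu_hours'.
    """
    index = defaultdict(lambda: {"job_count": 0, "cpu_hours": 0.0})
    for job in jobs:
        key = (job["date"], job["partition"])
        index[key]["job_count"] += 1
        index[key]["cpu_hours"] += job["cpu_hours"]
    return dict(index)
```

With such an index in the database, answering a date-range query is a lookup and sum over precomputed rows, which is why the paper can claim a low memory footprint and no per-request calculations.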
```diff
@@ -128,12 +128,12 @@ width=100% }
 Completed compute jobs and available node configurations are submitted
 to the SCAS-backend API with a script that utilizes the Slurm's *sacct*
 tool. This script can be run as a daily or weekly *cron* job on a job
-submission node. The backend then generates the daily statistics that
+submission node. The back end then generates the daily statistics that
 are stored in the database. This preprocessed indexed data enables the
 app to have a low memory footprint and high responsiveness, as no
 calculations are required when the data is fetched from the API. Upon
-filtering a date range in the frontend, a request is sent to the backend
-which retrieves the data for the selected days and aggregates the
+filtering a date range in the front end, a request is sent to the back end
+that retrieves the data for the selected days and aggregates the
 statistics to generate the visualizations.
 
 ## Frontend -- dashboard user interface
```
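The upload step described in this hunk might look like the following sketch: parse pipe-delimited `sacct` output into records, then POST them to the back-end API from the *cron* job. The field selection and upload mechanism are assumptions for illustration; the actual SCAS script may differ:

```python
# Parses output such as that produced by
# `sacct --parsable2 --noheader --format=JobID,Partition,Elapsed,State`.
def parse_sacct(text):
    """Turn pipe-delimited sacct lines into a list of job dicts."""
    fields = ["job_id", "partition", "elapsed", "state"]
    jobs = []
    for line in text.strip().splitlines():
        values = line.split("|")
        jobs.append(dict(zip(fields, values)))
    return jobs

# A cron-driven script would then serialize the records (e.g. with
# json.dumps) and POST them to the back-end API endpoint.
```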
```diff
@@ -179,15 +179,15 @@ pending times) for GPU servers, with 16 GPUs, over a time frame of 1
 year. As shown in \hyperref[fig:fig2]{Figure 2b,c}, the increase in
 the number of GPU jobs and CPU hours for the GPU partition is visible
 and confirms the assumption. By inspecting the pending times per day
-(\hyperref[fig:fig2]{Figure 2d}) there is a general, unbiased
+(\hyperref[fig:fig2]{Figure 2d}), there is a general, unbiased
 increase of the pending times visible for the last few months. From
 \hyperref[fig:fig2]{Figure 2e} we can then see an increase of the
 pending times for the GPU partition for the previous 6 months.
 \hyperref[fig:fig2]{Figure 2f} shows that the increase of the pending
 times is only seen for servers with >10 GPUs, and the utilization of
 the nodes with 16 GPUs has increased while those with 2 and 4 GPUs were
 stable (\hyperref[fig:fig2]{Figure 2g}). This analysis can be used to
-draw concrete conclusions. In this case to either inform the users that
+draw concrete conclusions, in this case, to either inform the users that
 resources are available if up to 4 GPUs are requested, or to make the
 decision to invest in new GPU servers to achieve shorter pending times
 and higher throughput.
```
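The pending time analyzed in this hunk is simply a job's start time minus its submit time. A minimal sketch of the per-day aggregation behind a plot like Figure 2d, assuming ISO-formatted `submit` and `start` fields (hypothetical names, not the dashboard's actual schema):

```python
from collections import defaultdict
from datetime import datetime

def mean_pending_seconds_per_day(jobs):
    """Average queue wait (start - submit) per submission day."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum_seconds, count]
    for job in jobs:
        submit = datetime.fromisoformat(job["submit"])
        start = datetime.fromisoformat(job["start"])
        day = submit.date().isoformat()
        totals[day][0] += (start - submit).total_seconds()
        totals[day][1] += 1
    return {day: s / n for day, (s, n) in totals.items()}
```

Plotting this series over a year, optionally split by partition or GPU count, reproduces the kind of trend the paper uses to justify hardware decisions.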
```diff
@@ -209,7 +209,7 @@ plot showing the utilization of nodes with different numbers of GPUs.
 The SCAS dashboard enables rapid and responsive analysis of Slurm-based
 cluster usage. This allows stakeholders: I) to identify current
 bottlenecks of CPU and GPU utilization, II) to make informed decisions
-to adapt SLURM parameters in the short term and III) to support
+to adapt SLURM parameters in the short term, and III) to support
 strategic decisions, all based on user needs. The SCAS dashboard, code,
 and the documentation are hosted on a publicly available GitHub
 repository (<https://github.com/Bioinformatics-Munich/scas_dashboard>).
```
