This directory contains the files from the ND280 deep learning project, which I did as a minor project in the BIST Master of Multidisciplinary Research degree. If you've decided to work on this voluntarily, welcome. If you were forced to do it, good luck.
The aim of the project was to create a robust electron/muon binary classification method using simulated ND280 HA-TPC data from GEANT4.
Below, I describe the models in the project, the structure of the directory, and some key terms used throughout. The explanations assume basic familiarity with deep learning and PyTorch. In addition to this README, many files contain a brief explanation of their purpose at the very beginning.
The following models are included in the project:
- multilayer perceptron (MLP), built from scratch
- convolutional neural network (CNN), based on efficientnet_b4
- vision transformer (ViT) from the Hugging Face Transformers package
Each model may have variants with 1 or 3 input channels.
Additionally, the following ideas have been tested to some extent:
- CNN with data augmentation
- CNN for positive particles (proton/pion)
The directory contains the following subdirectories:

- data – training data was supposed to be there, but it's not; you can find the training data in /data/neutrinos/common/casado/T2K/ND280Cont/
- models – models were supposed to be here, but are bundled with the training code instead
- notebooks – numerous Jupyter notebooks for data exploration and model testing; refer to individual notebooks for details on their purpose; each notebook starts with a brief explanation
- scripts – all scripts submitted to the cluster; the subdirectory structure is roughly:
  - tune.py / train.py – the tuning/training script executed
  - utils.py – shared utility code
  - *.sh – a shell script that prepares the environment and runs the Python script
  - *.sub – a Condor submit file for the script
  - logs/ – Condor log, output and error files
  - out/ – output files created by the script
The following relevant terms are used in the files, in no particular order:
- Color – this refers to an input variable in the data (`qmax`, `tmax` or `fwhm`), and to the associated input channels in input tensors; the tensors normally have a shape of `(b, c, h, w)`, where `c` is the color; "color" and "channel" may be used interchangeably
- Offset – the data in input files is stored in sparse format, i.e., each row refers to a single non-zero pixel; the file is grouped by event ID, and dataset objects store a table of offsets, i.e., the row indices of the first and last pixel of each event
- Tuning – the exploration of hyperparameter space through low-budget training of numerous models with different setups; in this project, it is performed with Optuna
- Training – in this project, it usually refers to full, high-budget optimization of a model with a good setup identified through tuning
- Eager loading – a type of data loading in which all the data is loaded into RAM at once
- Lazy loading – a type of data loading in which small portions of data are read from the hard drive on demand to conserve memory; also called data streaming
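The offset-table idea above can be sketched as follows. The `rows` array and the `build_offsets` helper are hypothetical illustrations of the concept, not the project's actual code; the real files contain more columns (`qmax`, `tmax`, `fwhm`, pixel coordinates, etc.):

```python
import numpy as np

# Toy stand-in for a sparse pixel file: columns are (event_id, h, w, qmax),
# one row per non-zero pixel, grouped by event ID.
rows = np.array([
    [0, 1, 2, 5.0],
    [0, 3, 4, 2.0],
    [1, 0, 0, 7.0],
    [2, 2, 2, 1.0],
    [2, 5, 1, 3.0],
])

def build_offsets(event_ids):
    """Return {event_id: (first_row, last_row)} for a grouped id column."""
    # A new group starts wherever the id changes (prepend forces index 0).
    starts = np.flatnonzero(np.diff(event_ids, prepend=-1))
    ends = np.append(starts[1:], len(event_ids)) - 1
    return {int(event_ids[s]): (int(s), int(e)) for s, e in zip(starts, ends)}

offsets = build_offsets(rows[:, 0].astype(int))
first, last = offsets[2]
event_pixels = rows[first:last + 1]  # all pixels of event 2
```

With the table in hand, fetching one event is a cheap slice rather than a scan over the whole file.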
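Tuning with Optuna typically follows the pattern sketched below. The objective here is a synthetic stand-in (a quadratic in the learning-rate exponent) rather than the project's actual low-budget training loop, and the parameter names are assumptions:

```python
import math

import optuna

def objective(trial):
    # In the real project this would run a short, low-budget training
    # and return a validation metric; here we fake a score instead.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    hidden = trial.suggest_int("hidden", 32, 512, log=True)
    # Synthetic score: pretend the best learning rate is around 1e-3.
    return (math.log10(lr) + 3) ** 2 + 0.001 * hidden

optuna.logging.set_verbosity(optuna.logging.WARNING)
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)

best = study.best_params  # e.g. {"lr": ..., "hidden": ...}
```

The setup found this way (here `study.best_params`) is then used for the full, high-budget training run.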
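The difference between the two loading modes can be sketched with a plain-Python toy. The CSV example is purely illustrative; the project's actual loaders operate on the sparse pixel files:

```python
import csv
import io

def eager_load(fileobj):
    # Eager: parse every row into a list held in RAM at once.
    return list(csv.reader(fileobj))

def lazy_load(fileobj):
    # Lazy: yield rows one at a time, so memory use stays roughly
    # constant regardless of file size (the idea behind data streaming).
    for row in csv.reader(fileobj):
        yield row

data = "1,2,3\n4,5,6\n"
all_rows = eager_load(io.StringIO(data))  # [['1', '2', '3'], ['4', '5', '6']]
stream = lazy_load(io.StringIO(data))
first = next(stream)                      # ['1', '2', '3']
```

Eager loading gives the fastest epochs when the data fits in RAM; lazy loading trades speed for a much smaller memory footprint.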