Commit feed11f

chore: Add minor clarifications regarding speed measurements

1 parent bd30eaa

1 file changed: README.md (+11 −8 lines)
@@ -16,7 +16,7 @@

 ### Overview

-This is a Keras implementation of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).
+This is a Keras port of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).

 Ports of the trained weights of all the original models are provided below. This implementation is accurate, meaning that both the ported weights and models trained from scratch produce the same mAP values as the respective models of the original Caffe implementation (see performance section below).

@@ -33,11 +33,12 @@ If you would like to build an SSD with your own base network architecture, you c

 ### Performance

-Here are the mAP evaluation results of the ported weights and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results are either identical to those of the original Caffe models or surpass them slightly. Download links to all ported weights are available further below.
+Here are the mAP evaluation results of the ported weights and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results either match (or slightly surpass) those of the original Caffe models. Download links to all ported weights are available further below.

 <table width="70%">
 <tr>
-<td colspan=4 align=center>Mean Average Precision</td>
+<td></td>
+<td colspan=3 align=center>Mean Average Precision</td>
 </tr>
 <tr>
 <td>evaluated on</td>
@@ -68,7 +69,8 @@ Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and

 <table width="95%">
 <tr>
-<td colspan=4 align=center>Mean Average Precision</td>
+<td></td>
+<td colspan=3 align=center>Mean Average Precision</td>
 </tr>
 <tr>
 <td></td>
@@ -84,11 +86,12 @@ Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and
 </tr>
 </table>

-The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version). There are two things to note here. First, note that the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU. Second, the paper says they measured the prediction speed at batch size 8, which I think isn't a meaningful way of measuring the speed. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second, therefore measuring the prediction speed on batches and then deducing the time spent on each individual image in the batch defeats the whole purpose. For the sake of comparability, below you find the predictions speed for the original Caffe SSD implementation and the prediction speed for this implementation under the same conditions, i.e. at batch size 8. In addition you find the prediction speed for this implementation at batch size 1, which in my opinion is the more meaningful number.
+The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version) and cuDNN v6. There are two things to note here. First, note that the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU and cuDNN v4. Second, the paper says they measured the prediction speed at batch size 8, which I think isn't a meaningful way of measuring the speed. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second, therefore measuring the prediction speed on batches of images and then deducing the time spent on each individual image in the batch defeats the purpose. For the sake of comparability, below you find the prediction speed for the original Caffe SSD implementation and the prediction speed for this implementation under the same conditions, i.e. at batch size 8. In addition you find the prediction speed for this implementation at batch size 1, which in my opinion is the more meaningful number.

 <table width>
 <tr>
-<td colspan=4 align=center>Frames per Second</td>
+<td></td>
+<td colspan=3 align=center>Frames per Second</td>
 </tr>
 <tr>
 <td></td>
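
To make the speed-measurement point in the hunk above concrete, here is a minimal sketch of how one might time Keras predictions at batch size 1 versus batch size 8 and convert the totals to FPS. The tiny stand-in network, input shapes, and iteration counts are assumptions made for self-containedness; they are not the SSD300/SSD512 models from this repository, whose ported weights you would load instead for a real benchmark.

```python
# Minimal sketch: measure prediction throughput (FPS) at batch size 1 vs. 8.
# The stand-in network below is an assumption for self-containedness;
# it is NOT the SSD architecture from this repository.
import time

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(300, 300, 3)),  # SSD300-sized inputs
    keras.layers.Conv2D(16, 3, strides=2, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10),
])

def measure_fps(model, batch_size, num_batches=50):
    """Return the average number of images processed per second."""
    images = np.random.rand(batch_size, 300, 300, 3).astype("float32")
    model.predict(images, verbose=0)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(num_batches):
        model.predict(images, verbose=0)
    elapsed = time.perf_counter() - start
    return num_batches * batch_size / elapsed  # FPS = images / seconds

print("FPS at batch size 1:", measure_fps(model, batch_size=1))
print("FPS at batch size 8:", measure_fps(model, batch_size=8))
```

Batched prediction amortizes per-call overhead across all images in the batch, so the batch-size-8 figure will generally come out higher even though the batch-size-1 figure is the one that reflects the per-image latency the paragraph above argues for.
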
@@ -258,6 +261,6 @@ Currently in the works:

 ### Terminology

-* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
-* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: Not just object category labels, but also bounding box coordinates. I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
+* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` and `keras_ssd512.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
+* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: Not just object category labels, but also bounding box coordinates. "Labels" is just shorter than "annotations". I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
 * "Predictor layer": The "predictor layers" or "predictors" are all the last convolution layers of the network, i.e. all convolution layers that do not feed into any subsequent convolution layers.
