README.md (+11 -8)
@@ -16,7 +16,7 @@
### Overview
-This is a Keras implementation of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).
+This is a Keras port of the SSD model architecture introduced by Wei Liu et al. in the paper [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325).
Ports of the trained weights of all the original models are provided below. This implementation is accurate, meaning that both the ported weights and models trained from scratch produce the same mAP values as the respective models of the original Caffe implementation (see performance section below).
@@ -33,11 +33,12 @@ If you would like to build an SSD with your own base network architecture, you c
### Performance
-Here are the mAP evaluation results of the ported weights and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results are either identical to those of the original Caffe models or surpass them slightly. Download links to all ported weights are available further below.
+Here are the mAP evaluation results of the ported weights and below that the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for 2012 `test`) or the official Pascal VOC Matlab evaluation script (for 2007 `test`). In all cases the results either match (or slightly surpass) those of the original Caffe models. Download links to all ported weights are available further below.
<table width="70%">
<tr>
-<td colspan=4 align=center>Mean Average Precision</td>
+<td></td>
+<td colspan=3 align=center>Mean Average Precision</td>
</tr>
<tr>
<td>evaluated on</td>
@@ -68,7 +69,8 @@ Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and
<table width="95%">
<tr>
-<td colspan=4 align=center>Mean Average Precision</td>
+<td></td>
+<td colspan=3 align=center>Mean Average Precision</td>
</tr>
<tr>
<td></td>
@@ -84,11 +86,12 @@ Training an SSD300 from scratch to convergence on Pascal VOC 2007 `trainval` and
</tr>
</table>
-The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version). There are two things to note here. First, note that the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU. Second, the paper says they measured the prediction speed at batch size 8, which I think isn't a meaningful way of measuring the speed. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second, therefore measuring the prediction speed on batches and then deducing the time spent on each individual image in the batch defeats the whole purpose. For the sake of comparability, below you find the predictions speed for the original Caffe SSD implementation and the prediction speed for this implementation under the same conditions, i.e. at batch size 8. In addition you find the prediction speed for this implementation at batch size 1, which in my opinion is the more meaningful number.
+The models achieve the following average number of frames per second (FPS) on Pascal VOC on an NVIDIA GeForce GTX 1070 mobile (i.e. the laptop version) and cuDNN v6. There are two things to note here. First, note that the benchmark prediction speeds of the original Caffe implementation were achieved using a TitanX GPU and cuDNN v4. Second, the paper says they measured the prediction speed at batch size 8, which I think isn't a meaningful way of measuring the speed. The whole point of measuring the speed of a detection model is to know how many individual sequential images the model can process per second; measuring the prediction speed on batches of images and then deducing the time spent on each individual image in the batch defeats the purpose. For the sake of comparability, below you find the prediction speed for the original Caffe SSD implementation and the prediction speed for this implementation under the same conditions, i.e. at batch size 8. In addition you find the prediction speed for this implementation at batch size 1, which in my opinion is the more meaningful number.
<table width>
<tr>
-<td colspan=4 align=center>Frames per Second</td>
+<td></td>
+<td colspan=3 align=center>Frames per Second</td>
</tr>
<tr>
<td></td>
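
For concreteness, here is a minimal timing sketch of the two measurement modes the updated FPS paragraph above argues about. It is only a sketch: `model` stands in for a compiled Keras SSD and is an assumption, not part of this diff or this repository's API.

```python
import time
import numpy as np

# Dummy 300x300 RGB inputs; in a real benchmark these would be actual
# Pascal VOC images. 128 is divisible by both batch sizes used below.
images = np.random.rand(128, 300, 300, 3).astype(np.float32)

def measure_fps(model, images, batch_size):
    # Warm-up pass so graph compilation doesn't pollute the measurement.
    model.predict(images[:batch_size], batch_size=batch_size)
    start = time.time()
    for i in range(0, len(images), batch_size):
        model.predict(images[i:i + batch_size], batch_size=batch_size)
    return len(images) / (time.time() - start)

# Batch size 8 amortizes per-call overhead across the batch, whereas batch
# size 1 measures the sequential single-image case argued for above.
fps_at_8 = measure_fps(model, images, batch_size=8)
fps_at_1 = measure_fps(model, images, batch_size=1)
```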
@@ -258,6 +261,6 @@ Currently in the works:
### Terminology
-* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
-* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: Not just object category labels, but also bounding box coordinates. I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
+* "Anchor boxes": The paper calls them "default boxes", in the original C++ code they are called "prior boxes" or "priors", and the Faster R-CNN paper calls them "anchor boxes". All terms mean the same thing, but I slightly prefer the name "anchor boxes" because I find it to be the most descriptive of these names. I call them "prior boxes" or "priors" in `keras_ssd300.py` and `keras_ssd512.py` to stay consistent with the original Caffe implementation, but everywhere else I use the name "anchor boxes" or "anchors".
+* "Labels": For the purpose of this project, datasets consist of "images" and "labels". Everything that belongs to the annotations of a given image is the "labels" of that image: Not just object category labels, but also bounding box coordinates. "Labels" is just shorter than "annotations". I also use the terms "labels" and "targets" more or less interchangeably throughout the documentation, although "targets" means labels specifically in the context of training.
* "Predictor layer": The "predictor layers" or "predictors" are all the last convolution layers of the network, i.e. all convolution layers that do not feed into any subsequent convolution layers.