This paper:
https://arxiv.org/pdf/1511.04587.pdf
... suggests "adjustable gradient clipping", which seems to greatly help in training deep networks quickly and efficiently. Basically, they propose scaling the gradient clipping threshold by 1 / learning rate, i.e., clipping the gradients to [-clip_gradients / current_lr, +clip_gradients / current_lr].
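For clarity, here is a minimal sketch of that rule. This is illustrative only, not Caffe's actual `ClipGradients` implementation; the function name `AdjustableClipGradients` and the parameters `clip_gradients` / `current_lr` are placeholders for whatever the solver would actually use:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch (not Caffe code): element-wise "adjustable" gradient
// clipping as described in the paper. The fixed threshold clip_gradients is
// divided by the current learning rate, so the clipping range grows as the
// learning rate decays.
void AdjustableClipGradients(std::vector<float>& diffs,
                             float clip_gradients,
                             float current_lr) {
  const float limit = clip_gradients / current_lr;
  for (float& g : diffs) {
    g = std::max(-limit, std::min(limit, g));
  }
}
```

The point of the scaling is that the effective parameter update, current_lr * g, then stays bounded by clip_gradients no matter how the learning rate is scheduled, which is what allows the paper to train with very high initial learning rates.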
As far as I can see, Caffe doesn't support this yet, correct? Might be useful to add?