Let the model validate based on a new recorded variable global_step rather than epochs #566
Thanks for the suggestion. Maybe you could help me clarify some things. This is what I understood: you want to either 1) validate less frequently than once per epoch (e.g. validate only every 10th training epoch), or 2) more frequently, e.g. 10 times per training epoch. Is that correct so far? Regarding 1), please have a look at #564; maybe we can do something there to make that feature possible. Regarding 2), this sounds more difficult. E.g. we have the

Regarding your tf code, I'm sorry but I don't understand what it's doing. Could you explain? What is the conceptual difference between epoch and step -- is a step a batch?
Yeah, a step is a batch, i.e. one optimizer.step() call. If a controllable validation frequency is a good feature, I think we can add a key such as global_step and increment it in on_batch_end; when a new epoch begins, global_step continues from the last epoch's value. Then in fit_loop, a frequency parameter decides whether to validate or not. The default frequency can be (len(dataset) - 1) // batch_size + 1, which validates exactly once per epoch.
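To make the proposal concrete, here is a minimal sketch of the step-counting logic in plain Python (no skorch dependency; the class name, `validate_every` parameter, and helper are hypothetical, not part of skorch's API):

```python
# Hypothetical sketch of the proposed global_step counter.
class GlobalStepCounter:
    """Counts optimizer steps across epochs and decides when to validate."""

    def __init__(self, validate_every):
        # validate_every: number of batches between validation runs.
        self.validate_every = validate_every
        self.global_step = 0

    def on_batch_end(self):
        # Called once per batch; the count is NOT reset at epoch boundaries.
        self.global_step += 1

    def should_validate(self):
        return self.global_step % self.validate_every == 0


def batches_per_epoch(n_samples, batch_size):
    # The proposed default frequency: one validation per epoch.
    return (n_samples - 1) // batch_size + 1


# 100 samples with batch_size 32 -> 4 batches per epoch.
counter = GlobalStepCounter(validate_every=batches_per_epoch(100, 32))
hits = []
for step in range(8):  # simulate 2 epochs of 4 batches each
    counter.on_batch_end()
    if counter.should_validate():
        hits.append(counter.global_step)
# hits == [4, 8]: validation triggers at each epoch boundary by default.
```

With a smaller `validate_every`, the same counter would trigger validation several times per epoch, which is option 2) from the discussion above.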
This is a problem, I didn't think about it; in my code, I dropped the PrintLog callback.
The number of training batches can already be calculated from the history. Regarding the option to trigger validation after a certain number of training batches, I find this a bit odd, since sometimes this will trigger at the beginning of the epoch, sometimes at the end, sometimes in between (depending on the number of batches per epoch and the step size). Would it not make more sense to validate every N complete training epochs (with N >= 1)? I believe that should be easy to implement with the changes in #564.
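A rough sketch of both points, assuming a skorch-style history shaped as a list of per-epoch dicts, each holding a 'batches' list (the exact history layout here is an assumption, and the helper names are mine):

```python
def total_train_batches(history):
    # Sum the recorded batches over all epochs; this recovers the
    # global step count from the history alone.
    return sum(len(epoch['batches']) for epoch in history)


def validate_this_epoch(epoch_idx, every_n):
    # Every-N-epochs rule: validate only on every N-th completed epoch.
    return epoch_idx % every_n == 0


# Illustrative history: two full epochs of 4 batches, one partial of 3.
history = [
    {'batches': [{}] * 4},
    {'batches': [{}] * 4},
    {'batches': [{}] * 3},
]
n_batches = total_train_batches(history)  # 11
```

The epoch-based rule always triggers at an epoch boundary, which avoids the mid-epoch triggering the comment above objects to.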
A new feature may sound like this:
The default fit loop validates once per epoch, so if one wants to validate more or less often than that (in terms of steps), the only option is to set train_split=None and implement a new callback.
So I wonder whether this project could provide such a function or callback class. I plan to implement a callback for my needs, and I would be pleased to contribute it if wanted.
Many papers, such as BERT, report optimizing for a given number of steps rather than a number of epochs.
In the BERT source code:
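The steps-vs-epochs relationship mentioned above can be sketched as a small conversion helper (the function name and numbers are illustrative, not taken from the BERT code):

```python
def steps_to_epochs(num_steps, n_samples, batch_size):
    # Convert a BERT-style "train for N optimizer steps" budget
    # into the equivalent number of passes over the dataset.
    steps_per_epoch = (n_samples - 1) // batch_size + 1
    return num_steps / steps_per_epoch


# E.g. 100,000 steps over 1,000,000 samples at batch size 32
# corresponds to 3.2 epochs.
epochs = steps_to_epochs(100_000, 1_000_000, 32)
```

This is why a step-based schedule does not generally align with epoch boundaries: the step budget is usually not a multiple of the batches per epoch.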