Description
Hi All,
Versions:
Python 3.9
tensorflow_decision_forests==0.2.6
tensorflow==2.9.1
Running on AWS instance type: ml.m5.24xlarge
Problem description:
When max_vocab_count is set to 20 in both tfdf.keras.FeatureUsage and tfdf.keras.GradientBoostedTreesModel, features reported as "CATEGORICAL integerized" are not affected and keep their original vocabulary size, whereas features reported as "CATEGORICAL has-dict" have max_vocab_count applied correctly.
For example, see the following statistics from the log. Both features use the same feature usage settings:
"request_id" CATEGORICAL integerized vocab-size:8806 no-ood-item
"request_tile" CATEGORICAL has-dict vocab-size:21 num-oods:2823 (0.0014115%) most-frequent:"851fb467fffffff" 2395895 (1.19795%)
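request_id seems to be ignored by the guide and does not use max_vocab_count (its vocab-size stays at 8806 instead of being capped).
request_tile is handled correctly (vocab-size:21, with the remaining values counted as OOD).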
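To make the expected behaviour concrete, here is a minimal plain-Python sketch of the capping I expected for request_id. It is not TF-DF code: the helper name cap_vocabulary and the ood_value placeholder are hypothetical, and the rule "keep the max_vocab_count most frequent values and send everything else to one out-of-dictionary bucket" is my reading of the request_tile line above (20 kept values plus one OOD item would give vocab-size:21).

```python
from collections import Counter


def cap_vocabulary(values, max_vocab_count, ood_value):
    """Keep the most frequent values and map every other value to ood_value.

    Hypothetical helper for illustration only; ood_value is an arbitrary
    placeholder and not TF-DF's internal encoding.
    """
    counts = Counter(values)
    # most_common(n) returns the n values with the highest counts.
    kept = {value for value, _ in counts.most_common(max_vocab_count)}
    return [value if value in kept else ood_value for value in values]


# Toy stand-in for an integer id column such as request_id.
ids = [7, 7, 7, 3, 3, 9, 5]
capped = cap_vocabulary(ids, max_vocab_count=2, ood_value=-1)

print(capped)            # [7, 7, 7, 3, 3, -1, -1]
print(len(set(capped)))  # 3 distinct values: 2 kept + 1 OOD bucket
```

Applying something like this before training might serve as a workaround, but I have not verified how TF-DF treats the placeholder value, so it would need to be chosen with care.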
I would appreciate your help. Thank you!