max_vocab_count won't work for CATEGORICAL integerized in tfdf.keras.GradientBoostedTreesModel #190

Open
@advahadr

Hi All,

versions:
python 3.9
tensorflow_decision_forests==0.2.6
tensorflow==2.9.1

Running on AWS instance type: ml.m5.24xlarge

Problem description:
When max_vocab_count is set to 20 both in tfdf.keras.FeatureUsage and in tfdf.keras.GradientBoostedTreesModel, features reported as "CATEGORICAL integerized" are not affected and keep their original vocabulary size, whereas for features reported as "CATEGORICAL has-dict" max_vocab_count is applied correctly.
For example, see the following dataset statistics from the training log; both features use the same feature usage:

"request_id" CATEGORICAL integerized vocab-size:8806 no-ood-item
"request_tile" CATEGORICAL has-dict vocab-size:21 num-oods:2823 (0.0014115%) most-frequent:"851fb467fffffff" 2395895 (1.19795%)

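For request_id, the max_vocab_count set in the guide (feature usage) is ignored: its vocab-size stays at 8806.
request_tile is handled correctly: its vocab-size is capped at 21 and the remaining values are counted as OOD items.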
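
To make the expected behaviour concrete, here is a minimal pure-Python sketch (not TF-DF code) of the capping I expected to see on both features: keep the most frequent values and send everything else to a single out-of-dictionary bucket. The helper `cap_vocabulary`, the `OOD` placeholder and the sample ids are hypothetical and only for illustration. The sketch assumes the OOD bucket is counted on top of `max_vocab_count`, which would be consistent with vocab-size:21 for a limit of 20 in the log above, but I have not confirmed that. It also casts the integer ids to strings first, on the assumption that a dictionary is only built for string-valued categorical features; whether that cast is a valid workaround in TF-DF is untested.

```python
from collections import Counter

OOD = "<OOD>"  # hypothetical placeholder for the out-of-dictionary bucket


def cap_vocabulary(values, max_vocab_count):
    """Keep the max_vocab_count most frequent values; map the rest to OOD.

    Assumption: the OOD bucket is counted in addition to max_vocab_count.
    """
    if max_vocab_count < 1:
        raise ValueError("max_vocab_count must be at least 1")
    counts = Counter(values)
    kept = {value for value, _ in counts.most_common(max_vocab_count)}
    return [value if value in kept else OOD for value in values]


# Hypothetical integer ids standing in for a feature such as request_id.
request_ids = [7, 7, 7, 3, 3, 9, 4]

# Cast to strings so the values are treated as dictionary-backed categories.
as_strings = [str(value) for value in request_ids]

capped = cap_vocabulary(as_strings, max_vocab_count=2)
print(capped)
print("vocabulary size:", len(set(capped)))
```

With a limit of 2, the two most frequent ids are kept and the two rare ones fall into the OOD bucket, giving a vocabulary of 3 entries.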

I would appreciate your help. Thank you.

Labels: bug
