Count tokens when embedding

It would help users to see how many tokens are in their dataset, and how many are in a given cluster.

We can capture the token counts during the embedding step, since the text is already encoded into tokens there.
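A minimal sketch of recording counts alongside embeddings, assuming the embedding step tokenizes each text before encoding it. The names `embed_with_token_counts`, `whitespace_tokenize`, and `fake_embed` are hypothetical stand-ins, not functions from this project; a real model would supply its own tokenizer and encoder.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]
Vector = List[float]


def whitespace_tokenize(text: str) -> Tokens:
    # Placeholder tokenizer (assumption): a real model uses its own tokenizer.
    return text.split()


def fake_embed(encoded: Sequence[Tokens]) -> List[Vector]:
    # Stand-in encoder (assumption): returns one 1-dimensional vector per text.
    return [[float(len(tokens))] for tokens in encoded]


def embed_with_token_counts(
    texts: Sequence[str],
    tokenize: Callable[[str], Tokens],
    embed_tokens: Callable[[Sequence[Tokens]], List[Vector]],
    batch_size: int = 2,
) -> Tuple[List[Vector], List[int]]:
    """Embed texts in batches and record one token count per text."""
    embeddings: List[Vector] = []
    token_counts: List[int] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        encoded = [tokenize(text) for text in batch]
        # The counts come for free: the batch is already tokenized here.
        token_counts.extend(len(tokens) for tokens in encoded)
        embeddings.extend(embed_tokens(encoded))
    return embeddings, token_counts


texts = ["hello world", "count the tokens here", ""]
embeddings, token_counts = embed_with_token_counts(
    texts, whitespace_tokenize, fake_embed
)
```

Because the counts are appended in the same order as the embeddings, the two lists stay aligned row for row.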

We need to consider that someone importing embeddings may not have recorded token counts, so surfacing them in the UI would have to be optional.
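One way the optional case might be handled, sketched under the assumption that cluster assignments and token counts are aligned row for row. `tokens_per_cluster` is a hypothetical helper, and returning `None` is just one possible signal for the UI to hide the figure.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def tokens_per_cluster(
    cluster_ids: Sequence[int],
    token_counts: Optional[Sequence[int]],
) -> Optional[Dict[int, int]]:
    """Sum token counts per cluster, or return None when counts are missing."""
    if token_counts is None:
        # Imported embeddings may carry no counts; the caller can skip the display.
        return None
    if len(token_counts) != len(cluster_ids):
        raise ValueError("token_counts and cluster_ids must have the same length")
    totals: Dict[int, int] = defaultdict(int)
    for cluster_id, count in zip(cluster_ids, token_counts):
        totals[cluster_id] += count
    return dict(totals)


with_counts = tokens_per_cluster([0, 1, 0], [2, 4, 3])
without_counts = tokens_per_cluster([0, 1, 0], None)
```

The dataset total is then the sum of the per-cluster values whenever counts exist.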

The token count could be stored in a parallel array in the h5 file, and later turned into a column in the scope parquet.

Count tokens when embedding #77

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Count tokens when embedding #77

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions