Description
It would help users to see how many tokens are in their dataset, and how many are in a given cluster.
We can capture the number of tokens encoded for each row during the embedding step.
We will need to account for imported embeddings, which may not have token counts recorded, so surfacing the counts in the UI would have to be optional.
The token count could be stored in a parallel array in the h5 file, and later turned into a column in the scope parquet.
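
A minimal sketch of the bookkeeping described above, using plain Python lists in place of the h5 array and the scope parquet column. The function name `summarize_token_counts` and the `cluster_ids` / `token_counts` inputs are hypothetical, chosen only for illustration. It assumes one token count per row, aligned by index with the embeddings, and returns `None` when no counts were recorded so the UI can skip the stat.

```python
from collections import defaultdict
from typing import Optional, Sequence


def summarize_token_counts(
    cluster_ids: Sequence[int],
    token_counts: Optional[Sequence[int]],
) -> Optional[dict]:
    """Return dataset and per-cluster token totals, or None if counts are absent."""
    if token_counts is None:
        # Imported embeddings may have no token counts; the caller hides the stat.
        return None
    if len(token_counts) != len(cluster_ids):
        # The counts must stay parallel to the rows they describe.
        raise ValueError("token_counts must have one entry per row")

    per_cluster = defaultdict(int)
    for cluster, count in zip(cluster_ids, token_counts):
        per_cluster[cluster] += count
    return {"total": sum(token_counts), "per_cluster": dict(per_cluster)}


# Hypothetical example data: five rows assigned to three clusters.
cluster_ids = [0, 0, 1, 2, 1]
token_counts = [12, 7, 30, 5, 9]

summary = summarize_token_counts(cluster_ids, token_counts)
missing = summarize_token_counts(cluster_ids, None)
print(summary)
print(missing)
```

Keeping the counts as a separate parallel array means datasets without them need no placeholder values; the column is simply absent and the summary is skipped.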