text generation #2874
-
Text generation is autoregressive, so you have to generate a single token at a time. Take a look at the llama generation example to see how it's done.
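For anyone landing here later, a minimal sketch of that token-by-token loop is below. The `NextTokenModel` trait, the greedy argmax, and all names are placeholders for illustration, not Burn's actual API; the llama generation example shows the real wiring with a Llama model and tokenizer.

```rust
/// Hypothetical interface: any model that maps a token prefix to logits
/// over the vocabulary fits. This is NOT Burn's API, just an illustration.
trait NextTokenModel {
    /// Logits over the vocabulary for the position following `tokens`.
    fn next_token_logits(&self, tokens: &[usize]) -> Vec<f32>;
}

/// Greedy autoregressive decoding: append the highest-scoring token, one
/// step at a time, until `eos_id` is produced or the budget is spent.
fn generate<M: NextTokenModel>(
    model: &M,
    prompt: &[usize],
    eos_id: usize,
    max_new_tokens: usize,
) -> Vec<usize> {
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new_tokens {
        let logits = model.next_token_logits(&tokens);
        // Argmax over the vocabulary; a sampling strategy (top-k, top-p, ...)
        // would replace this step in a real decoder.
        let next = logits
            .iter()
            .copied()
            .enumerate()
            .max_by(|(_, a), (_, b)| a.total_cmp(b))
            .map(|(i, _)| i)
            .expect("model returned no logits");
        tokens.push(next);
        if next == eos_id {
            break;
        }
    }
    tokens
}
```

In practice the argmax is usually replaced by a sampling strategy (temperature, top-k, top-p), and a key/value cache avoids recomputing attention over the whole prefix at every step.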
-
What's the relationship between the transformer example and the llama generation example? In my opinion, Llama is a model based on the transformer architecture.
-
Yep! Llama is a model built from decoder-only transformer blocks. I was linking to the llama generation example simply because it illustrates the autoregressive inference process.
Not yet 👀, but it would be a great addition.
-
@laggui Is the result the same as your test?
-
Hi,
In the text-generation example (https://github.com/tracel-ai/burn/tree/main/examples/text-generation), only the training process is performed. How can we generate new text? I think the generation procedure is a bit different from training, because we can only generate new words one by one, so we cannot use an attention mask during text generation. During training, on the other hand, we can use an attention mask to parallelize and speed up training, because the whole text sequence is already known.
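To make the training-time point concrete, here is a small framework-agnostic sketch of a causal attention mask (the function name and types are illustrative, not taken from the Burn example):

```rust
/// Builds a causal (lower-triangular) attention mask for `seq_len` positions:
/// `mask[i][j]` is true when position `i` may attend to position `j`,
/// i.e. only to itself and earlier tokens.
fn causal_mask(seq_len: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j <= i).collect())
        .collect()
}
```

During training, every row of this mask can be applied at once over the already-known sequence, which is what makes the parallel speedup possible.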