The idea is really cool (an outline is really important for long generations).
However, in my experience, you can get far better results if the prompt includes the outline, a summary of the text so far (generated with recursive summarization), and the previous paragraph, with the model's goal being to predict the next paragraph.
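A minimal sketch of that prompting loop, assuming a hypothetical `generate` wrapper around whatever completion API is in use (the prompt wording, `chunk_size`, and all helper names here are illustrative, not from any particular library):

```python
def generate(prompt: str, max_tokens: int = 256) -> str:
    """Hypothetical wrapper around your LLM completion endpoint."""
    raise NotImplementedError("plug in your model API here")

def recursive_summary(paragraphs: list[str], chunk_size: int = 4) -> str:
    """Summarize the text so far: summarize fixed-size chunks,
    then recursively summarize the chunk summaries."""
    if len(paragraphs) <= chunk_size:
        return generate("Summarize the following text:\n\n" + "\n\n".join(paragraphs))
    chunks = [paragraphs[i:i + chunk_size] for i in range(0, len(paragraphs), chunk_size)]
    summaries = [generate("Summarize the following text:\n\n" + "\n\n".join(c)) for c in chunks]
    return recursive_summary(summaries, chunk_size)

def next_paragraph(outline: str, paragraphs: list[str]) -> str:
    """Build the prompt from the outline, the running summary, and the
    previous paragraph, then ask the model for just the next paragraph."""
    summary = recursive_summary(paragraphs[:-1]) if len(paragraphs) > 1 else ""
    prompt = (
        f"Outline:\n{outline}\n\n"
        f"Summary of the story so far:\n{summary}\n\n"
        f"Previous paragraph:\n{paragraphs[-1] if paragraphs else ''}\n\n"
        "Next paragraph:"
    )
    return generate(prompt)
```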
It would be interesting to see the results of your approach, since I have not tried this with the J1 models.
Do you have any results you are willing to share?
(BTW, the best way I have found to generalize beyond the training prompt length is relative attention, as in Transformer-XL, but I assume that is off the table with an already pre-trained 178B model.)
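For reference, a minimal sketch of distance-based attention in that spirit, assuming PyTorch. This is a simplified learned-bias variant (closer to a T5-style relative bias than to Transformer-XL's exact formulation); the point is that scores depend only on key-query distance, which is what allows extrapolation past the training length. `max_dist` and the function name are illustrative:

```python
import torch
import torch.nn.functional as F

def relative_attention(q, k, v, rel_bias):
    """q, k, v: (seq, dim). rel_bias: (2*max_dist - 1,) learned biases
    indexed by (key_pos - query_pos), clipped to +/- (max_dist - 1)."""
    seq, dim = q.shape
    scores = q @ k.T / dim ** 0.5                   # content term: (seq, seq)
    max_dist = (rel_bias.numel() + 1) // 2
    pos = torch.arange(seq)
    rel = (pos[None, :] - pos[:, None]).clamp(-max_dist + 1, max_dist - 1)
    scores = scores + rel_bias[rel + max_dist - 1]  # position term: distance only
    return F.softmax(scores, dim=-1) @ v
```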