Handle extended thinking with Anthropic provider, process streaming thinking blocks and show them in the output tab #4426
Conversation
Add `ThinkingChatMessage` interface and handle thinking chunks in `BaseLLM` and `Anthropic` classes
Would it be possible to add the same support to Bedrock.ts, now that I have enabled tools support and have a separate PR to enable Claude 3.7 on Bedrock as well?
@chezsmithy I have just sent a PR that does the same with Bedrock: #4444
Deep Research would be a killer feature in this coding assistant Continue. Is it possible to have a Perplexity API key too?
@ferenci84 - are there any calculations behind setting the budget tokens as `"budget_tokens": 2048`? Could you please suggest how we can determine the appropriate budget token limit for our use case?
You need to make sure that `contextLength` is larger than `maxTokens`, `maxTokens` is larger than `budgetTokens`, and `budgetTokens` is at least 1024. This uses 3200 as budget tokens: https://bigcode-bench.github.io/
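For reference, a minimal sketch of the raw Anthropic API request shape these limits map onto (the API requires `budget_tokens` to be at least 1024 and `max_tokens` to be greater than `budget_tokens`; the model name and values here are only illustrative):

```json
{
  "model": "claude-3-7-sonnet-20250219",
  "max_tokens": 4096,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 2048
  },
  "messages": [{ "role": "user", "content": "Solve this step by step..." }]
}
```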
@ferenci84 this looks really good, type updates and all. Two ways I think it could be improved:

Both of these thoughts lead to a bigger question of how we should combine the previous strategy for handling thinking tokens in Ollama (deepseek literally just outputs `<think>` tags in its output) with this one. It seems to me that parsing the tags in…
Yes, the big difference between Claude and deepseek is that with Anthropic the thinking is a separate message (even multiple messages in the case of redacted thinking), and this should be fed back to the AI in the same form on the next turn (which is required for tool use; otherwise those blocks are just ignored). I was thinking about combining the strategies, but I thought it would be overly complicated due to the big differences. As for the UI change, it would take one more week until I could update it, but why not let other users start sooner with it as it is right now?
The buttons between the first message and the thinking block seem to be a mistake, as they're not present in other cases.
Thinking further, there is one corner case in which a shared implementation would be helpful: let's say you start the conversation with Anthropic and then switch to deepseek. Not sure how common that is.
My idea is to have a metadata object with a dynamic data structure for assistant messages, in which thinking data could be added, but also other possibly complex data that helps us reconstruct the messages. This could even be of any type, so that any additional data (like the signature, or multiple redacted thinking blocks that shouldn't be displayed) could be stored. In addition, we could add an optional "display" key that would contain data about how to display the message. For example, since deepseek will not show the think tag in the main message, we could have separate (optional) display information for the main message and the thinking message, parsed in the core (even better, in the LLM class, as that would better help encapsulation), not in the UI as it is now. A rough sketch follows below. What do you think? Anyway, I think the crucial updates should be done (for example those that support seeing thinking blocks and avoid errors with tools) because that would help the product succeed.
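A minimal sketch of what such a metadata object might look like (all names here are hypothetical, not part of this PR or the Continue codebase):

```typescript
// Hypothetical shape for the proposed assistant-message metadata.
// It only illustrates the idea described above.
interface AssistantMessageMetadata {
  // Provider-specific data needed to reconstruct the original message
  // on the next turn, e.g. Anthropic thinking blocks with signatures
  // or multiple redacted_thinking blocks.
  raw?: unknown;
  // Optional display hints, parsed in core / the LLM class rather than the UI.
  display?: {
    mainMessage?: { stripThinkTags?: boolean };
    thinkingMessage?: { visible?: boolean };
  };
}

interface AssistantMessageWithMetadata {
  role: "assistant";
  content: string;
  metadata?: AssistantMessageMetadata;
}
```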
Let me explain. The current process is this (simplified):

2.1. Messages are fed back to the LLM

Now we can move the 1.2-1.4 logic back to the LLM class this way (see the sketch after this comment): for example, in the case of Anthropic, a reduceChunks method would convert the message chunks and, in the metadata, combine them into the original format, containing thinking with signature, or multiple redacted_thinking blocks, so they can be easily converted back when sending a subsequent message. As the reduceChunks and streamChat methods would be in the very same class, not scattered across the codebase, one could easily see what to work with when converting back to message threads. Simply, the output of reduceChunks would be received by streamChat, unchanged. All this logic would live in the provider class, so all the LLM-specific logic would be part of the provider, which would help with encapsulation and locality of behavior (code proximity and behavioral cohesion). It would be easier to understand and easier to maintain. What do you think?
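A rough sketch of what that could look like on a provider class (hypothetical method signatures; `reduceChunks` does not exist in the codebase, and `ChatMessage` here is heavily simplified):

```typescript
// Hypothetical sketch of the proposed encapsulation: both directions of
// the conversion live on the provider class, so reduceChunks' output is
// exactly what streamChat later receives back, unchanged.
type ChatMessage = {
  role: "user" | "assistant";
  content: string;
  metadata?: unknown; // provider-specific, opaque to core and UI
};

abstract class BaseLLM {
  // Combine streamed chunks into one assistant message, stashing the
  // provider's original representation (e.g. thinking + signature,
  // redacted_thinking blocks) in metadata.
  abstract reduceChunks(chunks: ChatMessage[]): ChatMessage;

  // Send the thread back; the provider unpacks metadata into the exact
  // block format its API expects on subsequent turns.
  abstract streamChat(messages: ChatMessage[]): AsyncGenerator<ChatMessage>;
}
```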
I didn't realize someone else was already working on this. 😅 I've basically implemented the same functionality, with tool support and using the existing Reasoning UI: #4559 Although there's no support for Bedrock in mine. Also, I tried it, and you can swap between thinking models, unless you use tools; then, when you try to swap to DeepSeek, you get an error. Anyway, I'm happy with either or any solution being merged into Continue. 😄
@FallDownTheSystem Checked your solution. I also wanted to go down the same path of adding a new type of MessagePart, before I realized that it would require changes in a lot of seemingly irrelevant code. What do you think about a long-term solution to simplify things, as I explained above? Not sure if it was clear. For the short term, I think just one of the working solutions should be merged so that ordinary users can start to use the new Claude feature; they will not care as long as it's working well.
It does feel like some kind of refactoring would be in order. Currently the messages are converted into a suitable format for the UI and then converted back for the API. Ideally you'd be able to keep the original data intact and have LLM-specific methods that are responsible for converting it to a format suitable for the UI. I can see that the code base has gotten more complex as support for new formats has been added. I'm sure originally everything was pretty simple, when the output was just a string.
…s2' into anthropic_process_thinking_blocks2
@sestinj I corrected the issue of having action buttons below the user messages. Now it looks like this:

Btw I made it similar to the context items dropdown.

If we merge with the deepseek solution: personally, I like the "thinking indicator", and the button design is a bit more prominent. I think the text-box rendering is a bit more polished in the Anthropic thinking display. I would question why the seconds taken on thinking are important (I believe that's useful info for prompt tuning, not here). What I would be interested in is the number of tokens taken on thinking, even while in progress. So it would be something like this:

Btw the other PR contains everything done in this PR, and also the Bedrock implementation, so that should be merged instead of this one:
@sestinj I made more versions, please check the screenshots:

And the previous version can be enhanced with the "..." in-progress indicator.

Seconds can also be added easily if needed, right from within the component (with a simple benchmark at the frontend, which is the fallback in the component used for deepseek), although I think deepseek outputs this info from the model. Including tokens, as I suggested, is not as easy: importing the token counter causes some conflict, not sure why, but it could possibly be done another way.
…thinking state and elapsed time
Description
This modification allows users to see the human-readable parts of the thinking blocks in the LLM output.
This simple change is a first step on the way to a full implementation of handling thinking blocks.
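For context, this is roughly what consuming Anthropic's streaming thinking deltas looks like with the official `@anthropic-ai/sdk` (a standalone sketch, not this PR's actual code; the model name and prompt are placeholders):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const stream = await client.messages.create({
  model: "claude-3-7-sonnet-20250219",
  max_tokens: 4096,
  thinking: { type: "enabled", budget_tokens: 2048 },
  messages: [{ role: "user", content: "What is 27 * 453?" }],
  stream: true,
});

for await (const event of stream) {
  if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      // Human-readable thinking tokens, streamed incrementally.
      process.stdout.write(event.delta.thinking);
    } else if (event.delta.type === "text_delta") {
      // The regular assistant answer.
      process.stdout.write(event.delta.text);
    }
    // signature_delta events also arrive here; the signature must be kept
    // and sent back with the thinking block on later turns.
  }
}
```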
Checklist
Screenshots

Testing instructions
Thinking can be enabled in config this way:
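A sketch of what the `config.json` model entry could look like (the option names here follow the discussion in this thread and may differ from the merged implementation):

```json
{
  "title": "Claude 3.7 Sonnet",
  "provider": "anthropic",
  "model": "claude-3-7-sonnet-20250219",
  "apiKey": "<ANTHROPIC_API_KEY>",
  "contextLength": 200000,
  "completionOptions": {
    "maxTokens": 8192,
    "reasoning": true,
    "reasoningBudgetTokens": 2048
  }
}
```

Note the ordering constraint mentioned earlier: `contextLength` > `maxTokens` > the thinking budget, with the budget at least 1024.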
Watch the Continue - LLM Prompt/Completion output tab.
Next steps
The thinking mode doesn't currently work with tool use. The software should add the `thinking` and `redacted_thinking` blocks to the conversation history when sending subsequent requests. Tool use requires this; with no tool use, any thinking blocks are simply ignored.

The human-readable thinking blocks could be added to the UI in the form of collapsible boxes. This is an example from TypingMind:
Edit
Handling of `thinking` and `redacted_thinking` blocks has been added. Due to these changes, the thinking blocks have also become visible. This may require some visual polish, but it's now accessible and functional, and we are closer to a final solution where users can read the thinking blocks if they want.
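For reference, these are the content block shapes the Anthropic API returns and expects back unchanged in the conversation history (field values abbreviated; the actual strings are much longer):

```json
[
  {
    "type": "thinking",
    "thinking": "Let me work through this step by step...",
    "signature": "EuYBCkQYAiJA..."
  },
  {
    "type": "redacted_thinking",
    "data": "EmwKAhgBEgy..."
  }
]
```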
Additional testing instructions
The changes are working if you can call tools with no problem.
Redacted thinking can be tested by adding this string in your first prompt:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
Edit 2
I have added rendering of thinking blocks as shown in the screenshots below: