
Handle extended thinking with Anthropic provider, process streaming thinking blocks and show them in the output tab #4426

Merged

Conversation

ferenci84
Contributor

@ferenci84 ferenci84 commented Mar 1, 2025

Description

This modification allows users to see the human-readable parts of the thinking blocks in the LLM output.

This simple change is a first step toward a full implementation of handling thinking blocks.

Checklist

  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screenshots

[ For visual changes, include screenshots. ]
Screenshot 2025-03-01 at 14 18 00

Testing instructions

Thinking can be enabled in config this way:

   {
      "title": "Anthropic - Claude 3.7 Sonnet (thinking)",
      "model": "claude-3-7-sonnet-latest",
      "provider": "anthropic",
      "apiKey": "xxx",
      "completionOptions": {
        "temperature": 1,
        "topP": 0.999,
        "maxTokens": 8096
      },
      "requestOptions": {
        "extraBodyProperties": {
          "thinking": {
            "type": "enabled",
            "budget_tokens": 2048
          }
        }
      },
      "cacheBehavior": {
        "cacheConversation": true,
        "cacheSystemMessage": true
      }
    },

Watch the Continue - LLM Prompt/Completion output tab.

Next steps

  1. The thinking mode doesn't currently work with tool use. The software should add the thinking and redacted_thinking blocks to the conversation history when sending subsequent requests. Tool use requires this; without tool use, any thinking blocks are simply ignored. (See the sketch after this list.)

  2. Human-readable thinking blocks could be added to the UI in the form of collapsible boxes. This is an example from TypingMind:

Screenshot 2025-03-01 at 14 29 38
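
For step 1, a rough sketch of how an assistant turn could be rebuilt before the next request (the block field names follow Anthropic's thinking and redacted_thinking content blocks; the ThinkingChatMessage shape here is illustrative, not the final code):

```typescript
// Hypothetical helper: rebuild an assistant turn's content blocks so that
// thinking / redacted_thinking blocks are echoed back before the text (and any
// tool_use blocks). Block field names follow the Anthropic Messages API.
interface ThinkingChatMessage {
  role: "thinking";
  content: string;            // human-readable thinking text
  signature?: string;         // returned by the API, must be sent back verbatim
  redactedThinking?: string;  // opaque data from a redacted_thinking block
}

function toAnthropicAssistantContent(
  thinking: ThinkingChatMessage | undefined,
  text: string,
): object[] {
  const blocks: object[] = [];
  if (thinking?.redactedThinking) {
    blocks.push({ type: "redacted_thinking", data: thinking.redactedThinking });
  } else if (thinking?.content) {
    blocks.push({
      type: "thinking",
      thinking: thinking.content,
      signature: thinking.signature,
    });
  }
  blocks.push({ type: "text", text });
  return blocks;
}
```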

Edit

Handling of thinking blocks and redacted_thinking blocks has been added. As a result of these changes, the thinking blocks have also become visible. This may require some visual polish, but it's now accessible and functional, and we are closer to a final solution where users can read the thinking blocks if they want.

Additional testing instructions

The changes are working if you can call tools without problems.

Redacted thinking can be tested by adding this string in your first prompt:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

Edit 2

I have added rendering of thinking blocks as shown in the screenshots below:

Screenshot 2025-03-05 at 11 42 56 Screenshot 2025-03-05 at 11 43 05 Screenshot 2025-03-05 at 11 42 33


netlify bot commented Mar 1, 2025

Deploy Preview for continuedev canceled.

🔨 Latest commit: e9a3f97
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67d0a6f312e47300071990df

@ferenci84 ferenci84 changed the title add ThinkingChatMessage interface and handle thinking chunks in BaseLLM and Anthropic classes Handle extended thinking with Anthropic provider, accept streaming messages and show thinking blocks in the output tab Mar 1, 2025
@ferenci84 ferenci84 changed the title Handle extended thinking with Anthropic provider, accept streaming messages and show thinking blocks in the output tab Handle extended thinking with Anthropic provider, process streaming thinking blocks and show them in the output tab Mar 1, 2025
@chezsmithy
Contributor

chezsmithy commented Mar 2, 2025

Would it be possible to add the same support to Bedrock.ts, now that I have enabled tools support and have a separate PR to enable Claude 3.7 on Bedrock as well?

@ferenci84
Contributor Author

> Would it be possible to add the same support to Bedrock.ts, now that I have enabled tools support and have a separate PR to enable Claude 3.7 on Bedrock as well?

@chezsmithy I have just sent a PR that does the same with Bedrock: #4444

@Symfomany

Deep Research would be a killer feature in this coding assistant, Continue. Would it be possible to support a Perplexity API key too?

@ferenci84
Contributor Author

@RomneyDa You are the assignee of issue #4339, and this PR would resolve it. Can you please review?

@SivasankarMathiyazhagan

@ferenci84 - is there any calculation behind setting the budget tokens to "budget_tokens": 2048? Could you please suggest how we can determine the appropriate budget token limit for our use case?

@ferenci84 ferenci84 mentioned this pull request Mar 7, 2025
@ferenci84
Contributor Author

> @ferenci84 - is there any calculation behind setting the budget tokens to "budget_tokens": 2048? Could you please suggest how we can determine the appropriate budget token limit for our use case?

You need to make sure that contextLength is larger than maxTokens, maxTokens is larger than budgetTokens, and budgetTokens is at least 1024. This benchmark uses 3200 as the thinking budget: https://bigcode-bench.github.io/
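
To illustrate those constraints, a minimal sketch (the helper is hypothetical, and the 200k-token context window for Claude 3.7 Sonnet is an assumption, not part of this PR):

```typescript
// Hypothetical sanity check for the constraints described above:
// contextLength > maxTokens > budgetTokens, and budgetTokens >= 1024.
function validateThinkingBudget(
  contextLength: number,
  maxTokens: number,
  budgetTokens: number,
): void {
  if (budgetTokens < 1024) {
    throw new Error("budget_tokens must be at least 1024");
  }
  if (maxTokens <= budgetTokens) {
    throw new Error("maxTokens must be larger than budget_tokens");
  }
  if (contextLength <= maxTokens) {
    throw new Error("contextLength must be larger than maxTokens");
  }
}

// Values from the example config: maxTokens 8096, budget_tokens 2048,
// with a 200k-token context window assumed for Claude 3.7 Sonnet.
validateThinkingBudget(200_000, 8096, 2048);
```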

@sestinj
Contributor

sestinj commented Mar 10, 2025

@ferenci84 this looks really good, type updates and all. Two ways I think it could be improved:

  • In this PR (3832: DeepSeek-R1 reasoning display can be toggled #3840) we have a separate UI for displaying thinking tokens. I think we should keep the same UI for both for consistency.
  • As shown in this screenshot, the toolbar for [trash, copy, thumbs up, thumbs down] shows up between the input and the thinking toggle, which causes some awkward blank space:
Screenshot 2025-03-09 at 8 38 43 PM

Both of these thoughts lead to a bigger question of how we should combine the previous strategy for handling thinking tokens in Ollama (DeepSeek literally just outputs <think> tags in its output) with this one. It seems to me that parsing the tags in streamChat in llm/llms/index.ts would be the best option.
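
A rough sketch of what that tag parsing could look like (simplified; a real implementation would also need to cover text that legitimately contains "<" and tags split across chunk boundaries):

```typescript
// Simplified splitter for providers that emit <think>...</think> inline
// (DeepSeek-style). Yields thinking and text fragments as they stream in.
// Note: anything containing "<" is held back until a tag completes or the
// stream ends, which is the crude way this sketch buffers partial tags.
function* splitThinkTags(
  chunks: Iterable<string>,
): Generator<{ type: "thinking" | "text"; content: string }> {
  let inThink = false;
  let buffer = "";

  for (const chunk of chunks) {
    buffer += chunk;
    while (true) {
      const tag = inThink ? "</think>" : "<think>";
      const idx = buffer.indexOf(tag);
      if (idx === -1) break;
      const before = buffer.slice(0, idx);
      if (before) {
        yield { type: inThink ? "thinking" : "text", content: before };
      }
      buffer = buffer.slice(idx + tag.length);
      inThink = !inThink;
    }
    // Flush what we can, but keep anything that might be the start of a tag.
    if (buffer && !buffer.includes("<")) {
      yield { type: inThink ? "thinking" : "text", content: buffer };
      buffer = "";
    }
  }
  if (buffer) {
    yield { type: inThink ? "thinking" : "text", content: buffer };
  }
}

// Example: splitThinkTags(["<think>plan the fix</think>", "Here is the patch."])
// yields a "thinking" fragment followed by a "text" fragment.
```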

@ferenci84
Contributor Author

Yes, the big difference between Claude and DeepSeek is that with Anthropic the thinking is a separate message (even multiple messages in the case of redacted thinking), and this should be fed back to the AI in the same way on the next turn (which is required for tool use, otherwise those blocks are just ignored). I was thinking about combining the strategies, but I thought it would be overly complicated due to the big differences. As for the UI change, it would be one more week until I could update, but why not let other users start sooner with it as it is right now?

@ferenci84
Contributor Author

The buttons between the first message and the thinking block seem to be a mistake, as they are not present in other cases.

@ferenci84
Contributor Author

Thinking further, there is one corner case in which a shared implementation would be helpful: let's say you start the conversation with Anthropic and then switch to DeepSeek. Not sure how common that is.

@ferenci84
Contributor Author

My idea is to have a metadata object with a dynamic data structure for assistant messages, in which thinking data could be added, along with other possibly complex data that helps us reconstruct the messages. This could even be of any type, so that any additional data, such as the signature or multiple redacted thinking blocks that shouldn't be displayed, could be stored. In addition, we could add an optional "display" key that would contain data about how to display the message. For example, since DeepSeek should not show the <think> tag in the main message, we could have separate (optional) display information for the main message and the thinking message, parsed in the core (even better, in the LLM class, as that would help encapsulation), not in the UI as it is now. What do you think?

Anyway, I think the crucial updates should be done (for example those that make thinking blocks visible and avoid errors with tools), because that would help the product succeed.

@ferenci84
Contributor Author

ferenci84 commented Mar 10, 2025

Let me explain:

The current process is this (simplified):
1.1. Chunks are produced in the LLM class.
1.2. Somewhere in the code, relevant chunks are caught and Message objects (actually message fragments) are yielded.
1.3. Somewhere in the code (this is already in the frontend code), Message fragment objects are caught and those fragments are combined into single messages.
1.4. At this point it's also decided how the message will be displayed (e.g. <think> tags are removed from the message and saved under a separate key of the message object).

2.1. Messages are fed back to the LLM.
2.2. Messages are converted back into a format the LLM can understand.

Now we can move the 1.2 - 1.4 logic back to the LLM class this way:
We would create a virtual method (that can be overridden in subclasses) that does 1.3; let's name it "reduceChunks", as it works similarly to a reducer and produces single messages out of separate chunks. In the same method, logic can be added that saves the information from 1.4; for example, there would be two additional keys: displayContent and displayThinking.

For example, in the case of Anthropic, the reduceChunks method would convert the message chunks and, in the metadata, combine them into the original format, containing thinking with signature or multiple redacted_thinking blocks, which can be easily converted back when sending a subsequent message. As the reduceChunks and streamChat methods would be in the very same class, not scattered across the codebase, one could easily see what to work with when converting back to message threads. Simply put, the output of reduceChunks would be received by streamChat, unchanged.
In addition to saving metadata, reduceChunks would also save display data. For example, if we had additional (and optional) display keys (for example display?: {content?: string; reasoning?: { content?: string, start?: Date, end?: Date, redacted?: boolean}}), Anthropic would save the thinking blocks as display.reasoning.content and the message content as display.content, while DeepSeek would recognize the <think> tags and produce three contents: the whole message saved in the key content (so it's easy to feed back to the LLM), the content of the <think> tags saved in display.reasoning.content, and the content with the tags stripped out saved in display.content.

All of this logic would be in the provider class, so all this LLM-specific logic would be part of the provider, which would help with encapsulation and locality of behavior (code proximity and behavioral cohesion). It would be easier to understand and easier to maintain. What do you think?
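
A sketch of the shape this could take (reduceChunks, the metadata field, and the display keys are the proposal above, not existing Continue code):

```typescript
// Sketch of the proposed message shape and provider hook; everything here is
// the suggestion above, not existing Continue code.
interface DisplayInfo {
  content?: string; // what the UI renders as the main message body
  reasoning?: {
    content?: string; // human-readable thinking text
    start?: Date;
    end?: Date;
    redacted?: boolean;
  };
}

interface AssistantMessage {
  role: "assistant";
  content: string;       // exactly what is fed back to the LLM on the next turn
  metadata?: unknown;     // provider-specific data kept intact, e.g. thinking with
                          // signature or multiple redacted_thinking blocks
  display?: DisplayInfo;  // optional, provider-computed presentation data
}

abstract class BaseLLM {
  // Each provider subclass would override this to combine its streamed chunks
  // into one message, filling metadata and display (steps 1.3 and 1.4 above).
  protected reduceChunks(chunks: string[]): AssistantMessage {
    return { role: "assistant", content: chunks.join("") };
  }
}
```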

@FallDownTheSystem
Contributor

I didn't realize someone else was already working on this. 😅

I've basically implemented the same functionality, with tool support and using the existing Reasoning UI: #4559

Although there's no support for Bedrock in mine. I also tried it: you can swap between thinking models, unless you use tools; in that case, when you try to swap to DeepSeek, you get an error.

Anyway, I'm happy with either or any solution being merged into Continue. 😄

@ferenci84
Contributor Author

@FallDownTheSystem I checked your solution. I also wanted to go down the same path of adding a new type of MessagePart, before I realized that it would require changes in a lot of seemingly irrelevant code. What do you think about a long-term solution to simplify things, as I explained above? Not sure if it was clear. For the short term, I think just one of the working solutions should be merged so that ordinary users can start using the new Claude feature; they won't care as long as it works well.

@FallDownTheSystem
Contributor

> @FallDownTheSystem I checked your solution. I also wanted to go down the same path of adding a new type of MessagePart, before I realized that it would require changes in a lot of seemingly irrelevant code. What do you think about a long-term solution to simplify things, as I explained above? Not sure if it was clear. For the short term, I think just one of the working solutions should be merged so that ordinary users can start using the new Claude feature; they won't care as long as it works well.

It does feel like some kind of refactoring would be in order. Currently the messages are converted into a suitable format for the UI and then converted back for the API. Ideally you'd be able to keep the original data intact and have LLM-specific methods that are responsible for converting it to a format that's suitable for the UI.

I can see that the codebase has gotten more complex as support for new formats has been added. I'm sure everything was originally pretty simple when the output was just a string.

@ferenci84
Contributor Author

@sestinj I corrected the issue of having action buttons below the user messages. Now it looks like this:

Screenshot 2025-03-10 at 14 06 55

Btw I made it similar to the context items dropdown.

If we merge with the DeepSeek solution:
Screenshot 2025-03-10 at 14 10 19

Personally I like the "thinking indicator", and the button design is a bit more prominent. I think the text-box rendering is a bit more polished in the Anthropic thinking display.

I would raise the question of why the seconds spent on thinking matter (I believe that's useful info for prompt tuning, not here). What I would be interested in is the number of tokens spent on thinking, even while in progress. So it would be something like this:
Thinking... (1526 tokens)
And when finished:
Reasoning (2500 tokens)

Btw the other PR contains everything done in this PR, and also the Bedrock implementation, so that should be merged instead of this one:
#4453

@ferenci84
Contributor Author

@sestinj I made more versions, please check screenshots:

Screenshot 2025-03-10 at 18 06 37 Screenshot 2025-03-10 at 18 06 53 Screenshot 2025-03-10 at 18 06 28

And the previous version can be enhanced with the ... in-progress indicator:
Screenshot 2025-03-10 at 18 05 24

Seconds can also be added easily if needed, right from within the component (with a simple timing measurement in the frontend, which is the fallback used in the component for DeepSeek), although I think DeepSeek outputs this info from the model.

Including the token count, as I suggested, is not as easy: importing the token counter causes some conflict (not sure why), but it could possibly be done another way.

@ferenci84
Contributor Author

ferenci84 commented Mar 11, 2025

@sestinj I have changed it to a button display and also added seconds:
Screenshot 2025-03-11 at 22 19 51

It looks like this when the seconds value is not available:
Screenshot 2025-03-11 at 22 26 06

When opened:
Screenshot 2025-03-11 at 22 23 22

It's also part of the PR with Bedrock support: #4453

@sestinj sestinj merged commit 7949f9a into continuedev:main Mar 16, 2025
29 checks passed