
Handle extended thinking with Anthropic provider, process streaming thinking blocks and show them in the output tab #4426

Merged

Conversation

ferenci84
Contributor

@ferenci84 ferenci84 commented Mar 1, 2025

Description

This modification allows users to see the human-readable parts of the thinking blocks in the LLM output.

This simple change is a first step toward a full implementation of handling thinking blocks.

Checklist

  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screenshots

[ For visual changes, include screenshots. ]
Screenshot 2025-03-01 at 14 18 00

Testing instructions

Thinking can be enabled in config this way:

   {
      "title": "Anthropic - Claude 3.7 Sonnet (thinking)",
      "model": "claude-3-7-sonnet-latest",
      "provider": "anthropic",
      "apiKey": "xxx",
      "completionOptions": {
        "temperature": 1,
        "topP": 0.999,
        "maxTokens": 8096
      },
      "requestOptions": {
        "extraBodyProperties": {
          "thinking": {
            "type": "enabled",
            "budget_tokens": 2048
          }
        }
      },
      "cacheBehavior": {
        "cacheConversation": true,
        "cacheSystemMessage": true
      }
    },

Watch the Continue - LLM Prompt/Completion output tab.

Next steps

  1. The thinking mode doesn't currently work with tool use. The software should add the thinking and redacted_thinking blocks to the conversation history when sending subsequent requests. Tool use requires this; without tool use, any thinking blocks are simply ignored. (See the sketch after this list.)

  2. Human-readable thinking blocks could be added to the UI in the form of collapsible boxes. This is an example from TypingMind:

Screenshot 2025-03-01 at 14 29 38
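
For step 1, a rough sketch of how an assistant turn could be rebuilt before the next request (the block field names follow Anthropic's thinking and redacted_thinking content blocks; the ThinkingChatMessage shape here is illustrative, not the final code):

```typescript
// Hypothetical helper: rebuild an assistant turn's content blocks so that
// thinking / redacted_thinking blocks are echoed back before the text (and any
// tool_use blocks). Block field names follow the Anthropic Messages API.
interface ThinkingChatMessage {
  role: "thinking";
  content: string;            // human-readable thinking text
  signature?: string;         // returned by the API, must be sent back verbatim
  redactedThinking?: string;  // opaque data from a redacted_thinking block
}

function toAnthropicAssistantContent(
  thinking: ThinkingChatMessage | undefined,
  text: string,
): object[] {
  const blocks: object[] = [];
  if (thinking?.redactedThinking) {
    blocks.push({ type: "redacted_thinking", data: thinking.redactedThinking });
  } else if (thinking?.content) {
    blocks.push({
      type: "thinking",
      thinking: thinking.content,
      signature: thinking.signature,
    });
  }
  blocks.push({ type: "text", text });
  return blocks;
}
```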

Edit

Handling of thinking blocks and redacted_thinking blocks has been added. As a result of these changes, the thinking blocks have also become visible. This may require some visual polish, but it's now accessible and functional, and we are closer to a final solution where users can read the thinking blocks if they want.

Additional testing instructions

The changes are working if you can call tools without problems.

Redacted thinking can be tested by adding this string in your first prompt:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

Edit 2

I have added rendering of thinking blocks as shown in the screenshots below:

Screenshot 2025-03-05 at 11 42 56 Screenshot 2025-03-05 at 11 43 05 Screenshot 2025-03-05 at 11 42 33


netlify bot commented Mar 1, 2025

Deploy Preview for continuedev canceled.

🔨 Latest commit: e9a3f97
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67d0a6f312e47300071990df

@ferenci84 ferenci84 changed the title add ThinkingChatMessage interface and handle thinking chunks in BaseLLM and Anthropic classes Handle extended thinking with Anthropic provider, accept streaming messages and show thinking blocks in the output tab Mar 1, 2025
@ferenci84 ferenci84 changed the title Handle extended thinking with Anthropic provider, accept streaming messages and show thinking blocks in the output tab Handle extended thinking with Anthropic provider, process streaming thinking blocks and show them in the output tab Mar 1, 2025
@chezsmithy
Contributor

chezsmithy commented Mar 2, 2025

Would it be possible to add the same support to Bedrock.ts, now that I have enabled tools support and have a separate PR to enable Claude 3.7 on Bedrock as well?

@ferenci84
Contributor Author

> Would it be possible to add the same support to Bedrock.ts, now that I have enabled tools support and have a separate PR to enable Claude 3.7 on Bedrock as well?

@chezsmithy I have just sent a PR that does the same with Bedrock: #4444

@Symfomany

Deep Research would be a killer feature in this coding assistant, Continue. Would it be possible to support a Perplexity API key too?

@ferenci84
Contributor Author

@RomneyDa You are the assignee of issue #4339, and this PR would resolve it. Can you please review?

@SivasankarMathiyazhagan

@ferenci84 - is there any calculation behind setting the budget tokens to "budget_tokens": 2048? Could you please suggest how we can determine the appropriate budget token limit for our use case?

@ferenci84 ferenci84 mentioned this pull request Mar 7, 2025
@ferenci84
Contributor Author

> @ferenci84 - is there any calculation behind setting the budget tokens to "budget_tokens": 2048? Could you please suggest how we can determine the appropriate budget token limit for our use case?

You need to make sure that contextLength is larger than maxTokens, maxTokens is larger than budgetTokens, and budgetTokens is at least 1024. This benchmark uses 3200 as the thinking budget: https://bigcode-bench.github.io/
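
To illustrate those constraints, a minimal sketch (the helper is hypothetical, and the 200k-token context window for Claude 3.7 Sonnet is an assumption, not part of this PR):

```typescript
// Hypothetical sanity check for the constraints described above:
// contextLength > maxTokens > budgetTokens, and budgetTokens >= 1024.
function validateThinkingBudget(
  contextLength: number,
  maxTokens: number,
  budgetTokens: number,
): void {
  if (budgetTokens < 1024) {
    throw new Error("budget_tokens must be at least 1024");
  }
  if (maxTokens <= budgetTokens) {
    throw new Error("maxTokens must be larger than budget_tokens");
  }
  if (contextLength <= maxTokens) {
    throw new Error("contextLength must be larger than maxTokens");
  }
}

// Values from the example config: maxTokens 8096, budget_tokens 2048,
// with a 200k-token context window assumed for Claude 3.7 Sonnet.
validateThinkingBudget(200_000, 8096, 2048);
```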

@sestinj
Contributor

sestinj commented Mar 10, 2025

@ferenci84 this looks really good, type updates and all. Two ways I think it could be improved:

  • In this PR (3832: DeepSeek-R1 reasoning display can be toggled #3840) we have a separate UI for displaying thinking tokens. I think we should keep the same UI for both for consistency.
  • As shown in this screenshot, the toolbar for [trash, copy, thumbs up, thumbs down] shows up between the input and the thinking toggle, which causes some awkward blank space:
Screenshot 2025-03-09 at 8 38 43 PM

Both of these thoughts lead to a bigger question of how we should combine the previous strategy for handling thinking tokens in Ollama (DeepSeek literally just outputs <think> tags in its output) with this one. It seems to me that parsing the tags in streamChat in llm/llms/index.ts would be the best option.
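
A rough sketch of what that tag parsing could look like (simplified; a real implementation would also need to cover text that legitimately contains "<" and tags split across chunk boundaries):

```typescript
// Simplified splitter for providers that emit <think>...</think> inline
// (DeepSeek-style). Yields thinking and text fragments as they stream in.
// Note: anything containing "<" is held back until a tag completes or the
// stream ends, which is the crude way this sketch buffers partial tags.
function* splitThinkTags(
  chunks: Iterable<string>,
): Generator<{ type: "thinking" | "text"; content: string }> {
  let inThink = false;
  let buffer = "";

  for (const chunk of chunks) {
    buffer += chunk;
    while (true) {
      const tag = inThink ? "</think>" : "<think>";
      const idx = buffer.indexOf(tag);
      if (idx === -1) break;
      const before = buffer.slice(0, idx);
      if (before) {
        yield { type: inThink ? "thinking" : "text", content: before };
      }
      buffer = buffer.slice(idx + tag.length);
      inThink = !inThink;
    }
    // Flush what we can, but keep anything that might be the start of a tag.
    if (buffer && !buffer.includes("<")) {
      yield { type: inThink ? "thinking" : "text", content: buffer };
      buffer = "";
    }
  }
  if (buffer) {
    yield { type: inThink ? "thinking" : "text", content: buffer };
  }
}

// Example: splitThinkTags(["<think>plan the fix</think>", "Here is the patch."])
// yields a "thinking" fragment followed by a "text" fragment.
```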

@ferenci84
Contributor Author

Yes, the big difference between Claude and DeepSeek is that with Anthropic the thinking is a separate message (even multiple messages in the case of redacted thinking), and this should be fed back to the AI in the same way on the next turn (which is required for tool use, otherwise those blocks are just ignored). I was thinking about combining the strategies, but I thought it would be overly complicated due to the big differences. As for the UI change, it would be one more week until I could update, but why not let other users start sooner with it as it is right now?

@ferenci84
Contributor Author

The buttons between the first message and the thinking block seem to be a mistake, as they are not present in other cases.

@ferenci84
Contributor Author

Thinking further, there is one corner case in which a shared implementation would be helpful: let's say you start the conversation with Anthropic and then switch to DeepSeek. Not sure how common that is.

@ferenci84
Contributor Author

My idea is to have a metadata object with a dynamic data structure for assistant messages, in which thinking data could be added, along with other possibly complex data that helps us reconstruct the messages. This could even be of any type, so that any additional data, such as the signature or multiple redacted thinking blocks that shouldn't be displayed, could be stored. In addition, we could add an optional "display" key that would contain data about how to display the message. For example, since DeepSeek should not show the <think> tag in the main message, we could have separate (optional) display information for the main message and the thinking message, parsed in the core (even better, in the LLM class, as that would help encapsulation), not in the UI as it is now. What do you think?

Anyway, I think the crucial updates should be done (for example those that make thinking blocks visible and avoid errors with tools), because that would help the product succeed.

@ferenci84
Contributor Author

ferenci84 commented Mar 10, 2025

Let me explain:

The current process is this (simplified):
1.1. Chunks are produced in the LLM class.
1.2. Somewhere in the code, relevant chunks are caught and Message objects (actually message fragments) are yielded.
1.3. Somewhere in the code (this is already in the frontend code), Message fragment objects are caught and those fragments are combined into single messages.
1.4. At this point it's also decided how the message will be displayed (e.g. <think> tags are removed from the message and saved under a separate key of the message object).

2.1. Messages are fed back to the LLM.
2.2. Messages are converted back into a format the LLM can understand.

Now we can move the 1.2 - 1.4 logic back to the LLM class this way:
We would create a virtual method (that can be overridden in subclasses) that does 1.3; let's name it "reduceChunks", as it works similarly to a reducer and produces single messages out of separate chunks. In the same method, logic can be added that saves the information from 1.4; for example, there would be two additional keys: displayContent and displayThinking.

For example, in the case of Anthropic, the reduceChunks method would convert the message chunks and, in the metadata, combine them into the original format, containing thinking with signature or multiple redacted_thinking blocks, which can be easily converted back when sending a subsequent message. As the reduceChunks and streamChat methods would be in the very same class, not scattered across the codebase, one could easily see what to work with when converting back to message threads. Simply put, the output of reduceChunks would be received by streamChat, unchanged.
In addition to saving metadata, reduceChunks would also save display data. For example, if we had additional (and optional) display keys (for example display?: {content?: string; reasoning?: { content?: string, start?: Date, end?: Date, redacted?: boolean}}), Anthropic would save the thinking blocks as display.reasoning.content and the message content as display.content, while DeepSeek would recognize the <think> tags and produce three contents: the whole message saved in the key content (so it's easy to feed back to the LLM), the content of the <think> tags saved in display.reasoning.content, and the content with the tags stripped out saved in display.content.

All of this logic would be in the provider class, so all this LLM-specific logic would be part of the provider, which would help with encapsulation and locality of behavior (code proximity and behavioral cohesion). It would be easier to understand and easier to maintain. What do you think?
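
A sketch of the shape this could take (reduceChunks, the metadata field, and the display keys are the proposal above, not existing Continue code):

```typescript
// Sketch of the proposed message shape and provider hook; everything here is
// the suggestion above, not existing Continue code.
interface DisplayInfo {
  content?: string; // what the UI renders as the main message body
  reasoning?: {
    content?: string; // human-readable thinking text
    start?: Date;
    end?: Date;
    redacted?: boolean;
  };
}

interface AssistantMessage {
  role: "assistant";
  content: string;       // exactly what is fed back to the LLM on the next turn
  metadata?: unknown;     // provider-specific data kept intact, e.g. thinking with
                          // signature or multiple redacted_thinking blocks
  display?: DisplayInfo;  // optional, provider-computed presentation data
}

abstract class BaseLLM {
  // Each provider subclass would override this to combine its streamed chunks
  // into one message, filling metadata and display (steps 1.3 and 1.4 above).
  protected reduceChunks(chunks: string[]): AssistantMessage {
    return { role: "assistant", content: chunks.join("") };
  }
}
```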

@FallDownTheSystem
Contributor

I didn't realize someone else was already working on this. 😅

I've basically implemented the same functionality, with tool support and using the existing Reasoning UI: #4559

Although there's no support for Bedrock in mine. I also tried it: you can swap between thinking models, unless you use tools; in that case, when you try to swap to DeepSeek, you get an error.

Anyway, I'm happy with either or any solution being merged into Continue. 😄

@ferenci84
Contributor Author

@FallDownTheSystem I checked your solution. I also wanted to go down the same path of adding a new type of MessagePart, before I realized that it would require changes in a lot of seemingly irrelevant code. What do you think about a long-term solution to simplify things, as I explained above? Not sure if it was clear. For the short term, I think just one of the working solutions should be merged so that ordinary users can start using the new Claude feature; they won't care as long as it works well.

@FallDownTheSystem
Contributor

> @FallDownTheSystem I checked your solution. I also wanted to go down the same path of adding a new type of MessagePart, before I realized that it would require changes in a lot of seemingly irrelevant code. What do you think about a long-term solution to simplify things, as I explained above? Not sure if it was clear. For the short term, I think just one of the working solutions should be merged so that ordinary users can start using the new Claude feature; they won't care as long as it works well.

It does feel like some kind of refactoring would be in order. Currently the messages are converted into a suitable format for the UI and then converted back for the API. Ideally you'd be able to keep the original data intact and have LLM-specific methods that are responsible for converting it to a format that's suitable for the UI.

I can see that the codebase has gotten more complex as support for new formats has been added. I'm sure everything was originally pretty simple when the output was just a string.

@ferenci84
Contributor Author

@sestinj I corrected the issue of having action buttons below the user messages. Now it looks like this:

Screenshot 2025-03-10 at 14 06 55

Btw I made it similar to the context items dropdown.

If we merge with the DeepSeek solution:
Screenshot 2025-03-10 at 14 10 19

Personally I like the "thinking indicator", and the button design is a bit more prominent. I think the text-box rendering is a bit more polished in the Anthropic thinking display.

I would raise the question of why the seconds spent on thinking matter (I believe that's useful info for prompt tuning, not here). What I would be interested in is the number of tokens spent on thinking, even while in progress. So it would be something like this:
Thinking... (1526 tokens)
And when finished:
Reasoning (2500 tokens)

Btw the other PR contains everything done in this PR, and also the Bedrock implementation, so that should be merged instead of this one:
#4453

@ferenci84
Contributor Author

@sestinj I made more versions, please check screenshots:

Screenshot 2025-03-10 at 18 06 37 Screenshot 2025-03-10 at 18 06 53 Screenshot 2025-03-10 at 18 06 28

And the previous version can be enhanced with the ... in-progress indicator:
Screenshot 2025-03-10 at 18 05 24

Seconds can also be added easily if needed, right from within the component (with a simple timing measurement in the frontend, which is the fallback used in the component for DeepSeek), although I think DeepSeek outputs this info from the model.

Including the token count, as I suggested, is not as easy: importing the token counter causes some conflict (not sure why), but it could possibly be done another way.

@ferenci84
Contributor Author

ferenci84 commented Mar 11, 2025

@sestinj I have changed it to a button display and also added seconds:
Screenshot 2025-03-11 at 22 19 51

It looks like this when the seconds value is not available:
Screenshot 2025-03-11 at 22 26 06

When opened:
Screenshot 2025-03-11 at 22 23 22

It's also part of the PR with Bedrock support: #4453

@sestinj sestinj merged commit 7949f9a into continuedev:main Mar 16, 2025
29 checks passed