Commit b414e39

docs[patch]: Add logprob docs, more updates (#5404)
* Add logprob docs, more updates
* More content updates
1 parent caa35d4 commit b414e39

9 files changed: +555 -19 lines changed

docs/core_docs/docs/how_to/chat_model_caching.mdx (+16 -5)

@@ -4,15 +4,20 @@ sidebar_position: 3
 
 # How to cache chat model responses
 
+:::info Prerequisites
+
+This guide assumes familiarity with the following concepts:
+
+- [Chat models](/docs/concepts/#chat-models)
+- [LLMs](/docs/concepts/#llms)
+
+:::
+
 LangChain provides an optional caching layer for chat models. This is useful for two reasons:
 
 It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
 It can speed up your application by reducing the number of API calls you make to the LLM provider.
 
-import UnifiedModelParamsTooltip from "@mdx_components/unified_model_params_tooltip.mdx";
-
-<UnifiedModelParamsTooltip></UnifiedModelParamsTooltip>
-
 import CodeBlock from "@theme/CodeBlock";
 
 ```typescript
@@ -84,7 +89,7 @@ LangChain also provides a Redis-based cache. This is useful if you want to share
 To use it, you'll need to install the `redis` package:
 
 ```bash npm2yarn
-npm install ioredis
+npm install ioredis @langchain/community
 ```
 
 Then, you can pass a `cache` option when you instantiate the LLM. For example:
@@ -105,3 +110,9 @@ By default the cache is stored a temporary directory, but you can specify a cust
 ```typescript
 const cache = await LocalFileCache.create();
 ```
+
+## Next steps
+
+You've now learned how to cache model responses to save time and money.
+
+Next, check out the other how-to guides on chat models, like [how to get a model to return structured output](/docs/how_to/structured_output) or [how to create your own custom chat model](/docs/how_to/custom_chat_model).
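For reference, the in-memory caching behavior this guide describes looks roughly like the sketch below. The model name, the `cache: true` shorthand, and the Redis import paths in the comments are my assumptions for illustration, not code from this commit (the guide pulls its real examples from `@examples` snippets).

```typescript
import { ChatOpenAI } from "@langchain/openai";

// Passing `cache: true` enables the default in-memory cache: repeating the
// exact same prompt returns the stored generation instead of calling the API.
const model = new ChatOpenAI({ model: "gpt-3.5-turbo", cache: true });

const first = await model.invoke("Tell me a joke");
const second = await model.invoke("Tell me a joke"); // served from the cache
console.log(first.content, second.content);

// For the Redis-backed cache (hence the added `@langchain/community`
// dependency above), a cache instance is passed instead. Import paths here
// are my assumption of the current entrypoints:
// import { Redis } from "ioredis";
// import { RedisCache } from "@langchain/community/caches/ioredis";
// const cachedModel = new ChatOpenAI({ cache: new RedisCache(new Redis()) });
```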

docs/core_docs/docs/how_to/chat_streaming.ipynb (+16 -5)

@@ -17,10 +17,9 @@
    "source": [
     "# How to stream chat model responses\n",
     "\n",
-    "\n",
     "All [chat models](https://api.js.langchain.com/classes/langchain_core_language_models_chat_models.BaseChatModel.html) implement the [Runnable interface](https://api.js.langchain.com/classes/langchain_core_runnables.Runnable.html), which comes with a **default** implementations of standard runnable methods (i.e. `invoke`, `batch`, `stream`, `streamEvents`).\n",
     "\n",
-    "The **default** streaming implementation provides an `AsyncIterator` that yields a single value: the final output from the underlying chat model provider.\n",
+    "The **default** streaming implementation provides an `AsyncGenerator` that yields a single value: the final output from the underlying chat model provider.\n",
     "\n",
     ":::{.callout-tip}\n",
     "\n",
@@ -40,7 +39,7 @@
    "source": [
     "## Streaming\n",
     "\n",
-    "Below we use a `---` to help visualize the delimiter between tokens."
+    "Below, we use a `---` to help visualize the delimiter between tokens."
    ]
   },
   {
@@ -221,9 +220,9 @@
    "source": [
     "## Stream events\n",
     "\n",
-    "Chat models also support the standard [astream events](https://api.js.langchain.com/classes/langchain_core_runnables.Runnable.html#streamEvents) method.\n",
+    "Chat models also support the standard [streamEvents()](https://api.js.langchain.com/classes/langchain_core_runnables.Runnable.html#streamEvents) method.\n",
     "\n",
-    "This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., an LLM chain composed of a prompt, llm and parser)."
+    "This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser)."
    ]
   },
   {
@@ -358,6 +357,18 @@
    " console.log(event)\n",
    "}"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "abacb301",
+   "metadata": {},
+   "source": [
+    "## Next steps\n",
+    "\n",
+    "You've now seen a few ways you can stream chat model responses.\n",
+    "\n",
+    "Next, check out this guide for more on [streaming with other LangChain modules](/docs/how_to/streaming)."
+   ]
   }
  ],
 "metadata": {

docs/core_docs/docs/how_to/chat_token_usage_tracking.mdx (+16 -2)

@@ -4,11 +4,19 @@ sidebar_position: 5
 
 # How to track token usage
 
+:::info Prerequisites
+
+This guide assumes familiarity with the following concepts:
+
+- [Chat models](/docs/concepts/#chat-models)
+
+:::
+
 This notebook goes over how to track your token usage for specific calls.
 
-## Using AIMessage.response_metadata
+## Using `AIMessage.response_metadata`
 
-A number of model providers return token usage information as part of the chat generation response. When available, this is included in the [AIMessage.response_metadata](/docs/modules/model_io/chat/response_metadata/) field.
+A number of model providers return token usage information as part of the chat generation response. When available, this is included in the [`AIMessage.response_metadata`](/docs/modules/model_io/chat/response_metadata/) field.
 Here's an example with OpenAI:
 
 import CodeBlock from "@theme/CodeBlock";
@@ -42,3 +50,9 @@ Here's an example of how you could do that:
 import CallbackExample from "@examples/models/chat/token_usage_tracking_callback.ts";
 
 <CodeBlock language="typescript">{CallbackExample}</CodeBlock>
+
+## Next steps
+
+You've now seen a few examples of how to track chat model token usage for supported providers.
+
+Next, check out the other how-to guides on chat models in this section, like [how to get a model to return structured output](/docs/how_to/structured_output) or [how to add caching to your chat models](/docs/how_to/chat_model_caching).
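As a rough sketch of the `AIMessage.response_metadata` usage this guide covers: the metadata shape is provider-specific, and the `tokenUsage` field noted in the comment is what I recall `@langchain/openai` returning, so treat it as an assumption.

```typescript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-3.5-turbo" });
const message = await model.invoke("Tell me a joke.");

// For OpenAI this is roughly:
// { tokenUsage: { promptTokens, completionTokens, totalTokens }, ... }
console.log(message.response_metadata);
```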

docs/core_docs/docs/how_to/custom_chat.mdx (+8)

@@ -4,6 +4,14 @@ sidebar_position: 4
 
 # How to create a custom chat model class
 
+:::info Prerequisites
+
+This guide assumes familiarity with the following concepts:
+
+- [Chat models](/docs/concepts/#chat-models)
+
+:::
+
 This notebook goes over how to create a custom chat model wrapper, in case you want to use your own chat model or a different wrapper than one that is directly supported in LangChain.
 
 There are a few required things that a chat model needs to implement after extending the [`SimpleChatModel` class](https://api.js.langchain.com/classes/langchain_core_language_models_chat_models.SimpleChatModel.html):
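To make this concrete, a minimal custom chat model built on `SimpleChatModel` looks roughly like the toy example below. It is my own sketch, implementing the `_call()` and `_llmType()` members as I recall the required surface, not code quoted from this commit.

```typescript
import { SimpleChatModel } from "@langchain/core/language_models/chat_models";
import type { BaseMessage } from "@langchain/core/messages";

// A toy chat model that echoes the last message back to the caller.
class EchoChatModel extends SimpleChatModel {
  _llmType(): string {
    return "echo";
  }

  async _call(messages: BaseMessage[]): Promise<string> {
    const last = messages[messages.length - 1];
    return typeof last.content === "string"
      ? last.content
      : JSON.stringify(last.content);
  }
}

const model = new EchoChatModel({});
const result = await model.invoke("Hello there!");
console.log(result.content); // "Hello there!"
```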

docs/core_docs/docs/how_to/custom_llm.mdx (+8)

@@ -4,6 +4,14 @@ sidebar_position: 3
 
 # How to create a custom LLM class
 
+:::info Prerequisites
+
+This guide assumes familiarity with the following concepts:
+
+- [LLMs](/docs/concepts/#llms)
+
+:::
+
 This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a different wrapper than one that is directly supported in LangChain.
 
 There are a few required things that a custom LLM needs to implement after extending the [`LLM` class](https://api.js.langchain.com/classes/langchain_core_language_models_llms.LLM.html):
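Similarly, a minimal custom LLM extending the `LLM` class looks roughly like this toy sketch of mine (again assuming `_call()` and `_llmType()` are the required members, which is my recollection rather than a quote from the diff):

```typescript
import { LLM } from "@langchain/core/language_models/llms";

// A toy LLM that reverses the prompt, showing the minimal surface area:
// `_llmType()` plus a `_call()` mapping a string prompt to a string completion.
class ReverseLLM extends LLM {
  _llmType(): string {
    return "reverse";
  }

  async _call(prompt: string): Promise<string> {
    return prompt.split("").reverse().join("");
  }
}

const llm = new ReverseLLM({});
console.log(await llm.invoke("Hello")); // "olleH"
```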

docs/core_docs/docs/how_to/llm_caching.mdx (+6)

@@ -193,3 +193,9 @@ By default the cache is stored a temporary directory, but you can specify a cust
 ```typescript
 const cache = await LocalFileCache.create();
 ```
+
+## Next steps
+
+You've now learned how to cache model responses to save time and money.
+
+Next, check out the other how-to guides on LLMs, like [how to create your own custom LLM class](/docs/how_to/custom_llm).
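For reference, wiring the `LocalFileCache` mentioned in the context lines above into an LLM looks roughly like this sketch; the `langchain/cache/file_system` entrypoint and the model wiring are my assumptions, not part of this diff.

```typescript
import { OpenAI } from "@langchain/openai";
import { LocalFileCache } from "langchain/cache/file_system";

// LocalFileCache persists generations on disk; pass a directory path to
// create() to override the default temporary directory.
const cache = await LocalFileCache.create();
const model = new OpenAI({ cache });

await model.invoke("Tell me a joke"); // calls the API and writes to the cache
await model.invoke("Tell me a joke"); // answered from the file cache
```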

docs/core_docs/docs/how_to/llm_token_usage_tracking.mdx (+17 -3)

@@ -4,12 +4,20 @@ sidebar_position: 5
 
 # How to track token usage
 
-This notebook goes over how to track your token usage for specific calls. This is currently only implemented for the OpenAI API.
+:::info Prerequisites
 
-Here's an example of tracking token usage for a single LLM call:
+This guide assumes familiarity with the following concepts:
+
+- [LLMs](/docs/concepts/#llms)
+
+:::
+
+This notebook goes over how to track your token usage for specific LLM calls. This is only implemented by some providers, including OpenAI.
+
+Here's an example of tracking token usage for a single LLM call via a callback:
 
 import CodeBlock from "@theme/CodeBlock";
-import Example from "@examples/models/chat/token_usage_tracking.ts";
+import Example from "@examples/models/llm/token_usage_tracking.ts";
 
 import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
 
@@ -22,3 +30,9 @@ npm install @langchain/openai
 <CodeBlock language="typescript">{Example}</CodeBlock>
 
 If this model is passed to a chain or agent that calls it multiple times, it will log an output each time.
+
+## Next steps
+
+You've now seen how to get token usage for supported LLM providers.
+
+Next, check out the other how-to guides in this section, like [how to implement your own custom LLM](/docs/how_to/custom_llm).
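The callback-based tracking the rewritten intro refers to looks roughly like the sketch below; the handler shape and the `llmOutput.tokenUsage` field are what I recall from `@langchain/openai`, so treat them as assumptions rather than the referenced example file.

```typescript
import { OpenAI } from "@langchain/openai";

// handleLLMEnd fires once per completed LLM call; for OpenAI the token
// counts are reported under llmOutput.tokenUsage.
const model = new OpenAI({
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(JSON.stringify(output.llmOutput?.tokenUsage, null, 2));
      },
    },
  ],
});

await model.invoke("Tell me a joke.");
```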
