added Browserbase loader #5248

Merged
merged 6 commits into from
May 1, 2024
Changes from 3 commits
@@ -0,0 +1,34 @@
# Browserbase Loader

## Description

[Browserbase](https://browserbase.com) is a serverless platform for running headless browsers. It offers advanced debugging, session recordings, stealth mode, integrated proxies, and captcha solving.

## Installation

- Get an API key from [browserbase.com](https://browserbase.com) and set it as the `BROWSERBASE_API_KEY` environment variable.
- Install the [Browserbase SDK](http://github.com/browserbase/js-sdk):

```
npm i @browserbasehq/sdk
```

## Example

Use `BrowserbaseLoader` as follows to let your agent load websites:

```js
import { BrowserbaseLoader } from "langchain/document_loaders/web/browserbase";

const loader = new BrowserbaseLoader(["https://example.com"], { textContent: true });
const docs = await loader.load();
```
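
Judging from the loader source in this PR, each loaded document carries the page text in `pageContent` and the source URL in `metadata.url`. A minimal sketch of consuming that shape follows; `LoadedDocument` is a local stand-in for the real `Document` class, and `indexByUrl` is a hypothetical helper, not part of `langchain`.

```typescript
// Local stand-in for `Document` from "@langchain/core/documents" (assumption:
// only the two fields set by the loader in this PR are modeled here).
interface LoadedDocument {
  pageContent: string;
  metadata: { url: string };
}

// Hypothetical helper: build a lookup from source URL to page text.
function indexByUrl(docs: LoadedDocument[]): Record<string, string> {
  const byUrl: Record<string, string> = {};
  for (const doc of docs) {
    byUrl[doc.metadata.url] = doc.pageContent;
  }
  return byUrl;
}
```

With the real loader, the array returned by `loader.load()` could be passed to such a helper directly, assuming the document shape stays as shown in the source.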

## Arguments

- `urls`: Required. List of URLs to load.

## Options

- `apiKey`: Optional. Specifies the Browserbase API key. Defaults to the `BROWSERBASE_API_KEY` environment variable.
- `textContent`: Optional. Load pages as readable text. Default is `false`.
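
The defaults described in the options above can be sketched as a small resolver. This is a hypothetical helper (`resolveLoaderOptions` is not part of `langchain` or the Browserbase SDK), and throwing on a missing key is a choice made for this sketch rather than confirmed SDK behavior. The environment is passed in explicitly so the function is easy to exercise in isolation.

```typescript
// Hypothetical option shape mirroring the README: both fields are optional.
interface LoaderOptions {
  apiKey?: string;
  textContent?: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: apply the documented defaults to user-supplied options.
function resolveLoaderOptions(
  options: LoaderOptions,
  env: Env
): Required<LoaderOptions> {
  // An explicit key wins; otherwise fall back to the environment variable.
  const apiKey = options.apiKey ?? env["BROWSERBASE_API_KEY"];
  if (!apiKey) {
    // Assumption of this sketch: fail early instead of sending an empty key.
    throw new Error("Set BROWSERBASE_API_KEY or pass `apiKey` explicitly.");
  }
  // `textContent` defaults to false, as stated in the options list.
  return { apiKey, textContent: options.textContent ?? false };
}
```

In a Node.js program, `process.env` could be passed as the second argument.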
4 changes: 4 additions & 0 deletions langchain/.gitignore
@@ -534,6 +534,10 @@ document_loaders/web/azure_blob_storage_file.cjs
document_loaders/web/azure_blob_storage_file.js
document_loaders/web/azure_blob_storage_file.d.ts
document_loaders/web/azure_blob_storage_file.d.cts
document_loaders/web/browserbase.cjs
document_loaders/web/browserbase.js
document_loaders/web/browserbase.d.ts
document_loaders/web/browserbase.d.cts
document_loaders/web/cheerio.cjs
document_loaders/web/cheerio.js
document_loaders/web/cheerio.d.ts
3 changes: 3 additions & 0 deletions langchain/langchain.config.js
@@ -182,6 +182,8 @@ export const config = {
"document_loaders/web/azure_blob_storage_container",
"document_loaders/web/azure_blob_storage_file":
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase":
"document_loaders/web/browserbase",
"document_loaders/web/cheerio": "document_loaders/web/cheerio",
"document_loaders/web/puppeteer": "document_loaders/web/puppeteer",
"document_loaders/web/playwright": "document_loaders/web/playwright",
@@ -629,6 +631,7 @@ export const config = {
"document_loaders/web/assemblyai",
"document_loaders/web/azure_blob_storage_container",
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase",
"document_loaders/web/cheerio",
"document_loaders/web/puppeteer",
"document_loaders/web/playwright",
18 changes: 18 additions & 0 deletions langchain/package.json
@@ -546,6 +546,10 @@
"document_loaders/web/azure_blob_storage_file.js",
"document_loaders/web/azure_blob_storage_file.d.ts",
"document_loaders/web/azure_blob_storage_file.d.cts",
"document_loaders/web/browserbase.cjs",
"document_loaders/web/browserbase.js",
"document_loaders/web/browserbase.d.ts",
"document_loaders/web/browserbase.d.cts",
"document_loaders/web/cheerio.cjs",
"document_loaders/web/cheerio.js",
"document_loaders/web/cheerio.d.ts",
@@ -1222,6 +1226,7 @@
"@aws-sdk/credential-provider-node": "^3.388.0",
Hey there! I noticed that the addition of "@browserbasehq/sdk" in the package.json file introduces a new dependency to the project. This change is flagged for maintainers to review the impact on peer/dev/hard dependencies. Keep up the great work!

"@aws-sdk/types": "^3.357.0",
"@azure/storage-blob": "^12.15.0",
"@browserbasehq/sdk": "^1.0.0",
"@cloudflare/workers-types": "^4.20230922.0",
"@faker-js/faker": "^7.6.0",
"@gomomento/sdk": "^1.51.1",
@@ -1309,6 +1314,7 @@
"@aws-sdk/client-sfn": "^3.310.0",
"@aws-sdk/credential-provider-node": "^3.388.0",
"@azure/storage-blob": "^12.15.0",
"@browserbasehq/sdk": "*",
"@gomomento/sdk": "^1.51.1",
"@gomomento/sdk-core": "^1.51.1",
"@gomomento/sdk-web": "^1.51.1",
@@ -1371,6 +1377,9 @@
"@azure/storage-blob": {
"optional": true
},
"@browserbasehq/sdk": {
"optional": true
},
"@gomomento/sdk": {
"optional": true
},
@@ -2754,6 +2763,15 @@
"import": "./document_loaders/web/azure_blob_storage_file.js",
"require": "./document_loaders/web/azure_blob_storage_file.cjs"
},
"./document_loaders/web/browserbase": {
"types": {
"import": "./document_loaders/web/browserbase.d.ts",
"require": "./document_loaders/web/browserbase.d.cts",
"default": "./document_loaders/web/browserbase.d.ts"
},
"import": "./document_loaders/web/browserbase.js",
"require": "./document_loaders/web/browserbase.cjs"
},
"./document_loaders/web/cheerio": {
"types": {
"import": "./document_loaders/web/cheerio.d.ts",
83 changes: 83 additions & 0 deletions langchain/src/document_loaders/web/browserbase.ts
@@ -0,0 +1,83 @@
import { Document } from "@langchain/core/documents";
import type { BrowserbaseLoadOptions } from "@browserbasehq/sdk";
import { BaseDocumentLoader } from "../base.js";
import type { DocumentLoader } from "../base.js";

type BrowserbaseLoaderOptions = BrowserbaseLoadOptions & {
apiKey?: string;
};

/**
* Load pre-rendered web pages using a headless browser hosted on Browserbase.
*
* Depends on `@browserbasehq/sdk` package.
* Get your API key from https://browserbase.com
*
* @param {string[]} urls - The URLs of the web pages to load.
* @param {BrowserbaseLoaderOptions} [options] - Browserbase client options.
*/

export class BrowserbaseLoader
extends BaseDocumentLoader
implements DocumentLoader
{
urls: string[];

options: BrowserbaseLoaderOptions;

constructor(urls: string[], options: BrowserbaseLoaderOptions = {}) {
super();
this.urls = urls;
this.options = options;
}

/**
* Load pages from URLs.
*
* @returns {Promise<Document[]>} - A promise that resolves to the loaded documents.
*/

async load(): Promise<Document[]> {
const documents: Document[] = [];
for await (const doc of this.lazyLoad()) {
documents.push(doc);
}

return documents;
}

/**
* Load pages from URLs.
*
* @returns {AsyncGenerator<Document>} - An async generator that yields loaded documents.
*/
async *lazyLoad() {
const browserbase = await BrowserbaseLoader.imports(this.options.apiKey);
const pages = await browserbase.loadURLs(this.urls, this.options);

let index = 0;
for await (const page of pages) {
yield new Document({
pageContent: page,
metadata: {
url: this.urls[index],
},
});

index += 1;
}
}

static async imports(apiKey?: string) {
try {
const { default: Browserbase } = await import("@browserbasehq/sdk");
return new Browserbase(apiKey);
} catch (error) {
throw new Error(
"You must run " +
"`npm install --save @browserbasehq/sdk` " +
"to use the Browserbase loader."
);
}
}
}
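
The pairing of pages with URLs in `lazyLoad` can be sketched in isolation. The sketch below assumes the client yields exactly one page per URL, in the same order as the input list; `FakeBrowserbase`, `LoadedDocument`, `lazyLoadPages`, and `loadAll` are hypothetical stand-ins for illustration, not the SDK or `langchain` classes. Under that assumption the index has to advance by exactly one per page, otherwise later documents are tagged with the wrong URL.

```typescript
// Local stand-in for `Document` from "@langchain/core/documents".
interface LoadedDocument {
  pageContent: string;
  metadata: { url: string };
}

// Hypothetical stub for the Browserbase client: yields one page per URL, in order.
class FakeBrowserbase {
  async *loadURLs(urls: string[]): AsyncGenerator<string> {
    for (const url of urls) {
      yield `<html>content of ${url}</html>`;
    }
  }
}

// Yield one document per page, tagging each with the URL at the same position.
async function* lazyLoadPages(
  client: FakeBrowserbase,
  urls: string[]
): AsyncGenerator<LoadedDocument> {
  let index = 0;
  for await (const page of client.loadURLs(urls)) {
    yield { pageContent: page, metadata: { url: urls[index] } };
    index += 1; // advance by exactly one so page N maps to urls[N]
  }
}

// Eager variant: drain the generator into an array, as `load()` does.
async function loadAll(
  client: FakeBrowserbase,
  urls: string[]
): Promise<LoadedDocument[]> {
  const documents: LoadedDocument[] = [];
  for await (const doc of lazyLoadPages(client, urls)) {
    documents.push(doc);
  }
  return documents;
}
```

Keeping the eager loader as a thin wrapper around the generator means callers who only need the first few pages can iterate lazily and stop early.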
1 change: 1 addition & 0 deletions langchain/src/load/import_constants.ts
@@ -83,6 +83,7 @@ export const optionalImportEntrypoints: string[] = [
"langchain/document_loaders/web/assemblyai",
"langchain/document_loaders/web/azure_blob_storage_container",
"langchain/document_loaders/web/azure_blob_storage_file",
"langchain/document_loaders/web/browserbase",
"langchain/document_loaders/web/cheerio",
"langchain/document_loaders/web/puppeteer",
"langchain/document_loaders/web/playwright",
47 changes: 45 additions & 2 deletions yarn.lock
@@ -6584,6 +6584,16 @@ __metadata:
languageName: node
linkType: hard

"@browserbasehq/sdk@npm:^1.0.0":
version: 1.0.0
resolution: "@browserbasehq/sdk@npm:1.0.0"
dependencies:
playwright: ^1.43.1
zod: ^3.22.5
checksum: 1aa7d6fd9e7550bdb7fff43a3c858227bcb9fcb26e9c4ee0ae245e36a1fa90d9a378f6937c620ef0bbe8e400710005778d945d180033639afa50a564c975a3ae
languageName: node
linkType: hard

"@chainsafe/is-ip@npm:^2.0.1":
version: 2.0.2
resolution: "@chainsafe/is-ip@npm:2.0.2"
@@ -22821,7 +22831,7 @@ __metadata:
languageName: node
linkType: hard

"fsevents@npm:^2.3.2, fsevents@npm:~2.3.2":
"fsevents@npm:2.3.2, fsevents@npm:^2.3.2, fsevents@npm:~2.3.2":
version: 2.3.2
resolution: "fsevents@npm:2.3.2"
dependencies:
@@ -22831,7 +22841,7 @@ __metadata:
languageName: node
linkType: hard

"fsevents@patch:fsevents@^2.3.2#~builtin<compat/fsevents>, fsevents@patch:fsevents@~2.3.2#~builtin<compat/fsevents>":
"fsevents@patch:fsevents@2.3.2#~builtin<compat/fsevents>, fsevents@patch:fsevents@^2.3.2#~builtin<compat/fsevents>, fsevents@patch:fsevents@~2.3.2#~builtin<compat/fsevents>":
version: 2.3.2
resolution: "fsevents@patch:fsevents@npm%3A2.3.2#~builtin<compat/fsevents>::version=2.3.2&hash=df0bf1"
dependencies:
@@ -26658,6 +26668,7 @@ __metadata:
"@aws-sdk/credential-provider-node": ^3.388.0
"@aws-sdk/types": ^3.357.0
"@azure/storage-blob": ^12.15.0
"@browserbasehq/sdk": ^1.0.0
"@cloudflare/workers-types": ^4.20230922.0
"@faker-js/faker": ^7.6.0
"@gomomento/sdk": ^1.51.1
@@ -26761,6 +26772,7 @@ __metadata:
"@aws-sdk/client-sfn": ^3.310.0
"@aws-sdk/credential-provider-node": ^3.388.0
"@azure/storage-blob": ^12.15.0
"@browserbasehq/sdk": "*"
"@gomomento/sdk": ^1.51.1
"@gomomento/sdk-core": ^1.51.1
"@gomomento/sdk-web": ^1.51.1
@@ -30343,6 +30355,15 @@ __metadata:
languageName: node
linkType: hard

"playwright-core@npm:1.43.1":
version: 1.43.1
resolution: "playwright-core@npm:1.43.1"
bin:
playwright-core: cli.js
checksum: 7c96b3a4a4bce2ee22c3cd680c9b0bb9e4bf07ee4b51d1e9a7f47a6489c7b0b960d4b550e530b8f41d1ffeadd26c7c6bb626ae8689dfd90dce1cb8e35ae78ff7
languageName: node
linkType: hard

"playwright@npm:^1.32.1":
version: 1.32.1
resolution: "playwright@npm:1.32.1"
@@ -30354,6 +30375,21 @@ __metadata:
languageName: node
linkType: hard

"playwright@npm:^1.43.1":
version: 1.43.1
resolution: "playwright@npm:1.43.1"
dependencies:
fsevents: 2.3.2
playwright-core: 1.43.1
dependenciesMeta:
fsevents:
optional: true
bin:
playwright: cli.js
checksum: de9db021f93018a18275bbb5af09ebf1804aa0534f47578b35b440064abc774509740205802824afc94a99fc84dd55ffe9e215718ad3ecc691b251ab3882b096
languageName: node
linkType: hard

"portkey-ai@npm:^0.1.11":
version: 0.1.11
resolution: "portkey-ai@npm:0.1.11"
@@ -37171,6 +37207,13 @@ __metadata:
languageName: node
linkType: hard

"zod@npm:^3.22.5":
version: 3.23.4
resolution: "zod@npm:3.23.4"
checksum: 58f6e298c51d9ae01a1b1a1692ac7f00774b466d9a287a1ff8d61ff1fbe0ae9b0f050ae1cf1a8f71e4c6ccd0333a3cc340f339360fab5f5046cc954d10525a54
languageName: node
linkType: hard

"zwitch@npm:^1.0.0":
version: 1.0.5
resolution: "zwitch@npm:1.0.5"