Commit 28ea45b

docs: adds ecosystem docs for UnstructuredLoader (#653)
* added stay tuned to the end
* linting, linting, linting
* more linting
* update import statement
1 parent c890e12 commit 28ea45b

1 file changed, +37 -0 lines changed

docs/docs/ecosystem/unstructured.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Unstructured

This page covers how to use [Unstructured](https://unstructured.io) within LangChain.

## What is Unstructured?

Unstructured is an [open source](https://github.com/Unstructured-IO/unstructured) Python package
for extracting text from raw documents for use in machine learning applications. Currently,
Unstructured supports partitioning Word documents (in `.doc` or `.docx` format),
PowerPoints (in `.ppt` or `.pptx` format), PDFs, HTML files, images,
emails (in `.eml` or `.msg` format), epubs, markdown, and plain text files.

Because `unstructured` is a Python package and cannot be used directly with TS/JS, Unstructured
also maintains a [REST API](https://github.com/Unstructured-IO/unstructured-api) to support
pre-processing pipelines written in other programming languages. The endpoint for the
hosted Unstructured API is `https://api.unstructured.io/general/v0/general`, or you can run
the service locally by following the
[Docker image instructions](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image).
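
To make the REST API option concrete, here is a minimal sketch of posting a file to the endpoint directly from TypeScript, without any loader. It assumes a runtime with global `fetch`, `FormData`, and `Blob` (for example Node 18+), and it assumes the API accepts the file in a multipart form field named `files`, as the unstructured-api README suggests. `buildPartitionForm` and `partitionFile` are hypothetical helper names used only for illustration, and the hosted service may require credentials that are not shown here.

```typescript
// Hosted endpoint quoted in the text above.
const UNSTRUCTURED_URL = "https://api.unstructured.io/general/v0/general";

// Hypothetical helper (not part of any library): wraps file contents in a
// multipart form. The "files" field name is an assumption.
function buildPartitionForm(contents: string, filename: string): FormData {
  const form = new FormData();
  form.append("files", new Blob([contents]), filename);
  return form;
}

// Hypothetical helper: POSTs the form and returns the parsed JSON body.
async function partitionFile(
  contents: string,
  filename: string,
  url: string = UNSTRUCTURED_URL
): Promise<unknown> {
  const response = await fetch(url, {
    method: "POST",
    body: buildPartitionForm(contents, filename),
  });
  if (!response.ok) {
    throw new Error(
      `Unstructured API request failed with status ${response.status}`
    );
  }
  return response.json();
}
```

Passing the URL as a parameter keeps the same helper usable against either the hosted API or a locally running container.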

## Quick start

You can use Unstructured in `langchainjs` with the following code.
Replace the filename with the file you would like to process.
If you are running the container locally, switch the URL to that of your local instance
(for example, `http://localhost:8000/general/v0/general` if the container listens on port 8000).

```typescript
import { UnstructuredLoader } from "langchain/document_loaders";

// First argument: URL of the Unstructured API endpoint.
// Second argument: path to the file you want to process.
const loader = new UnstructuredLoader(
  "https://api.unstructured.io/general/v0/general",
  "langchain/src/document_loaders/tests/example_data/example.txt"
);
// Sends the file to the API and returns the resulting documents.
const docs = await loader.load();
```

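
As a rough illustration of what happens between the API response and the documents returned by `load()`, the sketch below maps API elements to document-like objects. It is not the loader's actual implementation. It assumes the API returns a JSON array of elements that each carry a `type` and, usually, a `text` field, and it uses a local `SimpleDocument` type as a stand-in for LangChain's `Document` (which has `pageContent` and `metadata`). `elementsToDocuments` is a hypothetical name.

```typescript
// Assumed shape of one element in the API's JSON response.
interface UnstructuredElement {
  type: string;
  text?: string;
  metadata?: Record<string, unknown>;
}

// Local stand-in for a LangChain document: text plus metadata.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: keeps elements with non-empty text and records the
// element type under a "category" metadata key (an illustrative choice).
function elementsToDocuments(
  elements: UnstructuredElement[]
): SimpleDocument[] {
  return elements
    .filter(
      (el): el is UnstructuredElement & { text: string } =>
        typeof el.text === "string" && el.text.trim().length > 0
    )
    .map((el) => ({
      pageContent: el.text,
      metadata: { ...el.metadata, category: el.type },
    }));
}
```

Dropping elements without text keeps empty entries out of downstream steps such as splitting or embedding.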
Stay tuned for future updates, including functionality equivalent to
`UnstructuredDirectoryLoader` in `langchain`!
