| 1 | +# Unstructured |
| 2 | + |
| 3 | +This page covers how to use [Unstructured](https://unstructured.io) within LangChain. |
| 4 | + |
| 5 | +## What is Unstructured? |
| 6 | + |
| 7 | +Unstructured is an [open source](https://github.com/Unstructured-IO/unstructured) Python package |
| 8 | +for extracting text from raw documents for use in machine learning applications. Currently, |
| 9 | +Unstructured supports partitioning Word documents (in `.doc` or `.docx` format), |
| 10 | +PowerPoints (in `.ppt` or `.pptx` format), PDFs, HTML files, images, |
| 11 | +emails (in `.eml` or `.msg` format), epubs, markdown, and plain text files. |
| 12 | +Because `unstructured` is a Python package and cannot be used directly with TS/JS, Unstructured |
| 13 | +also maintains a [REST API](https://github.com/Unstructured-IO/unstructured-api) to support |
| 14 | +pre-processing pipelines written in other programming languages. The endpoint for the |
| 15 | +hosted Unstructured API is `https://api.unstructured.io/general/v0/general`, or you can run |
| 16 | +the service locally using the instructions found |
| 17 | +[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image). |
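
As a rough illustration of what talking to the REST endpoint from TS/JS could look like without any wrapper, here is a minimal sketch using the built-in `fetch`, `FormData`, and `Blob` (Node 18+ or a browser). The multipart field name `files`, the `type` and `text` fields on each returned element, and the helper names `buildForm` and `partition` are assumptions for illustration and are not taken from this page. The hosted service may also require an API key header, which is omitted here.

```typescript
// Sketch only: call the Unstructured REST API directly.
// Assumed (not confirmed by this page): the multipart field name `files`
// and the `type` / `text` fields on each element of the JSON response.

const HOSTED_URL = "https://api.unstructured.io/general/v0/general";

// Hypothetical shape of one element in the response array.
interface UnstructuredElement {
  type: string;
  text: string;
}

// Build the multipart body for a single file.
function buildForm(fileName: string, contents: string | Blob): FormData {
  const form = new FormData();
  form.append("files", new Blob([contents]), fileName);
  return form;
}

// POST the file to the given endpoint and return the parsed elements.
async function partition(
  fileName: string,
  contents: string | Blob,
  url: string = HOSTED_URL
): Promise<UnstructuredElement[]> {
  const response = await fetch(url, {
    method: "POST",
    headers: { accept: "application/json" },
    body: buildForm(fileName, contents),
  });
  if (!response.ok) {
    throw new Error(`Unstructured API returned status ${response.status}`);
  }
  return (await response.json()) as UnstructuredElement[];
}
```

Calling `partition("example.txt", "some text")` would send the request to the hosted endpoint; passing a third argument points the same call at a locally running container instead.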
| 18 | + |
| 19 | +## Quick start |
| 20 | + |
| 21 | +You can use Unstructured in `langchainjs` with the following code. |
| 22 | +Replace the filename with the file you would like to process. |
| 23 | +If you are running the container locally, switch the URL to your local endpoint, for example |
| 24 | +`http://localhost:8000/general/v0/general` if the container is published on port 8000. |
| 25 | + |
| 26 | +```typescript |
| 27 | +import { UnstructuredLoader } from "langchain/document_loaders"; |
| 28 | + |
| 29 | +const loader = new UnstructuredLoader( |
| 30 | + "https://api.unstructured.io/general/v0/general", |
| 31 | + "langchain/src/document_loaders/tests/example_data/example.txt" |
| 32 | +); |
| 33 | +const docs = await loader.load(); |
| 34 | +``` |
| 35 | + |
| 36 | +Stay tuned for future updates, including functionality equivalent to |
| 37 | +`UnstructuredDirectoryLoader` in `langchain`! |