    :width: 50%
    :alt: ScrapegraphAI
 
-Overview
+Overview
 ========
 
 ScrapeGraphAI is an **open-source** Python library designed to revolutionize **scraping** tools.
-In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
+In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
 and modular **graph-based** pipelines to automate the scraping of data from various sources (e.g., websites, local files etc.).
 
 Simply specify the information you need to extract, and ScrapeGraphAI handles the rest, providing a more **flexible** and **low-maintenance** solution compared to traditional scraping tools.
 
 For comprehensive documentation and updates, visit our `website <https://scrapegraphai.com>`_.
 
-Key Features
-------------
-
-* **Just One Prompt Away**: Transform any website into clean, organized data for AI agents and Data Analytics
-* **Save Time**: No more writing complex code or dealing with manual extraction
-* **Save Money**: High-quality data extraction at a fraction of the cost of traditional scraping services
-* **AI Powered**: State-of-the-art AI technologies for fast, accurate, and dependable results
-
-Community Impact
----------------
-
-Our open-source technology is continuously enhanced by a global community of developers:
-
-* **+17K** stars on Github
-* **7,000,000+** extracted webpages
-* **250k+** unique users
-
-Services
---------
-
-* **Markdownify**: Convert webpage to markdown format (2 credits/page)
-* **Smart Scraper**: Structured AI web scraping given a URL (5 credits/page)
-* **Local Scraper**: Structured AI scraping given your local HTML (10 credits/page)
 
 Why ScrapegraphAI?
 ==================
 
 Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
-ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
+ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
 This flexibility ensures that scrapers remain functional even when website layouts change.
 
 We support many LLMs including **GPT, Gemini, Groq, Azure, Hugging Face** etc.
@@ -187,13 +164,13 @@ FAQ
 - Check your internet connection. Low speed or unstable connection can cause the HTML to not load properly.
 
 - Try using a proxy server to mask your IP address. Check out the :ref:`Proxy` section for more information on how to configure proxy settings.
-
+
 - Use a different LLM model. Some models might perform better on certain websites than others.
 
 - Set the `verbose` parameter to `True` in the graph_config to see more detailed logs.
 
 - Visualize the pipeline graphically using :ref:`Burr`.
-
+
 If the issue persists, please report it on the GitHub repository.
 
 6. **How does ScrapeGraphAI handle the context window limit of LLMs?**
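Several of the troubleshooting tips in the FAQ hunk above (proxy, a different model, ``verbose``) come down to entries in ``graph_config``. The following is a minimal sketch of how such a configuration might be assembled. It assumes the dictionary layout used in ScrapeGraphAI's published examples (``llm``, ``verbose``, ``loader_kwargs``); the helper name ``build_debug_config``, the model strings, and the proxy address are hypothetical placeholders, and nothing here calls the library itself.

```python
import copy

# Baseline configuration. The key layout ("llm", "verbose") is assumed from
# ScrapeGraphAI's published examples; the values are placeholders.
BASE_CONFIG = {
    "llm": {
        "api_key": "YOUR_API_KEY",      # placeholder, not a real key
        "model": "openai/gpt-4o-mini",  # assumed model identifier
    },
    "verbose": False,
}


def build_debug_config(base, model=None, proxy_server=None):
    """Return a copy of `base` adjusted for troubleshooting (hypothetical helper)."""
    config = copy.deepcopy(base)  # leave the caller's dict untouched
    config["verbose"] = True      # request more detailed logs
    if model is not None:
        config["llm"]["model"] = model  # try a different LLM
    if proxy_server is not None:
        # Assumed location of proxy settings; check the Proxy section of the docs.
        config.setdefault("loader_kwargs", {})["proxy"] = {"server": proxy_server}
    return config


debug_config = build_debug_config(
    BASE_CONFIG,
    model="ollama/llama3",                 # assumed identifier
    proxy_server="http://127.0.0.1:8080",  # placeholder proxy address
)
```

In the library's examples, a dictionary like ``debug_config`` is passed as the ``config`` argument when constructing a graph; that step is left out here so the sketch stays independent of any installed version.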
@@ -226,3 +203,8 @@ Sponsors
    :width: 11%
    :alt: Scrapedo
    :target: https://scrape.do
+
+.. image:: ../../assets/scrapegraph_logo.png
+   :width: 11%
+   :alt: ScrapegraphAI
+   :target: https://scrapegraphai.com