
Commit 17ed28c

Enhance README.md and index.ts with detailed tool descriptions (#61)
- Added comprehensive descriptions for various tools including `scrape`, `map`, `crawl`, `search`, `extract`, `deep_research`, and `generate_llmstxt`.
- Included best use cases, common mistakes, prompt examples, usage examples, and return values for each tool.
- Updated the `index.ts` file to reflect the new descriptions and ensure consistency with the README.
1 parent f7d5ec7 commit 17ed28c

File tree

2 files changed: +413 −60 lines changed


README.md

Lines changed: 226 additions & 39 deletions
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:

- Smart request queuing and throttling
- Automatic retries for transient errors

## How to Choose a Tool

Use this guide to select the right tool for your task:

- **If you know the exact URL(s) you want:**
  - For one: use **scrape**
  - For many: use **batch_scrape**
- **If you need to discover URLs on a site:** use **map**
- **If you want to search the web for info:** use **search**
- **If you want to extract structured data:** use **extract**
- **If you want to analyze a whole site or section:** use **crawl** (with limits!)
- **If you want to do in-depth research:** use **deep_research**
- **If you want to generate LLMs.txt:** use **generate_llmstxt**

### Quick Reference Table

| Tool             | Best for                            | Returns          |
|------------------|-------------------------------------|------------------|
| scrape           | Single page content                 | markdown/html    |
| batch_scrape     | Multiple known URLs                 | markdown/html[]  |
| map              | Discovering URLs on a site          | URL[]            |
| crawl            | Multi-page extraction (with limits) | markdown/html[]  |
| search           | Web search for info                 | results[]        |
| extract          | Structured data from pages          | JSON             |
| deep_research    | In-depth, multi-source research     | summary, sources |
| generate_llmstxt | LLMs.txt for a domain               | text             |
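The decision guide above can also be expressed mechanically. A minimal TypeScript sketch of that logic (a hypothetical helper for illustration; it is not part of the server, and the `Task` field names are assumptions):

```typescript
// Hypothetical helper encoding the tool-selection guide above.
type Task = {
  knownUrls?: number;        // how many exact URLs you already have
  discoverUrls?: boolean;    // need to find URLs on a site first
  webSearch?: boolean;       // open-ended question about the web
  structuredData?: boolean;  // want JSON matching a schema
  wholeSite?: boolean;       // analyze a whole site or section
  deepResearch?: boolean;    // multi-source, in-depth analysis
};

function chooseTool(task: Task): string {
  if (task.deepResearch) return "deep_research";
  if (task.structuredData) return "extract";
  if (task.knownUrls === 1) return "scrape";
  if ((task.knownUrls ?? 0) > 1) return "batch_scrape";
  if (task.discoverUrls) return "map";
  if (task.wholeSite) return "crawl";
  return "search"; // when in doubt, search the web first
}

console.log(chooseTool({ knownUrls: 3 })); // "batch_scrape"
```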
## Available Tools

### 1. Scrape Tool (`firecrawl_scrape`)

Scrape content from a single URL with advanced options.

**Best for:**
- Single-page content extraction, when you know exactly which page contains the information.

**Not recommended for:**
- Extracting content from multiple pages (use batch_scrape for known URLs, map + batch_scrape to discover URLs first, or crawl for full page content)
- When you're unsure which page contains the information (use search)
- When you need structured data (use extract)

**Common mistakes:**
- Using scrape for a list of URLs (use batch_scrape instead).

**Prompt Example:**
> "Get the content of the page at https://example.com."

**Usage Example:**

```json
{
  "name": "firecrawl_scrape",
  ...
}
```

**Returns:**
- Markdown, HTML, or other formats as specified.
### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)

Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.

**Best for:**
- Retrieving content from multiple pages, when you know exactly which pages to scrape.

**Not recommended for:**
- Discovering URLs (use map first if you don't know the URLs)
- Scraping a single page (use scrape)

**Common mistakes:**
- Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)

**Prompt Example:**
> "Get the content of these three blog posts: [url1, url2, url3]."

**Usage Example:**

```json
{
  "name": "firecrawl_batch_scrape",
  ...
}
```

**Returns:**
- Response includes operation ID for status checking:

```json
{
  ...
}
```

### 3. Check Batch Status (`firecrawl_check_batch_status`)

Check the status of a batch operation.

```json
{
  ...
}
```
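To avoid the rate-limit and token-overflow pitfall noted above, a large URL list can be split into several smaller batch_scrape calls. A TypeScript sketch of that chunking (the chunk size of 10 is an illustrative assumption, not a documented limit):

```typescript
// Split a large URL list into smaller chunks; each chunk becomes the
// "urls" argument of one firecrawl_batch_scrape request.
// The default size of 10 is an illustrative choice, not a documented limit.
function chunkUrls(urls: string[], size = 10): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

const urls = Array.from({ length: 25 }, (_, i) => `https://example.com/post/${i}`);
console.log(chunkUrls(urls, 10).length); // 3 chunks: 10 + 10 + 5 URLs
```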
368428

369-
### 4. Search Tool (`firecrawl_search`)
429+
### 4. Map Tool (`firecrawl_map`)
430+
431+
Map a website to discover all indexed URLs on the site.
432+
433+
**Best for:**
434+
- Discovering URLs on a website before deciding what to scrape
435+
- Finding specific sections of a website
436+
437+
**Not recommended for:**
438+
- When you already know which specific URL you need (use scrape or batch_scrape)
439+
- When you need the content of the pages (use scrape after mapping)
440+
441+
**Common mistakes:**
442+
- Using crawl to discover URLs instead of map
443+
444+
**Prompt Example:**
445+
> "List all URLs on example.com."
446+
447+
**Usage Example:**
448+
```json
449+
{
450+
"name": "firecrawl_map",
451+
"arguments": {
452+
"url": "https://example.com"
453+
}
454+
}
455+
```
456+
457+
**Returns:**
458+
- Array of URLs found on the site
459+
460+
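A common follow-up to map is narrowing the returned URL array to just the section you care about before passing it to batch_scrape. A TypeScript sketch (the `/blog` prefix is only an example, not a required convention):

```typescript
// Keep only mapped URLs whose path falls under a given section,
// e.g. the blog, before handing them to batch_scrape.
function filterBySection(urls: string[], prefix: string): string[] {
  return urls.filter((u) => {
    try {
      return new URL(u).pathname.startsWith(prefix);
    } catch {
      return false; // skip malformed entries defensively
    }
  });
}

const mapped = [
  "https://example.com/blog/intro",
  "https://example.com/pricing",
  "https://example.com/blog/deep-dive",
];
console.log(filterBySection(mapped, "/blog").length); // 2
```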
### 5. Search Tool (`firecrawl_search`)

Search the web and optionally extract content from search results.

**Best for:**
- Finding specific information across multiple websites, when you don't know which website has the information.
- When you need the most relevant content for a query

**Not recommended for:**
- When you already know which website to scrape (use scrape)
- When you need comprehensive coverage of a single website (use map or crawl)

**Common mistakes:**
- Using crawl or map for open-ended questions (use search instead)

**Prompt Example:**
> "Find the latest research papers on AI published in 2023."

**Usage Example:**

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "latest AI research papers 2023",
    "limit": 5,
    "lang": "en",
    "country": "us",
    ...
  }
}
```

**Returns:**
- Array of search results (with optional scraped content)
### 6. Crawl Tool (`firecrawl_crawl`)

Start an asynchronous crawl job on a website and extract content from all pages.

**Best for:**
- Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:**
- Extracting content from a single page (use scrape)
- When token limits are a concern (use map + batch_scrape)
- When you need fast results (crawling can be slow)

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:**
- Setting limit or maxDepth too high (causes token overflow)
- Using crawl for a single page (use scrape instead)

**Prompt Example:**
> "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDepth": 2,
    "limit": 100,
    "allowExternalLinks": false,
    ...
  }
}
```

**Returns:**
- Response includes operation ID for status checking:

```json
{
  "content": [
    {
      "type": "text",
      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
    }
  ],
  "isError": false
}
```
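The job ID needed for status checks can be pulled out of the confirmation text shown above. A TypeScript sketch (the message format is copied from the example response; treat the parsing as illustrative, not a guaranteed contract):

```typescript
// Extract the UUID job ID from a crawl confirmation message, e.g.
// "Started crawl for: https://example.com/* with job ID: 550e8400-... ."
function extractJobId(message: string): string | null {
  const match = message.match(
    /job ID:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i
  );
  return match ? match[1] : null;
}

const text =
  "Started crawl for: https://example.com/* with job ID: " +
  "550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress.";
console.log(extractJobId(text)); // "550e8400-e29b-41d4-a716-446655440000"
```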
### 7. Check Crawl Status (`firecrawl_check_crawl_status`)

Check the status of a crawl job.

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

**Returns:**
- Response includes the status of the crawl job
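Because crawl jobs finish asynchronously, clients typically poll this status tool. One reasonable pattern is capped exponential backoff between polls; a TypeScript sketch of such a schedule (the 1 s base and 30 s cap are arbitrary assumptions, not documented values):

```typescript
// Capped exponential backoff delays (in ms) for polling a crawl job's status.
// Base of 1s and cap of 30s are illustrative choices, not documented values.
function backoffDelays(attempts: number, baseMs = 1000, capMs = 30000): number[] {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, capMs)
  );
}

console.log(backoffDelays(6)); // [1000, 2000, 4000, 8000, 16000, 30000]
```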
### 8. Extract Tool (`firecrawl_extract`)

Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

**Best for:**
- Extracting specific structured data like prices, names, details.

**Not recommended for:**
- When you need the full content of a page (use scrape)
- When you're not looking for specific structured data

**Arguments:**
- `urls`: Array of URLs to extract information from
- `prompt`: Custom prompt for the LLM extraction
- `systemPrompt`: System prompt to guide the LLM
- `schema`: JSON schema for structured data extraction
- `allowExternalLinks`: Allow extraction from external links
- `enableWebSearch`: Enable web search for additional context
- `includeSubdomains`: Include subdomains in extraction

When using a self-hosted instance, the extraction will use your configured LLM. With the cloud API, it uses Firecrawl's managed LLM service.

**Prompt Example:**
> "Extract the product name, price, and description from these product pages."

**Usage Example:**

```json
{
  "name": "firecrawl_extract",
  ...
}
```

**Returns:**
- Extracted structured data as defined by your schema

```json
{
  ...
}
```
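Since extract returns JSON shaped by your schema, it can help to type and sanity-check the result on the client. A TypeScript sketch for the product fields named in the prompt example above (the field names come from that example; the guard is a hypothetical client-side check, not part of the tool):

```typescript
// Client-side shape check for an extraction result with the fields
// from the prompt example: product name, price, and description.
interface Product {
  name: string;
  price: string;
  description: string;
}

function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "string" &&
    typeof v.description === "string"
  );
}

const extracted: unknown = {
  name: "Example Widget",
  price: "$19.99",
  description: "A sample product.",
};
console.log(isProduct(extracted)); // true
```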

### 9. Deep Research Tool (`firecrawl_deep_research`)

Conduct deep web research on a query using intelligent crawling, search, and LLM analysis.

**Best for:**
- Complex research questions requiring multiple sources and in-depth analysis.

**Not recommended for:**
- Simple questions that can be answered with a single search
- When you need very specific information from a known page (use scrape)
- When you need results quickly (deep research can take time)

**Arguments:**
- `query` (string, required): The research question or topic to explore.
- `maxDepth` (number, optional): Maximum recursive depth for crawling/search (default: 3).
- `timeLimit` (number, optional): Time limit in seconds for the research session (default: 120).
- `maxUrls` (number, optional): Maximum number of URLs to analyze (default: 50).

**Prompt Example:**
> "Research the environmental impact of electric vehicles versus gasoline vehicles."

**Usage Example:**

```json
{
  "name": "firecrawl_deep_research",
  "arguments": {
    "query": "What are the environmental impacts of electric vehicles compared to gasoline vehicles?",
    "maxDepth": 3,
    "timeLimit": 120,
    "maxUrls": 50
  }
}
```

**Returns:**
- Final analysis generated by an LLM based on the research (`data.finalAnalysis`)
- May also include structured activities and sources used in the research process
### 10. Generate LLMs.txt Tool (`firecrawl_generate_llmstxt`)

Generate a standardized llms.txt (and optionally llms-full.txt) file for a given domain. This file defines how large language models should interact with the site.

**Best for:**
- Creating machine-readable permission guidelines for AI models.

**Not recommended for:**
- General content extraction or research

**Arguments:**
- `url` (string, required): The base URL of the website to analyze.
- `maxUrls` (number, optional): Max number of URLs to include (default: 10).
- `showFullText` (boolean, optional): Whether to include llms-full.txt contents in the response.

**Prompt Example:**
> "Generate an LLMs.txt file for example.com."

**Usage Example:**

```json
{
  "name": "firecrawl_generate_llmstxt",
  ...
}
```

**Returns:**
- LLMs.txt file contents and optionally llms-full.txt (`data.llmstxt` and/or `data.llmsfulltxt`)
## Logging System