
Commit 17ed28c

Enhance README.md and index.ts with detailed tool descriptions (#61)
- Added comprehensive descriptions for various tools including `scrape`, `map`, `crawl`, `search`, `extract`, `deep_research`, and `generate_llmstxt`.
- Included best use cases, common mistakes, prompt examples, usage examples, and return values for each tool.
- Updated the `index.ts` file to reflect the new descriptions and ensure consistency with the README.
1 parent f7d5ec7 commit 17ed28c

File tree

2 files changed: +413 −60 lines changed


README.md

Lines changed: 226 additions & 39 deletions
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:

- Smart request queuing and throttling
- Automatic retries for transient errors

## How to Choose a Tool

Use this guide to select the right tool for your task:

- **If you know the exact URL(s) you want:**
  - For one: use **scrape**
  - For many: use **batch_scrape**
- **If you need to discover URLs on a site:** use **map**
- **If you want to search the web for info:** use **search**
- **If you want to extract structured data:** use **extract**
- **If you want to analyze a whole site or section:** use **crawl** (with limits!)
- **If you want to do in-depth research:** use **deep_research**
- **If you want to generate LLMs.txt:** use **generate_llmstxt**

### Quick Reference Table

| Tool             | Best for                            | Returns          |
|------------------|-------------------------------------|------------------|
| scrape           | Single page content                 | markdown/html    |
| batch_scrape     | Multiple known URLs                 | markdown/html[]  |
| map              | Discovering URLs on a site          | URL[]            |
| crawl            | Multi-page extraction (with limits) | markdown/html[]  |
| search           | Web search for info                 | results[]        |
| extract          | Structured data from pages          | JSON             |
| deep_research    | In-depth, multi-source research     | summary, sources |
| generate_llmstxt | LLMs.txt for a domain               | text             |
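The decision guide above can also be expressed mechanically. A minimal TypeScript sketch of that logic (a hypothetical helper for illustration; it is not part of the server, and the `Task` field names are assumptions):

```typescript
// Hypothetical helper encoding the tool-selection guide above.
type Task = {
  knownUrls?: number;        // how many exact URLs you already have
  discoverUrls?: boolean;    // need to find URLs on a site first
  webSearch?: boolean;       // open-ended question about the web
  structuredData?: boolean;  // want JSON matching a schema
  wholeSite?: boolean;       // analyze a whole site or section
  deepResearch?: boolean;    // multi-source, in-depth analysis
};

function chooseTool(task: Task): string {
  if (task.deepResearch) return "deep_research";
  if (task.structuredData) return "extract";
  if (task.knownUrls === 1) return "scrape";
  if ((task.knownUrls ?? 0) > 1) return "batch_scrape";
  if (task.discoverUrls) return "map";
  if (task.wholeSite) return "crawl";
  return "search"; // when in doubt, search the web first
}

console.log(chooseTool({ knownUrls: 3 })); // "batch_scrape"
```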
## Available Tools

### 1. Scrape Tool (`firecrawl_scrape`)

Scrape content from a single URL with advanced options.

**Best for:**
- Single-page content extraction, when you know exactly which page contains the information.

**Not recommended for:**
- Extracting content from multiple pages (use batch_scrape for known URLs, map + batch_scrape to discover URLs first, or crawl for full page content)
- When you're unsure which page contains the information (use search)
- When you need structured data (use extract)

**Common mistakes:**
- Using scrape for a list of URLs (use batch_scrape instead).

**Prompt Example:**
> "Get the content of the page at https://example.com."

**Usage Example:**

```json
{
  "name": "firecrawl_scrape",
  ...
}
```

**Returns:**
- Markdown, HTML, or other formats as specified.
### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)

Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.

**Best for:**
- Retrieving content from multiple pages, when you know exactly which pages to scrape.

**Not recommended for:**
- Discovering URLs (use map first if you don't know the URLs)
- Scraping a single page (use scrape)

**Common mistakes:**
- Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)

**Prompt Example:**
> "Get the content of these three blog posts: [url1, url2, url3]."

**Usage Example:**

```json
{
  "name": "firecrawl_batch_scrape",
  ...
}
```

**Returns:**
- Response includes operation ID for status checking:

```json
{
  ...
}
```

### 3. Check Batch Status (`firecrawl_check_batch_status`)

Check the status of a batch operation.

```json
{
  ...
}
```
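To avoid the rate-limit and token-overflow pitfall noted above, a large URL list can be split into several smaller batch_scrape calls. A TypeScript sketch of that chunking (the chunk size of 10 is an illustrative assumption, not a documented limit):

```typescript
// Split a large URL list into smaller chunks; each chunk becomes the
// "urls" argument of one firecrawl_batch_scrape request.
// The default size of 10 is an illustrative choice, not a documented limit.
function chunkUrls(urls: string[], size = 10): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

const urls = Array.from({ length: 25 }, (_, i) => `https://example.com/post/${i}`);
console.log(chunkUrls(urls, 10).length); // 3 chunks: 10 + 10 + 5 URLs
```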
368428

369-
### 4. Search Tool (`firecrawl_search`)
429+
### 4. Map Tool (`firecrawl_map`)
430+
431+
Map a website to discover all indexed URLs on the site.
432+
433+
**Best for:**
434+
- Discovering URLs on a website before deciding what to scrape
435+
- Finding specific sections of a website
436+
437+
**Not recommended for:**
438+
- When you already know which specific URL you need (use scrape or batch_scrape)
439+
- When you need the content of the pages (use scrape after mapping)
440+
441+
**Common mistakes:**
442+
- Using crawl to discover URLs instead of map
443+
444+
**Prompt Example:**
445+
> "List all URLs on example.com."
446+
447+
**Usage Example:**
448+
```json
449+
{
450+
"name": "firecrawl_map",
451+
"arguments": {
452+
"url": "https://example.com"
453+
}
454+
}
455+
```
456+
457+
**Returns:**
458+
- Array of URLs found on the site
459+
460+
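A common follow-up to map is narrowing the returned URL array to just the section you care about before passing it to batch_scrape. A TypeScript sketch (the `/blog` prefix is only an example, not a required convention):

```typescript
// Keep only mapped URLs whose path falls under a given section,
// e.g. the blog, before handing them to batch_scrape.
function filterBySection(urls: string[], prefix: string): string[] {
  return urls.filter((u) => {
    try {
      return new URL(u).pathname.startsWith(prefix);
    } catch {
      return false; // skip malformed entries defensively
    }
  });
}

const mapped = [
  "https://example.com/blog/intro",
  "https://example.com/pricing",
  "https://example.com/blog/deep-dive",
];
console.log(filterBySection(mapped, "/blog").length); // 2
```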
### 5. Search Tool (`firecrawl_search`)

Search the web and optionally extract content from search results.

**Best for:**
- Finding specific information across multiple websites, when you don't know which website has the information.
- When you need the most relevant content for a query

**Not recommended for:**
- When you already know which website to scrape (use scrape)
- When you need comprehensive coverage of a single website (use map or crawl)

**Common mistakes:**
- Using crawl or map for open-ended questions (use search instead)

**Prompt Example:**
> "Find the latest research papers on AI published in 2023."

**Usage Example:**

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "latest AI research papers 2023",
    "limit": 5,
    "lang": "en",
    "country": "us",
    ...
  }
}
```

**Returns:**
- Array of search results (with optional scraped content)
### 6. Crawl Tool (`firecrawl_crawl`)

Start an asynchronous crawl job on a website and extract content from all pages.

**Best for:**
- Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:**
- Extracting content from a single page (use scrape)
- When token limits are a concern (use map + batch_scrape)
- When you need fast results (crawling can be slow)

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:**
- Setting limit or maxDepth too high (causes token overflow)
- Using crawl for a single page (use scrape instead)

**Prompt Example:**
> "Get all blog posts from the first two levels of example.com/blog."

**Usage Example:**

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDepth": 2,
    "limit": 100,
    "allowExternalLinks": false,
    ...
  }
}
```

**Returns:**
- Response includes operation ID for status checking:

```json
{
  "content": [
    {
      "type": "text",
      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
    }
  ],
  "isError": false
}
```
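The job ID needed for status checks can be pulled out of the confirmation text shown above. A TypeScript sketch (the message format is copied from the example response; treat the parsing as illustrative, not a guaranteed contract):

```typescript
// Extract the UUID job ID from a crawl confirmation message, e.g.
// "Started crawl for: https://example.com/* with job ID: 550e8400-... ."
function extractJobId(message: string): string | null {
  const match = message.match(
    /job ID:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i
  );
  return match ? match[1] : null;
}

const text =
  "Started crawl for: https://example.com/* with job ID: " +
  "550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress.";
console.log(extractJobId(text)); // "550e8400-e29b-41d4-a716-446655440000"
```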
### 7. Check Crawl Status (`firecrawl_check_crawl_status`)

Check the status of a crawl job.

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

**Returns:**
- Response includes the status of the crawl job
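Because crawl jobs finish asynchronously, clients typically poll this status tool. One reasonable pattern is capped exponential backoff between polls; a TypeScript sketch of such a schedule (the 1 s base and 30 s cap are arbitrary assumptions, not documented values):

```typescript
// Capped exponential backoff delays (in ms) for polling a crawl job's status.
// Base of 1s and cap of 30s are illustrative choices, not documented values.
function backoffDelays(attempts: number, baseMs = 1000, capMs = 30000): number[] {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, capMs)
  );
}

console.log(backoffDelays(6)); // [1000, 2000, 4000, 8000, 16000, 30000]
```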
### 8. Extract Tool (`firecrawl_extract`)

Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

**Best for:**
- Extracting specific structured data like prices, names, details.

**Not recommended for:**
- When you need the full content of a page (use scrape)
- When you're not looking for specific structured data

**Arguments:**
- `urls`: Array of URLs to extract information from
- `prompt`: Custom prompt for the LLM extraction
- `systemPrompt`: System prompt to guide the LLM
- `schema`: JSON schema for structured data extraction
- `allowExternalLinks`: Allow extraction from external links
- `enableWebSearch`: Enable web search for additional context
- `includeSubdomains`: Include subdomains in extraction

When using a self-hosted instance, the extraction will use your configured LLM. With the cloud API, it uses Firecrawl's managed LLM service.

**Prompt Example:**
> "Extract the product name, price, and description from these product pages."

**Usage Example:**

```json
{
  "name": "firecrawl_extract",
  ...
}
```

**Returns:**
- Extracted structured data as defined by your schema

```json
{
  ...
}
```
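Since extract returns JSON shaped by your schema, it can help to type and sanity-check the result on the client. A TypeScript sketch for the product fields named in the prompt example above (the field names come from that example; the guard is a hypothetical client-side check, not part of the tool):

```typescript
// Client-side shape check for an extraction result with the fields
// from the prompt example: product name, price, and description.
interface Product {
  name: string;
  price: string;
  description: string;
}

function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "string" &&
    typeof v.description === "string"
  );
}

const extracted: unknown = {
  name: "Example Widget",
  price: "$19.99",
  description: "A sample product.",
};
console.log(isProduct(extracted)); // true
```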

### 9. Deep Research Tool (`firecrawl_deep_research`)

Conduct deep web research on a query using intelligent crawling, search, and LLM analysis.

**Best for:**
- Complex research questions requiring multiple sources and in-depth analysis.

**Not recommended for:**
- Simple questions that can be answered with a single search
- When you need very specific information from a known page (use scrape)
- When you need results quickly (deep research can take time)

**Arguments:**
- `query` (string, required): The research question or topic to explore.
- `maxDepth` (number, optional): Maximum recursive depth for crawling/search (default: 3).
- `timeLimit` (number, optional): Time limit in seconds for the research session (default: 120).
- `maxUrls` (number, optional): Maximum number of URLs to analyze (default: 50).

**Prompt Example:**
> "Research the environmental impact of electric vehicles versus gasoline vehicles."

**Usage Example:**

```json
{
  "name": "firecrawl_deep_research",
  "arguments": {
    "query": "What are the environmental impacts of electric vehicles compared to gasoline vehicles?",
    "maxDepth": 3,
    "timeLimit": 120,
    "maxUrls": 50
  }
}
```

**Returns:**
- Final analysis generated by an LLM based on the research (`data.finalAnalysis`)
- May also include structured activities and sources used in the research process
### 10. Generate LLMs.txt Tool (`firecrawl_generate_llmstxt`)

Generate a standardized llms.txt (and optionally llms-full.txt) file for a given domain. This file defines how large language models should interact with the site.

**Best for:**
- Creating machine-readable permission guidelines for AI models.

**Not recommended for:**
- General content extraction or research

**Arguments:**
- `url` (string, required): The base URL of the website to analyze.
- `maxUrls` (number, optional): Max number of URLs to include (default: 10).
- `showFullText` (boolean, optional): Whether to include llms-full.txt contents in the response.

**Prompt Example:**
> "Generate an LLMs.txt file for example.com."

**Usage Example:**

```json
{
  "name": "firecrawl_generate_llmstxt",
  ...
}
```

**Returns:**
- LLMs.txt file contents and optionally llms-full.txt (`data.llmstxt` and/or `data.llmsfulltxt`)
## Logging System