[Self-Host] Screenshots are not supported #1028
Same issue; it seems to be caused by 'screenshot'.
Including a list of actions also causes it to fail. Here are the logs from my docker compose service:
My guess would be that the issue stems from here: Playwright should technically support screenshots, but maybe it requires additional configuration?
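(For reference, Playwright's own screenshot API does work out of the box; here's a minimal standalone check, independent of any Firecrawl configuration, so the problem is likely in the engine wiring rather than Playwright itself.)

```typescript
import { chromium } from "playwright";

// Minimal standalone check that Playwright can capture a screenshot,
// independent of any Firecrawl configuration.
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto("https://example.com");
const buffer = await page.screenshot({ fullPage: true }); // raw PNG bytes
console.log(`screenshot captured: ${buffer.length} bytes`);
await browser.close();
```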
Same issue here; has anyone found a workaround?
After some more testing, it seems to work when called like this:

```bash
curl -X POST http://firecrawl-api:3002/v1/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "formats": ["html", "markdown"]
  }'
```

But this fails:

```bash
curl -X POST http://firecrawl-api:3002/v1/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "formats": ["html", "markdown"],
    "location": {
      "country": "us",
      "languages": ["en"]
    }
  }'
# {"success":false,"error":"(Internal server error) - All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected]."}
```
After some investigation of the codebase, it looks like some things just aren't supported by the Puppeteer engine, such as screenshots and, I guess, location. The options seem to be to either use Firecrawl directly, or pay for ScrapingBee and use that engine.
Hi there! Here's a list of the options that aren't supported by the self-hosted version:

Some options that require extra configuration:

And that should be about it! Now that we have a test suite running on the self-hosted version too, you can look at
So for hosting screenshots I would personally do it like this (this is without looking at the codebase, so I could be way off base here):
That's a good idea. Thinking about whether we can roll how we do it in prod together with this, so there wouldn't be too much code divergence. Local storage is tough since most self-hosted environments are dockerized; we would probably need to do static file serving on the Playwright service, which means the Playwright service would have to be exposed in order to access the screenshots.
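(For illustration, the static file serving in question could be as small as the sketch below; the directory, port, and setup here are hypothetical, not existing Firecrawl code.)

```typescript
import express from "express";

// Hypothetical: serve screenshots saved by the Playwright service from a
// local directory. Only reachable if this port is exposed from the container.
const app = express();
app.use("/screenshots", express.static("/app/screenshots"));
app.listen(3003, () => console.log("serving screenshots on :3003"));
```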
Maybe avoiding a local storage option right now would be good. If people really want screenshots, I don't think asking them to use an S3-compatible service is too much to ask. For the self-hosted docker-compose file you could even include MinIO by default to make things especially easy. Then I'd probably add a route for it (rough sketch below). Granted, this also depends on how you manage file storage internally, but I'd be pretty surprised if you aren't using some form of object storage. Let me know if you want any help with this; I'd be happy to look into making a PR myself.
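(To make that concrete, here's a rough, untested sketch of such an upload helper using `@aws-sdk/client-s3` against a bundled MinIO container. Every env variable, bucket, and function name below is made up for illustration.)

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { randomUUID } from "crypto";

// Hypothetical config: the MINIO_* env vars are illustrative, not part of Firecrawl.
const s3 = new S3Client({
  endpoint: process.env.MINIO_ENDPOINT ?? "http://minio:9000",
  region: "us-east-1",
  forcePathStyle: true, // MinIO uses path-style bucket addressing
  credentials: {
    accessKeyId: process.env.MINIO_ACCESS_KEY ?? "",
    secretAccessKey: process.env.MINIO_SECRET_KEY ?? "",
  },
});

// Upload a screenshot buffer and return the object URL.
export async function uploadScreenshot(buffer: Buffer): Promise<string> {
  const bucket = process.env.MINIO_BUCKET ?? "media";
  const key = `screenshots/${randomUUID()}.png`;
  await s3.send(
    new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: buffer,
      ContentType: "image/png",
    })
  );
  return `${process.env.MINIO_ENDPOINT ?? "http://minio:9000"}/${bucket}/${key}`;
}
```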
On the prod side, the screenshot management is handled either by
This sounds great. Going to look into it!
Hmm... I would prefer to try to expose MinIO and point to it, but it might have to come to this.
Would love a PR! Even if it's just a draft of the actual upload logic in the above file, I can connect the other bits up afterwards.
Based on the above state of the screenshot upload function, does it not work to just add Supabase ENVs and create a "media" bucket in your Supabase instance to receive a copy of the screenshot output? Obviously this is not as complete a solution for the self-host case as running a simple MinIO container, but Firecrawl has already opened the door by making Supabase part of the local Docker config for other purposes. For my specific use case this would be perfect, since the final resting place for the scraped screenshots was going to be a bucket in our self-hosted Supabase stack anyway.
If you have Supa set up, you should just be able to point Firecrawl at it and have it work fine. |
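(For reference, the upload itself would only be a few lines with `supabase-js`; the bucket name and env variables below are assumptions for illustration, not Firecrawl's actual config.)

```typescript
import { createClient } from "@supabase/supabase-js";

// Assumed env vars; the real Firecrawl configuration may differ.
const supabase = createClient(
  process.env.SUPABASE_URL ?? "",
  process.env.SUPABASE_SERVICE_TOKEN ?? ""
);

// Upload a PNG buffer into a "media" bucket and return its public URL.
export async function uploadToSupabase(buffer: Buffer, key: string): Promise<string> {
  const { error } = await supabase.storage
    .from("media")
    .upload(key, buffer, { contentType: "image/png", upsert: true });
  if (error) throw error;
  return supabase.storage.from("media").getPublicUrl(key).data.publicUrl;
}
```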
I attempted this, but I think there are some missing steps or docs for this setup. I set up my Supabase anon key, URL, and service key, turned on DB auth, and set a TEST_API_KEY, but I just get 401 unauthorized errors from my Supabase instance (local dev via the Supabase CLI). TBH I didn't think this would work: looking through the rest of the source code for the Supabase client logic, there seems to be some schema/migration or other setup needed, and I don't see any migration actions happening in the docker compose logs when the ENVs are set. Am I missing something here?
FYI, created a draft PR for this: #1372 |
How about base64-encoding the result and returning it as part of the response?
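(That would avoid external storage entirely. A sketch of what the handler could return, assuming `screenshotBuffer` holds the raw PNG bytes; the names here are hypothetical.)

```typescript
// Hypothetical sketch: inline the screenshot as a data URL in the response,
// assuming `screenshotBuffer` holds the raw PNG bytes from the browser engine.
function buildResponse(screenshotBuffer: Buffer) {
  return {
    success: true,
    data: {
      screenshot: `data:image/png;base64,${screenshotBuffer.toString("base64")}`,
    },
  };
}
```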
I just removed "location" from the options, and it works. Thank you so much.
To Reproduce
Steps to reproduce the issue:
1. Keep the system config at the defaults.
2. Run this code.

Client error log:

```
{'success': False, 'error': "(Internal server error) - All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected]."}
```

Server error log:
Environment (please complete the following information):