
Pass options to Cheerio? #484

Closed
gmhenderson opened this issue Feb 22, 2022 · 5 comments

@gmhenderson

I recently needed to set a specific Cheerio option (scriptingEnabled: false), but website-scraper currently offers no way to pass configuration options through to Cheerio. Does it make sense to add a config option that is passed along to Cheerio?

@s0ph1e
Member

s0ph1e commented Mar 30, 2022

Hi @gmhenderson 👋

Sorry for the late response.

Could you please share an example use case where this would be needed?
I haven't needed this before, and I'm not sure about a config option for Cheerio: making Cheerio more configurable would make it easier to break things, because website-scraper will not work as expected without the options it currently passes to Cheerio.
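One way to address that concern, sketched here under assumptions: accept user-supplied options, but spread the options website-scraper depends on last so they cannot be overridden. `mergeCheerioOptions` is a hypothetical helper, not part of website-scraper or Cheerio, and the thread does not list which options website-scraper actually passes, so `xmlMode` below is only a placeholder for "an option the scraper must keep fixed".

```javascript
// Hypothetical helper, not part of website-scraper or Cheerio.
function mergeCheerioOptions(userOptions = {}, required = {}) {
  const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  // Collect user keys that conflict with a required value and will be dropped
  // (useful for logging a warning).
  const ignored = Object.keys(userOptions).filter(
    (key) => hasOwn(required, key) && userOptions[key] !== required[key]
  );
  // Required options are spread last, so they always win.
  return { options: { ...userOptions, ...required }, ignored };
}

const { options, ignored } = mergeCheerioOptions(
  { scriptingEnabled: false, xmlMode: true },
  { xmlMode: false } // placeholder for an option the scraper must keep fixed
);
console.log(options); // { scriptingEnabled: false, xmlMode: false }
console.log(ignored); // [ 'xmlMode' ]
```

With this design, options such as scriptingEnabled pass through untouched, while a conflicting value for a locked option is reported and then ignored.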

@gmhenderson
Author

gmhenderson commented Mar 31, 2022

Hi @s0ph1e, thank you for the response.

I am using this tool to create static HTML versions of CMS-powered websites that I have built. These websites load their CSS assets via JavaScript (rather than with a <link> tag), with no-JS fallbacks contained within <noscript> tags. With scriptingEnabled: true, the contents of <noscript> tags are treated as raw text rather than parsed as HTML elements, so the fallback resource URLs are not scraped. One might expect Cheerio's default scriptingEnabled value to be false, but it is not (see here), so I needed a way to set it.

As a workaround, I was about to fork website-scraper and hardcode scriptingEnabled: false. Here's where I made the change:

lib/resource-handler/html/index.js line 84:

```javascript
return cheerio.load(text, { scriptingEnabled: false });
```
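A sketch of how this hardcoded change could become configurable, assuming a hypothetical `cheerioOptions` setting (website-scraper does not expose one in this thread). `createHtmlLoader` is an illustrative name, not an existing API, and the Cheerio `load` function is passed in so the sketch can be exercised without Cheerio installed.

```javascript
/**
 * Hypothetical sketch: builds the function the HTML resource handler could
 * call instead of a hardcoded cheerio.load(text, { scriptingEnabled: false }).
 *
 * @param {(text: string, options: object) => unknown} load - e.g. cheerio.load
 * @param {object} [cheerioOptions] - hypothetical user-facing config option
 */
function createHtmlLoader(load, cheerioOptions = {}) {
  // Copy once so later changes to the caller's object do not affect the loader.
  const options = { ...cheerioOptions };
  return (text) => load(text, options);
}

// Usage with Cheerio (assumes the `cheerio` package is installed):
//   const cheerio = require('cheerio');
//   const loadHtml = createHtmlLoader(
//     (text, opts) => cheerio.load(text, opts),
//     { scriptingEnabled: false } // parse <noscript> contents as HTML
//   );
//   const $ = loadHtml(html);

// Stand-in loader that records its arguments, for illustration only.
const calls = [];
const fakeLoad = (text, opts) => {
  calls.push({ text, opts });
  return opts;
};
const loadHtml = createHtmlLoader(fakeLoad, { scriptingEnabled: false });
loadHtml('<p>hi</p>');
console.log(calls[0].opts); // { scriptingEnabled: false }
```

Leaving `cheerioOptions` unset passes an empty options object, so Cheerio keeps its own defaults and existing setups behave as before.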

@s0ph1e
Member

s0ph1e commented Apr 1, 2022

Thanks for sharing @gmhenderson 👍

Yep, it makes sense to have the content inside <noscript> parsed. I'll leave the issue open and think about a proper solution.
Maybe it makes sense to always use { scriptingEnabled: false } 🤔

@gmhenderson
Author

I think always using { scriptingEnabled: false } makes sense, since website-scraper does not execute JavaScript, but it could break some existing projects.

gmhenderson pushed a commit to rocket-media/node-website-scraper that referenced this issue Jun 13, 2022
@stale

stale bot commented Nov 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 16, 2022
stale bot closed this as completed Nov 24, 2022