[Feature request] ability to set localStorage/sessionStorage w/o loading a page from target domain #3692
@bluepeter why don't you use Chrome profiles for this?
Thanks @aslushnikov! I'm not super familiar with them. We aren't guaranteed to be re-using the same Lambda container on subsequent crawls, so we need to export and re-import cookies plus local/sessionStorage for each crawl. Do you think that's possible with Chrome profiles?
@bluepeter it should. What I mean is a two-step process:
Thanks @aslushnikov... so it sounds like that would require launching Chrome each time? There's overhead to that, but it may be unavoidable (save for our current solution of going to, e.g.,
@bluepeter I think you can launch it once and then open multiple pages. But yes, this might be suboptimal.
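Another way of doing this is to use request interception to load a dummy page on the correct security origin, and then use that page to pre-set cookies and local storage:
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Answer every request locally with a dummy body: nothing is fetched
  // from the real site, yet the page ends up on the https://pptr.dev origin.
  await page.setRequestInterception(true);
  page.on('request', r => {
    r.respond({
      status: 200,
      contentType: 'text/plain',
      body: 'tweak me.',
    });
  });
  await page.goto('https://pptr.dev');
  // Use page to set up cookies and local storage for pptr.dev
  // ...
  await browser.close();
})();
```
This should be bulletproof compared to loading favicon.ico or robots.txt.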
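The interception idea can be packaged as a reusable helper. It removes the handler and turns interception back off once storage is written, so the real navigation that follows reaches the network again.

This is a sketch under assumptions, not code from this thread:

- The helper name `seedStorageViaInterception` is hypothetical.
- The saved-state shape `{ localStorage, sessionStorage }` is hypothetical.
- The helper expects an already-created Puppeteer `page`.

```javascript
/**
 * Seed Web Storage for `origin` without fetching anything from it.
 * `page` is a Puppeteer Page. `state` uses a hypothetical saved format:
 *   { localStorage: { key: value }, sessionStorage: { key: value } }
 */
async function seedStorageViaInterception(page, origin, state) {
  // Fulfil every request locally so the target site is never contacted.
  const fulfil = request =>
    request.respond({
      status: 200,
      contentType: 'text/html',
      body: '<!doctype html><title>seed</title>',
    });

  await page.setRequestInterception(true);
  page.on('request', fulfil);
  try {
    // The dummy document still commits on `origin`, so its storage is writable.
    await page.goto(origin);
    await page.evaluate(saved => {
      for (const [key, value] of Object.entries(saved.localStorage || {})) {
        localStorage.setItem(key, value);
      }
      for (const [key, value] of Object.entries(saved.sessionStorage || {})) {
        sessionStorage.setItem(key, value);
      }
    }, state);
  } finally {
    // Restore normal networking before the real navigation.
    page.off('request', fulfil);
    await page.setRequestInterception(false);
  }
}
```

A later `await page.goto(targetUrl)` on the same origin should then see the seeded entries. sessionStorage is scoped to the tab, so the seeding step and the real crawl need to use the same `page`.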
Thanks... we will try this approach and report back in this issue!
@aslushnikov great recommendation! This approach is working nicely for us. And as you note, it's a lot more bulletproof than hitting
Just an FYI: I had success using `page.evaluateOnNewDocument` to solve the above problem.
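The following is a minimal sketch of the `page.evaluateOnNewDocument` approach. The injected function is registered before navigating and runs before the page's own scripts, so no extra page load is needed.

It rests on a few assumptions:

- The saved state is kept as plain `{ localStorage, sessionStorage }` objects, a hypothetical format.
- The helper names are illustrative.
- The extra `win` parameter is only there so the injected function can be exercised outside a browser.

```javascript
/**
 * Register a script that writes saved Web Storage entries for `origin`
 * before any of the page's own scripts run on each new document.
 * `page` is a Puppeteer Page; the `{ localStorage, sessionStorage }`
 * shape of `state` is a hypothetical format for the saved data.
 */
function restoreStorageOnNewDocument(page, origin, state) {
  return page.evaluateOnNewDocument(injectStorage, origin, state);
}

// Serialized and run inside the browser. `win` defaults to the page's
// global object; it is a parameter only so this can run without a browser.
function injectStorage(origin, state, win = globalThis) {
  // The script runs in every frame and on every navigation, so only
  // touch storage that belongs to the intended origin.
  if (win.location.origin !== origin) return;
  for (const [key, value] of Object.entries(state.localStorage || {})) {
    win.localStorage.setItem(key, value);
  }
  for (const [key, value] of Object.entries(state.sessionStorage || {})) {
    win.sessionStorage.setItem(key, value);
  }
}
```

Call `await restoreStorageOnNewDocument(page, 'https://example.com', saved)` before `page.goto(...)`. Because the script runs on every navigation, it rewrites the saved values each time. That can overwrite changes the site made during the crawl, which may or may not matter for a given site.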
Man, why didn't I see this approach in time? I implemented it using a custom directory per browser, but the disk usage for around a thousand browsers is way too heavy. I was practically using a loop to create a separate directory for each bot so as to maintain Google sessions. I noticed that when I appended the cookies and local storage for other websites I stayed logged in, but Google's website told me to log in again. I could still see the account shown as signed out, which means the cookies were indeed appended, but since the state was polluted it had to be re-established properly. If I try this bulletproof method, I'll drop some feedback here.
We run Puppeteer on AWS Lambda to orchestrate multi-step, multi-page crawl sessions on our SaaS Fluxguard (e.g., log in, go to dashboard, go to page C). Due to time constraints on Lambda, and other reasons, each page is handled by its own Lambda execution in sequence. We save all browser state (cookies, localStorage, sessionStorage) in an object store for reuse by subsequent page crawls.
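Capturing that state at the end of one invocation might look like the following sketch. `page.cookies()` and `page.evaluate()` are Puppeteer `Page` methods. The function names and the shape of the returned object are assumptions for illustration, and uploading the result to the object store is left out.

```javascript
// Runs inside the page via page.evaluate. `win` defaults to the page's
// global object and is a parameter only so this can be tested without a browser.
function readStorage(win = globalThis) {
  // Copy a Storage object into a plain, JSON-serializable object.
  const dump = storage => {
    const out = {};
    for (let i = 0; i < storage.length; i++) {
      const key = storage.key(i);
      out[key] = storage.getItem(key);
    }
    return out;
  };
  return {
    origin: win.location.origin,
    localStorage: dump(win.localStorage),
    sessionStorage: dump(win.sessionStorage),
  };
}

/**
 * Snapshot cookies plus Web Storage for the page's current origin so a
 * later, separate invocation can restore them. `page` is a Puppeteer Page.
 */
async function captureState(page) {
  const cookies = await page.cookies();
  const storage = await page.evaluate(readStorage);
  return { cookies, ...storage };
}
```

Called with no arguments, `page.cookies()` only returns cookies that apply to the page's current URL. Storage is likewise captured for the current origin only.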
The problem arises when we want to re-use saved local/sessionStorage on subsequent crawls. We cannot set local or session storage w/o first loading a page from the target site via, e.g.:
We initially loaded the target page twice: first so that we could set storage, and second, once storage was set, to properly load the page w/ appropriate state. However, this is troublesome: the first load, regardless of whether we disable JavaScript, etc., will often pollute the cookie/storage space with new data. It's also messy to have to load the page twice.
Currently, we try/catch loading "innocuous" pages of the target site, such as `robots.txt` and `favicon.ico`, and then use these to contextually set storage before loading the target page. This is "fine," but not ideal and introduces its own problems.
It would be great if Puppeteer could set storage generally, or for a specific domain, without the need to load a page from that domain first.