Initializing context/content specific fetch defaults #43

Closed
igrigorik opened this issue Apr 7, 2015 · 32 comments

@igrigorik
Member

When initiating a fetch request for a resource, the user agent initializes the fetch with content/context specific settings. For example, when fetching an <img>, the UA applies the appropriate image-src CSP checks; may advertise a different set of request headers (e.g. advertise support for some image formats via the Accept header, and may emit additional "hint" headers); and may set different transport options, such as request priority, dependencies, etc.

How do we enable the same functionality with the fetch() API? Without the above, in the worst case, an image resource fetch initiated via fetch() may violate CSP, may return an incorrect/suboptimal asset due to missing headers, and may be fetched with incorrect priority. Of course, images are just one example; the same logic applies to all other content types. Related discussions:


Intuitively, the difference between using fetch() vs., say, an <img>, is that the latter knows its context and is thus able to initialize all the right checks + properties. Perhaps we can address this whole issue by providing a "treat this fetch as if it was an 'image'" setting or signal? Handwaving...

fetch(url, {as: 'image', headers: {...}, otherOpt: {}, ...})

  • Use as (or whatever fits best) as an explicit context signal that initializes appropriate defaults?
  • headers and other fetch options are then applied on top of initialized defaults as overrides
  • As a last step, UA initializes any missing / not specified headers? E.g. User-Agent, etc.

The mental model here is very similar to what @domenic proposed in #37 (comment):

fetchSettings = mergeMultimaps(asDefaults, optSpecifiedSettings, uaDefaultSettings);
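
One way to read the three bullets above as a precedence order is sketched below. This is only a minimal illustration: the helper name `mergeHeaderDefaults`, the plain-object inputs, and the header values are assumptions, and nothing here is specified by Fetch.

```javascript
// Hypothetical helper: not part of any spec or browser API.
// Precedence: developer options > `as` defaults > UA fill-ins.
function mergeHeaderDefaults(asDefaults, developerHeaders, uaDefaults) {
  const merged = new Map();
  const apply = (source, overwrite) => {
    for (const [name, value] of Object.entries(source)) {
      const key = name.toLowerCase(); // header names are case-insensitive
      if (overwrite || !merged.has(key)) merged.set(key, value);
    }
  };
  apply(asDefaults, true);       // 1. defaults implied by `as`
  apply(developerHeaders, true); // 2. explicit options override those defaults
  apply(uaDefaults, false);      // 3. the UA fills in only what is still missing
  return merged;
}

// Placeholder values, chosen only to show which source wins.
const settings = mergeHeaderDefaults(
  { Accept: "image/*" },
  { accept: "custom/format, image/jpeg" },
  { "User-Agent": "ExampleUA/1.0", Accept: "*/*" }
);
```

Here the developer's Accept value survives, and the UA's own Accept value is ignored because the header is already present by step 3.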

Assuming above sounds plausible, we could then also enable this in other parts of the platform:

  • <img src="photo.jpg" fetch-settings="{...}"> -- implicit: as == image, and fetch-settings can be used to override other UA + fetch defaults.
  • <link rel=preload as="image" href="photo.jpg" fetch-settings="{...}"> -- as is declared explicitly, and options passed in through same mechanism.

WDYT, does it sound crazy?

@annevk
Member

annevk commented Apr 8, 2015

To be clear, <img> and fetch() already share logic. They both invoke fetch. That algorithm takes care of HSTS, CSP, Mixed Content, Referrer Policy, Service Workers, etc. E.g. fetch() ends up as connect-src in CSP.

The problematic case with allowing fetch() to set the CSP policy to something other than connect-src is that you could then use fetch() to bypass CSP. The only way to prevent that would be to taint the response somehow. So that if you do fetch(url, {context: "image"}) the Response object can only be used by <img> and background-image and such.
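
To make the tainting idea concrete, here is a rough model of a response that can only be handed to consumers matching its request context. Everything in it is hypothetical (the class name, the `consumeAs` method, the consumer labels); no browser exposes anything like this, and it is a sketch of the restriction rather than a proposed API.

```javascript
// Hypothetical model only: no browser exposes a class like this.
// Maps a request context to the consumers allowed to read the response.
const ALLOWED_CONSUMERS = new Map([
  ["image", new Set(["img", "background-image"])],
]);

class ContextTaintedResponse {
  #body; // private, so script cannot read the body directly

  constructor(context, body) {
    this.context = context;
    this.#body = body;
  }

  // Hand the body only to a consumer that matches the request context.
  consumeAs(consumer) {
    const allowed = ALLOWED_CONSUMERS.get(this.context);
    if (!allowed || !allowed.has(consumer)) {
      throw new TypeError(
        `response fetched as "${this.context}" cannot be used by "${consumer}"`
      );
    }
    return this.#body;
  }
}

const tainted = new ContextTaintedResponse("image", "<image bytes>");
const forImg = tainted.consumeAs("img"); // allowed: matches the image context

let blocked = false;
try {
  tainted.consumeAs("script"); // would sidestep script-src, so it is refused
} catch (err) {
  blocked = err instanceof TypeError;
}
```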

About initializing defaults. We could do this in fetch, but are engines actually setting these defaults based on an enum in the network layer or are they set in the DOM layer based on the environment? I can see how we could take over handling Accept in fetch, but client hints seems trickier?

(As for header management. The way it works is that feature-specific headers are set by those invoking fetch. So currently e.g. the specification for <img> would have to set Accept and any client hint headers. And fetch adds headers in the network layer for CORS, cookies, etc. and these are therefore not exposed to service workers.)

@igrigorik
Member Author

> The problematic case with allowing fetch() to set the CSP policy to something other than connect-src is that you could then use fetch() to bypass CSP. The only way to prevent that would be to taint the response somehow. So that if you do fetch(url, {context: "image"}) the Response object can only be used by <img> and background-image and such.

That makes sense. In the case of rel=preload you'd get the same enforcement when you try to use the preloaded response within <img>, etc., just as we noted here: w3c/preload#17 (comment).

> About initializing defaults. We could do this in fetch, but are engines actually setting these defaults based on an enum in the network layer or are they set in the DOM layer based on the environment? I can see how we could take over handling Accept in fetch, but client hints seems trickier?

Our goal here is to provide context to the engine where it's currently missing, such that the engine can make more informed decisions. For example, when the engine sees an <img> resource, it will initialize some common defaults (e.g. Accept headers), and it may also use some environment/runtime variables to adjust priority and other settings: lower priority for below the fold images, advertise some hints if those values are known (e.g. resource width), and so on.

By contrast, if you try to fetch() an image resource today, the engine is completely blind and can't do any of the above. So, exposing "as", which can communicate that this is an "image" resource, is already a big step forward. The engine may not have as rich a context to determine all the other plausible optimizations, but our goal here is not to provide feature parity... When you're using fetch() you're opting into "manual control", which is why custom headers and other properties are important -- e.g. I want to override the UA defaults and use fetch to advertise my own hints and other fetch settings.

In short, I think "as" is sufficient as a bootstrapping mechanism to help the UA set some defaults, it doesn't need to provide the exact same behavior.

@annevk
Member

annevk commented Apr 13, 2015

You have to get more specific than "engine" and "some defaults". We actually have to define that in more detail than just a hint since it is all rather observable and user agents will be required to support the same functionality.

@igrigorik
Member Author

Sure, let's make it concrete. The developer provides the destination context via the "as" option or attribute, e.g.:

fetch('/gallery/photo.jpg', {as: 'image'}).then(...)

or

<link rel=preload href=/gallery/photo.jpg as=image>

The user agent needs to...

  1. Enforce CSP based on specified context? /cc @mikewest
  2. Set appropriate HTTP request headers based on context - e.g. Accept, etc.
  3. Set appropriate request priority: for HTTP/1.x this is for internal prioritization; for HTTP/2 this prioritization information is communicated to the server.
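
The three steps above could be sketched as a lookup keyed on the "as" value. This is an illustration only: the CSP directive names are real, but the function name `initializeRequest`, the Accept values, and the priority labels are placeholder assumptions, not what any engine or spec defines.

```javascript
// Illustrative table only. The directive names are real CSP directives;
// the Accept values and priority labels are placeholder assumptions.
const CONTEXT_PROFILES = new Map([
  ["image", { cspDirective: "img-src", accept: "image/*,*/*;q=0.8", priority: "low" }],
  ["script", { cspDirective: "script-src", accept: "*/*", priority: "high" }],
  ["style", { cspDirective: "style-src", accept: "text/css,*/*;q=0.1", priority: "high" }],
]);

// Without `as`, fetch() keeps today's behaviour: connect-src and no
// context-specific hints.
const FALLBACK = { cspDirective: "connect-src", accept: "*/*", priority: "default" };

function initializeRequest(url, { as } = {}) {
  const profile = CONTEXT_PROFILES.get(as) ?? FALLBACK;
  return {
    url,
    cspDirective: profile.cspDirective,  // step 1: which CSP directive applies
    headers: { Accept: profile.accept }, // step 2: context-specific headers
    priority: profile.priority,          // step 3: transport priority class
  };
}

const request = initializeRequest("/gallery/photo.jpg", { as: "image" });
const plain = initializeRequest("/api/data");
```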

I know that in Blink we advertise different headers, and assign different priorities, based on type of resource being fetched. AFAIK, same applies to FF, WebKit, and IE.

@pmeenan @mcmanus @toddreifsteck any thoughts or comments on this one?

@annevk
Member

annevk commented Apr 14, 2015

What I'd like to do is that if we're certain that e.g. Accept is always set based on a request context, we specify that logic as part of Fetch. That way it is clear when Accept is set, whether service workers can observe it, and what kind of values it can have. (That would mean that <img> no longer needs to talk about Accept but can depend on the request context.)

And then go through that for each desired feature so that it's clear what the realm of a request context (e.g. "image") is and what's in the realm of a feature (e.g. <img>).

In the case of fetch(url, {as: "image"}) we'd need to restrict the Response so that it can only be used by <img> and background-image (otherwise CSP is broken). At this point, that would only make such an API useful in service workers (and even that is a bit of a stretch as service workers have their own CSP policy), but perhaps in the future <img> et al can be fed a Response directly.

@igrigorik
Member Author

> What I'd like to do is that if we're certain that e.g. Accept is always set based on a request context, we specify that logic as part of Fetch. That way it is clear when Accept is set, whether service workers can observe it, and what kind of values it can have. (That would mean that <img> no longer needs to talk about Accept but can depend on the request context.)

That makes sense. What's the best way to tackle this? Go through each context and compile a set of context-specific headers and values into a table, or some such? Would it make sense to draft some fetch-spec boilerplate that we can fill in as we go? Might help clarify what we need, etc.

> In the case of fetch(url, {as: "image"}) we'd need to restrict the Response so that it can only be used by <img> and background-image (otherwise CSP is broken). At this point, that would only make such an API useful in service workers (and even that is a bit of a stretch as service workers have their own CSP policy), but perhaps in the future <img> et al can be fed a Response directly.

Not sure I follow the "useful in SW only" - how so? For the <img> case, I'm picturing: fetch() the data, then img.src = window.URL.createObjectURL(responseBlob)... not sure what the right/required security plumbing is to make that work, though.

@annevk
Member

annevk commented Apr 15, 2015

So Accept and Accept-Language are set by pretty much all contexts except for "fetch". EventSource sets a header but it needs to set that itself. (Also, I'm not sure it makes sense to allow fetch(url, {as:"eventsource"}). Perhaps we should limit the allowed values initially.) I could come up with language for Accept and Accept-Language I think.

Re: "useful for SW only". There is no Blob for opaque responses. So we'd need some new API that creates a URL from a Response to make that work in such a way. I was even thinking that it might be nice if that API could take a promise, so you could have:

img.src = createResponseURL(fetch(url, {as:"image"}))

(The other reason why I think we want it for a Response rather than a body of sorts is that then we keep the response headers available.)

@annevk
Member

annevk commented Apr 15, 2015

An alternative approach would be to reuse the media element srcObject API on other objects and make one of the accepted objects a Response. That does seem a little cleaner as using a URL effectively requires another roundtrip through Fetch (and gets a bit ickier with GC). @zcorpan, is srcObject being considered for <img>?

@zcorpan
Member

zcorpan commented Apr 15, 2015

Yes: https://www.w3.org/Bugs/Public/show_bug.cgi?id=23502 (marked as "Needs Impl Interest")

@yoavweiss
Collaborator

> Enforce CSP based on specified context? /cc @mikewest

Having as set context makes sense, but wouldn't that prevent authors from specifying various priorities to the fetched resource? e.g. If I have a background image that's extremely important and I want it to download in the same priority as CSS, I won't have a way to communicate that to the browser. IIRC that use case is the main reason the attribute is called as rather than context.

If we go that route then maybe, for the case of <link rel=preload>, we need to add some priority hints on top of the context.

@annevk
Member

annevk commented Apr 16, 2015

I don't understand why you think you would not be able to control the priority. That's a largely orthogonal feature we have not sorted out yet.

@yoavweiss
Collaborator

I agree that it's orthogonal to context, but the original intention of as was to indicate download priority to the browser. The way this discussion is going, we'll probably need to add another, separate hint to enable the "download a low priority context in high priority" use case (which is fine, but we'd need to address that).

@annevk
Member

annevk commented Apr 16, 2015

If the intent was fetch priority, as would be way too generic a name and not that useful (priority is much more than a context). I recommend opening up a distinct issue for indicating priority.

@igrigorik
Member Author

@annevk I'll defer to you on createResponseURL vs. srcObject, you have much better context there.

@yoavweiss I agree with @annevk; I think changing priorities is an orthogonal discussion. "as" is used to initialize request defaults, and other options provided by the developer would override them. For example...

fetch(url, {as:"image", headers: {"Accept": "custom/format, image/jpeg"}})
  • as would set the default image Accept header
  • User provided headers would override the defaults

Changing prioritization would work the same way; we "just" need to define syntax to control transport priorities and dependencies... previous discussion on whatwg. Assuming we arrive at some such thing, it could potentially look like... commence handwaving...

<link rel=preload href=photo.jpg as=image fetch-settings="{headers: {..}, priority: {..}}">

Or some such.


On a more tactical level, I'm wondering if we can unblock Resource Hints / Preload and continue iterating on other details in parallel... Concretely:

  • Add some language to Fetch to recognize as fetch option, with some starter language to indicate that it should be used to enforce security policies and initialize request defaults.
  • In RH/Preload I can link "as" attribute value to this Fetch option.

The response blob vs. URL does not block either RH or Preload because both are used to populate the cache, and are then consumed by another element/object that would enforce its CSP policy.

@annevk wdyt, does that sound reasonable? I'd love to land the RH and Preload refactors, as they're blocking implementation in Blink.

@annevk
Member

annevk commented Apr 21, 2015

I think for Resource Hints / Preload as could set context directly, no? So maybe we should just call it context? I'm still a bit concerned though. E.g. if script-src and img-src are equal, something fetched as image could be used as script and the other way around. But maybe that is not different from today. Just wondering whether there is any way in which this might cause us problems going forward.

@annevk
Member

annevk commented Apr 22, 2015

Actually, strike that. In the presence of service workers, preload et al, would cease to work since you cannot set context for fetch(). So we still need to solve the low-level first.

@igrigorik
Member Author

> In the presence of service workers, preload et al, would cease to work since you cannot set context for fetch(). So we still need to solve the low-level first.

@annevk not sure I followed that, can you elaborate a bit more? Why would SW break preload?

As far as nomenclature goes, I'm open to ideas and suggestions. That said, I'll just note that at least one (good, I think) argument for as is that it spans across CSP + request headers + transport priorities + maybe other things in the future? If we tie the name too closely to CSP "context", I think we run the risk of confusing developers about its intent. That, and it's nice and short and easy to explain/interpret. :)

@annevk
Member

annevk commented Apr 27, 2015

@igrigorik SW only has access to fetch() to retrieve resources. So if you use SW, <link rel=preload> will invoke fetch() at which point we need to have this solved for fetch().

context is used for CSP/MIX/X-Content-Type-Options. Per this bug we'll start using it for request headers. Transport priorities is still unclear as they can depend on many things (e.g. whether an element is in the viewport).

Anyway, context is everything fetch() is capable of within the network layer (the Fetch Standard). So in that respect as seems strictly equal as it has no access outside the network layer either.

@igrigorik
Member Author

@annevk thanks, that makes sense. Anything I can do to help land this in Fetch? At a minimum, I just need some hooks to map the attribute value to fetch initialization, or some such. Both RH and Preload are currently blocked on this - e.g. w3c/preload#17 (comment).

@annevk
Member

annevk commented Apr 27, 2015

If you could help prepare the list of header names/values against contexts that would help. I can do the necessary refactoring. (Basically #concept-fetch needs some kind of preprocessor that sets these headers before invoking #concept-fetch.)

@igrigorik
Member Author

@annevk after doing a quick spot-check, the advertised Accept headers are all over the map...

Chrome...

| Type  | Accept header |
| ----- | ------------- |
| HTML  | text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 |
| JS    | */* |
| CSS   | text/css,*/*;q=0.1 |
| Image | image/webp,*/*;q=0.8 |
| Video | */* |

Firefox

| Type  | Accept header |
| ----- | ------------- |
| HTML  | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 |
| JS    | */* |
| CSS   | text/css,*/*;q=0.1 |
| Image | image/png,image/*;q=0.8,*/*;q=0.5 |
| Video | video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5 |

Safari

| Type  | Accept header |
| ----- | ------------- |
| HTML  | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 |
| JS    | */* |
| CSS   | text/css,*/*;q=0.1 |
| Image | */* |
| Video | */* |

IE

| Type  | Accept header |
| ----- | ------------- |
| HTML  | text/html, application/xhtml+xml, */* |
| JS    | application/javascript, */*;q=0.8 |
| CSS   | text/css, */* |
| Image | image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5 |
| Video | */* |

I guess one way to look at this would be to say that perhaps Fetch is the one place where we can (finally) rationalize and normalize all this across the different browsers? However, even if we were to go down that route, we still need to allow for UA-specific values (e.g. WebP advertisements for Chrome, and so on). As a result, perhaps a fixed list of context -> Accept headers within fetch spec is also not the best route? Should we keep it more general? Same applies for prioritization...
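
For comparing these values, it may help to see roughly how a server would rank them. The sketch below parses an Accept value into media ranges ordered by q-value. The name `parseAccept` is made up for illustration, and the parsing is deliberately simplified: it ignores the specificity tie-breaking rules and any parameters other than q.

```javascript
// Simplified Accept parser: splits on commas, reads an optional q parameter
// (default 1), and orders the media ranges from most to least preferred.
function parseAccept(header) {
  return header
    .split(",")
    .map((part) => {
      const [range, ...params] = part.split(";").map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith("q="));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { range, q };
    })
    .sort((a, b) => b.q - a.q); // stable sort keeps header order for equal q
}

// Firefox's image value from the list above.
const firefoxImage = parseAccept("image/png,image/*;q=0.8,*/*;q=0.5");

// Chrome's HTML value: image/webp ranks alongside text/html with q=1.
const chromeHtml = parseAccept(
  "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
);
```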

/cc @mnot

@mnot
Member

mnot commented May 1, 2015

It'd be nice, but I suspect implementations are going to want to do their own thing here, to a degree.

Fetch could give advice about what's appropriate; e.g., sending image/webp for an HTML fetch doesn't make sense (I get that they're trying to hint to the server that it's OK to send links to webp images in the HTML, but that's not really what Accept is for, and other browsers don't do it).

We could also save a few bytes by recommending that they not send */*; it doesn't do anything (especially when it's on its own).

Latest and greatest:
http://httpwg.github.io/specs/rfc7231.html#header.accept

@igrigorik
Member Author

> Fetch could give advice about what's appropriate; e.g., sending image/webp for an HTML fetch doesn't make sense (I get that they're trying to hint to the server that it's OK to send links to webp images in the HTML, but that's not really what Accept is for, and other browsers don't do it).

The problems are: inline images; direct image requests (when we don't have the <img> context to set the appropriate header); hint to HTML/CSS/JS that WebP can be used. In practice, we need this -- see crbug.com/267212 for context. FWIW, latest IE is doing the same advertisement for jpegxr.

> We could also save a few bytes by recommending that they not send */*; it doesn't do anything (especially when it's on its own).

Been tried; sadly, omitting the Accept header breaks a bunch of sites. I'll see if I can dig up the old thread around this.

P.S. That said.. some consistency would be nice. It would help weed out things like crbug.com/443094

@annevk
Member

annevk commented May 1, 2015

I think Fetch could have a "SHOULD" value for Accept headers to encourage user agents to converge. But it's important to define this since the point where the UA adds the Accept header is going to be observable. E.g. Accept we want to set before service workers, Host we want to set after (when we go to the network). If we ignore interoperability on the value, that timing alone is somewhat crucial.
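
A toy model of that ordering, to show why the timing is observable: Accept is set before the service worker sees the request, Host only afterwards. The function `simulateFetch`, its field names, and the Accept values are all assumptions for illustration; this is not how any engine is structured.

```javascript
// Hypothetical pipeline; function and field names are illustrative only.
function simulateFetch(request, serviceWorker) {
  const headers = new Map(Object.entries(request.headers ?? {}));

  // Set before the service worker runs, so the worker can observe it.
  if (!headers.has("accept")) {
    headers.set(
      "accept",
      request.context === "image" ? "image/*,*/*;q=0.8" : "*/*"
    );
  }
  const seenByServiceWorker = new Map(headers); // snapshot handed to the worker
  serviceWorker(seenByServiceWorker);

  // Added only when the request goes to the network, after the worker.
  headers.set("host", new URL(request.url).host);
  return { seenByServiceWorker, sentToNetwork: headers };
}

const result = simulateFetch(
  { url: "https://example.com/photo.jpg", context: "image" },
  () => {} // a worker that just passes the request through
);
```

In this model the worker's snapshot contains Accept but not Host, while the headers sent to the network contain both.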

annevk added a commit that referenced this issue May 4, 2015
…n context. See #43

This also allowed for some nice cleanup as now internal parameters are
no longer exposed at the base of the algorithm. I also made the note
about response tainting visible as it might just as easily confuse
readers.
annevk added a commit that referenced this issue May 4, 2015
As part of this, introduce various context groupings to avoid
duplicating lists throughout the specification.
@annevk
Member

annevk commented May 4, 2015

I have added header initialization for Accept and Accept-Language. What remains is opaque responses for when a request context is set by developers. And feeding responses to an API (either via URL or object). Should we create new issues for those? Anything remaining here?

@igrigorik
Member Author

@annevk nice. A couple of questions...

  1. Can we add a note indicating that the specified Accept values are recommended values, and should be extended by the user agent to match the supported types - e.g. WebP, JPEGXR, and so on.
  2. Can we also extend this mechanism to cover transport priorities? E.g. navigation requests have higher priority than CSS/JS (critical resources), which have higher priority than images, and so on.
    • Similar to Accept, the priority values vary for each UA and protocol in use (HTTP/1.1 vs HTTP/2). Perhaps we can simply indicate that the user agent should set the transport priority based on indicated context?

@annevk
Member

annevk commented May 5, 2015

Well, it already says "should" so user agents have some leeway and I think user agents ought to converge on supported types long term. I'd rather not mention formats that don't have buy in from everyone.

We can mention priorities I suppose. Pointer? Part of the problem with priorities is that I know that e.g. in Gecko we want to do more. We want to not only take into account the context, but also layout information and such. So setting priority in Fetch may be too late.

@igrigorik
Member Author

> Well, it already says "should" so user agents have some leeway and I think user agents ought to converge on supported types long term. I'd rather not mention formats that don't have buy in from everyone.

Fair enough 👍

> We can mention priorities I suppose. Pointer? Part of the problem with priorities is that I know that e.g. in Gecko we want to do more. We want to not only take into account the context, but also layout information and such. So setting priority in Fetch may be too late.

I don't think it's any different from Accept, which may also be set upstream - e.g. through an explicit Accept header passed to fetch(). For priority, we could use similar logic: "if the request's transport priority is not set, use the request's context to initialize the transport priority", or some such. Perhaps we can just add this as a substep in the fetching algorithm - say, between current steps 2-3?

@annevk
Member

annevk commented May 6, 2015

That seems reasonable. What reference should I use for priorities?

@igrigorik
Member Author

The best definition is "stream (request) priority" section in HTTP/2: https://tools.ietf.org/html/draft-ietf-httpbis-http2-17#section-5.3. For HTTP/1.x the priority is not communicated to the server, but is used internally by the net-stack to prioritize order and request dispatch times.

Perhaps... ~"if the request's transport priority is not set, use the request's context to initialize the request priority - e.g. stream weight and dependency for HTTP/2, or equivalent priority class used to prioritize dispatch and processing of HTTP/1.x requests."
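
As a rough sketch of that rule: leave an explicitly set priority alone, otherwise derive one from the context. HTTP/2 stream weights do range from 1 to 256 with a default of 16, but the per-context weights below, the `priority` field shape, and the function name `initializePriority` are made-up placeholders, not values any user agent uses.

```javascript
// Placeholder values: HTTP/2 weights range from 1 to 256, but which weight a
// user agent assigns to which context is an assumption made up for this sketch.
const DEFAULT_WEIGHT_BY_CONTEXT = new Map([
  ["document", 256],
  ["style", 220],
  ["script", 220],
  ["image", 110],
]);
const FALLBACK_WEIGHT = 16; // HTTP/2's default stream weight

function initializePriority(request) {
  // An explicitly provided priority always wins over the context default.
  if (request.priority !== undefined) return request;
  const weight =
    DEFAULT_WEIGHT_BY_CONTEXT.get(request.context) ?? FALLBACK_WEIGHT;
  return { ...request, priority: { weight } };
}

const fromContext = initializePriority({ url: "/photo.jpg", context: "image" });
const explicit = initializePriority({
  url: "/hero.jpg",
  context: "image",
  priority: { weight: 256 }, // developer wants this image fetched like a document
});
const unknown = initializePriority({ url: "/data", context: "fetch" });
```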

@annevk
Member

annevk commented May 6, 2015

I guess I should also give request a priority field then whose value and function is mostly left as an exercise to the reader.

@igrigorik
Member Author

Yep, sounds good. As a side note, there was this: http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Aug/0081.html .. except, we never reached any meaningful conclusions on that thread.

That said, now that the dust has settled on HTTP/2, I think we can take another run at it. In the meantime, exposing a "priority" field on fetch is sufficient. We can refine how that looks later.

5 participants