Load module directly from a URL is very cute #195


Closed
pibi opened this issue Jun 7, 2018 · 32 comments


@pibi

pibi commented Jun 7, 2018

So please, remove this feature. It is unnecessary.

P.S. Just think what a leftpad-style incident would look like without a centralized source of trust.

@jedahan

jedahan commented Jun 7, 2018

There are environments (such as Node) that require a centralized source of trust; maybe those fit your needs better? I believe the nice thing about loading from any resource:// is that it makes a centralized source of trust optional rather than required.

Related to #94 - maybe this should be closed?

@sean256

sean256 commented Jun 7, 2018

import { test } from "https://unpkg.com/[email protected]/testing.ts"

I can see a lot of devs mistakenly using different versions in different source files, causing unnecessary duplication of cached packages. I much prefer a single source of package and version declarations!

@pibi
Author

pibi commented Jun 7, 2018

The point is: you can do the module download upfront. There's no need to do it at runtime, or even to make it part of the project. This is exactly the same require<->package.json coupling that led to npm and leftpad. Just keep Deno simple now.

@rivertam

rivertam commented Jun 7, 2018

Just keep Deno simple now.

That's a mighty subjective definition of "simple".

Did you watch Ryan's talk? He has already offered rationale.

@jedahan

jedahan commented Jun 7, 2018

I see having package.json and require as being unnecessarily complicated and implicit - see package.json, package-lock.json, npmjs... all baked in. A URI is just an explicit, simple identifier. Seems simpler to me.

@pibi
Author

pibi commented Jun 7, 2018

Did you watch Ryan's talk?

Yep, this is why I think this is really a cute feature

@pibi
Author

pibi commented Jun 7, 2018

@jedahan Really? What do you think will happen if you are using a module which internally imports something else that depends on a module linked to an expired domain? What if the domain is hijacked? I cannot trust multiple origins; this is the whole point of having a modular system: trust the work of others, not their server provider.

@chainhelen
Contributor

Too many similar issues; please see #47.

@rivertam

rivertam commented Jun 7, 2018

@pibi What if that module specifies the alternate origin in its package.json? Then you've got the exact same problem.

Unless you're suggesting that Deno become beholden to a centralized repository like npm, with specifying any other host disallowed entirely, in which case I think you're going to find very little support from anyone.

@jedahan

jedahan commented Jun 7, 2018

@pibi That's why I am excited to try non-HTTP protocols, content-addressed ones like IPFS, Dat, SSB, etc.

@pibi
Author

pibi commented Jun 7, 2018

@rivertam No, I just think Ryan has some good points and one unnecessary one. I cannot see how having the package manager (because this is still a package manager) embedded in the runtime could solve the issues npm still has.

@jedahan I don't understand why you can't do that in Node right now: just specify it in your package.json, as we are already doing with git-based dependencies.

@Jusys

Jusys commented Jun 7, 2018

@pibi:

what do you think will happen if you are using a module which internally imports something else which depends on a module linked to an expired domains? what if the domain is hijacked? I cannot trust multiple origins,

Why not just add a switch like "--allow-remote-imports" for guys like you?
Or even "--allow-remote-imports" + "--allow-third-party-imports" (like third-party cookies)?
Or "--allowed-import-url 'some_url'" + "--allowed-import-url 'another-url'"...
Or "--allowed-import-url regex" + "--allowed-import-url 'another-url'"...

@jedahan

jedahan commented Jun 7, 2018

@pibi does npm support arbitrary resolvers? According to the docs...

is one of git, git+ssh, git+http, git+https, or git+file

I know projects like gx resolve over IPFS, but that is a separate package manager.

I hope I can share what I think is interesting: none of the ideas I've seen in Deno stop you from writing tooling to support centralized use cases, reproducibility, etc.; but by not prescribing a particular ahead-of-time package manager, we will not be restricted to its design decisions in the future.

@jbreckmckye

jbreckmckye commented Jun 7, 2018

There seems to be some consternation about this feature on Reddit and Hacker News, so it might be worth recapping the broader issues.

It's true that loading a package from an unfamiliar URL has its risks. The domain might go down. It might get hijacked! Or you might make a typo.

What's not true is that Node's package system really solves this. package.json already supports URL-based dependencies, and npm has always had ongoing problems with name-squatting. There's no guarantee that an OSS package will be the same on npm as it is on GitHub / Bitbucket / et al, nor that npm itself will always be secure and reliable.

What's also not true is that URL-based imports preclude using some centralised package authority. It would be quite feasible to -

import lodash from "https://d.pm/lodash/5.7.1.ts"

... if the community decided that dpm was an appropriate steward for Deno dependencies. This approach would also allow for competition between package providers, which admittedly can cause pain in the short term (e.g. if there isn't parity between them) but gives a platform long-term robustness (imagine if there was only one Linux package manager).

Another approach might be using independent devtools to bundle the files directly with the executable, so you could then import them like a local module:

./deptool --install lodash --exact 5.7.1 --save
import lodash from '../deps/lodash.ts'

Deptool could be any script produced by anyone. You could even write your own! Or an entity like dpm could publish one that integrates well with their ecosystem. Just like NPM - but optional.

One thing that might help here is a better way to specify a path resolving from the top level of the project. If you could use such "absolute" paths, you could rewrite the above to

import lodash from 'deps/lodash.ts'

...which I think would be a good feature anyway.

Finally, it would probably be feasible to insist on HTTPS resources. Perhaps HTTP imports could throw a warning or exception unless a certain flag was set.

@elldritch

elldritch commented Jun 7, 2018

Another approach might be using independent devtools to bundle the files directly with the executable, so you could then import them like a local module

This is the best approach, and should be the only possible approach. Allowing modules to be loaded directly from a URL adds unnecessary complexity and risk when this functionality could instead be provided by a separate tool.

Context/disclaimer: I build dependency analysis tools for my day job (fossa.io), I've worked with and built analysis tools for many languages and many package managers (fossa-cli), and I spend a lot of time thinking about package and dependency management.

Importing over the network in the way the talk describes ("load once on first execution, then cache") is a really bad idea. The problem is reproducibility.

A network resource may change or become unavailable between the times that:

  1. I first run a program on my development machine.
  2. My colleague first runs a program on their development machine.
  3. The program first runs on one of my production servers.
  4. The program first runs again on that same production machine after a new deployment (possibly in a new container/image that blows away ~/.deno/src).

Implicit caches make it very difficult for me to guarantee that my build is reproducible across all of these environments: I either need to replicate the dependency cache (at the very least, copying ~/.deno/src) or mock out all network requests to my network dependencies. Neither of these is impossible, but they're both unnecessary footguns. Instead, all imports should be local and network resources should be explicitly downloaded locally before importing locally.
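
To make that concrete, here is a minimal sketch of the "download explicitly, then import locally" workflow. It is only an illustration: the dependency list, the deps/ directory, and the use of current Deno file APIs are assumptions, not anything specified in this thread.

```ts
// vendor.ts - a rough sketch of "download explicitly, then import locally".
// The dependency list, target directory, and Deno APIs used here are
// illustrative assumptions.
const deps: Record<string, string> = {
  // local file name -> pinned remote URL (hypothetical)
  "testing.ts": "https://example.com/some-module@1.0.0/testing.ts",
};

await Deno.mkdir("deps", { recursive: true });

for (const [file, url] of Object.entries(deps)) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`failed to fetch ${url}: ${res.status}`);
  await Deno.writeTextFile(`deps/${file}`, await res.text());
}

// Application code then only ever imports local files:
//   import { test } from "./deps/testing.ts";
```

With something like this checked into the repository, every environment runs the same explicit vendoring step before the program ever executes, and the build no longer depends on an implicit cache.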

One counterpoint I've seen is that Go uses URL imports and it seems to work well. This is because Go's imports follow extremely specific and limited semantics:

  1. Go imports are downloaded by the go get tool, which the user must explicitly run. They are not downloaded implicitly the first time the program is executed.
  2. Go packages are downloaded into the Go workspace alongside your source code. This makes it extremely easy to (1) identify the exact source code of the imported package and (2) reproduce a particular build (there is no implicit cache that lives in a special place far away from your own source code).
  3. Since imported Go packages are part of a regular Go workspace, they obey regular semantics for Go source code. Another way of thinking about this is that the import cache in Go is part of Go's public API. There are no separate, potentially unstable semantics that a user must learn to examine their import cache, which seems to be the case with ~/.deno/src.

These problems could all be mitigated if the dependency cache's semantics were explicit, public, and stable, but at that point you might as well use a separate tool and reduce the complexity of the runtime.

(Using a separate tool also has a variety of other advantages e.g. allowing users to support their own network protocols, but reproducibility is the most important one.)

@pkoretic

pkoretic commented Jun 7, 2018

To me the implementation seems more in line with browser behaviour. In browsers you also specify .js files/modules as URLs, which the browser then caches (depending on the HTTP caching headers) when the HTML page is loaded for the first time.

In turn this made people use CDNs and bundlers that can bundle all dependencies into one file, which is then cached by the runtime after being included the first time. The core functionality of the runtime (browser or Deno) is kept simple, but the tooling around it is free to develop.

So in the end, instead of npm install you might be doing something like grunt build, or whatever tool you prefer.

@elldritch

elldritch commented Jun 7, 2018

Regarding matching browser semantics: availability requirements are different for browsers and not-browsers. For not-browsers, it makes sense to run a program offline even if that program has third-party dependencies. For browsers, all programs must be run online anyway, so the conditional likelihood of a dependency being unavailable given that the program is available is much lower.

(That said, have your web apps ever been broken by a third-party hosting an analytics tool or a jQuery plugin that was modified or became unavailable? Mine have. It's not great.)

@pkoretic

pkoretic commented Jun 7, 2018

@ilikebits In that case, npm install in Node.js (which also needs internet access for remote modules) would be equivalent to some other bundler doing the same for Deno, as it already does for many browser frameworks today, like Angular, where you do not explicitly import files from the internet at runtime; you would just require those files locally after bundling. No difference in the end result.
I do agree it would be great to have a single executable bundling everything you need that you could just run, à la Go.

@elldritch

elldritch commented Jun 8, 2018

Requiring a tool like npm install to generate bundles is the ideal. While both a bundling tool and network imports require network access to download remote modules, the difference is when they need network access.

Using a tool to generate the bundle means you need access when the tool is explicitly invoked to download dependencies, and it's easy to copy vendored dependencies after they've been downloaded so that future deployments don't rely on the network.

Using the proposed "download at first execution time, then cache" mechanism requires network access every first run of the program, which is what I'm concerned about. Since a program may be "first executed" (e.g. on a new machine, in a new container image, etc.) at many different points in time, it's difficult to ensure that, for every "first execution", a program will be able to download its dependencies and that it'll download the same dependency source code.

@sean256

sean256 commented Jun 8, 2018

One feature I do like about npm is the ability to "wildcard" versions. An absolute URL path removes this ability, unless the servers implement some means to do it (/url/[email protected], etc.), but then you would only get this feature from some package hosts.
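
That said, a package host could support ranges server-side by redirecting a range URL to a pinned URL. A rough sketch, where the host, route shape, semver resolver, and use of Deno's built-in HTTP server are all assumptions for illustration:

```ts
// Hypothetical host-side resolver: redirect a version-range URL to a pinned one,
// e.g. /lodash@^4.0.0/lodash.ts -> /lodash@4.17.21/lodash.ts
function resolveLatestMatching(_name: string, _range: string): string {
  // A real host would look up published versions and apply semver rules;
  // this stub just returns a hard-coded, illustrative version.
  return "4.17.21";
}

Deno.serve((req: Request): Response => {
  const url = new URL(req.url);
  const match = decodeURIComponent(url.pathname).match(/^\/([^@/]+)@([^/]+)\/(.+)$/);
  if (!match) return new Response("not found", { status: 404 });
  const [, name, range, file] = match;
  const pinned = resolveLatestMatching(name, range);
  return Response.redirect(`${url.origin}/${name}@${pinned}/${file}`, 302);
});
```

Whether clients should follow such redirects (and how they would be cached) would still be a per-host convention, which is exactly the caveat above: only some package hosts would offer it.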

@pkoretic

pkoretic commented Jun 8, 2018

@ilikebits It doesn't require network access on every first run of the program if you bundle the sources at build time and require that bundle. You would again use some tool, as you do for web apps today. An application deployed in Docker or anywhere else would not need network access, just as it doesn't today when you run npm install during docker build. User-provided libraries could likewise ship with everything bundled, similar to the .min.js versions in the dist folders you use for browsers.

I'm talking about how web apps are built and deployed today, which is exactly what you described: "Using a tool to generate the bundle means you need access when the tool is explicitly invoked to download dependencies, and it's easy to copy vendored dependencies after they've been downloaded so that future deployments don't rely on the network." You (can) bundle your web app the same way today, e.g. with Angular and ng build -prod.

I'm just an outsider looking at how to get the same behaviour in this case, because the primary reason, as far as I understood from the presentation, is to avoid npm's complexity, and I'm perfectly fine with that.

@pibi
Author

pibi commented Jun 8, 2018

So, we are all in for reproducible builds (yarn.lock, package-lock.json, Dockerfiles), immutability, sandboxing and security, right? Then what about CI/CD for Deno apps, when we are using tons of third-party modules we cannot cache every time?

BTW, some of Ryan's points about npm are quite true, but let me ask: what prevents us from just dropping the npm server dependency and moving on to something else distributed, immutable and reproducible (IPFS maybe?)? If the main point of Deno is "get rid of the npm mess", then we are talking about a new package manager, but:

  • this is just one point among many others. I mean: this is a platform, right?
  • even if it is, importing from a URL just looks cute, where "cute" is meant in @ry's sense.

@jbreckmckye

@ilikebits You and I are of the same mind, I think. If I were designing Deno - and I'm not - I would simply have my runtime pull dependencies from a local cache. I'd leave it up to a separate tool to populate that cache, perhaps provided as a 'sibling' project a la go get.

One advantage of this approach might be that you could instrument your dependency code during unit tests. You could quite easily replace a module with a stub or a mocked equivalent. You could also use symlinks to pull in other projects as dependencies - a transparent, non-proprietary alternative to npm link.
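
As a small illustration of the stubbing idea (file names and the request signature are hypothetical): if every file imports the dependency through one local module, a test run can simply point that module at a stub.

```ts
// deps/http_client.ts normally just re-exports the vendored dependency:
//   export { request } from "../vendor/http_client.ts";
//
// deps/http_client_stub.ts - a drop-in replacement wired up during unit tests:
export function request(_url: string): Promise<string> {
  // Return a canned response instead of touching the network.
  return Promise.resolve(JSON.stringify({ stubbed: true }));
}
```

Swapping the file (or the symlink behind it) is all it takes; the runtime never needs to know.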

@ry
Member

ry commented Jun 8, 2018

It was unfortunate phrasing in my talk to deride unnecessary features - calling them “cute” - and then, minutes later, to use the same word for URL imports.
The URL imports are not unnecessary - they are the central way to link to external code - I had only meant that I think this design would be beautiful.
I’ve thought a lot about this and I’m not going to consider removing it. I believe all vendoring/reproducible build/security concerns can fit into this import scheme. There are many details to be worked out - like how to locally develop multiple codependent modules - but I’m confident these are surmountable and do not require extra complexity in the module resolution algorithm.

(And apologies if I’m not replying to some other comments here - I’ve only skimmed the thread - ping me otherwise.)

@thysultan

@ry Considering that browser vendors use the MIME type in the response instead of the URL, is it a goal to gain some interop with this heuristic?

@mikew

mikew commented Jun 8, 2018

Imagine we're working with something like React, which has a good chance of being imported in most of your files.

import { Component } from "https://unpkg.com/[email protected]/react.ts"

Now you want to update to [email protected]. Are you supposed to replace all instances, or keep a reference somewhere else saying that react means https://unpkg.com/[email protected]/react.ts?

@Janpot

Janpot commented Jun 8, 2018

I guess you can do

// ./react.ts
export * from "https://unpkg.com/[email protected]/react.ts"

// ./foo.ts
import { Component } from './react.ts';

?

@mikew

mikew commented Jun 8, 2018

Wouldn't that be very poor for code-splitting purposes? And isn't it just shifting the responsibility of package management to users? And, as @ilikebits mentioned, if dependencies are downloaded at execution time, how do we do the equivalent of "install dependencies" when building a container image?

I get @ry's complaints about package.json, but I do see value in there being a file like packages:

react: https://unpkg.com/[email protected]/react.ts
underscore: https://unpkg.com/[email protected]/underscore.ts

Not unlike markdown's reference links.
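
As a sketch of what such a file could look like (names, URLs and versions are purely illustrative), the mapping could live in one module that a bundler or codemod consults when rewriting bare specifiers:

```ts
// packages.ts - a hypothetical central mapping from bare names to pinned URLs.
// A bundler or codemod could use it to rewrite `import ... from "react"`.
export const packages: Record<string, string> = {
  react: "https://unpkg.com/react@16.4.2/react.ts", // illustrative URL/version
  underscore: "https://unpkg.com/underscore@1.9.1/underscore.ts", // illustrative
};
```

Updating React would then mean changing one line, much like editing a markdown reference-link definition.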

@lukejagodzinski

@mikew I think that file resolution could be done at the bundler level. So, as you wrote, react would mean https://unpkg.com/[email protected]/react.ts. Or you can just refactor your code and replace all the imports. IMHO Deno shouldn't bother with that.

@mikew

mikew commented Jun 8, 2018

It's probably outside the scope of this issue, but how would common dependencies be handled? If both some-auth-library and some-api-library require underscore, and both are compiled down to one file, wouldn't that mean underscore is definitely in there twice?

@ry
Member

ry commented Jun 8, 2018

@mikew If they both load it from the same URL, it will not be included twice.

@thysultan I think for now we'll just pass everything through the TS compiler and ignore MIME.

I will close this issue. Open new issues with more specific comments if necessary.

@the-vampiire

the-vampiire commented May 14, 2020

Another approach might be using independent devtools to bundle the files directly with the executable, so you could then import them like a local module

This is the best approach, and should be the only possible approach. Allowing modules to be loaded directly from a URL adds unnecessary complexity and risk when this functionality could instead be provided by a separate tool.

Context/disclaimer: I build dependency analysis tools for my day job (fossa.io), I've worked with and built analysis tools for many languages and many package managers (fossa-cli), and I spend a lot of time thinking about package and dependency management.

Importing over the network in the way the talk describes ("load once on first execution, then cache") is a really bad idea. The problem is reproducibility.

A network resource may change or become unavailable between the times that:

1. I first run a program on my development machine.

2. My colleague first runs a program on their development machine.

3. The program first runs on one of my production servers.

4. The program first runs again on that same production machine after a new deployment (possibly in a new container/image that blows away `~/.deno/src`).

Implicit caches make it very difficult for me to guarantee that my build is reproducible across all of these environments: I either need to replicate the dependency cache (at the very least, copying ~/.deno/src) or mock out all network requests to my network dependencies. Neither of these is impossible, but they're both unnecessary footguns. Instead, all imports should be local and network resources should be explicitly downloaded locally before importing locally.

One counterpoint I've seen is that Go uses URL imports and it seems to work well. This is because Go's imports follow extremely specific and limited semantics:

1. Go imports are downloaded by the `go get` tool, which the user must explicitly run. They are not downloaded _implicitly_ the first time the program is executed.

2. Go packages are downloaded into the Go workspace alongside your source code. This makes it extremely easy to (1) identify the exact source code of the imported package and (2) reproduce a particular build (there is no _implicit cache_ that lives in a special place far away from your own source code).

3. Since imported Go packages are part of a regular Go workspace, they obey regular semantics for Go source code. Another way of thinking about this is that the import cache in Go is part of Go's public API. There are no separate, potentially unstable semantics that a user must learn to examine their import cache, which seems to be the case with `~/.deno/src`.

These problems could all be mitigated if the dependency cache's semantics were explicit, public, and stable, but at that point you might as well use a separate tool and reduce the complexity of the runtime.

(Using a separate tool also has a variety of other advantages e.g. allowing users to support their own network protocols, but reproducibility is the most important one.)

Hey @liftM, I'm late to the party here and might have missed someone making a similar suggestion, but what about a hosted dependency bundle that the team can share?

Say you have 3 machines:

  • dep host (each team hosts this / some dep host provider exists for use)
  • dev A (local)
  • dev B (local)

Dev A wants to import a dependency. Rather than deal with local caching, they issue some command like "install dependency X". This triggers the following process:

  • dep host: pulls and caches dependency X in its dependency dir, or recognizes it already has it and does nothing
  • in return, the dep host responds to dev A with a hash of the latest state of its dependency dir
  • dev A checks the hash against the previous hash they had locally. If they are the same, do nothing (the equivalent of npm install when you already have everything). If they differ, dev A asks the dep host for the latest dep bundle
  • dev A downloads the dep bundle and updates its local hash

Dev B comes along and wants to play. Rather than rely on pulling and locally caching each of those deps themselves, they issue "install", which requests the dep bundle from the dep host. The process repeats itself.

Essentially, the dep host acts as a project-scoped CDN for all the deps. Installing/updating/removing are commands issued against that dep host, not against the local machine issuing them. It centralizes the cache (relative to a project and its devs/consumers) so that it can be used across the various environments, pipelines, etc. It can be controlled to restrict changes (semver rules, etc.), and the dep host can even version the dep bundle so that it can be quickly rolled back if needed.
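
A rough client-side sketch of that flow; the host URL, endpoints, bundle format, and file paths are all made up for illustration:

```ts
// sync_deps.ts - hypothetical client for the team-hosted dependency cache.
const DEP_HOST = "https://deps.example.team"; // hypothetical host

async function sync(): Promise<void> {
  const localHash = await Deno.readTextFile("deps/.hash").catch(() => null);

  // 1. Ask the dep host for the hash of its current dependency bundle.
  const remoteHash = await (await fetch(`${DEP_HOST}/bundle/hash`)).text();

  // 2. If it matches what we already have, there is nothing to do.
  if (remoteHash === localHash) return;

  // 3. Otherwise, download the bundle and record the new hash.
  const bundle = new Uint8Array(await (await fetch(`${DEP_HOST}/bundle`)).arrayBuffer());
  await Deno.mkdir("deps", { recursive: true });
  await Deno.writeFile("deps/bundle.tar", bundle);
  await Deno.writeTextFile("deps/.hash", remoteHash);
}

await sync();
```

"Install", "update" and "remove" would then be thin commands against the dep host's API rather than operations on the local cache.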

What do you think of that, mate? I appreciate your insight and think it's shitty that, out of all the responses here, yours didn't get one. I felt you brought a lot of experience to the table.
