Skip to content

Dependency cli parsing #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Dependency cli parsing #673

wants to merge 11 commits into from

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented May 16, 2025

Related to #672, this PR adds the logic to parse dependencies from the cli. Still not used, but worth discussing syntax and implementation.

The Dependency class unfortunately is not fit to generate the shard.yaml. Resolver key and source are normalized. I created a separate type.

The added spec shows the various syntax supported.

To create a proper dependency, with the shard name, we need to actually fetch the shard to resolve that. This is done in DependencyDefinition.from_cli while the DependencyDefinition.parts_from_cli is just the parsing without side effects.

In general {resolver}:{source} syntax is supported, but there are some shorthands for convenience.

  • git@...
  • github:foo/bar
  • github:Foo/Bar#1.2.3 (which will mean ~> 1.2.3)
  • https://github.com/foo/bar and other known hosts like gitlab, bitbucket, and codeberg
  • https://github.com/foo/bar#1.2.3 and other known hosts like gitlab, bitbucket, and codeberg
  • https://github.com/Foo/Bar/commit/000000
  • https://github.com/Foo/Bar/tree/v1.2.3 (which is interpreted as a tag since it's prefixed with v)
  • https://github.com/Foo/Bar/tree/some/branch
  • relative and absolute paths

Some of these are maybe controversial, feel free to discuss.

If you want to play around you can use the following code.

require "./src/dependency_definition"

def t(s)
  d = Shards::DependencyDefinition.from_cli(s)
  pp! d

  puts "\nshard.yaml"
  puts(YAML.build do |yaml|
    yaml.mapping do
      d.to_yaml(yaml)
    end
  end)

  puts "\nshard.lock"
  puts(YAML.build do |yaml|
    yaml.mapping do
      d.dependency.to_yaml(yaml)
    end
  end)
end

t("github:crystal-lang/crystal-db")

t("github:crystal-lang/crystal-db#0.13.0")

which will output

d # => #<Shards::DependencyDefinition:0x1011fbbd0
 @dependency=
  #<Shards::Dependency:0x101217e60
   @name="db",
   @requirement=*,
   @resolver=
    #<Shards::GitResolver:0x10120f200
     @local_path=
      "/Users/bcardiff/.cache/shards/github.com/crystal-lang/crystal-db.git",
     @name="unknown",
     @origin_url="https://github.com/crystal-lang/crystal-db.git",
     @source="https://github.com/crystal-lang/crystal-db.git",
     @updated_cache=true>,
   @resolver_key=nil,
   @source=nil>,
 @resolver_key="github",
 @source="crystal-lang/crystal-db">

shard.yaml
---
db:
  github: crystal-lang/crystal-db

shard.lock
---
db:
  git: https://github.com/crystal-lang/crystal-db.git
d # => #<Shards::DependencyDefinition:0x1011fb240
 @dependency=
  #<Shards::Dependency:0x1012179b0
   @name="db",
   @requirement=Shards::VersionReq(@patterns=["~> 0.13.0"]),
   @resolver=
    #<Shards::GitResolver:0x10120f200
     @local_path=
      "/Users/bcardiff/.cache/shards/github.com/crystal-lang/crystal-db.git",
     @name="unknown",
     @origin_url="https://github.com/crystal-lang/crystal-db.git",
     @source="https://github.com/crystal-lang/crystal-db.git",
     @updated_cache=true>,
   @resolver_key=nil,
   @source=nil>,
 @resolver_key="github",
 @source="crystal-lang/crystal-db">

shard.yaml
---
db:
  github: crystal-lang/crystal-db
  version: ~> 0.13.0

shard.lock
---
db:
  git: https://github.com/crystal-lang/crystal-db.git
  version: ~> 0.13.0

@ysbaddaden
Copy link
Contributor

I think I'd still go the other way around: from a CLI arg that directly targets a resolver and have it parse the URL into a dependency, and fallback to iterate each resolver to parse the URL for known hosts when no CLI arg is specified, and fail asking for an explicit resolver when we can't know what the remote is.

@straight-shoota
Copy link
Member

straight-shoota commented May 16, 2025

I would go the other way: encode the full information in a single URI.
This should work well. Most of the time, the natural formats are already URI-compatible. That's a fully descriptive, self-contained format that's easy to reuse in other contexts.

The implementation of different strategies is quite simple: We parse the value as a URI and route it to a resolver-specific parser based on the scheme (e.g. git, github, git+https, etc. go to GitResolver).

If the URI scheme does not indicate a specific resolver (e.g. https), we fall back to heuristics based on the full URI (e.g. based on known hostname, or file extension) or potentially even trial-and-error through possible resolvers (very optional).

@bcardiff
Copy link
Member Author

For GitHub, from the clone menu we should support

I want to support the most common format the user can copy paste directly.

And I want to preserve as much as possible user intent. That's why I don't want the Resolver to offer a canonical representation of the dependency for the shard.yaml. Eg: urls are downcased in the normalization and that will look odd.

I saw mentions of git+ssh somewhere. IIUC those will be [email protected]:owner/repo.git, am I missing another pattern?

Thanks for the feedback.

@bcardiff
Copy link
Member Author

Oh and should I move parsing and rendering to a DependencyDefinition type to avoid having these changes in dependency but not used in all places?

@straight-shoota
Copy link
Member

We could probably turn the named tuple return type from .parts_from_cli into a dedicated type which holds this information.

@bcardiff bcardiff force-pushed the dependency-cli-parsing branch from 20d0567 to eafca30 Compare May 16, 2025 13:56
@bcardiff
Copy link
Member Author

I think I incorporated all of the feedback.

expect_parses("github:Foo/[email protected]", "github", "Foo/Bar", VersionReq.new("~> 1.2.3"))

# GitHub urls
expect_parses("https://github.com/foo/bar", "github", "foo/bar", Any)
Copy link
Member

@straight-shoota straight-shoota May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm wondering about why this gets normalized to github resolver instead of "git", "https://github.com/foo/bar".
Both options are valid. So we only need to decide which one we want to pick.
I suppose the reason for github is that it's more concise? That's nice but not strictly necessary.
git would be closer to the original input.

Copy link
Member Author

@bcardiff bcardiff May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me closer to the intent is to have github: foo/bar, because I'm assuming the user is copy-pasting the url in the browser. We do preserve the format with trailing .git as those are copied from the clone popup.

At some point maybe it's worth configuring shards to map all github source to be resolved in some specific way, but is a separate story.

expect_parses("..\\relative\\windows", "path", "../relative/windows", Any)
{% end %}
# Path file schema
expect_parses("file://#{local_relative}", "path", local_relative, Any)
Copy link
Member

@straight-shoota straight-shoota May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: This is technically incorrect. The URL file://a/releative/path means path /relative/path on host a. The // indicates an authority.

The correct URL for a local path would be file:a/relative/path.

I believe some applications implicitly interpret file://a/b/c as file:a/b/c because this is a common mistake.
The file scheme is not intended for remote access. We can consider doing that, but I don't think it's a good idea as it would solidify invalid behaviour.
Instead, the best way to handle this is probably to error if the host of a file URI is anything else but empty or localhost.

@straight-shoota
Copy link
Member

straight-shoota commented May 20, 2025

Following up on #673 (comment), I propose an alternative implementation, fully based on URI parsing in #678

IMO this makes it much easier to handle things. The different routes are clearly decided by the URI scheme. All parsing is safe and sound using stdlib's URI class.

@straight-shoota
Copy link
Member

For reference, VCS dependencies in PIP: https://pip.pypa.io/en/stable/topics/vcs-support/

@bcardiff
Copy link
Member Author

The alternative implementation seems good. We can close this PR then in favor of the other. Anything else that is missing?, because it's marked as draft.

@straight-shoota
Copy link
Member

#678 is a patch against the branch of this PR. We should merge that in and then continue here.

@bcardiff
Copy link
Member Author

Commit pushed into this branch

@straight-shoota
Copy link
Member

The windows test is failing because for relative URIs we expect .starts_with?("./"), .starts_with?("../") to recognize a relative path by prefix.
If we want to support path's with backslash separators as well, we need to add .starts_with?(".\\"), .starts_with?("..\\")

Co-authored-by: Johannes Müller <[email protected]>
Comment on lines 82 to 88
source = uri.to_s
# narrow down requirement
requirement = Any
if source.includes?("@")
source, version = source.split("@")
requirement = VersionReq.new("~> #{version}")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Splitting on "@" over the entire URL is error prone. It would also match the user info portion of a URI.
git:user@host:repo parses into source user with version requirement ~> host:repo.
This mechanism must be restricted to the path of the URI.

Even then, there's a chance for misinterpretation because @ is a completely valid character in a URI path. Might not be likely to encounter shards at such a path, but I'm wondering if it's worth it. IMO the @ syntax is not essential.
We could consider a different format for expressing version constraints concisely in the URL.

Nix flake uses query parameters to encode version information (e.g. git+https://example.com/foo/bar?branch=master)
https://nix.dev/manual/nix/2.28/command-ref/new-cli/nix3-flake#examples

I think this is not entirely elegant because query parameters have a different purpose. They're supposed to be resolved by the remote party.

The URI fragment is more elegant for this. It represents resource-internal information resolved by the local party.
Our URIs point to shard repositories, that's the resource. And navigating to a specific portion of a resource is the purpose of a fragment.

So URLs would look like this: github:Foo/Bar#1.2.3, git+https://example.com/foo/bar#1.2.3.

We can further enhance this by adding support for parameters to the fragment, thus being able to resolve fragments like #branch=master, or #tag=1.2.3-rc1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that # force the need of quoting in the shell?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You can use URLs with fragments perfectly fine in shell commands.

The shell considers # as a comment only if it's preceded by whitespace.

$ echo foo#bar
foo#bar
$ echo foo #bar
foo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using fragments for version numbers with the same semantics currently have @ seems good to me.

Tag and branch (and revision/commit) can come later imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may work, but other package managers (cargo, bundler, ...) use @version, so using something else like #version is confusing, and # has a special meaning in shells (comment), which only increases the confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like more @ than # but I don't feel strongly. The only thing that makes me doubt is that @version could be expected to be = version instead of ~> version, because the at read.

# is more tidy for treating things as url. but then github:repo/owner is not exactly a url. I find github:repo/owner@version more intuitive than github:repo/owner#version but we need to pick one symbol.

It seems that # is a cleaner implementation than @ because we can rely on URI parsing, but there is no ambiguity, we can split @ in the last path component. I don't think we need to choose based on the implementation but rather by the UX, but again there is not much difference I think in that here.

Any final calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need to fix this now. We could leave version restrictions out of this first step and continue the discussion on that separately, while moving forward with producing an MVP. Having support for restrictions is useful, but not essential for a working prototype.
The syntax for version restrictions is closely connected to information for other details such as commit, branch and tag which we had already agreed to defer. Let's talk about all this together and in full.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do. I rather support github:owner/repo with version rather than all the other combinations we've been discussing. It seems that it will be more used IMO.

python pip uses @ for versioning also, and fragment for some additional thing.

I would stick with @ as the simplest syntax and if we need branch or other we can add # as an alternative later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy about jumping to a syntax for one use case without proper consideration of other related applications. Even if it's for the most common use case.
There are many important details to consider.
For example, the format used in pip and others references a concrete version: [email protected] means package foo at exactly version 1.2.3. In this proposal, however, [email protected] implicitly means package foo with a restriction of ~>1.2.3. Adopting an existing syntax but applying different semantics can be confusing to users.
I'd like us to be sure that's what we want, and we see no reasonable, better alternative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact shards encourage semver, calver seems good enough for me to make @ or whatever syntax that mentions just a version to mean ~>.

I would leave out for now any other version operator.

I don't mean to jump. I mean to discuss and decide. I don't want to not support ~> version restriction out in this first cut.

@bcardiff
Copy link
Member Author

bcardiff commented Jun 8, 2025

Made some changes to use # instead of @ while supporting it also in other syntax like git@ https:// etc and not just github:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants