Skip to content

Dependency cli parsing #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
75 changes: 75 additions & 0 deletions spec/unit/dependency_definition_spec.cr
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
require "./spec_helper"
require "../../src/dependency_definition"

private def expect_parses(value, resolver_key : String, source : String, requirement : Shards::Requirement)
Shards::DependencyDefinition.parts_from_cli(value).should eq(Shards::DependencyDefinition::Parts.new(resolver_key: resolver_key, source: source, requirement: requirement))
end

module Shards
describe DependencyDefinition do
it ".parts_from_cli" do
# GitHub short syntax
expect_parses("github:foo/bar", "github", "foo/bar", Any)
expect_parses("github:Foo/[email protected]", "github", "Foo/Bar", VersionReq.new("~> 1.2.3"))

# GitHub urls
expect_parses("https://github.com/foo/bar", "github", "foo/bar", Any)
Copy link
Member

@straight-shoota straight-shoota May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm wondering about why this gets normalized to github resolver instead of "git", "https://github.com/foo/bar".
Both options are valid. So we only need to decide which one we want to pick.
I suppose the reason for github is that it's more concise? That's nice but not strictly necessary.
git would be closer to the original input.

Copy link
Member Author

@bcardiff bcardiff May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me closer to the intent is to have github: foo/bar, because I'm assuming the user is copy-pasting the url in the browser. We do preserve the format with trailing .git as those are copied from the clone popup.

At some point maybe it's worth configuring shards to map all github source to be resolved in some specific way, but is a separate story.


# GitHub urls from clone popup
expect_parses("https://github.com/foo/bar.git", "github", "foo/bar", Any)
expect_parses("[email protected]:foo/bar.git", "git", "[email protected]:foo/bar.git", Any)

# GitLab short syntax
expect_parses("gitlab:foo/bar", "gitlab", "foo/bar", Any)

# GitLab urls
expect_parses("https://gitlab.com/foo/bar", "gitlab", "foo/bar", Any)

# GitLab urls from clone popup
expect_parses("https://gitlab.com/foo/bar.git", "gitlab", "foo/bar", Any)
expect_parses("[email protected]:foo/bar.git", "git", "[email protected]:foo/bar.git", requirement: Any)

# Bitbucket short syntax
expect_parses("bitbucket:foo/bar", "bitbucket", "foo/bar", Any)

# bitbucket urls
expect_parses("https://bitbucket.com/foo/bar", "bitbucket", "foo/bar", Any)

# unknown https urls
expect_raises Shards::Error, "Cannot determine resolver for HTTPS URI" do
Shards::DependencyDefinition.parts_from_cli("https://example.com/foo/bar")
end

# Git convenient syntax since resolver matches scheme
expect_parses("git://git.example.org/crystal-library.git", "git", "git://git.example.org/crystal-library.git", Any)
expect_parses("[email protected]:foo/bar.git", "git", "[email protected]:foo/bar.git", Any)

# Local paths
local_absolute = "/an/absolute/path"
local_relative = "an/relative/path"

# Path short syntax
expect_parses("../#{local_relative}", "path", "../#{local_relative}", Any)
{% if flag?(:windows) %}
expect_parses(".\\relative\\windows", "path", "./relative/windows", Any)
expect_parses("..\\relative\\windows", "path", "../relative/windows", Any)
{% else %}
expect_parses("./#{local_relative}", "path", "./#{local_relative}", Any)
{% end %}
# Path file schema
expect_raises Shards::Error, "Invalid file URI" do
Shards::DependencyDefinition.parts_from_cli("file://#{local_relative}")
end
expect_parses("file:#{local_relative}", "path", local_relative, Any)
expect_parses("file:#{local_absolute}", "path", local_absolute, Any)
expect_parses("file://#{local_absolute}", "path", local_absolute, Any)
# Path resolver syntax
expect_parses("path:#{local_absolute}", "path", local_absolute, Any)
expect_parses("path:#{local_relative}", "path", local_relative, Any)
# Other resolvers short
expect_parses("git:git://git.example.org/crystal-library.git", "git", "git://git.example.org/crystal-library.git", Any)
expect_parses("git+https://example.org/foo/bar", "git", "https://example.org/foo/bar", Any)
expect_parses("git:https://example.org/foo/bar", "git", "https://example.org/foo/bar", Any)
end
end
end
1 change: 1 addition & 0 deletions src/dependency.cr
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ module Shards
end
end

# Used to generate the shard.lock file.
def to_yaml(yaml : YAML::Builder)
yaml.scalar name
yaml.mapping do
Expand Down
96 changes: 96 additions & 0 deletions src/dependency_definition.cr
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
require "./dependency"

module Shards
class DependencyDefinition
record Parts, resolver_key : String, source : String, requirement : Requirement = Any

property dependency : Dependency
# resolver's key and source are normalized. We preserve the key and source to be used
# in the shard.yml file in these field. This is used to generate the shard.yml file
# in a more human-readable way.
property resolver_key : String
property source : String

def initialize(@dependency : Dependency, @resolver_key : String, @source : String)
end

# Used to generate the shard.yml file.
def to_yaml(yaml : YAML::Builder)
yaml.scalar dependency.name
yaml.mapping do
yaml.scalar resolver_key
yaml.scalar source
dependency.requirement.to_yaml(yaml)
end
end

# Parse a dependency from a CLI argument
def self.from_cli(value : String) : DependencyDefinition
parts = parts_from_cli(value)

# We need to check the actual shard name to create a dependency.
# This requires getting the actual spec file from some matching version.
resolver = Resolver.find_resolver(parts.resolver_key, "unknown", parts.source)
version = resolver.versions_for(parts.requirement).first || raise Shards::Error.new("No versions found for dependency: #{value}")
spec = resolver.spec(version)
name = spec.name || raise Shards::Error.new("No name found for dependency: #{value}")

DependencyDefinition.new(Dependency.new(name, resolver, parts.requirement), parts.resolver_key, parts.source)
end

# :nodoc:
#
# Parse the dependency from a CLI argument
# and return the parts needed to create the proper dependency.
#
# Split to allow better unit testing.
def self.parts_from_cli(value : String) : Parts
uri = URI.parse(value)

case scheme = uri.scheme
when Nil
case value
when .starts_with?("./"), .starts_with?("../")
Parts.new("path", Path[value].to_posix.to_s)
when .starts_with?(".\\"), .starts_with?("..\\")
{% if flag?(:windows) %}
Parts.new("path", Path[value].to_posix.to_s)
{% else %}
raise Shards::Error.new("Invalid dependency format: #{value}")
{% end %}
when .starts_with?("git@")
Parts.new("git", value)
else
raise Shards::Error.new("Invalid dependency format: #{value}")
end
when "file"
raise Shards::Error.new("Invalid file URI: #{uri}") if !uri.host.in?(nil, "", "localhost") || uri.port || uri.user
Parts.new("path", uri.path)
when "https"
if resolver_key = GitResolver::KNOWN_PROVIDERS[uri.host]?
Parts.new(resolver_key, uri.path[1..-1].rchop(".git")) # drop first "/""
else
raise Shards::Error.new("Cannot determine resolver for HTTPS URI: #{value}")
end
else
scheme, _, subscheme = scheme.partition('+')
subscheme = subscheme.presence
if Resolver.find_class(scheme)
if uri.host.nil? || subscheme
uri.scheme = subscheme
end
source = uri.to_s
# narrow down requirement
requirement = Any
if source.includes?("@")
source, version = source.split("@")
requirement = VersionReq.new("~> #{version}")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Splitting on "@" over the entire URL is error prone. It would also match the user info portion of a URI.
git:user@host:repo parses into source user with version requirement ~> host:repo.
This mechanism must be restricted to the path of the URI.

Even then, there's a chance for misinterpretation because @ is a completely valid character in a URI path. Might not be likely to encounter shards at such a path, but I'm wondering if it's worth it. IMO the @ syntax is not essential.
We could consider a different format for expressing version constraints concisely in the URL.

Nix flake uses query parameters to encode version information (e.g. git+https://example.com/foo/bar?branch=master)
https://nix.dev/manual/nix/2.28/command-ref/new-cli/nix3-flake#examples

I think this is not entirely elegant because query parameters have a different purpose. They're supposed to be resolved by the remote party.

The URI fragment is more elegant for this. It represents resource-internal information resolved by the local party.
Our URIs point to shard repositories, that's the resource. And navigating to a specific portion of a resource is the purpose of a fragment.

So URLs would look like this: github:Foo/Bar#1.2.3, git+https://example.com/foo/bar#1.2.3.

We can further enhance this by adding support for parameters to the fragment, thus being able to resolve fragments like #branch=master, or #tag=1.2.3-rc1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that # force the need of quoting in the shell?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You can use URLs with fragments perfectly fine in shell commands.

The shell considers # as a comment only if it's preceded by whitespace.

$ echo foo#bar
foo#bar
$ echo foo #bar
foo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using fragments for version numbers with the same semantics currently have @ seems good to me.

Tag and branch (and revision/commit) can come later imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may work, but other package managers (cargo, bundler, ...) use @version, so using something else like #version is confusing, and # has a special meaning in shells (comment), which only increases the confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like more @ than # but I don't feel strongly. The only thing that makes me doubt is that @version could be expected to be = version instead of ~> version, because the at read.

# is more tidy for treating things as url. but then github:repo/owner is not exactly a url. I find github:repo/owner@version more intuitive than github:repo/owner#version but we need to pick one symbol.

It seems that # is a cleaner implementation than @ because we can rely on URI parsing, but there is no ambiguity, we can split @ in the last path component. I don't think we need to choose based on the implementation but rather by the UX, but again there is not much difference I think in that here.

Any final calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need to fix this now. We could leave version restrictions out of this first step and continue the discussion on that separately, while moving forward with producing an MVP. Having support for restrictions is useful, but not essential for a working prototype.
The syntax for version restrictions is closely connected to information for other details such as commit, branch and tag which we had already agreed to defer. Let's talk about all this together and in full.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do. I rather support github:owner/repo with version rather than all the other combinations we've been discussing. It seems that it will be more used IMO.

python pip uses @ for versioning also, and fragment for some additional thing.

I would stick with @ as the simplest syntax and if we need branch or other we can add # as an alternative later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy about jumping to a syntax for one use case without proper consideration of other related applications. Even if it's for the most common use case.
There are many important details to consider.
For example, the format used in pip and others references a concrete version: [email protected] means package foo at exactly version 1.2.3. In this proposal, however, [email protected] implicitly means package foo with a restriction of ~>1.2.3. Adopting an existing syntax but applying different semantics can be confusing to users.
I'd like us to be sure that's what we want, and we see no reasonable, better alternative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact shards encourage semver, calver seems good enough for me to make @ or whatever syntax that mentions just a version to mean ~>.

I would leave out for now any other version operator.

I don't mean to jump. I mean to discuss and decide. I don't want to not support ~> version restriction out in this first cut.


return Parts.new(scheme, source, requirement)
end
raise Shards::Error.new("Invalid dependency format: #{value}")
end
end
end
end
20 changes: 10 additions & 10 deletions src/resolvers/git.cr
Original file line number Diff line number Diff line change
Expand Up @@ -100,15 +100,15 @@ module Shards
"git"
end

private KNOWN_PROVIDERS = {
"www.github.com",
"github.com",
"www.bitbucket.com",
"bitbucket.com",
"www.gitlab.com",
"gitlab.com",
"www.codeberg.org",
"codeberg.org",
KNOWN_PROVIDERS = {
"www.github.com" => "github",
"github.com" => "github",
"www.bitbucket.com" => "bitbucket",
"bitbucket.com" => "bitbucket",
"www.gitlab.com" => "gitlab",
"gitlab.com" => "gitlab",
"www.codeberg.org" => "codeberg",
"codeberg.org" => "codeberg",
}

def self.normalize_key_source(key : String, source : String) : {String, String}
Expand All @@ -117,7 +117,7 @@ module Shards
uri = URI.parse(source)
downcased_host = uri.host.try &.downcase
scheme = uri.scheme.try &.downcase
if scheme.in?("git", "http", "https") && downcased_host && downcased_host.in?(KNOWN_PROVIDERS)
if scheme.in?("git", "http", "https") && downcased_host && downcased_host.in?(KNOWN_PROVIDERS.keys)
# browsers are requested to enforce HTTP Strict Transport Security
uri.scheme = "https"
downcased_path = uri.path.downcase
Expand Down
2 changes: 1 addition & 1 deletion src/resolvers/resolver.cr
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ module Shards
end

private record ResolverCacheKey, key : String, name : String, source : String
private RESOLVER_CLASSES = {} of String => Resolver.class
RESOLVER_CLASSES = {} of String => Resolver.class
private RESOLVER_CACHE = {} of ResolverCacheKey => Resolver

def self.register_resolver(key, resolver)
Expand Down