Skip to content

Add resolve() method (or function) #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pix666 opened this issue Apr 28, 2018 · 3 comments
Open

Add resolve() method (or function) #31

pix666 opened this issue Apr 28, 2018 · 3 comments

Comments

@pix666
Copy link

pix666 commented Apr 28, 2018

Here's the helper function I wrote:

from purl import URL
from urllib import parse

def resolve(base, url):
    """
    Resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF
    :param base: str|URL
    :param url: str|URL
    :return: URL
    """
    if isinstance(base, URL):
        baseurl = base
    else:
        baseurl = URL(base)

    if isinstance(url, URL):
        relurl = url
    else:
        relurl = URL(url)

    if relurl.host():
        return relurl

    if relurl.path():
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=parse.urljoin(baseurl.path(), relurl.path()),
            query=relurl.query(),
            fragment=relurl.fragment(),
        )
    elif relurl.query() or '?' in url:
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=baseurl.path(),
            query=relurl.query(),
            fragment=relurl.fragment(),
        )
    elif relurl.fragment() or '#' in url:
        return URL(
            scheme=baseurl.scheme(),
            host=baseurl.host(),
            port=baseurl.port(),
            path=baseurl.path(),
            query=baseurl.query(),
            fragment=relurl.fragment(),
        )
    return baseurl

Usage:

>>> base = URL('http://user:[email protected]:8080/path/to/some/doc.html?q=query#frag')
...
>>> print(resolve(base, '../home'))
http://user:[email protected]:8080/path/to/home

>>> print(resolve(base, 'doc2.html'))
http://user:[email protected]:8080/path/to/some/doc2.html

>>> print(resolve(base, '?'))
http://user:[email protected]:8080/path/to/some/doc.html

>>> print(resolve(base, '?q=git'))
http://user:[email protected]:8080/path/to/some/doc.html?q=git

>>> print(resolve(base, '#'))
http://user:[email protected]:8080/path/to/some/doc.html?q=query
@codeinthehole
Copy link
Owner

Am not sure about this one. I can see how the ../home request translates to a path change but values like ? already have an API in purl (setting the query string to an empty string).

Have you seen an API like this in another URL lib?

@pix666
Copy link
Author

pix666 commented Apr 30, 2018

Yes. url module in NodeJS: https://www.npmjs.com/package/url

resolve() is more than just urllib.parse.urljoin(). It resolves a target URL relative to a base URL in a manner similar to that of a Web browser resolving an anchor tag HREF.

Examples:

  1. <a href="../home">Go home</a> => change path + clear query + clear fragment.
  2. <a href="?">Page 1</a> => clear query + clear fragment.

Very useful for HTML parsing.

@codeinthehole
Copy link
Owner

Interesting. I'll add a similar method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants