config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
#ifndef URL_MATCH_H
|
2018-08-16 01:54:08 +08:00
|
|
|
#define URL_MATCH_H
|
|
|
|
|
config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
#include "string-list.h"
|
|
|
|
|
|
|
|
struct url_info {
|
|
|
|
/* normalized url on success, must be freed, otherwise NULL */
|
|
|
|
char *url;
|
|
|
|
/* if !url, a brief reason for the failure, otherwise NULL */
|
|
|
|
const char *err;
|
|
|
|
|
|
|
|
/* the rest of the fields are only set if url != NULL */
|
|
|
|
|
|
|
|
size_t url_len; /* total length of url (which is now normalized) */
|
|
|
|
size_t scheme_len; /* length of scheme name (excluding final :) */
|
|
|
|
size_t user_off; /* offset into url to start of user name (0 => none) */
|
|
|
|
size_t user_len; /* length of user name; if user_off != 0 but
|
|
|
|
user_len == 0, an empty user name was given */
|
|
|
|
size_t passwd_off; /* offset into url to start of passwd (0 => none) */
|
|
|
|
size_t passwd_len; /* length of passwd; if passwd_off != 0 but
|
|
|
|
passwd_len == 0, an empty passwd was given */
|
|
|
|
size_t host_off; /* offset into url to start of host name (0 => none) */
|
2017-01-31 17:01:45 +08:00
|
|
|
size_t host_len; /* length of host name;
|
config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
* file urls may have host_len == 0 */
|
2017-01-31 17:01:45 +08:00
|
|
|
size_t port_off; /* offset into url to start of port number (0 => none) */
|
|
|
|
size_t port_len; /* if a portnum is present (port_off != 0), it has
|
|
|
|
* this length (excluding the leading ':') starting
|
|
|
|
* from port_off (always 0 for file urls) */
|
config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
size_t path_off; /* offset into url to the start of the url path;
|
|
|
|
* this will always point to a '/' character
|
|
|
|
* after the url has been normalized */
|
|
|
|
size_t path_len; /* length of path portion excluding any trailing
|
|
|
|
* '?...' and '#...' portion; will always be >= 1 */
|
|
|
|
};
|
|
|
|
|
2019-04-29 16:28:14 +08:00
|
|
|
char *url_normalize(const char *, struct url_info *);
|
config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
|
2013-08-01 01:42:01 +08:00
|
|
|
struct urlmatch_item {
|
2017-01-31 17:01:46 +08:00
|
|
|
size_t hostmatch_len;
|
|
|
|
size_t pathmatch_len;
|
2013-08-01 01:42:01 +08:00
|
|
|
char user_matched;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct urlmatch_config {
|
|
|
|
struct string_list vars;
|
|
|
|
struct url_info url;
|
|
|
|
const char *section;
|
|
|
|
const char *key;
|
|
|
|
|
|
|
|
void *cb;
|
|
|
|
int (*collect_fn)(const char *var, const char *value, void *cb);
|
|
|
|
int (*cascade_fn)(const char *var, const char *value, void *cb);
|
credential: allow wildcard patterns when matching config
In some cases, a user will want to use a specific credential helper for
a wildcard pattern, such as https://*.corp.example.com. We have code
that handles this already with the urlmatch code, so let's use that
instead of our custom code.
Since the urlmatch code is a superset of our current matching in terms
of capabilities, there shouldn't be any cases of things that matched
previously that don't match now. However, in addition to wildcard
matching, we now use partial path matching, which can cause slightly
different behavior in the case that a helper applies to the prefix
(considering path components) of the remote URL. While different, this
is probably the behavior people were wanting anyway.
Since we're using the urlmatch code, we need to encode the components
we've gotten into a URL to match, so add a function to percent-encode
data and format the URL with it. We now also no longer need to the
custom code to match URLs, so let's remove it.
Additionally, the urlmatch code always looks for the best match, whereas
we want all matches for credential helpers to preserve existing
behavior. Let's add an optional field, select_fn, that lets us control
which items we want (in this case, all of them) and default it to the
best-match code that already exists for other users.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-20 10:24:13 +08:00
|
|
|
/*
|
|
|
|
* Compare the two matches, the one just discovered and the existing
|
|
|
|
* best match and return a negative value if the found item is to be
|
|
|
|
* rejected or a non-negative value if it is to be accepted. If this
|
|
|
|
* field is set to NULL, use the default comparison technique, which
|
|
|
|
* checks to ses if found is better (according to the urlmatch
|
|
|
|
* specificity rules) than existing.
|
|
|
|
*/
|
|
|
|
int (*select_fn)(const struct urlmatch_item *found, const struct urlmatch_item *existing);
|
2020-04-25 06:35:49 +08:00
|
|
|
/*
|
|
|
|
* An optional callback to allow e.g. for partial URLs; it shall
|
|
|
|
* return 1 or 0 depending whether `url` matches or not.
|
|
|
|
*/
|
|
|
|
int (*fallback_match_fn)(const char *url, void *cb);
|
2013-08-01 01:42:01 +08:00
|
|
|
};
|
|
|
|
|
2021-10-01 18:27:33 +08:00
|
|
|
#define URLMATCH_CONFIG_INIT { \
|
|
|
|
.vars = STRING_LIST_INIT_DUP, \
|
|
|
|
}
|
|
|
|
|
2019-04-29 16:28:14 +08:00
|
|
|
int urlmatch_config_entry(const char *var, const char *value, void *cb);
|
2022-03-05 02:32:07 +08:00
|
|
|
void urlmatch_config_release(struct urlmatch_config *config);
|
2013-08-01 01:42:01 +08:00
|
|
|
|
config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.
https://weak.example.com/test
https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
|
|
|
#endif /* URL_MATCH_H */
|