config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:
    [http]
        sslVerify = true
    [http "https://weak.example.com"]
        sslVerify = false
and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also when it is anything that matches it, e.g.
    https://weak.example.com/test
    https://me@weak.example.com/test
The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none), but at a lower precedence than a config key with a user
name.
Longer matches take precedence over shorter matches.
This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:52:00 +08:00
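The path rule described in the commit message above (exact match, or prefix match only on a slash boundary, with longer matches winning) can be sketched as follows. This is a minimal illustration, not the code in urlmatch.c: `path_prefix_match()` is a hypothetical helper, and it assumes both paths are already normalized and start with `/`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of urlmatch.h): return the number of
 * bytes of key_path that matched url_path, or 0 if there is no match.
 * Both paths are assumed to be normalized and to start with '/'.
 */
static size_t path_prefix_match(const char *key_path, const char *url_path)
{
	size_t key_len, url_len;

	assert(key_path && url_path);
	key_len = strlen(key_path);
	url_len = strlen(url_path);

	/* The key must be a non-empty byte-wise prefix of the URL path. */
	if (!key_len || key_len > url_len ||
	    strncmp(key_path, url_path, key_len))
		return 0;
	/* Exact match, or the key itself ends on a slash boundary. */
	if (key_len == url_len || key_path[key_len - 1] == '/')
		return key_len;
	/* Otherwise the next URL byte must start a new path element. */
	if (url_path[key_len] == '/')
		return key_len;
	return 0;
}
```

A caller could compare the returned lengths to pick the most specific key: under this sketch `/foo/bar` beats `/foo/` for the URL path `/foo/bar`, while `/foo` does not match `/foobar` at all.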
#ifndef URL_MATCH_H
#define URL_MATCH_H

#include "string-list.h"
#include "config.h"

struct url_info {
	/* normalized url on success, must be freed, otherwise NULL */
	char *url;
	/* if !url, a brief reason for the failure, otherwise NULL */
	const char *err;

	/* the rest of the fields are only set if url != NULL */

	size_t url_len;		/* total length of url (which is now normalized) */
	size_t scheme_len;	/* length of scheme name (excluding final :) */
	size_t user_off;	/* offset into url to start of user name (0 => none) */
	size_t user_len;	/* length of user name; if user_off != 0 but
				   user_len == 0, an empty user name was given */
	size_t passwd_off;	/* offset into url to start of passwd (0 => none) */
	size_t passwd_len;	/* length of passwd; if passwd_off != 0 but
				   passwd_len == 0, an empty passwd was given */
	size_t host_off;	/* offset into url to start of host name (0 => none) */
	size_t host_len;	/* length of host name;
				 * file urls may have host_len == 0 */
	size_t port_off;	/* offset into url to start of port number (0 => none) */
	size_t port_len;	/* if a portnum is present (port_off != 0), it has
				 * this length (excluding the leading ':') starting
				 * from port_off (always 0 for file urls) */
	size_t path_off;	/* offset into url to the start of the url path;
				 * this will always point to a '/' character
				 * after the url has been normalized */
	size_t path_len;	/* length of path portion excluding any trailing
				 * '?...' and '#...' portion; will always be >= 1 */
};

char *url_normalize(const char *, struct url_info *);

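The offset/length pairs in `struct url_info` describe slices of the normalized `url` string. The sketch below shows one way a caller might copy such a slice out. It is an illustration under assumptions: `example_url_info` is a hypothetical cut-down mirror of the real struct so the code compiles on its own, and the offsets are filled in by hand rather than produced by `url_normalize()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down mirror of struct url_info, for illustration only. */
struct example_url_info {
	const char *url;
	size_t host_off, host_len;
	size_t path_off, path_len;
};

/* Copy url[off, off + len) into a freshly allocated NUL-terminated string. */
static char *copy_component(const char *url, size_t off, size_t len)
{
	char *out;

	assert(url);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, url + off, len);
	out[len] = '\0';
	return out;
}

/* Offsets written by hand for "https://example.com/repo.git". */
static struct example_url_info example_info(void)
{
	struct example_url_info info = {
		.url = "https://example.com/repo.git",
		.host_off = 8, .host_len = 11,	/* "example.com" */
		.path_off = 19, .path_len = 9,	/* "/repo.git" */
	};
	return info;
}
```

Because the path offset always points at a `/` after normalization, a check such as `info.url[info.path_off] == '/'` holds for this hand-built value as well.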
struct urlmatch_item {
	size_t hostmatch_len;
	size_t pathmatch_len;
	char user_matched;
};

struct urlmatch_config {
	struct string_list vars;
	struct url_info url;
	const char *section;
	const char *key;

	void *cb;
	config_fn_t collect_fn;
	config_fn_t cascade_fn;
credential: allow wildcard patterns when matching config
In some cases, a user will want to use a specific credential helper for
a wildcard pattern, such as https://*.corp.example.com. We have code
that handles this already with the urlmatch code, so let's use that
instead of our custom code.
Since the urlmatch code is a superset of our current matching in terms
of capabilities, there shouldn't be any cases of things that matched
previously that don't match now. However, in addition to wildcard
matching, we now use partial path matching, which can cause slightly
different behavior in the case that a helper applies to the prefix
(considering path components) of the remote URL. While different, this
is probably the behavior people were wanting anyway.
Since we're using the urlmatch code, we need to encode the components
we've gotten into a URL to match, so add a function to percent-encode
data and format the URL with it. We also no longer need the
custom code to match URLs, so let's remove it.
Additionally, the urlmatch code always looks for the best match, whereas
we want all matches for credential helpers to preserve existing
behavior. Let's add an optional field, select_fn, that lets us control
which items we want (in this case, all of them) and default it to the
best-match code that already exists for other users.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-20 10:24:13 +08:00
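The commit message above mentions percent-encoding the credential components before formatting them into a URL. The following is a minimal sketch of that idea, not the function the commit adds: `percent_encode()` and `is_unreserved()` are hypothetical names, and the sketch assumes that every byte outside the RFC 3986 "unreserved" set should be escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') ||
	       c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Hypothetical helper: percent-encode every byte that is not unreserved.
 * Returns a malloc'd string the caller must free, or NULL on failure.
 */
static char *percent_encode(const char *src)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t len, i, j = 0;
	char *out;

	assert(src);
	len = strlen(src);
	out = malloc(3 * len + 1);	/* worst case: every byte becomes %XX */
	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];

		if (is_unreserved(c)) {
			out[j++] = (char)c;
		} else {
			out[j++] = '%';
			out[j++] = hex[c >> 4];
			out[j++] = hex[c & 0xf];
		}
	}
	out[j] = '\0';
	return out;
}
```

Encoding a user name such as `me@example.com` this way keeps the `@` inside it from being confused with the separator between user name and host.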
	/*
	 * Compare the two matches, the one just discovered and the existing
	 * best match and return a negative value if the found item is to be
	 * rejected or a non-negative value if it is to be accepted.  If this
	 * field is set to NULL, use the default comparison technique, which
	 * checks to see if found is better (according to the urlmatch
	 * specificity rules) than existing.
	 */
	int (*select_fn)(const struct urlmatch_item *found, const struct urlmatch_item *existing);
	/*
	 * An optional callback to allow e.g. for partial URLs; it shall
	 * return 1 or 0 depending on whether `url` matches or not.
	 */
	int (*fallback_match_fn)(const char *url, void *cb);
};

#define URLMATCH_CONFIG_INIT { \
	.vars = STRING_LIST_INIT_DUP, \
}

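The `select_fn` contract above (negative rejects the found item, non-negative accepts it) can be illustrated with two callbacks. This is a sketch under assumptions: `example_urlmatch_item` is a local mirror of `struct urlmatch_item` so the code compiles on its own, and the ordering in `select_best()` (host match length, then path match length, then user-name match) is an assumed reading of the specificity rules, not a copy of the default comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of struct urlmatch_item so the sketch is self-contained. */
struct example_urlmatch_item {
	size_t hostmatch_len;
	size_t pathmatch_len;
	char user_matched;
};

/* Accept every match, as a caller that wants all matches might do. */
static int select_all(const struct example_urlmatch_item *found,
		      const struct example_urlmatch_item *existing)
{
	(void)found;
	(void)existing;
	return 0;
}

/*
 * Assumed "best match" ordering: a longer host match wins, then a longer
 * path match, then a key that matched the user name. Ties are accepted.
 */
static int select_best(const struct example_urlmatch_item *found,
		       const struct example_urlmatch_item *existing)
{
	assert(found && existing);
	if (found->hostmatch_len != existing->hostmatch_len)
		return found->hostmatch_len < existing->hostmatch_len ? -1 : 1;
	if (found->pathmatch_len != existing->pathmatch_len)
		return found->pathmatch_len < existing->pathmatch_len ? -1 : 1;
	if (found->user_matched != existing->user_matched)
		return existing->user_matched ? -1 : 1;
	return 0;
}
```

Returning 0 on a tie means a later, equally specific entry is accepted, which matches the usual "last one wins" behavior of configuration files.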
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:22 +08:00
int urlmatch_config_entry(const char *var, const char *value,
			  const struct config_context *ctx, void *cb);
void urlmatch_config_release(struct urlmatch_config *config);

#endif /* URL_MATCH_H */