git/gettext.c

213 lines
5.4 KiB
C
Raw Normal View History

/*
* Copyright (c) 2010 Ævar Arnfjörð Bjarmason
*/
#include "cache.h"
#include "exec-cmd.h"
#include "gettext.h"
#include "strbuf.h"
#include "utf8.h"
i18n: make GETTEXT_POISON a runtime option Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?> runtime parameter, to be consistent with other parameters documented in "Running tests with special setups" in t/README. When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON to simulate unfriendly translator", 2011-02-22) I was concerned with ensuring that the _() function would get constant folded if NO_GETTEXT was defined, and likewise that GETTEXT_POISON would be compiled out unless it was defined. But as the benchmark in my [1] shows doing a one-off runtime getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was originally added the GIT_TEST_* env variables have become the common idiom for turning on special test setups. So change GETTEXT_POISON to work the same way. Now the GETTEXT_POISON=YesPlease compile-time option is gone, and running the tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off without recompiling. This allows for conditionally amending tests to test with/without poison, similar to what 859fdc0c3c ("commit-graph: define GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do some of that, now we e.g. always run the t0205-gettext-poison.sh test. I did enough there to remove the GETTEXT_POISON prerequisite, but its inverse C_LOCALE_OUTPUT is still around, and surely some tests using it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=. Notes on the implementation: * We still compile a dedicated GETTEXT_POISON build in Travis CI. Perhaps this should be revisited and integrated into the "linux-gcc" build, see ae59a4e44f ("travis: run tests with GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then again maybe not, see [2]. * We now skip a test in t0000-basic.sh under GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This test relies on C locale output, but due to an edge case in how the previous implementation of GETTEXT_POISON worked (reading it from GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does, and needs to be skipped. * The getenv() function is not reentrant, so out of paranoia about code of the form: printf(_("%s"), getenv("some-env")); call use_gettext_poison() in our early setup in git_setup_gettext() so we populate the "poison_requested" variable in a codepath that's won't suffer from that race condition. * We error out in the Makefile if you're still saying GETTEXT_POISON=YesPlease to prompt users to change their invocation. * We should not print out poisoned messages during the test initialization itself to keep it more readable, so the test library hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during setup. See [3]. See also [4] for more on the motivation behind this patch, and the history of the GETTEXT_POISON facility. 1. https://public-inbox.org/git/871s8gd32p.fsf@evledraar.gmail.com/ 2. https://public-inbox.org/git/20181102163725.GY30222@szeder.dev/ 3. https://public-inbox.org/git/20181022202241.18629-2-szeder.dev@gmail.com/ 4. https://public-inbox.org/git/878t2pd6yu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-09 05:15:29 +08:00
#include "config.h"
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
#ifndef NO_GETTEXT
# include <locale.h>
# include <libintl.h>
# ifdef HAVE_LIBCHARSET_H
# include <libcharset.h>
# else
# include <langinfo.h>
# define locale_charset() nl_langinfo(CODESET)
# endif
#endif
static const char *charset;
/*
* Guess the user's preferred languages from the value in LANGUAGE environment
* variable and LC_MESSAGES locale category if NO_GETTEXT is not defined.
*
* The result can be a colon-separated list like "ko:ja:en".
*/
const char *get_preferred_languages(void)
{
const char *retval;
retval = getenv("LANGUAGE");
if (retval && *retval)
return retval;
#ifndef NO_GETTEXT
retval = setlocale(LC_MESSAGES, NULL);
if (retval && *retval &&
strcmp(retval, "C") &&
strcmp(retval, "POSIX"))
return retval;
#endif
return NULL;
}
int use_gettext_poison(void)
{
static int poison_requested = -1;
i18n: make GETTEXT_POISON a runtime option Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?> runtime parameter, to be consistent with other parameters documented in "Running tests with special setups" in t/README. When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON to simulate unfriendly translator", 2011-02-22) I was concerned with ensuring that the _() function would get constant folded if NO_GETTEXT was defined, and likewise that GETTEXT_POISON would be compiled out unless it was defined. But as the benchmark in my [1] shows doing a one-off runtime getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was originally added the GIT_TEST_* env variables have become the common idiom for turning on special test setups. So change GETTEXT_POISON to work the same way. Now the GETTEXT_POISON=YesPlease compile-time option is gone, and running the tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off without recompiling. This allows for conditionally amending tests to test with/without poison, similar to what 859fdc0c3c ("commit-graph: define GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do some of that, now we e.g. always run the t0205-gettext-poison.sh test. I did enough there to remove the GETTEXT_POISON prerequisite, but its inverse C_LOCALE_OUTPUT is still around, and surely some tests using it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=. Notes on the implementation: * We still compile a dedicated GETTEXT_POISON build in Travis CI. Perhaps this should be revisited and integrated into the "linux-gcc" build, see ae59a4e44f ("travis: run tests with GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then again maybe not, see [2]. * We now skip a test in t0000-basic.sh under GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This test relies on C locale output, but due to an edge case in how the previous implementation of GETTEXT_POISON worked (reading it from GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does, and needs to be skipped. * The getenv() function is not reentrant, so out of paranoia about code of the form: printf(_("%s"), getenv("some-env")); call use_gettext_poison() in our early setup in git_setup_gettext() so we populate the "poison_requested" variable in a codepath that's won't suffer from that race condition. * We error out in the Makefile if you're still saying GETTEXT_POISON=YesPlease to prompt users to change their invocation. * We should not print out poisoned messages during the test initialization itself to keep it more readable, so the test library hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during setup. See [3]. See also [4] for more on the motivation behind this patch, and the history of the GETTEXT_POISON facility. 1. https://public-inbox.org/git/871s8gd32p.fsf@evledraar.gmail.com/ 2. https://public-inbox.org/git/20181102163725.GY30222@szeder.dev/ 3. https://public-inbox.org/git/20181022202241.18629-2-szeder.dev@gmail.com/ 4. https://public-inbox.org/git/878t2pd6yu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-09 05:15:29 +08:00
if (poison_requested == -1) {
const char *v = getenv("GIT_TEST_GETTEXT_POISON");
poison_requested = v && strlen(v) ? 1 : 0;
}
return poison_requested;
}
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
#ifndef NO_GETTEXT
static int test_vsnprintf(const char *fmt, ...)
{
char buf[26];
int ret;
va_list ap;
va_start(ap, fmt);
ret = vsnprintf(buf, sizeof(buf), fmt, ap);
va_end(ap);
return ret;
}
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
static void init_gettext_charset(const char *domain)
{
/*
This trick arranges for messages to be emitted in the user's
requested encoding, but avoids setting LC_CTYPE from the
environment for the whole program.
This primarily done to avoid a bug in vsnprintf in the GNU C
Library [1]. which triggered a "your vsnprintf is broken" error
on Git's own repository when inspecting v0.99.6~1 under a UTF-8
locale.
That commit contains a ISO-8859-1 encoded author name, which
the locale aware vsnprintf(3) won't interpolate in the format
argument, due to mismatch between the data encoding and the
locale.
Even if it wasn't for that bug we wouldn't want to use LC_CTYPE at
this point, because it'd require auditing all the code that uses C
functions whose semantics are modified by LC_CTYPE.
But only setting LC_MESSAGES as we do creates a problem, since
we declare the encoding of our PO files[2] the gettext
implementation will try to recode it to the user's locale, but
without LC_CTYPE it'll emit something like this on 'git init'
under the Icelandic locale:
Bj? til t?ma Git lind ? /hlagh/.git/
Gettext knows about the encoding of our PO file, but we haven't
told it about the user's encoding, so all the non-US-ASCII
characters get encoded to question marks.
But we're in luck! We can set LC_CTYPE from the environment
only while we call nl_langinfo and
bind_textdomain_codeset. That suffices to tell gettext what
encoding it should emit in, so it'll now say:
Bjó til tóma Git lind í /hlagh/.git/
And the equivalent ISO-8859-1 string will be emitted under a
ISO-8859-1 locale.
With this change way we get the advantages of setting LC_CTYPE
(talk to the user in his language/encoding), without the major
drawbacks (changed semantics for C functions we rely on).
However foreign functions using other message catalogs that
aren't using our neat trick will still have a problem, e.g. if
we have to call perror(3):
#include <stdio.h>
#include <locale.h>
#include <errno.h>
int main(void)
{
setlocale(LC_MESSAGES, "");
setlocale(LC_CTYPE, "C");
errno = ENODEV;
perror("test");
return 0;
}
Running that will give you a message with question marks:
$ LANGUAGE= LANG=de_DE.utf8 ./test
test: Kein passendes Ger?t gefunden
The vsnprintf bug has been fixed since glibc 2.17.
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
Then we could simply set LC_CTYPE from the environment, which would
make things like the external perror(3) messages work.
See t/t0203-gettext-setlocale-sanity.sh's "gettext.c" tests for
regression tests.
1. http://sourceware.org/bugzilla/show_bug.cgi?id=6530
2. E.g. "Content-Type: text/plain; charset=UTF-8\n" in po/is.po
*/
setlocale(LC_CTYPE, "");
charset = locale_charset();
bind_textdomain_codeset(domain, charset);
/* the string is taken from v0.99.6~1 */
if (test_vsnprintf("%.*s", 13, "David_K\345gedal") < 0)
setlocale(LC_CTYPE, "C");
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
}
void git_setup_gettext(void)
{
const char *podir = getenv(GIT_TEXT_DOMAIN_DIR_ENVIRONMENT);
char *p = NULL;
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
if (!podir)
podir = p = system_path(GIT_LOCALE_PATH);
i18n: make GETTEXT_POISON a runtime option Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON test parameter to only be a GIT_TEST_GETTEXT_POISON=<non-empty?> runtime parameter, to be consistent with other parameters documented in "Running tests with special setups" in t/README. When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON to simulate unfriendly translator", 2011-02-22) I was concerned with ensuring that the _() function would get constant folded if NO_GETTEXT was defined, and likewise that GETTEXT_POISON would be compiled out unless it was defined. But as the benchmark in my [1] shows doing a one-off runtime getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was originally added the GIT_TEST_* env variables have become the common idiom for turning on special test setups. So change GETTEXT_POISON to work the same way. Now the GETTEXT_POISON=YesPlease compile-time option is gone, and running the tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off without recompiling. This allows for conditionally amending tests to test with/without poison, similar to what 859fdc0c3c ("commit-graph: define GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do some of that, now we e.g. always run the t0205-gettext-poison.sh test. I did enough there to remove the GETTEXT_POISON prerequisite, but its inverse C_LOCALE_OUTPUT is still around, and surely some tests using it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=. Notes on the implementation: * We still compile a dedicated GETTEXT_POISON build in Travis CI. Perhaps this should be revisited and integrated into the "linux-gcc" build, see ae59a4e44f ("travis: run tests with GIT_TEST_SPLIT_INDEX", 2018-01-07) for prior art in that area. Then again maybe not, see [2]. * We now skip a test in t0000-basic.sh under GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This test relies on C locale output, but due to an edge case in how the previous implementation of GETTEXT_POISON worked (reading it from GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does, and needs to be skipped. * The getenv() function is not reentrant, so out of paranoia about code of the form: printf(_("%s"), getenv("some-env")); call use_gettext_poison() in our early setup in git_setup_gettext() so we populate the "poison_requested" variable in a codepath that's won't suffer from that race condition. * We error out in the Makefile if you're still saying GETTEXT_POISON=YesPlease to prompt users to change their invocation. * We should not print out poisoned messages during the test initialization itself to keep it more readable, so the test library hides the variable if set in $GIT_TEST_GETTEXT_POISON_ORIG during setup. See [3]. See also [4] for more on the motivation behind this patch, and the history of the GETTEXT_POISON facility. 1. https://public-inbox.org/git/871s8gd32p.fsf@evledraar.gmail.com/ 2. https://public-inbox.org/git/20181102163725.GY30222@szeder.dev/ 3. https://public-inbox.org/git/20181022202241.18629-2-szeder.dev@gmail.com/ 4. https://public-inbox.org/git/878t2pd6yu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-09 05:15:29 +08:00
use_gettext_poison(); /* getenv() reentrancy paranoia */
if (!is_directory(podir)) {
free(p);
return;
}
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
bindtextdomain("git", podir);
setlocale(LC_MESSAGES, "");
setlocale(LC_TIME, "");
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
init_gettext_charset("git");
textdomain("git");
free(p);
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
}
/* return the number of columns of string 's' in current locale */
int gettext_width(const char *s)
{
static int is_utf8 = -1;
if (is_utf8 == -1)
is_utf8 = is_utf8_locale();
return is_utf8 ? utf8_strwidth(s) : strlen(s);
}
i18n: add infrastructure for translating Git with gettext Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 07:14:42 +08:00
#endif
int is_utf8_locale(void)
{
#ifdef NO_GETTEXT
if (!charset) {
const char *env = getenv("LC_ALL");
if (!env || !*env)
env = getenv("LC_CTYPE");
if (!env || !*env)
env = getenv("LANG");
if (!env)
env = "";
if (strchr(env, '.'))
env = strchr(env, '.') + 1;
charset = xstrdup(env);
}
#endif
return is_encoding_utf8(charset);
}