gcc/libcpp
David Malcolm 1a7f2c0774 libcpp: escape non-ASCII source bytes in -Wbidi-chars= [PR103026]
This flags rich_locations associated with -Wbidi-chars= so that
non-ASCII bytes will be escaped when printing the source lines
(using the diagnostics support I added in
r12-4825-gbd5e882cf6e0def3dd1bc106075d59a303fe0d1e).

In particular, this ensures that the printed source lines will
be pure ASCII, and thus the visual ordering of the characters
will be the same as the logical ordering.

Before:

  Wbidi-chars-1.c: In function ‘main’:
  Wbidi-chars-1.c:6:43: warning: unpaired UTF-8 bidirectional control character detected [-Wbidi-chars=]
      6 |     /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
        |                                           ^
  Wbidi-chars-1.c:9:28: warning: unpaired UTF-8 bidirectional control character detected [-Wbidi-chars=]
      9 |     /* end admins only ‮ { ⁦*/
        |                            ^

  Wbidi-chars-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
      6 | int LRE_‪_PDF_\u202c;
        |               ^
  Wbidi-chars-11.c:8:19: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
      8 | int LRE_\u202a_PDF_‬_;
        |                   ^
  Wbidi-chars-11.c:10:28: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
     10 | const char *s1 = "LRE_‪_PDF_\u202c";
        |                            ^
  Wbidi-chars-11.c:12:33: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
     12 | const char *s2 = "LRE_\u202a_PDF_‬";
        |                                 ^

After:

  Wbidi-chars-1.c: In function ‘main’:
  Wbidi-chars-1.c:6:43: warning: unpaired UTF-8 bidirectional control character detected [-Wbidi-chars=]
      6 |     /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
        |                                                                           ^
  Wbidi-chars-1.c:9:28: warning: unpaired UTF-8 bidirectional control character detected [-Wbidi-chars=]
      9 |     /* end admins only <U+202E> { <U+2066>*/
        |                                            ^

  Wbidi-chars-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
      6 | int LRE_<U+202A>_PDF_\u202c;
        |                       ^
  Wbidi-chars-11.c:8:19: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
      8 | int LRE_\u202a_PDF_<U+202C>_;
        |                   ^
  Wbidi-chars-11.c:10:28: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
     10 | const char *s1 = "LRE_<U+202A>_PDF_\u202c";
        |                                    ^
  Wbidi-chars-11.c:12:33: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidi-chars=]
     12 | const char *s2 = "LRE_\u202a_PDF_<U+202C>";
        |                                 ^

libcpp/ChangeLog:
	PR preprocessor/103026
	* lex.c (maybe_warn_bidi_on_close): Use a rich_location
	and call set_escape_on_output (true) on it.
	(maybe_warn_bidi_on_char): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-17 17:32:30 -05:00
include libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
po Daily bump. 2021-08-17 00:16:32 +00:00
aclocal.m4 libcpp: Enable Intel CET on Intel CET enabled host for jit 2020-05-12 09:17:45 -07:00
ChangeLog Daily bump. 2021-11-02 00:16:32 +00:00
ChangeLog.jit Merger of dmalcolm/jit branch from git 2014-11-11 21:55:52 +00:00
charset.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
config.in Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856). 2018-10-31 17:03:16 +00:00
configure GCC_CET_HOST_FLAGS: Check if host supports multi-byte NOPs 2021-05-03 05:01:23 -07:00
configure.ac libcpp, libdecnumber: configure and substitute AR 2020-05-23 21:59:02 +00:00
directives.c libcpp: Fix _Pragma expansion [PR102409] 2021-10-29 22:55:32 +02:00
errors.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
expr.c preprocessor: Fix pp-number lexing of digit separators [PR83873, PR97604] 2021-05-06 23:20:35 +00:00
files.c diagnostics: Support for -finput-charset [PR93067] 2021-08-25 11:15:28 -04:00
generated_cpp_wcwidth.h libcpp: Update cpp_wcwidth() to Unicode 13.0.0 2020-11-07 09:36:43 -05:00
identifiers.c Update copyright years. 2021-01-04 10:26:59 +01:00
init.c libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
internal.h libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
lex.c libcpp: escape non-ASCII source bytes in -Wbidi-chars= [PR103026] 2021-11-17 17:32:30 -05:00
line-map.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
location-example.txt PR preprocessor/83173: Enhance -fdump-internal-locations output 2018-11-27 16:04:31 +00:00
macro.c libcpp: Fix _Pragma expansion [PR102409] 2021-10-29 22:55:32 +02:00
Makefile.in Add install-dvi Makefile targets. 2021-10-22 15:43:50 -07:00
makeucnid.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
mkdeps.c preprocessor: Make quoting : [PR 95253] 2021-01-15 08:56:20 -08:00
pch.c Update copyright years. 2021-01-04 10:26:59 +01:00
symtab.c Update copyright years. 2021-01-04 10:26:59 +01:00
system.h Update copyright years. 2021-01-04 10:26:59 +01:00
traditional.c Update copyright years. 2021-01-04 10:26:59 +01:00
ucnid.h libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
ucnid.tab Update copyright years. 2021-01-04 10:26:59 +01:00