gcc/libcpp
Marek Polacek 51c500269b libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]
From a link below:
"An issue was discovered in the Bidirectional Algorithm in the Unicode
Specification through 14.0. It permits the visual reordering of
characters via control sequences, which can be used to craft source code
that renders different logic than the logical ordering of tokens
ingested by compilers and interpreters. Adversaries can leverage this to
encode source code for compilers accepting Unicode such that targeted
vulnerabilities are introduced invisibly to human reviewers."

More info:
https://nvd.nist.gov/vuln/detail/CVE-2021-42574
https://trojansource.codes/

This is not a compiler bug.  However, to mitigate the problem, this patch
implements -Wbidi-chars=[none|unpaired|any] to warn about possibly
misleading Unicode bidirectional control characters the preprocessor may
encounter.

The default is =unpaired, which warns about improperly terminated
bidirectional control characters; e.g. an LRE without its corresponding PDF.
The level =any warns about any use of bidirectional control characters.
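
To illustrate what the =any level looks for, here is a minimal sketch in C.
It is not the code added to lex.c: the names bidi_control_at and
count_bidi_controls are hypothetical, and the set of code points checked
(U+202A..U+202E and U+2066..U+2069) is an assumption of this sketch, so the
set GCC actually checks may differ.  The sketch only counts UTF-8 encoded
controls in a byte buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if the bytes at P (N bytes available) are the UTF-8
   encoding of a bidi control character, return its code point; else 0.
   All code points of interest here are three-byte sequences starting
   with 0xE2.  */
static unsigned bidi_control_at(const unsigned char *p, size_t n)
{
    if (n < 3 || p[0] != 0xE2)
        return 0;
    /* Both trailing bytes must be UTF-8 continuation bytes.  */
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return 0;
    unsigned cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6)
                  | (p[2] & 0x3Fu);
    if ((cp >= 0x202A && cp <= 0x202E)      /* LRE, RLE, PDF, LRO, RLO */
        || (cp >= 0x2066 && cp <= 0x2069))  /* LRI, RLI, FSI, PDI */
        return cp;
    return 0;
}

/* Count the bidi control characters in the LEN bytes at S.  A check in
   the spirit of =any would report whenever this count is nonzero.  */
size_t count_bidi_controls(const char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        if (bidi_control_at(p + i, len - i)) {
            count++;
            i += 3;     /* skip the whole three-byte sequence */
        } else {
            i++;
        }
    }
    return count;
}
```

Calling count_bidi_controls on a comment body that contains an LRE and a
PDF returns 2, even though the pair is balanced; that is the difference
between =any and =unpaired.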

This patch handles both UCNs and UTF-8 characters.  UCNs designating
bidi characters in identifiers have been accepted since r204886.  Then r217144
enabled -fextended-identifiers by default.  Extended characters in C/C++
identifiers have been accepted since r275979.  However, this patch still
warns about mixing UTF-8 and UCN bidi characters; there seems to be no
good reason to allow mixing them.

We warn in different contexts: comments (both C and C++-style), string
literals, character constants, and identifiers.  As expected, UCNs are ignored
in comments and raw string literals.  Bidirectional control characters can
nest, so this patch handles nesting as well.
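
The nesting can be pictured as a stack of open contexts.  The following C
sketch is an illustration only, not the implementation in lex.c: the names
classify, unterminated_bidi_contexts and MAX_DEPTH are hypothetical, it
handles UTF-8 input only (no UCNs), and it assumes that a PDI closes the
innermost open isolate together with any embeddings or overrides opened
inside it.

```c
#include <assert.h>
#include <stddef.h>

enum bidi_kind { BIDI_NONE, BIDI_EMBED, BIDI_ISOLATE, BIDI_PDF, BIDI_PDI };

/* Classify the UTF-8 sequence at P (N bytes available).  */
static enum bidi_kind classify(const unsigned char *p, size_t n)
{
    if (n < 3 || p[0] != 0xE2)
        return BIDI_NONE;
    if (p[1] == 0x80) {
        switch (p[2]) {
        case 0xAA: case 0xAB:           /* LRE, RLE */
        case 0xAD: case 0xAE:           /* LRO, RLO */
            return BIDI_EMBED;
        case 0xAC:                      /* PDF */
            return BIDI_PDF;
        }
    } else if (p[1] == 0x81) {
        switch (p[2]) {
        case 0xA6: case 0xA7: case 0xA8:    /* LRI, RLI, FSI */
            return BIDI_ISOLATE;
        case 0xA9:                          /* PDI */
            return BIDI_PDI;
        }
    }
    return BIDI_NONE;
}

#define MAX_DEPTH 128

/* Return how many bidi contexts are still open at the end of the LEN
   bytes at S.  Zero means every opener was terminated.  */
size_t unterminated_bidi_contexts(const char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    enum bidi_kind stack[MAX_DEPTH];
    size_t depth = 0;
    size_t i = 0;
    while (i < len) {
        enum bidi_kind k = classify(p + i, len - i);
        switch (k) {
        case BIDI_EMBED:
        case BIDI_ISOLATE:
            if (depth == MAX_DEPTH)
                return depth;   /* too deep: report as unterminated */
            stack[depth++] = k;
            break;
        case BIDI_PDF:
            /* PDF closes only an embedding or override on top.  */
            if (depth > 0 && stack[depth - 1] == BIDI_EMBED)
                depth--;
            break;
        case BIDI_PDI:
            /* Pop the innermost isolate and everything opened inside it.  */
            for (size_t j = depth; j > 0; j--)
                if (stack[j - 1] == BIDI_ISOLATE) {
                    depth = j - 1;
                    break;
                }
            break;
        case BIDI_NONE:
            break;
        }
        i += (k == BIDI_NONE) ? 1 : 3;
    }
    return depth;
}
```

A check in the spirit of =unpaired would run this once per context (a
comment, a string literal, and so on) and report when the result is
nonzero at the point where the context ends.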

I have neither included Fortran support nor tested this with Fortran (which
also has string literals and line comments).

Dave M. posted patches improving diagnostics involving Unicode characters.
This patch does not make use of this new infrastructure yet.

	PR preprocessor/103026

gcc/c-family/ChangeLog:

	* c.opt (Wbidi-chars, Wbidi-chars=): New option.

gcc/ChangeLog:

	* doc/invoke.texi: Document -Wbidi-chars.

libcpp/ChangeLog:

	* include/cpplib.h (enum cpp_bidirectional_level): New.
	(struct cpp_options): Add cpp_warn_bidirectional.
	(enum cpp_warning_reason): Add CPP_W_BIDIRECTIONAL.
	* internal.h (struct cpp_reader): Add warn_bidi_p member
	function.
	* init.c (cpp_create_reader): Set cpp_warn_bidirectional.
	* lex.c (bidi): New namespace.
	(get_bidi_utf8): New function.
	(get_bidi_ucn): Likewise.
	(maybe_warn_bidi_on_close): Likewise.
	(maybe_warn_bidi_on_char): Likewise.
	(_cpp_skip_block_comment): Implement warning about bidirectional
	control characters.
	(skip_line_comment): Likewise.
	(forms_identifier_p): Likewise.
	(lex_identifier): Likewise.
	(lex_string): Likewise.
	(lex_raw_string): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wbidi-chars-1.c: New test.
	* c-c++-common/Wbidi-chars-2.c: New test.
	* c-c++-common/Wbidi-chars-3.c: New test.
	* c-c++-common/Wbidi-chars-4.c: New test.
	* c-c++-common/Wbidi-chars-5.c: New test.
	* c-c++-common/Wbidi-chars-6.c: New test.
	* c-c++-common/Wbidi-chars-7.c: New test.
	* c-c++-common/Wbidi-chars-8.c: New test.
	* c-c++-common/Wbidi-chars-9.c: New test.
	* c-c++-common/Wbidi-chars-10.c: New test.
	* c-c++-common/Wbidi-chars-11.c: New test.
	* c-c++-common/Wbidi-chars-12.c: New test.
	* c-c++-common/Wbidi-chars-13.c: New test.
	* c-c++-common/Wbidi-chars-14.c: New test.
	* c-c++-common/Wbidi-chars-15.c: New test.
	* c-c++-common/Wbidi-chars-16.c: New test.
	* c-c++-common/Wbidi-chars-17.c: New test.
2021-11-16 21:56:16 -05:00
include libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
po Daily bump. 2021-08-17 00:16:32 +00:00
aclocal.m4 libcpp: Enable Intel CET on Intel CET enabled host for jit 2020-05-12 09:17:45 -07:00
ChangeLog Daily bump. 2021-11-02 00:16:32 +00:00
ChangeLog.jit Merger of dmalcolm/jit branch from git 2014-11-11 21:55:52 +00:00
charset.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
config.in Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856). 2018-10-31 17:03:16 +00:00
configure GCC_CET_HOST_FLAGS: Check if host supports multi-byte NOPs 2021-05-03 05:01:23 -07:00
configure.ac libcpp, libdecnumber: configure and substitute AR 2020-05-23 21:59:02 +00:00
directives.c libcpp: Fix _Pragma expansion [PR102409] 2021-10-29 22:55:32 +02:00
errors.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
expr.c preprocessor: Fix pp-number lexing of digit separators [PR83873, PR97604] 2021-05-06 23:20:35 +00:00
files.c diagnostics: Support for -finput-charset [PR93067] 2021-08-25 11:15:28 -04:00
generated_cpp_wcwidth.h libcpp: Update cpp_wcwidth() to Unicode 13.0.0 2020-11-07 09:36:43 -05:00
identifiers.c Update copyright years. 2021-01-04 10:26:59 +01:00
init.c libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
internal.h libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
lex.c libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] 2021-11-16 21:56:16 -05:00
line-map.c diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
location-example.txt PR preprocessor/83173: Enhance -fdump-internal-locations output 2018-11-27 16:04:31 +00:00
macro.c libcpp: Fix _Pragma expansion [PR102409] 2021-10-29 22:55:32 +02:00
Makefile.in Add install-dvi Makefile targets. 2021-10-22 15:43:50 -07:00
makeucnid.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
mkdeps.c preprocessor: Make quoting : [PR 95253] 2021-01-15 08:56:20 -08:00
pch.c Update copyright years. 2021-01-04 10:26:59 +01:00
symtab.c Update copyright years. 2021-01-04 10:26:59 +01:00
system.h Update copyright years. 2021-01-04 10:26:59 +01:00
traditional.c Update copyright years. 2021-01-04 10:26:59 +01:00
ucnid.h libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
ucnid.tab Update copyright years. 2021-01-04 10:26:59 +01:00