libcpp: reject codepoints above 0x10FFFF

Unicode does not support such values because they are unrepresentable in
UTF-16.
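The 0x10FFFF ceiling follows from how UTF-16 encodes codepoints above the BMP. A surrogate pair carries 10 bits in the high surrogate and 10 bits in the low surrogate, and that 20-bit payload is added to 0x10000. The largest reachable codepoint is therefore 0x10000 + 0xFFFFF = 0x10FFFF. The C below is a minimal sketch of that arithmetic and is not part of this patch. The helper name encode_utf16 and the macro MAX_UTF16_CODEPOINT are hypothetical.

```c
#include <stdint.h>

/* Largest codepoint a surrogate pair can carry: 0x10000 + 2^20 - 1.
   Hypothetical name, for illustration only.  */
#define MAX_UTF16_CODEPOINT 0x10FFFFu

/* Encode CP as UTF-16 into OUT, which must have room for 2 units.
   Returns the number of 16-bit units written, or -1 if CP cannot be
   represented in UTF-16.  Hypothetical helper, not part of libcpp.  */
static int
encode_utf16 (uint32_t cp, uint16_t out[2])
{
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return -1;  /* Surrogate codepoints themselves are not encodable.  */
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;  /* BMP: a single code unit.  */
      return 1;
    }
  if (cp > MAX_UTF16_CODEPOINT)
    return -1;  /* Would need more than 20 bits of surrogate payload.  */
  cp -= 0x10000;                                /* Now at most 0xFFFFF.  */
  out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* High 10 bits.  */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* Low 10 bits.  */
  return 2;
}
```

At the limit, 0x10FFFF encodes as DBFF DFFF, the last high surrogate followed by the last low surrogate, so no UTF-16 sequence is left over for 0x110000.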

libcpp/

	* charset.cc (cpp_valid_utf8_p): Reject encodings of codepoints
	above 0x10FFFF.  UTF-16 cannot represent such codepoints, so
	Unicode as a whole rejects them.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
Ben Boeckel 2023-06-06 16:50:22 -04:00 committed by Jason Merrill
parent dbcbc858c7
commit c1dbaa6656


@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
       int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
       if (err)
         return false;
+      /* Additionally, Unicode declares that all codepoints above 0010FFFF are
+         invalid because they cannot be represented in UTF-16.
+         Reject such values.*/
+      if (cp >= UCS_LIMIT)
+        return false;
     }
   /* No problems encountered. */
   return true;
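The extra check matters because a structurally well-formed 4-byte UTF-8 sequence can carry up to 21 bits, i.e. values up to 0x1FFFFF. For example, F4 90 80 80 decodes to 0x110000. The C below is a standalone sketch of the same loop shape as cpp_valid_utf8_p: decode one sequence, fail on a decoding error, then apply the range check. It is not libcpp code. decode_utf8, valid_utf8_p and SKETCH_UCS_LIMIT are hypothetical names, and SKETCH_UCS_LIMIT is assumed to equal 0x110000 so that `cp >= SKETCH_UCS_LIMIT` rejects exactly the values above 0x10FFFF. The sketch does not reject encoded surrogates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the patch's UCS_LIMIT: first codepoint past Unicode.  */
#define SKETCH_UCS_LIMIT 0x110000u

/* Decode one UTF-8 sequence from *P (with *LEFT bytes remaining) into *CP,
   advancing *P and *LEFT.  Returns false on a malformed, truncated or
   overlong sequence.  Hypothetical helper, not one_utf8_to_cppchar.  */
static bool
decode_utf8 (const unsigned char **p, size_t *left, uint32_t *cp)
{
  const unsigned char *s = *p;
  unsigned char c = s[0];
  size_t len;
  uint32_t min;

  if (c < 0x80)
    { len = 1; *cp = c; min = 0; }
  else if ((c & 0xE0) == 0xC0)
    { len = 2; *cp = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; *cp = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; *cp = c & 0x07; min = 0x10000; }
  else
    return false;  /* Stray continuation byte or 0xF8..0xFF lead byte.  */

  if (len > *left)
    return false;  /* Truncated sequence.  */
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return false;  /* Expected a continuation byte.  */
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  if (*cp < min)
    return false;  /* Overlong encoding.  */

  *p += len;
  *left -= len;
  return true;
}

/* Return true if BUF[0..N) is well-formed UTF-8 and every decoded
   codepoint is below SKETCH_UCS_LIMIT.  */
static bool
valid_utf8_p (const char *buf, size_t n)
{
  const unsigned char *p = (const unsigned char *) buf;
  size_t left = n;
  uint32_t cp;

  while (left > 0)
    {
      if (!decode_utf8 (&p, &left, &cp))
        return false;
      /* The range check corresponding to this patch.  */
      if (cp >= SKETCH_UCS_LIMIT)
        return false;
    }
  return true;
}
```

With this sketch, valid_utf8_p ("\xF4\x8F\xBF\xBF", 4) (U+10FFFF) returns true, while valid_utf8_p ("\xF4\x90\x80\x80", 4) returns false: the bytes decode cleanly, but the result is 0x110000.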