gcc/libcpp/ucnid.tab

; Table of UCNs which are valid in identifiers.
; Copyright (C) 2003-2024 Free Software Foundation, Inc.
;
; This program is free software; you can redistribute it and/or modify it
; under the terms of the GNU General Public License as published by the
; Free Software Foundation; either version 3, or (at your option) any
; later version.
;
; This program is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
;
; You should have received a copy of the GNU General Public License
; along with this program; see the file COPYING3. If not see
; <http://www.gnu.org/licenses/>.
;
; This file reproduces the table in ISO/IEC 9899:1999 (C99) Annex
; D, which is itself a reproduction from ISO/IEC TR 10176:1998, and
; the similar table from ISO/IEC 14882:1998 (C++98) Annex E, which is
; a reproduction of ISO/IEC PDTR 10176. Unfortunately these tables
; are not identical. It also reproduces the somewhat different tables
; in C11 and C++11, which are identical to each other.
[C99]
; Latin
00aa 00ba 00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9b
1ea0-1ef9 207f
; Greek
0386 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0
03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3
1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
; Cyrillic
0401-040c 040e-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
04d0-04eb 04ee-04f5 04f8-04f9
; Armenian
0531-0556 0561-0587
; Hebrew
05b0-05b9 05bb-05bd 05bf 05c1-05c2 05d0-05ea 05f0-05f2
; Arabic
0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06d0-06dc 06e5-06e8
06ea-06ed
; Devanagari
0901-0903 0905-0939 093e-094d 0950-0952 0958-0963
; Bengali
0981-0983 0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9
09be-09c4 09c7-09c8 09cb-09cd 09dc-09dd 09df-09e3 09f0-09f1
; Gurmukhi
0a02 0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36
0a38-0a39 0a3e-0a42 0a47-0a48 0a4b-0a4d 0a59-0a5c 0a5e 0a74
; Gujarati
0a81-0a83 0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3
0ab5-0ab9 0abd-0ac5 0ac7-0ac9 0acb-0acd 0ad0 0ae0
; Oriya
0b01-0b03 0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39
0b3e-0b43 0b47-0b48 0b4b-0b4d 0b5c-0b5d 0b5f-0b61
; Tamil
0b82-0b83 0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f
0ba3-0ba4 0ba8-0baa 0bae-0bb5 0bb7-0bb9 0bbe-0bc2 0bc6-0bc8 0bca-0bcd
; Telugu
0c01-0c03 0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c3e-0c44
0c46-0c48 0c4a-0c4d 0c60-0c61
; Kannada
0c82-0c83 0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0cbe-0cc4
0cc6-0cc8 0cca-0ccd 0cde 0ce0-0ce1
; Malayalam
0d02-0d03 0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d3e-0d43 0d46-0d48
0d4a-0d4d 0d60-0d61
; Thai, minus 0e50-0e59, which are listed under C99DIG below
0e01-0e3a 0e40-0e4f 0e5a-0e5b
; Lao
0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
0ea7 0eaa-0eab 0ead-0eae 0eb0-0eb9 0ebb-0ebd 0ec0-0ec4 0ec6 0ec8-0ecd
0edc-0edd
; Tibetan
0f00 0f18-0f19 0f35 0f37 0f39 0f3e-0f47 0f49-0f69 0f71-0f84 0f86-0f8b
0f90-0f95 0f97 0f99-0fad 0fb1-0fb7 0fb9
; Georgian
10a0-10c5 10d0-10f6
; Hiragana
3041-3093 309b-309c
; Katakana
30a1-30f6 30fb-30fc
; Bopomofo
3105-312c
; CJK Unified Ideographs
4e00-9fa5
; Hangul
ac00-d7a3
; Special characters
00b5 00b7 02b0-02b8 02bb 02bd-02c1 02d0-02d1 02e0-02e4 037a 0559 093d
0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
[C99DIG]
0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
[CXX]
; Latin
00c0-00d6 00d8-00f6 00f8-01f5 01fa-0217 0250-02a8 1e00-1e9a 1ea0-1ef9
; Greek
0384 0388-038a 038c 038e-03a1 03a3-03ce 03d0-03d6 03da 03dc 03de 03e0
03e2-03f3 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b
1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fbc 1fc2-1fc4 1fc6-1fcc 1fd0-1fd3
1fd6-1fdb 1fe0-1fec 1ff2-1ff4 1ff6-1ffc
; Cyrillic
0401-040d 040f-044f 0451-045c 045e-0481 0490-04c4 04c7-04c8 04cb-04cc
04d0-04eb 04ee-04f5 04f8-04f9
; Armenian
0531-0556 0561-0587
; Hebrew
05d0-05ea 05f0-05f4
; Arabic
0621-063a 0640-0652 0670-06b7 06ba-06be 06c0-06ce 06e5-06e7
; Devanagari
0905-0939 0958-0962
; Bengali
0985-098c 098f-0990 0993-09a8 09aa-09b0 09b2 09b6-09b9 09dc-09dd
09df-09e1 09f0-09f1
; Gurmukhi
0a05-0a0a 0a0f-0a10 0a13-0a28 0a2a-0a30 0a32-0a33 0a35-0a36 0a38-0a39
0a59-0a5c 0a5e
; Gujarati
0a85-0a8b 0a8d 0a8f-0a91 0a93-0aa8 0aaa-0ab0 0ab2-0ab3 0ab5-0ab9 0ae0
; Oriya
0b05-0b0c 0b0f-0b10 0b13-0b28 0b2a-0b30 0b32-0b33 0b36-0b39 0b5c-0b5d
0b5f-0b61
; Tamil
0b85-0b8a 0b8e-0b90 0b92-0b95 0b99-0b9a 0b9c 0b9e-0b9f 0ba3-0ba4
0ba8-0baa 0bae-0bb5 0bb7-0bb9
; Telugu
0c05-0c0c 0c0e-0c10 0c12-0c28 0c2a-0c33 0c35-0c39 0c60-0c61
; Kannada
0c85-0c8c 0c8e-0c90 0c92-0ca8 0caa-0cb3 0cb5-0cb9 0ce0-0ce1
; Malayalam
0d05-0d0c 0d0e-0d10 0d12-0d28 0d2a-0d39 0d60-0d61
; Thai
0e01-0e30 0e32-0e33 0e40-0e46 0e4f-0e5b
; Digits
0e50-0e59
; Lao
0e81-0e82 0e84 0e87-0e88 0e8a 0e8d 0e94-0e97 0e99-0e9f 0ea1-0ea3 0ea5
0ea7 0eaa-0eab 0ead-0eb0 0eb2 0eb3 0ebd 0ec0-0ec4 0ec6
; Georgian
10a0-10c5 10d0-10f6
; Hiragana
3041-3094 309b-309e
; Katakana
30a1-30fe
; Bopomofo
3105-312c
; Hangul
1100-1159 1161-11a2 11a8-11f9
; CJK Unified Ideographs
f900-fa2d fb1f-fb36 fb38-fb3c fb3e fb40-fb41 fb42-fb44 fb46-fbb1
fbd3-fd3f fd50-fd8f fd92-fdc7 fdf0-fdfb fe70-fe72 fe74 fe76-fefc
ff21-ff3a ff41-ff5a ff66-ffbe ffc2-ffc7 ffca-ffcf ffd2-ffd7
ffda-ffdc 4e00-9fa5
[C11]
; Group 1
00a8 00aa 00ad 00af 00b2-00b5 00b7-00ba 00bc-00be 00c0-00d6 00d8-00f6
00f8-00ff
; Group 2, minus characters under C11NOSTART
0100-02ff 0370-167f 1681-180d 180f-1dbf 1e00-1fff
; Group 3
200b-200d 202a-202e 203f-2040 2054 2060-206f
; Group 4, minus characters under C11NOSTART
2070-20cf 2100-218f 2460-24ff 2776-2793 2c00-2dff 2e80-2fff
; Group 5
3004-3007 3021-302f 3031-303f
; Group 6
3040-d7ff
; Group 7, minus characters under C11NOSTART
f900-fd3d fd40-fdcf fdf0-fe1f fe30-fe44 fe47-fffd
; Group 8
10000-1fffd 20000-2fffd 30000-3fffd 40000-4fffd 50000-5fffd
60000-6fffd 70000-7fffd 80000-8fffd 90000-9fffd a0000-afffd
b0000-bfffd c0000-cfffd d0000-dfffd e0000-efffd
[C11NOSTART]
; Group 1
0300-036f 1dc0-1dff 20d0-20ff fe20-fe2f
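;
; A minimal sketch of how a table in this layout could be read, assuming
; that lines starting with ";" are comments, a bracketed name opens a
; section, and every other token is a hexadecimal code point or an
; inclusive first-last range. parse_ucnid and in_section are hypothetical
; helper names used for illustration only; this is not the generator
; that GCC itself uses.

```python
import re

# A section header is a bracketed name on a line of its own, e.g. "[C11]".
SECTION_RE = re.compile(r"^\[(\w+)\]$")


def parse_ucnid(text):
    """Return {section name: [(first, last), ...]} with inclusive ranges."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment
        match = SECTION_RE.match(line)
        if match:
            current = sections.setdefault(match.group(1), [])
            continue
        if current is None:
            raise ValueError("range found before any section header")
        for token in line.split():
            first, _, last = token.partition("-")
            lo = int(first, 16)
            hi = int(last, 16) if last else lo  # single code point
            if hi < lo:
                raise ValueError(f"range ends before it starts: {token}")
            current.append((lo, hi))
    return sections


def in_section(sections, name, code_point):
    """True if code_point falls inside any range listed under section name."""
    return any(lo <= code_point <= hi for lo, hi in sections.get(name, []))


# Excerpt copied from the table above (assumption: same layout throughout).
SAMPLE = """\
[C11]
; Group 1
00a8 00aa 00ad 00af 00b2-00b5 00b7-00ba 00bc-00be 00c0-00d6 00d8-00f6
00f8-00ff
[C11NOSTART]
; Group 1
0300-036f 1dc0-1dff 20d0-20ff fe20-fe2f
"""

table = parse_ucnid(SAMPLE)
print(in_section(table, "C11", 0x00E9))         # True
print(in_section(table, "C11NOSTART", 0x0301))  # True
```

; In C11 terms, a code point listed under C11NOSTART may continue an
; identifier but may not begin one, so a caller of this sketch would
; consult both sections when checking the first character.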