Upgrade to Oniguruma 6.9.4

Oniguruma 6.9.4 fixes several CVEs.
Christoph M. Becker 2019-11-30 09:38:46 +01:00
parent 8c4b0ddde5
commit 1979c5d16f
74 changed files with 3735 additions and 2967 deletions

NEWS

@ -11,6 +11,9 @@ PHP NEWS
- GD:
. Fixed bug #78849 (GD build broken with -D SIGNED_COMPARE_SLOW). (cmb)
- MBString:
. Upgraded bundled Oniguruma to 6.9.4. (cmb)
- OPcache:
. Fixed potential ASLR related invalid opline handler issues. (cmb)
. Fixed $x = (bool)$x; with opcache (should emit undeclared variable notice).


@ -1,8 +1,33 @@
History
2019/11/29: Version 6.9.4
2019/11/22: Release Candidate 3 for Version 6.9.4
2019/11/20: fix a problem found by libFuzzer test
2019/11/14: Release Candidate 2 for Version 6.9.4
2019/11/12: fix integer overflow by nested quantifier
2019/11/11: fix CVE-2019-19012: Integer overflow related to reg->dmax in search_in_range()
2019/11/07: fix CVE-2019-19203: heap-buffer-overflow in gb18030_mbc_enc_len()
2019/11/06: fix CVE-2019-19204: heap-buffer-overflow in fetch_interval_quantifier()
2019/11/06: add HAVE_INTTYPES_H into config.h.windows.in and config.h.win{32,64}
2019/11/06: add HAVE_STDINT_H into config.h.win{32,64}
2019/11/05: Release Candidate 1 for Version 6.9.4
2019/10/31: Update Unicode Emoji version to 12.1 (Nothing data changed)
2019/10/29: implement USE_REPEAT_AND_EMPTY_CHECK_LOCAL_VAR configuration
2019/10/18: re-implement case fold conversion
2019/10/04: fix #156: Heap buffer overflow in match_at() with case-insensitive match
2019/09/30: NEW API: add onig_regset_replace()
2019/09/30: change Unicode VERSION value format
2019/09/20: NEW API: add regset functions
2019/09/20: add data ensure check before peek string value in OP_PUSH_IF_PEEK_NEXT
2019/09/20: fix loose code in encode-harness.c
2019/08/13: fix heap-buffer-overflow
2019/08/13: Add a macro to disable direct threading in the match engine (PR#149)
2019/08/06: Version 6.9.3 (security fix release)
2019/07/30: add ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE
2019/07/30: add ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC
2019/07/29: add STK_PREC_READ_START/END stack type
2019/07/29: Fix #147: Stack Exhaustion Problem caused by some parsing functions
2019/07/11: add a dictionary file for libfuzzer


@ -27,25 +27,34 @@ Supported character encodings:
* doc/SYNTAX.md: contributed by seanofw
Version 6.9.4
-------------
* NEW API: RegSet (set of regexes)
* Fixed CVE-2019-19012
* Fixed CVE-2019-19203 (Does not affect UTF-8, UTF-16 and UTF-32 encodings)
* Fixed CVE-2019-19204 (Affects only PosixBasic, Emacs and Grep syntaxes)
* Fixed CVE-2019-19246
* Fixed some problems (found by libFuzzer test)
Version 6.9.3 (security fix release)
------------------------------------
* Fixed CVE-2019-13224
* Fixed CVE-2019-13225
* Fixed many problems (found by libfuzzer programs)
* Fixed CVE-2019-16163
* Fixed many problems (found by libFuzzer test)
Version 6.9.2 (Reiwa)
---------------------
* add doc/SYNTAX.md
* Direct threaded code (for GCC and Clang)
* Update Unicode version 12.1.0
* NEW: Unicode Text Segment mode option (?y{g}) (?y{w}) (*original)
g: Extended Grapheme Cluster mode / w: Word mode
(Unicode Standard Annex #29 [http://unicode.org/reports/tr29/])
Version 6.9.1
-------------
@ -118,7 +127,7 @@ Version 6.5.0
* NEW: \O (true anychar)
* NEW: if-then-else (?(...)...\|...)
* NEW: Backreference validity checker (?(xxx)) (*original)
* NEW: Absent repeater (?~absent) \[is equal to (?\~\|absent|\O*)]
* NEW: Absent repeater (?~absent) \[is equal to (?\~\|(?:absent)|\O*)]
* NEW: Absent expression (?~|absent|expr) (*original)
* NEW: Absent stopper (?~|absent) (*original)
@ -244,15 +253,18 @@ Sample Programs
|File |Description |
|:---------------------|:-----------------------------------------|
|sample/callout.c |example of callouts |
|sample/count.c |example of built-in callout *COUNT |
|sample/echo.c |example of user defined callouts of name |
|sample/encode.c |example of some encodings |
|sample/listcap.c |example of the capture history |
|sample/names.c |example of the named group callback |
|sample/posix.c |POSIX API sample |
|sample/regset.c |example of using RegSet API |
|sample/scan.c |example of using onig_scan() |
|sample/simple.c |example of the minimum (Oniguruma API) |
|sample/names.c |example of the named group callback. |
|sample/encode.c |example of some encodings. |
|sample/listcap.c |example of the capture history. |
|sample/posix.c |POSIX API sample. |
|sample/scan.c |example of using onig_scan(). |
|sample/sql.c |example of the variable meta characters. |
|sample/user_property.c|example of user defined Unicode property. |
|sample/callout.c |example of callouts. |
|sample/sql.c |example of the variable meta characters |
|sample/user_property.c|example of user defined Unicode property |
Test Programs


@ -1,4 +1,4 @@
Oniguruma API Version 6.9.3 2019/07/06
Oniguruma API Version 6.9.4 2019/09/30
#include <oniguruma.h>
@ -168,7 +168,7 @@ Oniguruma API Version 6.9.3 2019/07/06
# int onig_new_deluxe(regex_t** reg, const UChar* pattern, const UChar* pattern_end,
OnigCompileInfo* ci, OnigErrorInfo* einfo)
This function is deprecate, and it does not allow the case where
This function is deprecated, and it does not allow the case where
the encoding of pattern and target is different.
Create a regex object.
@ -306,6 +306,7 @@ Oniguruma API Version 6.9.3 2019/07/06
normal return: match position offset (i.e. p - str >= 0)
not found: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 reg: regex object
@ -342,7 +343,8 @@ Oniguruma API Version 6.9.3 2019/07/06
Do not pass invalid byte string in the regex character encoding.
normal return: match length (>= 0)
not match: ONIG_MISMATCH ( < 0)
not match: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 reg: regex object
@ -391,6 +393,136 @@ Oniguruma API Version 6.9.3 2019/07/06
7 callback_arg: optional argument passed to callback
# int onig_regset_new(OnigRegSet** rset, int n, regex_t* regs[])
Create a regset object.
All regex objects must have the same character encoding.
All regex objects are prohibited from having the ONIG_OPTION_FIND_LONGEST option.
arguments
1 rset: return address of regset object
2 n: number of regex in regs
3 regs: array of regex
normal return: ONIG_NORMAL
# int onig_regset_add(OnigRegSet* set, regex_t* reg)
Add a regex into regset.
The regex object must have the same character encoding as the regset.
The regex object is prohibited from having the ONIG_OPTION_FIND_LONGEST option.
arguments
1 set: regset object
2 reg: regex object
normal return: ONIG_NORMAL
# int onig_regset_replace(OnigRegSet* set, int at, regex_t* reg)
Replace a regex in regset with another one.
If the reg argument is NULL, the at-th regex is removed (and the indexes of the following regexes change).
arguments
1 set: regset object
2 at: index of regex (zero origin)
3 reg: regex object
normal return: ONIG_NORMAL
# void onig_regset_free(OnigRegSet* set)
Free memory used by regset object and regex objects in the regset.
If the same regex object is registered twice, the situation becomes destructive.
arguments
1 set: regset object
# int onig_regset_number_of_regex(OnigRegSet* set)
Returns number of regex objects in the regset.
arguments
1 set: regset object
# regex_t* onig_regset_get_regex(OnigRegSet* set, int at)
Returns the regex object corresponding to the at-th regex.
arguments
1 set: regset object
2 at: index of regex array (zero origin)
# OnigRegion* onig_regset_get_region(OnigRegSet* set, int at)
Returns the region object corresponding to the at-th regex.
arguments
1 set: regset object
2 at: index of regex array (zero origin)
# int onig_regset_search(OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, int* rmatch_pos)
Perform a search with regset.
return value:
normal return: index of match regex (zero origin)
not found: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 set: regset object
2 str: target string
3 end: terminate address of target string
4 start: search start address of target string
5 range: search terminate address of target string
6 lead: outer loop element
ONIG_REGSET_POSITION_LEAD (returns most left position)
ONIG_REGSET_REGEX_LEAD (returns most left position)
ONIG_REGSET_PRIORITY_TO_REGEX_ORDER (returns first match regex)
7 option: search time option
ONIG_OPTION_NOTBOL string head(str) isn't considered as begin of line
ONIG_OPTION_NOTEOL string end (end) isn't considered as end of line
8 rmatch_pos: return address of match position (match_address - str)
* ONIG_REGSET_POSITION_LEAD and ONIG_REGSET_REGEX_LEAD return the same result.
These differences only appear in search time.
In most cases, ONIG_REGSET_POSITION_LEAD seems to be faster.
# int onig_regset_search_with_param(OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, OnigMatchParam* mps[], int* rmatch_pos)
Perform a search with regset and match-params.
return value:
normal return: index of match regex (zero origin)
not found: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 set: regset object
2 str: target string
3 end: terminate address of target string
4 start: search start address of target string
5 range: search terminate address of target string
6 lead: outer loop element
ONIG_REGSET_POSITION_LEAD (returns most left position)
ONIG_REGSET_REGEX_LEAD (returns most left position)
ONIG_REGSET_PRIORITY_TO_REGEX_ORDER (returns first match regex)
7 option: search time option
ONIG_OPTION_NOTBOL string head(str) isn't considered as begin of line
ONIG_OPTION_NOTEOL string end (end) isn't considered as end of line
8 mps: array of match-params
9 rmatch_pos: return address of match position (match_address - str)
# OnigRegion* onig_region_new(void)
Create a region.
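
The three `lead` values described for onig_regset_search() can be pictured with a small stand-alone model. The sketch below is not the Oniguruma implementation and does not call the library: it uses literal substrings in place of compiled regexes, and every `toy_` / `TOY_` name is hypothetical. It only illustrates one reading of the documentation, namely that the lead decides which loop is the outer one, that the two "LEAD" modes agree on the result, and that priority order returns the first regex in the set that matches anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT part of the Oniguruma API. */
enum toy_lead {
    TOY_POSITION_LEAD,
    TOY_REGEX_LEAD,
    TOY_PRIORITY_TO_REGEX_ORDER
};

#define TOY_MISMATCH (-1)

/* Does the literal `pat` occur in `str` exactly at byte offset `pos`? */
static int toy_match_at(const char *pat, const char *str, size_t pos)
{
    size_t plen = strlen(pat);
    size_t slen = strlen(str);
    return pos + plen <= slen && memcmp(str + pos, pat, plen) == 0;
}

/* Leftmost offset of `pat` in `str`, or TOY_MISMATCH. */
static int toy_search_one(const char *pat, const char *str)
{
    const char *hit = strstr(str, pat);
    return hit ? (int)(hit - str) : TOY_MISMATCH;
}

/* Returns the index of the matching pattern and stores the match offset
 * in *rmatch_pos, or returns TOY_MISMATCH if nothing matches. */
static int toy_regset_search(const char *pats[], int n, const char *str,
                             enum toy_lead lead, int *rmatch_pos)
{
    size_t len = strlen(str);
    int best = TOY_MISMATCH;
    int best_pos = TOY_MISMATCH;

    assert(n >= 0 && rmatch_pos != NULL);

    switch (lead) {
    case TOY_POSITION_LEAD:
        /* Outer loop over positions, inner loop over patterns. */
        for (size_t pos = 0; pos <= len; pos++) {
            for (int i = 0; i < n; i++) {
                if (toy_match_at(pats[i], str, pos)) {
                    *rmatch_pos = (int)pos;
                    return i;
                }
            }
        }
        return TOY_MISMATCH;

    case TOY_REGEX_LEAD:
        /* Outer loop over patterns; keep the leftmost hit seen so far. */
        for (int i = 0; i < n; i++) {
            int pos = toy_search_one(pats[i], str);
            if (pos != TOY_MISMATCH &&
                (best == TOY_MISMATCH || pos < best_pos)) {
                best = i;
                best_pos = pos;
            }
        }
        if (best != TOY_MISMATCH) *rmatch_pos = best_pos;
        return best;

    case TOY_PRIORITY_TO_REGEX_ORDER:
        /* The first pattern that matches anywhere wins. */
        for (int i = 0; i < n; i++) {
            int pos = toy_search_one(pats[i], str);
            if (pos != TOY_MISMATCH) {
                *rmatch_pos = pos;
                return i;
            }
        }
        return TOY_MISMATCH;
    }
    return TOY_MISMATCH;
}
```

With the patterns {"cd", "bc", "zz"} and the target "abcd", both LEAD modes in this model return index 1 at offset 1, while priority order returns index 0 at offset 2.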


@ -1,4 +1,4 @@
Oniguruma Interface Version 6.9.3 2019/07/06
Oniguruma Interface Version 6.9.4 2019/09/30
#include <oniguruma.h>
@ -390,6 +390,138 @@
7 callback_arg: optional argument value passed to the callback function
# int onig_regset_new(OnigRegSet** rset, int n, regex_t* regs[])
Create a regset object.
All regex objects must have the same character encoding.
No regex object may have been compiled with the ONIG_OPTION_FIND_LONGEST option.
arguments
1 rset: address for returning the regset object
2 n: number of regexes
3 regs: array of regex objects
normal return: ONIG_NORMAL
# int onig_regset_add(OnigRegSet* set, regex_t* reg)
Add a regex to the regset object.
The regex object must have the same character encoding as the regset.
The regex object must not have been compiled with the ONIG_OPTION_FIND_LONGEST option.
arguments
1 set: regset object
2 reg: regex object
normal return: ONIG_NORMAL
# int onig_regset_replace(OnigRegSet* set, int at, regex_t* reg)
Replace one regex object in the regset with another.
If the value of the reg argument is NULL, the at-th regex object is removed (and the indexes of the following regex objects change).
arguments
1 set: regset object
2 at: index of the position to replace
3 reg: regex object
normal return: ONIG_NORMAL
# void onig_regset_free(OnigRegSet* set)
Free the memory used by the regset object and the regex objects in it.
If the same regex object has been registered more than once, the result is destructive.
arguments
1 set: regset object
# int onig_regset_number_of_regex(OnigRegSet* set)
Return the number of regex objects in the regset.
arguments
1 set: regset object
# regex_t* onig_regset_get_regex(OnigRegSet* set, int at)
Return the at-th regex of the regset.
arguments
1 set: regset object
2 at: index of the regex object (zero origin)
# OnigRegion* onig_regset_get_region(OnigRegSet* set, int at)
Return the region corresponding to the at-th regex of the regset.
arguments
1 set: regset object
2 at: index of the regex object (zero origin)
# int onig_regset_search(OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, int* rmatch_pos)
Perform a search with the regset.
return value:
search success: index of the matched regex object (zero origin)
search failure: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 set: regset object
2 str: target string
3 end: end address of the target string
4 start: search start address in the target string
5 range: search end address in the target string
(start <= searched string < range)
6 lead: outer loop element
ONIG_REGSET_POSITION_LEAD (returns the match at the leftmost position)
ONIG_REGSET_REGEX_LEAD (returns the match at the leftmost position)
ONIG_REGSET_PRIORITY_TO_REGEX_ORDER (returns the result of the first regex that matches)
7 option: search-time option
ONIG_OPTION_NOTBOL the head of the string (str) is not regarded as the beginning of a line
ONIG_OPTION_NOTEOL the end of the string (end) is not regarded as the end of a line
8 rmatch_pos: address for returning the match position (match_address - str)
* ONIG_REGSET_POSITION_LEAD and ONIG_REGSET_REGEX_LEAD return the same result.
They differ only in search time.
In most cases ONIG_REGSET_POSITION_LEAD is expected to be faster.
# int onig_regset_search_with_param(OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, OnigMatchParam* mps[], int* rmatch_pos)
Perform a search with the regset and OnigMatchParam objects.
return value:
search success: index of the matched regex object (zero origin)
search failure: ONIG_MISMATCH (< 0)
error: error code (< 0)
arguments
1 set: regset object
2 str: target string
3 end: end address of the target string
4 start: search start address in the target string
5 range: search end address in the target string
(start <= searched string < range)
6 lead: outer loop element
ONIG_REGSET_POSITION_LEAD (returns the match at the leftmost position)
ONIG_REGSET_REGEX_LEAD (returns the match at the leftmost position)
ONIG_REGSET_PRIORITY_TO_REGEX_ORDER (returns the result of the first regex that matches)
7 option: search-time option
ONIG_OPTION_NOTBOL the head of the string (str) is not regarded as the beginning of a line
ONIG_OPTION_NOTEOL the end of the string (end) is not regarded as the end of a line
8 mps: array of OnigMatchParam objects
9 rmatch_pos: address for returning the match position (match_address - str)
# OnigRegion* onig_region_new(void)
Create match region information (region).


@ -1,4 +1,4 @@
Oniguruma Regular Expressions Version 6.9.2 2019/03/29
Oniguruma Regular Expressions Version 6.9.4 2019/10/31
syntax: ONIG_SYNTAX_ONIGURUMA (default)
@ -289,6 +289,11 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
In negative look-behind, capturing group isn't allowed,
but non-capturing group (?:) is allowed.
* In look-behind and negative look-behind, support for
ignore-case option is limited. Only supports conversion
between single characters. (Does not support conversion
of multiple characters in Unicode)
(?>subexp) atomic group
no backtracks in subexp.
@ -338,7 +343,7 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
This works like .* (more precisely \O*), but it is
limited by the range that does not include the string
match with <absent>.
This is a written abbreviation of (?~|absent|\O*).
This is a written abbreviation of (?~|(?:absent)|\O*).
\O* is used as a repeater.
(?~|absent|exp) Absent expression (* original)


@ -1,4 +1,4 @@
Oniguruma Regular Expressions Version 6.9.2 2019/03/29
Oniguruma Regular Expressions Version 6.9.4 2019/10/31
syntax: ONIG_SYNTAX_ONIGURUMA (default)
@ -21,10 +21,10 @@
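\f form feed (0x0C)
\a bell (0x07)
\e escape (0x1B)
\nnn octal char encoded byte value (or part of one)
\nnn octal char encoded byte value
\o{17777777777} extended octal char code point value
\uHHHH extended hexadecimal char code point value
\xHH hexadecimal char encoded byte value (or part of one)
\xHH hexadecimal char encoded byte value
\x{7HHHHHHH} extended hexadecimal char code point value
\cx control char code point value
\C-x control char code point value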
@ -284,6 +284,10 @@
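In negative look-behind, capturing groups are not allowed,
but non-capturing groups are allowed.
* In look-behind and negative look-behind, support for the ignore-case
option is limited. Only conversion between single characters is supported.
(Multi-character conversion in Unicode is not supported)
(?>subexp) atomic group
once the whole subexp has been passed, no backtracking is done inside subexp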
@ -334,20 +338,20 @@
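<Absent functions>
(?~absent) Absent repeater (* proposed by Tanaka Akira)
This works like .* (more precisely \O*), but it is limited to
(?~absent) Absent repeater (* proposed by Tanaka Akira)
This works like .* (more precisely \O*), but it is limited to
the range that does not include a string matching <absent>.
This is an abbreviation of (?~|absent-exp|\O*).
This is an abbreviation of (?~|(?:absent)|\O*).
(?~|absent|exp) Absent expression (* original)
This works like <exp>, but it is limited to the range that does
(?~|absent|exp) Absent expression (* original)
This works like <exp>, but it is limited to the range that does
not include a string matching <absent>.
ex. (?~|345|\d*) "12345678" ==> "12", "1", ""
(?~|absent) Absent stopper (* original)
(?~|absent) Absent stopper (* original)
After passing this operator, the matching range of the target string is
limited to the range that does not include a string matching <absent>.
limited to the range that does not include a string matching <absent>.
(?~|) Range clear
Clears the effect of the absent stopper and restores the previous state.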


@ -1,7 +1,7 @@
# Oniguruma syntax (operator) configuration
_Documented for Oniguruma 6.9.2 (2019/03/28)_
_Documented for Oniguruma 6.9.3 (2019/08/08)_
----------
@ -960,6 +960,12 @@ _Set in: Ruby, Oniguruma_
If this flag is set, Oniguruma will warn about nested repeat operators that have no meaning, like `(?:a*)+`.
If this flag is clear, Oniguruma will allow the nested repeat operators without warning about them.
### 26. ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC (allow [a-\x{7fffffff}])
_Set in: Oniguruma_
If this flag is set, then invalid code points at the end of range in character class are allowed.
### 31. ONIG_SYN_CONTEXT_INDEP_ANCHORS
_Set in: PosixExtended, GnuRegex, Java, Perl, Perl_NG, Ruby, Oniguruma_
@ -1066,4 +1072,5 @@ These tables show which of the built-in syntaxes use which flags and options, fo
| 23 | `ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 24 | `ONIG_SYN_WARN_CC_OP_NOT_ESCAPED` | - | - | - | - | - | - | - | - | Yes | Yes |
| 25 | `ONIG_SYN_WARN_REDUNDANT_NESTED_REPEAT` | - | - | - | - | - | - | - | - | Yes | Yes |
| 26 | `ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC` | - | - | - | - | - | - | - | - | - | Yes |
| 31 | `ONIG_SYN_CONTEXT_INDEP_ANCHORS` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |


@ -1,4 +1,4 @@
Unicode Properties (from Unicode Version: 12.1.0)
Unicode Properties (Unicode Version: 12.1.0, Emoji: 12.1)
15: ASCII_Hex_Digit
16: Adlam


@ -2,7 +2,7 @@
ascii.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without


@ -2,7 +2,7 @@
big5.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -54,6 +54,16 @@ big5_mbc_enc_len(const UChar* p)
return EncLen_BIG5[*p];
}
static int
big5_code_to_mbclen(OnigCodePoint code)
{
if ((code & (~0xffff)) != 0) return ONIGERR_INVALID_CODE_POINT_VALUE;
if ((code & 0xff00) != 0) return 2;
if (EncLen_BIG5[(int )(code & 0xff)] == 1) return 1;
return ONIGERR_INVALID_CODE_POINT_VALUE;
}
static int
is_valid_mbc_string(const UChar* p, const UChar* end)
{
@ -99,15 +109,6 @@ big5_mbc_case_fold(OnigCaseFoldType flag, const UChar** pp, const UChar* end,
pp, end, lower);
}
#if 0
static int
big5_is_mbc_ambiguous(OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
return onigenc_mbn_is_mbc_ambiguous(ONIG_ENCODING_BIG5, flag, pp, end);
}
#endif
static int
big5_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
@ -174,7 +175,7 @@ OnigEncodingType OnigEncodingBIG5 = {
1, /* min enc length */
onigenc_is_mbc_newline_0x0a,
big5_mbc_to_code,
onigenc_mb2_code_to_mbclen,
big5_code_to_mbclen,
big5_code_to_mbc,
big5_mbc_case_fold,
onigenc_ascii_apply_all_case_fold,
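
The two-byte length check that big5_code_to_mbclen introduces can be tried in isolation. The sketch below is a stand-alone model, not the library code: `toy_enc_len` is a hypothetical substitute for the EncLen_BIG5 table (it simply treats 0xa1..0xfe as lead bytes, which is an assumption made for illustration), and `TOY_ERR_INVALID_CODE_POINT` stands in for ONIGERR_INVALID_CODE_POINT_VALUE.

```c
/* Hypothetical error value standing in for ONIGERR_INVALID_CODE_POINT_VALUE. */
#define TOY_ERR_INVALID_CODE_POINT (-1)

/* Simplified stand-in for an EncLen table lookup: bytes in 0xa1..0xfe are
 * assumed to start a two-byte character, everything else is one byte. */
static int toy_enc_len(unsigned char first_byte)
{
    return (first_byte >= 0xa1 && first_byte <= 0xfe) ? 2 : 1;
}

/* Same decision order as the function added in the diff:
 *   1. reject anything wider than 16 bits,
 *   2. a non-zero high byte means a two-byte character,
 *   3. a lone byte is valid only if it is not a lead byte. */
static int toy_mb2_code_to_mbclen(unsigned long code)
{
    if ((code & ~0xffffUL) != 0) return TOY_ERR_INVALID_CODE_POINT;
    if ((code & 0xff00UL) != 0) return 2;
    if (toy_enc_len((unsigned char)(code & 0xff)) == 1) return 1;
    return TOY_ERR_INVALID_CODE_POINT;
}
```

Under these assumptions 0x41 yields 1 and 0xa440 yields 2, while a bare lead byte such as 0xa4 and any value above 0xffff are rejected.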


@ -1,82 +1,56 @@
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_FLOAT_H 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
#define SIZEOF_LONG 4
#define SIZEOF_LONG_LONG 8
#define SIZEOF___INT64 8
#define SIZEOF_OFF_T 4
#define SIZEOF_VOIDP 4
#define SIZEOF_FLOAT 4
#define SIZEOF_DOUBLE 8
#define SIZEOF_SIZE_T 4
#define HAVE_PROTOTYPES 1
#define TOKEN_PASTE(x,y) x##y
#define HAVE_STDARG_PROTOTYPES 1
#ifndef NORETURN
#if _MSC_VER > 1100
#define NORETURN(x) __declspec(noreturn) x
#else
#define NORETURN(x) x
#endif
#endif
#define HAVE_DECL_SYS_NERR 1
#define STDC_HEADERS 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_LIMITS_H 1
#define HAVE_FCNTL_H 1
#define HAVE_SYS_UTIME_H 1
#define HAVE_MEMORY_H 1
#define uid_t int
#define gid_t int
#define GETGROUPS_T int
#define HAVE_ALLOCA 1
#define HAVE_DUP2 1
#define HAVE_MEMCMP 1
#define HAVE_MEMMOVE 1
#define HAVE_MKDIR 1
#define HAVE_STRCASECMP 1
#define HAVE_STRNCASECMP 1
#define HAVE_STRERROR 1
#define HAVE_STRFTIME 1
#define HAVE_STRCHR 1
#define HAVE_STRSTR 1
#define HAVE_STRTOD 1
#define HAVE_STRTOL 1
#define HAVE_STRTOUL 1
#define HAVE_FLOCK 1
#define HAVE_VSNPRINTF 1
#define HAVE_FINITE 1
#define HAVE_FMOD 1
#define HAVE_FREXP 1
#define HAVE_HYPOT 1
#define HAVE_MODF 1
#define HAVE_WAITPID 1
#define HAVE_CHSIZE 1
#define HAVE_TIMES 1
#define HAVE__SETJMP 1
#define HAVE_TELLDIR 1
#define HAVE_SEEKDIR 1
#define HAVE_MKTIME 1
#define HAVE_COSH 1
#define HAVE_SINH 1
#define HAVE_TANH 1
#define HAVE_EXECVE 1
#define HAVE_TZNAME 1
#define HAVE_DAYLIGHT 1
#define SETPGRP_VOID 1
#define inline __inline
#define NEED_IO_SEEK_BETWEEN_RW 1
#define RSHIFT(x,y) ((x)>>(int)y)
#define FILE_COUNT _cnt
#define FILE_READPTR _ptr
#define DEFAULT_KCODE KCODE_NONE
#define DLEXT ".so"
#define DLEXT2 ".dll"
#if defined(__MINGW32__) || _MSC_VER >= 1600
#define HAVE_STDINT_H 1
#endif
#if defined(__MINGW32__) || _MSC_VER >= 1800
#define HAVE_INTTYPES_H 1
#endif
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_MEMORY_H 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_LONG 4
#define SIZEOF_LONG_LONG 8
#define SIZEOF___INT64 8
#define SIZEOF_OFF_T 4
#define SIZEOF_VOIDP 4
#define SIZEOF_FLOAT 4
#define SIZEOF_DOUBLE 8
#define SIZEOF_SIZE_T 4
#define TOKEN_PASTE(x,y) x##y
#ifndef NORETURN
#if _MSC_VER > 1100
#define NORETURN(x) __declspec(noreturn) x
#else
#define NORETURN(x) x
#endif
#endif
#define HAVE_DECL_SYS_NERR 1
#define HAVE_FCNTL_H 1
#define HAVE_SYS_UTIME_H 1
#define HAVE_MEMORY_H 1
#define uid_t int
#define gid_t int
#define GETGROUPS_T int
#define HAVE_ALLOCA 1
#define HAVE_DUP2 1
#define HAVE_MKDIR 1
#define HAVE_FLOCK 1
#define HAVE_FINITE 1
#define HAVE_HYPOT 1
#define HAVE_WAITPID 1
#define HAVE_CHSIZE 1
#define HAVE_TIMES 1
#define HAVE_TELLDIR 1
#define HAVE_SEEKDIR 1
#define HAVE_EXECVE 1
#define HAVE_DAYLIGHT 1
#define SETPGRP_VOID 1
#define inline __inline
#define NEED_IO_SEEK_BETWEEN_RW 1
#define RSHIFT(x,y) ((x)>>(int)y)
#define FILE_COUNT _cnt
#define FILE_READPTR _ptr
#define DEFAULT_KCODE KCODE_NONE
#define DLEXT ".so"
#define DLEXT2 ".dll"
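
As a rough illustration of what the new HAVE_STDINT_H guard enables, the sketch below shows how a source file might pick a 32-bit type depending on that macro. It is an assumption about typical consumer code, not a copy of anything in Oniguruma; `toy_u32` and `toy_u32_bits` are hypothetical names, and the guard is repeated here only so the sketch stands alone without config.h.

```c
#include <limits.h>

/* Normally supplied by config.h; repeated so this sketch is self-contained. */
#ifndef HAVE_STDINT_H
#if defined(__MINGW32__) || (defined(_MSC_VER) && _MSC_VER >= 1600)
#define HAVE_STDINT_H 1
#endif
#endif

#ifdef HAVE_STDINT_H
#include <stdint.h>
typedef uint32_t toy_u32;       /* exact-width type from the C99 header */
#else
typedef unsigned int toy_u32;   /* fallback; assumes int is 4 bytes (SIZEOF_INT 4) */
#endif

/* Width of the chosen type in bits, useful as a sanity check. */
static int toy_u32_bits(void)
{
    return (int)(sizeof(toy_u32) * CHAR_BIT);
}
```

Either branch should give a 32-bit type on the platforms this configuration targets; the fallback relies on the SIZEOF_INT value declared in the same file.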


@ -1,82 +1,56 @@
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_FLOAT_H 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
#define SIZEOF_LONG 4
#define SIZEOF_LONG_LONG 8
#define SIZEOF___INT64 8
#define SIZEOF_OFF_T 4
#define SIZEOF_VOIDP 8
#define SIZEOF_FLOAT 4
#define SIZEOF_DOUBLE 8
#define SIZEOF_SIZE_T 8
#define HAVE_PROTOTYPES 1
#define TOKEN_PASTE(x,y) x##y
#define HAVE_STDARG_PROTOTYPES 1
#ifndef NORETURN
#if _MSC_VER > 1100
#define NORETURN(x) __declspec(noreturn) x
#else
#define NORETURN(x) x
#endif
#endif
#define HAVE_DECL_SYS_NERR 1
#define STDC_HEADERS 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_LIMITS_H 1
#define HAVE_FCNTL_H 1
#define HAVE_SYS_UTIME_H 1
#define HAVE_MEMORY_H 1
#define uid_t int
#define gid_t int
#define GETGROUPS_T int
#define HAVE_ALLOCA 1
#define HAVE_DUP2 1
#define HAVE_MEMCMP 1
#define HAVE_MEMMOVE 1
#define HAVE_MKDIR 1
#define HAVE_STRCASECMP 1
#define HAVE_STRNCASECMP 1
#define HAVE_STRERROR 1
#define HAVE_STRFTIME 1
#define HAVE_STRCHR 1
#define HAVE_STRSTR 1
#define HAVE_STRTOD 1
#define HAVE_STRTOL 1
#define HAVE_STRTOUL 1
#define HAVE_FLOCK 1
#define HAVE_VSNPRINTF 1
#define HAVE_FINITE 1
#define HAVE_FMOD 1
#define HAVE_FREXP 1
#define HAVE_HYPOT 1
#define HAVE_MODF 1
#define HAVE_WAITPID 1
#define HAVE_CHSIZE 1
#define HAVE_TIMES 1
#define HAVE__SETJMP 1
#define HAVE_TELLDIR 1
#define HAVE_SEEKDIR 1
#define HAVE_MKTIME 1
#define HAVE_COSH 1
#define HAVE_SINH 1
#define HAVE_TANH 1
#define HAVE_EXECVE 1
#define HAVE_TZNAME 1
#define HAVE_DAYLIGHT 1
#define SETPGRP_VOID 1
#define inline __inline
#define NEED_IO_SEEK_BETWEEN_RW 1
#define RSHIFT(x,y) ((x)>>(int)y)
#define FILE_COUNT _cnt
#define FILE_READPTR _ptr
#define DEFAULT_KCODE KCODE_NONE
#define DLEXT ".so"
#define DLEXT2 ".dll"
#if defined(__MINGW32__) || _MSC_VER >= 1600
#define HAVE_STDINT_H 1
#endif
#if defined(__MINGW32__) || _MSC_VER >= 1800
#define HAVE_INTTYPES_H 1
#endif
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_MEMORY_H 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_LONG 4
#define SIZEOF_LONG_LONG 8
#define SIZEOF___INT64 8
#define SIZEOF_OFF_T 4
#define SIZEOF_VOIDP 8
#define SIZEOF_FLOAT 4
#define SIZEOF_DOUBLE 8
#define SIZEOF_SIZE_T 8
#define TOKEN_PASTE(x,y) x##y
#ifndef NORETURN
#if _MSC_VER > 1100
#define NORETURN(x) __declspec(noreturn) x
#else
#define NORETURN(x) x
#endif
#endif
#define HAVE_DECL_SYS_NERR 1
#define HAVE_FCNTL_H 1
#define HAVE_SYS_UTIME_H 1
#define HAVE_MEMORY_H 1
#define uid_t int
#define gid_t int
#define GETGROUPS_T int
#define HAVE_ALLOCA 1
#define HAVE_DUP2 1
#define HAVE_MKDIR 1
#define HAVE_FLOCK 1
#define HAVE_FINITE 1
#define HAVE_HYPOT 1
#define HAVE_WAITPID 1
#define HAVE_CHSIZE 1
#define HAVE_TIMES 1
#define HAVE_TELLDIR 1
#define HAVE_SEEKDIR 1
#define HAVE_EXECVE 1
#define HAVE_DAYLIGHT 1
#define SETPGRP_VOID 1
#define inline __inline
#define NEED_IO_SEEK_BETWEEN_RW 1
#define RSHIFT(x,y) ((x)>>(int)y)
#define FILE_COUNT _cnt
#define FILE_READPTR _ptr
#define DEFAULT_KCODE KCODE_NONE
#define DLEXT ".so"
#define DLEXT2 ".dll"


@ -2,8 +2,8 @@
cp1251.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2006-2018 Byte <byte AT mail DOT kna DOT ru>
* K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2006-2019 Byte <byte AT mail DOT kna DOT ru>
* K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without


@ -2,7 +2,7 @@
euc_jp.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -120,25 +120,6 @@ code_to_mbclen(OnigCodePoint code)
return ONIGERR_INVALID_CODE_POINT_VALUE;
}
#if 0
static int
code_to_mbc_first(OnigCodePoint code)
{
int first;
if ((code & 0xff0000) != 0) {
first = (code >> 16) & 0xff;
}
else if ((code & 0xff00) != 0) {
first = (code >> 8) & 0xff;
}
else {
return (int )code;
}
return first;
}
#endif
static int
code_to_mbc(OnigCodePoint code, UChar *buf)
{


@ -1,5 +1,5 @@
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -pt -T -L ANSI-C -N onigenc_euc_jp_lookup_property_name --output-file gperf1.tmp euc_jp_prop.gperf */
/* Command-line: gperf -pt -T -L ANSI-C -N onigenc_euc_jp_lookup_property_name --output-file gperf1.tmp euc_jp_prop.gperf */
/* Computed positions: -k'1,3' */
#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \


@ -2,7 +2,7 @@
euc_kr.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -54,6 +54,16 @@ euckr_mbc_enc_len(const UChar* p)
return EncLen_EUCKR[*p];
}
static int
euckr_code_to_mbclen(OnigCodePoint code)
{
if ((code & (~0xffff)) != 0) return ONIGERR_INVALID_CODE_POINT_VALUE;
if ((code & 0xff00) != 0) return 2;
if (EncLen_EUCKR[(int )(code & 0xff)] == 1) return 1;
return ONIGERR_INVALID_CODE_POINT_VALUE;
}
static int
is_valid_mbc_string(const UChar* p, const UChar* end)
{
@ -98,15 +108,6 @@ euckr_mbc_case_fold(OnigCaseFoldType flag, const UChar** pp, const UChar* end,
pp, end, lower);
}
#if 0
static int
euckr_is_mbc_ambiguous(OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
return onigenc_mbn_is_mbc_ambiguous(ONIG_ENCODING_EUC_KR, flag, pp, end);
}
#endif
static int
euckr_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
@ -149,7 +150,7 @@ OnigEncodingType OnigEncodingEUC_KR = {
1, /* min enc length */
onigenc_is_mbc_newline_0x0a,
euckr_mbc_to_code,
onigenc_mb2_code_to_mbclen,
euckr_code_to_mbclen,
euckr_code_to_mbc,
euckr_mbc_case_fold,
onigenc_ascii_apply_all_case_fold,
@ -174,7 +175,7 @@ OnigEncodingType OnigEncodingEUC_CN = {
1, /* min enc length */
onigenc_is_mbc_newline_0x0a,
euckr_mbc_to_code,
onigenc_mb2_code_to_mbclen,
euckr_code_to_mbclen,
euckr_code_to_mbc,
euckr_mbc_case_fold,
onigenc_ascii_apply_all_case_fold,


@ -2,7 +2,7 @@
euc_tw.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -54,6 +54,20 @@ euctw_mbc_enc_len(const UChar* p)
return EncLen_EUCTW[*p];
}
static int
euctw_code_to_mbclen(OnigCodePoint code)
{
if ((code & 0xff000000) != 0) return 4;
else if ((code & 0xff0000) != 0) return ONIGERR_INVALID_CODE_POINT_VALUE;
else if ((code & 0xff00) != 0) return 2;
else {
if (EncLen_EUCTW[(int )(code & 0xff)] == 1)
return 1;
return ONIGERR_INVALID_CODE_POINT_VALUE;
}
}
static int
is_valid_mbc_string(const UChar* p, const UChar* end)
{
@ -155,7 +169,7 @@ OnigEncodingType OnigEncodingEUC_TW = {
1, /* min enc length */
onigenc_is_mbc_newline_0x0a,
euctw_mbc_to_code,
onigenc_mb4_code_to_mbclen,
euctw_code_to_mbclen,
euctw_code_to_mbc,
euctw_mbc_case_fold,
onigenc_ascii_apply_all_case_fold,


@ -3,7 +3,7 @@
**********************************************************************/
/*-
* Copyright (c) 2005-2019 KUBO Takehiro <kubo AT jiubao DOT org>
* K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -33,6 +33,7 @@
#if 1
#define DEBUG_GB18030(arg)
#else
#include <stdio.h>
#define DEBUG_GB18030(arg) printf arg
#endif
@ -75,6 +76,20 @@ gb18030_mbc_enc_len(const UChar* p)
return 2;
}
static int
gb18030_code_to_mbclen(OnigCodePoint code)
{
if ((code & 0xff000000) != 0) return 4;
else if ((code & 0xff0000) != 0) return ONIGERR_INVALID_CODE_POINT_VALUE;
else if ((code & 0xff00) != 0) return 2;
else {
if (GB18030_MAP[(int )(code & 0xff)] == CM)
return ONIGERR_INVALID_CODE_POINT_VALUE;
return 1;
}
}
static int
is_valid_mbc_string(const UChar* p, const UChar* end)
{
@ -135,15 +150,6 @@ gb18030_mbc_case_fold(OnigCaseFoldType flag, const UChar** pp, const UChar* end,
pp, end, lower);
}
#if 0
static int
gb18030_is_mbc_ambiguous(OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
return onigenc_mbn_is_mbc_ambiguous(ONIG_ENCODING_GB18030, flag, pp, end);
}
#endif
static int
gb18030_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
@ -522,7 +528,7 @@ OnigEncodingType OnigEncodingGB18030 = {
1, /* min enc length */
onigenc_is_mbc_newline_0x0a,
gb18030_mbc_to_code,
onigenc_mb4_code_to_mbclen,
gb18030_code_to_mbclen,
gb18030_code_to_mbc,
gb18030_mbc_case_fold,
onigenc_ascii_apply_all_case_fold,
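
The four-byte variant used for GB18030 (and, with a different table, EUC-TW) follows the same idea with one extra case: a code whose highest set byte is the third one has no valid length. The sketch below models that decision order on its own. It is not the library code: `toy_is_lead_byte` is a hypothetical replacement for the GB18030_MAP lookup (it assumes 0x81..0xfe are lead bytes), and `TOY_ERR_INVALID_CODE` stands in for ONIGERR_INVALID_CODE_POINT_VALUE.

```c
/* Hypothetical error value standing in for ONIGERR_INVALID_CODE_POINT_VALUE. */
#define TOY_ERR_INVALID_CODE (-1)

/* Simplified stand-in for the GB18030_MAP check: bytes in 0x81..0xfe are
 * assumed to be lead bytes of a multibyte character. */
static int toy_is_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xfe;
}

/* Same decision order as gb18030_code_to_mbclen in the diff:
 *   top byte set        -> 4 bytes
 *   only third byte set -> invalid (there is no 3-byte form)
 *   second byte set     -> 2 bytes
 *   single byte         -> valid only if it is not a lead byte */
static int toy_mb4_code_to_mbclen(unsigned long code)
{
    if ((code & 0xff000000UL) != 0) return 4;
    if ((code & 0x00ff0000UL) != 0) return TOY_ERR_INVALID_CODE;
    if ((code & 0x0000ff00UL) != 0) return 2;
    if (toy_is_lead_byte((unsigned char)(code & 0xff)))
        return TOY_ERR_INVALID_CODE;
    return 1;
}
```

In this model 0x41 has length 1, 0x8140 length 2 and 0x81308130 length 4, while 0x81 and 0x813081 are rejected.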


@ -2,7 +2,7 @@
iso8859_1.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -216,32 +216,6 @@ mbc_case_fold(OnigCaseFoldType flag, const UChar** pp,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_1_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xdf, 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (*p >= 0xaa && *p <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{


@ -2,7 +2,7 @@
iso8859_10.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,28 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_10_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_11.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
iso8859_13.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,32 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_13_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xdf, 0xb5 are lower case letter, but can't convert. */
if (*p == 0xb5)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_14.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,29 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1; /* return byte length of converted char to lower */
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_14_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_15.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,32 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1; /* return byte length of converted char to lower */
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_15_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xdf etc.. are lower case letter, but can't convert. */
if (*p == 0xaa || *p == 0xb5 || *p == 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_16.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,28 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1; /* return byte length of converted char to lower */
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_16_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_2.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,28 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1; /* return byte length of converted char to lower */
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_2_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static const OnigPairCaseFoldCodes CaseFoldMap[] = {
{ 0xa1, 0xb1 },
{ 0xa3, 0xb3 },

@ -2,7 +2,7 @@
iso8859_3.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,32 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag, const UChar** pp,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_3_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (*p == 0xb5)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_4.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,31 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1; /* return byte length of converted char to lower */
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_4_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
if (*p == 0xa2)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_5.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -114,19 +114,6 @@ mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
(*pp)++;
v = (EncISO_8859_5_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_6.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
iso8859_7.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -114,26 +114,6 @@ mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
(*pp)++;
v = (EncISO_8859_7_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
if (*p == 0xc0 || *p == 0xe0)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
iso8859_8.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
iso8859_9.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,32 +121,6 @@ mbc_case_fold(OnigCaseFoldType flag,
return 1;
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
(*pp)++;
return TRUE;
}
(*pp)++;
v = (EncISO_8859_9_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xdf etc.. are lower case letter, but can't convert. */
if (*p >= 0xaa && *p <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
koi8.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -115,25 +115,6 @@ koi8_mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED,
return 1;
}
#if 0
static int
koi8_is_mbc_ambiguous(OnigAmbigType flag, const OnigUChar** pp, const OnigUChar* end)
{
const OnigUChar* p = *pp;
(*pp)++;
if (((flag & ONIGENC_CASE_FOLD_ASCII_CASE) != 0 &&
ONIGENC_IS_MBC_ASCII(p)) ||
((flag & ONIGENC_CASE_FOLD_NONASCII_CASE) != 0 &&
!ONIGENC_IS_MBC_ASCII(p))) {
int v = (EncKOI8_CtypeTable[*p] &
(BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
return (v != 0 ? TRUE : FALSE);
}
return FALSE;
}
#endif
static int
koi8_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
koi8_r.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -114,19 +114,6 @@ koi8_r_mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED,
return 1;
}
#if 0
static int
koi8_r_is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
int v;
const UChar* p = *pp;
(*pp)++;
v = (EncKOI8_R_CtypeTable[*p] & (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
return (v != 0 ? TRUE : FALSE);
}
#endif
static int
koi8_r_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{

@ -2,7 +2,7 @@
mktable.c
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
onig_init.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2016-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2016-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -4,7 +4,7 @@
oniggnu.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2005 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -4,7 +4,7 @@
onigposix.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -95,6 +95,7 @@ typedef struct {
#endif
#endif
#ifndef ONIG_STATIC
#ifndef ONIG_EXTERN
#if defined(_WIN32) && !defined(__GNUC__)
#if defined(ONIGURUMA_EXPORT)
@ -108,6 +109,9 @@ typedef struct {
#ifndef ONIG_EXTERN
#define ONIG_EXTERN extern
#endif
#else
#define ONIG_EXTERN extern
#endif
#ifndef ONIGURUMA_H
typedef unsigned int OnigOptionType;

@ -4,7 +4,7 @@
oniguruma.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -36,9 +36,9 @@ extern "C" {
#define ONIGURUMA
#define ONIGURUMA_VERSION_MAJOR 6
#define ONIGURUMA_VERSION_MINOR 9
#define ONIGURUMA_VERSION_TEENY 3
#define ONIGURUMA_VERSION_TEENY 4
#define ONIGURUMA_VERSION_INT 60903
#define ONIGURUMA_VERSION_INT 60904
#ifndef P_
#if defined(__STDC__) || defined(_WIN32)
@ -687,6 +687,14 @@ typedef OnigRegexType* OnigRegex;
typedef OnigRegexType regex_t;
#endif
struct OnigRegSetStruct;
typedef struct OnigRegSetStruct OnigRegSet;
typedef enum {
ONIG_REGSET_POSITION_LEAD = 0,
ONIG_REGSET_REGEX_LEAD = 1,
ONIG_REGSET_PRIORITY_TO_REGEX_ORDER = 2
} OnigRegSetLead;
typedef struct {
int num_of_elements;
@ -797,6 +805,26 @@ ONIG_EXTERN
int onig_match P_((OnigRegex, const OnigUChar* str, const OnigUChar* end, const OnigUChar* at, OnigRegion* region, OnigOptionType option));
ONIG_EXTERN
int onig_match_with_param P_((OnigRegex, const OnigUChar* str, const OnigUChar* end, const OnigUChar* at, OnigRegion* region, OnigOptionType option, OnigMatchParam* mp));
ONIG_EXTERN
int onig_regset_new P_((OnigRegSet** rset, int n, regex_t* regs[]));
ONIG_EXTERN
int onig_regset_add P_((OnigRegSet* set, regex_t* reg));
ONIG_EXTERN
int onig_regset_replace P_((OnigRegSet* set, int at, regex_t* reg));
ONIG_EXTERN
void onig_regset_free P_((OnigRegSet* set));
ONIG_EXTERN
int onig_regset_number_of_regex P_((OnigRegSet* set));
ONIG_EXTERN
regex_t* onig_regset_get_regex P_((OnigRegSet* set, int at));
ONIG_EXTERN
OnigRegion* onig_regset_get_region P_((OnigRegSet* set, int at));
ONIG_EXTERN
int onig_regset_search P_((OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, int* rmatch_pos));
ONIG_EXTERN
int onig_regset_search_with_param P_((OnigRegSet* set, const OnigUChar* str, const OnigUChar* end, const OnigUChar* start, const OnigUChar* range, OnigRegSetLead lead, OnigOptionType option, OnigMatchParam* mps[], int* rmatch_pos));
ONIG_EXTERN
OnigRegion* onig_region_new P_((void));
ONIG_EXTERN

File diff suppressed because it is too large

@ -2,7 +2,7 @@
regenc.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -182,7 +182,8 @@ onigenc_get_right_adjust_char_head_with_prev(OnigEncoding enc,
p += enclen(enc, p);
}
else {
if (prev) *prev = (const UChar* )NULL; /* Sorry */
if (prev)
*prev = onigenc_get_prev_char_head(enc, start, p);
}
return p;
}
@ -208,20 +209,6 @@ onigenc_step_back(OnigEncoding enc, const UChar* start, const UChar* s, int n)
return (UChar* )s;
}
#if 0
extern int
onigenc_mbc_enc_len_end(OnigEncoding enc, const UChar* p, const UChar* end)
{
int len;
int n;
len = ONIGENC_MBC_ENC_LEN(enc, p);
n = (int )(end - p);
return (n < len ? n : len);
}
#endif
extern UChar*
onigenc_step(OnigEncoding enc, const UChar* p, const UChar* end, int n)
{
@ -705,18 +692,6 @@ onigenc_ascii_mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED, const UChar** p,
return 1; /* return byte length of converted char to lower */
}
#if 0
extern int
onigenc_ascii_is_mbc_ambiguous(OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
const UChar* p = *pp;
(*pp)++;
return ONIGENC_IS_ASCII_CODE_CASE_AMBIG(*p);
}
#endif
extern int
onigenc_single_byte_mbc_enc_len(const UChar* p ARG_UNUSED)
{
@ -833,41 +808,6 @@ onigenc_mbn_mbc_case_fold(OnigEncoding enc, OnigCaseFoldType flag ARG_UNUSED,
}
}
#if 0
extern int
onigenc_mbn_is_mbc_ambiguous(OnigEncoding enc, OnigCaseFoldType flag,
const UChar** pp, const UChar* end)
{
const UChar* p = *pp;
if (ONIGENC_IS_MBC_ASCII(p)) {
(*pp)++;
return ONIGENC_IS_ASCII_CODE_CASE_AMBIG(*p);
}
(*pp) += enclen(enc, p);
return FALSE;
}
#endif
extern int
onigenc_mb2_code_to_mbclen(OnigCodePoint code)
{
if ((code & (~0xffff)) != 0) return ONIGERR_INVALID_CODE_POINT_VALUE;
if ((code & 0xff00) != 0) return 2;
else return 1;
}
extern int
onigenc_mb4_code_to_mbclen(OnigCodePoint code)
{
if ((code & 0xff000000) != 0) return 4;
else if ((code & 0xff0000) != 0) return 3;
else if ((code & 0xff00) != 0) return 2;
else return 1;
}
extern int
onigenc_mb2_code_to_mbc(OnigEncoding enc, OnigCodePoint code, UChar *buf)
{
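
The hunk above deletes `onigenc_mb4_code_to_mbclen` from regenc.c, and the GB18030 encoding table earlier in this diff lists `gb18030_code_to_mbclen` next to it in the same slot. As a rough sketch of what the generic helper computed, the same byte-count logic is reproduced below as a standalone function. The name `mb4_len_sketch` is hypothetical, and `uint32_t` is used here as a stand-in for `OnigCodePoint`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for onigenc_mb4_code_to_mbclen():
 * the length is decided by the highest non-zero byte of the code value. */
static int mb4_len_sketch(uint32_t code)
{
  if ((code & 0xff000000u) != 0) return 4;
  else if ((code & 0xff0000u) != 0) return 3;
  else if ((code & 0xff00u) != 0) return 2;
  else return 1;
}
```

Note that this logic never rejects a value: every 32-bit input maps to a length from 1 to 4. The two-byte helper `onigenc_mb2_code_to_mbclen` in the same hunk, by contrast, returns `ONIGERR_INVALID_CODE_POINT_VALUE` for codes above `0xffff`.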

@ -4,7 +4,7 @@
regenc.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -163,13 +163,11 @@ extern int onigenc_length_check_is_valid_mbc_string P_((OnigEncoding enc, const
/* methods for multi byte encoding */
extern OnigCodePoint onigenc_mbn_mbc_to_code P_((OnigEncoding enc, const UChar* p, const UChar* end));
extern int onigenc_mbn_mbc_case_fold P_((OnigEncoding enc, OnigCaseFoldType flag, const UChar** p, const UChar* end, UChar* lower));
extern int onigenc_mb2_code_to_mbclen P_((OnigCodePoint code));
extern int onigenc_mb2_code_to_mbc P_((OnigEncoding enc, OnigCodePoint code, UChar *buf));
extern int onigenc_minimum_property_name_to_ctype P_((OnigEncoding enc, UChar* p, UChar* end));
extern int onigenc_unicode_property_name_to_ctype P_((OnigEncoding enc, UChar* p, UChar* end));
extern int onigenc_is_mbc_word_ascii P_((OnigEncoding enc, UChar* s, const UChar* end));
extern int onigenc_mb2_is_code_ctype P_((OnigEncoding enc, OnigCodePoint code, unsigned int ctype));
extern int onigenc_mb4_code_to_mbclen P_((OnigCodePoint code));
extern int onigenc_mb4_code_to_mbc P_((OnigEncoding enc, OnigCodePoint code, UChar *buf));
extern int onigenc_mb4_is_code_ctype P_((OnigEncoding enc, OnigCodePoint code, unsigned int ctype));
extern struct PropertyNameCtype* onigenc_euc_jp_lookup_property_name P_((register const char *str, register size_t len));

@ -2,7 +2,7 @@
regerror.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

File diff suppressed because it is too large

@ -2,7 +2,7 @@
regext.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -41,7 +41,6 @@ conv_ext0be32(const UChar* s, const UChar* end, UChar* conv)
}
}
#if 0
static void
conv_ext0le32(const UChar* s, const UChar* end, UChar* conv)
{
@ -92,7 +91,6 @@ conv_swap2bytes(const UChar* s, const UChar* end, UChar* conv)
s += 2;
}
}
#endif
static int
conv_encoding(OnigEncoding from, OnigEncoding to, const UChar* s, const UChar* end,

@ -2,7 +2,7 @@
reggnu.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -4,7 +4,7 @@
regint.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -47,16 +47,11 @@
#endif
#endif
#if defined(__i386) || defined(__i386__) || defined(_M_IX86) || \
(defined(__ppc__) && defined(__APPLE__)) || \
defined(__x86_64) || defined(__x86_64__) || \
defined(__mc68020__)
#define PLATFORM_UNALIGNED_WORD_ACCESS
#endif
#ifndef ONIG_DISABLE_DIRECT_THREADING
#ifdef __GNUC__
#define USE_GOTO_LABELS_AS_VALUES
#endif
#endif
/* config */
/* spec. config */
@ -82,6 +77,8 @@
#define USE_VARIABLE_META_CHARS
#define USE_POSIX_API_REGION_OPTION
#define USE_FIND_LONGEST_SEARCH_ALL_OF_RANGE
/* #define USE_REPEAT_AND_EMPTY_CHECK_LOCAL_VAR */
#include "regenc.h"
@ -197,49 +194,16 @@ typedef unsigned int uintptr_t;
#define CHAR_MAP_SIZE 256
#define INFINITE_LEN ONIG_INFINITE_DISTANCE
#ifdef PLATFORM_UNALIGNED_WORD_ACCESS
#define PLATFORM_GET_INC(val,p,type) do{\
val = *(type* )p;\
(p) += sizeof(type);\
} while(0)
#else
#define PLATFORM_GET_INC(val,p,type) do{\
xmemcpy(&val, (p), sizeof(type));\
(p) += sizeof(type);\
} while(0)
/* sizeof(OnigCodePoint) */
#ifdef SIZEOF_SIZE_T
# define WORD_ALIGNMENT_SIZE SIZEOF_SIZE_T
#else
# define WORD_ALIGNMENT_SIZE SIZEOF_LONG
#endif
#define GET_ALIGNMENT_PAD_SIZE(addr,pad_size) do {\
(pad_size) = WORD_ALIGNMENT_SIZE - ((uintptr_t )(addr) % WORD_ALIGNMENT_SIZE);\
if ((pad_size) == WORD_ALIGNMENT_SIZE) (pad_size) = 0;\
} while (0)
#define ALIGNMENT_RIGHT(addr) do {\
(addr) += (WORD_ALIGNMENT_SIZE - 1);\
(addr) -= ((uintptr_t )(addr) % WORD_ALIGNMENT_SIZE);\
} while (0)
#endif /* PLATFORM_UNALIGNED_WORD_ACCESS */
#ifdef USE_CALLOUT
typedef struct {
int flag;
OnigCalloutOf of;
int in;
int name_id;
const UChar* tag_start;
const UChar* tag_end;
int flag;
OnigCalloutOf of;
int in;
int name_id;
const UChar* tag_start;
const UChar* tag_end;
OnigCalloutType type;
OnigCalloutFunc start_func;
OnigCalloutFunc end_func;
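
The two `PLATFORM_GET_INC` variants in the hunk above differ only in how they fetch a value from the bytecode stream: one dereferences a cast pointer, the other copies bytes with `xmemcpy`. A minimal sketch of the copy-based form follows, assuming `xmemcpy` is plain `memcpy`. The names `GET_INC_SKETCH` and `read_i32_sketch` are hypothetical and not part of Oniguruma.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy-based read: valid at any address, aligned or not.
 * memcpy stands in for xmemcpy (assumption). */
#define GET_INC_SKETCH(val,p,type) do{\
  memcpy(&(val), (p), sizeof(type));\
  (p) += sizeof(type);\
} while(0)

/* Hypothetical helper: read one int32_t at p and return the advanced pointer. */
static const unsigned char*
read_i32_sketch(const unsigned char* p, int32_t* out)
{
  GET_INC_SKETCH(*out, p, int32_t);
  return p;
}
```

The copy-based form does not depend on the alignment of `p`, which is why the original code reserved the cast-and-dereference form for platforms listed under `PLATFORM_UNALIGNED_WORD_ACCESS`.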
@ -272,7 +236,6 @@ enum OptimizeType {
OPTIMIZE_STR, /* Slow Search */
OPTIMIZE_STR_FAST, /* Sunday quick search / BMH */
OPTIMIZE_STR_FAST_STEP_FORWARD, /* Sunday quick search / BMH */
OPTIMIZE_STR_CASE_FOLD_FAST, /* Sunday quick search / BMH (ignore case) */
OPTIMIZE_STR_CASE_FOLD, /* Slow Search (ignore case) */
OPTIMIZE_MAP /* char map */
};
@ -288,6 +251,8 @@ typedef unsigned int MemStatusType;
#define MEM_STATUS_AT0(stats,n) \
((n) > 0 && (n) < (int )MEM_STATUS_BITS_NUM ? ((stats) & ((MemStatusType )1 << n)) : ((stats) & 1))
#define MEM_STATUS_IS_ALL_ON(stats) (((stats) & 1) != 0)
#define MEM_STATUS_ON(stats,n) do {\
if ((n) < (int )MEM_STATUS_BITS_NUM) {\
if ((n) != 0)\
@ -302,8 +267,14 @@ typedef unsigned int MemStatusType;
(stats) |= ((MemStatusType )1 << (n));\
} while (0)
#define MEM_STATUS_LIMIT_AT(stats,n) \
((n) < (int )MEM_STATUS_BITS_NUM ? ((stats) & ((MemStatusType )1 << n)) : 0)
#define MEM_STATUS_LIMIT_ON(stats,n) do {\
if ((n) < (int )MEM_STATUS_BITS_NUM && (n) != 0) {\
(stats) |= ((MemStatusType )1 << (n));\
}\
} while (0)
#define INT_MAX_LIMIT ((1UL << (SIZEOF_INT * 8 - 1)) - 1)
#define IS_CODE_WORD_ASCII(enc,code) \
(ONIGENC_IS_CODE_ASCII(code) && ONIGENC_IS_CODE_WORD(enc,code))
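
The `MEM_STATUS_LIMIT_*` macros in this hunk treat an out-of-range group number as "not set", whereas `MEM_STATUS_AT0` in the previous hunk falls back to bit 0 for such numbers. The sketch below isolates that behaviour together with `INT_MAX_LIMIT`. It assumes `MEM_STATUS_BITS_NUM` is the bit width of `MemStatusType` and substitutes `sizeof(int)` for the configure-time `SIZEOF_INT`; `limit_on_sketch` and `SIZEOF_INT_SKETCH` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned int MemStatusType;

/* Assumption: one status bit per bit of MemStatusType. */
#define MEM_STATUS_BITS_NUM  (sizeof(MemStatusType) * 8)

#define MEM_STATUS_LIMIT_AT(stats,n) \
  ((n) < (int )MEM_STATUS_BITS_NUM  ?  ((stats) & ((MemStatusType )1 << n)) : 0)
#define MEM_STATUS_LIMIT_ON(stats,n) do {\
  if ((n) < (int )MEM_STATUS_BITS_NUM && (n) != 0) {\
    (stats) |= ((MemStatusType )1 << (n));\
  }\
} while (0)

/* Assumption: SIZEOF_INT normally comes from config.h. */
#define SIZEOF_INT_SKETCH  sizeof(int)
#define INT_MAX_LIMIT  ((1UL << (SIZEOF_INT_SKETCH * 8 - 1)) - 1)

/* Hypothetical helper: return stats with bit n set under the LIMIT rules. */
static MemStatusType limit_on_sketch(MemStatusType stats, int n)
{
  MEM_STATUS_LIMIT_ON(stats, n);
  return stats;
}
```

With these definitions, group 0 and any group at or beyond the bit width leave `stats` unchanged, and `INT_MAX_LIMIT` evaluates to the same value as `INT_MAX` from `<limits.h>`.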
@ -354,16 +325,12 @@ typedef unsigned int MemStatusType;
/* bitset */
#define BITS_PER_BYTE 8
#define SINGLE_BYTE_SIZE (1 << BITS_PER_BYTE)
#define BITS_IN_ROOM (sizeof(Bits) * BITS_PER_BYTE)
#define BITS_IN_ROOM 32 /* 4 * BITS_PER_BYTE */
#define BITSET_SIZE (SINGLE_BYTE_SIZE / BITS_IN_ROOM)
#ifdef PLATFORM_UNALIGNED_WORD_ACCESS
typedef unsigned int Bits;
#else
typedef unsigned char Bits;
#endif
typedef Bits BitSet[BITSET_SIZE];
typedef Bits* BitSetRef;
typedef uint32_t Bits;
typedef Bits BitSet[BITSET_SIZE];
typedef Bits* BitSetRef;
#define SIZE_BITSET sizeof(BitSet)
@ -372,8 +339,8 @@ typedef Bits* BitSetRef;
for (i = 0; i < (int )BITSET_SIZE; i++) { (bs)[i] = 0; } \
} while (0)
#define BS_ROOM(bs,pos) (bs)[pos / BITS_IN_ROOM]
#define BS_BIT(pos) (1 << (pos % BITS_IN_ROOM))
#define BS_ROOM(bs,pos) (bs)[(unsigned int )(pos) >> 5]
#define BS_BIT(pos) (1u << ((unsigned int )(pos) & 0x1f))
#define BITSET_AT(bs, pos) (BS_ROOM(bs,pos) & BS_BIT(pos))
#define BITSET_SET_BIT(bs, pos) BS_ROOM(bs,pos) |= BS_BIT(pos)
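
The two bitset hunks above fix `Bits` at `uint32_t` and replace the divide/modulo indexing with a shift and a mask. A small self-contained sketch of the resulting layout is given below. The macro bodies are copied from the hunks, while `demo_clear` and `demo_count` are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE      8
#define SINGLE_BYTE_SIZE   (1 << BITS_PER_BYTE)
#define BITS_IN_ROOM       32   /* 4 * BITS_PER_BYTE */
#define BITSET_SIZE        (SINGLE_BYTE_SIZE / BITS_IN_ROOM)

typedef uint32_t Bits;
typedef Bits     BitSet[BITSET_SIZE];

#define BS_ROOM(bs,pos)          (bs)[(unsigned int )(pos) >> 5]
#define BS_BIT(pos)              (1u << ((unsigned int )(pos) & 0x1f))
#define BITSET_AT(bs, pos)       (BS_ROOM(bs,pos) & BS_BIT(pos))
#define BITSET_SET_BIT(bs, pos)  BS_ROOM(bs,pos) |= BS_BIT(pos)

/* Hypothetical helper: zero all 8 rooms (256 bits). */
static void demo_clear(Bits* bs)
{
  int i;
  for (i = 0; i < (int )BITSET_SIZE; i++) bs[i] = 0;
}

/* Hypothetical helper: count the bits set for byte values 0..255. */
static int demo_count(const Bits* bs)
{
  int pos, n = 0;
  for (pos = 0; pos < SINGLE_BYTE_SIZE; pos++) {
    if (BITSET_AT(bs, pos) != 0) n++;
  }
  return n;
}
```

One visible difference from the older form `1 << (pos % BITS_IN_ROOM)`: the unsigned literal `1u` keeps the shift well defined when the bit index is 31, which a signed `1` does not guarantee in C.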
@ -389,11 +356,13 @@ typedef struct _BBuf {
#define BB_INIT(buf,size) bbuf_init((BBuf* )(buf), (size))
/*
#define BB_SIZE_INC(buf,inc) do{\
(buf)->alloc += (inc);\
(buf)->p = (UChar* )xrealloc((buf)->p, (buf)->alloc);\
if (IS_NULL((buf)->p)) return(ONIGERR_MEMORY);\
} while (0)
*/
#define BB_EXPAND(buf,low) do{\
do { (buf)->alloc *= 2; } while ((buf)->alloc < (unsigned int )low);\
@ -491,39 +460,34 @@ typedef struct _BBuf {
/* operation code */
enum OpCode {
OP_FINISH = 0, /* matching process terminator (no more alternative) */
OP_END = 1, /* pattern code terminator (success end) */
OP_EXACT1 = 2, /* single byte, N = 1 */
OP_EXACT2, /* single byte, N = 2 */
OP_EXACT3, /* single byte, N = 3 */
OP_EXACT4, /* single byte, N = 4 */
OP_EXACT5, /* single byte, N = 5 */
OP_EXACTN, /* single byte */
OP_EXACTMB2N1, /* mb-length = 2 N = 1 */
OP_EXACTMB2N2, /* mb-length = 2 N = 2 */
OP_EXACTMB2N3, /* mb-length = 2 N = 3 */
OP_EXACTMB2N, /* mb-length = 2 */
OP_EXACTMB3N, /* mb-length = 3 */
OP_EXACTMBN, /* other length */
OP_EXACT1_IC, /* single byte, N = 1, ignore case */
OP_EXACTN_IC, /* single byte, ignore case */
OP_FINISH = 0, /* matching process terminator (no more alternative) */
OP_END = 1, /* pattern code terminator (success end) */
OP_STR_1 = 2, /* single byte, N = 1 */
OP_STR_2, /* single byte, N = 2 */
OP_STR_3, /* single byte, N = 3 */
OP_STR_4, /* single byte, N = 4 */
OP_STR_5, /* single byte, N = 5 */
OP_STR_N, /* single byte */
OP_STR_MB2N1, /* mb-length = 2 N = 1 */
OP_STR_MB2N2, /* mb-length = 2 N = 2 */
OP_STR_MB2N3, /* mb-length = 2 N = 3 */
OP_STR_MB2N, /* mb-length = 2 */
OP_STR_MB3N, /* mb-length = 3 */
OP_STR_MBN, /* other length */
OP_STR_1_IC, /* single byte, N = 1, ignore case */
OP_STR_N_IC, /* single byte, ignore case */
OP_CCLASS,
OP_CCLASS_MB,
OP_CCLASS_MIX,
OP_CCLASS_NOT,
OP_CCLASS_MB_NOT,
OP_CCLASS_MIX_NOT,
OP_ANYCHAR, /* "." */
OP_ANYCHAR_ML, /* "." multi-line */
OP_ANYCHAR_STAR, /* ".*" */
OP_ANYCHAR_ML_STAR, /* ".*" multi-line */
OP_ANYCHAR_STAR_PEEK_NEXT,
OP_ANYCHAR_ML_STAR_PEEK_NEXT,
OP_WORD,
OP_WORD_ASCII,
OP_NO_WORD,
@ -532,16 +496,13 @@ enum OpCode {
OP_NO_WORD_BOUNDARY,
OP_WORD_BEGIN,
OP_WORD_END,
OP_TEXT_SEGMENT_BOUNDARY,
OP_BEGIN_BUF,
OP_END_BUF,
OP_BEGIN_LINE,
OP_END_LINE,
OP_SEMI_END_BUF,
OP_BEGIN_POSITION,
OP_BACKREF1,
OP_BACKREF2,
OP_BACKREF_N,
@ -552,34 +513,35 @@ enum OpCode {
OP_BACKREF_WITH_LEVEL_IC, /* \k<xxx+n>, \k<xxx-n> */
OP_BACKREF_CHECK, /* (?(n)), (?('name')) */
OP_BACKREF_CHECK_WITH_LEVEL, /* (?(n-level)), (?('name-level')) */
OP_MEMORY_START,
OP_MEMORY_START_PUSH, /* push back-tracker to stack */
OP_MEMORY_END_PUSH, /* push back-tracker to stack */
OP_MEMORY_END_PUSH_REC, /* push back-tracker to stack */
OP_MEMORY_END,
OP_MEMORY_END_REC, /* push marker to stack */
OP_MEM_START,
OP_MEM_START_PUSH, /* push back-tracker to stack */
OP_MEM_END_PUSH, /* push back-tracker to stack */
#ifdef USE_CALL
OP_MEM_END_PUSH_REC, /* push back-tracker to stack */
#endif
OP_MEM_END,
#ifdef USE_CALL
OP_MEM_END_REC, /* push marker to stack */
#endif
OP_FAIL, /* pop stack and move */
OP_JUMP,
OP_PUSH,
OP_PUSH_SUPER,
OP_POP_OUT,
#ifdef USE_OP_PUSH_OR_JUMP_EXACT
OP_PUSH_OR_JUMP_EXACT1, /* if match exact then push, else jump. */
OP_PUSH_OR_JUMP_EXACT1, /* if match exact then push, else jump. */
#endif
OP_PUSH_IF_PEEK_NEXT, /* if match exact then push, else none. */
OP_REPEAT, /* {n,m} */
OP_REPEAT_NG, /* {n,m}? (non greedy) */
OP_PUSH_IF_PEEK_NEXT, /* if match exact then push, else none. */
OP_REPEAT, /* {n,m} */
OP_REPEAT_NG, /* {n,m}? (non greedy) */
OP_REPEAT_INC,
OP_REPEAT_INC_NG, /* non greedy */
OP_REPEAT_INC_SG, /* search and get in stack */
OP_REPEAT_INC_NG_SG, /* search and get in stack (non greedy) */
OP_REPEAT_INC_NG, /* non greedy */
OP_EMPTY_CHECK_START, /* null loop checker start */
OP_EMPTY_CHECK_END, /* null loop checker end */
OP_EMPTY_CHECK_END_MEMST, /* null loop checker end (with capture status) */
#ifdef USE_CALL
OP_EMPTY_CHECK_END_MEMST_PUSH, /* with capture status and push check-end */
#endif
OP_PREC_READ_START, /* (?=...) start */
OP_PREC_READ_END, /* (?=...) end */
OP_PREC_READ_NOT_START, /* (?!...) start */
@ -589,11 +551,12 @@ enum OpCode {
OP_LOOK_BEHIND, /* (?<=...) start (no needs end opcode) */
OP_LOOK_BEHIND_NOT_START, /* (?<!...) start */
OP_LOOK_BEHIND_NOT_END, /* (?<!...) end */
OP_CALL, /* \g<name> */
OP_RETURN,
OP_PUSH_SAVE_VAL,
OP_UPDATE_VAR,
#ifdef USE_CALL
OP_CALL, /* \g<name> */
OP_RETURN,
#endif
#ifdef USE_CALLOUT
OP_CALLOUT_CONTENTS, /* (?{...}) (?{{...}}) */
OP_CALLOUT_NAME, /* (*name) (*name[tag](args...)) */
@ -601,8 +564,8 @@ enum OpCode {
};
enum SaveType {
SAVE_KEEP = 0, /* SAVE S */
SAVE_S = 1,
SAVE_KEEP = 0, /* SAVE S */
SAVE_S = 1,
SAVE_RIGHT_RANGE = 2,
};
@ -642,116 +605,57 @@ typedef int ModeType;
#define SIZE_UPDATE_VAR_TYPE sizeof(UpdateVarType)
#define SIZE_MODE sizeof(ModeType)
#define GET_RELADDR_INC(addr,p) PLATFORM_GET_INC(addr, p, RelAddrType)
#define GET_ABSADDR_INC(addr,p) PLATFORM_GET_INC(addr, p, AbsAddrType)
#define GET_LENGTH_INC(len,p) PLATFORM_GET_INC(len, p, LengthType)
#define GET_MEMNUM_INC(num,p) PLATFORM_GET_INC(num, p, MemNumType)
#define GET_REPEATNUM_INC(num,p) PLATFORM_GET_INC(num, p, RepeatNumType)
#define GET_OPTION_INC(option,p) PLATFORM_GET_INC(option, p, OnigOptionType)
#define GET_POINTER_INC(ptr,p) PLATFORM_GET_INC(ptr, p, PointerType)
#define GET_SAVE_TYPE_INC(type,p) PLATFORM_GET_INC(type, p, SaveType)
#define GET_UPDATE_VAR_TYPE_INC(type,p) PLATFORM_GET_INC(type, p, UpdateVarType)
#define GET_MODE_INC(mode,p) PLATFORM_GET_INC(mode, p, ModeType)
/* code point's address must be aligned address. */
#define GET_CODE_POINT(code,p) code = *((OnigCodePoint* )(p))
#define GET_BYTE_INC(byte,p) do{\
byte = *(p);\
(p)++;\
} while(0)
/* op-code + arg size */
#if 0
#define SIZE_OP_ANYCHAR_STAR SIZE_OPCODE
#define SIZE_OP_ANYCHAR_STAR_PEEK_NEXT (SIZE_OPCODE + 1)
#define SIZE_OP_JUMP (SIZE_OPCODE + SIZE_RELADDR)
#define SIZE_OP_PUSH (SIZE_OPCODE + SIZE_RELADDR)
#define SIZE_OP_PUSH_SUPER (SIZE_OPCODE + SIZE_RELADDR)
#define SIZE_OP_POP_OUT SIZE_OPCODE
#ifdef USE_OP_PUSH_OR_JUMP_EXACT
#define SIZE_OP_PUSH_OR_JUMP_EXACT1 (SIZE_OPCODE + SIZE_RELADDR + 1)
#endif
#define SIZE_OP_PUSH_IF_PEEK_NEXT (SIZE_OPCODE + SIZE_RELADDR + 1)
#define SIZE_OP_REPEAT_INC (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_REPEAT_INC_NG (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_WORD_BOUNDARY (SIZE_OPCODE + SIZE_MODE)
#define SIZE_OP_PREC_READ_START SIZE_OPCODE
#define SIZE_OP_PREC_READ_NOT_START (SIZE_OPCODE + SIZE_RELADDR)
#define SIZE_OP_PREC_READ_END SIZE_OPCODE
#define SIZE_OP_PREC_READ_NOT_END SIZE_OPCODE
#define SIZE_OP_FAIL SIZE_OPCODE
#define SIZE_OP_MEMORY_START (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_MEMORY_START_PUSH (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_MEMORY_END_PUSH (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_MEMORY_END_PUSH_REC (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_MEMORY_END (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_MEMORY_END_REC (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_ATOMIC_START SIZE_OPCODE
#define SIZE_OP_ATOMIC_END SIZE_OPCODE
#define SIZE_OP_EMPTY_CHECK_START (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_EMPTY_CHECK_END (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_LOOK_BEHIND (SIZE_OPCODE + SIZE_LENGTH)
#define SIZE_OP_LOOK_BEHIND_NOT_START (SIZE_OPCODE + SIZE_RELADDR + SIZE_LENGTH)
#define SIZE_OP_LOOK_BEHIND_NOT_END SIZE_OPCODE
#define SIZE_OP_CALL (SIZE_OPCODE + SIZE_ABSADDR)
#define SIZE_OP_RETURN SIZE_OPCODE
#define SIZE_OP_PUSH_SAVE_VAL (SIZE_OPCODE + SIZE_SAVE_TYPE + SIZE_MEMNUM)
#define SIZE_OP_UPDATE_VAR (SIZE_OPCODE + SIZE_UPDATE_VAR_TYPE + SIZE_MEMNUM)
#ifdef USE_CALLOUT
#define SIZE_OP_CALLOUT_CONTENTS (SIZE_OPCODE + SIZE_MEMNUM)
#define SIZE_OP_CALLOUT_NAME (SIZE_OPCODE + SIZE_MEMNUM + SIZE_MEMNUM)
#endif
#else /* if 0 */
/* for relative address increment to go next op. */
#define SIZE_INC_OP 1
#define SIZE_INC 1
#define SIZE_OP_ANYCHAR_STAR 1
#define SIZE_OP_ANYCHAR_STAR_PEEK_NEXT 1
#define SIZE_OP_JUMP 1
#define SIZE_OP_PUSH 1
#define SIZE_OP_PUSH_SUPER 1
#define SIZE_OP_POP_OUT 1
#define OPSIZE_ANYCHAR_STAR 1
#define OPSIZE_ANYCHAR_STAR_PEEK_NEXT 1
#define OPSIZE_JUMP 1
#define OPSIZE_PUSH 1
#define OPSIZE_PUSH_SUPER 1
#define OPSIZE_POP_OUT 1
#ifdef USE_OP_PUSH_OR_JUMP_EXACT
#define SIZE_OP_PUSH_OR_JUMP_EXACT1 1
#define OPSIZE_PUSH_OR_JUMP_EXACT1 1
#endif
#define SIZE_OP_PUSH_IF_PEEK_NEXT 1
#define SIZE_OP_REPEAT 1
#define SIZE_OP_REPEAT_INC 1
#define SIZE_OP_REPEAT_INC_NG 1
#define SIZE_OP_WORD_BOUNDARY 1
#define SIZE_OP_PREC_READ_START 1
#define SIZE_OP_PREC_READ_NOT_START 1
#define SIZE_OP_PREC_READ_END 1
#define SIZE_OP_PREC_READ_NOT_END 1
#define SIZE_OP_BACKREF 1
#define SIZE_OP_FAIL 1
#define SIZE_OP_MEMORY_START 1
#define SIZE_OP_MEMORY_START_PUSH 1
#define SIZE_OP_MEMORY_END_PUSH 1
#define SIZE_OP_MEMORY_END_PUSH_REC 1
#define SIZE_OP_MEMORY_END 1
#define SIZE_OP_MEMORY_END_REC 1
#define SIZE_OP_ATOMIC_START 1
#define SIZE_OP_ATOMIC_END 1
#define SIZE_OP_EMPTY_CHECK_START 1
#define SIZE_OP_EMPTY_CHECK_END 1
#define SIZE_OP_LOOK_BEHIND 1
#define SIZE_OP_LOOK_BEHIND_NOT_START 1
#define SIZE_OP_LOOK_BEHIND_NOT_END 1
#define SIZE_OP_CALL 1
#define SIZE_OP_RETURN 1
#define SIZE_OP_PUSH_SAVE_VAL 1
#define SIZE_OP_UPDATE_VAR 1
#define OPSIZE_PUSH_IF_PEEK_NEXT 1
#define OPSIZE_REPEAT 1
#define OPSIZE_REPEAT_INC 1
#define OPSIZE_REPEAT_INC_NG 1
#define OPSIZE_WORD_BOUNDARY 1
#define OPSIZE_PREC_READ_START 1
#define OPSIZE_PREC_READ_NOT_START 1
#define OPSIZE_PREC_READ_END 1
#define OPSIZE_PREC_READ_NOT_END 1
#define OPSIZE_BACKREF 1
#define OPSIZE_FAIL 1
#define OPSIZE_MEM_START 1
#define OPSIZE_MEM_START_PUSH 1
#define OPSIZE_MEM_END_PUSH 1
#define OPSIZE_MEM_END_PUSH_REC 1
#define OPSIZE_MEM_END 1
#define OPSIZE_MEM_END_REC 1
#define OPSIZE_ATOMIC_START 1
#define OPSIZE_ATOMIC_END 1
#define OPSIZE_EMPTY_CHECK_START 1
#define OPSIZE_EMPTY_CHECK_END 1
#define OPSIZE_LOOK_BEHIND 1
#define OPSIZE_LOOK_BEHIND_NOT_START 1
#define OPSIZE_LOOK_BEHIND_NOT_END 1
#define OPSIZE_CALL 1
#define OPSIZE_RETURN 1
#define OPSIZE_PUSH_SAVE_VAL 1
#define OPSIZE_UPDATE_VAR 1
#ifdef USE_CALLOUT
#define SIZE_OP_CALLOUT_CONTENTS 1
#define SIZE_OP_CALLOUT_NAME 1
#define OPSIZE_CALLOUT_CONTENTS 1
#define OPSIZE_CALLOUT_NAME 1
#endif
#endif /* if 0 */
#define MC_ESC(syn) (syn)->meta_char_table.esc
@ -882,7 +786,7 @@ typedef struct {
} repeat; /* REPEAT, REPEAT_NG */
struct {
MemNumType id;
} repeat_inc; /* REPEAT_INC, REPEAT_INC_SG, REPEAT_INC_NG, REPEAT_INC_NG_SG */
} repeat_inc; /* REPEAT_INC, REPEAT_INC_NG */
struct {
MemNumType mem;
} empty_check_start;
@ -933,48 +837,58 @@ typedef struct {
#endif
} RegexExt;
typedef struct {
int lower;
int upper;
union {
Operation* pcode; /* address of repeated body */
int offset;
} u;
} RepeatRange;
struct re_pattern_buffer {
/* common members of BBuf(bytes-buffer) */
Operation* ops;
#ifdef USE_DIRECT_THREADED_CODE
enum OpCode* ocs;
#endif
Operation* ops_curr;
unsigned int ops_used; /* used space for ops */
unsigned int ops_alloc; /* allocated space for ops */
Operation* ops_curr;
unsigned int ops_used; /* used space for ops */
unsigned int ops_alloc; /* allocated space for ops */
unsigned char* string_pool;
unsigned char* string_pool_end;
int num_mem; /* used memory(...) num counted from 1 */
int num_repeat; /* OP_REPEAT/OP_REPEAT_NG id-counter */
int num_null_check; /* OP_EMPTY_CHECK_START/END id counter */
int num_call; /* number of subexp call */
unsigned int capture_history; /* (?@...) flag (1-31) */
unsigned int bt_mem_start; /* need backtrack flag */
unsigned int bt_mem_end; /* need backtrack flag */
int stack_pop_level;
int repeat_range_alloc;
OnigRepeatRange* repeat_range;
int num_mem; /* used memory(...) num counted from 1 */
int num_repeat; /* OP_REPEAT/OP_REPEAT_NG id-counter */
int num_empty_check; /* OP_EMPTY_CHECK_START/END id counter */
int num_call; /* number of subexp call */
MemStatusType capture_history; /* (?@...) flag (1-31) */
MemStatusType push_mem_start; /* need backtrack flag */
MemStatusType push_mem_end; /* need backtrack flag */
MemStatusType empty_status_mem;
int stack_pop_level;
int repeat_range_alloc;
RepeatRange* repeat_range;
OnigEncoding enc;
OnigOptionType options;
OnigSyntaxType* syntax;
OnigCaseFoldType case_fold_flag;
void* name_table;
OnigEncoding enc;
OnigOptionType options;
OnigSyntaxType* syntax;
OnigCaseFoldType case_fold_flag;
void* name_table;
/* optimization info (string search, char-map and anchors) */
int optimize; /* optimize flag */
int threshold_len; /* search str-length for apply optimize */
int anchor; /* BEGIN_BUF, BEGIN_POS, (SEMI_)END_BUF */
OnigLen anchor_dmin; /* (SEMI_)END_BUF anchor distance */
OnigLen anchor_dmax; /* (SEMI_)END_BUF anchor distance */
OnigLen anc_dist_min; /* (SEMI_)END_BUF anchor distance */
OnigLen anc_dist_max; /* (SEMI_)END_BUF anchor distance */
int sub_anchor; /* start-anchor for exact or map */
unsigned char *exact;
unsigned char *exact_end;
unsigned char map[CHAR_MAP_SIZE]; /* used as BMH skip or char-map */
int map_offset;
OnigLen dmin; /* min-distance of exact or map */
OnigLen dmax; /* max-distance of exact or map */
OnigLen dist_min; /* min-distance of exact or map */
OnigLen dist_max; /* max-distance of exact or map */
RegexExt* extp;
};

File diff suppressed because it is too large.

@ -4,7 +4,7 @@
regparse.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -32,7 +32,7 @@
#include "regint.h"
#define NODE_STRING_MARGIN 16
#define NODE_STRING_BUF_SIZE 24 /* sizeof(CClassNode) - sizeof(int)*4 */
#define NODE_STRING_BUF_SIZE 20 /* sizeof(CClassNode) - sizeof(int)*4 */
#define NODE_BACKREFS_SIZE 6
/* node type */
@ -73,20 +73,25 @@ enum BodyEmptyType {
BODY_IS_EMPTY_POSSIBILITY_REC = 3
};
struct _Node;
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
UChar* s;
UChar* end;
unsigned int flag;
int capacity; /* (allocated size - 1) or 0: use buf[] */
UChar buf[NODE_STRING_BUF_SIZE];
int capacity; /* (allocated size - 1) or 0: use buf[] */
int case_min_len;
} StrNode;
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
unsigned int flags;
BitSet bs;
@ -96,6 +101,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* body;
int lower;
@ -104,12 +110,13 @@ typedef struct {
enum BodyEmptyType emptiness;
struct _Node* head_exact;
struct _Node* next_head_exact;
int is_refered; /* include called node. don't eliminate even if {0} */
int include_referred; /* include called node. don't eliminate even if {0} */
} QuantNode;
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* body;
enum BagType type;
@ -152,6 +159,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* body; /* to BagNode : BAG_MEMORY */
int by_number;
@ -166,6 +174,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
int back_num;
int back_static[NODE_BACKREFS_SIZE];
@ -176,6 +185,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* body;
int type;
@ -186,6 +196,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* car;
struct _Node* cdr;
@ -194,6 +205,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
int ctype;
int not;
@ -204,6 +216,7 @@ typedef struct {
typedef struct {
NodeType node_type;
int status;
struct _Node* parent;
enum GimmickType type;
int detail_type;
@ -216,6 +229,7 @@ typedef struct _Node {
struct {
NodeType node_type;
int status;
struct _Node* parent;
struct _Node* body;
} base;
@ -280,26 +294,21 @@ typedef struct _Node {
#define ANCR_ANYCHAR_INF_MASK (ANCR_ANYCHAR_INF | ANCR_ANYCHAR_INF_ML)
#define ANCR_END_BUF_MASK (ANCR_END_BUF | ANCR_SEMI_END_BUF)
#define NODE_STRING_RAW (1<<0) /* by backslashed number */
#define NODE_STRING_AMBIG (1<<1)
#define NODE_STRING_GOOD_AMBIG (1<<2)
#define NODE_STRING_DONT_GET_OPT_INFO (1<<3)
#define NODE_STRING_CRUDE (1<<0)
#define NODE_STRING_CASE_EXPANDED (1<<1)
#define NODE_STRING_CASE_FOLD_MATCH (1<<2)
#define NODE_STRING_LEN(node) (int )((node)->u.str.end - (node)->u.str.s)
#define NODE_STRING_SET_RAW(node) (node)->u.str.flag |= NODE_STRING_RAW
#define NODE_STRING_CLEAR_RAW(node) (node)->u.str.flag &= ~NODE_STRING_RAW
#define NODE_STRING_SET_AMBIG(node) (node)->u.str.flag |= NODE_STRING_AMBIG
#define NODE_STRING_SET_GOOD_AMBIG(node) (node)->u.str.flag |= NODE_STRING_GOOD_AMBIG
#define NODE_STRING_SET_DONT_GET_OPT_INFO(node) \
(node)->u.str.flag |= NODE_STRING_DONT_GET_OPT_INFO
#define NODE_STRING_IS_RAW(node) \
(((node)->u.str.flag & NODE_STRING_RAW) != 0)
#define NODE_STRING_IS_AMBIG(node) \
(((node)->u.str.flag & NODE_STRING_AMBIG) != 0)
#define NODE_STRING_IS_GOOD_AMBIG(node) \
(((node)->u.str.flag & NODE_STRING_GOOD_AMBIG) != 0)
#define NODE_STRING_IS_DONT_GET_OPT_INFO(node) \
(((node)->u.str.flag & NODE_STRING_DONT_GET_OPT_INFO) != 0)
#define NODE_STRING_SET_CRUDE(node) (node)->u.str.flag |= NODE_STRING_CRUDE
#define NODE_STRING_CLEAR_CRUDE(node) (node)->u.str.flag &= ~NODE_STRING_CRUDE
#define NODE_STRING_SET_CASE_EXPANDED(node) (node)->u.str.flag |= NODE_STRING_CASE_EXPANDED
#define NODE_STRING_SET_CASE_FOLD_MATCH(node) (node)->u.str.flag |= NODE_STRING_CASE_FOLD_MATCH
#define NODE_STRING_IS_CRUDE(node) \
(((node)->u.str.flag & NODE_STRING_CRUDE) != 0)
#define NODE_STRING_IS_CASE_EXPANDED(node) \
(((node)->u.str.flag & NODE_STRING_CASE_EXPANDED) != 0)
#define NODE_STRING_IS_CASE_FOLD_MATCH(node) \
(((node)->u.str.flag & NODE_STRING_CASE_FOLD_MATCH) != 0)
#define BACKREFS_P(br) \
(IS_NOT_NULL((br)->back_dynamic) ? (br)->back_dynamic : (br)->back_static)
@ -326,6 +335,7 @@ typedef struct _Node {
#define NODE_ST_FIXED_OPTION (1<<18)
#define NODE_ST_PROHIBIT_RECURSION (1<<19)
#define NODE_ST_SUPER (1<<20)
#define NODE_ST_EMPTY_STATUS_CHECK (1<<21)
#define NODE_STATUS(node) (((Node* )node)->u.base.status)
@ -355,7 +365,10 @@ typedef struct _Node {
((NODE_STATUS(node) & NODE_ST_PROHIBIT_RECURSION) != 0)
#define NODE_IS_STRICT_REAL_REPEAT(node) \
((NODE_STATUS(node) & NODE_ST_STRICT_REAL_REPEAT) != 0)
#define NODE_IS_EMPTY_STATUS_CHECK(node) \
((NODE_STATUS(node) & NODE_ST_EMPTY_STATUS_CHECK) != 0)
#define NODE_PARENT(node) ((node)->u.base.parent)
#define NODE_BODY(node) ((node)->u.base.body)
#define NODE_QUANT_BODY(node) ((node)->body)
#define NODE_BAG_BODY(node) ((node)->body)
@ -368,11 +381,8 @@ typedef struct _Node {
(senv)->mem_env_dynamic : (senv)->mem_env_static)
typedef struct {
Node* node;
#if 0
int in;
int recursion;
#endif
Node* mem_node;
Node* empty_repeat_node;
} MemEnv;
typedef struct {
@ -384,9 +394,8 @@ typedef struct {
OnigCaseFoldType case_fold_flag;
OnigEncoding enc;
OnigSyntaxType* syntax;
MemStatusType capture_history;
MemStatusType bt_mem_start;
MemStatusType bt_mem_end;
MemStatusType cap_history;
MemStatusType backtrack_mem; /* backtrack/recursion */
MemStatusType backrefed_mem;
UChar* pattern;
UChar* pattern_end;
@ -404,7 +413,10 @@ typedef struct {
MemEnv mem_env_static[SCANENV_MEMENV_SIZE];
MemEnv* mem_env_dynamic;
unsigned int parse_depth;
#ifdef ONIG_DEBUG_PARSE
unsigned int max_parse_depth;
#endif
int backref_num;
int keep_num;
int save_num;
int save_alloc_num;
@ -425,9 +437,7 @@ extern int onig_renumber_name_table P_((regex_t* reg, GroupNumRemap* map));
extern int onig_strncmp P_((const UChar* s1, const UChar* s2, int n));
extern void onig_strcpy P_((UChar* dest, const UChar* src, const UChar* end));
extern void onig_scan_env_set_error_string P_((ScanEnv* env, int ecode, UChar* arg, UChar* arg_end));
extern int onig_scan_unsigned_number P_((UChar** src, const UChar* end, OnigEncoding enc));
extern void onig_reduce_nested_quantifier P_((Node* pnode, Node* cnode));
extern void onig_node_conv_to_str_node P_((Node* node, int raw));
extern int onig_reduce_nested_quantifier P_((Node* pnode));
extern int onig_node_str_cat P_((Node* node, const UChar* s, const UChar* end));
extern int onig_node_str_set P_((Node* node, const UChar* s, const UChar* end));
extern void onig_node_free P_((Node* node));
@ -435,13 +445,13 @@ extern Node* onig_node_new_bag P_((enum BagType type));
extern Node* onig_node_new_anchor P_((int type, int ascii_mode));
extern Node* onig_node_new_str P_((const UChar* s, const UChar* end));
extern Node* onig_node_new_list P_((Node* left, Node* right));
extern Node* onig_node_list_add P_((Node* list, Node* x));
extern Node* onig_node_new_alt P_((Node* left, Node* right));
extern void onig_node_str_clear P_((Node* node));
extern int onig_names_free P_((regex_t* reg));
extern int onig_parse_tree P_((Node** root, const UChar* pattern, const UChar* end, regex_t* reg, ScanEnv* env));
extern int onig_free_shared_cclass_table P_((void));
extern int onig_is_code_in_cc P_((OnigEncoding enc, OnigCodePoint code, CClassNode* cc));
extern int onig_new_cclass_with_code_list(Node** rnode, OnigEncoding enc, int n, OnigCodePoint codes[]);
extern OnigLen onig_get_tiny_min_len(Node* node, unsigned int inhibit_node_types, int* invalid_node);
#ifdef USE_CALLOUT
@ -452,16 +462,4 @@ extern int onig_global_callout_names_free(void);
extern int onig_print_names(FILE*, regex_t*);
#endif
#if (defined (__GNUC__) && __GNUC__ > 2 ) && !defined(DARWIN) && !defined(__hpux) && !defined(_AIX)
# define UNEXPECTED(condition) __builtin_expect(condition, 0)
#else
# define UNEXPECTED(condition) (condition)
#endif
#define SAFE_ENC_LEN(enc, p, end, res) do { \
int __res = enclen(enc, p); \
if (UNEXPECTED(p + __res > end)) __res = end - p; \
res = __res; \
} while(0);
#endif /* REGPARSE_H */
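For reference, the `SAFE_ENC_LEN` macro dropped in the hunk above clamped an encoding-reported character length to the bytes left in the buffer. Its `while(0);` ended in a semicolon, which defeats the `do { } while(0)` idiom inside an unbraced `if`/`else`. A minimal sketch of the same clamp as a function follows. `clamp_enc_len` and its `raw_len` parameter are hypothetical names, not part of Oniguruma or PHP; `raw_len` stands in for whatever `enclen(enc, p)` would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Oniguruma API): clamp a reported character
 * length so that p + result never runs past end. */
static int clamp_enc_len(int raw_len, const unsigned char* p,
                         const unsigned char* end)
{
  ptrdiff_t remaining;

  assert(p <= end);            /* caller must pass a valid range */
  remaining = end - p;
  return (raw_len > remaining) ? (int )remaining : raw_len;
}
```

A caller would write `len = clamp_enc_len(enclen(enc, p), p, end);`. Being a function, it evaluates each argument once and needs no statement-macro wrapper.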

@ -2,7 +2,7 @@
regposerr.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
regposix.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
regsyntax.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
regtrav.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2004 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
regversion.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

@ -2,7 +2,7 @@
sjis.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -149,10 +149,6 @@ code_to_mbc(OnigCodePoint code, UChar *buf)
if ((code & 0xff00) != 0) *p++ = (UChar )(((code >> 8) & 0xff));
*p++ = (UChar )(code & 0xff);
#if 0
if (enclen(ONIG_ENCODING_SJIS, buf) != (p - buf))
return REGERR_INVALID_CODE_POINT_VALUE;
#endif
return (int )(p - buf);
}
@ -179,31 +175,6 @@ mbc_case_fold(OnigCaseFoldType flag ARG_UNUSED,
}
}
#if 0
static int
is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
return onigenc_mbn_is_mbc_ambiguous(ONIG_ENCODING_SJIS, flag, pp, end);
}
#endif
#if 0
static int
is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
if (code < 128)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
else {
if (CTYPE_IS_WORD_GRAPH_PRINT(ctype)) {
return (code_to_mbclen(code) > 1 ? TRUE : FALSE);
}
}
return FALSE;
}
#endif
static UChar*
left_adjust_char_head(const UChar* start, const UChar* s)
{

@ -1,5 +1,5 @@
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -pt -T -L ANSI-C -N onigenc_sjis_lookup_property_name --output-file gperf2.tmp sjis_prop.gperf */
/* Command-line: gperf -pt -T -L ANSI-C -N onigenc_sjis_lookup_property_name --output-file gperf2.tmp sjis_prop.gperf */
/* Computed positions: -k'1,3' */
#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \

@ -2,7 +2,7 @@
unicode.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -356,16 +356,15 @@ onigenc_unicode_get_case_fold_codes_by_str(OnigEncoding enc,
for (fn = 0; fn < 2; fn++) {
int index;
cs[fn][0] = FOLDS2_FOLD(buk->index)[fn];
ncs[fn] = 1;
index = onigenc_unicode_fold1_key(&cs[fn][0]);
if (index >= 0) {
int m = FOLDS1_UNFOLDS_NUM(index);
for (i = 0; i < m; i++) {
cs[fn][i+1] = FOLDS1_UNFOLDS(index)[i];
}
ncs[fn] = m + 1;
ncs[fn] += m;
}
else
ncs[fn] = 1;
}
for (i = 0; i < ncs[0]; i++) {
@ -393,16 +392,15 @@ onigenc_unicode_get_case_fold_codes_by_str(OnigEncoding enc,
for (fn = 0; fn < 3; fn++) {
int index;
cs[fn][0] = FOLDS3_FOLD(buk->index)[fn];
ncs[fn] = 1;
index = onigenc_unicode_fold1_key(&cs[fn][0]);
if (index >= 0) {
int m = FOLDS1_UNFOLDS_NUM(index);
for (i = 0; i < m; i++) {
cs[fn][i+1] = FOLDS1_UNFOLDS(index)[i];
}
ncs[fn] = m + 1;
ncs[fn] += m;
}
else
ncs[fn] = 1;
}
for (i = 0; i < ncs[0]; i++) {

@ -1,6 +1,6 @@
/* unicode_egcb_data.c: Generated by make_unicode_egcb_data.py. */
/*-
* Copyright (c) 2017-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2017-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define GRAPHEME_BREAK_PROPERTY_VERSION 12_1_0
#define GRAPHEME_BREAK_PROPERTY_VERSION 120100
/*
CR

@ -1,7 +1,7 @@
/* This file was converted by gperf_fold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold1_key unicode_fold1_key.gperf */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold1_key unicode_fold1_key.gperf */
/* Computed positions: -k'1-3' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2017-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -2983,7 +2983,7 @@ onigenc_unicode_fold1_key(OnigCodePoint codes[])
4026
};
if (0 == 0)
{
int key = hash(codes);

@ -1,7 +1,7 @@
/* This file was converted by gperf_fold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold2_key unicode_fold2_key.gperf */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold2_key unicode_fold2_key.gperf */
/* Computed positions: -k'3,6' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2017-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -211,7 +211,7 @@ onigenc_unicode_fold2_key(OnigCodePoint codes[])
129
};
if (0 == 0)
{
int key = hash(codes);

@ -1,7 +1,7 @@
/* This file was converted by gperf_fold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold3_key unicode_fold3_key.gperf */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold3_key unicode_fold3_key.gperf */
/* Computed positions: -k'3,6,9' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2017-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -121,7 +121,7 @@ onigenc_unicode_fold3_key(OnigCodePoint codes[])
0
};
if (0 == 0)
{
int key = hash(codes);

@ -1,7 +1,7 @@
/* This file was generated by make_unicode_fold_data.py. */
#include "regenc.h"
#define UNICODE_CASEFOLD_VERSION 12_1_0
#define UNICODE_CASEFOLD_VERSION 120100
OnigCodePoint OnigUnicodeFolds1[] = {

@ -1,5 +1,5 @@
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -T -C -c -t -j1 -L ANSI-C --ignore-case --pic -Q unicode_prop_name_pool -N unicode_lookup_property_name --output-file gperf1.tmp unicode_property_data.gperf */
/* Command-line: gperf -T -C -c -t -j1 -L ANSI-C --ignore-case --pic -Q unicode_prop_name_pool -N unicode_lookup_property_name --output-file gperf1.tmp unicode_property_data.gperf */
/* Computed positions: -k'1-3,5-6,12,16,$' */
#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
@ -29580,7 +29580,8 @@ unicode_lookup_property_name (register const char *str, register size_t len)
#define UNICODE_PROPERTY_VERSION 12_1_0
#define UNICODE_PROPERTY_VERSION 120100
#define UNICODE_EMOJI_VERSION 1201
#define PROPERTY_NAME_MAX_SIZE 59
#define CODE_RANGES_NUM 568
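The old `12_1_0` spelling is not a valid C integer constant, so it could not be compared in an `#if`. The new numeric form can. The sketch below shows one way a consumer might use it. The helper functions and `HAVE_UNICODE_12_DATA` are hypothetical, and the two-digits-per-component layout is an assumption inferred from 12.1.0 becoming 120100. `UNICODE_EMOJI_VERSION 1201` evidently uses a shorter layout, so these helpers would not apply to it.

```c
#include <assert.h>

/* Value copied from the hunk above; a real build would take it from the
 * generated data file. */
#define UNICODE_PROPERTY_VERSION 120100

/* Hypothetical helpers (not part of Oniguruma), assuming two decimal
 * digits each for the minor and update components. */
static int unicode_version_major(long v)
{
  assert(v >= 0);
  return (int )(v / 10000);
}

static int unicode_version_minor(long v)
{
  assert(v >= 0);
  return (int )((v / 100) % 100);
}

static int unicode_version_update(long v)
{
  assert(v >= 0);
  return (int )(v % 100);
}

/* A plain integer can also be tested by the preprocessor. */
#if UNICODE_PROPERTY_VERSION >= 120000
#define HAVE_UNICODE_12_DATA 1
#else
#define HAVE_UNICODE_12_DATA 0
#endif
```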

@ -1,5 +1,5 @@
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -T -C -c -t -j1 -L ANSI-C --ignore-case --pic -Q unicode_prop_name_pool -N unicode_lookup_property_name --output-file gperf2.tmp unicode_property_data_posix.gperf */
/* Command-line: gperf -T -C -c -t -j1 -L ANSI-C --ignore-case --pic -Q unicode_prop_name_pool -N unicode_lookup_property_name --output-file gperf2.tmp unicode_property_data_posix.gperf */
/* Computed positions: -k'1,3' */
#if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \

@ -1,7 +1,7 @@
/* This file was converted by gperf_unfold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* Command-line: /usr/local/bin/gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1,0 -N onigenc_unicode_unfold_key unicode_unfold_key.gperf */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1,0 -N onigenc_unicode_unfold_key unicode_unfold_key.gperf */
/* Computed positions: -k'1-3' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2017-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -3288,7 +3288,7 @@ onigenc_unicode_unfold_key(OnigCodePoint code)
{0x1e907, 4005, 1}
};
if (0 == 0)
{
int key = hash(&code);

@ -1,6 +1,6 @@
/* unicode_wb_data.c: Generated by make_unicode_wb_data.py. */
/*-
* Copyright (c) 2019 K.Kosako <kkosako0 AT gmail DOT com>
* Copyright (c) 2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define WORD_BREAK_PROPERTY_VERSION 12_1_0
#define WORD_BREAK_PROPERTY_VERSION 120100
/*
ALetter

@ -2,7 +2,7 @@
utf16_be.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -146,18 +146,16 @@ utf16be_is_mbc_newline(const UChar* p, const UChar* end)
}
static OnigCodePoint
utf16be_mbc_to_code(const UChar* p, const UChar* end)
utf16be_mbc_to_code(const UChar* p, const UChar* end ARG_UNUSED)
{
OnigCodePoint code;
if (UTF16_IS_SURROGATE_FIRST(*p)) {
if (end - p < 4) return 0;
code = ((((p[0] - 0xd8) << 2) + ((p[1] & 0xc0) >> 6) + 1) << 16)
+ ((((p[1] & 0x3f) << 2) + (p[2] - 0xdc)) << 8)
+ p[3];
}
else {
if (end - p < 2) return 0;
code = p[0] * 256 + p[1];
}
return code;
@ -229,39 +227,6 @@ utf16be_mbc_case_fold(OnigCaseFoldType flag,
pp, end, fold);
}
#if 0
static int
utf16be_is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
const UChar* p = *pp;
(*pp) += EncLen_UTF16[*p];
if (*p == 0) {
int c, v;
p++;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
return TRUE;
}
c = *p;
v = ONIGENC_IS_UNICODE_ISO_8859_1_BIT_CTYPE(c, (BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (c >= 0xaa && c <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
return FALSE;
}
#endif
static UChar*
utf16be_left_adjust_char_head(const UChar* start, const UChar* s)
{
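In the `utf16be_mbc_to_code()` hunk above, `end` becomes `ARG_UNUSED` and the two `end - p` length checks disappear, which presumably leaves it to callers to guarantee that a whole character is available. The sketch below reproduces the surrogate-pair arithmetic from that hunk in a standalone form. `utf16be_decode` and `is_high_surrogate_first_byte` are hypothetical names; the latter is an assumed equivalent of `UTF16_IS_SURROGATE_FIRST`. The asserts only make the caller's obligation visible and are not a claim about how Oniguruma enforces it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equivalent of UTF16_IS_SURROGATE_FIRST: the first byte of a
 * high surrogate (U+D800..U+DBFF) is 0xd8..0xdb. */
static int is_high_surrogate_first_byte(unsigned char c)
{
  return (c & 0xfc) == 0xd8;
}

/* Hypothetical wrapper: decode one UTF-16BE character starting at p.
 * The asserts stand in for length checks made before the call. */
static uint32_t utf16be_decode(const unsigned char* p, const unsigned char* end)
{
  assert(end - p >= 2);
  if (is_high_surrogate_first_byte(p[0])) {
    assert(end - p >= 4);
    /* Same arithmetic as the diff: rebuild the plane, then the low 16 bits. */
    return ((uint32_t )(((p[0] - 0xd8) << 2) + ((p[1] & 0xc0) >> 6) + 1) << 16)
         + ((uint32_t )(((p[1] & 0x3f) << 2) + (p[2] - 0xdc)) << 8)
         + p[3];
  }
  return (uint32_t )p[0] * 256 + p[1];
}
```

For example, the bytes `d8 3d de 00` decode to U+1F600 and `4e 2d` to U+4E2D.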

@ -2,7 +2,7 @@
utf16_le.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -158,14 +158,13 @@ utf16le_is_mbc_newline(const UChar* p, const UChar* end)
}
static OnigCodePoint
utf16le_mbc_to_code(const UChar* p, const UChar* end)
utf16le_mbc_to_code(const UChar* p, const UChar* end ARG_UNUSED)
{
OnigCodePoint code;
UChar c0 = *p;
UChar c1 = *(p+1);
if (UTF16_IS_SURROGATE_FIRST(c1)) {
if (end - p < 4) return 0;
code = ((((c1 - 0xd8) << 2) + ((c0 & 0xc0) >> 6) + 1) << 16)
+ ((((c0 & 0x3f) << 2) + (p[3] - 0xdc)) << 8)
+ p[2];
@ -228,39 +227,6 @@ utf16le_mbc_case_fold(OnigCaseFoldType flag,
fold);
}
#if 0
static int
utf16le_is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp,
const UChar* end)
{
const UChar* p = *pp;
(*pp) += EncLen_UTF16[*(p+1)];
if (*(p+1) == 0) {
int c, v;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
return TRUE;
}
c = *p;
v = ONIGENC_IS_UNICODE_ISO_8859_1_BIT_CTYPE(c,
(BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (c >= 0xaa && c <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
return FALSE;
}
#endif
static UChar*
utf16le_left_adjust_char_head(const UChar* start, const UChar* s)
{

@ -2,7 +2,7 @@
utf32_be.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -67,7 +67,6 @@ utf32be_is_mbc_newline(const UChar* p, const UChar* end)
static OnigCodePoint
utf32be_mbc_to_code(const UChar* p, const UChar* end ARG_UNUSED)
{
if (end - p < 4) return 0;
return (OnigCodePoint )(((p[0] * 256 + p[1]) * 256 + p[2]) * 256 + p[3]);
}
@ -120,39 +119,6 @@ utf32be_mbc_case_fold(OnigCaseFoldType flag,
fold);
}
#if 0
static int
utf32be_is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
const UChar* p = *pp;
(*pp) += 4;
if (*(p+2) == 0 && *(p+1) == 0 && *p == 0) {
int c, v;
p += 3;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
return TRUE;
}
c = *p;
v = ONIGENC_IS_UNICODE_ISO_8859_1_BIT_CTYPE(c,
(BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (c >= 0xaa && c <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
return FALSE;
}
#endif
static UChar*
utf32be_left_adjust_char_head(const UChar* start, const UChar* s)
{
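The UTF-32 decoders lose their `end - p < 4` guard in the same way, in both the big-endian hunk above and its little-endian counterpart. A minimal sketch of the two byte orders side by side, with the same arithmetic as the diff, is given here. The function names are hypothetical, and the asserts merely mark the precondition that the removed check used to cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: most significant byte first. */
static uint32_t utf32be_decode(const unsigned char* p, const unsigned char* end)
{
  assert(end - p >= 4);
  return (((uint32_t )p[0] * 256 + p[1]) * 256 + p[2]) * 256 + p[3];
}

/* Hypothetical helper: least significant byte first. */
static uint32_t utf32le_decode(const unsigned char* p, const unsigned char* end)
{
  assert(end - p >= 4);
  return (((uint32_t )p[3] * 256 + p[2]) * 256 + p[1]) * 256 + p[0];
}
```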

@ -2,7 +2,7 @@
utf32_le.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2018 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -67,7 +67,6 @@ utf32le_is_mbc_newline(const UChar* p, const UChar* end)
static OnigCodePoint
utf32le_mbc_to_code(const UChar* p, const UChar* end ARG_UNUSED)
{
if (end - p < 4) return 0;
return (OnigCodePoint )(((p[3] * 256 + p[2]) * 256 + p[1]) * 256 + p[0]);
}
@ -121,38 +120,6 @@ utf32le_mbc_case_fold(OnigCaseFoldType flag,
fold);
}
#if 0
static int
utf32le_is_mbc_ambiguous(OnigCaseFoldType flag, const UChar** pp, const UChar* end)
{
const UChar* p = *pp;
(*pp) += 4;
if (*(p+1) == 0 && *(p+2) == 0 && *(p+3) == 0) {
int c, v;
if (*p == 0xdf && (flag & INTERNAL_ONIGENC_CASE_FOLD_MULTI_CHAR) != 0) {
return TRUE;
}
c = *p;
v = ONIGENC_IS_UNICODE_ISO_8859_1_BIT_CTYPE(c,
(BIT_CTYPE_UPPER | BIT_CTYPE_LOWER));
if ((v | BIT_CTYPE_LOWER) != 0) {
/* 0xaa, 0xb5, 0xba are lower case letter, but can't convert. */
if (c >= 0xaa && c <= 0xba)
return FALSE;
else
return TRUE;
}
return (v != 0 ? TRUE : FALSE);
}
return FALSE;
}
#endif
static UChar*
utf32le_left_adjust_char_head(const UChar* start, const UChar* s)
{

@ -2,7 +2,7 @@
utf8.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako <sndgk393 AT ybb DOT ne DOT jp>
* Copyright (c) 2002-2019 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -97,33 +97,6 @@ is_valid_mbc_string(const UChar* p, const UChar* end)
return TRUE;
}
#if 0
static int
is_mbc_newline(const UChar* p, const UChar* end)
{
if (p < end) {
if (*p == 0x0a) return 1;
#ifdef USE_UNICODE_ALL_LINE_TERMINATORS
#ifndef USE_CRNL_AS_LINE_TERMINATOR
if (*p == 0x0d) return 1;
#endif
if (p + 1 < end) {
if (*(p+1) == 0x85 && *p == 0xc2) /* U+0085 */
return 1;
if (p + 2 < end) {
if ((*(p+2) == 0xa8 || *(p+2) == 0xa9)
&& *(p+1) == 0x80 && *p == 0xe2) /* U+2028, U+2029 */
return 1;
}
}
#endif
}
return 0;
}
#endif
static OnigCodePoint
mbc_to_code(const UChar* p, const UChar* end)
{

@ -22,10 +22,10 @@ array(7) {
string(6) "中国"
[3]=>
string(3) ""
["punct"]=>
string(3) ""
["wsp"]=>
string(2) " "
["word"]=>
string(6) "中国"
["punct"]=>
string(3) ""
}