php-src

mirror of https://github.com/php/php-src.git synced 2024-11-27 11:53:33 +08:00

Author	SHA1	Message	Date
Máté Kocsis	d7383ed807	Declare ext/tidy constants in stubs (#9383)	2022-08-20 17:08:28 +02:00
Máté Kocsis	e6e26b444d	Declare ext/curl constants in stubs (#9384)	2022-08-20 11:01:40 +02:00
Christoph M. Becker	742b4bac2c	Merge branch 'PHP-8.1' * PHP-8.1: Fix #79451: DOMDocument->replaceChild on doctype causes double free	2022-08-19 18:14:48 +02:00
Christoph M. Becker	9bd9e9a867	Merge branch 'PHP-8.0' into PHP-8.1 * PHP-8.0: Fix #79451: DOMDocument->replaceChild on doctype causes double free	2022-08-19 18:13:48 +02:00
NathanFreeman	6027d441c1	Fix #79451: DOMDocument->replaceChild on doctype causes double free We have to reset intSubset if replacing doctype with another doctype node. Closes GH-9201. Closes GH-9376.	2022-08-19 18:10:06 +02:00
David Carlier	5a9411d086	Merge branch 'PHP-8.1'	2022-08-19 16:41:06 +01:00
David Carlier	9360cd6add	Merge branch 'PHP-8.0' into PHP-8.1	2022-08-19 16:40:54 +01:00
David Carlier	52e312afb8	opcache JIT: fix message format for OpenBSD. Like macOS, it requires `ll`. Closes #9380.	2022-08-19 16:40:29 +01:00
George Peter Banyard	d766e91681	Merge branch 'PHP-8.1'	2022-08-19 13:57:59 +01:00
George Peter Banyard	eb8ea14c66	Merge branch 'PHP-8.0' into PHP-8.1	2022-08-19 13:57:19 +01:00
George Peter Banyard	d6831e9a5c	Revert Fixed bug #79451 The fix for 8.1 and above is not identical and I don't know how to fix without breaking the whole build apparently	2022-08-19 13:54:54 +01:00
Christoph M. Becker	a1f5c8a587	Fix GH-9227: Trailing dots and spaces in filenames are ignored Given that Windows ignores trailing dots and spaces in filenames, we catch that ourselves to avoid confusion with the respective filenames without these characters. Closes GH-9229.	2022-08-19 14:23:57 +02:00
George Peter Banyard	1109989bbd	Merge branch 'PHP-8.1'	2022-08-19 13:18:12 +01:00
George Peter Banyard	5739dd0030	Fix bad merge	2022-08-19 13:17:57 +01:00
George Peter Banyard	6a7935351b	Merge branch 'PHP-8.1'	2022-08-19 12:55:12 +01:00
George Peter Banyard	c36a1ea1ae	Merge branch 'PHP-8.0' into PHP-8.1	2022-08-19 12:52:58 +01:00
Tim Starling	ba029fce68	Fix GH-9323: crash when the VM enters userspace code via the GC Closes GH-9323	2022-08-19 12:50:02 +01:00
Tim Starling	410e5d48a3	Fix GCC 9.4 uninitialized variable warning ext/opcache/zend_accelerator_blacklist.c:295:4: error: ‘blacklist_path_length’ may be used uninitialized in this function [-Werror=maybe-uninitialized]	2022-08-19 12:46:55 +01:00
NathanFreeman	1d4300d870	Fix bug #79451: Using DOMDocument->replaceChild on doctype causes double free Closes GH-9201	2022-08-19 12:46:23 +01:00
Remi Collet	aa702c5459	add compat stuff for function attributes	2022-08-18 13:46:55 +02:00
Christoph M. Becker	45a3f4cab0	Merge branch 'PHP-8.1' * PHP-8.1: Fix GH-9316: $http_response_header is wrong for long status line	2022-08-18 12:31:56 +02:00
Christoph M. Becker	5d196d9e7c	Merge branch 'PHP-8.0' into PHP-8.1 * PHP-8.0: Fix GH-9316: $http_response_header is wrong for long status line	2022-08-18 12:30:45 +02:00
Christoph M. Becker	72da418719	Fix GH-9316: $http_response_header is wrong for long status line While the reason-phrase in a HTTP response status line is usually short, there is no actual limit specified by the RFCs. As such, we must not assume that the line fits into the buffer (which is currently 128 bytes large). Since there is no real need to present the complete status line, we simply read and discard the rest of a long line. Co-authored-by: Tim Düsterhus <timwolla@googlemail.com> Closes GH-9319.	2022-08-18 12:27:54 +02:00
Jakub Zelenka	cb5d5d885c	Merge branch 'PHP-8.1'	2022-08-17 19:50:58 +01:00
Jakub Zelenka	93bed982e8	Merge branch 'PHP-8.0' into PHP-8.1	2022-08-17 19:50:16 +01:00
Jakub Zelenka	84dcf578b1	Fix GH-9339: OpenSSL oid_file path check warning contains uninitialized path	2022-08-17 19:49:36 +01:00
Christoph M. Becker	25de2c6f89	Merge branch 'PHP-8.1' * PHP-8.1: Prepare for 8.0.24	2022-08-17 12:55:46 +02:00
Christoph M. Becker	bf84ea0f48	Merge branch 'PHP-8.0' into PHP-8.1 * PHP-8.0: Prepare for 8.0.24	2022-08-17 12:54:50 +02:00
Gabriel Caruso	7c6316ad1c	Prepare for 8.0.24	2022-08-17 11:56:42 +02:00
Alex Dowad	5f8993bc28	Merge branch 'PHP-8.1' * PHP-8.1: Reintroduce legacy 'SJIS-win' text encoding in mbstring	2022-08-16 20:47:04 +02:00
Alex Dowad	371367ce3e	Reintroduce legacy 'SJIS-win' text encoding in mbstring In `e2459857af`, I combined mbstring's "SJIS-win" text encoding into CP932. This was done after doing some testing which appeared to show that the mappings for "SJIS-win" were the same as those for "CP932". Later, it was found that there was actually a small difference prior to `e2459857af` when converting Unicode to CP932. The mappings for the following two codepoints were different: U+203E mapped to 0x7E in CP932 but to 0x81 0x50 in SJIS-win; U+00A5 mapped to 0x5C in CP932 but to 0x81 0x8F in SJIS-win. As shown, mbstring's "CP932" mapped Unicode's 'OVERLINE' and 'YEN SIGN' to the ASCII bytes which have conflicting uses in most legacy Japanese text encodings. "SJIS-win" mapped these to equivalent JIS X 0208 fullwidth characters. Since `e2459857af` was not intended to cause any user-visible change in behavior, I am rolling back the merge of "CP932" and "SJIS-win". It seems doubtful whether these two text encodings should be kept separate or merged in a future release. An extensive discussion of the related historical background and compatibility issues involved can be found in this GitHub thread: https://github.com/php/php-src/issues/8308	2022-08-16 20:18:54 +02:00
Ben Ramsey	c8f6c7def8	Merge branch 'PHP-8.1'	2022-08-16 10:49:08 -05:00
Ben Ramsey	7f26661993	PHP-8.1 is now for PHP 8.1.11-dev	2022-08-16 10:45:29 -05:00
Pierrick Charron	a0455fe716	[ci skip] Update NEWS for PHP 8.2.0 RC1	2022-08-16 11:39:53 -04:00
Alex Dowad	93207535fa	Add test to exercise _php_mb_encoding_handler_ex with multiple possible input encodings Thanks to Kamil Tekiela for pointing out that there was no test case for this.	2022-08-16 16:43:38 +02:00
Alex Dowad	d617fcaae2	Fix legacy text conversion filter for 'HTML-ENTITIES' Because this routine used a signed char buffer to hold the bytes in a (possible) HTML entity, any bytes with the MSB set would be sign-extended when converting to int; for example, 0x86 would become 0xFFFFFF86 (or -121). Codepoints with huge values, like 0xFFFFFF86, are not valid and if any were passed to the output filter, it would treat them as errors and emit error markers.	2022-08-16 16:43:27 +02:00
Alex Dowad	d9269becca	Fix problems with ISO-2022-KR conversion • The legacy conversion code did not emit an error marker if an escape sequence was truncated. • BOTH old and new conversion code would shift from KSC5601 (KS X 1001) mode to ASCII mode on an invalid escape sequence. This doesn't make any sense.	2022-08-16 16:43:27 +02:00
Alex Dowad	bfccdbd858	SJIS-Mobile#SOFTBANK string can end immediately after special escape sequence SJIS-Mobile#SOFTBANK text encoding supports special escape sequences, which shift the decoder into a mode where each single byte represents an emoji. To get out of this mode, a 0xF (SHIFT OUT) byte can be used. After one of these special escape sequences, the new conversion code expected to see at least one more byte. However, there doesn't seem to be any particular reason why it should be treated as an error condition if a string ends abruptly after one of these escapes. Well, the escape sequence is useless in that case, but it is a complete and valid escape sequence. The legacy conversion code did allow a string to end immediately after one of these escape sequences. Amend the new code to allow the same.	2022-08-16 16:43:27 +02:00
Alex Dowad	983a29d3c0	Legacy conversion code for '7bit' to '8bit' inserts error markers The use of a special 'vtbl' for converting between '7bit' and '8bit' text meant that '7bit' text would not be converted to wchars before going to '8bit'. This meant that the special value MBFL_BAD_INPUT, which we use to flag an erroneous byte sequence in input text (and which is required by functions like mb_check_encoding), would pass directly to the output, instead of being converted to the error marker specified by mb_substitute_character. This issue dates back to the time when I removed the mbfl 'identify filters' and made encoding validity checking and encoding detection rely only on the conversion filters.	2022-08-16 16:43:27 +02:00
Alex Dowad	f3c8efd711	In legacy text conversion filters, reset filter state in 'flush' function Up until now, I believed that mbstring had been designed such that (legacy) text conversion filter objects should not be re-used after the 'flush' function is called to complete a text conversion operation. However, it turns out that the implementation of _php_mb_encoding_handler_ex DID re-use filter objects after flush. That means that functions which were based on _php_mb_encoding_handler_ex, including mb_parse_str and php_mb_post_handler, would break in some cases; state left over from converting one substring (perhaps a variable name) would affect the results of converting another substring (perhaps the value of the same variable), and could cause extraneous characters to get inserted into the output. All this code should be deleted soon, but fixing it helps me to avoid spurious failures when fuzzing the new/old code to look for differences in behavior.	2022-08-16 16:43:27 +02:00
Alex Dowad	18e526cb51	Fix legacy text conversion filter for SJIS-2004 EUC-JP-2004 includes special byte sequences starting with 0x8E for kana. The legacy output routine for EUC-JP-2004 emits these sequences if the value of the output variable `s` is between 0x80 and 0xFF. Since the same routine was also used for SJIS-2004 and ISO-2022-JP-2004, before `8a915ed26c`, the same 0x8E sequences would be emitted when converting to those text encodings as well. But that is completely wrong. 0x8E 0x__ does not mean the same in SJIS-2004 or ISO-2022-JP-2004 as it does in EUC-JP-2004. Therefore, in `8a915ed26c`, I fixed the legacy conversion routine by checking whether the output encoding is EUC-JP-2004 or not. If it's not, and `s` is 0x80-0xFF, I made it emit an error. Well, it turns out that single bytes with values from 0xA1 to 0xDF are meaningful in SJIS-2004. To emit these bytes when appropriate, I had to amend the legacy conversion routine again. (For clarity, this does NOT mean reverting to the behavior prior to `8a915ed26c`. We were right not to emit sequences starting with 0x8E in SJIS-2004. But in SJIS-2004, we do sometimes need to emit single bytes from 0xA1-0xDF.)	2022-08-16 16:43:27 +02:00
Alex Dowad	3517a70f93	Fix legacy text conversion filter for CP50220 CP50220 converts some codepoints which represent kana (hiragana/katakana) to a different form. This is the only difference between CP50220 and CP50221 (which doesn't perform such conversion). In some cases, this conversion means collapsing two codepoints to a single output byte sequence. Since the legacy text conversion filters only worked a byte at a time, the legacy filter had to cache a byte, then wait until it was called again with the next byte to compare the cached byte with the following one. That was all fine, but it didn't work as intended when there were errors (invalid byte sequences) in the input. Our code (both old and new) for emitting error markers recursively calls the same conversion filter. When the old CP50220 filter was called recursively, the logic for managing cached bytes did not behave as intended. As a result, the error markers could be reordered with other characters in the output. I used an ugly hack to fix this in 6938e3512; when making a recursive call to emit an error marker, temporarily swap out `filter->filter_function` to bypass the byte-caching code, so the error marker immediately goes through to the output. This worked, but I overlooked the fact that the very same problem can occur if an invalid byte sequence is detected in the flush function. Apply the same (ugly) fix.	2022-08-16 16:43:27 +02:00
Alex Dowad	4b370330d4	Ensure that Base64 output always wraps lines in the same manner as legacy implementation The legacy Base64 conversion code in mbstring automatically wrapped the output to 72 columns, and the new code imitates this behavior. Frankly, I'm not sure if this is a good idea or not (people could easily manually wrap it if they want to), but have stuck with this behavior for backwards compatibility. However, fuzzing revealed one case where we were not wrapping to 72 columns; if the input string is not a multiple of 3 characters, meaning that the output must be padded, and the point where we must add the final (padded) output happens to be just beyond 72 columns.	2022-08-16 16:43:27 +02:00
Alex Dowad	c6bd08530e	Adjust number of error markers emitted for truncated ISO-2022-JP escape sequence Fuzzing revealed a small difference between the number of error markers which the legacy ISO-2022-JP and JIS7/8 conversion code emitted for truncated escape sequences and those emitted by the new code. The behavior of the old code seems more reasonable here, so we will imitate it.	2022-08-16 16:43:27 +02:00
Alex Dowad	128768a450	Adjust number of error markers emitted for truncated UTF-8 code units In `04e59c916f`, I amended the UTF-8 conversion code, so that when given invalid input, it would emit a number of error markers harmonizing with the WHATWG's specification of the standard UTF-8 decoding algorithm. (Which, gentle reader of commit logs, you can find online at https://encoding.spec.whatwg.org/#utf-8-decoder.) However, the code in `04e59c916f` was faulty in the case that a truncated UTF-8 code unit starts with 0xF1. Then, in `dc1ba61d09`, when making a small refactoring to a different part of the UTF-8 conversion code, I inexplicably broke part of the working code, causing the same fault which was already present with truncated UTF-8 code units starting with 0xF1 to also occur with 0xF2 and 0xF3 as well. I don't remember what inane thoughts I was thinking when I pulled off this feat of utter mental confusion. None of these cases were covered by unit tests, by the way. Thankfully, my trusty fuzzer picked up on this when testing the new implementation of mb_parse_str (since the legacy UTF-8 conversion filter did not suffer from the same problem, and I was fuzzing to find any differences in behavior between the old and new implementations). Fortuitously, the fuzzer also picked up another issue which was present in `04e59c916f`. I was emitting only one error marker for truncated code units starting with 0xE0 or 0xED, in cases where the WHATWG standard indicates two should be emitted. Examples are 0xE0 0x9F <END OF STRING> or 0xED 0xA0 <END OF STRING>. Code units starting with 0xE0-0xED should have 3 bytes. If the first byte is 0xE0, the second MUST be 0xA0 or greater. (Otherwise, the codepoint could have fit in a two-byte code unit.) And if the first byte is 0xED, the second MUST be 0x9F or less. According to the WHATWG algorithm, step 4, if the second byte is outside the legal range, then the decoder should emit an error... AND reprocess the out-of-range byte. The reprocessing will then cause another error. That's why the decoder should indicate two errors and not one.	2022-08-16 16:43:27 +02:00
Alex Dowad	a4656895dd	Imitate legacy behavior when converting non-encodings using mbstring Fuzzing revealed that something was missed here when making the new encoding conversion code match the behavior of the old code. In the next major release of PHP, support for these non-encodings will be dropped, but in the meantime, it is better to match the legacy behavior.	2022-08-16 16:43:27 +02:00
Alex Dowad	88d13491de	Make control flow in mb_wchar_to_cp50220 a bit clearer	2022-08-16 16:43:26 +02:00
Alex Dowad	8df515555b	Remove unused 'to_language' and 'from_language' struct fields	2022-08-16 16:43:26 +02:00
Alex Dowad	aeccb139c3	Use new encoding conversion filters for mb_parse_str and php_mb_post_handler When micro-benchmarking on relatively short ASCII strings, the new implementation was about 30% faster than the old one.	2022-08-16 16:43:26 +02:00
Máté Kocsis	98e5c4e3a3	Declare ext/sockets constants in stubs (#9349)	2022-08-16 13:18:31 +02:00
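
Note on commit 128768a450: the message for that commit explains why a WHATWG-style UTF-8 decoder reports two errors for inputs such as 0xE0 0x9F or 0xED 0xA0 (one for the lead byte, one for the reprocessed out-of-range byte) but only one for a sequence that is merely cut short. The sketch below is an illustration only, written in Python rather than taken from mbstring; the function name `decode_utf8_whatwg` and the constant `REPLACEMENT` are hypothetical, and the logic follows the decoder steps published at https://encoding.spec.whatwg.org/#utf-8-decoder as I understand them.

```python
REPLACEMENT = "\ufffd"  # error marker used by this sketch (assumption)


def decode_utf8_whatwg(data: bytes) -> str:
    """Decode bytes, emitting one REPLACEMENT per error in the WHATWG manner."""
    out = []
    code_point = 0
    bytes_seen = 0
    bytes_needed = 0
    lower, upper = 0x80, 0xBF  # legal range for the next continuation byte
    i = 0
    while i < len(data):
        byte = data[i]
        if bytes_needed == 0:
            if byte <= 0x7F:
                out.append(chr(byte))
            elif 0xC2 <= byte <= 0xDF:
                bytes_needed = 1
                code_point = byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    lower = 0xA0  # rules out overlong 3-byte forms
                if byte == 0xED:
                    upper = 0x9F  # rules out surrogates
                bytes_needed = 2
                code_point = byte & 0x0F
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    lower = 0x90  # rules out overlong 4-byte forms
                if byte == 0xF4:
                    upper = 0x8F  # rules out values above U+10FFFF
                bytes_needed = 3
                code_point = byte & 0x07
            else:
                out.append(REPLACEMENT)  # stray continuation or invalid lead byte
            i += 1
            continue
        if not (lower <= byte <= upper):
            # Error: reset state and reprocess this same byte (i is not advanced).
            code_point = bytes_needed = bytes_seen = 0
            lower, upper = 0x80, 0xBF
            out.append(REPLACEMENT)
            continue
        lower, upper = 0x80, 0xBF
        code_point = (code_point << 6) | (byte & 0x3F)
        bytes_seen += 1
        i += 1
        if bytes_seen == bytes_needed:
            out.append(chr(code_point))
            code_point = bytes_needed = bytes_seen = 0
    if bytes_needed != 0:
        out.append(REPLACEMENT)  # input ended in the middle of a sequence
    return "".join(out)


print(decode_utf8_whatwg(b"\xe0\x9f").count(REPLACEMENT))  # 2
print(decode_utf8_whatwg(b"\xed\xa0").count(REPLACEMENT))  # 2
print(decode_utf8_whatwg(b"\xf1\x80").count(REPLACEMENT))  # 1
```

In the first two calls the second byte falls outside the range allowed after the lead byte, so it is counted once as the end of the broken sequence and once more when reprocessed as a stray byte. In the third call every byte seen is legal and the input simply ends early, which yields a single marker.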


129617 Commits