- Fixed bug #52981 (Unicode casing table was out-of-date).

Updated the tables with UnicodeData-6.0.0d7.txt and included the
  source of the generator program with the distribution.
#The replaced tables, generated circa 2002, seem to reflect
#Unicode 3.2. I was unable to reproduce the same property
#offsets from Unicode 3.2 data, but every test I ran
#indicates that php_unicode_is_prop() returns the correct
#values. The replaced file merely says it used a "modified
#version" of ucgendat, which is not very helpful. The results
#I got were not significantly different: two properties had
#slightly higher offsets, and the difference carried over to
#the subsequent properties.
#I was, however, able to reproduce the casing table exactly.
#Apart from omitting most of the tables, using a slightly
#different layout, and multiplying the casing table offsets
#by 3, the extent of the "modifications" is unclear.
#The test suite showed no regressions; however, it covers the
#modified portion of the extension very poorly.
Gustavo André dos Santos Lopes 2010-10-05 01:54:17 +00:00
parent f1d905a417
commit 42dae97fd4
3 changed files with 6277 additions and 2735 deletions


@@ -0,0 +1,23 @@
--TEST--
Bug #52981 (Unicode properties are outdated (from Unicode 3.2))
--SKIPIF--
<?php extension_loaded('mbstring') or die('skip mbstring not available'); ?>
--FILE--
<?php
function test($str)
{
    $upper = mb_strtoupper($str, 'UTF-8');
    $len = strlen($upper);
    for ($i = 0; $i < $len; ++$i) echo dechex(ord($upper[$i])) . ' ';
    echo "\n";
}
// OK: already uppercased correctly with the old (Unicode 3.2) tables
test("\xF0\x90\x90\xB8"); // U+10438 DESERET SMALL LETTER H (added in 3.1.0, March 2001)
// not OK with the old tables: these characters were added after Unicode 3.2
test("\xE2\xB0\xB0"); // U+2C30 GLAGOLITIC SMALL LETTER AZU (added in 4.1.0, March 2005)
test("\xD4\xA5"); // U+0525 CYRILLIC SMALL LETTER PE WITH DESCENDER (added in 5.2.0, October 2009)
--EXPECTF--
f0 90 90 90
e2 b0 80
d4 a4

1985
ext/mbstring/ucgendat.c Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large