cpython/Modules/cjkcodecs
Christian Heimes 77c02ebf38 Merged revisions 60481,60485,60489-60492,60494-60496,60498-60499,60501-60503,60505-60506,60508-60509,60523-60524,60532,60543,60545,60547-60548,60552,60554,60556-60559,60561-60562,60569,60571-60572,60574,60576-60583,60585-60586,60589,60591,60594-60595,60597-60598,60600-60601,60606-60612,60615,60617-60678 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r60618 | walter.doerwald | 2008-02-06 15:31:55 +0100 (Wed, 06 Feb 2008) | 6 lines

  Remove month parameter from Calendar.yeardatescalendar(),
  Calendar.yeardays2calendar() and Calendar.yeardayscalendar() as the methods
  don't have such a parameter. Fixes issue #2017.

  Rewrap content to 80 chars.
........
  r60622 | facundo.batista | 2008-02-06 20:28:49 +0100 (Wed, 06 Feb 2008) | 4 lines


  Fixes issue 1959. Converted tests to unittest.
  Thanks Giampaolo Rodola.
........
  r60626 | thomas.heller | 2008-02-06 21:29:17 +0100 (Wed, 06 Feb 2008) | 3 lines

  Fixed refcounts and error handling.

  Should not be merged to py3k branch.
........
  r60630 | mark.dickinson | 2008-02-06 23:10:50 +0100 (Wed, 06 Feb 2008) | 4 lines

  Issue 1979: Make Decimal comparisons (other than !=, ==) involving NaN
  raise InvalidOperation (and return False if InvalidOperation is trapped).
........
  r60632 | mark.dickinson | 2008-02-06 23:25:16 +0100 (Wed, 06 Feb 2008) | 2 lines

  Remove incorrect usage of :const: in documentation.
........
  r60634 | georg.brandl | 2008-02-07 00:45:51 +0100 (Thu, 07 Feb 2008) | 2 lines

  Revert accidental changes to test_queue in r60605.
........
  r60636 | raymond.hettinger | 2008-02-07 01:54:20 +0100 (Thu, 07 Feb 2008) | 1 line

  Issue 2025:  Add tuple.count() and tuple.index() to follow the ABC in collections.Sequence.
........
  r60637 | mark.dickinson | 2008-02-07 02:14:23 +0100 (Thu, 07 Feb 2008) | 2 lines

  Fix broken link in decimal documentation.
........
  r60638 | mark.dickinson | 2008-02-07 02:42:06 +0100 (Thu, 07 Feb 2008) | 3 lines

  IEEE 754 should be IEEE 854;  give precise reference for
  comparisons involving NaNs.
........
  r60639 | raymond.hettinger | 2008-02-07 03:12:52 +0100 (Thu, 07 Feb 2008) | 1 line

  Return ints instead of longs for tuple.count() and tuple.index().
........
  r60640 | raymond.hettinger | 2008-02-07 04:10:33 +0100 (Thu, 07 Feb 2008) | 1 line

  Merge 60627.
........
  r60641 | raymond.hettinger | 2008-02-07 04:25:46 +0100 (Thu, 07 Feb 2008) | 1 line

  Merge r60628, r60631, and r60633.  Register UserList and UserString will the appropriate ABCs.
........
  r60642 | brett.cannon | 2008-02-07 08:47:31 +0100 (Thu, 07 Feb 2008) | 3 lines

  Cast a struct to a void pointer so as to do a type-safe pointer comparison
  (mistmatch found by clang).
........
  r60643 | brett.cannon | 2008-02-07 09:04:07 +0100 (Thu, 07 Feb 2008) | 2 lines

  Remove unnecessary curly braces around an int literal.
........
  r60644 | andrew.kuchling | 2008-02-07 12:43:47 +0100 (Thu, 07 Feb 2008) | 1 line

  Update URL
........
  r60645 | facundo.batista | 2008-02-07 17:16:29 +0100 (Thu, 07 Feb 2008) | 4 lines


  Fixes issue 2026.  Tests converted to unittest.  Thanks
  Giampaolo Rodola.
........
  r60646 | christian.heimes | 2008-02-07 18:15:30 +0100 (Thu, 07 Feb 2008) | 1 line

  Added some statistics code to dict and list object code. I wanted to test how a larger freelist affects the reusage of freed objects. Contrary to my gut feelings 80 objects is more than fine for small apps. I haven't profiled a large app yet.
........
  r60648 | facundo.batista | 2008-02-07 20:06:52 +0100 (Thu, 07 Feb 2008) | 6 lines


  Fixes Issue 1401. When redirected, a possible POST get converted
  to GET, so it loses its payload. So, it also must lose the
  headers related to the payload (if it has no content any more,
  it shouldn't indicate content length and type).
........
  r60649 | walter.doerwald | 2008-02-07 20:30:22 +0100 (Thu, 07 Feb 2008) | 3 lines

  Clarify that the output of TextCalendar.formatmonth() and
  TextCalendar.formatyear() for custom instances won't be influenced by calls
  to the module global setfirstweekday() function. Fixes #2018.
........
  r60651 | walter.doerwald | 2008-02-07 20:48:34 +0100 (Thu, 07 Feb 2008) | 3 lines

  Fix documentation for Calendar.iterweekdays(): firstweekday is a property.
  Fixes second part of #2018.
........
  r60653 | walter.doerwald | 2008-02-07 20:57:32 +0100 (Thu, 07 Feb 2008) | 2 lines

  Fix typo in docstring for Calendar.itermonthdays().
........
  r60655 | raymond.hettinger | 2008-02-07 21:04:37 +0100 (Thu, 07 Feb 2008) | 1 line

  The float conversion recipe is simpler in Py2.6
........
  r60657 | raymond.hettinger | 2008-02-07 21:10:49 +0100 (Thu, 07 Feb 2008) | 1 line

  Fix typo
........
  r60660 | brett.cannon | 2008-02-07 23:27:10 +0100 (Thu, 07 Feb 2008) | 3 lines

  Make sure a switch statement does not have repetitive case statements.
  Error found through LLVM post-2.1 svn.
........
  r60661 | christian.heimes | 2008-02-08 01:11:31 +0100 (Fri, 08 Feb 2008) | 1 line

  Deallocate content of the dict free list on interpreter shutdown
........
  r60662 | christian.heimes | 2008-02-08 01:14:34 +0100 (Fri, 08 Feb 2008) | 1 line

  Use prefix decrement
........
  r60663 | amaury.forgeotdarc | 2008-02-08 01:56:02 +0100 (Fri, 08 Feb 2008) | 5 lines

  issue 2045: Infinite recursion when printing a subclass of defaultdict,
  if default_factory is set to a bound method.

  Will backport.
........
  r60667 | jeffrey.yasskin | 2008-02-08 07:45:40 +0100 (Fri, 08 Feb 2008) | 2 lines

  Oops! 2.6's Rational.__ne__ didn't work.
........
  r60671 | hyeshik.chang | 2008-02-08 18:10:20 +0100 (Fri, 08 Feb 2008) | 2 lines

  Update big5hkscs codec to conform to the HKSCS:2004 revision.
........
  r60673 | raymond.hettinger | 2008-02-08 23:30:04 +0100 (Fri, 08 Feb 2008) | 4 lines

  Remove unnecessary modulo division.
  The preceding test guarantees that 0 <= i < len.
........
  r60674 | raymond.hettinger | 2008-02-09 00:02:27 +0100 (Sat, 09 Feb 2008) | 1 line

  Speed-up __iter__() mixin method.
........
  r60675 | raymond.hettinger | 2008-02-09 00:34:21 +0100 (Sat, 09 Feb 2008) | 1 line

  Fill-in missing Set comparisons
........
  r60677 | raymond.hettinger | 2008-02-09 00:57:06 +0100 (Sat, 09 Feb 2008) | 1 line

  Add advice on choosing between DictMixin and MutableMapping
........
2008-02-09 02:18:51 +00:00
..
_codecs_cn.c Merged revisions 56753-56781 via svnmerge from 2007-08-06 23:33:07 +00:00
_codecs_hk.c Merged revisions 60481,60485,60489-60492,60494-60496,60498-60499,60501-60503,60505-60506,60508-60509,60523-60524,60532,60543,60545,60547-60548,60552,60554,60556-60559,60561-60562,60569,60571-60572,60574,60576-60583,60585-60586,60589,60591,60594-60595,60597-60598,60600-60601,60606-60612,60615,60617-60678 via svnmerge from 2008-02-09 02:18:51 +00:00
_codecs_iso2022.c Merged revisions 59488-59511 via svnmerge from 2007-12-15 01:27:15 +00:00
_codecs_jp.c Merge the rest of the trunk. 2006-06-08 15:35:45 +00:00
_codecs_kr.c Merged revisions 57152-57220 via svnmerge from 2007-08-20 19:06:03 +00:00
_codecs_tw.c - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
alg_jisx0201.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
cjkcodecs.h Merged revisions 59666-59679 via svnmerge from 2008-01-03 23:01:04 +00:00
emu_jisx0213_2000.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_cn.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_hk.h Merged revisions 60481,60485,60489-60492,60494-60496,60498-60499,60501-60503,60505-60506,60508-60509,60523-60524,60532,60543,60545,60547-60548,60552,60554,60556-60559,60561-60562,60569,60571-60572,60574,60576-60583,60585-60586,60589,60591,60594-60595,60597-60598,60600-60601,60606-60612,60615,60617-60678 via svnmerge from 2008-02-09 02:18:51 +00:00
mappings_jisx0213_pair.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_jp.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_kr.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_tw.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
multibytecodec.c #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. 2007-12-19 02:45:37 +00:00
multibytecodec.h Merge p3yk branch with the trunk up to revision 45595. This breaks a fair 2006-04-21 10:40:58 +00:00
README - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00

To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs as pre-generated form.
If you need to tweak or add something on it, please look at tools/
subdirectory of CJKCodecs' distribution.



Notes on implmentation characteristics of each codecs
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does rather
  than conforming Unicode.org's that maps to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because unicode 0x5341, 0x5345, 0xFF0F, 0xFF3C is mapped to another
  big5 codes already, a roundtrip compatibility is not guaranteed for
  them.


2) cp932 codec

  To conform to Windows's real mapping, cp932 codec maps the following
  codepoints in addition of the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 into
  unicode U+FF3C instead of U+005C as on unicode.org's mapping.
  Because euc-jisx0213 has REVERSE SOLIDUS on 0x5c already and A140
  is shown as a full width character, mapping to U+FF3C can make
  more sense.

  The euc-jisx0213 codec is enabled to decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 don't have
  overlapped by each other, it doesn't bother standard conformations
  (and JIS X 0213 Plane 2 is intended to use so.) On encoding
  sessions, the codec will try to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


4) euc-jp codec

  The euc-jp codec is a compatibility instance on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)


5) shift-jis codec

  The shift-jis codec is mapping 0x20-0x7e area to U+20-U+7E directly
  instead of using JIS X 0201 for compatibility. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULL-WIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.