Commit Graph

5 Commits

Author SHA1 Message Date
Mike Ryan
d1580a62ed tools: add more entities and better error handling to parse_companies
Add the remaining lowercase acute-accented vowel HTML entities to
parse_companies.pl. When an unknown entity is encountered, print an error
to STDERR so the maintainer can more easily understand the failure.
2016-09-15 22:05:15 +02:00
Mike Ryan
1b94ec589e tools: escape double quotes in company names
Double quotes in company names are now escaped so that the names form
valid C string literals.
2016-09-14 14:51:05 +02:00
Mike Ryan
8a459053d6 tools: add more entities to company ID parser
This patch adds the HTML entities for lowercase and uppercase u with
umlaut (&uuml; and &Uuml;).
2016-05-17 18:50:40 +02:00
Mike Ryan
2e45ec6319 tools: make parse_companies.pl more forgiving of weird HTML
Several company identifier lines do not end in </td> but in <br/>,
followed by a newline and then </td>. This dirty hack makes the parser
more forgiving of such HTML quirks in the SIG's company identifiers page.
2016-03-20 04:17:15 +01:00
Mike Ryan
59dd6dc1b9 tools: fix update_compids to parse newly formatted page from SIG
This patch adds tools/parse_companies.pl, a twisted Perl script that
parses the SIG's HTML page, in poor taste, using regexes. Further
improvements include support for non-ASCII entities such as &eacute; and
full Unicode support for Chinese names.
2015-12-27 22:55:41 +01:00