Commit Graph

14 Commits

Author SHA1 Message Date
Nishanth Menon
40751c6c9b scripts/spdxcheck.py: Strictly read license files in utf-8
Commit bc41a7f364 ("LICENSES: Add the CC-BY-4.0 license")
unfortunately introduced LICENSES/dual/CC-BY-4.0 as UTF-8 Unicode text,
which Python cannot read when its default encoding is ASCII:

FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 244, in <module>
    spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 47, in read_spdxdata
    for l in open(el.path).readlines():
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)

While it is debatable whether 'Licensor.' in the license file needs
Unicode quotes at all, force spdxcheck to read the license files as
UTF-8 instead.

Reported-by: Rahul T R <r-ravikumar@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210707204840.30891-1-nm@ti.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-07-12 09:56:50 -06:00
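The fix boils down to passing an explicit encoding to open() instead of relying on the locale-dependent default (ASCII in the traceback above). A minimal sketch; read_license_lines() is a hypothetical helper, not the function used in spdxcheck.py:

```python
import os
import tempfile


def read_license_lines(path):
    """Read a license text file as UTF-8, whatever the current locale is.

    Hypothetical helper for illustration only.
    """
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which may be ASCII and then fails on bytes such as 0xe2 (the first
    # byte of the UTF-8 encoding of a curly quote).
    with open(path, encoding='utf-8') as f:
        return f.readlines()


if __name__ == '__main__':
    # Write a small file containing U+2019 (RIGHT SINGLE QUOTATION MARK).
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write('Licensor\u2019s rights\n'.encode('utf-8'))
    try:
        print(read_license_lines(path))
    finally:
        os.unlink(path)
```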
Bhaskar Chowdhury
40635128fe scripts/spdxcheck.py: Fix a typo
s/Initilize/Initialize/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/20210326091443.26525-1-unixbhaskar@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-28 14:41:49 +02:00
Bert Vermeulen
d0259c42ab spdxcheck.py: Use Python 3
Python 2.x has been officially EOL'ed for some time, and in any case
the git module for it is hard to come by.

Signed-off-by: Bert Vermeulen <bert@biot.com>
Link: https://lore.kernel.org/r/20210121085412.265400-1-bert@biot.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 14:50:12 +01:00
Lukas Bulwahn
c5c5538508 scripts/spdxcheck.py: handle license identifiers in XML comments
Commit cc9539e788 ("media: docs: use the new SPDX header for GFDL-1.1 on
*.svg files") adds SPDX-License-Identifiers enclosed in XML comments,
i.e., <!-- ... -->, for svg files.

Unfortunately, ./scripts/spdxcheck.py does not handle
SPDX-License-Identifiers in XML comments, so it simply fails on checking
these files with 'Invalid License ID: --'.

Strip the XML comment ending the same way the script already strips C
comment endings. With that, ./scripts/spdxcheck.py handles the svg files
properly.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 11:31:26 +02:00
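Treating the XML terminator like the C one can be sketched as below. strip_comment_closures() and the COMMENT_CLOSERS tuple are hypothetical names, and the set of closers is an assumption for illustration; the script's own parsing code is not reproduced here.

```python
# Comment terminators that may trail an SPDX expression on the same line.
# Assumed set for illustration: C block comments and XML/SVG comments.
COMMENT_CLOSERS = ('*/', '-->')


def strip_comment_closures(expr):
    """Remove a trailing comment terminator from an SPDX expression.

    Hypothetical helper: without it, an expression taken from
    '<!-- SPDX-License-Identifier: GPL-2.0 -->' keeps the '-->' that the
    token parser then rejects as an invalid license ID.
    """
    expr = expr.strip()
    for closer in COMMENT_CLOSERS:
        if expr.endswith(closer):
            # Slice off exactly the terminator; str.rstrip('*/') would
            # instead strip any trailing run of '*' and '/' characters.
            return expr[:-len(closer)].strip()
    return expr


if __name__ == '__main__':
    print(strip_comment_closures('GPL-2.0 -->'))  # GPL-2.0
    print(strip_comment_closures('MIT */'))       # MIT
```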
Vincenzo Frascino
8d7a7abfc6 spdxcheck.py: fix directory structures
The LICENSES directory structure has recently changed, which makes
spdxcheck fail as shown below:

FAIL: "Blob or Tree named 'other' not found"
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 240, in <module>
    spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 41, in read_spdxdata
    for el in lictree[d].traverse():
[...]
KeyError: "Blob or Tree named 'other' not found"

Fix the script to restore correct license checking in checkpatch.

References: 62be257e98 ("LICENSES: Rename other to deprecated")
References: 8ea8814fcd ("LICENSES: Clearly mark dual license only licenses")
Link: http://lkml.kernel.org/r/20190523084755.56739-1-vincenzo.frascino@arm.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-01 15:51:31 -07:00
Sven Eckelmann
29077bc5b7 scripts/spdxcheck.py: Add dual license subdirectory
The licenses from the other directory were partially moved to the dual
directory in commit 8ea8814fcd ("LICENSES: Clearly mark dual license only
licenses"). checkpatch therefore rejected files like
drivers/staging/android/ashmem.h with

  WARNING: 'SPDX-License-Identifier: GPL-2.0 OR Apache-2.0 */' is not supported in LICENSES/...
  #1: FILE: drivers/staging/android/ashmem.h:1:
  +/* SPDX-License-Identifier: GPL-2.0 OR Apache-2.0 */

Fixes: 8ea8814fcd ("LICENSES: Clearly mark dual license only licenses")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-21 09:29:41 -06:00
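The fix amounts to including every license subdirectory when collecting valid IDs. The sketch below reads plain files from disk, whereas the script walks the git tree via GitPython (as the tracebacks show); collect_license_ids() is a hypothetical name, and the preferred/dual/deprecated layout is assumed from the kernel's LICENSES directory after the renames described in these commits. Unlike the script at the time, which raised a KeyError on a missing directory, this sketch simply skips it.

```python
import os

# Subdirectories of LICENSES/ holding license texts (assumed layout after
# the "other" -> "deprecated" rename and the split-out of "dual").
LICENSE_DIRS = ('preferred', 'dual', 'deprecated')
TAG = 'Valid-License-Identifier:'


def collect_license_ids(licenses_root):
    """Return the set of license IDs declared under licenses_root.

    Hypothetical helper: scans each subdirectory for lines starting with
    'Valid-License-Identifier:' and skips subdirectories that do not exist.
    """
    ids = set()
    for subdir in LICENSE_DIRS:
        path = os.path.join(licenses_root, subdir)
        if not os.path.isdir(path):
            continue
        for name in sorted(os.listdir(path)):
            with open(os.path.join(path, name), encoding='utf-8') as f:
                for line in f:
                    if line.startswith(TAG):
                        ids.add(line[len(TAG):].strip())
    return ids
```

With the dual directory included, both operands of 'GPL-2.0 OR Apache-2.0' would be found in collect_license_ids('LICENSES').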
Sven Eckelmann
e6d319f68d scripts/spdxcheck.py: Fix path to deprecated licenses
The directory name for other licenses was changed to "deprecated" in
commit 62be257e98 ("LICENSES: Rename other to deprecated"), but
spdxcheck.py was not updated accordingly. As a result, checkpatch failed with

  FAIL: "Blob or Tree named 'other' not found"
  Traceback (most recent call last):
    File "scripts/spdxcheck.py", line 240, in <module>
      spdx = read_spdxdata(repo)
    File "scripts/spdxcheck.py", line 41, in read_spdxdata
      for el in lictree[d].traverse():
    File "/usr/lib/python2.7/dist-packages/git/objects/tree.py", line 298, in __getitem__
      return self.join(item)
    File "/usr/lib/python2.7/dist-packages/git/objects/tree.py", line 244, in join
      raise KeyError(msg % file)
  KeyError: "Blob or Tree named 'other' not found"

Fixes: 62be257e98 ("LICENSES: Rename other to deprecated")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-05-20 13:25:21 -06:00
Aurélien Cedeyn
a5f4cb4288 scripts/spdxcheck.py: fix C++ comment style detection
The previous commit, which added support for the SuperH boot code files,
introduced the following regression:

$ ./scripts/checkpatch.pl -f <(echo '/* SPDX-License-Identifier: MIT */')
WARNING: 'SPDX-License-Identifier: MIT */' is not supported in LICENSES/..
+/* SPDX-License-Identifier: MIT */

total: 0 errors, 1 warnings, 1 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/dev/fd/63 has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

It is not obvious, but checkpatch.pl invokes spdxcheck.py like this:
    ...
    } elsif ($rawline =~ /(SPDX-License-Identifier: .*)/) {
        my $spdx_license = $1;
        if (!is_SPDX_License_valid($spdx_license)) {
            WARN("SPDX_LICENSE_TAG",
                 "'$spdx_license' is not supported in LICENSES/...\n" .
                 $herecurr);
        }
    ...
    sub is_SPDX_License_valid {
        my ($license) = @_;
        ...
        my $status = `cd "$root_path"; echo "$license" |
                      python scripts/spdxcheck.py -`;
        ...
    }

checkpatch.pl passes only the text from 'SPDX-License-Identifier:'
onwards to spdxcheck.py, so the leading '/*' is lost while the trailing
'*/' remains. This commit fixes this regression.

Fixes: 959b49687838 ("scripts/spdxcheck.py: Handle special quotation mark comments")
Signed-off-by: Aurélien Cedeyn <aurelien.cedeyn@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-02-22 08:47:05 -07:00
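The regression follows from the capture group in checkpatch.pl: only the text from the tag onwards reaches spdxcheck.py. The Python sketch below mimics that extraction with the same regular expression and contrasts a check that needs the opening '/*' with one that only looks at the line end. All function names are hypothetical; this is not the script's code.

```python
import re

# Same pattern as the checkpatch.pl snippet: capture from the tag to EOL.
SPDX_RE = re.compile(r'(SPDX-License-Identifier: .*)')


def checkpatch_extract(rawline):
    """Mimic what checkpatch.pl pipes into spdxcheck.py (hypothetical)."""
    m = SPDX_RE.search(rawline)
    return m.group(1) if m else None


def needs_strip_by_opener(line):
    # Fragile: the leading '/*' is gone after checkpatch's extraction.
    return line.startswith('/*')


def needs_strip_by_closer(line):
    # Robust: only the trailing comment closure matters.
    return line.strip().endswith('*/')


if __name__ == '__main__':
    piped = checkpatch_extract('/* SPDX-License-Identifier: MIT */')
    print(repr(piped))                    # 'SPDX-License-Identifier: MIT */'
    print(needs_strip_by_opener(piped))   # False
    print(needs_strip_by_closer(piped))   # True
```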
Thomas Gleixner
959b496878 scripts/spdxcheck.py: Handle special quotation mark comments
The SuperH boot code files use a magic format for the SPDX identifier
comment:

  LIST "SPDX-License-Identifier: .... "

The trailing quotation mark is not stripped before the token parser is
invoked and causes the scan to fail. Handle it gracefully.

Fixes: 6a0abce4c4 ("sh: include: convert to SPDX identifiers")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-01-16 14:54:51 -07:00
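Handling the SuperH format means dropping the trailing quotation mark before the expression reaches the token parser. A sketch assuming such lines start with `LIST "` as in the example above; strip_list_quote() is a hypothetical name.

```python
def strip_list_quote(line, expr):
    """Drop the closing quote of a SuperH 'LIST "SPDX-License-Identifier: ..."'.

    line: the full source line; expr: the text after the SPDX tag.
    Hypothetical helper for illustration.
    """
    if line.lstrip().startswith('LIST "'):
        # Remove the trailing '"' and any whitespace around the expression.
        return expr.rstrip().rstrip('"').strip()
    return expr.strip()


if __name__ == '__main__':
    line = 'LIST "SPDX-License-Identifier: GPL-2.0 "'
    expr = line.split('SPDX-License-Identifier:', 1)[1]
    print(repr(strip_list_quote(line, expr)))  # 'GPL-2.0'
```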
Thierry Reding
3a6ab5c7dc scripts/spdxcheck.py: always open files in binary mode
The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif).  To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.

One tricky case is when piping data into the script and reading it from
standard input.  By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.

The breakage only happens with python3 and results in a
UnicodeDecodeError (according to Uwe).

Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com
Fixes: 6f4d29df66 ("scripts/spdxcheck.py: make python3 compliant")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jeremy Cline <jcline@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joe Perches <joe@perches.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14 15:05:45 -08:00
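Reading in binary mode and decoding each line with errors='ignore' lets binary files pass through without a UnicodeDecodeError. A sketch of that approach; iter_decoded_lines() and open_input() are hypothetical helpers, the decode call mirrors the one quoted in the traceback of commit 6f4d29df66, and using sys.stdin.buffer for a binary view of standard input is one Python 3 option, not necessarily what the script does.

```python
import io
import locale
import sys


def iter_decoded_lines(stream, maxlines=15):
    """Yield up to maxlines decoded lines from a binary stream.

    Undecodable bytes (e.g. from a GIF) are dropped instead of raising
    UnicodeDecodeError. Hypothetical helper for illustration.
    """
    encoding = locale.getpreferredencoding(False)
    for i, raw in enumerate(stream):
        if i >= maxlines:
            break
        yield raw.decode(encoding, errors='ignore')


def open_input(path):
    """Open path in binary mode; '-' selects standard input.

    The caller is responsible for closing files other than stdin.
    """
    if path == '-':
        # sys.stdin is a text stream; .buffer is the underlying binary one.
        return sys.stdin.buffer
    return open(path, 'rb')


if __name__ == '__main__':
    data = io.BytesIO(b'GIF89a\xff\xfe\n// SPDX-License-Identifier: GPL-2.0\n')
    for line in iter_decoded_lines(data):
        print(line.rstrip())
```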
Uwe Kleine-König
6f4d29df66 scripts/spdxcheck.py: make python3 compliant
Without this change the following happens when using Python3 (3.6.6):

	$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
	FAIL: 'str' object has no attribute 'decode'
	Traceback (most recent call last):
	  File "scripts/spdxcheck.py", line 253, in <module>
	    parser.parse_lines(sys.stdin, args.maxlines, '-')
	  File "scripts/spdxcheck.py", line 171, in parse_lines
	    line = line.decode(locale.getpreferredencoding(False), errors='ignore')
	AttributeError: 'str' object has no attribute 'decode'

So, as the line is already a string, there is no need to decode it and the
decode call can be dropped.

/usr/bin/python on Arch is Python 3.  So this would indeed be worth
going into 4.19.

Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Jeremy Cline
bed95c43c1 scripts: add Python 3 compatibility to spdxcheck.py
"dict.has_key(key)" on dictionaries has been replaced with "key in
dict".  Additionally, when run under Python 3 some files don't decode
with the default encoding (tested with UTF-8).  To handle that, open the
file in binary mode and decode it line by line, ignoring encoding
errors.

This remains compatible with Python 2 and should have no functional
change.

Link: http://lkml.kernel.org/r/20180717190635.29467-1-jcline@redhat.com
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:27 -07:00
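The has_key() change is mechanical: Python 3 removed dict.has_key(), while the `in` operator works in both Python 2 and 3. A small sketch with a hypothetical ID table (spdxcheck.py builds its own from LICENSES/):

```python
# Hypothetical ID table for illustration.
license_ids = {'GPL-2.0': 'preferred/GPL-2.0', 'MIT': 'preferred/MIT'}


def is_known(license_id, ids=license_ids):
    # Python 2 only:  ids.has_key(license_id)
    # Python 2 and 3: license_id in ids
    return license_id in ids


if __name__ == '__main__':
    print(is_known('MIT'))     # True
    print(is_known('GPL2.0'))  # False
```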
Joe Perches
fde5e903fb scripts/spdxcheck.py: work with current HEAD LICENSES/ directory
Depending on how old your -next tree is, its master branch may not have
the LICENSES directory.

Change the lookup to HEAD and find whatever LICENSES directory files are
used in that branch.

Miscellanea:

 - Remove the checkpatch test as it will have its own SPDX license
   identifier.

Link: http://lkml.kernel.org/r/7eeefc862194930c773e662cb2152e178441d3b8.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:27 -07:00
Thomas Gleixner
5385a295ec scripts: Add SPDX checker script
As SPDX-License-Identifiers spread through the kernel, so do malformed
license expressions and license IDs that have no corresponding license
text file in the LICENSES directory.

Add a script which gathers information from the LICENSES directory,
i.e. the various tags in the license and exception files, and then scans
either input from stdin, which it treats as a single file, or, if started
without arguments, the full kernel tree.

It checks whether the license expression syntax is correct and also
validates whether the license identifiers used in the expressions are
available in the LICENSES files.

scripts/spdxcheck.py -h
usage: spdxcheck.py [-h] [-m MAXLINES] [-v] [path [path ...]]

SPDX expression checker

positional arguments:
  path                  Check path or file. If not given full git tree scan.
                        For stdin use "-"

optional arguments:
  -h, --help            show this help message and exit
  -m MAXLINES, --maxlines MAXLINES
                        Maximum number of lines to scan in a file. Default 15
  -v, --verbose         Verbose statistics output

arch/arm/mach-s3c24xx/h1940-bluetooth.c: 1:28 Invalid License ID: GPL-1.0
arch/x86/kernel/jailhouse.c: 1:28 Invalid License ID: GPL2.0
drivers/pinctrl/sh-pfc/pfc-r8a77965.c: 1:28 Invalid License ID: GPL-2.
include/dt-bindings/reset/amlogic,meson-axg-reset.h: 9:41 Invalid License ID: BSD
arch/x86/include/asm/jailhouse_para.h: 1:28 Invalid License ID: GPL2.0

License files:               14
Exception files:              1
License IDs                  19
Exception IDs                 1

Files checked:            61332
Lines checked:           669181
Files with SPDX:          16169
Files with errors:            5

real	0m2.642s
user	0m2.231s
sys	0m0.467s

That's a full tree sweep on my laptop. Note, this runs single threaded.

By default it scans the first 15 lines of each file for an SPDX
identifier; the furthest an identifier currently sits inside a top
comment is line 10. This can be made faster once all identifiers are in
the first two lines, as documented.

The Python wizards will surely know how to do that smarter and faster, but
it's at least better than no tool at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[jc: Fixed ironically erroneous SPDX tag and did chmod +x ]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-04-27 16:45:49 -06:00
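The core check described above, that an expression is syntactically valid and that every identifier exists in LICENSES, can be illustrated with a small recursive-descent parser. This is a sketch under simplifying assumptions (only AND, OR, WITH and parentheses, with WITH binding tighter than AND, and AND tighter than OR); the script uses its own parser, which is not reproduced here, and tokenize(), check_expression() and SPDXError are hypothetical names.

```python
import re

# One token: a parenthesis or a run of ID characters (letters, digits, . + -).
TOKEN_RE = re.compile(r'\s*(\(|\)|[A-Za-z0-9.+-]+)')


class SPDXError(Exception):
    """Raised for a malformed expression or an unknown identifier."""


def tokenize(expr):
    """Split an SPDX expression into tokens (hypothetical helper)."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise SPDXError('Syntax error at offset %d' % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def check_expression(expr, license_ids, exception_ids):
    """Validate syntax and IDs; precedence is WITH > AND > OR.

    Hypothetical sketch, not the parser used by spdxcheck.py.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SPDXError('Unexpected end of expression')
        pos += 1
        return tok

    def parse_or():
        parse_and()
        while peek() == 'OR':
            take()
            parse_and()

    def parse_and():
        parse_with()
        while peek() == 'AND':
            take()
            parse_with()

    def parse_with():
        tok = take()
        if tok == '(':
            parse_or()
            if take() != ')':
                raise SPDXError('Missing closing parenthesis')
            return
        if tok in ('AND', 'OR', 'WITH', ')'):
            raise SPDXError('Unexpected token: %s' % tok)
        if tok not in license_ids:
            raise SPDXError('Invalid License ID: %s' % tok)
        if peek() == 'WITH':
            take()
            exc = take()
            if exc not in exception_ids:
                raise SPDXError('Invalid Exception ID: %s' % exc)

    parse_or()
    if peek() is not None:
        raise SPDXError('Unexpected token: %s' % peek())


if __name__ == '__main__':
    ids = {'GPL-2.0', 'MIT'}
    exceptions = {'Linux-syscall-note'}
    check_expression('GPL-2.0 WITH Linux-syscall-note OR MIT', ids, exceptions)
    try:
        check_expression('GPL2.0', ids, exceptions)
    except SPDXError as err:
        print(err)  # Invalid License ID: GPL2.0
```

Keeping the license and exception IDs in plain sets makes each membership test cheap, which matters when a full sweep touches tens of thousands of files.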