libcpp: Add -Wleading-whitespace= warning

The following patch, on top of the r15-4346 patch, adds the
-Wleading-whitespace= warning option.
This warning doesn't care how much each line in the source is actually
indented (that can't easily be done in the preprocessor without
syntactic analysis); it just performs simple checks on what kind of
whitespace is used in the indentation.
I think it is still useful to get warnings about such issues early:
while git diagnoses some of them in patches (e.g. the tab after space
case), getting the warnings earlier might help avoid such issues
sooner.

There are projects which ban use of tabs and require just spaces,
others which require indentation just with horizontal tabs, and finally
projects which want indentation with tabs for multiples of tabstop size
followed by spaces (fewer than tabstop size), like GCC.
For all 3 kinds the warning diagnoses indentation with '\v' or '\f'
characters (unless the line contains just whitespace), and for the last one
also cases where a space in the indentation is followed by a horizontal
tab or where there are N or more consecutive spaces in the indentation
(for -ftabstop=N).
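
The three kinds can be sketched as a standalone classifier. This is only an
illustration modelled on the checks described above, not the libcpp code:
the names ws_kind, ws_issue, is_hws and classify_leading_ws are hypothetical,
and the sketch assumes a NUL-terminated line and treats end of string like
end of line.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; these are not libcpp identifiers.  */
enum ws_kind { WS_SPACES = 1, WS_TABS = 2, WS_BLANKS = 3 };

enum ws_issue
{
  WS_OK,              /* Nothing to report.  */
  WS_OTHER,           /* Disallowed whitespace character in the indentation.  */
  WS_TAB_AFTER_SPACE, /* Only for WS_BLANKS.  */
  WS_TOO_MANY_SPACES  /* Only for WS_BLANKS.  */
};

/* Horizontal/vertical whitespace that can appear inside a line.  */
static int
is_hws (char c)
{
  return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

/* Classify the leading whitespace of the line starting at S.
   TABSTOP plays the role of -ftabstop=N and is only used for WS_BLANKS.  */
enum ws_issue
classify_leading_ws (const char *s, enum ws_kind kind, int tabstop)
{
  enum ws_issue issue = WS_OTHER;

  switch (kind)
    {
    case WS_SPACES:
      while (*s == ' ')
	++s;
      break;
    case WS_TABS:
      while (*s == '\t')
	++s;
      break;
    case WS_BLANKS:
      {
	int n = tabstop;
	/* Zero or more tabs, then fewer than TABSTOP spaces.  */
	while (*s == '\t')
	  ++s;
	while (*s == ' ')
	  {
	    if (--n == 0)
	      break;
	    ++s;
	  }
	if (*s == '\t')
	  issue = WS_TAB_AFTER_SPACE;
	else if (*s == ' ')
	  issue = WS_TOO_MANY_SPACES;
	break;
      }
    default:
      return WS_OK;
    }

  /* The accepted prefix is directly followed by non-whitespace: fine.  */
  if (!is_hws (*s))
    return WS_OK;

  /* Skip the rest of the whitespace; lines containing only whitespace
     are left to -Wtrailing-whitespace=.  */
  while (is_hws (*s))
    ++s;
  if (*s == '\n' || *s == '\r' || *s == '\0')
    return WS_OK;
  return issue;
}
```

For instance, with a tabstop of 8, a line indented by 2 tabs and 7 spaces is
accepted for the blanks kind, while 8 spaces or 4 spaces followed by a tab
are reported.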

BTW, for additional testing I've enabled the warnings (without -Werror
for them) in stage3.  There are many warnings (both trailing and leading
whitespace), some of which can be easily fixed in the headers or source
files, but others come from whitespace issues in generated sources,
so if we enable the warnings, we'd need to either adjust the generators
or disable the warnings in (some of the) generated files.

2024-10-23  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* include/cpplib.h (struct cpp_options): Add
	cpp_warn_leading_whitespace and cpp_tabstop members.
	(enum cpp_warning_reason): Add CPP_W_LEADING_WHITESPACE.
	* internal.h (struct _cpp_line_note): Document new
	line note kinds.
	* init.cc (cpp_create_reader): Set cpp_tabstop to 8.
	* lex.cc (find_leading_whitespace_issues): New function.
	(_cpp_clean_line): Use it.
	(_cpp_process_line_notes): Handle 'L', 'S' and 'T' line notes.
	(lex_raw_string): Clear type on 'L', 'S' and 'T' line notes
	inside of raw string literals.
gcc/
	* doc/invoke.texi (Wleading-whitespace=): Document.
gcc/c-family/
	* c.opt (Wleading-whitespace=): New option.
	* c-opts.cc (c_common_post_options): Set cpp_opts->cpp_tabstop
	to global_dc->m_tabstop.
gcc/testsuite/
	* c-c++-common/cpp/Wleading-whitespace-1.c: New test.
	* c-c++-common/cpp/Wleading-whitespace-2.c: New test.
	* c-c++-common/cpp/Wleading-whitespace-3.c: New test.
	* c-c++-common/cpp/Wleading-whitespace-4.c: New test.
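
As a rough illustration of the line note handling listed above ('L', 'S' and
'T' notes are recorded while cleaning a line, cleared to 0 inside raw string
literals, and turned into diagnostics later), here is a simplified sketch.
The struct line_note type and the functions suppress_ws_notes and
ws_note_message are hypothetical stand-ins, and the position range is a
simplification of how the lexer walks the notes.

```c
#include <stddef.h>

/* Hypothetical stand-in for a line note: a position plus a type.  */
struct line_note
{
  size_t pos;		/* Offset of the note in the buffer.  */
  unsigned int type;	/* 'W', 'L', 'S', 'T', ..., or 0 once handled.  */
};

/* Mark whitespace notes with positions in [START, END) as handled, the
   way notes inside a raw string literal are cleared.  Other note types
   are left alone.  Returns the number of notes cleared.  */
size_t
suppress_ws_notes (struct line_note *notes, size_t count,
		   size_t start, size_t end)
{
  size_t cleared = 0;
  for (size_t i = 0; i < count; i++)
    if (notes[i].pos >= start && notes[i].pos < end)
      switch (notes[i].type)
	{
	case 'W':
	case 'L':
	case 'S':
	case 'T':
	  notes[i].type = 0;
	  cleared++;
	  break;
	default:
	  break;
	}
  return cleared;
}

/* Map a note to its diagnostic text, or NULL if there is nothing to
   report.  KIND is the -Wleading-whitespace= value (1 spaces, 2 tabs,
   3 blanks).  */
const char *
ws_note_message (const struct line_note *note, int kind)
{
  switch (note->type)
    {
    case 'W':
      return "trailing whitespace";
    case 'S':
      return "too many consecutive spaces in leading whitespace";
    case 'T':
      return "tab after space in leading whitespace";
    case 'L':
      switch (kind)
	{
	case 1:
	  return "whitespace other than spaces in leading whitespace";
	case 2:
	  return "whitespace other than tabs in leading whitespace";
	case 3:
	  return "whitespace other than spaces and tabs in leading whitespace";
	default:
	  return NULL;
	}
    default:
      /* 0 means already handled; other types are not whitespace notes.  */
      return NULL;
    }
}
```

Clearing the type rather than removing the note keeps the note array intact,
so the later reporting pass can simply skip entries whose type is 0.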
Jakub Jelinek 2024-10-23 09:58:06 +02:00 committed by Jakub Jelinek
parent ee030b2800
commit d4499a232a
11 changed files with 387 additions and 3 deletions


@ -1147,6 +1147,8 @@ c_common_post_options (const char **pfilename)
flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc23;
cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
cpp_opts->cpp_tabstop = global_dc->m_tabstop;
if (flag_extern_tls_init)
{
if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)


@ -928,6 +928,25 @@ Wjump-misses-init
C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
Warn when a jump misses a variable initialization.
Enum
Name(warn_leading_whitespace_kind) Type(int) UnknownError(argument %qs to %<-Wleading-whitespace=%> not recognized)
EnumValue
Enum(warn_leading_whitespace_kind) String(none) Value(0)
EnumValue
Enum(warn_leading_whitespace_kind) String(spaces) Value(1)
EnumValue
Enum(warn_leading_whitespace_kind) String(tabs) Value(2)
EnumValue
Enum(warn_leading_whitespace_kind) String(blanks) Value(3)
Wleading-whitespace=
C ObjC C++ ObjC++ CPP(cpp_warn_leading_whitespace) CppReason(CPP_W_LEADING_WHITESPACE) Enum(warn_leading_whitespace_kind) Joined RejectNegative Var(warn_leading_whitespace) Init(0) Warning
Warn about leading whitespace style issues on lines except when in raw string literals.
Wliteral-suffix
C++ ObjC++ CPP(warn_literal_suffix) CppReason(CPP_W_LITERAL_SUFFIX) Var(cpp_warn_literal_suffix) Init(1) Warning
Warn when a string or character literal is followed by a ud-suffix which does not begin with an underscore.


@ -380,7 +380,8 @@ Objective-C and Objective-C++ Dialects}.
-Winit-self -Winline -Wno-int-conversion -Wint-in-bool-context
-Wno-int-to-pointer-cast -Wno-invalid-memory-model
-Winvalid-pch -Winvalid-utf8 -Wno-unicode -Wjump-misses-init
-Wlarger-than=@var{byte-size} -Wlogical-not-parentheses -Wlogical-op
-Wlarger-than=@var{byte-size} -Wleading-whitespace=@var{kind}
-Wlogical-not-parentheses -Wlogical-op
-Wlong-long -Wno-lto-type-mismatch -Wmain -Wmaybe-uninitialized
-Wmemset-elt-size -Wmemset-transposed-args
-Wmisleading-indentation -Wmissing-attributes -Wmissing-braces
@ -8790,6 +8791,28 @@ those or trailing form feed or vertical tab characters.
disables the warning, which is the default.
This is a coding style warning.
@opindex Wleading-whitespace=
@item -Wleading-whitespace=@var{kind}
Warn about style issues in leading whitespace, but not about the amount of
indentation. Some projects use coding styles where only spaces are used
for indentation, others use only tabs, others use zero or more tabs (for
multiples of @code{-ftabstop=@var{n}}) followed by zero or fewer than @var{n}
spaces. No warning is emitted on lines which contain solely whitespace
(although @code{-Wtrailing-whitespace=} warning might be emitted), no
warnings are emitted inside of raw string literals. Warnings are also emitted
for leading whitespace inside of multi-line comments.
@code{-Wleading-whitespace=spaces} warns about leading whitespace other than
spaces for projects which want to indent just by spaces.
@code{-Wleading-whitespace=tabs} warns about leading whitespace other than
horizontal tabs for projects which want to indent just by horizontal tabs.
@code{-Wleading-whitespace=blanks} warns about leading whitespace other than
spaces and horizontal tabs, or about horizontal tab after a space in the
leading whitespace, or about @var{n} or more consecutive spaces in leading
whitespace (where @var{n} is argument of @code{-ftabstop=@var{n}}, 8 by
default).
@code{-Wleading-whitespace=none} disables the warning, which is the default.
This is a coding style warning.
@opindex Wtrampolines
@opindex Wno-trampolines
@item -Wtrampolines


@ -0,0 +1,59 @@
/* { dg-do compile { target { c || c++11 } } } */
/* { dg-options "-Wleading-whitespace=blanks" } */
int i1; /* 4 spaces ok for -ftabstop=8 */
int i2; /* 2 tabs 7 spaces ok for -ftabstop=8 */
int i3; /* 8 spaces not ok */
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i4; /* tab 8 spaces not ok */
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i5; /* 4 spaces tab not ok */
/* { dg-warning "tab after space in leading whitespace" "" { target *-*-* } .-1 } */
int i6; /* space tab not ok */
/* { dg-warning "tab after space in leading whitespace" "" { target *-*-* } .-1 } */
int i7; /* tab vtab not ok */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i8; /* tab form-feed not ok */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i9; /* 4 spaces vtab not ok */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i10; /* 2 spaces form-feed not ok */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-1 } */
/* Just whitespace on a line is something for -Wtrailing-whitespace. */
int \
i11, \
i12, \
i13, \
i14, \
i15, \
i16, \
i17;
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "tab after space in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .+1 } */
const char *p = R"*|*(
a
b
c
d
e
f
g
)*|*";
/* This is a comment with leading whitespace non-issues and issues
a
b
c
d
e
f
g
*/
/* { dg-warning "too many consecutive spaces in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "tab after space in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces and tabs in leading whitespace" "" { target *-*-* } .-5 } */


@ -0,0 +1,60 @@
/* { dg-do compile { target { c || c++11 } } } */
/* { dg-options "-Wleading-whitespace=spaces" } */
int i1; /* 4 spaces ok for */
int i2; /* 2 tabs 7 spaces not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i3; /* 8 spaces ok */
int i4; /* tab 8 spaces not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i5; /* 4 spaces tab not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i6; /* space tab not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i7; /* tab vtab not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i8; /* tab form-feed not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i9; /* 4 spaces vtab not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
int i10; /* 2 spaces form-feed not ok */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-1 } */
/* Just whitespace on a line is something for -Wtrailing-whitespace. */
int \
i11, \
i12, \
i13, \
i14, \
i15, \
i16, \
i17;
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-5 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-5 } */
const char *p = R"*|*(
a
b
c
d
e
f
g
)*|*";
/* This is a comment with leading whitespace non-issues and issues
a
b
c
d
e
f
g
*/
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-8 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than spaces in leading whitespace" "" { target *-*-* } .-7 } */


@ -0,0 +1,64 @@
/* { dg-do compile { target { c || c++11 } } } */
/* { dg-options "-Wleading-whitespace=tabs" } */
int i1; /* tab ok */
int i2; /* 3 tabs ok */
int i3; /* 8 spaces not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i4; /* tab 8 spaces not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i5; /* 4 spaces tab not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i6; /* space tab not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i7; /* tab vtab not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i8; /* tab form-feed not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i9; /* 4 spaces vtab not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i10; /* 2 spaces form-feed not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
int i20; /* 2 spaces not ok */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-1 } */
/* Just whitespace on a line is something for -Wtrailing-whitespace. */
int \
i11, \
i12, \
i13, \
i14, \
i15, \
i16, \
i17;
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-6 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .+1 } */
const char *p = R"*|*(
a
b
c
d
e
f
g
)*|*";
/* This is a comment with leading whitespace non-issues and issues
a
b
c
d
e
f
g
*/
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */
/* { dg-warning "whitespace other than tabs in leading whitespace" "" { target *-*-* } .-7 } */


@ -0,0 +1,41 @@
/* { dg-do compile { target { c || c++11 } } } */
/* { dg-options "-Wleading-whitespace=none" } */
int i1; /* tab ok */
int i2; /* 3 tabs ok */
int i3; /* 8 spaces ok */
int i4; /* tab 8 spaces ok */
int i5; /* 4 spaces tab ok */
int i6; /* space tab ok */
int i7; /* tab vtab ok */
int i8; /* tab form-feed ok */
int i9; /* 4 spaces vtab ok */
int i10; /* 2 spaces form-feed ok */
/* Just whitespace on a line is something for -Wtrailing-whitespace. */
int \
i11, \
i12, \
i13, \
i14, \
i15, \
i16, \
i17;
const char *p = R"*|*(
a
b
c
d
e
f
g
)*|*";
/* This is a comment with leading whitespace non-issues and issues
a
b
c
d
e
f
g
*/


@ -609,9 +609,15 @@ struct cpp_options
/* True if -finput-charset= option has been used explicitly. */
bool cpp_input_charset_explicit;
/* -Wleading-whitespace= value. */
unsigned char cpp_warn_leading_whitespace;
/* -Wtrailing-whitespace= value. */
unsigned char cpp_warn_trailing_whitespace;
/* -ftabstop= value. */
unsigned int cpp_tabstop;
/* Dependency generation. */
struct
{
@ -738,6 +744,7 @@ enum cpp_warning_reason {
CPP_W_UNICODE,
CPP_W_HEADER_GUARD,
CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER,
CPP_W_LEADING_WHITESPACE,
CPP_W_TRAILING_WHITESPACE
};


@ -261,6 +261,7 @@ cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
CPP_OPTION (pfile, cpp_warn_unicode) = 1;
CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
CPP_OPTION (pfile, cpp_tabstop) = 8;
/* Default CPP arithmetic to something sensible for the host for the
benefit of dumb users like fix-header. */


@ -318,7 +318,8 @@ struct _cpp_line_note
/* Type of note. The 9 'from' trigraph characters represent those
trigraphs, '\\' an escaped newline, ' ' an escaped newline with
intervening space, 'W' trailing whitespace, 0 represents a note that
intervening space, 'W' trailing whitespace, 'L', 'S' and 'T' for
leading whitespace issues, 0 represents a note that
has already been handled, and anything else is invalid. */
unsigned int type;
};


@ -818,6 +818,59 @@ _cpp_init_lexer (void)
#endif
}
/* Look for leading whitespace style issues on lines which don't contain
just whitespace.
For -Wleading-whitespace=spaces report if such lines contain leading
whitespace other than spaces.
For -Wleading-whitespace=tabs report if such lines contain leading
whitespace other than tabs.
For -Wleading-whitespace=blanks report if such lines contain leading
whitespace other than spaces+tabs, or contain in it tab after space,
or -ftabstop= or more consecutive spaces. */
static void
find_leading_whitespace_issues (cpp_reader *pfile, const uchar *s)
{
const unsigned char *p = NULL;
uchar type = 'L';
switch (CPP_OPTION (pfile, cpp_warn_leading_whitespace))
{
case 1: /* spaces */
while (*s == ' ')
++s;
break;
case 2: /* tabs */
while (*s == '\t')
++s;
break;
case 3: /* blanks */
while (*s == '\t')
++s;
int n;
n = CPP_OPTION (pfile, cpp_tabstop);
while (*s == ' ')
{
if (--n == 0)
break;
++s;
}
if (*s == '\t')
type = 'T'; /* Tab after space. */
else if (*s == ' ')
type = 'S'; /* Too many spaces. */
break;
default:
abort ();
}
if (!IS_NVSPACE (*s))
return;
p = s++;
while (IS_NVSPACE (*s))
++s;
if (*s != '\n' && *s != '\r')
add_line_note (pfile->buffer, p, type);
}
/* Returns with a logical line that contains no escaped newlines or
trigraphs. This is a time-critical inner loop. */
void
@ -836,6 +889,10 @@ _cpp_clean_line (cpp_reader *pfile)
if (!buffer->from_stage3)
{
const uchar *pbackslash = NULL;
bool leading_ws_done = true;
if (CPP_OPTION (pfile, cpp_warn_leading_whitespace))
find_leading_whitespace_issues (pfile, s);
/* Fast path. This is the common case of an un-escaped line with
no trigraphs. The primary win here is by not writing any
@ -906,6 +963,7 @@ _cpp_clean_line (cpp_reader *pfile)
add_line_note (buffer, p - 1, p != d ? ' ' : '\\');
d = p - 2;
buffer->next_line = p - 1;
leading_ws_done = false;
slow_path:
while (1)
@ -915,6 +973,10 @@ _cpp_clean_line (cpp_reader *pfile)
if (c == '\n' || c == '\r')
{
if (CPP_OPTION (pfile, cpp_warn_leading_whitespace)
&& !leading_ws_done)
find_leading_whitespace_issues (pfile, buffer->next_line);
/* Handle DOS line endings. */
if (c == '\r' && s != buffer->rlimit && s[1] == '\n')
s++;
@ -931,9 +993,17 @@ _cpp_clean_line (cpp_reader *pfile)
add_line_note (buffer, p - 1, p != d ? ' ' : '\\');
d = p - 2;
buffer->next_line = p - 1;
leading_ws_done = false;
}
else if (c == '?' && s[1] == '?' && _cpp_trigraph_map[s[2]])
{
if (CPP_OPTION (pfile, cpp_warn_leading_whitespace)
&& !leading_ws_done)
{
find_leading_whitespace_issues (pfile, buffer->next_line);
leading_ws_done = true;
}
/* Add a note regardless, for the benefit of -Wtrigraphs. */
add_line_note (buffer, d, s[2]);
if (CPP_OPTION (pfile, trigraphs))
@ -1073,6 +1143,39 @@ _cpp_process_line_notes (cpp_reader *pfile, int in_comment)
cpp_warning_with_line (pfile, CPP_W_TRAILING_WHITESPACE,
pfile->line_table->highest_line, col,
"trailing whitespace");
else if (note->type == 'S')
cpp_warning_with_line (pfile, CPP_W_LEADING_WHITESPACE,
pfile->line_table->highest_line, col,
"too many consecutive spaces in leading "
"whitespace");
else if (note->type == 'T')
cpp_warning_with_line (pfile, CPP_W_LEADING_WHITESPACE,
pfile->line_table->highest_line, col,
"tab after space in leading whitespace");
else if (note->type == 'L')
switch (CPP_OPTION (pfile, cpp_warn_leading_whitespace))
{
case 1:
cpp_warning_with_line (pfile, CPP_W_LEADING_WHITESPACE,
pfile->line_table->highest_line, col,
"whitespace other than spaces in leading "
"whitespace");
break;
case 2:
cpp_warning_with_line (pfile, CPP_W_LEADING_WHITESPACE,
pfile->line_table->highest_line, col,
"whitespace other than tabs in leading "
"whitespace");
break;
case 3:
cpp_warning_with_line (pfile, CPP_W_LEADING_WHITESPACE,
pfile->line_table->highest_line, col,
"whitespace other than spaces and tabs in "
"leading whitespace");
break;
default:
abort ();
}
else if (note->type == 0)
/* Already processed in lex_raw_string. */;
else
@ -2532,7 +2635,11 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
break;
case 'W':
/* Don't warn about trailing whitespace in raw string literals. */
case 'L':
case 'S':
case 'T':
/* Don't warn about leading or trailing whitespace in raw string
literals. */
note->type = 0;
note++;
break;