For now, task/taskloop constructs aren't handled, and C/C++ array reductions
and reductions with task or inscan modifiers need further work.
Instead of calling omp_alloc/omp_free (where the former doesn't have an
alignment argument and omp_aligned_alloc is a 5.1-only feature), this calls
GOMP_alloc/GOMP_free, so that the library can fail if it would otherwise fall
back to returning NULL (the exception is zero-length allocations).
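A hedged sketch of the new entry points (signatures inferred from the
BT_FN_PTR_SIZE_SIZE_PTRMODE and BT_FN_VOID_PTR_PTRMODE types below; the
allocator handle is shown as uintptr_t for illustration):

  /* Aligned allocation through ALLOCATOR; the library fails instead of
     silently returning NULL, except for zero-length allocations.  */
  extern void *GOMP_alloc (size_t alignment, size_t size,
                           uintptr_t allocator);

  /* Free PTR that was obtained from GOMP_alloc with ALLOCATOR.  */
  extern void GOMP_free (void *ptr, uintptr_t allocator);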
2020-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* builtin-types.def (BT_FN_PTR_SIZE_SIZE_PTRMODE): New function type.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): Move earlier.
(BUILT_IN_GOMP_ALLOC, BUILT_IN_GOMP_FREE): New builtins.
* gimplify.c (gimplify_scan_omp_clauses): Force allocator into a
decl if it is not NULL, INTEGER_CST or decl.
(gimplify_adjust_omp_clauses): Clear GOVD_EXPLICIT on explicit clauses
which are being removed. Remove allocate clauses for variables not seen
if they are private, firstprivate or linear too. Call
omp_notice_variable on the allocator otherwise.
(gimplify_omp_for): Handle iterator vars mentioned in allocate clauses
similarly to non-is_gimple_reg iterators.
* omp-low.c (struct omp_context): Add allocate_map field.
(delete_omp_context): Delete it.
(scan_sharing_clauses): Fill it from allocate clauses. Remove it
if mentioned also in shared clause.
(lower_private_allocate): New function.
(lower_rec_input_clauses): Handle allocate clause for privatized
variables, except for task/taskloop, C/C++ array reductions for now
and task/inscan variables.
(lower_send_shared_vars): Don't consider variables in allocate_map
as shared.
* omp-expand.c (expand_omp_for_generic, expand_omp_for_static_nochunk,
expand_omp_for_static_chunk): Use expand_omp_build_assign instead of
gimple_build_assign + gsi_insert_after.
* builtins.c (builtin_fnspec): Handle BUILT_IN_GOMP_ALLOC and
BUILT_IN_GOMP_FREE.
* tree-ssa-ccp.c (evaluate_stmt): Handle BUILT_IN_GOMP_ALLOC.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
BUILT_IN_GOMP_ALLOC.
(mark_all_reaching_defs_necessary_1): Handle BUILT_IN_GOMP_ALLOC
and BUILT_IN_GOMP_FREE.
(propagate_necessity): Likewise.
gcc/fortran/
* f95-lang.c (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Define.
(gfc_init_builtin_functions): Add alloc_size and warn_unused_result
attributes to __builtin_GOMP_alloc.
* types.def (BT_PTRMODE): New primitive type.
(BT_FN_VOID_PTR_PTRMODE, BT_FN_PTR_SIZE_SIZE_PTRMODE): New function
types.
libgomp/
* libgomp.map (GOMP_alloc, GOMP_free): Export at GOMP_5.0.1.
* omp.h.in (omp_alloc): Add malloc and alloc_size attributes.
* libgomp_g.h (GOMP_alloc, GOMP_free): Declare.
* allocator.c (omp_aligned_alloc): New for now static function,
add alignment argument and handle it.
(omp_alloc): Reimplement using omp_aligned_alloc.
(GOMP_alloc, GOMP_free): New functions.
(omp_free): Add ialias.
* testsuite/libgomp.c-c++-common/allocate-1.c: New test.
* testsuite/libgomp.c++/allocate-1.C: New test.
cgraph_node::materialize_clone segfaulted when I tried compiling
Tramp3D with -fdump-ipa-all because there was no clone_info - IPA-CP
created a clone only for an aggregate constant, adding a note to its
transformation summary but not creating any tree_map nor
param_adjustments.
Fixed with the following obvious extra check, which has passed
bootstrap and testing on x86_64-linux.
gcc/ChangeLog:
2020-11-12 Martin Jambor <mjambor@suse.cz>
* cgraphclones.c (cgraph_node::materialize_clone): Check that clone
info is not NULL before attempting to dump it.
This patch converts the variables that hold time benefits and
frequencies in IPA-CP from plain integers to sreals, avoiding the need
to cap them to avoid overflows and also fixing a potential underflow.
Size costs corresponding to individual constants are left as ints so
that they do not take up too much space. Care must be taken that
adding them up does not overflow, especially in the case of
prop_size_cost, because with extremely long chains of lattice
dependencies it can overflow (e.g. in testsuite/gcc.dg/ipa/pr50744.c).
The overall size is already tracked in long ints.
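A minimal sketch of the saturating addition this relies on (see the
safe_add entry below; illustrative, not verbatim source):

  /* Return A + B, saturating at INT_MAX instead of overflowing.  */
  static int
  safe_add (int a, int b)
  {
    if (a > INT_MAX / 2 || b > INT_MAX / 2)
      return INT_MAX;
    return a + b;
  }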
gcc/ChangeLog:
2020-11-11 Martin Jambor <mjambor@suse.cz>
* ipa-cp.c (class ipcp_value_base): Change the type of
local_time_benefit and prop_time_benefit to sreal. Adjust the
constructor initializer.
(ipcp_lattice::print): Dump sreals.
(struct caller_statistics): Change the type of freq_sum to sreal.
(gather_caller_stats): Work with sreal freq_sum.
(incorporate_penalties): Work with sreal evaluation.
(good_cloning_opportunity_p): Adjust for sreal time_benefit
and freq_sum. Bail out if size_cost is INT_MAX.
(perform_estimation_of_a_value): Work with sreal time_benefit. Avoid
unnecessary capping.
(estimate_local_effects): Pass sreal time benefit to
good_cloning_opportunity_p without capping it. Adjust dumping.
(safe_add): If there can be overflow, return INT_MAX.
(propagate_effects): Work with sreal times.
(get_info_about_necessary_edges): Work with sreal frequencies.
(decide_about_value): Likewise and with sreal time benefits.
I'd like to have the option of marking functions with
__attribute__ ((__warn_unused_result__)), so this patch adds a macro.
And use it for maybe_wrap_with_location: it's always a bug if the
return value is not used, which happened to me and got me confused.
gcc/ChangeLog:
* system.h (WARN_UNUSED_RESULT): Define for GCC >= 3.4.
* tree.h (maybe_wrap_with_location): Add WARN_UNUSED_RESULT.
* fold-const.c (operand_compare::operand_equal_p): Compare field
offsets in operand_equal_p and OEP_ADDRESS_OF.
(operand_compare::hash_operand): Update.
This changes the __numeric_traits primary template to assume its
argument is an integer type. For the three floating-point types that
are supported by __numeric_traits_floating, an explicit specialization
of __numeric_traits chooses the right base class.
This improves the failure mode for using __numeric_traits with an
unsupported type. Previously it would use __numeric_traits_floating as
the base class, and give somewhat obscure errors for trying to access
the static data members. Now it will use __numeric_traits_integer which
has a static_assert to check for supported types.
As a side effect of this change there is no need to instantiate
__conditional_type to decide which base class to use.
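A hedged sketch of the new shape (illustrative, not the exact
libstdc++ source):

  // Primary template: assumes an integer type; __numeric_traits_integer
  // contains a static_assert that rejects unsupported types.
  template<typename _Tp>
    struct __numeric_traits
    : public __numeric_traits_integer<_Tp>
    { };

  // Explicit specializations pick the floating-point base class.
  template<>
    struct __numeric_traits<float>
    : public __numeric_traits_floating<float>
    { };

  // ... and similarly for double and long double.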
libstdc++-v3/ChangeLog:
* include/ext/numeric_traits.h (__numeric_traits): Change
primary template to always derive from __numeric_traits_integer.
(__numeric_traits<float>, __numeric_traits<double>)
(__numeric_traits<long double>): Add explicit specializations.
This fixes a bug in bitmap_list_view which could end up with
a NULL head->current, which makes follow-up searches fail. Oops.
It also further optimizes the PRE DFS walk by removing useless
stuff and special-casing bitmaps with just one element for
EXECUTE_IF_AND_IN_BITMAP, which makes quite a big difference.
2020-11-12 Richard Biener <rguenther@suse.de>
* bitmap.c (bitmap_list_view): Restore head->current.
* tree-ssa-pre.c (pre_expr_DFS): Elide expr_visited bitmap.
Special-case value expression bitmaps with one element.
(bitmap_find_leader): Likewise.
(sorted_array_from_bitmap_set): Elide expr_visited bitmap.
gcc/c-family
PR pch/86674
* c-pch.c (c_common_valid_pch): Use cpp_warning with CPP_W_INVALID_PCH
reason to fix -Werror=invalid-pch and -Wno-error=invalid-pch switches.
libcpp
PR pch/86674
* files.c (_cpp_find_file): Use CPP_DL_NOTE not CPP_DL_ERROR in call to
cpp_error.
The expression used to calculate the maximum value for an integer type
assumes that the number of bits in the value representation is always
sizeof(T) * CHAR_BIT. This is not true for the __int20 type on msp430,
which has only 20 bits in the value representation but 32 bits in the
object representation. This causes an integer overflow in a constant
expression, which is ill-formed.
This problem was already solved by DJ for std::numeric_limits<__int20>
by generalizing the helper macros to use a specified number of bits
instead of assuming sizeof(T) * CHAR_BIT. Then the INT_N_n types can
specify the number of bits using the __GLIBCXX_BITSIZE_INT_N_n macros
that the compiler defines.
I'm using a slightly different approach here. I've replaced the helper
macros entirely, and just expanded the calculations in the initializers
for the static data members. By reordering the data members we can reuse
__is_signed and __digits in the other initializers. This removes the
repetition of expanding __glibcxx_signed(T) and __glibcxx_digits(T)
multiple times in each initializer.
The __is_integer_nonstrict trait now defines a new constant, __width,
which is sizeof(T) * CHAR_BIT by default (defined as an enumerator so
that no storage is needed for a static data member). By specializing
__is_integer_nonstrict for the INT_N types that have padding bits, we
can provide the correct width via the __GLIBCXX_BITSIZE_INT_N_n macros.
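A hedged sketch of the trait (illustrative; the real header generates
the specializations with the _GLIBCXX_INT_N_TRAITS macro listed below):

  template<typename _Tp>
    struct __is_integer_nonstrict
    : public std::__is_integer<_Tp>
    {
      using std::__is_integer<_Tp>::__value;

      // Number of bits in the value representation, as an enumerator
      // so that no storage is needed for a static data member.
      enum { __width = sizeof(_Tp) * __CHAR_BIT__ };
    };

  #if defined __GLIBCXX_TYPE_INT_N_0
  // Types with padding bits override __width, e.g. 20 for msp430 __int20.
  template<>
    struct __is_integer_nonstrict<__GLIBCXX_TYPE_INT_N_0>
    {
      enum { __value = 1 };
      typedef std::__true_type __type;
      enum { __width = __GLIBCXX_BITSIZE_INT_N_0 };
    };
  #endif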
libstdc++-v3/ChangeLog:
PR libstdc++/97798
* include/ext/numeric_traits.h (__glibcxx_signed)
(__glibcxx_digits, __glibcxx_min, __glibcxx_max): Remove
macros.
(__is_integer_nonstrict::__width): Define new constant.
(__numeric_traits_integer): Define constants in terms of each
other and __is_integer_nonstrict::__width, rather than the
removed macros.
(_GLIBCXX_INT_N_TRAITS): Macro to define explicit
specializations for non-standard integer types.
The following makes sure to only iterate PRE insertion when
necessary - which is when AVAIL_OUT of a predecessor of a
block we already visited changed (that's backedge destinations).
To not regress, this also makes sure to locally iterate insertion,
since even a topological sort of expressions isn't enough to
guarantee we get all opportunities of a block in one iteration.
This avoids costly re-computation of the topologically sorted
expression array (more micro-optimization is possible here).
2020-11-12 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (bitmap_value_replace_in_set): Return
whether we have changed anything.
(do_pre_regular_insertion): Get topologically sorted array
of expressions from caller.
(do_pre_partial_partial_insertion): Likewise.
(insert): Compute topologically sorted arrays of expressions
here and locally iterate actual insertion. Iterate only
when AVAIL_OUT of an already visited block source changed.
This patch adds a missing not to the SVE2 BCAX (Bitwise clear and
exclusive or) pattern, fixing the PR. Since SVE doesn't have an
unpredicated not instruction, we need to use a (vacuously) predicated
not here.
To ensure that the predicate is instantiated correctly (to all 1s) for
the intrinsics, we pull out a separate expander from the define_insn.
From the ISA reference [1]:
> Bitwise AND elements of the second source vector with the
> corresponding inverted elements of the third source vector, then
> exclusive OR the results with corresponding elements of the first
> source vector.
[1] : https://developer.arm.com/docs/ddi0602/g/a64-sve-instructions-alphabetic-order/bcax-bitwise-clear-and-exclusive-or
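In scalar terms each element is computed as follows (a hedged
illustration of the semantics, not target code):

  /* BCAX: exclusive OR of the first source with the AND of the second
     source and the inverted third source.  */
  static inline unsigned long long
  bcax_element (unsigned long long a, unsigned long long b,
                unsigned long long c)
  {
    return a ^ (b & ~c);
  }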
gcc/ChangeLog:
PR target/97730
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_bcax<mode>):
Change to define_expand, add missing (trivially-predicated) not
rtx to fix wrong code bug.
(*aarch64_sve2_bcax<mode>): New.
gcc/testsuite/ChangeLog:
PR target/97730
* gcc.target/aarch64/sve2/bcax_1.c (OP): Add missing bitwise not
to match correct bcax semantics.
* gcc.dg/vect/pr97730.c: New test.
This fixes the postorder compute for the case of multiple
expression leaders for a value.
2020-11-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/97806
* tree-ssa-pre.c (pre_expr_DFS): New overload for visiting
values, visiting all leaders for a value. Use a bitmap
for visited values.
(sorted_array_from_bitmap_set): Walk over values and adjust.
* gcc.dg/pr97806.c: New testcase.
As the testcase shows, CLEANUP_POINT_EXPR (and I think TRY_FINALLY_EXPR too)
suffers from the same problem that I was trying to fix in
r10-3597-g1006c9d4395a939820df76f37c7b085a4a1a003f
for CLEANUP_STMT, namely that if, in the middle of the body expression of
those stmts, there is e.g. a return stmt, goto, break or continue (something
that changes *jump_target and makes it start skipping stmts), we then skip
the cleanups too, which is not appropriate - the cleanups were either queued
up during the non-skipping execution of the body (for CLEANUP_POINT_EXPR),
or, for TRY_FINALLY_EXPR, are relevant already after entering the body block.
> Would it make sense to always use a NULL jump_target when evaluating
> cleanups?
I was afraid of that, especially for TRY_FINALLY_EXPR, but it seems that
during constexpr evaluation the cleanups will most often be just very simple
destructor calls (or calls to cleanup attribute functions).
Furthermore, for none of these 3 tree codes will we reach that code if
jump_target && *jump_target initially (there is a return NULL_TREE much
earlier for those, except for trees that could embed labels etc. in them,
and clearly these 3 don't count as such).
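A reduced illustration of the problem (a hypothetical reduction in the
spirit of the new constexpr-dtor9.C test):

  struct S
  {
    int *p;
    constexpr S (int *x) : p (x) {}
    constexpr ~S () { *p = 42; }  // C++20 constexpr destructor.
  };

  constexpr int
  f (int *p)
  {
    S s (p);
    return 1;  // Sets *jump_target; s's cleanup must still run.
  }

  constexpr int
  g ()
  {
    int x = 0;
    int r = f (&x);
    return r + x;  // 43 only if the cleanup ran during evaluation.
  }

  static_assert (g () == 43, "");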
2020-11-12 Jakub Jelinek <jakub@redhat.com>
PR c++/97790
* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_POINT_EXPR,
case TRY_FINALLY_EXPR, case CLEANUP_STMT>: Don't pass jump_target to
cxx_eval_constant_expression when evaluating the cleanups.
* g++.dg/cpp2a/constexpr-dtor9.C: New test.
Just a preparation for adding a lower-case tointvec.
gcc/ChangeLog:
* config/s390/vector.md: Rename tointvec to TOINTVEC.
* config/s390/vx-builtins.md: Likewise.
If DECL_INITIAL isn't set, we can't emit anything about the body of the
function, so add the declaration attribute.
gcc/ChangeLog:
PR debug/97060
* dwarf2out.c (gen_subprogram_die): It's a declaration
if DECL_INITIAL isn't set.
gcc/testsuite/ChangeLog:
PR debug/97060
* gcc.dg/debug/dwarf2/pr97060.c: New test.
The new test gcc.dg/tree-ssa/pr96789.c fails on
arm-none-linux-gnueabihf since the loop vectorizer is able to optimize
the two loops which operate on array tmp using the load_lanes feature,
which makes dse3 fail to find the expected inputs.
As Richard suggested, this patch replaces the option
-ftree-vectorize with -ftree-slp-vectorize -fno-tree-loop-vectorize.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr96789.c: Adjusted by disabling loop
vectorization.
This patch adds a custom event to paths emitted by
-Wanalyzer-stale-setjmp-buffer highlighting the place where the
pertinent stack frame is popped, and updates the final event in
the path to reference this.
gcc/analyzer/ChangeLog:
* checker-path.h (checker_event::get_id_ptr): New.
* diagnostic-manager.cc (path_builder::path_builder): Add "sd"
param and use it to initialize new field "m_sd".
(path_builder::get_pending_diagnostic): New.
(path_builder::m_sd): New field.
(diagnostic_manager::emit_saved_diagnostic): Pass sd to
path_builder ctor.
(diagnostic_manager::add_events_for_superedge): Call new
maybe_add_custom_events_for_superedge vfunc.
* engine.cc (stale_jmp_buf::stale_jmp_buf): Add "setjmp_point"
param and use it to initialize new field "m_setjmp_point".
Initialize new field "m_stack_pop_event".
(stale_jmp_buf::maybe_add_custom_events_for_superedge): New vfunc
implementation.
(stale_jmp_buf::describe_final_event): New vfunc implementation.
(stale_jmp_buf::m_setjmp_point): New field.
(stale_jmp_buf::m_stack_pop_event): New field.
(exploded_node::on_longjmp): Pass setjmp_point to stale_jmp_buf
ctor.
* pending-diagnostic.h
(pending_diagnostic::maybe_add_custom_events_for_superedge): New
vfunc.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/setjmp-5.c: Update expected path output to show
an event where the pertinent stack frame is popped. Update
expected message from final event to reference this event.
This patch implements -Wanalyzer-shift-count-negative
and -Wanalyzer-shift-count-overflow, analogous to the C/C++
warnings -Wshift-count-negative and -Wshift-count-overflow, but
implemented via interprocedural path analysis rather than via parsing
in a front end, and thus capable of detecting interprocedural cases that the
warnings implemented in the front ends can miss.
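For example (a hypothetical illustration, assuming a 32-bit int), the
analyzer can flag a shift whose count only becomes invalid along a path
through a caller:

  int
  shl (int x, int count)
  {
    return x << count;  /* -Wanalyzer-shift-count-overflow here...  */
  }

  int
  g (void)
  {
    return shl (1, 32); /* ...when reached with count >= width of int.  */
  }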
gcc/analyzer/ChangeLog:
PR tree-optimization/97424
* analyzer.opt (Wanalyzer-shift-count-negative): New.
(Wanalyzer-shift-count-overflow): New.
* region-model.cc (class shift_count_negative_diagnostic): New.
(class shift_count_overflow_diagnostic): New.
(region_model::get_gassign_result): Complain about shift counts that
are negative or are >= the operand's type's width.
gcc/ChangeLog:
PR tree-optimization/97424
* doc/invoke.texi (Static Analyzer Options): Add
-Wno-analyzer-shift-count-negative and
-Wno-analyzer-shift-count-overflow.
(-Wno-analyzer-shift-count-negative): New.
(-Wno-analyzer-shift-count-overflow): New.
gcc/testsuite/ChangeLog:
PR tree-optimization/97424
* gcc.dg/analyzer/invalid-shift-1.c: New test.
At present, the output of .cfi_personality and .cfi_lsda assumes
ELF semantics for indirections. This isn't suitable for all targets
and is one blocker to moving Darwin to use .cfi_xxxx.
The patch adds a target hook that allows non-ELF targets to use
indirections appropriate to their needs.
gcc/ChangeLog:
* config/darwin-protos.h (darwin_make_eh_symbol_indirect): New.
* config/darwin.c (darwin_make_eh_symbol_indirect): New. Use
Mach-O semantics for personality and lsda indirections.
* config/darwin.h (TARGET_ASM_MAKE_EH_SYMBOL_INDIRECT): New.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add TARGET_ASM_MAKE_EH_SYMBOL_INDIRECT hook.
* dwarf2out.c (dwarf2out_do_cfi_startproc): If the target defines
a hook for indirecting personality and lsda references, use that,
otherwise default to ELF semantics.
* target.def (make_eh_symbol_indirect): New target hook.
For Objective-C++, this combines prefix attributes from before and
after top level linkage specs. The "reference implementation" for
Objective-C++ allows this, and system headers depend on it.
e.g.
__attribute__((__deprecated__))
extern "C" __attribute__((__visibility__("default")))
@interface MyClass
...
@end
Here the list of prefix attributes for the MyClass interface would
include both the visibility and deprecated ones.
When we are compiling regular C++, this emits a warning and discards
any prefix attributes before a linkage spec.
gcc/cp/ChangeLog:
* parser.c (cp_parser_declaration): Unless we are compiling for
Objective-C++, warn about and discard any attributes that prefix
a linkage specification.
This patch changes the mangling of __alignof__ to v111__alignof__,
making its mangling distinct from that of alignof(type) and
alignof(expr).
How we mangle ALIGNOF_EXPR now depends on its ALIGNOF_EXPR_STD_P flag,
which after the previous patch gets consistently set for alignof(type)
as well as alignof(expr).
gcc/c-family/ChangeLog:
PR c++/88115
* c-opts.c (c_common_post_options): Update latest_abi_version.
gcc/ChangeLog:
PR c++/88115
* common.opt (-fabi-version): Document =15.
* doc/invoke.texi (C++ Dialect Options): Likewise.
gcc/cp/ChangeLog:
PR c++/88115
* mangle.c (write_expression): Mangle __alignof__ differently
from alignof when the ABI version is at least 15.
libiberty/ChangeLog:
PR c++/88115
* cp-demangle.c (d_print_comp_inner)
<case DEMANGLE_COMPONENT_EXTENDED_OPERATOR>: Don't print the
"operator " prefix for __alignof__.
<case DEMANGLE_COMPONENT_UNARY>: Always print parens around the
operand of __alignof__.
* testsuite/demangle-expected: Test demangling for __alignof__.
gcc/testsuite/ChangeLog:
PR c++/88115
* g++.dg/abi/macro0.C: Adjust.
* g++.dg/cpp0x/alignof7.C: New test.
* g++.dg/cpp0x/alignof8.C: New test.
We're currently neglecting to set the ALIGNOF_EXPR_STD_P flag on an
ALIGNOF_EXPR when its operand is an expression. This leads to us
handling alignof(expr) as if it were written __alignof__(expr), and
returning the preferred alignment instead of the ABI alignment. In the
testcase below, this causes the first and third static_assert to fail on
x86.
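For instance (an illustrative example, assuming an i386 target where
the ABI alignment of long long is 4 but its preferred alignment is 8):

  static_assert (alignof (long long) == 4, "");      // ABI alignment
  static_assert (__alignof__ (long long) == 8, "");  // preferred alignment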
gcc/cp/ChangeLog:
PR c++/88115
* cp-tree.h (cxx_sizeof_or_alignof_expr): Add bool parameter.
* decl.c (fold_sizeof_expr): Pass false to
cxx_sizeof_or_alignof_expr.
* parser.c (cp_parser_unary_expression): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* pt.c (tsubst_copy): Pass false to cxx_sizeof_or_alignof_expr.
(tsubst_copy_and_build): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* typeck.c (cxx_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_sizeof_or_alignof_type. Set ALIGNOF_EXPR_STD_P
appropriately.
(cxx_sizeof_or_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_alignof_expr. Assert op is either
SIZEOF_EXPR or ALIGNOF_EXPR.
libcc1/ChangeLog:
PR c++/88115
* libcp1plugin.cc (plugin_build_unary_expr): Pass true to
cxx_sizeof_or_alignof_expr.
gcc/testsuite/ChangeLog:
PR c++/88115
* g++.dg/cpp0x/alignof6.C: New test.
Retain the location when tsubstituting a qualified-id so that our
static_assert diagnostic can benefit. Don't create useless location
wrappers for temporary variables.
gcc/ChangeLog:
PR c++/97518
* tree.c (maybe_wrap_with_location): Don't add a location
wrapper around an artificial and ignored decl.
gcc/cp/ChangeLog:
PR c++/97518
* pt.c (tsubst_qualified_id): Use EXPR_LOCATION of the qualified-id.
Use it to maybe_wrap_with_location the final expression.
gcc/testsuite/ChangeLog:
PR c++/97518
* g++.dg/diagnostic/static_assert3.C: New test.
Accesses to NEW_SETS should be properly guarded. Committed
as obvious.
2020-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/97623
* tree-ssa-pre.c (create_expression_by_pieces): Guard
NEW_SETS access.
(insert_into_preds_of_block): Likewise.
The PE format does not have ELF-style relro linker support, so exclude
it from the check. If the host linker supports ELF format, configure
may get confused.
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_CHECK_LINKER_FEATURES): Exclude
cygwin and mingw from relro linker test.
* configure: Regenerate.
This fixes sorted_array_from_bitmap_set to do a topological sort
as required, by re-using what PHI-translation does, namely a DFS
walk with the help of bitmap_find_leader. The proper result
is verified by extra checking in clean () (which would have tripped
before), and for the testcase I'm working on during the last
patches (PR97623) it is neutral in compile-time cost.
2020-11-11 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (pre_expr_DFS): New function.
(sorted_array_from_bitmap_set): Use it to properly
topologically sort the expression set.
(clean): Verify we've cleaned everything we should.
The added (?:_ull) groups match on 32-bit targets, but are equivalent to
just adding _ull into the strings, i.e. they require the _ull substrings,
while the intent is that they be optional, so we should use (?:_ull)?
instead.
2020-11-11 Jakub Jelinek <jakub@redhat.com>
* gfortran.dg/gomp/workshare-reduction-3.f90: Use (?:_ull)? instead
of (?:_ull) in the scan-tree-dump-times directives.
* gfortran.dg/gomp/workshare-reduction-26.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.
gcc/ada/ChangeLog:
* gcc-interface/gigi.h: Remove ^L characters throughout.
* gcc-interface/decl.c: Likewise.
* gcc-interface/utils.c: Likewise.
* gcc-interface/utils2.c: Likewise.
* gcc-interface/trans.c (gnat_to_gnu) <N_Allocator>: Do not explicitly
go to the base type for the Has_Constrained_Partial_View flag.
The Ada compiler uses a biased representation when a size clause reserves
fewer bits than normal either for the lower or for the upper bound.
gcc/ada/ChangeLog:
* gcc-interface/trans.c (build_binary_op_trapv): Convert operands
to the result type before doing generic overflow checking.
gcc/testsuite/ChangeLog:
* gnat.dg/bias2.adb: New test.
This is a rather obscure case where the elaboration of an empty array
whose base type is an array type of length at most 1 goes awry when
the code is compiled with optimization.
gcc/ada/ChangeLog:
* gcc-interface/trans.c (can_be_lower_p): Remove.
(Regular_Loop_to_gnu): Add ENTRY_COND unconditionally if
BOTTOM_COND is non-zero.
gcc/testsuite/ChangeLog:
* gnat.dg/opt89.adb: New test.
gcc/ada/ChangeLog:
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Constant>: In case
the constant is not being defined, get the expression in type
annotation mode only if its type is elementary.
This is a regression present on the mainline and 10 branch in the form
of an ICE with a shift operator applied to a variable of a signed type,
caused by a type mismatch.
gcc/ada/ChangeLog:
* gcc-interface/trans.c (gnat_to_gnu) <N_Op_Shift>: Also convert
GNU_MAX_SHIFT if the type of the operation has been changed.
* gcc-interface/utils.c (can_materialize_object_renaming_p): Add
pair of missing parentheses.
gcc/testsuite/ChangeLog:
* gnat.dg/shift1.adb: New test.
Tested on x86_64-unknown-linux-gnu, pushed.
2020-11-11 Richard Biener <rguenther@suse.de>
PR testsuite/97797
* gcc.dg/torture/ssa-fre-5.c: Use __SIZE_TYPE__ where
appropriate.
* gcc.dg/torture/ssa-fre-6.c: Likewise.
The recent previous change in this area limited hoist insertion
iteration via a param, but the following is IMHO better since
we are not really interested in PRE opportunities exposed by
hoisting but only the other way around. So this moves hoist
insertion after PRE iteration has finished and removes hoist
insertion iteration altogether.
2020-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/97623
* params.opt (-param=max-pre-hoist-insert-iterations): Remove
again.
* doc/invoke.texi (max-pre-hoist-insert-iterations): Likewise.
* tree-ssa-pre.c (insert): Move hoist insertion after PRE
insertion iteration and do not iterate it.
* gcc.dg/tree-ssa/ssa-hoist-3.c: Adjust.
* gcc.dg/tree-ssa/ssa-hoist-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-30.c: Likewise.
This patch adds support for comparing unpacked SVE integer vectors,
such as byte elements stored in the bottom bytes of halfword
containers. It also adds support for selects between unpacked
SVE vectors (both integer and floating-point), since selects and
compares are closely tied via the vcond optab interface.
gcc/
* config/aarch64/aarch64-sve.md (@vcond_mask_<mode><vpred>): Extend
from SVE_FULL to SVE_ALL.
(*vcond_mask_<mode><vpred>): Likewise.
(@aarch64_sel_dup<mode>): Likewise.
(vcond<SVE_FULL:mode><v_int_equiv>): Extend to...
(vcond<SVE_ALL:mode><SVE_I:mode>): ...this, but requiring the
sizes of the container modes to match.
(vcondu<SVE_FULL:mode><v_int_equiv>): Extend to...
(vcondu<SVE_ALL:mode><SVE_I:mode>): ...this.
(vec_cmp<SVE_FULL_I:mode><vpred>): Extend to...
(vec_cmp<SVE_I:mode><vpred>): ...this.
(vec_cmpu<SVE_FULL_I:mode><vpred>): Extend to...
(vec_cmpu<SVE_I:mode><vpred>): ...this.
(@aarch64_pred_cmp<cmp_op><SVE_FULL_I:mode>): Extend to...
(@aarch64_pred_cmp<cmp_op><SVE_I:mode>): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_cc): Extend to...
(*cmp<cmp_op><SVE_I:mode>_cc): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_ptest): Extend to...
(*cmp<cmp_op><SVE_I:mode>_ptest): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_and): Extend to...
(*cmp<cmp_op><SVE_I:mode>_and): ...this.
gcc/testsuite/
* gcc.target/aarch64/sve/cmp_1.c: New test.
* gcc.target/aarch64/sve/cmp_2.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_1.c: Add --param
aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_load_slp_1.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
The vcond code requires the compared vectors and the selected
vectors to have both the same size and the same number of elements
as each other. But the operation makes logical sense even for
different vector sizes. E.g. you could compare two V4SIs and
use the result to select between two V4DIs.
The underlying optab already allows the compared mode and the selected
mode to be specified separately. Since the vectoriser now also
supports mixed vector sizes, I think we can simply remove the
equal-size check and just keep the equal-lanes check. It's then
up to the target to decide which (if any) mixtures of sizes it
supports.
gcc/
* optabs-tree.c (expand_vec_cond_expr_p): Allow the compared values
and the selected values to have different mode sizes.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Likewise.
Move-assigning to a std::jthread that represents a thread of execution
needs to send a stop request and join that running thread. Otherwise the
std::thread data member will terminate in its assignment operator.
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
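A minimal sketch of the approach (illustrative, not verbatim libstdc++
source):

  jthread&
  operator=(jthread&& __other) noexcept
  {
    // Swap our old state into a temporary; its destructor requests a
    // stop and joins the previously-running thread.
    std::jthread(std::move(__other)).swap(*this);
    return *this;
  }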
libstdc++-v3/ChangeLog:
* include/std/thread (jthread::operator=(jthread&&)): Transfer
any existing state to a temporary that will request a stop and
then join.
* testsuite/30_threads/jthread/jthread.cc: Test move assignment.
This encapsulates the storing and checking of the thread ID into a class
type, so that the macro _GLIBCXX_HAS_GTHREADS is only checked in one
place. The code doing the checks just calls member functions of the new
type, without caring whether that really does any work or not.
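A hedged sketch of the encapsulation (illustrative; member names taken
from the ChangeLog below):

  struct _Requester
  {
  #ifdef _GLIBCXX_HAS_GTHREADS
    __gthread_t _M_id = {};

    void
    _M_set() { _M_id = __gthread_self(); }

    bool
    _M_is_current_thread() const
    { return __gthread_equal(_M_id, __gthread_self()); }
  #else
    // Single-threaded: nothing to record; any check trivially passes.
    void
    _M_set() { }

    bool
    _M_is_current_thread() const { return true; }
  #endif
  };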
libstdc++-v3/ChangeLog:
* include/std/stop_token (_Stop_state_t::_M_requester): Define
new struct with members to store and check the thread ID.
(_Stop_state_t::_M_request_stop()): Use _M_requester._M_set().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Use
_M_requester._M_is_current_thread().
The topological sort that sorted_array_from_bitmap_set is supposed to
provide hasn't been one for quite some time, since value_ids are
assigned first to SSA names in the order of SSA_NAME_VERSION
and then to hashtable entries in the order they appear in the
table. One can even argue that expression-ids provide a closer
approximation of a topological sort, since those are assigned
during AVAIL_OUT computation, which is done in a dominator walk.
Now, phi-translation does not even depend on topological sorting;
it essentially does a DFS walk, phi-translating expressions
it depends on and relying on phi-translation caching to avoid
doing redundant work.
So this patch drops the use of sorted_array_from_bitmap_set from
phi_translate_set, because this function is quite expensive.
2020-11-11 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (phi_translate_set): Do not sort the
expression set topologically.