* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (62 commits)
target: Fix compile warning w/ missing module.h include
target: Remove legacy se_task->task_timer and associated logic
target: Fix incorrect transport_sent usage
target: re-use the command S/G list for single-task commands
target: Fix BIDI t_task_cdb handling in transport_generic_new_cmd
target: remove transport_allocate_tasks
target: merge transport_new_cmd_obj into transport_generic_new_cmd
target: remove the task_sg_bidi field se_task and pSCSI BIDI support
target: transport_subsystem_check_init cleanups
target: use a workqueue for I/O completions
target: remove unused TRANSPORT_ states
target: remove TRANSPORT_DEFERRED_CMD state
target: remove the TRANSPORT_REMOVE state
target: move depth_left manipulation out of transport_generic_request_failure
target: stop task timers earlier
target: remove TF_TIMER_STOP
target: factor some duplicate code for stopping a task
target: fix list walking in transport_free_dev_tasks
target: use transport_cmd_check_stop_to_fabric consistently
target: do not pass the queue object to transport_remove_cmd_from_queue
...
* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits)
TOMOYO: Fix incomplete read after seek.
Smack: allow to access /smack/access as normal user
TOMOYO: Fix unused kernel config option.
Smack: fix: invalid length set for the result of /smack/access
Smack: compilation fix
Smack: fix for /smack/access output, use string instead of byte
Smack: domain transition protections (v3)
Smack: Provide information for UDS getsockopt(SO_PEERCRED)
Smack: Clean up comments
Smack: Repair processing of fcntl
Smack: Rule list lookup performance
Smack: check permissions from user space (v2)
TOMOYO: Fix quota and garbage collector.
TOMOYO: Remove redundant tasklist_lock.
TOMOYO: Fix domain transition failure warning.
TOMOYO: Remove tomoyo_policy_memory_lock spinlock.
TOMOYO: Simplify garbage collector.
TOMOYO: Fix make namespacecheck warnings.
target: check hex2bin result
encrypted-keys: check hex2bin result
...
* 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xenbus: don't rely on xen_initial_domain to detect local xenstore
xenbus: Fix loopback event channel assuming domain 0
xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive
* 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pciback: Check if the device is found instead of blindly assuming so.
xen/pciback: Do not dereference psdev during printk when it is NULL.
xen: remove XEN_PLATFORM_PCI config option
xen: XEN_PVHVM depends on PCI
xen/pciback: double lock typo
xen/pciback: use mutex rather than spinlock in vpci backend
xen/pciback: Use mutexes when working with Xenbus state transitions.
xen/pciback: miscellaneous adjustments
xen/pciback: use mutex rather than spinlock in passthrough backend
xen/pciback: use resource_size()
* 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: support multi-segment systems
xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
xen/pci: make bus notifier handler return sane values
xen-swiotlb: fix printk and panic args
xen-swiotlb: Fix wrong panic.
xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
xen-pcifront: Update warning comment to use 'e820_host' option.
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m/debugfs: Make type_name more obvious.
xen/p2m/debugfs: Fix potential pointer exception.
xen/enlighten: Fix compile warnings and set cx to known value.
xen/xenbus: Remove the unnecessary check.
xen/irq: If we fail during msi_capability_init return proper error code.
xen/events: Don't check the info for NULL as it is already done.
xen/events: BUG() when we can't allocate our event->irq array.
* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: Fix selfballooning and ensure it doesn't go too far
xen/gntdev: Fix sleep-inside-spinlock
xen: modify kernel mappings corresponding to granted pages
xen: add an "highmem" parameter to alloc_xenballooned_pages
xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
xen/p2m: Make debug/xen/mmu/p2m visible again.
Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
* 'stable/e820-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: release all pages within 1-1 p2m mappings
xen: allow extra memory to be in multiple regions
xen: allow balloon driver to use more than one memory region
xen/balloon: simplify test for the end of usable RAM
xen/balloon: account for pages released during memory setup
Markers have removed already twice:
1: fc5377668c
2: eb878b3bc0
But a little bit is still here.
Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
This patch removes the legacy usage of se_task->task_timer and associated
infrastructure that originally was used as a way to help manage buggy backend
SCSI LLDs that in certain cases would never return back an outstanding task.
This includes the removal of target_complete_timeout_work(), timeout logic
from transport_complete_task(), transport_task_timeout_handler(),
transport_start_task_timer(), the per device task_timeout configfs attribute,
and all task_timeout associated structure members and defines in
target_core_base.h
This is being removed in preparation to make transport_complete_task() run
in lock-less mode.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts target-core to use se_cmd->t_transport_sent instead of
a duplicated se_cmd->transport_sent member in a handful of locations.
It also updates iscsi_target to properly use ->t_transport_sent instead of
it's own iscsi_cmd_t->transport_sent value that was not being assigned.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This field is never used given that BIDI handling happens at the
command and not the task level. Remove it and the dead code in
pscsi that tries to work on it.
It also prevents pSCSI passthrough for the two currently enabled BIDI
commands now that task->task_sg_bidi support has been removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the now unnecessary extra call to transport_subsystem_check_init() in
target_core_register_fabric(), and also merge transport_subsystem_reqmods()
directly into transport_subsystem_check_init().
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Instead of abusing the target processing thread for offloading I/O
completion in the backends to user context add a new workqueue. This means
completions can be processed as fast as available CPU time allows it,
including in parallel with other completions and more importantly I/O
submission or QUEUE FULL retries. This should give much better performance
especially on loaded systems.
As a fallout we can merge all the completed states into a single
one.
On the downside this change complicates lun reset handling a bit by
requiring us to cancel a work item only for those states that have it
initialized. The alternative would be to either always initialize the work
item to a dummy handler, or always use the same handler and do a switch on
the state. The long term solution will be a flag that says that the command
has an initialized work item, but that's only going to be useful once we
have more users.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We never check for this state, and it makes testing for a completed
state much harder given that it overrides the existing state.
Also remove the unused deferred_t_state which is related to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We never queue an command with this state, and only set it in a completely
bogus place in tcm_fc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently we stop the timers for all tasks in a command fairly late during
I/O completion, which is fairly pointless and requires all kinds of safety
checks.
Instead delete pending timers early on in transport_complete_task, thus
ensuring no new timers firest after that. We take t_state_lock a bit later
in that function thus making sure currenly running timers are out of the
criticial section. To be completely sure the timer has finished we also
add another del_timer_sync call when freeing the task.
This also allows removing TF_TIMER_RUNNING as it would be equivalent
to TF_ACTIVE now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
TF_TIMER_STOP is useless as it only helps to mitigate a tiny race during
deleting the timer. But given that we have cleared TF_ACTIVE at this point
we already have another mitigation a few lines down the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Add a new boolean at_head parameter to transport_add_cmd_to_queue and thus
obsolete the SCF_EMULATE_QUEUE_FULL flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the need for the transport_qf_callback callback by making
sure we have specific states with specific handlers for the two
queue full cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the dpo_emulated, fua_write_emulated, fua_read_emulated and
write_cache_emulated methods, and replace them with a simple bitfields in
se_subsystem_api in those cases where they ever returned one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Add a switch statement implementing the CDB LBA/len update directly
in target_get_task_cdb and remove the old ->transport_split_cdb
callback and all its implementations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Instead of calling out to the backends from the core to get a per-task
CDB and then modify it for the LBA/len pair used for this CDB provide
a helper that writes the adjusted CDB into a provided buffer and call
this method from ->do_task in pscsi.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Rearrange the fields in se_task to avoid holes. Also increase the
flags field to 16 bits as we have the space for it, and this makes
adding new flags safer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following unnecessary se_task structure
member removal patches:
target: remove the task_execute_queue field in se_task
Instead of using a separate flag we can simply do list_emptry checks
on t_execute_list if we make sure to always use list_del_init to remove
a task from the list. Also factor some duplicate code into a new
__transport_remove_task_from_execute_queue helper.
target: remove the read-only task_no field in se_task
The task_no field never was initialized and only used in debug printks,
so kill it.
target: remove the task_padded_sg field in se_task
This field is only check in one place and not actually needed there.
Rationale:
- transport_do_task_sg_chain asserts that we have task_sg_chaining
set early on
- we only make use of the sg_prev_nents field we calculate based on it
if there is another sg list that gets chained onto this one, which
never happens for the last (or only) task.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Replace various atomic_t variables that were mostly under t_state_lock
with new flags in task_flags. Note that the execution error path
didn't take t_state_lock before, so add it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following se_task cleanup patches:
target: remove the unused task_state_flags field in se_task
target: remove the unused se_obj_ptr field in se_task
target: remove the se_dev field in se_task
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following target_core_base.h
cleanup patches:
target: remove the unused SHUTDOWN_SIGS defintion
target: remove unused se_mem leftovers
target: remove the unused map_func_t typedef
target: move TRANSPORT_IOV_DATA_BUFFER to the iscsi-specific code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch merges transport_cmd_finish_abort_tmr() logic into a single
transport_cmd_finish_abort() function by adding a cmd->se_tmr_req check
around transport_lun_remove_cmd(), and updates the single caller within
core_tmr_drain_tmr_list().
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts se_cmd->transport_wait_for_tasks(se_cmd, 1) usage to use
transport_generic_free_cmd() directly in target-core and iscsi-target fabric
usage. The includes:
*) Removal of the optional transport_generic_free_cmd() call from within
transport_generic_wait_for_tasks()
*) Usage of existing SCF_SUPPORTED_SAM_OPCODE to determine when
transport_generic_wait_for_tasks() processing may occur instead of
checking se_cmd->transport_wait_for_tasks()
*) Move transport_generic_wait_for_tasks() call ahead of core_dec_lacl_count()
and transport_lun_remove_cmd() in transport_generic_free_cmd() to follow
existing logic for iscsi-target w/ se_cmd->transport_wait_for_tasks(se_cmd, 1)
*) Removal of se_cmd->transport_wait_for_tasks() function pointer
*) Rename transport_generic_wait_for_tasks() -> transport_wait_for_tasks(), and
add docbook comment.
*) Add EXPORT_SYMBOL for transport_wait_for_tasks()
For the case in iscsi_target_erl2.c:iscsit_prepare_cmds_for_realligance()
where se_cmd->transport_wait_for_tasks(se_cmd, 0) is called, this patch
adds a direct call to transport_wait_for_tasks().
(hch: Fix transport_generic_free_cmd() usage in iscsit_release_commands_from_conn)
(nab: Add patch: Ensure that TMRs hit wait_for_tasks logic during release)
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Testing in_interrupt() to know when sleeping is allowed is not really
reliable (since eg it won't be true if the caller is holding a spinlock).
Instead have the caller tell core_tmr_alloc_req() what GFP_xxx to use;
every caller except tcm_qla2xxx can use GFP_KERNEL.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The cdb_none, map_data_SG and map_control_SG methods have no callers left
and can be removed now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch removes the unnecessary session_reinstatement parameter from
se_cmd->transport_wait_for_tasks(), logic in transport_generic_wait_for_tasks,
and usage within iscsi-target code.
This also includes the removal of the 'bool' return from transport_put_cmd() +
transport_generic_free_cmd() that is no longer necessary.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Push session reinstatement out of transport_generic_free_cmd into the only
caller that actually needs it. Clean up transport_generic_free_cmd a bit,
and remove the useless comment. I'd love to add a more useful kerneldoc
comment for it, but as this point I'm still a bit confused in where it
stands in the command release stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
It is only called by transport_release_cmd, so inline it there. Also add
a kerneldoc comment for transport_release_cmd while we are at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Commit 903ab86d19 of 1 March this year ("udp: Add
lockless transmit path") introduced a new fast TX path that broke the checksum
coverage computation of UDP-lite, which so far depended on up->len (only set
if the socket is locked and 0 in the fast path).
Fixed by providing both fast- and slow-path computation of checksum coverage.
The latter can be removed when UDP(-lite)v6 also uses a lockless transmit path.
Reported-by: Thomas Volkert <thomas@homer-conferencing.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_vs_mutext is used by both netns shutdown code and startup
and both implicit uses sk_lock-AF_INET mutex.
cleanup CPU-1 startup CPU-2
ip_vs_dst_event() ip_vs_genl_set_cmd()
sk_lock-AF_INET __ip_vs_mutex
sk_lock-AF_INET
__ip_vs_mutex
* DEAD LOCK *
A new mutex placed in ip_vs netns struct called sync_mutex is added.
Comments from Julian and Simon added.
This patch has been running for more than 3 month now and it seems to work.
Ver. 3
IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
instead of __ip_vs_mutex as sugested by Julian.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
irq: Fix check for already initialized irq_domain in irq_domain_add
irq: Add declaration of irq_domain_simple_ops to irqdomain.h
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86/rtc: Don't recursively acquire rtc_lock
* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
posix-cpu-timers: Cure SMP wobbles
sched: Fix up wchan borkage
sched/rt: Migrate equal priority tasks to available CPUs
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.
Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.
I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).
For example:
[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$
The diff of 'process' should always be >= the diff of 'thread'.
I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.
---
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
static pthread_barrier_t barrier;
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1)
__asm__ __volatile__("" : : : "memory");
return NULL;
}
int main(void)
{
clockid_t process_clock, my_thread_clock, th_clock;
struct timespec process_before, process_after;
struct timespec me_before, me_after;
struct timespec th_before, th_after;
struct timespec sleeptime;
unsigned long diff;
pthread_t th;
int err;
err = clock_getcpuclockid(0, &process_clock);
if (err)
return 1;
err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
if (err)
return 1;
pthread_barrier_init(&barrier, NULL, 2);
err = pthread_create(&th, NULL, chew_cpu, NULL);
if (err)
return 1;
err = pthread_getcpuclockid(th, &th_clock);
if (err)
return 1;
pthread_barrier_wait(&barrier);
err = clock_gettime(process_clock, &process_before);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_before);
if (err)
return 1;
err = clock_gettime(th_clock, &th_before);
if (err)
return 1;
sleeptime.tv_sec = 0;
sleeptime.tv_nsec = 500000000;
nanosleep(&sleeptime, NULL);
err = clock_gettime(th_clock, &th_after);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_after);
if (err)
return 1;
err = clock_gettime(process_clock, &process_after);
if (err)
return 1;
diff = process_after.tv_nsec - process_before.tv_nsec;
printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
diff = th_after.tv_nsec - th_before.tv_nsec;
printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
diff = me_after.tv_nsec - me_before.tv_nsec;
printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);
return 0;
}
This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.
This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().
Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory. This will allow platforms to provide
(for example) a region of low memory and a region of high memory.
The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata. This is safe
as both xen_memory_setup() and balloon_init() are in __init.
The balloon regions themselves are not altered (i.e., there is still
only the one region).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen. This reduces the domain's current page count in
the hypervisor. The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this. If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.
This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.
Fix this by accouting for the released pages when starting the balloon
driver.
If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.
In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.
Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add an highmem parameter to alloc_xenballooned_pages, to allow callers to
request lowmem or highmem pages.
Fix the code style of free_xenballooned_pages' prototype.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.
The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>