Improve mb_free_blocks speed by clearing entire range at once instead
of iterating over each bit. Freeing block-by-block also makes buddy
bitmap subtree flip twice making most of the work a no-op. Very few
bits in buddy bitmap require change, e.g. freeing entire group is a 1
bit flip only. As a result, releasing blocks of 60G file now takes
5ms instead of 2.7s. This is especially good for non-preemptive
kernels as there is no rescheduling during release.
Signed-off-by: Andrey Sidorov <qrxd43@motorola.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Values stored in s_freeclusters_counter and s_dirtyclusters_counter
are both in cluster units. Remove the cluster to block conversion
applied to s_freeclusters_counter causing an inflated estimate of
free space because s_dirtyclusters_counter is not similarly
converted. Rename free_blocks and dirty_blocks to better reflect
the units these variables contain to avoid future confusion. This
fix corrects ENOSPC failures for xfstests 127 and 231 on bigalloc
file systems.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We didn't mark hidden quota files with S_NOQUOTA flag and thus quota was
accounted even for quota files. Thus we could recurse back to quota code
when adding new blocks to quota file which can easily deadlock. Mark
hidden quota files properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
existing locking ordering: journal-> i_data_sem, but
ext4_ind_migrate() grab locks in opposite order which may result in
deadlock.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
associated attributes (like i_blocks, i_size, i_flags, ...) from the
specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
typically used to store a boot loader in a secure part of the
filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with
the given inode.
This usercode program is a simple example of the usage:
int main(int argc, char *argv[])
{
int fd;
int err;
if ( argc != 2 ) {
printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
exit(1);
}
fd = open(argv[1], O_WRONLY);
if ( fd < 0 ) {
perror("open");
exit(1);
}
err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
if ( err < 0 ) {
perror("ioctl");
exit(1);
}
close(fd);
exit(0);
}
[ Modified by Theodore Ts'o to fix a number of bugs in the original code.]
Signed-off-by: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently when inserting extent in ext4_ext_insert_extent() we would
only try to to see if we can append new extent to the found extent. If
we can not, then we proceed with adding new extent into the extent tree,
but then possibly merging it back again.
We can avoid this situation by trying to append and prepend new extent
to the existing ones. However since the new extent can be on either
sides of the existing extent, we have to pick the right extent to try to
append/prepend to.
This patch adds the conditions to pick the right extent to
append/prepend to and adds the actual prepending condition as well. This
will also eliminate the need to use "reserved" block for possibly
growing extent tree.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently when converting extent to initialized we attempt to transfer
initialized block to the left neighbour if possible when certain
criteria are met. However we do not attempt to do the same for the
right neighbor.
This commit adds the possibility to transfer initialized block to the
right neighbour if:
1. We're not converting the whole extent
2. Both extents are stored in the same extent tree node
3. Right neighbor is initialized
4. Right neighbor is logically abutting the current one
5. Right neighbor is physically abutting the current one
6. Right neighbor would not overflow the length limit
This is basically the same logic as with transferring to the left. This
will gain us some performance benefits since it is faster than inserting
extent and then merging it.
It would also prevent some situation in delalloc patch when we might run
out of metadata reservation. This is due to the fact that we would
attempt to split the extent first (possibly allocating new metadata
block) even though we did not counted for that because it can (and will)
be merged again. This commit fix that scenario, because we no longer
need to split the extent in such case.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Currently on many places in ext4 we're using
ext4_get_group_no_and_offset() even though we're only interested in
knowing the block group of the particular block, not the offset within
the block group so we can use more efficient way to compute block
group.
This patch introduces ext4_get_group_number() which computes block
group for a given block much more efficiently. Use this function
instead of ext4_get_group_no_and_offset() everywhere where we're only
interested in knowing the block group.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently in when getting the block group number for a particular
block in ext4_block_in_group() we're using
ext4_get_group_no_and_offset() which uses do_div() to get the block
group and the remainer which is offset within the group.
We don't need all of that in ext4_block_in_group() as we only need to
figure out the group number.
This commit changes ext4_block_in_group() to calculate group number
directly. This shows as a big improvement with regards to cpu
utilization. Measuring fallocate -l 15T on fresh file system with perf
showed that 23% of cpu time was spend in the
ext4_get_group_no_and_offset(). With this change it completely
disappears from the list only bumping the occurrence of
ext4_init_block_bitmap() which is the biggest user of
ext4_block_in_group() by 4%. As the result of this change on my system
the fallocate call was approx. 10% faster.
However since there is '-g' option in mkfs which allow us setting
different groups size (mostly for developers) I've introduced new per
file system flag whether we have a standard block group size or
not. The flag is used to determine whether we can use the bit shift
optimization or not.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
It is incorrect to use list_for_each_entry_safe() for journal callback
traversial because ->next may be removed by other task:
->ext4_mb_free_metadata()
->ext4_mb_free_metadata()
->ext4_journal_callback_del()
This results in the following issue:
WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
This patch fix the issue as follows:
- ext4_journal_commit_callback() make list truly traversial safe
simply by always starting from list_head
- fix race between two ext4_journal_callback_del() and
ext4_journal_callback_try_del()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.com
The following race is possible:
[kjournald2] other_task
jbd2_journal_commit_transaction()
j_state = T_FINISHED;
spin_unlock(&journal->j_list_lock);
->jbd2_journal_remove_checkpoint()
->jbd2_journal_free_transaction();
->kmem_cache_free(transaction)
->j_commit_callback(journal, transaction);
-> USE_AFTER_FREE
WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk. This makes callback longer and race
window becomes wider.
In order to fix this we should mark transaction as finished only after
callbacks have completed
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
In order to make it simpler to test the code which support
i_blocks/indirect-mapped inodes, support the conversion of inodes
which are less than 12 blocks and which are contained in no more than
a single extent.
The primary intended use of this code is to converting freshly created
zero-length files and empty directories.
Note that the version of chattr in e2fsprogs 1.42.7 and earlier has a
check that prevents the clearing of the extent flag. A simple patch
which allows "chattr -e <file>" to work will be checked into the
e2fsprogs git repository.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Cc: stable@vger.kernel.org
[ Added fixup from Lukáš Czerner which only checks the assertion when
the inode is not new and is not being freed. ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move common code in ext4_ind_truncate() and ext4_ext_truncate() into
ext4_truncate(). This saves over 60 lines of code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole()
into ext4_punch_hole(). This saves over 150 lines of code.
This also fixes a potential bug when the punch_hole() code is racing
against indirect-to-extents or extents-to-indirect migation. We are
currently using i_mutex to protect against changes to the inode flag;
specifically, the append-only, immutable, and extents inode flags. So
we need to take i_mutex before deciding whether to use the
extents-specific or indirect-specific punch_hole code.
Also, there was a missing call to ext4_inode_block_unlocked_dio() in
the indirect punch codepath. This was added in commit 02d262dffc
to block DIO readers racing against the punch operation in the
codepath for extent-mapped inodes, but it was missing for
indirect-block mapped inodes. One of the advantages of refactoring
the code is that it makes such oversights much less likely.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The older code was far more complicated than it needed to be because
of how we spliced in the ext4's new multiblock allocator into ext3's
indirect block code. By folding ext4_alloc_blocks() into
ext4_alloc_branch(), we make the code far more understable, shave off
over 130 lines of code and half a kilobyte of compiled object code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
After collapsing the handling of data ordered and data writeback
codepath, ext4_generic_write_end() has only one caller,
ext4_write_end(). So we fold it into ext4_write_end().
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
The only difference between how we handle data=ordered and
data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate
code duplication by factoring out redundant the code paths.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out. So
fix it.
[ Also fix a bug found by Dmitry Monakhov where we were missing
le32_to_cpu() calls in the new indirect punch hole code.
There are a number of other big endian warnings found by static code
analyzers, but we'll wait for the next merge window to fix them all
up. These fixes are designed to be Obviously Correct by code
inspection, and easy to demonstrate that it won't make any
difference (and hence, won't introduce any bugs) on little endian
architectures such as x86. --tytso ]
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Pull slave-dmaengine fixes from Vinod Koul:
"Two fixes for slave-dmaengine.
The first one is for making slave_id value correct for dw_dmac and
the other one fixes the endieness in DT parsing"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dw_dmac: adjust slave_id accordingly to request line base
dmaengine: dw_dma: fix endianess for DT xlate function
Pull media fixes from Mauro Carvalho Chehab:
"For a some fixes for Kernel 3.9:
- subsystem build fix when VIDEO_DEV=y, VIDEO_V4L2=m and I2C=m
- compilation fix for arm multiarch preventing IR_RX51 to be selected
- regression fix at bttv crop logic
- s5p-mfc/m5mols/exynos: a few fixes for cameras on exynos hardware"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] [REGRESSION] bt8xx: Fix too large height in cropcap
[media] fix compilation with both V4L2 and I2C as 'm'
[media] m5mols: Fix bug in stream on handler
[media] s5p-fimc: Do not attempt to disable not enabled media pipeline
[media] s5p-mfc: Fix encoder control 15 issue
[media] s5p-mfc: Fix frame skip bug
[media] s5p-fimc: send valid m2m ctx to fimc_m2m_job_finish
[media] exynos-gsc: send valid m2m ctx to gsc_m2m_job_finish
[media] fimc-lite: Fix the variable type to avoid possible crash
[media] fimc-lite: Initialize 'step' field in fimc_lite_ctrl structure
[media] ir: IR_RX51 only works on OMAP2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJRWHWXAAoJEPfTWPspceCmyGIQANHJlvexkzqkPsxzfA+hKi36
90ramlHmOIGLqxKk8pJLEhAJAEAEmR1sN5FfPBeiI3I7E8RT+vuPHCOCqXhAXgku
5saB294H0OGeaGsw4cxIl4KQFxBwa2PDskFq5irV4AYJd1IMolwUdyELr2wv37g1
d4vJJUeJIUBON47pZjVfV96nQ4utISMjtHLeBmvpeREcmfqn2I1qKyYcEXxDkNeX
DWRIyeJ/UApCxEWbZcxFgaVNVWE/9nGg861HgnuazCu+OiwUVhfMpS+azj/dtl8G
wdZLhokjXZBi9yd70h8mZ9XReIqMbTUP6k4texNrUQXgHaN87OVUiCgbzL5JBfUB
Iq2bmlCkSIUOwxV9qOsv1MfNo9TJTB2ZcOZJH381BAqf/ua1ouGzZu9KLTxmalZi
yIO3oTpifELxgfCV7O/HGEP1jkRTROwpRFjErqPOFx+Jr9vhT+xj/LGZYgAzaVhX
1HCXMtp8xjRBZa7TrHq/FZY2iO4fS3JZNGg0XaIVim8yHiFWfMnGxOg4TSs5rqEy
AyPg3rFVufb7n9zSdRpYfgAg6gYK/pgHZ7OcyFTt44wRrGSWpMlR8TMxJREytbJx
JjKlO2qRuIbBJXnoBS1J3W22Yt8NN/TaaMIoVL4GHD3fUYMbL88NugsjIZ5VKe/N
/sw12PuUld2rTR+FghHV
=u2RH
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Alright, this time from 10K up in the air.
Collection of fixes that have been queued up since the merge window
opened, hence postponed until later in the cycle. The pull request
contains:
- A bunch of fixes for the xen blk front/back driver.
- A round of fixes for the new IBM RamSan driver, fixing various
nasty issues.
- Fixes for multiple drives from Wei Yongjun, bad handling of return
values and wrong pointer math.
- A fix for loop properly killing partitions when being detached."
* tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
mg_disk: fix error return code in mg_probe()
rsxx: remove unused variable
rsxx: enable error return of rsxx_eeh_save_issued_dmas()
block: removes dynamic allocation on stack
Block: blk-flush: Fixed indent code style
cciss: fix invalid use of sizeof in cciss_find_cfgtables()
loop: cleanup partitions when detaching loop device
loop: fix error return code in loop_add()
mtip32xx: fix error return code in mtip_pci_probe()
xen-blkfront: remove frame list from blk_shadow
xen-blkfront: pre-allocate pages for requests
xen-blkback: don't store dev_bus_addr
xen-blkfront: switch from llist to list
xen-blkback: fix foreach_grant_safe to handle empty lists
xen-blkfront: replace kmalloc and then memcpy with kmemdup
xen-blkback: fix dispatch_rw_block_io() error path
rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
Adding in EEH support to the IBM FlashSystem 70/80 device driver
block: IBM RamSan 70/80 error message bug fix.
block: IBM RamSan 70/80 branding changes.
...
This reverts commit 6aa9707099.
Commit 6aa9707099 ("lockdep: check that no locks held at freeze time")
causes problems with NFS root filesystems. The failures were noticed on
OMAP2 and 3 boards during kernel init:
[ BUG: swapper/0/1 still has locks held! ]
3.9.0-rc3-00344-ga937536 #1 Not tainted
-------------------------------------
1 lock held by swapper/0/1:
#0: (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574
stack backtrace:
rpc_wait_bit_killable
__wait_on_bit
out_of_line_wait_on_bit
__rpc_execute
rpc_run_task
rpc_call_sync
nfs_proc_get_root
nfs_get_root
nfs_fs_mount_common
nfs_try_mount
nfs_fs_mount
mount_fs
vfs_kern_mount
do_mount
sys_mount
do_mount_root
mount_root
prepare_namespace
kernel_init_freeable
kernel_init
Although the rootfs mounts, the system is unstable. Here's a transcript
from a PM test:
http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt
Here's what the test log should look like:
http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt
Mailing list discussion is here:
http://lkml.org/lkml/2013/3/4/221
Deal with this for v3.9 by reverting the problem commit, until folks can
figure out the right long-term course of action.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <maciej.rutecki@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI target fixes from Nicholas Bellinger:
"This includes the bug-fix for a >= v3.8-rc1 regression specific to
iscsi-target persistent reservation conflict handling (CC'ed to
stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage
so that in-flight qemu vhost-scsi-pci device code can detect the
proper vhost feature bits.
Also, there are two more tcm_vhost patches still being discussed by
MST and Asias for v3.9 that will be required for the in-flight qemu
vhost-scsi-pci device patch to function properly, and that should
(hopefully) be the last target fixes for this round."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
On some hardware configurations we have got the request line with the offset.
The patch introduces convert_slave_id() helper for that cases. The request line
base is came from the driver data provided by the platform_device_id table.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
As reported by Wu Fengguang's build robot tracking sparse warnings, the
dma_spec arguments in the dw_dma_xlate are already byte swapped on
little-endian platforms and must not get swapped again. This code is
currently not used anywhere, but will be used in Linux 3.10 when the
ARM SPEAr platform starts using the generic DMA DT binding.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The Adam Belay's e-mail address in MAINTAINERS under PNP SUPPORT
is not valid any more and I started to maintain that code in the
meantime as a matter of fact, so list myself as a maintainer of it
along with Bjorn and remove the Adam's entry from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ceph fix from Sage Weil:
"This fixes a regression introduced during the last merge window when
mapping non-existent images."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: don't zero-fill non-image object requests
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image. Similarly, a short read for such an object indicates that
the remainder of the read should be interpreted a full read with
zeros filling out the end of the request.
This behavior is not correct for objects that are not backing rbd
image data. Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.
Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects. Encapsulate that special handling in its own
function. Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).
This resolves a problem identified here:
http://tracker.ceph.com/issues/4559
The regression was introduced by bf0d5f503d.
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
Pull btrfs fixes from Chris Mason:
"We've had a busy two weeks of bug fixing. The biggest patches in here
are some long standing early-enospc problems (Josef) and a very old
race where compression and mmap combine forces to lose writes (me).
I'm fairly sure the mmap bug goes all the way back to the introduction
of the compression code, which is proof that fsx doesn't trigger every
possible mmap corner after all.
I'm sure you'll notice one of these is from this morning, it's a small
and isolated use-after-free fix in our scrub error reporting. I
double checked it here."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: don't drop path when printing out tree errors in scrub
Btrfs: fix wrong return value of btrfs_lookup_csum()
Btrfs: fix wrong reservation of csums
Btrfs: fix double free in the btrfs_qgroup_account_ref()
Btrfs: limit the global reserve to 512mb
Btrfs: hold the ordered operations mutex when waiting on ordered extents
Btrfs: fix space accounting for unlink and rename
Btrfs: fix space leak when we fail to reserve metadata space
Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
Btrfs: fix race between mmap writes and compression
Btrfs: fix memory leak in btrfs_create_tree()
Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
Btrfs: fix missing qgroup reservation before fallocating
Btrfs: handle a bogus chunk tree nicely
Btrfs: update to use fs_state bit
Commit 3e7fc708eb ("ia64 idle: delete pm_idle") in 3.9-rc1 didn't
finish the job, leaving an un-initialized reference to (*idle)().
[ Haven't seen a crash from this - but seems like we are just being
lucky that "idle" is zero so it does get initialized before we jump to
randomland - Len ]
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull arc architecture fixes from Vineet Gupta:
"This includes fix for a serious bug in DMA mapping API, make
allyesconfig wreckage, removal of bogus email-list placeholder in
MAINTAINERS, a typo in ptrace helper code and last remaining changes
for syscall ABI v3 which we are finally starting to transition-to
internally.
The request is late than I intended to - but I was held up with
debugging a timer link list corruption, for which a proposed fix to
generic timer code was sent out to lkml/tglx earlier today."
* 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Fix the typo in event identifier flags used by ptrace
arc: fix dma_address assignment during dma_map_sg()
ARC: Remove SET_PERSONALITY (tracks cross-arch change)
ARC: ABIv3: fork/vfork wrappers not needed in "no-legacy-syscall" ABI
ARC: ABIv3: Print the correct ABI ver
ARC: make allyesconfig build breakages
ARC: MAINTAINERS update for ARC
A user reported a panic where we were panicing somewhere in
tree_backref_for_extent from scrub_print_warning. He only captured the trace
but looking at scrub_print_warning we drop the path right before we mess with
the extent buffer to print out a bunch of stuff, which isn't right. So fix this
by dropping the path after we use the eb if we need to. Thanks,
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch fixes a regression introduced in v3.8-rc1 code where a failed
target_check_reservation() check in target_setup_cmd_from_cdb() was causing
an incorrect SAM_STAT_GOOD status to be returned during a WRITE operation
performed by an unregistered / unreserved iscsi initiator port.
This regression is only effecting iscsi-target due to a special case check
for TCM_RESERVATION_CONFLICT within iscsi_target_erl1.c:iscsit_execute_cmd(),
and was still correctly disallowing WRITE commands from backend submission
for unregistered / unreserved initiator ports, while returning the incorrect
SAM_STAT_GOOD status due to the missing SAM_STAT_RESERVATION_CONFLICT
assignment.
This regression was first introduced with:
commit de103c93af
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Nov 6 12:24:09 2012 -0800
target: pass sense_reason as a return value
Go ahead and re-add the missing SAM_STAT_RESERVATION_CONFLICT assignment
during a target_check_reservation() failure, so that iscsi-target code
sends the correct SCSI status.
All other fabrics using target_submit_cmd_*() with a RESERVATION_CONFLICT
call to transport_generic_request_failure() are not effected by this bug.
Reported-by: Jeff Leung <jleung@curriegrad2004.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds a VHOST_SCSI_FEATURES mask minus VIRTIO_RING_F_EVENT_IDX
so that vhost-scsi-pci userspace will strip this feature bit once
GET_FEATURES reports it as being unsupported on the host.
This is to avoid a bug where ->handle_kicks() are missed when EVENT_IDX
is enabled by default in userspace code.
(mst: Rename to VHOST_SCSI_FEATURES + add comment)
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This reverts commit 1869305009 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").
VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.
Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.
Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are some USB fixes to resolve issues reported recently, as well as a new
device id for the ftdi_sio driver.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlFUaA0ACgkQMUfUDdst+yk0AACfe9iitBiGERSO4NsyIvypoJ1q
vOgAoKek8fiPmTKrZl18n79oX28qU9x2
=Oee3
-----END PGP SIGNATURE-----
Merge tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are some USB fixes to resolve issues reported recently, as well
as a new device id for the ftdi_sio driver."
* tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD
usb: Fix compile error by selecting USB_OTG_UTILS
USB: serial: fix hang when opening port
USB: EHCI: fix bug in iTD/siTD DMA pool allocation
xhci: Don't warn on empty ring for suspended devices.
usb: xhci: Fix TRB transfer length macro used for Event TRB.
usb/acpi: binding xhci root hub usb port with ACPI
usb: add find_raw_port_number callback to struct hc_driver()
usb: xhci: fix build warning
Here are some tty/serial driver fixes for 3.9.
The big thing here is the fix for the huge mess we caused renaming the 8250
driver accidentally in the 3.7 kernel release, without realizing that there
were users of the module options that suddenly broke. This is now resolved,
and, to top the injury off, we have a backwards-compatible option for those
users who got used to the new name since 3.7. Ugh, sorry about that.
Other than that, some other minor fixes for issues that have been reported by
users.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlFUdtEACgkQMUfUDdst+ylklACgx47C3qGEXaPiu6yh3W/D/x97
N1kAoMICQC6K1O1Nge3J5uqbNmYDP6Bv
=wO2Z
-----END PGP SIGNATURE-----
Merge tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY/serial fixes from Greg Kroah-Hartman:
"Here are some tty/serial driver fixes for 3.9.
The big thing here is the fix for the huge mess we caused renaming the
8250 driver accidentally in the 3.7 kernel release, without realizing
that there were users of the module options that suddenly broke. This
is now resolved, and, to top the injury off, we have a backwards-
compatible option for those users who got used to the new name since
3.7. Ugh, sorry about that.
Other than that, some other minor fixes for issues that have been
reported by users."
* tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Xilinx: ARM: UART: clear pending irqs before enabling irqs
TTY: 8250, deprecated 8250_core.* options
TTY: 8250, revert module name change
serial: 8250_pci: Add WCH CH352 quirk to avoid Xscale detection
tty: atmel_serial_probe(): index of atmel_ports[] fix
Here are two tiny staging driver fixes to resolve issues that have been
reported.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlFUdV8ACgkQMUfUDdst+ylT7wCg06/tBMtv81D43yyiBBMKo3qy
86gAoNBbunIiInbIOELTxOOdrwZzpANj
=AVoV
-----END PGP SIGNATURE-----
Merge tag 'staging-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg Kroah-Hartman:
"Here are two tiny staging driver fixes to resolve issues that have
been reported."
* tag 'staging-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: s626: fix continuous acquisition
staging: zcache: fix typo "64_BIT"
Here are two fixes for sysfs that resolve issues that have been found by the
Trinity fuzz tool, causing oopses in sysfs. They both have been in linux-next
for a while to ensure that they do not cause any other problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlFUdHUACgkQMUfUDdst+ykk+ACfWz6U/DW97ibFusDj+Sys1pEt
essAn15ZFy/pT5myhCvxqVH0MHrIftup
=BM+Q
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull sysfs fixes from Greg Kroah-Hartman:
"Here are two fixes for sysfs that resolve issues that have been found
by the Trinity fuzz tool, causing oopses in sysfs. They both have
been in linux-next for a while to ensure that they do not cause any
other problems."
* tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: handle failure path correctly for readdir()
sysfs: fix race between readdir and lseek
Here are some small char/misc driver fixes that resolve issues recently
reported against the 3.9-rc kernels. All have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlFUdOAACgkQMUfUDdst+ykb9QCeKb0WfCxqwPFZDCAbIiyX9AyA
1OMAoJU7WJo1/wpfyyTLr6RuN8E0X0p/
=1+Ic
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg Kroah-Hartman:
"Here are some small char/misc driver fixes that resolve issues
recently reported against the 3.9-rc kernels. All have been in
linux-next for a while."
* tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
VMCI: Fix process-to-process DRGAMs.
mei: ME hardware reset needs to be synchronized
mei: add mei_stop function to stop mei device
extcon: max77693: Initialize register of MUIC device to bring up it without platform data
extcon: max77693: Fix bug of wrong pointer when platform data is not used
extcon: max8997: Check the pointer of platform data to protect null pointer error
- Fix for a recent cpufreq regression related to acpi-cpufreq and
suspend/resume from Viresh Kumar.
- cpufreq stats reference counting fix from Viresh Kumar.
- intel_pstate driver fixes from Dirk Brandewie and
Konrad Rzeszutek Wilk.
- New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from
Fabio Valentini.
- ACPI Platform Error Interface (APEI) fix from Chen Gong.
- PCI root bridge hotplug locking fix from Yinghai Lu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRVETOAAoJEKhOf7ml8uNs30kP/3GsKWacHsaIPdhIiHQC3f91
HMLabrW7NE7ldrOoXzj1lTHsIc1TQHm722vyI+aF061HErfkF8Jkdi5rkIai8VMq
IJXe4CtwuuCi0SeKQsV9ymiQanTrgsP/AlGV5x/KM/As8dvAVW/1+Ln/gXAnH0IJ
/Onqf3eA4NBw/1Hjg7AGHGeCmOlDHvcetHF7eX4MaiYZHEwuy/a7jswH4aNOjwgx
GZtbrnwUO6OtDKv6ie//1EbP753VrkHDtK3jzIy2lUA5YyLmr0XOTvy4uQh2n/r7
tVTqsVoNZNA4En0YUspfsWwBruUic3ra9qVTrJqn7Fzymyr+TgyCQQzSUGrOGy2a
wY0vwMAwm1dMwAsZWPhnui6aqvu0bbg0u7sxCZQs8WapdtjxPdD7iIhRk2YU4wOZ
omtejW0thUIwEmHWgBPo9rFvfZmxy9hb044UfhkLI9xBmuTVrDb/HqeVPA767ZoO
k7IVg1DG4Ye6xboCIILfluoUAsc3DvkHpCIvWVujK3pF5j/M9ptt3d8eXDFIzmWD
J6tm9ARkQoUPRAs6751cG1N0nP++ZlErYseU/h6eXoC0rkeC/WbGyxIumii4xJhg
Gs6GGeM8OgQ/7Fat68kA2Z7jriY+MTteLbq1Sl3PBlfdURaceOXkTIVrxXo33Itq
jQiEKa1CbJDi6OBKog8K
=0bjZ
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael J Wysocki:
- Fix for a recent cpufreq regression related to acpi-cpufreq and
suspend/resume from Viresh Kumar.
- cpufreq stats reference counting fix from Viresh Kumar.
- intel_pstate driver fixes from Dirk Brandewie and Konrad Rzeszutek
Wilk.
- New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from Fabio
Valentini.
- ACPI Platform Error Interface (APEI) fix from Chen Gong.
- PCI root bridge hotplug locking fix from Yinghai Lu.
* tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / ACPI: hold acpi_scan_lock during root bus hotplug
ACPI / APEI: fix error status check condition for CPER
ACPI / PM: fix suspend and resume on Sony Vaio VGN-FW21M
cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()
cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get()
intel-pstate: Use #defines instead of hard-coded values.
cpufreq / intel_pstate: Fix calculation of current frequency
cpufreq / intel_pstate: Add function to check that all MSRs are valid
Pull crypto fixes from Herbert Xu:
"This removes IPsec ESN support from the talitos/caam drivers since
they were implemented incorrectly, causing interoperability problems
if ESN is used with them."
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "crypto: caam - add IPsec ESN support"
Revert "crypto: talitos - add IPsec ESN support"
* OMAP display fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRVDTpAAoJEPo9qoy8lh716c8P/3FJhLXxG9bWmHHoK+/l4ymg
plAdi6Jz/IgPj2QJ/okhorFSmkEZjJ/HZ1vjEBWhpXapBPWIOB+GbCq0/ErEnxrq
MB35Fikf096Vyl5Iy7nXYM9u6o4IrZZWeNaazqt0quVE+1z0+upBjxfCu/PFOe6+
/zpgBBvWsX/cxaeraZ2esqeqs4UnCGiyd4uNuCmUCHXxwDtNNARQKbZGQvu4sd3Q
ieX7QKVE+vsTneA8PwLlhAl2arbeT84B+1noKJx1V9V+RL45zIKl2HJZIH9RBDMS
3RyBmX10zDGy0cDDoNTH+Wj6+l8l8pZXZbpS5t3KtU+entrNWYrGu3RUaRJfv8Tz
LviBJHSCEbdYYwcJayAm0R4ANSaxZe8yUI80VzFNqVsQgiLckzUTXwUkSRz76vdU
/TPOnCfrwBjDKKgvWTYN59/QKZj7jRzbbQTxf5pQ9HrfxaAmcGmXmbxr+HMFOSff
xf8bINsXEJsqnDolAuf5eV0+QQPbk9kTZrMl6FPLQNqj1vdsrHoB5gz9ru+y9bjh
V9I0LQya6rXhiijsFt/lC1/eA08jXGXl/F/InZAejQKTLpatjMzt190aZ6QXT7YY
TxOdSgf11UmllASNE4WJbo7jGXc8+4BNw76u2E/RedxJ+j/rIIsCB29w+6pUV9+c
YbKtCg5gnfQ41iaU3/Wu
=CIGs
-----END PGP SIGNATURE-----
Merge tag 'fbdev-fixes-3.9-rc4' of git://gitorious.org/linux-omap-dss2/linux
Pull fbdev fixes from Tomi Valkeinen:
"Since Florian is still away/inactive, I volunteered to collect fbdev
fixes for 3.9 and changes for 3.10. I didn't receive any other fbdev
fixes than OMAP yet, but I didn't want to delay this further as
there's a compilation fix for OMAP1. So there could be still some
fbdev fixes on the way a bit later.
This contains:
- Fix OMAP1 compilation
- OMAP display fixes"
* tag 'fbdev-fixes-3.9-rc4' of git://gitorious.org/linux-omap-dss2/linux:
omapdss: features: fix supported outputs for OMAP4
OMAPDSS: tpo-td043 panel: fix data passing between SPI/DSS parts
omapfb: fix broken build on OMAP1