systemd/docs/BLOCK_DEVICE_LOCKING.md

---
title: Locking Block Device Access
category: Interfaces
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Locking Block Device Access

*TL;DR: Use BSD file locks
[(`flock(2)`)](https://man7.org/linux/man-pages/man2/flock.2.html) on block
device nodes to synchronize access for partitioning and file system formatting
tools.*

`systemd-udevd` probes all block devices showing up for file system superblock
and partition table information (utilizing `libblkid`). If another program
concurrently modifies a superblock or partition table this probing might be
affected, which is bad in itself, but also might in turn result in undesired
effects in programs subscribing to `udev` events.

Applications manipulating a block device can temporarily stop `systemd-udevd`
from processing rules on it — and thus bar it from probing the device — by
taking a BSD file lock on the block device node. Specifically, whenever
`systemd-udevd` starts processing a block device it takes a `LOCK_SH|LOCK_NB`
lock using [`flock(2)`](https://man7.org/linux/man-pages/man2/flock.2.html) on
the main block device (i.e. never on any partition block device, but on the
device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
returns `EAGAIN`), it refrains from processing the device. If it manages to take
the lock it is kept for the entire time the device is processed.

Note that `systemd-udevd` also watches all block device nodes it manages for
`inotify()` `IN_CLOSE_WRITE` events: whenever such an event is seen, this is
used as trigger to re-run the rule-set for the device.

These two concepts allow tools such as disk partitioners or file system
formatting tools to safely and easily take exclusive ownership of a block
device while operating: before starting work on the block device, they should
take an `LOCK_EX` lock on it. This has two effects: first of all, in case
`systemd-udevd` is still processing the device the tool will wait for it to
finish. Second, after the lock is taken, it can be sure that `systemd-udevd`
will refrain from processing the block device, and thus all other client
applications subscribed to it won't get device notifications from potentially
half-written data either. After the operation is complete the
partitioner/formatter can simply close the device node. This has two effects:
it implicitly releases the lock, so that `systemd-udevd` can process events on
the device node again. Secondly, it results an `IN_CLOSE_WRITE` event, which
causes `systemd-udevd` to immediately re-process the device — seeing all
changes the tool made — and notify subscribed clients about it.

Ideally, `systemd-udevd` would explicitly watch block devices for `LOCK_EX`
locks being released. Such monitoring is not supported on Linux however, which
is why it watches for `IN_CLOSE_WRITE` instead, i.e. for `close()` calls to
writable file descriptors referring to the block device. In almost all cases,
the difference between these two events does not matter much, as any locks
taken are implicitly released by `close()`. However, it should be noted that if
an application unlocks a device after completing its work without closing it,
i.e. while keeping the file descriptor open for further, longer time, then
`systemd-udevd` will not notice this and not retrigger and thus reprobe the
device.

Besides synchronizing block device access between `systemd-udevd` and such
tools this scheme may also be used to synchronize access between those tools
themselves. However, do note that `flock()` locks are advisory only. This means
if one tool honours this scheme and another tool does not, they will of course
not be synchronized properly, and might interfere with each other's work.

Note that the file locks follow the usual access semantics of BSD locks: since
`systemd-udevd` never writes to such block devices it only takes a `LOCK_SH`
*shared* lock. A program intending to make changes to the block device should
take a `LOCK_EX` *exclusive* lock instead. For further details, see the
`flock(2)` man page.

And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
orthogonal. The scheme discussed above uses the former and not the latter,
because these types of locks more closely match the required semantics.

If multiple devices are to be locked at the same time (for example in order to
format a RAID file system), the devices should be locked in the order of the
the device nodes' major numbers (primary ordering key, ascending) and minor
numbers (secondary ordering key, ditto), in order to avoid ABBA locking issues
between subsystems.

Note that the locks should only be taken while the device is repartitioned,
file systems formatted or `dd`'ed in, and similar cases that
apply/remove/change superblocks/partition information. It should not be held
during normal operation, i.e. while file systems on it are mounted for
application use.

The [`udevadm
lock`](https://www.freedesktop.org/software/systemd/man/udevadm.html) command
is provided to lock block devices following this scheme from the command line,
for the use in scripts and similar. (Note though that it's typically preferable
to use native support for block device locking in tools where that's
available.)

Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
manipulating block devices in all tools that change file system block devices
(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
opening the node.

# Example of Locking The Whole Disk

The following is an example to leverage `libsystemd` infrastructure to get the whole disk of a block device and take a BSD lock on it.

## Compile and Execute
**Note that this example requires `libsystemd` version 251 or newer.**

Place the code in a source file, e.g. `take_BSD_lock.c` and run the following commands:
```
$ gcc -o take_BSD_lock -lsystemd take_BSD_lock.c

$ ./take_BSD_lock /dev/sda1
Successfully took a BSD lock: /dev/sda

$ flock -x /dev/sda ./take_BSD_lock /dev/sda1
Failed to take a BSD lock on /dev/sda: Resource temporarily unavailable
```

## Code
```c
/* SPDX-License-Identifier: MIT-0 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <systemd/sd-device.h>
#include <unistd.h>

static inline void closep(int *fd) {
    if (*fd >= 0)
        close(*fd);
}

/**
 * lock_whole_disk_from_devname
 * @devname: devname of a block device, e.g., /dev/sda or /dev/sda1
 * @open_flags: the flags to open the device, e.g., O_RDONLY|O_CLOEXEC|O_NONBLOCK|O_NOCTTY
 * @flock_operation: the operation to call flock, e.g., LOCK_EX|LOCK_NB
 *
 * given the devname of a block device, take a BSD lock of the whole disk
 *
 * Returns: negative errno value on error, or non-negative fd if the lock was taken successfully.
 **/
int lock_whole_disk_from_devname(const char *devname, int open_flags, int flock_operation) {
    __attribute__((cleanup(sd_device_unrefp))) sd_device *dev = NULL;
    sd_device *whole_dev;
    const char *whole_disk_devname, *subsystem, *devtype;
    int r;

    // create a sd_device instance from devname
    r = sd_device_new_from_devname(&dev, devname);
    if (r < 0) {
        errno = -r;
        fprintf(stderr, "Failed to create sd_device: %m\n");
        return r;
    }

    // if the subsystem of dev is block, but its devtype is not disk, find its parent
    r = sd_device_get_subsystem(dev, &subsystem);
    if (r < 0) {
        errno = -r;
        fprintf(stderr, "Failed to get the subsystem: %m\n");
        return r;
    }
    if (strcmp(subsystem, "block") != 0) {
        fprintf(stderr, "%s is not a block device, refusing.\n", devname);
        return -EINVAL;
    }

    r = sd_device_get_devtype(dev, &devtype);
    if (r < 0) {
        errno = -r;
        fprintf(stderr, "Failed to get the devtype: %m\n");
        return r;
    }
    if (strcmp(devtype, "disk") == 0)
        whole_dev = dev;
    else {
        r = sd_device_get_parent_with_subsystem_devtype(dev, "block", "disk", &whole_dev);
        if (r < 0) {
            errno = -r;
            fprintf(stderr, "Failed to get the parent device: %m\n");
            return r;
        }
    }

    // open the whole disk device node
    __attribute__((cleanup(closep))) int fd = sd_device_open(whole_dev, open_flags);
    if (fd < 0) {
        errno = -fd;
        fprintf(stderr, "Failed to open the device: %m\n");
        return fd;
    }

    // get the whole disk devname
    r = sd_device_get_devname(whole_dev, &whole_disk_devname);
    if (r < 0) {
        errno = -r;
        fprintf(stderr, "Failed to get the whole disk name: %m\n");
        return r;
    }

    // take a BSD lock of the whole disk device node
    if (flock(fd, flock_operation) < 0) {
        r = -errno;
        fprintf(stderr, "Failed to take a BSD lock on %s: %m\n", whole_disk_devname);
        return r;
    }

    printf("Successfully took a BSD lock: %s\n", whole_disk_devname);

    // take the fd to avoid automatic cleanup
    int ret_fd = fd;
    fd = -EBADF;
    return ret_fd;
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Invalid number of parameters.\n");
        return EXIT_FAILURE;
    }

    // try to take an exclusive and nonblocking BSD lock
    __attribute__((cleanup(closep))) int fd =
        lock_whole_disk_from_devname(
            argv[1],
            O_RDONLY|O_CLOEXEC|O_NONBLOCK|O_NOCTTY,
            LOCK_EX|LOCK_NB);

    if (fd < 0)
        return EXIT_FAILURE;

    /**
     * The device is now locked until the return below.
     * Now you can safely manipulate the block device.
     **/

    return EXIT_SUCCESS;
}
```
docs: add a "front matter" snippet to our markdown pages It turns out Jekyll (the engine behind GitHub Pages) requires that pages include a "Front Matter" snippet of YAML at the top for proper rendering. Omitting it will still render the pages, but including it opens up new possibilities, such as using a {% for %} loop to generate index.md instead of requiring a separate script. I'm hoping this will also fix the issue with some of the pages (notably CODE_OF_CONDUCT.html) not being available under systemd.io Tested locally by rendering the website with Jekyll. Before this change, the .md files were kept unchanged (so not sure how that even works?!), after this commit, proper .html files were generated from it. 2019-01-03 06:16:34 +08:00			`---`
			`title: Locking Block Device Access`
docs: place all our markdown docs in rough categories 2019-12-11 17:49:28 +08:00			`category: Interfaces`
docs: make it pretty Add custom Jekyll theme, logo, webfont and .gitignore FIXME: the markdown files have some H1 headers which need to be replaced with H2 2019-12-12 00:01:46 +08:00			`layout: default`
docs: add spdx tags to all .md files I have no idea if this is going to cause rendering problems, and it is fairly hard to check. So let's just merge this, and if it github markdown processor doesn't like it, revert. 2021-09-14 22:05:21 +08:00			`SPDX-License-Identifier: LGPL-2.1-or-later`
docs: add a "front matter" snippet to our markdown pages It turns out Jekyll (the engine behind GitHub Pages) requires that pages include a "Front Matter" snippet of YAML at the top for proper rendering. Omitting it will still render the pages, but including it opens up new possibilities, such as using a {% for %} loop to generate index.md instead of requiring a separate script. I'm hoping this will also fix the issue with some of the pages (notably CODE_OF_CONDUCT.html) not being available under systemd.io Tested locally by rendering the website with Jekyll. Before this change, the .md files were kept unchanged (so not sure how that even works?!), after this commit, proper .html files were generated from it. 2019-01-03 06:16:34 +08:00			`---`

docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			`# Locking Block Device Access`

			`*TL;DR: Use BSD file locks`
Use https for man7.org 2022-06-28 22:05:31 +08:00			[(`flock(2)`)](https://man7.org/linux/man-pages/man2/flock.2.html) on block
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			`device nodes to synchronize access for partitioning and file system formatting`
			`tools.*`

			`systemd-udevd` probes all block devices showing up for file system superblock
			and partition table information (utilizing `libblkid`). If another program
			`concurrently modifies a superblock or partition table this probing might be`
			`affected, which is bad in itself, but also might in turn result in undesired`
			effects in programs subscribing to `udev` events.

			Applications manipulating a block device can temporarily stop `systemd-udevd`
			`from processing rules on it — and thus bar it from probing the device — by`
			`taking a BSD file lock on the block device node. Specifically, whenever`
			`systemd-udevd` starts processing a block device it takes a `LOCK_SH\|LOCK_NB`
Use https for man7.org 2022-06-28 22:05:31 +08:00			lock using [`flock(2)`](https://man7.org/linux/man-pages/man2/flock.2.html) on
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			`the main block device (i.e. never on any partition block device, but on the`
			device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
doc: fix error code 2022-03-13 17:34:39 +08:00			returns `EAGAIN`), it refrains from processing the device. If it manages to take
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			`the lock it is kept for the entire time the device is processed.`

			Note that `systemd-udevd` also watches all block device nodes it manages for
docs: clarify that udev watches for IN_CLOSE_WRITE (and not IN_CLOSE) Also, while we are at it, explain that udev won't reprobe if users just release the lock, they have to close the block device too. 2020-10-09 22:10:40 +08:00			`inotify()` `IN_CLOSE_WRITE` events: whenever such an event is seen, this is
			`used as trigger to re-run the rule-set for the device.`
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00
			`These two concepts allow tools such as disk partitioners or file system`
			`formatting tools to safely and easily take exclusive ownership of a block`
			`device while operating: before starting work on the block device, they should`
			take an `LOCK_EX` lock on it. This has two effects: first of all, in case
			`systemd-udevd` is still processing the device the tool will wait for it to
docs: clarify that udev watches for IN_CLOSE_WRITE (and not IN_CLOSE) Also, while we are at it, explain that udev won't reprobe if users just release the lock, they have to close the block device too. 2020-10-09 22:10:40 +08:00			finish. Second, after the lock is taken, it can be sure that `systemd-udevd`
			`will refrain from processing the block device, and thus all other client`
			`applications subscribed to it won't get device notifications from potentially`
			`half-written data either. After the operation is complete the`
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			`partitioner/formatter can simply close the device node. This has two effects:`
			it implicitly releases the lock, so that `systemd-udevd` can process events on
docs: clarify that udev watches for IN_CLOSE_WRITE (and not IN_CLOSE) Also, while we are at it, explain that udev won't reprobe if users just release the lock, they have to close the block device too. 2020-10-09 22:10:40 +08:00			the device node again. Secondly, it results an `IN_CLOSE_WRITE` event, which
			causes `systemd-udevd` to immediately re-process the device — seeing all
			`changes the tool made — and notify subscribed clients about it.`

			Ideally, `systemd-udevd` would explicitly watch block devices for `LOCK_EX`
			`locks being released. Such monitoring is not supported on Linux however, which`
			is why it watches for `IN_CLOSE_WRITE` instead, i.e. for `close()` calls to
			`writable file descriptors referring to the block device. In almost all cases,`
			`the difference between these two events does not matter much, as any locks`
			taken are implicitly released by `close()`. However, it should be noted that if
			`an application unlocks a device after completing its work without closing it,`
			`i.e. while keeping the file descriptor open for further, longer time, then`
			`systemd-udevd` will not notice this and not retrigger and thus reprobe the
			`device.`
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00
			Besides synchronizing block device access between `systemd-udevd` and such
			`tools this scheme may also be used to synchronize access between those tools`
			themselves. However, do note that `flock()` locks are advisory only. This means
			`if one tool honours this scheme and another tool does not, they will of course`
			`not be synchronized properly, and might interfere with each other's work.`

			`Note that the file locks follow the usual access semantics of BSD locks: since`
			`systemd-udevd` never writes to such block devices it only takes a `LOCK_SH`
			`shared lock. A program intending to make changes to the block device should`
			take a `LOCK_EX` exclusive lock instead. For further details, see the
			`flock(2)` man page.

			And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
			(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
			`orthogonal. The scheme discussed above uses the former and not the latter,`
docs: fix typo 2018-11-29 18:17:36 +08:00			`because these types of locks more closely match the required semantics.`
docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00
man: document new udevadm lock tool 2022-03-28 21:10:56 +08:00			`If multiple devices are to be locked at the same time (for example in order to`
			`format a RAID file system), the devices should be locked in the order of the`
			`the device nodes' major numbers (primary ordering key, ascending) and minor`
			`numbers (secondary ordering key, ditto), in order to avoid ABBA locking issues`
			`between subsystems.`

			`Note that the locks should only be taken while the device is repartitioned,`
			file systems formatted or `dd`'ed in, and similar cases that
			`apply/remove/change superblocks/partition information. It should not be held`
			`during normal operation, i.e. while file systems on it are mounted for`
			`application use.`

			The [`udevadm
			lock`](https://www.freedesktop.org/software/systemd/man/udevadm.html) command
			`is provided to lock block devices following this scheme from the command line,`
			`for the use in scripts and similar. (Note though that it's typically preferable`
			`to use native support for block device locking in tools where that's`
			`available.)`

docs: add brief docs explaing udev's flock() block device node synchronization 2018-11-29 04:26:36 +08:00			Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
			`manipulating block devices in all tools that change file system block devices`
			(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
			`opening the node.`
doc: add an example code to lock the whole disk add an example to leverage `libsystemd` infrastructure to get the whole disk of a block device and take BSD lock on it #25046 2022-11-24 21:13:17 +08:00
			`# Example of Locking The Whole Disk`

			The following is an example to leverage `libsystemd` infrastructure to get the whole disk of a block device and take a BSD lock on it.

			`## Compile and Execute`
			Note that this example requires `libsystemd` version 251 or newer.

			Place the code in a source file, e.g. `take_BSD_lock.c` and run the following commands:
			```
			`$ gcc -o take_BSD_lock -lsystemd take_BSD_lock.c`

			`$ ./take_BSD_lock /dev/sda1`
			`Successfully took a BSD lock: /dev/sda`

			`$ flock -x /dev/sda ./take_BSD_lock /dev/sda1`
			`Failed to take a BSD lock on /dev/sda: Resource temporarily unavailable`
			```

			`## Code`
doc: add language decorator on the code block Add `c` decorator on the code block for applying syntax highlighting. 2022-12-14 16:27:50 +08:00			```c
doc: add an example code to lock the whole disk add an example to leverage `libsystemd` infrastructure to get the whole disk of a block device and take BSD lock on it #25046 2022-11-24 21:13:17 +08:00			`/* SPDX-License-Identifier: MIT-0 */`

			`#include <stdio.h>`
			`#include <stdlib.h>`
			`#include <string.h>`
			`#include <sys/file.h>`
			`#include <systemd/sd-device.h>`
			`#include <unistd.h>`

			`static inline void closep(int *fd) {`
			`if (*fd >= 0)`
			`close(*fd);`
			`}`

			`/**`
			`* lock_whole_disk_from_devname`
			`* @devname: devname of a block device, e.g., /dev/sda or /dev/sda1`
			`* @open_flags: the flags to open the device, e.g., O_RDONLY\|O_CLOEXEC\|O_NONBLOCK\|O_NOCTTY`
			`* @flock_operation: the operation to call flock, e.g., LOCK_EX\|LOCK_NB`
			`*`
			`* given the devname of a block device, take a BSD lock of the whole disk`
			`*`
			`* Returns: negative errno value on error, or non-negative fd if the lock was taken successfully.`
			`**/`
			`int lock_whole_disk_from_devname(const char *devname, int open_flags, int flock_operation) {`
			`__attribute__((cleanup(sd_device_unrefp))) sd_device *dev = NULL;`
			`sd_device *whole_dev;`
			`const char whole_disk_devname, subsystem, *devtype;`
			`int r;`

			`// create a sd_device instance from devname`
			`r = sd_device_new_from_devname(&dev, devname);`
			`if (r < 0) {`
			`errno = -r;`
			`fprintf(stderr, "Failed to create sd_device: %m\n");`
			`return r;`
			`}`

			`// if the subsystem of dev is block, but its devtype is not disk, find its parent`
			`r = sd_device_get_subsystem(dev, &subsystem);`
			`if (r < 0) {`
			`errno = -r;`
			`fprintf(stderr, "Failed to get the subsystem: %m\n");`
			`return r;`
			`}`
			`if (strcmp(subsystem, "block") != 0) {`
			`fprintf(stderr, "%s is not a block device, refusing.\n", devname);`
			`return -EINVAL;`
			`}`

			`r = sd_device_get_devtype(dev, &devtype);`
			`if (r < 0) {`
			`errno = -r;`
			`fprintf(stderr, "Failed to get the devtype: %m\n");`
			`return r;`
			`}`
			`if (strcmp(devtype, "disk") == 0)`
			`whole_dev = dev;`
			`else {`
			`r = sd_device_get_parent_with_subsystem_devtype(dev, "block", "disk", &whole_dev);`
			`if (r < 0) {`
			`errno = -r;`
			`fprintf(stderr, "Failed to get the parent device: %m\n");`
			`return r;`
			`}`
			`}`

			`// open the whole disk device node`
			`__attribute__((cleanup(closep))) int fd = sd_device_open(whole_dev, open_flags);`
			`if (fd < 0) {`
			`errno = -fd;`
			`fprintf(stderr, "Failed to open the device: %m\n");`
			`return fd;`
			`}`

			`// get the whole disk devname`
			`r = sd_device_get_devname(whole_dev, &whole_disk_devname);`
			`if (r < 0) {`
			`errno = -r;`
			`fprintf(stderr, "Failed to get the whole disk name: %m\n");`
			`return r;`
			`}`

			`// take a BSD lock of the whole disk device node`
			`if (flock(fd, flock_operation) < 0) {`
			`r = -errno;`
			`fprintf(stderr, "Failed to take a BSD lock on %s: %m\n", whole_disk_devname);`
			`return r;`
			`}`

			`printf("Successfully took a BSD lock: %s\n", whole_disk_devname);`

			`// take the fd to avoid automatic cleanup`
			`int ret_fd = fd;`
tree-wide: use -EBADF for fd initialization -1 was used everywhere, but -EBADF or -EBADFD started being used in various places. Let's make things consistent in the new style. Note that there are two candidates: EBADF 9 Bad file descriptor EBADFD 77 File descriptor in bad state Since we're initializating the fd, we're just assigning a value that means "no fd yet", so it's just a bad file descriptor, and the first errno fits better. If instead we had a valid file descriptor that became invalid because of some operation or state change, the other errno would fit better. In some places, initialization is dropped if unnecessary. 2022-12-19 20:07:42 +08:00			`fd = -EBADF;`
doc: add an example code to lock the whole disk add an example to leverage `libsystemd` infrastructure to get the whole disk of a block device and take BSD lock on it #25046 2022-11-24 21:13:17 +08:00			`return ret_fd;`
			`}`

			`int main(int argc, char **argv) {`
			`if (argc != 2) {`
			`fprintf(stderr, "Invalid number of parameters.\n");`
			`return EXIT_FAILURE;`
			`}`

			`// try to take an exclusive and nonblocking BSD lock`
			`__attribute__((cleanup(closep))) int fd =`
			`lock_whole_disk_from_devname(`
			`argv[1],`
			`O_RDONLY\|O_CLOEXEC\|O_NONBLOCK\|O_NOCTTY,`
			`LOCK_EX\|LOCK_NB);`

			`if (fd < 0)`
			`return EXIT_FAILURE;`

			`/**`
			`* The device is now locked until the return below.`
			`* Now you can safely manipulate the block device.`
			`**/`

			`return EXIT_SUCCESS;`
			`}`
			```