mirror of https://mirrors.bfsu.edu.cn/git/linux.git (synced 2024-11-11 12:28:41 +08:00)
Merge branch 'mauro' into docs-next

Mauro sez:

There are lots of plain text documents under Documentation/filesystems.
Manually convert several of those to ReST and add them to the index file.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
commit 0150aedda0
@@ -1,7 +1,10 @@
v9fs: Plan 9 Resource Sharing for Linux
=======================================
.. SPDX-License-Identifier: GPL-2.0

ABOUT
=======================================
v9fs: Plan 9 Resource Sharing for Linux
=======================================

About
=====

v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
@@ -14,32 +17,34 @@ and Maya Gokhale. Additional development by Greg Watson

The best detailed explanation of the Linux implementation and applications of
the 9p client is available in the form of a USENIX paper:

http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html

Other applications are described in the following papers:
* XCPU & Clustering
http://xcpu.org/papers/xcpu-talk.pdf
* KVMFS: control file system for KVM
http://xcpu.org/papers/kvmfs.pdf
* CellFS: A New Programming Model for the Cell BE
http://xcpu.org/papers/cellfs-talk.pdf
* PROSE I/O: Using 9p to enable Application Partitions
http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
* VirtFS: A Virtualization Aware File System pass-through
http://goo.gl/3WPDg

USAGE
* XCPU & Clustering
http://xcpu.org/papers/xcpu-talk.pdf
* KVMFS: control file system for KVM
http://xcpu.org/papers/kvmfs.pdf
* CellFS: A New Programming Model for the Cell BE
http://xcpu.org/papers/cellfs-talk.pdf
* PROSE I/O: Using 9p to enable Application Partitions
http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
* VirtFS: A Virtualization Aware File System pass-through
http://goo.gl/3WPDg

Usage
=====

For remote file server:
For remote file server::

mount -t 9p 10.10.1.2 /mnt/9

For Plan 9 From User Space applications (http://swtch.com/plan9)
For Plan 9 From User Space applications (http://swtch.com/plan9)::

mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER

For server running on QEMU host with virtio transport:
For server running on QEMU host with virtio transport::

mount -t 9p -o trans=virtio <mount_tag> /mnt/9

@@ -48,18 +53,22 @@ mount points. Each 9P export is seen by the client as a virtio device with an
associated "mount_tag" property. Available mount tags can be
seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
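As a rough illustration of the mount_tag lookup described above, the following Python sketch collects the tags found under that sysfs path. The function name `list_mount_tags` and its `sysfs_root` parameter are hypothetical, introduced only for this example, and stripping trailing NUL bytes or newlines is a defensive assumption about how the attribute is exposed.

```python
import glob
import os


def list_mount_tags(sysfs_root="/sys/bus/virtio/drivers/9pnet_virtio"):
    """Return {virtio device name: mount tag} for each 9P virtio export.

    sysfs_root is a parameter only so the function can be pointed at a
    fake directory tree; the default is the path quoted in the text.
    """
    tags = {}
    pattern = os.path.join(sysfs_root, "virtio*", "mount_tag")
    for path in sorted(glob.glob(pattern)):
        device = os.path.basename(os.path.dirname(path))
        with open(path, "rb") as f:
            raw = f.read()
        # Assumption: the attribute may end in NUL bytes and/or a newline.
        tags[device] = raw.rstrip(b"\x00\n").decode("utf-8", "replace")
    return tags


if __name__ == "__main__":
    for device, tag in list_mount_tags().items():
        # Print the mount command from the usage section for each tag.
        print(f"{device}: mount -t 9p -o trans=virtio {tag} /mnt/9")
```

On a machine without any 9P virtio exports the function simply returns an empty dictionary.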

OPTIONS
Options
=======

============= ===============================================================
trans=name    select an alternative transport. Valid options are
              currently:
              unix - specifying a named pipe mount point
              tcp - specifying a normal TCP/IP connection
              fd - use passed file descriptors for connection
              (see rfdno and wfdno)
              virtio - connect to the next virtio channel available
              (from QEMU with trans_virtio module)
              rdma - connect to a specified RDMA channel

              ======== ============================================
              unix     specifying a named pipe mount point
              tcp      specifying a normal TCP/IP connection
              fd       use passed file descriptors for connection
                       (see rfdno and wfdno)
              virtio   connect to the next virtio channel available
                       (from QEMU with trans_virtio module)
              rdma     connect to a specified RDMA channel
              ======== ============================================

uname=name    user name to attempt mount as on the remote server. The
              server may override or ignore this value. Certain user
@@ -69,28 +78,36 @@ OPTIONS
              offering several exported file systems.

cache=mode    specifies a caching policy. By default, no caches are used.
              none = default no cache policy, metadata and data

              none
                  default no cache policy, metadata and data
                  alike are synchronous.
              loose = no attempts are made at consistency,
              loose
                  no attempts are made at consistency,
                  intended for exclusive, read-only mounts
              fscache = use FS-Cache for a persistent, read-only
              fscache
                  use FS-Cache for a persistent, read-only
                  cache backend.
              mmap = minimal cache that is only used for read-write
              mmap
                  minimal cache that is only used for read-write
                  mmap. Nothing else is cached, like cache=none

debug=n       specifies debug level. The debug level is a bitmask.
              0x01 = display verbose error messages
              0x02 = developer debug (DEBUG_CURRENT)
              0x04 = display 9p trace
              0x08 = display VFS trace
              0x10 = display Marshalling debug
              0x20 = display RPC debug
              0x40 = display transport debug
              0x80 = display allocation debug
              0x100 = display protocol message debug
              0x200 = display Fid debug
              0x400 = display packet debug
              0x800 = display fscache tracing debug

              ===== ================================
              0x01  display verbose error messages
              0x02  developer debug (DEBUG_CURRENT)
              0x04  display 9p trace
              0x08  display VFS trace
              0x10  display Marshalling debug
              0x20  display RPC debug
              0x40  display transport debug
              0x80  display allocation debug
              0x100 display protocol message debug
              0x200 display Fid debug
              0x400 display packet debug
              0x800 display fscache tracing debug
              ===== ================================

rfdno=n       the file descriptor for reading with trans=fd

@@ -103,9 +120,12 @@ OPTIONS
noextend      force legacy mode (no 9p2000.u or 9p2000.L semantics)

version=name  Select 9P protocol version. Valid options are:
              9p2000 - Legacy mode (same as noextend)
              9p2000.u - Use 9P2000.u protocol
              9p2000.L - Use 9P2000.L protocol

              ======== ==============================
              9p2000   Legacy mode (same as noextend)
              9p2000.u Use 9P2000.u protocol
              9p2000.L Use 9P2000.L protocol
              ======== ==============================

dfltuid       attempt to mount as a particular uid

@@ -118,22 +138,27 @@ OPTIONS
              hosts. This functionality will be expanded in later versions.

access        there are four access modes.
              user = if a user tries to access a file on v9fs
              user
                  if a user tries to access a file on v9fs
                  filesystem for the first time, v9fs sends an
                  attach command (Tattach) for that user.
                  This is the default mode.
              <uid> = allows only user with uid=<uid> to access
              <uid>
                  allows only user with uid=<uid> to access
                  the files on the mounted filesystem
              any = v9fs does single attach and performs all
              any
                  v9fs does single attach and performs all
                  operations as one user
              client = ACL based access check on the 9p client
              client
                  ACL based access check on the 9p client
                  side for access validation

cachetag      cache tag to use the specified persistent cache.
              cache tags for existing cache sessions can be listed at
              /sys/fs/9p/caches. (applies only to cache=fscache)
============= ===============================================================
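To make the option table more concrete, here is a minimal Python sketch that assembles a `-o` option string from the values listed above. The helper names (`debug_mask`, `build_9p_options`) and the keys of `DEBUG_FLAGS` are hypothetical labels chosen for this example; only the option names and numeric bit values come from the table. Writing the mask with a `0x` prefix mirrors the table, but whether a given kernel accepts that spelling is an assumption to verify.

```python
# Bit values copied from the debug=n table; the key names are illustrative.
DEBUG_FLAGS = {
    "error": 0x01,
    "current": 0x02,
    "9p": 0x04,
    "vfs": 0x08,
    "marshal": 0x10,
    "rpc": 0x20,
    "transport": 0x40,
    "alloc": 0x80,
    "protocol": 0x100,
    "fid": 0x200,
    "packet": 0x400,
    "fscache": 0x800,
}

VALID_TRANS = {"unix", "tcp", "fd", "virtio", "rdma"}
VALID_VERSIONS = {"9p2000", "9p2000.u", "9p2000.L"}
VALID_CACHE = {"none", "loose", "fscache", "mmap"}


def debug_mask(names):
    """OR together the bits for the requested debug categories."""
    mask = 0
    for name in names:
        mask |= DEBUG_FLAGS[name]
    return mask


def build_9p_options(trans, version=None, cache=None, debug=()):
    """Return a comma-separated option string for 'mount -t 9p -o ...'."""
    if trans not in VALID_TRANS:
        raise ValueError(f"unknown transport: {trans}")
    opts = [f"trans={trans}"]
    if version is not None:
        if version not in VALID_VERSIONS:
            raise ValueError(f"unknown protocol version: {version}")
        opts.append(f"version={version}")
    if cache is not None:
        if cache not in VALID_CACHE:
            raise ValueError(f"unknown cache mode: {cache}")
        opts.append(f"cache={cache}")
    if debug:
        opts.append(f"debug={debug_mask(debug):#x}")
    return ",".join(opts)


if __name__ == "__main__":
    print(build_9p_options("virtio", version="9p2000.L", debug=["error", "rpc"]))
```

Because the debug level is a bitmask, categories combine by bitwise OR rather than addition, which is why the sketch keeps the values as individual bits.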

RESOURCES
Resources
=========

Protocol specifications are maintained on github:
@@ -158,4 +183,3 @@ http://plan9.bell-labs.com/plan9

For information on Plan 9 from User Space (Plan 9 applications and libraries
ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
@@ -1,3 +1,9 @@
.. SPDX-License-Identifier: GPL-2.0

===============================
Acorn Disc Filing System - ADFS
===============================

Filesystems supported by ADFS
-----------------------------

@@ -25,6 +31,7 @@ directory updates, specifically updating the access mode and timestamp.
Mount options for ADFS
----------------------

============ ======================================================
uid=nnn      All files in the partition will be owned by
             user id nnn. Default 0 (root).
gid=nnn      All files in the partition will be in group
@@ -36,22 +43,23 @@ Mount options for ADFS
ftsuffix=n   When ftsuffix=0, no file type suffix will be applied.
             When ftsuffix=1, a hexadecimal suffix corresponding to
             the RISC OS file type will be added. Default 0.
============ ======================================================

Mapping of ADFS permissions to Linux permissions
------------------------------------------------

ADFS permissions consist of the following:

Owner read
Owner write
Other read
Other write
- Owner read
- Owner write
- Other read
- Other write

(In older versions, an 'execute' permission did exist, but this
does not hold the same meaning as the Linux 'execute' permission
and is now obsolete).
does not hold the same meaning as the Linux 'execute' permission
and is now obsolete).

The mapping is performed as follows:
The mapping is performed as follows::

Owner read -> -r--r--r--
Owner write -> --w--w--w-
@@ -66,17 +74,18 @@ Mapping of ADFS permissions to Linux permissions
Possible other mode permissions -> ----rwxrwx

Hence, with the default masks, if a file is owner read/write, and
not a UnixExec filetype, then the permissions will be:
not a UnixExec filetype, then the permissions will be::

-rw-------

However, if the masks were ownmask=0770,othmask=0007, then this would
be modified to:
be modified to::

-rw-rw----

There is no restriction on what you can do with these masks. You may
wish that either read bits give read access to the file for all, but
keep the default write protection (ownmask=0755,othmask=0577):
keep the default write protection (ownmask=0755,othmask=0577)::

-rw-r--r--
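The three worked examples above can be reproduced with a small Python sketch. It rests on stated assumptions: the function name `adfs_to_linux_mode` is hypothetical, the default masks 0700 and 0077 are taken to be what "the default masks" refers to, the "other" read and write bits are assumed to expand the same way as the owner bits, and only files that are not of the UnixExec filetype are covered.

```python
import stat


def adfs_to_linux_mode(owner_read, owner_write, other_read, other_write,
                       ownmask=0o700, othmask=0o077):
    """Map ADFS permission flags to Linux mode bits (non-UnixExec files).

    Each ADFS read flag expands to r--r--r-- and each write flag to
    -w--w--w-; the owner bits are then filtered through ownmask and the
    other bits through othmask, and the two results are combined.
    """
    owner_bits = (0o444 if owner_read else 0) | (0o222 if owner_write else 0)
    other_bits = (0o444 if other_read else 0) | (0o222 if other_write else 0)
    return (owner_bits & ownmask) | (other_bits & othmask)


def show(mode):
    """Render mode bits in 'ls -l' style for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    # Owner read/write file under three different mask settings.
    print(show(adfs_to_linux_mode(True, True, False, False)))
    print(show(adfs_to_linux_mode(True, True, False, False, 0o770, 0o007)))
    print(show(adfs_to_linux_mode(True, True, False, False, 0o755, 0o577)))
```

The last call shows why ownmask=0755 grants read access to everyone while keeping write protection: the owner's read bit survives in all three positions, but the write bit survives only in the user position.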
@@ -1,9 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0

=============================
Overview of Amiga Filesystems
=============================

Not all varieties of the Amiga filesystems are supported for reading and
writing. The Amiga currently knows six different filesystems:

============== ===============================================================
DOS\0          The old or original filesystem, not really suited for
               hard disks and normally not used on them, either.
               Supported read/write.
@@ -23,6 +27,7 @@ DOS\4 The original filesystem with directory cache. The directory
               sense on hard disks. Supported read only.

DOS\5          The Fast File System with directory cache. Supported read only.
============== ===============================================================

All of the above filesystems allow block sizes from 512 to 32K bytes.
Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
@@ -36,14 +41,18 @@ are supported, too.
Mount options for the AFFS
==========================

protect If this option is set, the protection bits cannot be altered.
protect
If this option is set, the protection bits cannot be altered.

setuid[=uid] This sets the owner of all files and directories in the file
setuid[=uid]
This sets the owner of all files and directories in the file
system to uid or the uid of the current user, respectively.

setgid[=gid] Same as above, but for gid.
setgid[=gid]
Same as above, but for gid.

mode=mode Sets the mode flags to the given (octal) value, regardless
mode=mode
Sets the mode flags to the given (octal) value, regardless
of the original permissions. Directories will get an x
permission if the corresponding r bit is set.
This is useful since most of the plain AmigaOS files
@@ -53,33 +62,41 @@ nofilenametruncate
The file system will return an error when filename exceeds
standard maximum filename length (30 characters).

reserved=num Sets the number of reserved blocks at the start of the
reserved=num
Sets the number of reserved blocks at the start of the
partition to num. You should never need this option.
Default is 2.

root=block Sets the block number of the root block. This should never
root=block
Sets the block number of the root block. This should never
be necessary.

bs=blksize Sets the blocksize to blksize. Valid block sizes are 512,
bs=blksize
Sets the blocksize to blksize. Valid block sizes are 512,
1024, 2048 and 4096. Like the root option, this should
never be necessary, as the affs can figure it out itself.

quiet The file system will not return an error for disallowed
quiet
The file system will not return an error for disallowed
mode changes.

verbose The volume name, file system type and block size will
verbose
The volume name, file system type and block size will
be written to the syslog when the filesystem is mounted.

mufs The filesystem is really a muFS, also it doesn't
mufs
The filesystem is really a muFS, also it doesn't
identify itself as one. This option is necessary if
the filesystem wasn't formatted as muFS, but is used
as one.

prefix=path Path will be prefixed to every absolute path name of
prefix=path
Path will be prefixed to every absolute path name of
symbolic links on an AFFS partition. Default = "/".
(See below.)

volume=name When symbolic links with an absolute path are created
volume=name
When symbolic links with an absolute path are created
on an AFFS partition, name will be prepended as the
volume name. Default = "" (empty string).
(See below.)
@@ -119,7 +136,7 @@ The Linux rwxrwxrwx file mode is handled as follows:

- All other flags (suid, sgid, ...) are ignored and will
not be retained.


Newly created files and directories will get the user and group ID
of the current user and a mode according to the umask.

@@ -148,11 +165,13 @@ might be "User", "WB" and "Graphics", the mount points /amiga/User,
Examples
========

Command line:
Command line::

mount Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
mount /dev/sda3 /Amiga -t affs

/etc/fstab entry:
/etc/fstab entry::

/dev/sdb5 /amiga/Workbench affs noauto,user,exec,verbose 0 0

IMPORTANT NOTE
@@ -170,7 +189,8 @@ before booting Windows!

If the damage is already done, the following should fix the RDB
(where <disk> is the device name).
DO AT YOUR OWN RISK:

DO AT YOUR OWN RISK::

dd if=/dev/<disk> of=rdb.tmp count=1
cp rdb.tmp rdb.fixed
@@ -189,10 +209,14 @@ By default, filenames are truncated to 30 characters without warning.
'nofilenametruncate' mount option can change that behavior.

Case is ignored by the affs in filename matching, but Linux shells
do care about the case. Example (with /wb being an affs mounted fs):
do care about the case. Example (with /wb being an affs mounted fs)::

rm /wb/WRONGCASE
will remove /mnt/wrongcase, but

will remove /mnt/wrongcase, but::

rm /wb/WR*

will not since the names are matched by the shell.
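The difference between the two `rm` commands above comes down to who does the matching. The Python sketch below models it under simplifying assumptions: `affs_lookup` and `shell_glob` are hypothetical helpers, and plain `str.lower()` stands in for whatever case folding the filesystem really applies.

```python
import fnmatch


def affs_lookup(entries, name):
    """Case-insensitive name lookup, a simplified stand-in for AFFS."""
    wanted = name.lower()
    return [entry for entry in entries if entry.lower() == wanted]


def shell_glob(entries, pattern):
    """Case-sensitive wildcard expansion, as done by the shell."""
    return [entry for entry in entries if fnmatch.fnmatchcase(entry, pattern)]


if __name__ == "__main__":
    directory = ["wrongcase"]
    # The literal name reaches the filesystem, which ignores case.
    print(affs_lookup(directory, "WRONGCASE"))
    # The wildcard is expanded by the shell first, which does not.
    print(shell_glob(directory, "WR*"))
```

In the first case the filesystem sees the full name and finds the file; in the second the pattern is compared against the stored lowercase name before the filesystem is ever asked, so nothing matches.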

The block allocation is designed for hard disk partitions. If more
@@ -219,4 +243,4 @@ due to an incompatibility with the Amiga floppy controller.

If you are interested in an Amiga Emulator for Linux, look at

http://web.archive.org/web/*/http://www.freiburg.linux.de/~uae/
http://web.archive.org/web/%2E/http://www.freiburg.linux.de/~uae/
@@ -1,8 +1,10 @@
====================
kAFS: AFS FILESYSTEM
====================
.. SPDX-License-Identifier: GPL-2.0

Contents:
====================
kAFS: AFS FILESYSTEM
====================

.. Contents:

- Overview.
- Usage.
@@ -14,8 +16,7 @@ Contents:
- The @sys substitution.


========
OVERVIEW
Overview
========

This filesystem provides a fairly simple secure AFS filesystem driver. It is
@@ -35,35 +36,33 @@ It does not yet support the following AFS features:
(*) pioctl() system call.


===========
COMPILATION
Compilation
===========

The filesystem should be enabled by turning on the kernel configuration
options:
options::

CONFIG_AF_RXRPC - The RxRPC protocol transport
CONFIG_RXKAD - The RxRPC Kerberos security handler
CONFIG_AFS - The AFS filesystem

Additionally, the following can be turned on to aid debugging:
Additionally, the following can be turned on to aid debugging::

CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled
CONFIG_AFS_DEBUG - Permit AFS debugging to be enabled

They permit the debugging messages to be turned on dynamically by manipulating
the masks in the following files:
the masks in the following files::

/sys/module/af_rxrpc/parameters/debug
/sys/module/kafs/parameters/debug


=====
USAGE
Usage
=====

When inserting the driver modules the root cell must be specified along with a
list of volume location server IP addresses:
list of volume location server IP addresses::

modprobe rxrpc
modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
@@ -77,14 +76,14 @@ The second module is the kerberos RxRPC security driver, and the third module
is the actual filesystem driver for the AFS filesystem.

Once the module has been loaded, more modules can be added by the following
procedure:
procedure::

echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells

Where the parameters to the "add" command are the name of a cell and a list of
volume location servers within that cell, with the latter separated by colons.

Filesystems can be mounted anywhere by commands similar to the following:
Filesystems can be mounted anywhere by commands similar to the following::

mount -t afs "%cambridge.redhat.com:root.afs." /afs
mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
@@ -104,8 +103,7 @@ named volume will be looked up in the cell specified during modprobe.
Additional cells can be added through /proc (see later section).


===========
MOUNTPOINTS
Mountpoints
===========

AFS has a concept of mountpoints. In AFS terms, these are specially formatted
@@ -123,42 +121,40 @@ culled first. If all are culled, then the requested volume will also be
unmounted, otherwise error EBUSY will be returned.

This can be used by the administrator to attempt to unmount the whole AFS tree
mounted on /afs in one go by doing:
mounted on /afs in one go by doing::

umount /afs


============
DYNAMIC ROOT
Dynamic Root
============

A mount option is available to create a serverless mount that is only usable
for dynamic lookup. Creating such a mount can be done by, for example:
for dynamic lookup. Creating such a mount can be done by, for example::

mount -t afs none /afs -o dyn

This creates a mount that just has an empty directory at the root. Attempting
to look up a name in this directory will cause a mountpoint to be created that
looks up a cell of the same name, for example:
looks up a cell of the same name, for example::

ls /afs/grand.central.org/


===============
PROC FILESYSTEM
Proc Filesystem
===============

The AFS modules creates a "/proc/fs/afs/" directory and populates it:

(*) A "cells" file that lists cells currently known to the afs module and
their usage counts:
their usage counts::

[root@andromeda ~]# cat /proc/fs/afs/cells
USE NAME
3 cambridge.redhat.com

(*) A directory per cell that contains files that list volume location
servers, volumes, and active servers known within that cell.
servers, volumes, and active servers known within that cell::

[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
USE ADDR STATE
@@ -171,8 +167,7 @@ The AFS modules creates a "/proc/fs/afs/" directory and populates it:
1 Val 20000000 20000001 20000002 root.afs


=================
THE CELL DATABASE
The Cell Database
=================

The filesystem maintains an internal database of all the cells it knows and the
@@ -181,7 +176,7 @@ the system belongs is added to the database when modprobe is performed by the
"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
the kernel command line.

Further cells can be added by commands similar to the following:
Further cells can be added by commands similar to the following::

echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
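The `add CELLNAME VLADDR[:VLADDR]...` syntax above can be generated programmatically. The Python sketch below is one possible way to do it; `format_add_cell` and `add_cell` are hypothetical names, and restricting addresses to IPv4 is an assumption based on the examples shown, since the colon is already used as the separator.

```python
import ipaddress


def format_add_cell(cell_name, vl_addresses):
    """Build the 'add' command line for /proc/fs/afs/cells."""
    if not cell_name or any(ch.isspace() for ch in cell_name):
        raise ValueError("cell name must be non-empty and contain no whitespace")
    if not vl_addresses:
        raise ValueError("at least one volume location server is required")
    # IPv4Address raises a ValueError subclass on malformed input.
    addrs = [str(ipaddress.IPv4Address(addr)) for addr in vl_addresses]
    return f"add {cell_name} {':'.join(addrs)}"


def add_cell(cell_name, vl_addresses, cells_path="/proc/fs/afs/cells"):
    """Write the command to the cells file (needs root and a loaded module)."""
    with open(cells_path, "w") as f:
        f.write(format_add_cell(cell_name, vl_addresses) + "\n")


if __name__ == "__main__":
    print(format_add_cell(
        "grand.central.org",
        ["18.9.48.14", "128.2.203.61", "130.237.48.87"],
    ))
```

Keeping the formatting separate from the write makes it possible to check the command string without touching /proc.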
@@ -189,8 +184,7 @@ Further cells can be added by commands similar to the following:
No other cell database operations are available at this time.


========
SECURITY
Security
========

Secure operations are initiated by acquiring a key using the klog program. A
@@ -198,17 +192,17 @@ very primitive klog program is available at:

http://people.redhat.com/~dhowells/rxrpc/klog.c

This should be compiled by:
This should be compiled by::

make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

And then run as:
And then run as::

./klog

Assuming it's successful, this adds a key of type RxRPC, named for the service
and cell, eg: "afs@<cellname>". This can be viewed with the keyctl program or
by cat'ing /proc/keys:
by cat'ing /proc/keys::

[root@andromeda ~]# keyctl show
Session Keyring
@@ -232,20 +226,19 @@ socket), then the operations on the file will be made with key that was used to
open the file.


=====================
THE @SYS SUBSTITUTION
The @sys Substitution
=====================

The list of up to 16 @sys substitutions for the current network namespace can
be configured by writing a list to /proc/fs/afs/sysname:
be configured by writing a list to /proc/fs/afs/sysname::

[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname

or cleared entirely by writing an empty list:
or cleared entirely by writing an empty list::

[root@andromeda ~]# echo >/proc/fs/afs/sysname
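The two `echo` commands above write a space-separated list, or just a newline to clear it. A minimal Python sketch of the same formatting, with the 16-entry limit from the text enforced up front, might look like this; `format_sysname_list` and `MAX_SYSNAMES` are hypothetical names, and rejecting names that contain whitespace is an assumption that follows from the space-separated format.

```python
MAX_SYSNAMES = 16  # upper bound stated in the text


def format_sysname_list(names):
    """Return the line to write to /proc/fs/afs/sysname.

    An empty list yields a bare newline, matching 'echo >' with no
    arguments, which clears the substitution list.
    """
    names = list(names)
    if len(names) > MAX_SYSNAMES:
        raise ValueError(f"at most {MAX_SYSNAMES} substitutions are allowed")
    for name in names:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid sysname: {name!r}")
    return " ".join(names) + "\n"


if __name__ == "__main__":
    # Print the line that would be written for the example above.
    print(format_sysname_list(["foo", "amd64_linux_26"]), end="")
```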

The current list for current network namespace can be retrieved by:
The current list for current network namespace can be retrieved by::

[root@andromeda ~]# cat /proc/fs/afs/sysname
foo
@@ -1,4 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0

====================================================================
Miscellaneous Device control operations for the autofs kernel module
====================================================================

@@ -36,24 +38,24 @@ For example, there are two types of automount maps, direct (in the kernel
module source you will see a third type called an offset, which is just
a direct mount in disguise) and indirect.

Here is a master map with direct and indirect map entries:
Here is a master map with direct and indirect map entries::

/- /etc/auto.direct
/test /etc/auto.indirect
/- /etc/auto.direct
/test /etc/auto.indirect

and the corresponding map files:
and the corresponding map files::

/etc/auto.direct:
/etc/auto.direct:

/automount/dparse/g6 budgie:/autofs/export1
/automount/dparse/g1 shark:/autofs/export1
and so on.
/automount/dparse/g6 budgie:/autofs/export1
/automount/dparse/g1 shark:/autofs/export1
and so on.

/etc/auto.indirect:
/etc/auto.indirect::

g1 shark:/autofs/export1
g6 budgie:/autofs/export1
and so on.
g1 shark:/autofs/export1
g6 budgie:/autofs/export1
and so on.

For the above indirect map an autofs file system is mounted on /test and
mounts are triggered for each sub-directory key by the inode lookup
@@ -69,23 +71,23 @@ use the follow_link inode operation to trigger the mount.
But, each entry in direct and indirect maps can have offsets (making
them multi-mount map entries).

For example, an indirect mount map entry could also be:
For example, an indirect mount map entry could also be::

g1 \
        / shark:/autofs/export5/testing/test \
        /s1 shark:/autofs/export/testing/test/s1 \
        /s2 shark:/autofs/export5/testing/test/s2 \
        /s1/ss1 shark:/autofs/export1 \
        /s2/ss2 shark:/autofs/export2
g1 \
        / shark:/autofs/export5/testing/test \
        /s1 shark:/autofs/export/testing/test/s1 \
        /s2 shark:/autofs/export5/testing/test/s2 \
        /s1/ss1 shark:/autofs/export1 \
        /s2/ss2 shark:/autofs/export2

and a similarly a direct mount map entry could also be:
and a similarly a direct mount map entry could also be::

/automount/dparse/g1 \
        / shark:/autofs/export5/testing/test \
        /s1 shark:/autofs/export/testing/test/s1 \
        /s2 shark:/autofs/export5/testing/test/s2 \
        /s1/ss1 shark:/autofs/export2 \
        /s2/ss2 shark:/autofs/export2
/automount/dparse/g1 \
        / shark:/autofs/export5/testing/test \
        /s1 shark:/autofs/export/testing/test/s1 \
        /s2 shark:/autofs/export5/testing/test/s2 \
        /s1/ss1 shark:/autofs/export2 \
        /s2/ss2 shark:/autofs/export2

One of the issues with version 4 of autofs was that, when mounting an
entry with a large number of offsets, possibly with nesting, we needed
@@ -170,32 +172,32 @@ autofs Miscellaneous Device mount control interface
The control interface is opening a device node, typically /dev/autofs.

All the ioctls use a common structure to pass the needed parameter
information and return operation results:
information and return operation results::

struct autofs_dev_ioctl {
        __u32 ver_major;
        __u32 ver_minor;
        __u32 size; /* total size of data passed in
                     * including this struct */
        __s32 ioctlfd; /* automount command fd */
struct autofs_dev_ioctl {
        __u32 ver_major;
        __u32 ver_minor;
        __u32 size; /* total size of data passed in
                     * including this struct */
        __s32 ioctlfd; /* automount command fd */

        /* Command parameters */
        union {
                struct args_protover protover;
                struct args_protosubver protosubver;
                struct args_openmount openmount;
                struct args_ready ready;
                struct args_fail fail;
                struct args_setpipefd setpipefd;
                struct args_timeout timeout;
                struct args_requester requester;
                struct args_expire expire;
                struct args_askumount askumount;
                struct args_ismountpoint ismountpoint;
        };
        /* Command parameters */
        union {
                struct args_protover protover;
                struct args_protosubver protosubver;
                struct args_openmount openmount;
                struct args_ready ready;
                struct args_fail fail;
                struct args_setpipefd setpipefd;
                struct args_timeout timeout;
                struct args_requester requester;
                struct args_expire expire;
                struct args_askumount askumount;
                struct args_ismountpoint ismountpoint;
        };

        char path[0];
};
        char path[0];
};

The ioctlfd field is a mount point file descriptor of an autofs mount
point. It is returned by the open call and is used by all calls except
@@ -212,7 +214,7 @@ is used account for the increased structure length when translating the
structure sent from user space.

This structure can be initialized before setting specific fields by using
the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
the void function call init_autofs_dev_ioctl(``struct autofs_dev_ioctl *``).
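To show how the structure and its `size` field fit together, here is a hedged Python sketch that packs an equivalent byte buffer. Several things in it are assumptions not stated in the text: the version numbers, the 8-byte size of the argument union, the absence of padding between fields, and the use of -1 as the initial `ioctlfd`. The function name `build_autofs_dev_ioctl` is hypothetical, and the real initialization is done in C by `init_autofs_dev_ioctl()`.

```python
import struct

# Assumed values; check them against the kernel headers before relying on them.
VERSION_MAJOR = 1
VERSION_MINOR = 1
UNION_SIZE = 8

# ver_major, ver_minor, size, ioctlfd, then the raw bytes of the union.
HEADER = struct.Struct(f"=IIIi{UNION_SIZE}s")


def build_autofs_dev_ioctl(path=b"", ioctlfd=-1, args=b""):
    """Pack a buffer laid out like struct autofs_dev_ioctl plus its path.

    The size field covers the fixed part and the trailing NUL-terminated
    path, matching "total size of data passed in including this struct".
    """
    if len(args) > UNION_SIZE:
        raise ValueError("command arguments do not fit in the union")
    path_bytes = path + b"\0" if path else b""
    size = HEADER.size + len(path_bytes)
    # struct pads the 8s field with NUL bytes when args is shorter.
    header = HEADER.pack(VERSION_MAJOR, VERSION_MINOR, size, ioctlfd, args)
    return header + path_bytes


if __name__ == "__main__":
    buf = build_autofs_dev_ioctl(b"/test")
    print(len(buf), buf.hex())
```

A buffer whose `size` field is smaller than the fixed part would be rejected, so the sketch always derives `size` from the packed layout rather than taking it as a parameter.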

All of the ioctls perform a copy of this structure from user space to
kernel space and return -EINVAL if the size parameter is smaller than
@@ -1,48 +1,54 @@
.. SPDX-License-Identifier: GPL-2.0

=========================
BeOS filesystem for Linux
=========================

Document last updated: Dec 6, 2001

WARNING
Warning
=======
Make sure you understand that this is alpha software. This means that the
implementation is neither complete nor well-tested.
implementation is neither complete nor well-tested.

I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!

LICENSE
=====
This software is covered by the GNU General Public License.
License
=======
This software is covered by the GNU General Public License.
See the file COPYING for the complete text of the license.
Or the GNU website: <http://www.gnu.org/licenses/licenses.html>

AUTHOR
=====
Author
======
The largest part of the code written by Will Dyson <will_dyson@pobox.com>
He has been working on the code since Aug 13, 2001. See the changelog for
details.

Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>

His original code can still be found at:
<http://hp.vector.co.jp/authors/VA008030/bfs/>

Does anyone know of a more current email address for Makoto? He doesn't
respond to the address given above...

This filesystem doesn't have a maintainer.

WHAT IS THIS DRIVER?
==================
This module implements the native filesystem of BeOS http://www.beincorporated.com/
What is this Driver?
====================
This module implements the native filesystem of BeOS http://www.beincorporated.com/
for the linux 2.4.1 and later kernels. Currently it is a read-only
implementation.

Which is it, BFS or BEFS?
================
Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS".
=========================
Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS".
But Unixware Boot Filesystem is called bfs, too. And they are already in
the kernel. Because of this naming conflict, on Linux the BeOS
filesystem is called befs.

HOW TO INSTALL
How to Install
==============
step 1. Install the BeFS patch into the source code tree of linux.

@ -54,16 +60,16 @@ is called patch-befs-xxx, you would do the following:
|
||||
patch -p1 < /path/to/patch-befs-xxx
|
||||
|
||||
If the patching step fails (i.e. there are rejected hunks), you can try to
|
||||
figure it out yourself (it shouldn't be hard), or mail the maintainer
|
||||
figure it out yourself (it shouldn't be hard), or mail the maintainer
|
||||
(Will Dyson <will_dyson@pobox.com>) for help.
|
||||
|
||||
step 2. Configuration & make kernel
|
||||
|
||||
The linux kernel has many compile-time options. Most of them are beyond the
|
||||
scope of this document. I suggest the Kernel-HOWTO document as a good general
|
||||
reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
|
||||
reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
|
||||
|
||||
However, to use the BeFS module, you must enable it at configure time.
|
||||
However, to use the BeFS module, you must enable it at configure time::
|
||||
|
||||
cd /foo/bar/linux
|
||||
make menuconfig (or xconfig)
|
||||
@ -82,35 +88,40 @@ step 3. Install
|
||||
See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
|
||||
instructions on this critical step.
|
||||
|
||||
USING BFS
|
||||
Using BFS
|
||||
=========
|
||||
To use the BeOS filesystem, use filesystem type 'befs'.
|
||||
|
||||
ex)
|
||||
ex::
|
||||
|
||||
mount -t befs /dev/fd0 /beos
|
||||
|
||||
MOUNT OPTIONS
|
||||
Mount Options
|
||||
=============
|
||||
|
||||
============= ===========================================================
|
||||
uid=nnn All files in the partition will be owned by user id nnn.
|
||||
gid=nnn All files in the partition will be in group nnn.
|
||||
iocharset=xxx Use xxx as the name of the NLS translation table.
|
||||
debug The driver will output debugging information to the syslog.
|
||||
============= ===========================================================
|
||||
|
||||
HOW TO GET LATEST VERSION
|
||||
How to Get Latest Version
|
||||
==========================
|
||||
|
||||
The latest version is currently available at:
|
||||
<http://befs-driver.sourceforge.net/>
|
||||
|
||||
ANY KNOWN BUGS?
|
||||
===========
|
||||
Any Known Bugs?
|
||||
===============
|
||||
As of Jan 20, 2002:
|
||||
|
||||
|
||||
None
|
||||
|
||||
SPECIAL THANKS
|
||||
Special Thanks
|
||||
==============
|
||||
Dominic Giampaolo ... Writing "Practical File System Design with the Be File System"
|
||||
|
||||
Hiroyuki Yamada ... Testing LinuxPPC.
|
||||
|
||||
|
@ -1,4 +1,7 @@
|
||||
BFS FILESYSTEM FOR LINUX
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================
|
||||
BFS Filesystem for Linux
|
||||
========================
|
||||
|
||||
The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
|
||||
@ -9,22 +12,22 @@ In order to access /stand partition under Linux you obviously need to
|
||||
know the partition number and the kernel must support UnixWare disk slices
|
||||
(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
|
||||
depend on having UnixWare disklabel support because one can also mount
|
||||
BFS filesystem via loopback:
|
||||
BFS filesystem via loopback::
|
||||
|
||||
# losetup /dev/loop0 stand.img
|
||||
# mount -t bfs /dev/loop0 /mnt/stand
|
||||
# losetup /dev/loop0 stand.img
|
||||
# mount -t bfs /dev/loop0 /mnt/stand
|
||||
|
||||
where stand.img is a file containing the image of BFS filesystem.
|
||||
where stand.img is a file containing the image of BFS filesystem.
|
||||
When you have finished using it and umounted you need to also deallocate
|
||||
/dev/loop0 device by:
|
||||
/dev/loop0 device by::
|
||||
|
||||
# losetup -d /dev/loop0
|
||||
# losetup -d /dev/loop0
|
||||
|
||||
You can simplify mounting by just typing:
|
||||
You can simplify mounting by just typing::
|
||||
|
||||
# mount -t bfs -o loop stand.img /mnt/stand
|
||||
# mount -t bfs -o loop stand.img /mnt/stand
|
||||
|
||||
this will allocate the first available loopback device (and load loop.o
|
||||
this will allocate the first available loopback device (and load loop.o
|
||||
kernel module if necessary) automatically. If the loopback driver is not
|
||||
loaded automatically, make sure that you have compiled the module and
|
||||
that modprobe is functioning. Beware that umount will not deallocate
|
||||
@ -33,21 +36,21 @@ that modprobe is functioning. Beware that umount will not deallocate
|
||||
losetup(8). Read losetup(8) manpage for more info.
|
||||
|
||||
To create the BFS image under UnixWare you need to find out first which
|
||||
slice contains it. The command prtvtoc(1M) is your friend:
|
||||
slice contains it. The command prtvtoc(1M) is your friend::
|
||||
|
||||
# prtvtoc /dev/rdsk/c0b0t0d0s0
|
||||
# prtvtoc /dev/rdsk/c0b0t0d0s0
|
||||
|
||||
(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
|
||||
look for the slice with tag "STAND", which is usually slice 10. With this
|
||||
information you can use dd(1) to create the BFS image:
|
||||
information you can use dd(1) to create the BFS image::
|
||||
|
||||
# umount /stand
|
||||
# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
|
||||
# umount /stand
|
||||
# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
|
||||
|
||||
Just in case, you can verify that you have done the right thing by checking
|
||||
the magic number:
|
||||
the magic number::
|
||||
|
||||
# od -Ad -tx4 stand.img | more
|
||||
# od -Ad -tx4 stand.img | more
|
||||
|
||||
The first 4 bytes should be 0x1badface.
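The magic-number check above can also be done programmatically. The following is a minimal sketch, not part of the BFS driver: the helper names are hypothetical, and it assumes the magic is stored little-endian on disk, which is what the od(1) output shows on an x86 host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number quoted in the text above. */
#define BFS_MAGIC_EXPECTED 0x1badfaceU

/* Decode a 32-bit little-endian value from four raw bytes (assumed layout). */
static uint32_t bfs_read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return 1 if the first four bytes of the image
 * match the BFS magic number, 0 otherwise (including short buffers).
 */
static int bfs_image_has_magic(const unsigned char *image, size_t len)
{
	assert(image != NULL);
	if (len < 4)
		return 0;
	return bfs_read_le32(image) == BFS_MAGIC_EXPECTED;
}
```

A caller would read the first four bytes of stand.img into a buffer and pass them to bfs_image_has_magic() before attempting the mount.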
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====
|
||||
BTRFS
|
||||
=====
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================
|
||||
Ceph Distributed File System
|
||||
============================
|
||||
|
||||
@ -15,6 +18,7 @@ Basic features include:
|
||||
* Easy deployment: most FS components are userspace daemons
|
||||
|
||||
Also,
|
||||
|
||||
* Flexible snapshots (on any directory)
|
||||
* Recursive accounting (nested files, directories, bytes)
|
||||
|
||||
@ -63,7 +67,7 @@ no 'du' or similar recursive scan of the file system is required.
|
||||
Finally, Ceph also allows quotas to be set on any directory in the system.
|
||||
The quota can restrict the number of bytes or the number of files stored
|
||||
beneath that point in the directory hierarchy. Quotas can be set using
|
||||
extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg:
|
||||
extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg::
|
||||
|
||||
setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
|
||||
getfattr -n ceph.quota.max_bytes /some/dir
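The same quota can be set from a program through the generic setxattr(2) system call. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the attribute value is the decimal text that setfattr -v would pass, without a trailing NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper: render a limit as decimal text.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int ceph_quota_format(unsigned long long limit, char *buf, size_t buflen)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "%llu", limit);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}

/*
 * Hypothetical helper: set ceph.quota.max_bytes on a directory.
 * Returns 0 on success, -1 on failure (errno is set by setxattr).
 */
static int ceph_quota_set_max_bytes(const char *dir, unsigned long long limit)
{
	char value[32];
	int len = ceph_quota_format(limit, value, sizeof(value));

	if (len < 0)
		return -1;
	/* Assumption: the value is the decimal text, without a trailing NUL. */
	return setxattr(dir, "ceph.quota.max_bytes", value, (size_t)len, 0);
}
```

Calling ceph_quota_set_max_bytes("/some/dir", 100000000ULL) is intended to mirror the setfattr command above; it only has an effect on a directory inside a mounted Ceph filesystem.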
|
||||
@ -76,7 +80,7 @@ from writing as much data as it needs.
|
||||
Mount Syntax
|
||||
============
|
||||
|
||||
The basic mount syntax is:
|
||||
The basic mount syntax is::
|
||||
|
||||
# mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
|
||||
|
||||
@ -84,7 +88,7 @@ You only need to specify a single monitor, as the client will get the
|
||||
full list when it connects. (However, if the monitor you specify
|
||||
happens to be down, the mount won't succeed.) The port can be left
|
||||
off if the monitor is using the default. So if the monitor is at
|
||||
1.2.3.4,
|
||||
1.2.3.4::
|
||||
|
||||
# mount -t ceph 1.2.3.4:/ /mnt/ceph
|
||||
|
||||
@ -163,14 +167,14 @@ Mount Options
|
||||
available modes are "no" and "clean". The default is "no".
|
||||
|
||||
* no: never attempt to reconnect when client detects that it has been
|
||||
blacklisted. Operations will generally fail after being blacklisted.
|
||||
blacklisted. Operations will generally fail after being blacklisted.
|
||||
|
||||
* clean: client reconnects to the ceph cluster automatically when it
|
||||
detects that it has been blacklisted. During reconnect, client drops
|
||||
dirty data/metadata, invalidates page caches and writable file handles.
|
||||
After reconnect, file locks become stale because the MDS loses track
|
||||
of them. If an inode contains any stale file locks, read/write on the
|
||||
inode is not allowed until applications release all stale file locks.
|
||||
detects that it has been blacklisted. During reconnect, client drops
|
||||
dirty data/metadata, invalidates page caches and writable file handles.
|
||||
After reconnect, file locks become stale because the MDS loses track
|
||||
of them. If an inode contains any stale file locks, read/write on the
|
||||
inode is not allowed until applications release all stale file locks.
|
||||
|
||||
More Information
|
||||
================
|
||||
@ -179,8 +183,8 @@ For more information on Ceph, see the home page at
|
||||
https://ceph.com/
|
||||
|
||||
The Linux kernel client source tree is available at
|
||||
https://github.com/ceph/ceph-client.git
|
||||
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
|
||||
- https://github.com/ceph/ceph-client.git
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
|
||||
|
||||
and the source for the full system is at
|
||||
https://github.com/ceph/ceph.git
|
@ -1,12 +1,15 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Cramfs - cram a filesystem onto a small ROM
|
||||
===========================================
|
||||
Cramfs - cram a filesystem onto a small ROM
|
||||
===========================================
|
||||
|
||||
cramfs is designed to be simple and small, and to compress things well.
|
||||
cramfs is designed to be simple and small, and to compress things well.
|
||||
|
||||
It uses the zlib routines to compress a file one page at a time, and
|
||||
allows random page access. The meta-data is not compressed, but is
|
||||
expressed in a very terse representation to make it use much less
|
||||
diskspace than traditional filesystems.
|
||||
diskspace than traditional filesystems.
|
||||
|
||||
You can't write to a cramfs filesystem (making it compressible and
|
||||
compact also makes it _very_ hard to update on-the-fly), so you have to
|
||||
@ -28,9 +31,9 @@ issue.
|
||||
Hard links are supported, but hard linked files
|
||||
will still have a link count of 1 in the cramfs image.
|
||||
|
||||
Cramfs directories have no `.' or `..' entries. Directories (like
|
||||
Cramfs directories have no ``.`` or ``..`` entries. Directories (like
|
||||
every other file on cramfs) always have a link count of 1. (There's
|
||||
no need to use -noleaf in `find', btw.)
|
||||
no need to use -noleaf in ``find``, btw.)
|
||||
|
||||
No timestamps are stored in a cramfs, so these default to the epoch
|
||||
(1970 GMT). Recently-accessed files may have updated timestamps, but
|
||||
@ -70,9 +73,9 @@ MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
|
||||
(Flash device in physical memory map). MTD partitions based on such devices
|
||||
are fine too. Then that device should be specified with the "mtd:" prefix
|
||||
as the mount device argument. For example, to mount the MTD device named
|
||||
"fs_partition" on the /mnt directory:
|
||||
"fs_partition" on the /mnt directory::
|
||||
|
||||
$ mount -t cramfs mtd:fs_partition /mnt
|
||||
$ mount -t cramfs mtd:fs_partition /mnt
|
||||
|
||||
To boot a kernel with this as the root filesystem, it suffices to specify
|
||||
something like "root=mtd:fs_partition" on the kernel command line.
|
||||
@ -90,6 +93,7 @@ https://github.com/npitre/cramfs-tools
|
||||
For /usr/share/magic
|
||||
--------------------
|
||||
|
||||
===== ======================= =======================
|
||||
0 ulelong 0x28cd3d45 Linux cramfs offset 0
|
||||
>4 ulelong x size %d
|
||||
>8 ulelong x flags 0x%x
|
||||
@ -110,6 +114,7 @@ For /usr/share/magic
|
||||
>552 ulelong x fsid.blocks %d
|
||||
>556 ulelong x fsid.files %d
|
||||
>560 string >\0 name "%.16s"
|
||||
===== ======================= =======================
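The first three rows of the magic table (magic at offset 0, size at offset 4, flags at offset 8, all little-endian 32-bit values) can be decoded as sketched below. The struct and function names are hypothetical and only the fields listed in the table are read; this is an illustration, not the kernel's superblock definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value from the table above, stored as a little-endian 32-bit word. */
#define CRAMFS_MAGIC_LE 0x28cd3d45U

/* Hypothetical container for the two fields that follow the magic. */
struct cramfs_header_fields {
	uint32_t size;
	uint32_t flags;
};

/* Decode a 32-bit little-endian value from four raw bytes. */
static uint32_t cramfs_read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 1 and fill *out if buf starts with the cramfs magic,
 * otherwise return 0 and leave *out untouched.
 */
static int cramfs_parse_header(const unsigned char *buf, size_t len,
			       struct cramfs_header_fields *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 12)
		return 0;
	if (cramfs_read_le32(buf) != CRAMFS_MAGIC_LE)
		return 0;
	out->size = cramfs_read_le32(buf + 4);
	out->flags = cramfs_read_le32(buf + 8);
	return 1;
}
```

This only covers an image whose superblock sits at offset 0 and is stored little-endian, as the table describes.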
|
||||
|
||||
|
||||
Hacker Notes
|
@ -1,4 +1,11 @@
|
||||
Copyright 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=======
|
||||
DebugFS
|
||||
=======
|
||||
|
||||
Copyright |copy| 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
|
||||
Debugfs exists as a simple way for kernel developers to make information
|
||||
available to user space. Unlike /proc, which is only meant for information
|
||||
@ -6,11 +13,11 @@ about a process, or sysfs, which has strict one-value-per-file rules,
|
||||
debugfs has no rules at all. Developers can put any information they want
|
||||
there. The debugfs filesystem is also intended to not serve as a stable
|
||||
ABI to user space; in theory, there are no stability constraints placed on
|
||||
files exported there. The real world is not always so simple, though [1];
|
||||
files exported there. The real world is not always so simple, though [1]_;
|
||||
even debugfs interfaces are best designed with the idea that they will need
|
||||
to be maintained forever.
|
||||
|
||||
Debugfs is typically mounted with a command like:
|
||||
Debugfs is typically mounted with a command like::
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
@ -23,7 +30,7 @@ Note that the debugfs API is exported GPL-only to modules.
|
||||
|
||||
Code using debugfs should include <linux/debugfs.h>. Then, the first order
|
||||
of business will be to create at least one directory to hold a set of
|
||||
debugfs files:
|
||||
debugfs files::
|
||||
|
||||
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
|
||||
|
||||
@ -36,7 +43,7 @@ something went wrong. If ERR_PTR(-ENODEV) is returned, that is an
|
||||
indication that the kernel has been built without debugfs support and none
|
||||
of the functions described below will work.
|
||||
|
||||
The most general way to create a file within a debugfs directory is with:
|
||||
The most general way to create a file within a debugfs directory is with::
|
||||
|
||||
struct dentry *debugfs_create_file(const char *name, umode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
@ -53,7 +60,7 @@ ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
|
||||
missing.
|
||||
|
||||
To create a file with an initial size, the following function can be used
|
||||
instead:
|
||||
instead::
|
||||
|
||||
struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
@ -66,7 +73,7 @@ as the function debugfs_create_file.
|
||||
In a number of cases, the creation of a set of file operations is not
|
||||
actually necessary; the debugfs code provides a number of helper functions
|
||||
for simple situations. Files containing a single integer value can be
|
||||
created with any of:
|
||||
created with any of::
|
||||
|
||||
void debugfs_create_u8(const char *name, umode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
@ -80,7 +87,7 @@ created with any of:
|
||||
These files support both reading and writing the given value; if a specific
|
||||
file should not be written to, simply set the mode bits accordingly. The
|
||||
values in these files are in decimal; if hexadecimal is more appropriate,
|
||||
the following functions can be used instead:
|
||||
the following functions can be used instead::
|
||||
|
||||
void debugfs_create_x8(const char *name, umode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
@ -94,7 +101,7 @@ the following functions can be used instead:
|
||||
These functions are useful as long as the developer knows the size of the
|
||||
value to be exported. Some types can have different widths on different
|
||||
architectures, though, complicating the situation somewhat. There are
|
||||
functions meant to help out in such special cases:
|
||||
functions meant to help out in such special cases::
|
||||
|
||||
void debugfs_create_size_t(const char *name, umode_t mode,
|
||||
struct dentry *parent, size_t *value);
|
||||
@ -103,7 +110,7 @@ As might be expected, this function will create a debugfs file to represent
|
||||
a variable of type size_t.
|
||||
|
||||
Similarly, there are helpers for variables of type unsigned long, in decimal
|
||||
and hexadecimal:
|
||||
and hexadecimal::
|
||||
|
||||
struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
|
||||
struct dentry *parent,
|
||||
@ -111,7 +118,7 @@ and hexadecimal:
|
||||
void debugfs_create_xul(const char *name, umode_t mode,
|
||||
struct dentry *parent, unsigned long *value);
|
||||
|
||||
Boolean values can be placed in debugfs with:
|
||||
Boolean values can be placed in debugfs with::
|
||||
|
||||
struct dentry *debugfs_create_bool(const char *name, umode_t mode,
|
||||
struct dentry *parent, bool *value);
|
||||
@ -120,7 +127,7 @@ A read on the resulting file will yield either Y (for non-zero values) or
|
||||
N, followed by a newline. If written to, it will accept either upper- or
|
||||
lower-case values, or 1 or 0. Any other input will be silently ignored.
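From user space, the read format described above ("Y" or "N" followed by a newline) can be parsed as in the following sketch. The function names are hypothetical, and the path passed to the reader is whatever file a given driver happens to create under the debugfs mount point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse the text read from a debugfs boolean file.
 * Stores 1 for "Y\n" and 0 for "N\n"; returns 0 on success, -1 otherwise.
 */
static int debugfs_bool_parse(const char *text, int *value)
{
	assert(text != NULL && value != NULL);
	if (strcmp(text, "Y\n") == 0) {
		*value = 1;
		return 0;
	}
	if (strcmp(text, "N\n") == 0) {
		*value = 0;
		return 0;
	}
	return -1;
}

/*
 * Hypothetical helper: read and parse a debugfs boolean file.
 * Returns 0 on success, -1 if the file cannot be read or parsed.
 */
static int debugfs_bool_read(const char *path, int *value)
{
	char line[8];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (f == NULL)
		return -1;
	if (fgets(line, sizeof(line), f) != NULL)
		ret = debugfs_bool_parse(line, value);
	fclose(f);
	return ret;
}
```

The parser is deliberately strict about reads; writes accept more forms (upper or lower case, or 1 and 0), as noted above.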
|
||||
|
||||
Also, atomic_t values can be placed in debugfs with:
|
||||
Also, atomic_t values can be placed in debugfs with::
|
||||
|
||||
void debugfs_create_atomic_t(const char *name, umode_t mode,
|
||||
struct dentry *parent, atomic_t *value)
|
||||
@ -129,7 +136,7 @@ A read of this file will get atomic_t values, and a write of this file
|
||||
will set atomic_t values.
|
||||
|
||||
Another option is exporting a block of arbitrary binary data, with
|
||||
this structure and function:
|
||||
this structure and function::
|
||||
|
||||
struct debugfs_blob_wrapper {
|
||||
void *data;
|
||||
@ -151,7 +158,7 @@ If you want to dump a block of registers (something that happens quite
|
||||
often during development, even if little such code reaches mainline),
|
||||
debugfs offers two functions: one to make a registers-only file, and
|
||||
another to insert a register block in the middle of another sequential
|
||||
file.
|
||||
file::
|
||||
|
||||
struct debugfs_reg32 {
|
||||
char *name;
|
||||
@ -175,7 +182,7 @@ The "base" argument may be 0, but you may want to build the reg32 array
|
||||
using __stringify, and a number of register names (macros) are actually
|
||||
byte offsets over a base for the register block.
|
||||
|
||||
If you want to dump a u32 array in debugfs, you can create a file with:
|
||||
If you want to dump a u32 array in debugfs, you can create a file with::
|
||||
|
||||
void debugfs_create_u32_array(const char *name, umode_t mode,
|
||||
struct dentry *parent,
|
||||
@ -185,7 +192,7 @@ The "array" argument provides data, and the "elements" argument is
|
||||
the number of elements in the array. Note: once the array is created, its
|
||||
size cannot be changed.
|
||||
|
||||
There is a helper function to create a device-related seq_file:
|
||||
There is a helper function to create a device-related seq_file::
|
||||
|
||||
struct dentry *debugfs_create_devm_seqfile(struct device *dev,
|
||||
const char *name,
|
||||
@ -197,14 +204,14 @@ The "dev" argument is the device related to this debugfs file, and
|
||||
the "read_fn" is a function pointer which is called to print the
|
||||
seq_file content.
|
||||
|
||||
There are a couple of other directory-oriented helper functions:
|
||||
There are a couple of other directory-oriented helper functions::
|
||||
|
||||
struct dentry *debugfs_rename(struct dentry *old_dir,
|
||||
struct dentry *debugfs_rename(struct dentry *old_dir,
|
||||
struct dentry *old_dentry,
|
||||
struct dentry *new_dir,
|
||||
struct dentry *new_dir,
|
||||
const char *new_name);
|
||||
|
||||
struct dentry *debugfs_create_symlink(const char *name,
|
||||
struct dentry *debugfs_create_symlink(const char *name,
|
||||
struct dentry *parent,
|
||||
const char *target);
|
||||
|
||||
@ -219,7 +226,7 @@ module is unloaded without explicitly removing debugfs entries, the result
|
||||
will be a lot of stale pointers and no end of highly antisocial behavior.
|
||||
So all debugfs users - at least those which can be built as modules - must
|
||||
be prepared to remove all files and directories they create there. A file
|
||||
can be removed with:
|
||||
can be removed with::
|
||||
|
||||
void debugfs_remove(struct dentry *dentry);
|
||||
|
||||
@ -229,7 +236,7 @@ be removed.
|
||||
Once upon a time, debugfs users were required to remember the dentry
|
||||
pointer for every debugfs file they created so that all files could be
|
||||
cleaned up. We live in more civilized times now, though, and debugfs users
|
||||
can call:
|
||||
can call::
|
||||
|
||||
void debugfs_remove_recursive(struct dentry *dentry);
|
||||
|
||||
@ -237,5 +244,4 @@ If this function is passed a pointer for the dentry corresponding to the
|
||||
top-level directory, the entire hierarchy below that directory will be
|
||||
removed.
|
||||
|
||||
Notes:
|
||||
[1] http://lwn.net/Articles/309298/
|
||||
.. [1] http://lwn.net/Articles/309298/
|
@ -1,20 +1,25 @@
|
||||
dlmfs
|
||||
==================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=====
|
||||
DLMFS
|
||||
=====
|
||||
|
||||
A minimal DLM userspace interface implemented via a virtual file
|
||||
system.
|
||||
|
||||
dlmfs is built with OCFS2 as it requires most of its infrastructure.
|
||||
|
||||
Project web page: http://ocfs2.wiki.kernel.org
|
||||
Tools web page: https://github.com/markfasheh/ocfs2-tools
|
||||
OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
:Project web page: http://ocfs2.wiki.kernel.org
|
||||
:Tools web page: https://github.com/markfasheh/ocfs2-tools
|
||||
:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
|
||||
All code copyright 2005 Oracle except when otherwise noted.
|
||||
|
||||
CREDITS
|
||||
Credits
|
||||
=======
|
||||
|
||||
Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
|
||||
Some code taken from ramfs which is Copyright |copy| 2000 Linus Torvalds
|
||||
and Transmeta Corp.
|
||||
|
||||
Mark Fasheh <mark.fasheh@oracle.com>
|
||||
@ -96,14 +101,19 @@ operation. If the lock succeeds, you'll get an fd.
|
||||
open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
|
||||
not automatically create inodes for existing lock resources.
|
||||
|
||||
============ ===========================
|
||||
Open Flag Lock Request Type
|
||||
--------- -----------------
|
||||
============ ===========================
|
||||
O_RDONLY Shared Read
|
||||
O_RDWR Exclusive
|
||||
============ ===========================
|
||||
|
||||
|
||||
============ ===========================
|
||||
Open Flag Resulting Locking Behavior
|
||||
--------- --------------------------
|
||||
============ ===========================
|
||||
O_NONBLOCK Trylock operation
|
||||
============ ===========================
|
||||
|
||||
You must provide exactly one of O_RDONLY or O_RDWR.
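The two tables above map directly onto open(2) flags. The sketch below shows one way to assemble them; the enum, the function names and the lock path are hypothetical, and O_CREAT is always included so that the resource inode is created if it does not yet exist.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical names for the two lock request types in the table above. */
enum dlmfs_lock_mode {
	DLMFS_SHARED_READ,	/* requested with O_RDONLY */
	DLMFS_EXCLUSIVE		/* requested with O_RDWR */
};

/* Build open(2) flags: exactly one access mode, optional trylock. */
static int dlmfs_open_flags(enum dlmfs_lock_mode mode, int trylock)
{
	int flags = O_CREAT;

	assert(mode == DLMFS_SHARED_READ || mode == DLMFS_EXCLUSIVE);
	flags |= (mode == DLMFS_EXCLUSIVE) ? O_RDWR : O_RDONLY;
	if (trylock)
		flags |= O_NONBLOCK;	/* trylock operation */
	return flags;
}

/*
 * Hypothetical helper: take a lock by opening the resource file.
 * Returns the file descriptor, or -1 on failure (errno set by open).
 */
static int dlmfs_lock(const char *path, enum dlmfs_lock_mode mode, int trylock)
{
	return open(path, dlmfs_open_flags(mode, trylock), 0600);
}
```

The lock is held for as long as the returned descriptor stays open, so the caller keeps the descriptor and closes it to release the lock.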
|
||||
|
@ -1,14 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================================
|
||||
eCryptfs: A stacked cryptographic filesystem for Linux
|
||||
======================================================
|
||||
|
||||
eCryptfs is free software. Please see the file COPYING for details.
|
||||
For documentation, please see the files in the doc/ subdirectory. For
|
||||
building and installation instructions please see the INSTALL file.
|
||||
|
||||
Maintainer: Phillip Hellewell
|
||||
Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
|
||||
Developers: Michael C. Thompson
|
||||
Kent Yoder
|
||||
Web Site: http://ecryptfs.sf.net
|
||||
:Maintainer: Phillip Hellewell
|
||||
:Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
|
||||
:Developers: Michael C. Thompson
|
||||
Kent Yoder
|
||||
:Web Site: http://ecryptfs.sf.net
|
||||
|
||||
This software is currently undergoing development. Make sure to
|
||||
maintain a backup copy of any data you write into eCryptfs.
|
||||
@ -19,13 +23,15 @@ SourceForge site:
|
||||
http://sourceforge.net/projects/ecryptfs/
|
||||
|
||||
Userspace requirements include:
|
||||
- David Howells' userspace keyring headers and libraries (version
|
||||
1.0 or higher), obtainable from
|
||||
http://people.redhat.com/~dhowells/keyutils/
|
||||
- Libgcrypt
|
||||
|
||||
- David Howells' userspace keyring headers and libraries (version
|
||||
1.0 or higher), obtainable from
|
||||
http://people.redhat.com/~dhowells/keyutils/
|
||||
- Libgcrypt
|
||||
|
||||
|
||||
NOTES
|
||||
Notes
|
||||
=====
|
||||
|
||||
In the beta/experimental releases of eCryptfs, when you upgrade
|
||||
eCryptfs, you should copy the files to an unencrypted location and
|
||||
@ -33,20 +39,21 @@ then copy the files back into the new eCryptfs mount to migrate the
|
||||
files.
|
||||
|
||||
|
||||
MOUNT-WIDE PASSPHRASE
|
||||
Mount-wide Passphrase
|
||||
=====================
|
||||
|
||||
Create a new directory into which eCryptfs will write its encrypted
|
||||
files (e.g., /root/crypt). Then, create the mount point directory
|
||||
(e.g., /mnt/crypt). Now it's time to mount eCryptfs:
|
||||
(e.g., /mnt/crypt). Now it's time to mount eCryptfs::
|
||||
|
||||
mount -t ecryptfs /root/crypt /mnt/crypt
|
||||
mount -t ecryptfs /root/crypt /mnt/crypt
|
||||
|
||||
You should be prompted for a passphrase and a salt (the salt may be
|
||||
blank).
|
||||
|
||||
Try writing a new file:
|
||||
Try writing a new file::
|
||||
|
||||
echo "Hello, World" > /mnt/crypt/hello.txt
|
||||
echo "Hello, World" > /mnt/crypt/hello.txt
|
||||
|
||||
The operation will complete. Notice that there is a new file in
|
||||
/root/crypt that is at least 12288 bytes in size (depending on your
|
||||
@ -59,10 +66,13 @@ keyctl clear @u
|
||||
Then umount /mnt/crypt and mount again per the instructions given
|
||||
above.
|
||||
|
||||
cat /mnt/crypt/hello.txt
|
||||
::
|
||||
|
||||
cat /mnt/crypt/hello.txt
|
||||
|
||||
|
||||
NOTES
|
||||
Notes
|
||||
=====
|
||||
|
||||
eCryptfs version 0.1 should only be mounted on (1) empty directories
|
||||
or (2) directories containing files only created by eCryptfs. If you
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================================
|
||||
efivarfs - a (U)EFI variable filesystem
|
||||
=======================================
|
||||
|
||||
The efivarfs filesystem was created to address the shortcomings of
|
||||
using entries in sysfs to maintain EFI variables. The old sysfs EFI
|
||||
@ -11,7 +14,7 @@ than a single page, sysfs isn't the best interface for this.
|
||||
Variables can be created, deleted and modified with the efivarfs
|
||||
filesystem.
|
||||
|
||||
efivarfs is typically mounted like this,
|
||||
efivarfs is typically mounted like this::
|
||||
|
||||
mount -t efivarfs none /sys/firmware/efi/efivars
|
||||
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================
|
||||
Enhanced Read-Only File System - EROFS
|
||||
======================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
@ -6,6 +12,7 @@ from other read-only file systems, it aims to be designed for flexibility,
|
||||
scalability, but be kept simple and high performance.
|
||||
|
||||
It is designed as a better filesystem solution for the following scenarios:
|
||||
|
||||
- read-only storage media or
|
||||
|
||||
- part of a fully trusted read-only solution, which means it needs to be
|
||||
@ -17,6 +24,7 @@ It is designed as a better filesystem solution for the following scenarios:
|
||||
for those embedded devices with limited memory (ex, smartphone);
|
||||
|
||||
Here are the main features of EROFS:
|
||||
|
||||
- Little endian on-disk design;
|
||||
|
||||
- Currently 4KB block size (nobh) and therefore maximum 16TB address space;
|
||||
@ -24,13 +32,17 @@ Here is the main features of EROFS:
|
||||
- Metadata & data could be mixed by design;
|
||||
|
||||
- 2 inode versions for different requirements:
|
||||
|
||||
===================== ============ =====================================
|
||||
compact (v1) extended (v2)
|
||||
Inode metadata size: 32 bytes 64 bytes
|
||||
Max file size: 4 GB 16 EB (also limited by max. vol size)
|
||||
Max uids/gids: 65536 4294967296
|
||||
File change time: no yes (64 + 32-bit timestamp)
|
||||
Max hardlinks: 65536 4294967296
|
||||
Metadata reserved: 4 bytes 14 bytes
|
||||
===================== ============ =====================================
|
||||
Inode metadata size 32 bytes 64 bytes
|
||||
Max file size 4 GB 16 EB (also limited by max. vol size)
|
||||
Max uids/gids 65536 4294967296
|
||||
File change time no yes (64 + 32-bit timestamp)
|
||||
Max hardlinks 65536 4294967296
|
||||
Metadata reserved 4 bytes 14 bytes
|
||||
===================== ============ =====================================
|
||||
|
||||
- Support extended attributes (xattrs) as an option;
|
||||
|
||||
@ -43,29 +55,36 @@ Here is the main features of EROFS:
|
||||
|
||||
The following git tree provides the file system user-space tools under
|
||||
development (ex, formatting tool mkfs.erofs):
|
||||
>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
|
||||
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
|
||||
|
||||
Bug reports and patches are welcome; please send them to the following
|
||||
linux-erofs mailing list:
|
||||
>> linux-erofs mailing list <linux-erofs@lists.ozlabs.org>
|
||||
|
||||
- linux-erofs mailing list <linux-erofs@lists.ozlabs.org>
|
||||
|
||||
Mount options
|
||||
=============
|
||||
|
||||
=================== =========================================================
|
||||
(no)user_xattr Setup Extended User Attributes. Note: xattr is enabled
|
||||
by default if CONFIG_EROFS_FS_XATTR is selected.
|
||||
(no)acl Setup POSIX Access Control List. Note: acl is enabled
|
||||
by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
|
||||
cache_strategy=%s Select a strategy for cached decompression from now on:
|
||||
disabled: In-place I/O decompression only;
|
||||
readahead: Cache the last incomplete compressed physical
|
||||
|
||||
========== =============================================
|
||||
disabled In-place I/O decompression only;
|
||||
readahead Cache the last incomplete compressed physical
|
||||
cluster for further reading. It still does
|
||||
in-place I/O decompression for the rest
|
||||
compressed physical clusters;
|
||||
readaround: Cache the both ends of incomplete compressed
|
||||
readaround Cache the both ends of incomplete compressed
|
||||
physical clusters for further reading.
|
||||
It still does in-place I/O decompression
|
||||
for the rest compressed physical clusters.
|
||||
========== =============================================
|
||||
=================== =========================================================
|
||||
|
||||
On-disk details
|
||||
===============
|
||||
@ -73,7 +92,7 @@ On-disk details
|
||||
Summary
|
||||
-------
|
||||
Different from other read-only file systems, an EROFS volume is designed
|
||||
to be as simple as possible:
|
||||
to be as simple as possible::
|
||||
|
||||
|-> aligned with the block size
|
||||
____________________________________________________________
|
||||
@ -83,41 +102,45 @@ to be as simple as possible:
|
||||
|
||||
All data areas should be aligned with the block size, but metadata areas
|
||||
may not. All metadata can now be observed in two different spaces (views):
|
||||
|
||||
1. Inode metadata space
|
||||
|
||||
Each valid inode should be aligned with an inode slot, which is a fixed
|
||||
value (32 bytes) and designed to be kept in line with compact inode size.
|
||||
|
||||
Each inode can be directly found with the following formula:
|
||||
inode offset = meta_blkaddr * block_size + 32 * nid
|
||||
|
||||
|-> aligned with 8B
|
||||
|-> followed closely
|
||||
+ meta_blkaddr blocks |-> another slot
|
||||
_____________________________________________________________________
|
||||
| ... | inode | xattrs | extents | data inline | ... | inode ...
|
||||
|________|_______|(optional)|(optional)|__(optional)_|_____|__________
|
||||
|-> aligned with the inode slot size
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
.____________________________________________________|-> aligned with 4B
|
||||
| xattr_ibody_header | shared xattrs | inline xattrs |
|
||||
|____________________|_______________|_______________|
|
||||
|-> 12 bytes <-|->x * 4 bytes<-| .
|
||||
. . .
|
||||
. . .
|
||||
. . .
|
||||
._______________________________.______________________.
|
||||
| id | id | id | id | ... | id | ent | ... | ent| ... |
|
||||
|____|____|____|____|______|____|_____|_____|____|_____|
|
||||
|-> aligned with 4B
|
||||
|-> aligned with 4B
|
||||
::
|
||||
|
||||
|-> aligned with 8B
|
||||
|-> followed closely
|
||||
+ meta_blkaddr blocks |-> another slot
|
||||
_____________________________________________________________________
|
||||
| ... | inode | xattrs | extents | data inline | ... | inode ...
|
||||
|________|_______|(optional)|(optional)|__(optional)_|_____|__________
|
||||
|-> aligned with the inode slot size
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
.____________________________________________________|-> aligned with 4B
|
||||
| xattr_ibody_header | shared xattrs | inline xattrs |
|
||||
|____________________|_______________|_______________|
|
||||
|-> 12 bytes <-|->x * 4 bytes<-| .
|
||||
. . .
|
||||
. . .
|
||||
. . .
|
||||
._______________________________.______________________.
|
||||
| id | id | id | id | ... | id | ent | ... | ent| ... |
|
||||
|____|____|____|____|______|____|_____|_____|____|_____|
|
||||
|-> aligned with 4B
|
||||
|-> aligned with 4B
|
||||
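The inode lookup formula above can be sketched as a one-line helper. This is only an illustration of the arithmetic; the function name and the sample values (metadata starting at block 1, 4096-byte blocks, nid 36) are assumptions, not taken from the EROFS sources.

```python
INODE_SLOT_SIZE = 32  # bytes; the fixed inode slot size stated in the text


def inode_offset(meta_blkaddr: int, block_size: int, nid: int) -> int:
    """Byte offset of inode `nid`, counted from the start of the volume."""
    return meta_blkaddr * block_size + INODE_SLOT_SIZE * nid


# Hypothetical volume: metadata area starts at block 1, blocks are 4096 bytes.
print(inode_offset(meta_blkaddr=1, block_size=4096, nid=36))  # 4096 + 32 * 36 = 5248
```

Because the slot size is fixed, no per-inode index is needed: the nid alone locates the inode.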
|
||||
Inode could be 32 or 64 bytes, which can be distinguished from a common
|
||||
field which all inode versions have -- i_format:
|
||||
field which all inode versions have -- i_format::
|
||||
|
||||
__________________ __________________
|
||||
| i_format | | i_format |
|
||||
@ -132,16 +155,19 @@ may not. All metadatas can be now observed in two different spaces (views):
|
||||
proper alignment, and they could be optional for different data mappings.
|
||||
Currently, a total of 4 valid data mappings are supported:
|
||||
|
||||
== ====================================================================
|
||||
0 flat file data without data inline (no extent);
|
||||
1 fixed-sized output data compression (with non-compacted indexes);
|
||||
2 flat file data with tail packing data inline (no extent);
|
||||
3 fixed-sized output data compression (with compacted indexes, v5.3+).
|
||||
== ====================================================================
|
||||
|
||||
The size of the optional xattrs is indicated by i_xattr_count in inode
|
||||
header. Large xattrs or xattrs shared by many different files can be
|
||||
stored in shared xattrs metadata rather than inlined right after inode.
|
||||
|
||||
2. Shared xattrs metadata space
|
||||
|
||||
Shared xattrs space is similar to the above inode space, started with
|
||||
a specific block indicated by xattr_blkaddr, organized one by one with
|
||||
proper alignment.
|
||||
@ -149,11 +175,13 @@ may not. All metadatas can be now observed in two different spaces (views):
|
||||
Each shared xattr can also be directly found by the following formula:
|
||||
xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
|
||||
|
||||
|-> aligned by 4 bytes
|
||||
+ xattr_blkaddr blocks |-> aligned with 4 bytes
|
||||
_________________________________________________________________________
|
||||
| ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ...
|
||||
|________|_____________|_____________|_____|______________|_______________
|
||||
::
|
||||
|
||||
|-> aligned by 4 bytes
|
||||
+ xattr_blkaddr blocks |-> aligned with 4 bytes
|
||||
_________________________________________________________________________
|
||||
| ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ...
|
||||
|________|_____________|_____________|_____|______________|_______________
|
||||
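The shared xattr formula follows the same pattern as the inode formula, with a 4-byte unit instead of a 32-byte slot. A minimal sketch, with a hypothetical function name and sample values:

```python
XATTR_ID_UNIT = 4  # bytes; shared xattr ids are counted in 4-byte units


def shared_xattr_offset(xattr_blkaddr: int, block_size: int, xattr_id: int) -> int:
    """Byte offset of shared xattr `xattr_id` from the start of the volume."""
    return xattr_blkaddr * block_size + XATTR_ID_UNIT * xattr_id


# Hypothetical volume: shared xattr area starts at block 2, 4096-byte blocks.
print(shared_xattr_offset(xattr_blkaddr=2, block_size=4096, xattr_id=10))  # 8232
```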
|
||||
Directories
|
||||
-----------
|
||||
@ -163,19 +191,21 @@ random file lookup, and all directory entries are _strictly_ recorded in
|
||||
alphabetical order in order to support improved prefix binary search
|
||||
algorithm (could refer to the related source code).
|
||||
|
||||
___________________________
|
||||
/ |
|
||||
/ ______________|________________
|
||||
/ / | nameoff1 | nameoffN-1
|
||||
____________.______________._______________v________________v__________
|
||||
| dirent | dirent | ... | dirent | filename | filename | ... | filename |
|
||||
|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
|
||||
\ ^
|
||||
\ | * could have
|
||||
\ | trailing '\0'
|
||||
\________________________| nameoff0
|
||||
::
|
||||
|
||||
Directory block
|
||||
___________________________
|
||||
/ |
|
||||
/ ______________|________________
|
||||
/ / | nameoff1 | nameoffN-1
|
||||
____________.______________._______________v________________v__________
|
||||
| dirent | dirent | ... | dirent | filename | filename | ... | filename |
|
||||
|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
|
||||
\ ^
|
||||
\ | * could have
|
||||
\ | trailing '\0'
|
||||
\________________________| nameoff0
|
||||
|
||||
Directory block
|
||||
|
||||
Note that apart from the offset of the first filename, nameoff0 also indicates
|
||||
the total number of directory entries in this block since there is no need to
|
||||
@ -184,28 +214,27 @@ introduce another on-disk field at all.
|
||||
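The role of nameoff0 can be illustrated with a small round trip: build a directory block, then recover the entry count from the first dirent alone. This is a sketch under stated assumptions. The 12-byte dirent layout used here (little-endian nid, nameoff, file_type, reserved) is not given in the text above, and both helper functions are hypothetical.

```python
import struct

# Assumed dirent layout (not stated in the text above): little-endian
# nid (u64), nameoff (u16), file_type (u8), reserved (u8) = 12 bytes.
DIRENT = struct.Struct("<QHBB")


def build_dir_block(items, block_size=64):
    """Pack (name, nid) pairs into one block, with names sorted as bytes."""
    items = sorted((name.encode(), nid) for name, nid in items)
    base = DIRENT.size * len(items)  # filenames start right after the dirents
    dirents, names = b"", b""
    for name, nid in items:
        dirents += DIRENT.pack(nid, base + len(names), 0, 0)
        names += name
    return (dirents + names).ljust(block_size, b"\0")


def parse_dir_block(block):
    """Return [(name, nid), ...], deriving the entry count from nameoff0."""
    nameoff0 = DIRENT.unpack_from(block, 0)[1]
    count = nameoff0 // DIRENT.size  # no separate on-disk count field needed
    entries = [DIRENT.unpack_from(block, i * DIRENT.size) for i in range(count)]
    result = []
    for i, (nid, nameoff, _ftype, _reserved) in enumerate(entries):
        end = entries[i + 1][1] if i + 1 < count else len(block)
        # The last filename may be followed by trailing '\0' padding.
        result.append((block[nameoff:end].rstrip(b"\0").decode(), nid))
    return result


block = build_dir_block([("etc", 7), ("bin", 5), ("usr", 9)])
print(parse_dir_block(block))
```

Each name's length is implied by the next entry's nameoff, so only the last name needs the padding check.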
Compression
|
||||
-----------
|
||||
Currently, EROFS supports 4KB fixed-sized output transparent file compression,
|
||||
as illustrated below:
|
||||
as illustrated below::
|
||||
|
||||
|---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
|
||||
clusterofs clusterofs clusterofs
|
||||
| | | logical data
|
||||
_________v_______________________________v_____________________v_______________
|
||||
... | . | | . | | . | ...
|
||||
____|____.________|_____________|________.____|_____________|__.__________|____
|
||||
|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
|
||||
size size size size size
|
||||
. . . .
|
||||
. . . .
|
||||
. . . .
|
||||
_______._____________._____________._____________._____________________
|
||||
... | | | | ... physical data
|
||||
_______|_____________|_____________|_____________|_____________________
|
||||
|-> cluster <-|-> cluster <-|-> cluster <-|
|
||||
size size size
|
||||
|---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
|
||||
clusterofs clusterofs clusterofs
|
||||
| | | logical data
|
||||
_________v_______________________________v_____________________v_______________
|
||||
... | . | | . | | . | ...
|
||||
____|____.________|_____________|________.____|_____________|__.__________|____
|
||||
|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
|
||||
size size size size size
|
||||
. . . .
|
||||
. . . .
|
||||
. . . .
|
||||
_______._____________._____________._____________._____________________
|
||||
... | | | | ... physical data
|
||||
_______|_____________|_____________|_____________|_____________________
|
||||
|-> cluster <-|-> cluster <-|-> cluster <-|
|
||||
size size size
|
||||
|
||||
Currently each on-disk physical cluster can contain 4KB (un)compressed data
|
||||
at most. For each logical cluster, there is a corresponding on-disk index to
|
||||
describe its cluster type, physical cluster address, etc.
|
||||
|
||||
See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
|
||||
The Second Extended Filesystem
|
||||
==============================
|
||||
@ -14,8 +16,9 @@ Options
|
||||
Most defaults are determined by the filesystem superblock, and can be
|
||||
set using tune2fs(8). Kernel-determined defaults are indicated by (*).
|
||||
|
||||
bsddf (*) Makes `df' act like BSD.
|
||||
minixdf Makes `df' act like Minix.
|
||||
==================== === ================================================
|
||||
bsddf (*) Makes ``df`` act like BSD.
|
||||
minixdf Makes ``df`` act like Minix.
|
||||
|
||||
check=none, nocheck (*) Don't do extra checking of bitmaps on mount
|
||||
(check=normal and check=strict options removed)
|
||||
@ -62,6 +65,7 @@ quota, usrquota Enable user disk quota support
|
||||
|
||||
grpquota Enable group disk quota support
|
||||
(requires CONFIG_QUOTA).
|
||||
==================== === ================================================
|
||||
|
||||
The noquota option is silently ignored by ext2.
|
||||
|
||||
@ -294,9 +298,9 @@ respective fsck programs.
|
||||
If you're exceptionally paranoid, there are 3 ways of making metadata
|
||||
writes synchronous on ext2:
|
||||
|
||||
per-file if you have the program source: use the O_SYNC flag to open()
|
||||
per-file if you don't have the source: use "chattr +S" on the file
|
||||
per-filesystem: add the "sync" option to mount (or in /etc/fstab)
|
||||
- per-file if you have the program source: use the O_SYNC flag to open()
|
||||
- per-file if you don't have the source: use "chattr +S" on the file
|
||||
- per-filesystem: add the "sync" option to mount (or in /etc/fstab)
|
||||
|
||||
the first and last are not ext2 specific but do force the metadata to
|
||||
be written synchronously. See also Journaling below.
|
||||
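The first of the three approaches, opening the file with O_SYNC, might look like the following. This is a minimal sketch for a Unix-like system; the helper name and the temporary file are illustrative only.

```python
import os
import tempfile


def write_sync(path: str, data: bytes) -> int:
    """Write `data` to `path` through a descriptor opened with O_SYNC."""
    # With O_SYNC, each write() returns only after the data has been
    # handed to the underlying storage, not merely to the page cache.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "journal.dat")
    written = write_sync(target, b"important record\n")
    print(written, "bytes written synchronously")
```

The other two approaches (chattr +S and the sync mount option) need no change to the program itself.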
@ -316,10 +320,12 @@ Most of these limits could be overcome with slight changes in the on-disk
|
||||
format and using a compatibility flag to signal the format change (at
|
||||
the expense of some compatibility).
|
||||
|
||||
Filesystem block size: 1kB 2kB 4kB 8kB
|
||||
|
||||
File size limit: 16GB 256GB 2048GB 2048GB
|
||||
Filesystem size limit: 2047GB 8192GB 16384GB 32768GB
|
||||
===================== ======= ======= ======= ========
|
||||
Filesystem block size 1kB 2kB 4kB 8kB
|
||||
===================== ======= ======= ======= ========
|
||||
File size limit 16GB 256GB 2048GB 2048GB
|
||||
Filesystem size limit 2047GB 8192GB 16384GB 32768GB
|
||||
===================== ======= ======= ======= ========
|
||||
|
||||
There is a 2.4 kernel limit of 2048GB for a single block device, so no
|
||||
filesystem larger than that can be created at this time. There is also
|
||||
@ -370,19 +376,24 @@ ext4 and journaling.
|
||||
References
|
||||
==========
|
||||
|
||||
======================= ===============================================
|
||||
The kernel source file:/usr/src/linux/fs/ext2/
|
||||
e2fsprogs (e2fsck) http://e2fsprogs.sourceforge.net/
|
||||
Design & Implementation http://e2fsprogs.sourceforge.net/ext2intro.html
|
||||
Journaling (ext3) ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
|
||||
Filesystem Resizing http://ext2resize.sourceforge.net/
|
||||
Compression (*) http://e2compr.sourceforge.net/
|
||||
Compression [1]_ http://e2compr.sourceforge.net/
|
||||
======================= ===============================================
|
||||
|
||||
Implementations for:
|
||||
Windows 95/98/NT/2000 http://www.chrysocome.net/explore2fs
|
||||
Windows 95 (*) http://www.yipton.net/content.html#FSDEXT2
|
||||
DOS client (*) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
OS/2 (+) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
RISC OS client http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
|
||||
|
||||
(*) no longer actively developed/supported (as of Apr 2001)
|
||||
(+) no longer actively developed/supported (as of Mar 2009)
|
||||
======================= ===========================================================
|
||||
Windows 95/98/NT/2000 http://www.chrysocome.net/explore2fs
|
||||
Windows 95 [1]_ http://www.yipton.net/content.html#FSDEXT2
|
||||
DOS client [1]_ ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
OS/2 [2]_ ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||
RISC OS client http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
|
||||
======================= ===========================================================
|
||||
|
||||
.. [1] no longer actively developed/supported (as of Apr 2001)
|
||||
.. [2] no longer actively developed/supported (as of Mar 2009)
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Ext3 Filesystem
|
||||
===============
|
||||
|
@ -1,6 +1,8 @@
|
||||
================================================================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================================
|
||||
WHAT IS Flash-Friendly File System (F2FS)?
|
||||
================================================================================
|
||||
==========================================
|
||||
|
||||
NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
|
||||
been equipped on a variety of systems ranging from mobile to server systems. Since
|
||||
@ -20,14 +22,15 @@ layout, but also for selecting allocation and cleaning algorithms.
|
||||
|
||||
The following git tree provides the file system formatting tool (mkfs.f2fs),
|
||||
a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
|
||||
>> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
|
||||
|
||||
- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
|
||||
|
||||
For reporting bugs and sending patches, please use the following mailing list:
|
||||
>> linux-f2fs-devel@lists.sourceforge.net
|
||||
|
||||
================================================================================
|
||||
BACKGROUND AND DESIGN ISSUES
|
||||
================================================================================
|
||||
- linux-f2fs-devel@lists.sourceforge.net
|
||||
|
||||
Background and Design issues
|
||||
============================
|
||||
|
||||
Log-structured File System (LFS)
|
||||
--------------------------------
|
||||
@ -61,6 +64,7 @@ needs to reclaim these obsolete blocks seamlessly to users. This job is called
|
||||
as a cleaning process.
|
||||
|
||||
The process consists of three operations as follows.
|
||||
|
||||
1. A victim segment is selected through referencing segment usage table.
|
||||
2. It loads parent index structures of all the data in the victim identified by
|
||||
segment summary blocks.
|
||||
@ -71,9 +75,8 @@ This cleaning job may cause unexpected long delays, so the most important goal
|
||||
is to hide the latencies to users. And also definitely, it should reduce the
|
||||
amount of valid data to be moved, and move them quickly as well.
|
||||
|
||||
================================================================================
|
||||
KEY FEATURES
|
||||
================================================================================
|
||||
Key Features
|
||||
============
|
||||
|
||||
Flash Awareness
|
||||
---------------
|
||||
@ -94,10 +97,11 @@ Cleaning Overhead
|
||||
- Support multi-head logs for static/dynamic hot and cold data separation
|
||||
- Introduce adaptive logging for efficient block allocation
|
||||
|
||||
================================================================================
|
||||
MOUNT OPTIONS
|
||||
================================================================================
|
||||
Mount Options
|
||||
=============
|
||||
|
||||
|
||||
====================== ============================================================
|
||||
background_gc=%s Turn on/off cleaning operations, namely garbage
|
||||
collection, triggered in background when I/O subsystem is
|
||||
idle. If background_gc=on, it will turn on the garbage
|
||||
@ -167,7 +171,10 @@ fault_injection=%d Enable fault injection in all supported types with
|
||||
fault_type=%d Support configuring fault injection type, should be
|
||||
enabled with fault_injection option, fault type value
|
||||
is shown below, it supports single or combined type.
|
||||
|
||||
=================== ===========
|
||||
Type_Name Type_Value
|
||||
=================== ===========
|
||||
FAULT_KMALLOC 0x000000001
|
||||
FAULT_KVMALLOC 0x000000002
|
||||
FAULT_PAGE_ALLOC 0x000000004
|
||||
@ -183,6 +190,7 @@ fault_type=%d Support configuring fault injection type, should be
|
||||
FAULT_CHECKPOINT 0x000001000
|
||||
FAULT_DISCARD 0x000002000
|
||||
FAULT_WRITE_IO 0x000004000
|
||||
=================== ===========
|
||||
mode=%s Control block allocation mode which supports "adaptive"
|
||||
and "lfs". In "lfs" mode, there should be no random
|
||||
writes towards main area.
|
||||
@ -219,7 +227,7 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix",
|
||||
non-atomic files likewise "nobarrier" mount option.
|
||||
test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt
|
||||
context. The fake fscrypt context is used by xfstests.
|
||||
checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable"
|
||||
checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable"
|
||||
to reenable checkpointing. Is enabled by default. While
|
||||
disabled, any unmounting or unexpected shutdowns will cause
|
||||
the filesystem contents to appear as they did when the
|
||||
@ -246,22 +254,22 @@ compress_extension=%s Support adding specified extension, so that f2fs can enab
|
||||
on compression extension list and enable compression on
|
||||
these file by default rather than to enable it via ioctl.
|
||||
For other files, we can still enable compression via ioctl.
|
||||
====================== ============================================================
|
||||
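Since fault_type accepts a single or combined type, a combined value is the bitwise OR of the individual type values. The sketch below uses only the type values visible in the table above; the helper name is hypothetical, and printing the mask in decimal is an assumption based on the %d in the option description.

```python
# Subset of the fault type values listed in the table above.
FAULT_TYPES = {
    "FAULT_KMALLOC": 0x000000001,
    "FAULT_KVMALLOC": 0x000000002,
    "FAULT_PAGE_ALLOC": 0x000000004,
    "FAULT_CHECKPOINT": 0x000001000,
    "FAULT_DISCARD": 0x000002000,
    "FAULT_WRITE_IO": 0x000004000,
}


def fault_type_mask(*names: str) -> int:
    """OR together the values of the named fault types."""
    mask = 0
    for name in names:
        mask |= FAULT_TYPES[name]
    return mask


mask = fault_type_mask("FAULT_KMALLOC", "FAULT_CHECKPOINT")
print(f"fault_type={mask}")  # 0x1 | 0x1000 = 0x1001 = 4097
```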
|
||||
================================================================================
|
||||
DEBUGFS ENTRIES
|
||||
================================================================================
|
||||
Debugfs Entries
|
||||
===============
|
||||
|
||||
/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
|
||||
f2fs. Each file shows the whole f2fs information.
|
||||
|
||||
/sys/kernel/debug/f2fs/status includes:
|
||||
|
||||
- major file system information managed by f2fs currently
|
||||
- average SIT information about whole segments
|
||||
- current memory footprint consumed by f2fs.
|
||||
|
||||
================================================================================
|
||||
SYSFS ENTRIES
|
||||
================================================================================
|
||||
Sysfs Entries
|
||||
=============
|
||||
|
||||
Information about mounted f2fs file systems can be found in
|
||||
/sys/fs/f2fs. Each mounted filesystem will have a directory in
|
||||
@ -271,22 +279,24 @@ The files in each per-device directory are shown in table below.
|
||||
Files in /sys/fs/f2fs/<devname>
|
||||
(see also Documentation/ABI/testing/sysfs-fs-f2fs)
|
||||
|
||||
================================================================================
|
||||
USAGE
|
||||
================================================================================
|
||||
Usage
|
||||
=====
|
||||
|
||||
1. Download userland tools and compile them.
|
||||
|
||||
2. Skip, if f2fs was compiled statically inside kernel.
|
||||
Otherwise, insert the f2fs.ko module.
|
||||
# insmod f2fs.ko
|
||||
Otherwise, insert the f2fs.ko module::
|
||||
|
||||
3. Create a directory trying to mount
|
||||
# mkdir /mnt/f2fs
|
||||
# insmod f2fs.ko
|
||||
|
||||
4. Format the block device, and then mount as f2fs
|
||||
# mkfs.f2fs -l label /dev/block_device
|
||||
# mount -t f2fs /dev/block_device /mnt/f2fs
|
||||
3. Create a directory trying to mount::
|
||||
|
||||
# mkdir /mnt/f2fs
|
||||
|
||||
4. Format the block device, and then mount as f2fs::
|
||||
|
||||
# mkfs.f2fs -l label /dev/block_device
|
||||
# mount -t f2fs /dev/block_device /mnt/f2fs
|
||||
|
||||
mkfs.f2fs
|
||||
---------
|
||||
@ -294,18 +304,26 @@ The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
|
||||
which builds a basic on-disk layout.
|
||||
|
||||
The options consist of:
|
||||
-l [label] : Give a volume label, up to 512 unicode name.
|
||||
-a [0 or 1] : Split start location of each area for heap-based allocation.
|
||||
1 is set by default, which performs this.
|
||||
-o [int] : Set overprovision ratio in percent over volume size.
|
||||
5 is set by default.
|
||||
-s [int] : Set the number of segments per section.
|
||||
1 is set by default.
|
||||
-z [int] : Set the number of sections per zone.
|
||||
1 is set by default.
|
||||
-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
|
||||
-t [0 or 1] : Disable discard command or not.
|
||||
1 is set by default, which conducts discard.
|
||||
|
||||
=============== ===========================================================
|
||||
``-l [label]`` Give a volume label, up to 512 unicode name.
|
||||
``-a [0 or 1]`` Split start location of each area for heap-based allocation.
|
||||
|
||||
1 is set by default, which performs this.
|
||||
``-o [int]`` Set overprovision ratio in percent over volume size.
|
||||
|
||||
5 is set by default.
|
||||
``-s [int]`` Set the number of segments per section.
|
||||
|
||||
1 is set by default.
|
||||
``-z [int]`` Set the number of sections per zone.
|
||||
|
||||
1 is set by default.
|
||||
``-e [str]`` Set basic extension list. e.g. "mp3,gif,mov"
|
||||
``-t [0 or 1]`` Disable discard command or not.
|
||||
|
||||
1 is set by default, which conducts discard.
|
||||
=============== ===========================================================
|
||||
|
||||
fsck.f2fs
|
||||
---------
|
||||
@ -314,7 +332,8 @@ partition, which examines whether the filesystem metadata and user-made data
|
||||
are cross-referenced correctly or not.
|
||||
Note that the initial version of the tool does not fix any inconsistency.
|
||||
|
||||
The options consist of:
|
||||
The options consist of::
|
||||
|
||||
-d debug level [default:0]
|
||||
|
||||
dump.f2fs
|
||||
@ -327,20 +346,21 @@ It shows on-disk inode information recognized by a given inode number, and is
|
||||
able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
|
||||
./dump_sit respectively.
|
||||
|
||||
The options consist of:
|
||||
The options consist of::
|
||||
|
||||
-d debug level [default:0]
|
||||
-i inode no (hex)
|
||||
-s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
|
||||
-a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
|
||||
|
||||
Examples:
|
||||
# dump.f2fs -i [ino] /dev/sdx
|
||||
# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
|
||||
# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
|
||||
Examples::
|
||||
|
||||
================================================================================
|
||||
DESIGN
|
||||
================================================================================
|
||||
# dump.f2fs -i [ino] /dev/sdx
|
||||
# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
|
||||
# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
|
||||
|
||||
Design
|
||||
======
|
||||
|
||||
On-disk Layout
|
||||
--------------
|
||||
@ -351,7 +371,7 @@ consists of a set of sections. By default, section and zone sizes are set to one
|
||||
segment size identically, but users can easily modify the sizes by mkfs.
|
||||
|
||||
F2FS splits the entire volume into six areas, and all the areas except superblock
|
||||
consists of multiple segments as described below.
|
||||
consists of multiple segments as described below::
|
||||
|
||||
align with the zone size <-|
|
||||
|-> align with the segment size
|
||||
@ -373,28 +393,28 @@ consists of multiple segments as described below.
|
||||
|__zone__|
|
||||
|
||||
- Superblock (SB)
|
||||
: It is located at the beginning of the partition, and there exist two copies
|
||||
It is located at the beginning of the partition, and there exist two copies
|
||||
to avoid file system crash. It contains basic partition information and some
|
||||
default parameters of f2fs.
|
||||
|
||||
- Checkpoint (CP)
|
||||
: It contains file system information, bitmaps for valid NAT/SIT sets, orphan
|
||||
It contains file system information, bitmaps for valid NAT/SIT sets, orphan
|
||||
inode lists, and summary entries of current active segments.
|
||||
|
||||
- Segment Information Table (SIT)
|
||||
: It contains segment information such as valid block count and bitmap for the
|
||||
It contains segment information such as valid block count and bitmap for the
|
||||
validity of all the blocks.
|
||||
|
||||
- Node Address Table (NAT)
|
||||
: It is composed of a block address table for all the node blocks stored in
|
||||
It is composed of a block address table for all the node blocks stored in
|
||||
Main area.
|
||||
|
||||
- Segment Summary Area (SSA)
|
||||
: It contains summary entries which contains the owner information of all the
|
||||
It contains summary entries which contains the owner information of all the
|
||||
data and node blocks stored in Main area.
|
||||
|
||||
- Main Area
|
||||
: It contains file and directory data including their indices.
|
||||
It contains file and directory data including their indices.
|
||||
|
||||
In order to avoid misalignment between file system and flash-based storage, F2FS
|
||||
aligns the start block address of CP with the segment size. Also, it aligns the
|
||||
@ -414,7 +434,7 @@ One of them always indicates the last valid data, which is called as shadow copy
|
||||
mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
|
||||
|
||||
For file system consistency, each CP points to which NAT and SIT copies are
|
||||
valid, as shown as below.
|
||||
valid, as shown as below::
|
||||
|
||||
+--------+----------+---------+
|
||||
| CP | SIT | NAT |
|
||||
@ -438,7 +458,7 @@ indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
|
||||
indices, two direct node pointers, two indirect node pointers, and one double
|
||||
indirect node pointer as described below. One direct node block contains 1018
|
||||
data blocks, and one indirect node block contains also 1018 node blocks. Thus,
|
||||
one inode block (i.e., a file) covers:
|
||||
one inode block (i.e., a file) covers::
|
||||
|
||||
4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
|
||||
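The 3.94TB figure can be checked by evaluating the expression directly. The constant names below are chosen for readability and are assumptions; the numbers are the ones given in the text.

```python
BLOCK_SIZE = 4096          # 4KB per block
ADDRS_PER_INODE = 923      # data block indices held in the inode block
ADDRS_PER_BLOCK = 1018     # entries per direct or indirect node block


def max_file_blocks() -> int:
    """Data blocks reachable from one inode block."""
    direct = 2 * ADDRS_PER_BLOCK            # two direct node pointers
    indirect = 2 * ADDRS_PER_BLOCK ** 2     # two indirect node pointers
    double_indirect = ADDRS_PER_BLOCK ** 3  # one double indirect node pointer
    return ADDRS_PER_INODE + direct + indirect + double_indirect


blocks = max_file_blocks()
print(blocks)                                   # 1057053439
print(round(blocks * BLOCK_SIZE / 2 ** 40, 2))  # 3.94
```

The double indirect term dominates: it contributes over 99% of the reachable blocks.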
|
||||
@ -473,6 +493,8 @@ A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
|
||||
used to represent whether each dentry is valid or not. A dentry block occupies
|
||||
4KB with the following composition.
|
||||
|
||||
::
|
||||
|
||||
Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
|
||||
dentries(11 * 214 bytes) + file name (8 * 214 bytes)
|
||||
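The composition above adds up to exactly one 4KB block, and the bitmap size follows from needing one bit per dentry slot. A quick check, with constant names chosen here for illustration:

```python
NR_DENTRY_IN_BLOCK = 214   # dentry slots per block
DENTRY_SIZE = 11           # bytes per dentry
SLOT_NAME_LEN = 8          # bytes of file name per slot
RESERVED = 3               # reserved bytes

# One validity bit per slot, rounded up to whole bytes.
bitmap_bytes = (NR_DENTRY_IN_BLOCK + 7) // 8

total = (bitmap_bytes + RESERVED
         + DENTRY_SIZE * NR_DENTRY_IN_BLOCK
         + SLOT_NAME_LEN * NR_DENTRY_IN_BLOCK)

print(bitmap_bytes)  # 27
print(total)         # 4096
```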
|
||||
@ -498,23 +520,25 @@ F2FS implements multi-level hash tables for directory structure. Each level has
|
||||
a hash table with dedicated number of hash buckets as shown below. Note that
|
||||
"A(2B)" means a bucket includes 2 data blocks.
|
||||
|
||||
----------------------
|
||||
A : bucket
|
||||
B : block
|
||||
N : MAX_DIR_HASH_DEPTH
|
||||
----------------------
|
||||
::
|
||||
|
||||
level #0 | A(2B)
|
||||
|
|
||||
level #1 | A(2B) - A(2B)
|
||||
|
|
||||
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
|
||||
. | . . . .
|
||||
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
|
||||
. | . . . .
|
||||
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
|
||||
----------------------
|
||||
A : bucket
|
||||
B : block
|
||||
N : MAX_DIR_HASH_DEPTH
|
||||
----------------------
|
||||
|
||||
The number of blocks and buckets are determined by,
|
||||
level #0 | A(2B)
|
||||
|
|
||||
level #1 | A(2B) - A(2B)
|
||||
|
|
||||
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
|
||||
. | . . . .
|
||||
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
|
||||
. | . . . .
|
||||
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
|
||||
|
||||
The number of blocks and buckets are determined by::
|
||||
|
||||
,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
|
||||
# of blocks in level #n = |
|
||||
@ -532,7 +556,7 @@ dentry consisting of the file name and its inode number. If not found, F2FS
|
||||
scans the next hash table in level #1. In this way, F2FS scans hash tables in
|
||||
each level incrementally from 1 to N. In each level, F2FS needs to scan only
|
||||
one bucket determined by the following equation, which shows O(log(# of files))
|
||||
complexity.
|
||||
complexity::
|
||||
|
||||
bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
|
||||
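The per-level bucket selection can be sketched as follows. The bucket counts are passed in by the caller because the formula for them is not reproduced here; the list used in the example is purely hypothetical, as are the function names.

```python
def bucket_to_scan(hash_value: int, buckets_in_level: int) -> int:
    """Bucket index to scan in one level, per the equation above."""
    return hash_value % buckets_in_level


def lookup_plan(hash_value: int, buckets_per_level):
    """List (level, bucket) pairs in the order a lookup would visit them."""
    return [(level, bucket_to_scan(hash_value, nbuckets))
            for level, nbuckets in enumerate(buckets_per_level)]


# Hypothetical bucket counts for levels 0..3.
print(lookup_plan(0x2B, [1, 2, 4, 8]))  # [(0, 0), (1, 1), (2, 3), (3, 3)]
```

Only one bucket per level is visited, which is why the cost grows with the number of levels rather than with the number of entries.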
|
||||
@ -540,7 +564,8 @@ In the case of file creation, F2FS finds empty consecutive slots that cover the
|
||||
file name. F2FS searches the empty slots in the hash tables of all levels from
|
||||
1 to N in the same way as the lookup operation.
|
||||
|
||||
The following figure shows an example of two cases holding children.
|
||||
The following figure shows an example of two cases holding children::
|
||||
|
||||
--------------> Dir <--------------
|
||||
| |
|
||||
child child
|
||||
@ -611,14 +636,15 @@ Write-hint Policy
|
||||
2) whint_mode=user-based. F2FS tries to pass down hints given by
|
||||
users.
|
||||
|
||||
===================== ======================== ===================
|
||||
User F2FS Block
|
||||
---- ---- -----
|
||||
===================== ======================== ===================
|
||||
META WRITE_LIFE_NOT_SET
|
||||
HOT_NODE "
|
||||
WARM_NODE "
|
||||
COLD_NODE "
|
||||
*ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
||||
*extension list " "
|
||||
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
||||
extension list " "
|
||||
|
||||
-- buffered io
|
||||
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
||||
@ -635,11 +661,13 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
||||
WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
||||
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
||||
WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
||||
===================== ======================== ===================
|
||||
|
||||
3) whint_mode=fs-based. F2FS passes down hints with its policy.
|
||||
|
||||
===================== ======================== ===================
|
||||
User F2FS Block
|
||||
---- ---- -----
|
||||
===================== ======================== ===================
|
||||
META WRITE_LIFE_MEDIUM;
|
||||
HOT_NODE WRITE_LIFE_NOT_SET
|
||||
WARM_NODE "
|
||||
@ -662,6 +690,7 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
||||
WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
||||
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
||||
WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
||||
===================== ======================== ===================
|
||||
|
||||
Fallocate(2) Policy
|
||||
-------------------
|
||||
@ -681,6 +710,7 @@ Allocating disk space
|
||||
However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) prior to
|
||||
fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses having
|
||||
zero or random data, which is useful to the below scenario where:
|
||||
|
||||
1. create(fd)
|
||||
2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
|
||||
3. fallocate(fd, 0, 0, size)
|
||||
@ -692,39 +722,41 @@ Compression implementation
|
||||
--------------------------
|
||||
|
||||
- New term named cluster is defined as basic unit of compression, file can
|
||||
be divided into multiple clusters logically. One cluster includes 4 << n
|
||||
(n >= 0) logical pages, compression size is also cluster size, each of
|
||||
cluster can be compressed or not.
|
||||
be divided into multiple clusters logically. One cluster includes 4 << n
|
||||
(n >= 0) logical pages, compression size is also cluster size, each of
|
||||
cluster can be compressed or not.
|
||||
|
||||
- In cluster metadata layout, one special block address is used to indicate
|
||||
cluster is compressed one or normal one, for compressed cluster, following
|
||||
metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
|
||||
stores data including compress header and compressed data.
|
||||
cluster is compressed one or normal one, for compressed cluster, following
|
||||
metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
|
||||
stores data including compress header and compressed data.
|
||||
|
||||
- In order to eliminate write amplification during overwrite, F2FS only
|
||||
support compression on write-once file, data can be compressed only when
|
||||
all logical blocks in file are valid and cluster compress ratio is lower
|
||||
than specified threshold.
|
||||
support compression on write-once file, data can be compressed only when
|
||||
all logical blocks in file are valid and cluster compress ratio is lower
|
||||
than specified threshold.
|
||||
|
||||
- To enable compression on regular inode, there are three ways:
|
||||
* chattr +c file
|
||||
* chattr +c dir; touch dir/file
|
||||
* mount w/ -o compress_extension=ext; touch file.ext
|
||||
|
||||
Compress metadata layout:
|
||||
[Dnode Structure]
|
||||
+-----------------------------------------------+
|
||||
| cluster 1 | cluster 2 | ......... | cluster N |
|
||||
+-----------------------------------------------+
|
||||
. . . .
|
||||
. . . .
|
||||
. Compressed Cluster . . Normal Cluster .
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
|
||||
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|
||||
. .
|
||||
. .
|
||||
. .
|
||||
+-------------+-------------+----------+----------------------------+
|
||||
| data length | data chksum | reserved | compressed data |
|
||||
+-------------+-------------+----------+----------------------------+
|
||||
* chattr +c file
|
||||
* chattr +c dir; touch dir/file
|
||||
* mount w/ -o compress_extension=ext; touch file.ext
|
||||
|
||||
Compress metadata layout::

                                      [Dnode Structure]
                      +-----------------------------------------------+
                      | cluster 1 | cluster 2 | ......... | cluster N |
                      +-----------------------------------------------+
                      .           .                       .           .
             .                          .            .                         .
    .           Compressed Cluster           .  .            Normal Cluster             .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
            .                           .
        .                                               .
    .                                                                   .
    +-------------+-------------+----------+----------------------------+
    | data length | data chksum | reserved |      compressed data       |
    +-------------+-------------+----------+----------------------------+
|
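As a rough illustration of the cluster arithmetic described above, the
following Python sketch computes how many logical pages a cluster holds for a
given ``n`` and the range of physical blocks a compressed cluster may map to.
It reads the upper bound as ``(4 << n) - 1``, which is an assumption about the
intended operator precedence (it matches the layout diagram, where one slot of
a four-slot cluster holds the compress flag). The function names are
illustrative and are not taken from the F2FS sources.

```python
from typing import Tuple


def cluster_pages(n: int) -> int:
    """Number of logical pages in one cluster: 4 << n, with n >= 0."""
    if n < 0:
        raise ValueError("n must be >= 0")
    return 4 << n


def compressed_block_range(n: int) -> Tuple[int, int]:
    """Inclusive range of physical blocks a compressed cluster can occupy.

    Assumes the documented range means [1, (4 << n) - 1]: one slot of the
    cluster is taken by the compress flag, as in the layout diagram.
    """
    return 1, cluster_pages(n) - 1


if __name__ == "__main__":
    for n in range(4):
        low, high = compressed_block_range(n)
        print(f"n={n}: {cluster_pages(n)} pages, {low}..{high} physical blocks")
```

Note that in C-like languages ``4 << n - 1`` would parse as ``4 << (n - 1)``,
so the explicit parentheses in the sketch are deliberate.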
@ -1,14 +1,18 @@
|
||||
uevents and GFS2
|
||||
==================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
uevents and GFS2
|
||||
================
|
||||
|
||||
During the lifetime of a GFS2 mount, a number of uevents are generated.
|
||||
This document explains what the events are and what they are used
|
||||
for (by gfs_controld in gfs2-utils).
|
||||
|
||||
A list of GFS2 uevents
|
||||
-----------------------
|
||||
======================
|
||||
|
||||
1. ADD
|
||||
------
|
||||
|
||||
The ADD event occurs at mount time. It will always be the first
|
||||
uevent generated by the newly created filesystem. If the mount
|
||||
@ -21,6 +25,7 @@ with no journal assigned), and read-only (with journal assigned) status
|
||||
of the filesystem respectively.
|
||||
|
||||
2. ONLINE
|
||||
---------
|
||||
|
||||
The ONLINE uevent is generated after a successful mount or remount. It
|
||||
has the same environment variables as the ADD uevent. The ONLINE
|
||||
@ -29,6 +34,7 @@ RDONLY are a relatively recent addition (2.6.32-rc+) and will not
|
||||
be generated by older kernels.
|
||||
|
||||
3. CHANGE
|
||||
---------
|
||||
|
||||
The CHANGE uevent is used in two places. One is when reporting the
|
||||
successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
|
||||
@ -52,6 +58,7 @@ cluster. For this reason the ONLINE uevent was used when adding a new
|
||||
uevent for a successful mount or remount.
|
||||
|
||||
4. OFFLINE
|
||||
----------
|
||||
|
||||
The OFFLINE uevent is only generated due to filesystem errors and is used
|
||||
as part of the "withdraw" mechanism. Currently this doesn't give any
|
||||
@ -59,6 +66,7 @@ information about what the error is, which is something that needs to
|
||||
be fixed.
|
||||
|
||||
5. REMOVE
|
||||
---------
|
||||
|
||||
The REMOVE uevent is generated at the end of an unsuccessful mount
|
||||
or at the end of a umount of the filesystem. All REMOVE uevents will
|
||||
@ -68,9 +76,10 @@ kobject subsystem.
|
||||
|
||||
|
||||
Information common to all GFS2 uevents (uevent environment variables)
|
||||
----------------------------------------------------------------------
|
||||
=====================================================================
|
||||
|
||||
1. LOCKTABLE=
|
||||
--------------
|
||||
|
||||
The LOCKTABLE is a string, as supplied on the mount command
|
||||
line (locktable=) or via fstab. It is used as a filesystem label
|
||||
@ -78,6 +87,7 @@ as well as providing the information for a lock_dlm mount to be
|
||||
able to join the cluster.
|
||||
|
||||
2. LOCKPROTO=
|
||||
-------------
|
||||
|
||||
The LOCKPROTO is a string, and its value depends on what is set
|
||||
on the mount command line, or via fstab. It will be either
|
||||
@ -85,12 +95,14 @@ lock_nolock or lock_dlm. In the future other lock managers
|
||||
may be supported.
|
||||
|
||||
3. JOURNALID=
|
||||
-------------
|
||||
|
||||
If a journal is in use by the filesystem (journals are not
|
||||
assigned for spectator mounts) then this will give the
|
||||
numeric journal id in all GFS2 uevents.
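A consumer of these uevents typically receives the variables as ``KEY=VALUE``
strings. The Python sketch below shows one way to turn such strings into a
dictionary and to check whether a journal id was supplied. The sample values
(in particular the lock table name) are hypothetical, and treating a missing
``JOURNALID`` as "no journal assigned" is an assumption based on the
description above, not a documented guarantee.

```python
from typing import Dict, Iterable


def parse_uevent_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE strings into a dict, ignoring malformed entries."""
    env: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key:  # keep only well-formed KEY=VALUE pairs
            env[key] = value
    return env


# Hypothetical sample environment; real values depend on the mount.
sample = ["LOCKTABLE=mycluster:myfs", "LOCKPROTO=lock_dlm", "JOURNALID=0"]
env = parse_uevent_env(sample)

# Assumption: no JOURNALID means no journal is in use for this mount.
has_journal = "JOURNALID" in env
```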
|
||||
|
||||
4. UUID=
|
||||
--------
|
||||
|
||||
With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
|
||||
into the filesystem superblock. If it exists, this will
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
Global File System
|
||||
------------------
|
||||
==================
|
||||
|
||||
https://fedorahosted.org/cluster/wiki/HomePage
|
||||
|
||||
@ -14,16 +17,18 @@ on one machine show up immediately on all other machines in the cluster.
|
||||
GFS uses interchangeable inter-node locking mechanisms. The currently
supported mechanisms are:
|
||||
|
||||
lock_nolock -- allows gfs to be used as a local file system
|
||||
lock_nolock
|
||||
- allows gfs to be used as a local file system
|
||||
|
||||
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
|
||||
The dlm is found at linux/fs/dlm/
|
||||
lock_dlm
|
||||
- uses a distributed lock manager (dlm) for inter-node locking.
|
||||
The dlm is found at linux/fs/dlm/
|
||||
|
||||
Lock_dlm depends on user space cluster management systems found
|
||||
at the URL above.
|
||||
|
||||
To use gfs as a local file system, no external clustering systems are
|
||||
needed, simply:
|
||||
needed, simply::
|
||||
|
||||
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
|
||||
$ mount -t gfs2 /dev/block_device /dir
|
||||
@ -37,9 +42,12 @@ GFS2 is not on-disk compatible with previous versions of GFS, but it
|
||||
is pretty close.
|
||||
|
||||
The following man pages can be found at the URL above:
|
||||
|
||||
============ =============================================
|
||||
fsck.gfs2 to repair a filesystem
|
||||
gfs2_grow to expand a filesystem online
|
||||
gfs2_jadd to add journals to a filesystem online
|
||||
tunegfs2 to manipulate, examine and tune a filesystem
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
mkfs.gfs2 to make a filesystem
|
||||
============ =============================================
|
@ -1,11 +1,16 @@
|
||||
Note: This filesystem doesn't have a maintainer.
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================================
|
||||
Macintosh HFS Filesystem for Linux
|
||||
==================================
|
||||
|
||||
HFS stands for ``Hierarchical File System'' and is the filesystem used
|
||||
|
||||
.. Note:: This filesystem doesn't have a maintainer.
|
||||
|
||||
|
||||
HFS stands for ``Hierarchical File System`` and is the filesystem used
|
||||
by the Mac Plus and all later Macintosh models. Earlier Macintosh
|
||||
models used MFS (``Macintosh File System''), which is not supported,
|
||||
models used MFS (``Macintosh File System``), which is not supported,
|
||||
MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
|
||||
HFS but is extended in various areas. Use the hfsplus filesystem driver
|
||||
to access such filesystems from Linux.
|
||||
@ -49,25 +54,25 @@ Writing to HFS Filesystems
|
||||
HFS is not a UNIX filesystem, thus it does not have the usual features you'd
|
||||
expect:
|
||||
|
||||
o You can't modify the set-uid, set-gid, sticky or executable bits or the uid
|
||||
* You can't modify the set-uid, set-gid, sticky or executable bits or the uid
|
||||
and gid of files.
|
||||
o You can't create hard- or symlinks, device files, sockets or FIFOs.
|
||||
* You can't create hard- or symlinks, device files, sockets or FIFOs.
|
||||
|
||||
HFS does, on the other hand, have the concept of multiple forks per file. These
non-standard forks are represented as hidden additional files in the normal
filesystem namespace, which is kind of a kludge and makes the semantics for
them a little strange:
|
||||
|
||||
o You can't create, delete or rename resource forks of files or the
|
||||
* You can't create, delete or rename resource forks of files or the
|
||||
Finder's metadata.
|
||||
o They are however created (with default values), deleted and renamed
|
||||
* They are however created (with default values), deleted and renamed
|
||||
along with the corresponding data fork or directory.
|
||||
o Copying files to a different filesystem will lose those attributes
|
||||
* Copying files to a different filesystem will lose those attributes
|
||||
that are essential for MacOS to work.
|
||||
|
||||
|
||||
Creating HFS filesystems
|
||||
===================================
|
||||
========================
|
||||
|
||||
The hfsutils package from Robert Leslie contains a program called
|
||||
hformat that can be used to create HFS filesystem. See
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================================
|
||||
Macintosh HFSPlus Filesystem for Linux
|
||||
======================================
|
||||
|
@ -1,13 +1,21 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====================
|
||||
Read/Write HPFS 2.09
|
||||
====================
|
||||
|
||||
1998-2004, Mikulas Patocka
|
||||
|
||||
email: mikulas@artax.karlin.mff.cuni.cz
|
||||
homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
|
||||
:email: mikulas@artax.karlin.mff.cuni.cz
|
||||
:homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
|
||||
|
||||
CREDITS:
|
||||
Credits
|
||||
=======
|
||||
Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
|
||||
is taken from it
|
||||
|
||||
Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
|
||||
|
||||
Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
|
||||
|
||||
Mount options
|
||||
@ -50,6 +58,7 @@ timeshift=(-)nnn (default 0)
|
||||
|
||||
|
||||
File names
|
||||
==========
|
||||
|
||||
As in OS/2, filenames are case insensitive. However, shell thinks that names
|
||||
are case sensitive, so for example when you create a file FOO, you can use
|
||||
@ -64,6 +73,7 @@ access it under names 'a.', 'a..', 'a . . . ' etc.
|
||||
|
||||
|
||||
Extended attributes
|
||||
===================
|
||||
|
||||
On HPFS partitions, OS/2 can associate to each file a special information called
|
||||
extended attributes. Extended attributes are pairs of (key,value) where key is
|
||||
@ -88,6 +98,7 @@ values doesn't work.
|
||||
|
||||
|
||||
Symlinks
|
||||
========
|
||||
|
||||
You can do symlinks on HPFS partition, symlinks are achieved by setting extended
|
||||
attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
|
||||
@ -101,6 +112,7 @@ to analyze or change OS2SYS.INI.
|
||||
|
||||
|
||||
Codepages
|
||||
=========
|
||||
|
||||
HPFS can contain several uppercasing tables for several codepages and each
|
||||
file has a pointer to codepage its name is in. However OS/2 was created in
|
||||
@ -128,6 +140,7 @@ this codepage - if you don't try to do what I described above :-)
|
||||
|
||||
|
||||
Known bugs
|
||||
==========
|
||||
|
||||
HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
|
||||
should work. If you have OS/2 server, use only read-only mode. I don't know how
|
||||
@ -152,7 +165,8 @@ would result in directory tree splitting, that takes disk space. Workaround is
|
||||
to delete other files that are leaf (probability that the file is non-leaf is
|
||||
about 1/50) or to truncate file first to make some space.
|
||||
You encounter this problem only if you have many directories so that
|
||||
preallocated directory band is full i.e.
|
||||
preallocated directory band is full i.e.::
|
||||
|
||||
number_of_directories / size_of_filesystem_in_mb > 4.
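The condition above can be checked with a few lines of Python. This is only a
sketch: the function name is made up for illustration, and the threshold of 4
directories per megabyte is taken directly from the expression above.

```python
def directory_band_likely_full(number_of_directories: int,
                               size_of_filesystem_in_mb: float) -> bool:
    """Return True when the directories-per-MB ratio exceeds 4."""
    if size_of_filesystem_in_mb <= 0:
        raise ValueError("filesystem size must be positive")
    return number_of_directories / size_of_filesystem_in_mb > 4


if __name__ == "__main__":
    # Illustrative numbers only: 5000 directories on a 1000 MB filesystem.
    print(directory_band_likely_full(5000, 1000))
```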
|
||||
|
||||
You can't delete open directories.
|
||||
@ -174,6 +188,7 @@ anybody know what does it mean?
|
||||
|
||||
|
||||
What does "unbalanced tree" message mean?
|
||||
=========================================
|
||||
|
||||
Old versions of this driver created sometimes unbalanced dnode trees. OS/2
|
||||
chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
|
||||
@ -187,6 +202,7 @@ whole created by this driver, it is BUG - let me know about it.
|
||||
|
||||
|
||||
Bugs in OS/2
|
||||
============
|
||||
|
||||
When you have two (or more) lost directories pointing each to other, chkdsk
|
||||
locks up when repairing filesystem.
|
||||
@ -199,98 +215,139 @@ File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
|
||||
marks them as short (and writes "minor fs error corrected"). This bug is not in
|
||||
HPFS386.
|
||||
|
||||
Codepage bugs described above.
|
||||
Codepage bugs described above
|
||||
=============================
|
||||
|
||||
If you don't install fixpacks, there are many, many more...
|
||||
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
0.90 First public release
|
||||
0.91 Fixed bug that caused shooting to memory when write_inode was called on
|
||||
open inode (rarely happened)
|
||||
0.92 Fixed a little memory leak in freeing directory inodes
|
||||
0.93 Fixed bug that locked up the machine when there were too many filenames
|
||||
with first 15 characters same
|
||||
Fixed write_file to zero file when writing behind file end
|
||||
0.94 Fixed a little memory leak when trying to delete busy file or directory
|
||||
0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
|
||||
1.90 First version for 2.1.1xx kernels
|
||||
1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
|
||||
Fixed a race-condition when write_inode is called while deleting file
|
||||
Fixed a bug that could possibly happen (with very low probability) when
|
||||
using 0xff in filenames
|
||||
Rewritten locking to avoid race-conditions
|
||||
Mount option 'eas' now works
|
||||
Fsync no longer returns error
|
||||
Files beginning with '.' are marked hidden
|
||||
Remount support added
|
||||
Alloc is not so slow when filesystem becomes full
|
||||
Atimes are no more updated because it slows down operation
|
||||
Code cleanup (removed all commented debug prints)
|
||||
1.92 Corrected a bug when sync was called just before closing file
|
||||
1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
|
||||
works with previous versions
|
||||
Fixed a possible problem with disks > 64G (but I don't have one, so I can't
|
||||
test it)
|
||||
Fixed a file overflow at 2G
|
||||
Added new option 'timeshift'
|
||||
Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
|
||||
read-only mode
|
||||
Fixed a bug that slowed down alloc and prevented allocating 100% space
|
||||
(this bug was not destructive)
|
||||
1.94 Added workaround for one bug in Linux
|
||||
Fixed one buffer leak
|
||||
Fixed some incompatibilities with large extended attributes (but it's still
|
||||
not 100% ok, I have no info on it and OS/2 doesn't want to create them)
|
||||
Rewritten allocation
|
||||
Fixed a bug with i_blocks (du sometimes didn't display correct values)
|
||||
Directories have no longer archive attribute set (some programs don't like
|
||||
it)
|
||||
Fixed a bug that it set badly one flag in large anode tree (it was not
|
||||
destructive)
|
||||
1.95 Fixed one buffer leak, that could happen on corrupted filesystem
|
||||
Fixed one bug in allocation in 1.94
|
||||
1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
|
||||
error sometimes when opening directories in PMSHELL)
|
||||
Fixed a possible bitmap race
|
||||
Fixed possible problem on large disks
|
||||
You can now delete open files
|
||||
Fixed a nondestructive race in rename
|
||||
1.97 Support for HPFS v3 (on large partitions)
|
||||
Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
|
||||
====== =========================================================================
|
||||
0.90 First public release
|
||||
0.91 Fixed bug that caused shooting to memory when write_inode was called on
|
||||
open inode (rarely happened)
|
||||
0.92 Fixed a little memory leak in freeing directory inodes
|
||||
0.93 Fixed bug that locked up the machine when there were too many filenames
|
||||
with first 15 characters same
|
||||
Fixed write_file to zero file when writing behind file end
|
||||
0.94 Fixed a little memory leak when trying to delete busy file or directory
|
||||
0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
|
||||
1.90 First version for 2.1.1xx kernels
|
||||
1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
|
||||
Fixed a race-condition when write_inode is called while deleting file
|
||||
Fixed a bug that could possibly happen (with very low probability) when
|
||||
using 0xff in filenames.
|
||||
|
||||
Rewritten locking to avoid race-conditions
|
||||
|
||||
Mount option 'eas' now works
|
||||
|
||||
Fsync no longer returns error
|
||||
|
||||
Files beginning with '.' are marked hidden
|
||||
|
||||
Remount support added
|
||||
|
||||
Alloc is not so slow when filesystem becomes full
|
||||
|
||||
Atimes are no more updated because it slows down operation
|
||||
|
||||
Code cleanup (removed all commented debug prints)
|
||||
1.92 Corrected a bug when sync was called just before closing file
|
||||
1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
|
||||
works with previous versions
|
||||
|
||||
Fixed a possible problem with disks > 64G (but I don't have one, so I can't
|
||||
test it)
|
||||
|
||||
Fixed a file overflow at 2G
|
||||
|
||||
Added new option 'timeshift'
|
||||
|
||||
Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
|
||||
read-only mode
|
||||
|
||||
Fixed a bug that slowed down alloc and prevented allocating 100% space
|
||||
(this bug was not destructive)
|
||||
1.94 Added workaround for one bug in Linux
|
||||
|
||||
Fixed one buffer leak
|
||||
|
||||
Fixed some incompatibilities with large extended attributes (but it's still
|
||||
not 100% ok, I have no info on it and OS/2 doesn't want to create them)
|
||||
|
||||
Rewritten allocation
|
||||
|
||||
Fixed a bug with i_blocks (du sometimes didn't display correct values)
|
||||
|
||||
Directories have no longer archive attribute set (some programs don't like
|
||||
it)
|
||||
|
||||
Fixed a bug that it set badly one flag in large anode tree (it was not
|
||||
destructive)
|
||||
1.95 Fixed one buffer leak, that could happen on corrupted filesystem
|
||||
|
||||
Fixed one bug in allocation in 1.94
|
||||
1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
|
||||
error sometimes when opening directories in PMSHELL)
|
||||
|
||||
Fixed a possible bitmap race
|
||||
|
||||
Fixed possible problem on large disks
|
||||
|
||||
You can now delete open files
|
||||
|
||||
Fixed a nondestructive race in rename
|
||||
1.97 Support for HPFS v3 (on large partitions)
|
||||
|
||||
Fixed a bug that it didn't allow creation of files > 128M
|
||||
(it should be 2G)
|
||||
1.97.1 Changed names of global symbols
|
||||
|
||||
Fixed a bug when chmoding or chowning root directory
|
||||
1.98 Fixed a deadlock when using old_readdir
|
||||
Better directory handling; workaround for "unbalanced tree" bug in OS/2
|
||||
1.99 Corrected a possible problem when there's not enough space while deleting
|
||||
file
|
||||
Now it tries to truncate the file if there's not enough space when deleting
|
||||
Removed a lot of redundant code
|
||||
2.00 Fixed a bug in rename (it was there since 1.96)
|
||||
Better anti-fragmentation strategy
|
||||
2.01 Fixed problem with directory listing over NFS
|
||||
Directory lseek now checks for proper parameters
|
||||
Fixed race-condition in buffer code - it is in all filesystems in Linux;
|
||||
when reading device (cat /dev/hda) while creating files on it, files
|
||||
could be damaged
|
||||
2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
|
||||
end of partition
|
||||
2.03 Char, block devices and pipes are correctly created
|
||||
Fixed non-crashing race in unlink (Alexander Viro)
|
||||
Now it works with Japanese version of OS/2
|
||||
2.04 Fixed error when ftruncate used to extend file
|
||||
2.05 Fixed crash when got mount parameters without =
|
||||
Fixed crash when allocation of anode failed due to full disk
|
||||
Fixed some crashes when block io or inode allocation failed
|
||||
2.06 Fixed some crash on corrupted disk structures
|
||||
Better allocation strategy
|
||||
Reschedule points added so that it doesn't lock CPU long time
|
||||
It should work in read-only mode on Warp Server
|
||||
2.07 More fixes for Warp Server. Now it really works
|
||||
2.08 Creating new files is not so slow on large disks
|
||||
An attempt to sync deleted file does not generate filesystem error
|
||||
2.09 Fixed error on extremely fragmented files
|
||||
1.98 Fixed a deadlock when using old_readdir
|
||||
Better directory handling; workaround for "unbalanced tree" bug in OS/2
|
||||
1.99 Corrected a possible problem when there's not enough space while deleting
|
||||
file
|
||||
|
||||
Now it tries to truncate the file if there's not enough space when
|
||||
deleting
|
||||
|
||||
vim: set textwidth=80:
|
||||
Removed a lot of redundant code
|
||||
2.00 Fixed a bug in rename (it was there since 1.96)
|
||||
Better anti-fragmentation strategy
|
||||
2.01 Fixed problem with directory listing over NFS
|
||||
|
||||
Directory lseek now checks for proper parameters
|
||||
|
||||
Fixed race-condition in buffer code - it is in all filesystems in Linux;
|
||||
when reading device (cat /dev/hda) while creating files on it, files
|
||||
could be damaged
|
||||
2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
|
||||
end of partition
|
||||
2.03 Char, block devices and pipes are correctly created
|
||||
|
||||
Fixed non-crashing race in unlink (Alexander Viro)
|
||||
|
||||
Now it works with Japanese version of OS/2
|
||||
2.04 Fixed error when ftruncate used to extend file
|
||||
2.05 Fixed crash when got mount parameters without =
|
||||
|
||||
Fixed crash when allocation of anode failed due to full disk
|
||||
|
||||
Fixed some crashes when block io or inode allocation failed
|
||||
2.06 Fixed some crash on corrupted disk structures
|
||||
|
||||
Better allocation strategy
|
||||
|
||||
Reschedule points added so that it doesn't lock CPU long time
|
||||
|
||||
It should work in read-only mode on Warp Server
|
||||
2.07 More fixes for Warp Server. Now it really works
|
||||
2.08 Creating new files is not so slow on large disks
|
||||
|
||||
An attempt to sync deleted file does not generate filesystem error
|
||||
2.09 Fixed error on extremely fragmented files
|
||||
====== =========================================================================
|
@ -46,9 +46,53 @@ Documentation for filesystem implementations.
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
9p
|
||||
adfs
|
||||
affs
|
||||
afs
|
||||
autofs
|
||||
autofs-mount-control
|
||||
befs
|
||||
bfs
|
||||
btrfs
|
||||
ceph
|
||||
cramfs
|
||||
debugfs
|
||||
dlmfs
|
||||
ecryptfs
|
||||
efivarfs
|
||||
erofs
|
||||
ext2
|
||||
ext3
|
||||
f2fs
|
||||
gfs2
|
||||
gfs2-uevents
|
||||
hfs
|
||||
hfsplus
|
||||
hpfs
|
||||
fuse
|
||||
inotify
|
||||
isofs
|
||||
nilfs2
|
||||
nfs/index
|
||||
ntfs
|
||||
ocfs2
|
||||
ocfs2-online-filecheck
|
||||
omfs
|
||||
orangefs
|
||||
overlayfs
|
||||
proc
|
||||
qnx6
|
||||
ramfs-rootfs-initramfs
|
||||
relay
|
||||
romfs
|
||||
squashfs
|
||||
sysfs
|
||||
sysv-fs
|
||||
tmpfs
|
||||
ubifs
|
||||
ubifs-authentication.rst
|
||||
udf
|
||||
virtiofs
|
||||
vfat
|
||||
nfs/index
|
||||
zonefs
|
||||
|
@ -1,27 +1,36 @@
|
||||
inotify
|
||||
a powerful yet simple file change notification system
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============================================================
|
||||
Inotify - A Powerful yet Simple File Change Notification System
|
||||
===============================================================
|
||||
|
||||
|
||||
|
||||
Document started 15 Mar 2005 by Robert Love <rml@novell.com>
|
||||
|
||||
Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
|
||||
--Deleted obsoleted interface, just refer to manpages for user interface.
|
||||
|
||||
- Deleted obsoleted interface, just refer to manpages for user interface.
|
||||
|
||||
(i) Rationale
|
||||
|
||||
Q: What is the design decision behind not tying the watch to the open fd of
|
||||
Q:
|
||||
What is the design decision behind not tying the watch to the open fd of
|
||||
the watched object?
|
||||
|
||||
A: Watches are associated with an open inotify device, not an open file.
|
||||
A:
|
||||
Watches are associated with an open inotify device, not an open file.
|
||||
This solves the primary problem with dnotify: keeping the file open pins
|
||||
the file and thus, worse, pins the mount. Dnotify is therefore infeasible
|
||||
for use on a desktop system with removable media as the media cannot be
|
||||
unmounted. Watching a file should not require that it be open.
|
||||
|
||||
Q: What is the design decision behind using an-fd-per-instance as opposed to
|
||||
Q:
|
||||
What is the design decision behind using an-fd-per-instance as opposed to
|
||||
an fd-per-watch?
|
||||
|
||||
A: An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
A:
|
||||
An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
more fd's than are feasible to manage, and more fd's than are optimally
|
||||
select()-able. Yes, root can bump the per-process fd limit and yes, users
|
||||
can use epoll, but requiring both is a silly and extraneous requirement.
|
||||
@ -29,8 +38,8 @@ A: An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
spaces is thus sensible. The current design is what user-space developers
|
||||
want: Users initialize inotify, once, and add n watches, requiring but one
|
||||
fd and no twiddling with fd limits. Initializing an inotify instance two
|
||||
thousand times is silly. If we can implement user-space's preferences
|
||||
cleanly--and we can, the idr layer makes stuff like this trivial--then we
|
||||
thousand times is silly. If we can implement user-space's preferences
|
||||
cleanly--and we can, the idr layer makes stuff like this trivial--then we
|
||||
should.
|
||||
|
||||
There are other good arguments. With a single fd, there is a single
|
||||
@ -65,9 +74,11 @@ A: An fd-per-watch quickly consumes more file descriptors than are allowed,
|
||||
need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
|
||||
process can easily want more than one queue.
|
||||
|
||||
Q: Why the system call approach?
|
||||
Q:
|
||||
Why the system call approach?
|
||||
|
||||
A: The poor user-space interface is the second biggest problem with dnotify.
|
||||
A:
|
||||
The poor user-space interface is the second biggest problem with dnotify.
|
||||
Signals are a terrible, terrible interface for file notification. Or for
|
||||
anything, for that matter. The ideal solution, from all perspectives, is a
|
||||
file descriptor-based one that allows basic file I/O and poll/select.
|
64
Documentation/filesystems/isofs.rst
Normal file
@ -0,0 +1,64 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
ISO9660 Filesystem
|
||||
==================
|
||||
|
||||
Mount options that are the same as for msdos and vfat partitions.
|
||||
|
||||
========= ========================================================
gid=nnn   All files in the partition will be in group nnn.
uid=nnn   All files in the partition will be owned by user id nnn.
umask=nnn The permission mask (see umask(1)) for the partition.
========= ========================================================
|
||||
|
||||
Mount options that are the same as vfat partitions. These are only useful
|
||||
when using discs encoded using Microsoft's Joliet extensions.
|
||||
|
||||
============== =============================================================
iocharset=name Character set to use for converting from Unicode to
               ASCII. Joliet filenames are stored in Unicode format, but
               Unix for the most part doesn't know how to deal with Unicode.
               There is also an option of doing UTF-8 translations with the
               utf8 option.
utf8           Encode Unicode names in UTF-8 format. Default is no.
============== =============================================================
|
||||
|
||||
Mount options unique to the isofs filesystem.
|
||||
|
||||
================= ============================================================
block=512         Set the block size for the disk to 512 bytes
block=1024        Set the block size for the disk to 1024 bytes
block=2048        Set the block size for the disk to 2048 bytes
check=relaxed     Matches filenames with different cases
check=strict      Matches only filenames with the exact same case
cruft             Try to handle badly formatted CDs.
map=off           Do not map non-Rock Ridge filenames to lower case
map=normal        Map non-Rock Ridge filenames to lower case
map=acorn         As map=normal but also apply Acorn extensions if present
mode=xxx          Sets the permissions on files to xxx unless Rock Ridge
                  extensions set the permissions otherwise
dmode=xxx         Sets the permissions on directories to xxx unless Rock Ridge
                  extensions set the permissions otherwise
overriderockperm  Set permissions on files and directories according to
                  'mode' and 'dmode' even though Rock Ridge extensions are
                  present.
nojoliet          Ignore Joliet extensions if they are present.
norock            Ignore Rock Ridge extensions if they are present.
hide              Completely strip hidden files from the file system.
showassoc         Show files marked with the 'associated' bit
unhide            Deprecated; showing hidden files is now default;
                  If given, it is a synonym for 'showassoc' which will
                  recreate previous unhide behavior
session=x         Select number of session on multisession CD
sbsector=xxx      Session begins from sector xxx
================= ============================================================
|
||||
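As a rough illustration of how the options listed above combine on a command
line, the sketch below only prints a mount command and does not run it. The
helper name ``iso_mount_cmd``, the device ``/dev/sr0`` and the mount point
``/mnt/cdrom`` are placeholders that do not come from this document; the
option names are the ones from the tables above.

```sh
# Print (but do not run) a mount command for an ISO 9660 disc.
# usage: iso_mount_cmd DEVICE MOUNTPOINT [OPTION...]
# iso_mount_cmd, /dev/sr0 and /mnt/cdrom are illustrative names only.
iso_mount_cmd() {
    dev=$1
    dir=$2
    shift 2
    opts=ro                      # always mount read-only
    for o in "$@"; do
        opts="$opts,$o"          # append each option, comma separated
    done
    printf 'mount -t iso9660 -o %s %s %s\n' "$opts" "$dev" "$dir"
}

# Ignore Rock Ridge, keep the on-disc case, and force permissions.
iso_mount_cmd /dev/sr0 /mnt/cdrom norock map=off mode=0444 dmode=0555
```

Printing the command first makes it easy to review the option string before
running it as root.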
|
||||
Recommended documents about the ISO 9660 standard are located at:
|
||||
|
||||
- http://www.y-adagio.com/
|
||||
- ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
|
||||
|
||||
Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
|
||||
identical with ISO 9660.", so it is a valid and gratis substitute for the
|
||||
official ISO specification.
|
@ -1,48 +0,0 @@
|
||||
Mount options that are the same as for msdos and vfat partitions.
|
||||
|
||||
gid=nnn All files in the partition will be in group nnn.
|
||||
uid=nnn All files in the partition will be owned by user id nnn.
|
||||
umask=nnn The permission mask (see umask(1)) for the partition.
|
||||
|
||||
Mount options that are the same as vfat partitions. These are only useful
|
||||
when using discs encoded using Microsoft's Joliet extensions.
|
||||
iocharset=name Character set to use for converting from Unicode to
|
||||
ASCII. Joliet filenames are stored in Unicode format, but
|
||||
Unix for the most part doesn't know how to deal with Unicode.
|
||||
There is also an option of doing UTF-8 translations with the
|
||||
utf8 option.
|
||||
utf8 Encode Unicode names in UTF-8 format. Default is no.
|
||||
|
||||
Mount options unique to the isofs filesystem.
|
||||
block=512 Set the block size for the disk to 512 bytes
|
||||
block=1024 Set the block size for the disk to 1024 bytes
|
||||
block=2048 Set the block size for the disk to 2048 bytes
|
||||
check=relaxed Matches filenames with different cases
|
||||
check=strict Matches only filenames with the exact same case
|
||||
cruft Try to handle badly formatted CDs.
|
||||
map=off Do not map non-Rock Ridge filenames to lower case
|
||||
map=normal Map non-Rock Ridge filenames to lower case
|
||||
map=acorn As map=normal but also apply Acorn extensions if present
|
||||
mode=xxx Sets the permissions on files to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
overriderockperm Set permissions on files and directories according to
|
||||
'mode' and 'dmode' even though Rock Ridge extensions are
|
||||
present.
|
||||
nojoliet Ignore Joliet extensions if they are present.
|
||||
norock Ignore Rock Ridge extensions if they are present.
|
||||
hide Completely strip hidden files from the file system.
|
||||
showassoc Show files marked with the 'associated' bit
|
||||
unhide Deprecated; showing hidden files is now default;
|
||||
If given, it is a synonym for 'showassoc' which will
|
||||
recreate previous unhide behavior
|
||||
session=x Select number of session on multisession CD
|
||||
sbsector=xxx Session begins from sector xxx
|
||||
|
||||
Recommended documents about ISO 9660 standard are located at:
|
||||
http://www.y-adagio.com/
|
||||
ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
|
||||
Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
|
||||
identical with ISO 9660.", so it is a valid and gratis substitute of the
|
||||
official ISO specification.
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======
|
||||
NILFS2
|
||||
------
|
||||
======
|
||||
|
||||
NILFS2 is a log-structured file system (LFS) supporting continuous
|
||||
snapshotting. In addition to versioning capability of the entire file
|
||||
@ -25,9 +28,9 @@ available from the following download page. At least "mkfs.nilfs2",
|
||||
cleaner or garbage collector) are required. Details on the tools are
|
||||
described in the man pages included in the package.
|
||||
|
||||
Project web page: https://nilfs.sourceforge.io/
|
||||
Download page: https://nilfs.sourceforge.io/en/download.html
|
||||
List info: http://vger.kernel.org/vger-lists.html#linux-nilfs
|
||||
:Project web page: https://nilfs.sourceforge.io/
|
||||
:Download page: https://nilfs.sourceforge.io/en/download.html
|
||||
:List info: http://vger.kernel.org/vger-lists.html#linux-nilfs
|
||||
|
||||
Caveats
|
||||
=======
|
||||
@ -47,6 +50,7 @@ Mount options
|
||||
NILFS2 supports the following mount options:
|
||||
(*) == default
|
||||
|
||||
======================= =======================================================
|
||||
barrier(*) This enables/disables the use of write barriers. This
|
||||
nobarrier requires an IO stack which can support barriers, and
|
||||
if nilfs gets an error on a barrier write, it will
|
||||
@ -79,6 +83,7 @@ discard This enables/disables the use of discard/TRIM commands.
|
||||
nodiscard(*) The discard/TRIM commands are sent to the underlying
|
||||
block device when blocks are freed. This is useful
|
||||
for SSD devices and sparse/thinly-provisioned LUNs.
|
||||
======================= =======================================================
|
||||
|
||||
Ioctls
|
||||
======
|
||||
@ -87,9 +92,11 @@ There is some NILFS2 specific functionality which can be accessed by application
|
||||
through the system call interfaces. The list of all NILFS2 specific ioctls is
|
||||
shown in the table below.
|
||||
|
||||
Table of NILFS2 specific ioctls
|
||||
..............................................................................
|
||||
Table of NILFS2 specific ioctls:
|
||||
|
||||
============================== ===============================================
|
||||
Ioctl Description
|
||||
============================== ===============================================
|
||||
NILFS_IOCTL_CHANGE_CPMODE Change mode of given checkpoint between
|
||||
checkpoint and snapshot state. This ioctl is
|
||||
used in chcp and mkcp utilities.
|
||||
@ -142,11 +149,12 @@ Table of NILFS2 specific ioctls
|
||||
NILFS_IOCTL_SET_ALLOC_RANGE Define lower limit of segments in bytes and
|
||||
upper limit of segments in bytes. This ioctl
|
||||
is used by nilfs_resize utility.
|
||||
============================== ===============================================
|
||||
|
||||
NILFS2 usage
|
||||
============
|
||||
|
||||
To use nilfs2 as a local file system, simply:
|
||||
To use nilfs2 as a local file system, simply::
|
||||
|
||||
# mkfs -t nilfs2 /dev/block_device
|
||||
# mount -t nilfs2 /dev/block_device /dir
|
||||
@ -157,18 +165,20 @@ This will also invoke the cleaner through the mount helper program
|
||||
Checkpoints and snapshots are managed by the following commands.
|
||||
Their manpages are included in the nilfs-utils package above.
|
||||
|
||||
==== ===========================================================
|
||||
lscp list checkpoints or snapshots.
|
||||
mkcp make a checkpoint or a snapshot.
|
||||
chcp change an existing checkpoint to a snapshot or vice versa.
|
||||
rmcp invalidate specified checkpoint(s).
|
||||
==== ===========================================================
|
||||
|
||||
To mount a snapshot,
|
||||
To mount a snapshot::
|
||||
|
||||
# mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
|
||||
|
||||
where <cno> is the checkpoint number of the snapshot.
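A minimal sketch of the snapshot mount step, assuming the checkpoint number
has already been looked up (for example with lscp). The function name
``nilfs_snapshot_mount_cmd`` and the checkpoint number 42 are made up for
illustration; the function only prints the command and does not mount
anything.

```sh
# Print (but do not run) the command that mounts snapshot CNO read-only.
# usage: nilfs_snapshot_mount_cmd CNO DEVICE DIR
# The function name is hypothetical; the mount syntax is the one shown above.
nilfs_snapshot_mount_cmd() {
    case $1 in
        ''|*[!0-9]*)
            # reject empty or non-numeric checkpoint numbers
            echo "checkpoint number must be a positive integer" >&2
            return 1
            ;;
    esac
    printf 'mount -t nilfs2 -r -o cp=%s %s %s\n' "$1" "$2" "$3"
}

nilfs_snapshot_mount_cmd 42 /dev/block_device /snap_dir
```

Note that the checkpoint must be a snapshot; a plain checkpoint can be turned
into one with chcp, as listed in the command table above.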
|
||||
|
||||
To unmount the NILFS2 mount point or snapshot, simply:
|
||||
To unmount the NILFS2 mount point or snapshot, simply::
|
||||
|
||||
# umount /dir
|
||||
|
||||
@ -181,7 +191,7 @@ Disk format
|
||||
A nilfs2 volume is equally divided into a number of segments except
|
||||
for the super block (SB) and segment #0. A segment is the container
|
||||
of logs. Each log is composed of summary information blocks, payload
|
||||
blocks, and an optional super root block (SR):
|
||||
blocks, and an optional super root block (SR)::
|
||||
|
||||
______________________________________________________
|
||||
| |SB| | Segment | Segment | Segment | ... | Segment | |
|
||||
@ -200,7 +210,7 @@ blocks, and an optional super root block (SR):
|
||||
|_blocks__|_________________|__|
|
||||
|
||||
The payload blocks are organized per file, and each file consists of
|
||||
data blocks and B-tree node blocks:
|
||||
data blocks and B-tree node blocks::
|
||||
|
||||
|<--- File-A --->|<--- File-B --->|
|
||||
_______________________________________________________________
|
||||
@ -213,7 +223,7 @@ files without data blocks or B-tree node blocks.
|
||||
|
||||
The organization of the blocks is recorded in the summary information
|
||||
blocks, which contains a header structure (nilfs_segment_summary), per
|
||||
file structures (nilfs_finfo), and per block structures (nilfs_binfo):
|
||||
file structures (nilfs_finfo), and per block structures (nilfs_binfo)::
|
||||
|
||||
_________________________________________________________________________
|
||||
| Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
|
||||
@ -223,7 +233,7 @@ file structures (nilfs_finfo), and per block structures (nilfs_binfo):
|
||||
The logs include regular files, directory files, symbolic link files
|
||||
and several meta data files. The meta data files are the files used
|
||||
to maintain file system meta data. The current version of NILFS2 uses
|
||||
the following meta data files:
|
||||
the following meta data files::
|
||||
|
||||
1) Inode file (ifile) -- Stores on-disk inodes
|
||||
2) Checkpoint file (cpfile) -- Stores checkpoints
|
||||
@ -232,7 +242,7 @@ the following meta data files:
|
||||
(DAT) block numbers. This file serves to
|
||||
make on-disk blocks relocatable.
|
||||
|
||||
The following figure shows a typical organization of the logs:
|
||||
The following figure shows a typical organization of the logs::
|
||||
|
||||
_________________________________________________________________________
|
||||
| Summary | regular file | file | ... | ifile | cpfile | sufile | DAT |SR|
|
||||
@ -250,7 +260,7 @@ three special inodes, inodes for the DAT, cpfile, and sufile. Inodes
|
||||
of regular files, directories, symlinks and other special files, are
|
||||
included in the ifile. The inode of ifile itself is included in the
|
||||
corresponding checkpoint entry in the cpfile. Thus, the hierarchy
|
||||
among NILFS2 files can be depicted as follows:
|
||||
among NILFS2 files can be depicted as follows::
|
||||
|
||||
Super block (SB)
|
||||
|
|
@ -1,19 +1,21 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================
|
||||
The Linux NTFS filesystem driver
|
||||
================================
|
||||
|
||||
|
||||
Table of contents
|
||||
=================
|
||||
.. Table of contents
|
||||
|
||||
- Overview
|
||||
- Web site
|
||||
- Features
|
||||
- Supported mount options
|
||||
- Known bugs and (mis-)features
|
||||
- Using NTFS volume and stripe sets
|
||||
- The Device-Mapper driver
|
||||
- The Software RAID / MD driver
|
||||
- Limitations when using the MD driver
|
||||
- Overview
|
||||
- Web site
|
||||
- Features
|
||||
- Supported mount options
|
||||
- Known bugs and (mis-)features
|
||||
- Using NTFS volume and stripe sets
|
||||
- The Device-Mapper driver
|
||||
- The Software RAID / MD driver
|
||||
- Limitations when using the MD driver
|
||||
|
||||
|
||||
Overview
|
||||
@ -66,8 +68,10 @@ Features
|
||||
partition by creating a large file while in Windows and then loopback
|
||||
mounting the file while in Linux and creating a Linux filesystem on it that
|
||||
is used to install Linux on it.
|
||||
- A comparison of the two drivers using:
|
||||
- A comparison of the two drivers using::
|
||||
|
||||
time find . -type f -exec md5sum "{}" \;
|
||||
|
||||
run three times in sequence with each driver (after a reboot) on a 1.4GiB
|
||||
NTFS partition, showed the new driver to be 20% faster in total time elapsed
|
||||
(from 9:43 minutes on average down to 7:53). The time spent in user space
|
||||
@ -104,6 +108,7 @@ In addition to the generic mount options described by the manual page for the
|
||||
mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
|
||||
following mount options:
|
||||
|
||||
======================= =======================================================
|
||||
iocharset=name Deprecated option. Still supported but please use
|
||||
nls=name in the future. See description for nls=name.
|
||||
|
||||
@ -175,16 +180,22 @@ disable_sparse=<BOOL> If disable_sparse is specified, creation of sparse
|
||||
|
||||
errors=opt What to do when critical filesystem errors are found.
|
||||
Following values can be used for "opt":
|
||||
continue: DEFAULT, try to clean-up as much as
|
||||
|
||||
======== =========================================
|
||||
continue DEFAULT, try to clean-up as much as
|
||||
possible, e.g. marking a corrupt inode as
|
||||
bad so it is no longer accessed, and then
|
||||
continue.
|
||||
recover: At present only supported is recovery of
|
||||
recover At present only supported is recovery of
|
||||
the boot sector from the backup copy.
|
||||
If read-only mount, the recovery is done
|
||||
in memory only and not written to disk.
|
||||
Note that the options are additive, i.e. specifying:
|
||||
======== =========================================
|
||||
|
||||
Note that the options are additive, i.e. specifying::
|
||||
|
||||
errors=continue,errors=recover
|
||||
|
||||
means the driver will attempt to recover and if that
|
||||
fails it will clean-up as much as possible and
|
||||
continue.
|
||||
@ -202,12 +213,18 @@ mft_zone_multiplier= Set the MFT zone multiplier for the volume (this
|
||||
In general use the default. If you have a lot of small
|
||||
files then use a higher value. The values have the
|
||||
following meaning:
|
||||
|
||||
===== =================================
|
||||
Value MFT zone size (% of volume size)
|
||||
===== =================================
|
||||
1 12.5%
|
||||
2 25%
|
||||
3 37.5%
|
||||
4 50%
|
||||
===== =================================
|
||||
|
||||
Note this option is irrelevant for read-only mounts.
|
||||
======================= =======================================================
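The percentages in the mft_zone_multiplier table step by 12.5%, i.e. one
eighth of the volume per unit. The sketch below turns that into a sector
count. It is only an approximation for illustration: the helper name
``mft_zone_sectors`` is hypothetical, and whatever rounding the driver
applies internally is not modelled.

```sh
# Approximate MFT zone size in sectors for a given mft_zone_multiplier.
# usage: mft_zone_sectors VOLUME_SECTORS MULTIPLIER
# Each step of the multiplier reserves another 12.5% (1/8) of the volume.
mft_zone_sectors() {
    case $2 in
        1|2|3|4) echo $(( $1 * $2 / 8 )) ;;
        *) echo "multiplier must be 1, 2, 3 or 4" >&2; return 1 ;;
    esac
}

mft_zone_sectors 2056320 1   # 12.5% of a 2056320-sector volume -> 257040
mft_zone_sectors 2056320 4   # 50% -> 1028160
```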
|
||||
|
||||
|
||||
Known bugs and (mis-)features
|
||||
@ -252,18 +269,18 @@ To create the table describing your volume you will need to know each of its
|
||||
components and their sizes in sectors, i.e. multiples of 512-byte blocks.
|
||||
|
||||
For NT4 fault tolerant volumes you can obtain the sizes using fdisk. So for
|
||||
example if one of your partitions is /dev/hda2 you would do:
|
||||
example if one of your partitions is /dev/hda2 you would do::
|
||||
|
||||
$ fdisk -ul /dev/hda
|
||||
$ fdisk -ul /dev/hda
|
||||
|
||||
Disk /dev/hda: 81.9 GB, 81964302336 bytes
|
||||
255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
|
||||
Units = sectors of 1 * 512 = 512 bytes
|
||||
Disk /dev/hda: 81.9 GB, 81964302336 bytes
|
||||
255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
|
||||
Units = sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Device Boot Start End Blocks Id System
|
||||
/dev/hda1 * 63 4209029 2104483+ 83 Linux
|
||||
/dev/hda2 4209030 37768814 16779892+ 86 NTFS
|
||||
/dev/hda3 37768815 46170809 4200997+ 83 Linux
|
||||
Device Boot Start End Blocks Id System
|
||||
/dev/hda1 * 63 4209029 2104483+ 83 Linux
|
||||
/dev/hda2 4209030 37768814 16779892+ 86 NTFS
|
||||
/dev/hda3 37768815 46170809 4200997+ 83 Linux
|
||||
|
||||
And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
|
||||
33559785 sectors.
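The same arithmetic can be done in the shell. This is a small sketch with
hypothetical helper names (``part_sectors``, ``sectors_to_bytes``); it assumes
the Start and End columns printed by fdisk are inclusive and that sectors are
512 bytes, as stated above.

```sh
# Number of sectors in a partition, from fdisk's inclusive Start/End columns.
part_sectors() {
    echo $(( $2 - $1 + 1 ))
}

# Convert a sector count to bytes, assuming 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

part_sectors 4209030 37768814        # /dev/hda2 -> 33559785
sectors_to_bytes 33559785            # -> 17182609920
```

As a cross-check, 33559785 sectors is 16779892.5 KiB, which matches the
"16779892+" shown in the Blocks column for /dev/hda2.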
|
||||
@ -271,15 +288,17 @@ And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
|
||||
For Win2k and later dynamic disks, you can for example use the ldminfo utility
|
||||
which is part of the Linux LDM tools (the latest version at the time of
|
||||
writing is linux-ldm-0.0.8.tar.bz2). You can download it from:
|
||||
|
||||
http://www.linux-ntfs.org/
|
||||
|
||||
Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
|
||||
into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You
|
||||
will find the precompiled (i386) ldminfo utility there. NOTE: You will not be
|
||||
able to compile this yourself easily so use the binary version!
|
||||
|
||||
Then you would use ldminfo in dump mode to obtain the necessary information:
|
||||
Then you would use ldminfo in dump mode to obtain the necessary information::
|
||||
|
||||
$ ./ldminfo --dump /dev/hda
|
||||
$ ./ldminfo --dump /dev/hda
|
||||
|
||||
This would dump the LDM database found on /dev/hda which describes all of your
|
||||
dynamic disks and all the volumes on them. At the bottom you will see the
|
||||
@ -305,42 +324,36 @@ give you the correct information to do this.
|
||||
Assuming you know all your devices and their sizes things are easy.
|
||||
|
||||
For a linear raid the table would look like this (note all values are in
|
||||
512-byte sectors):
|
||||
512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Offset into Size of this Raid type Device Start sector
|
||||
# volume device of device
|
||||
0 1028161 linear /dev/hda1 0
|
||||
1028161 3903762 linear /dev/hdb2 0
|
||||
4931923 2103211 linear /dev/hdc1 0
|
||||
--- cut here ---
|
||||
# Offset into Size of this Raid type Device Start sector
|
||||
# volume device of device
|
||||
0 1028161 linear /dev/hda1 0
|
||||
1028161 3903762 linear /dev/hdb2 0
|
||||
4931923 2103211 linear /dev/hdc1 0
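In the linear table each offset in the first column is the running total of
the sizes of the preceding devices. The following sketch generates such a
table from device/size pairs; the function name ``dm_linear_table`` is
hypothetical and the start sector within each device is assumed to be 0, as
in the example above.

```sh
# Print a linear device-mapper table.
# usage: dm_linear_table DEVICE1 SIZE1 [DEVICE2 SIZE2 ...]
# Sizes are in 512-byte sectors; each device is used from its sector 0.
dm_linear_table() {
    offset=0
    while [ $# -ge 2 ]; do
        printf '%s %s linear %s 0\n' "$offset" "$2" "$1"
        offset=$(( offset + $2 ))    # next device starts where this one ends
        shift 2
    done
}

dm_linear_table /dev/hda1 1028161 /dev/hdb2 3903762 /dev/hdc1 2103211
```

Run as shown, this reproduces the three table lines above (without the
comment header).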
|
||||
|
||||
For a striped volume, i.e. raid level 0, you will need to know the chunk size
|
||||
you used when creating the volume. Windows uses 64kiB as the default, so it
|
||||
will probably be this unless you changed the defaults when creating the array.
|
||||
|
||||
For a raid level 0 the table would look like this (note all values are in
|
||||
512-byte sectors):
|
||||
512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Offset Size Raid Number Chunk 1st Start 2nd Start
|
||||
# into of the type of size Device in Device in
|
||||
# volume volume stripes device device
|
||||
0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
|
||||
--- cut here ---
|
||||
# Offset Size Raid Number Chunk 1st Start 2nd Start
|
||||
# into of the type of size Device in Device in
|
||||
# volume volume stripes device device
|
||||
0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
|
||||
|
||||
If there are more than two devices, just add each of them to the end of the
|
||||
line.
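The chunk size in the table is given in 512-byte sectors, so the Windows
default of 64kiB becomes 128. A minimal sketch that builds the striped line
for any number of devices follows; ``dm_striped_line`` is a hypothetical
helper name, and the start sector within each device is assumed to be 0.

```sh
# Print a striped (raid level 0) device-mapper table line.
# usage: dm_striped_line VOLUME_SECTORS CHUNK_KIB DEVICE...
dm_striped_line() {
    size=$1
    chunk=$(( $2 * 1024 / 512 ))   # chunk size in 512-byte sectors
    shift 2
    line="0 $size striped $# $chunk"
    for dev in "$@"; do
        line="$line $dev 0"        # each device is used from its sector 0
    done
    printf '%s\n' "$line"
}

dm_striped_line 2056320 64 /dev/hda1 /dev/hdb1
# -> 0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
```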
|
||||
|
||||
Finally, for a mirrored volume, i.e. raid level 1, the table would look like
|
||||
this (note all values are in 512-byte sectors):
|
||||
this (note all values are in 512-byte sectors)::
|
||||
|
||||
--- cut here ---
|
||||
# Ofs Size Raid Log Number Region Should Number Source Start Target Start
|
||||
# in of the type type of log size sync? of Device in Device in
|
||||
# vol volume params mirrors Device Device
|
||||
0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0
|
||||
--- cut here ---
|
||||
# Ofs Size Raid Log Number Region Should Number Source Start Target Start
|
||||
# in of the type type of log size sync? of Device in Device in
|
||||
# vol volume params mirrors Device Device
|
||||
0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0
|
||||
|
||||
If you are mirroring to multiple devices you can specify further targets at the
|
||||
end of the line.
|
||||
@ -353,17 +366,17 @@ to the "Target Device" or if you specified multiple target devices to all of
|
||||
them.
|
||||
|
||||
Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
|
||||
and hand it over to dmsetup to work with, like so:
|
||||
and hand it over to dmsetup to work with, like so::
|
||||
|
||||
$ dmsetup create myvolume1 /etc/ntfsvolume1
|
||||
$ dmsetup create myvolume1 /etc/ntfsvolume1
|
||||
|
||||
You can obviously replace "myvolume1" with whatever name you like.
|
||||
|
||||
If it all worked, you will now have the device /dev/device-mapper/myvolume1
|
||||
which you can then just use as an argument to the mount command as usual to
|
||||
mount the ntfs volume. For example:
|
||||
mount the ntfs volume. For example::
|
||||
|
||||
$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
|
||||
$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
|
||||
|
||||
(You need to create the directory /mnt/myvol1 first and of course you can use
|
||||
anything you like instead of /mnt/myvol1 as long as it is an existing
|
||||
@ -395,18 +408,18 @@ Windows by default uses a stripe chunk size of 64k, so you probably want the
|
||||
"chunk-size 64k" option for each raid-disk, too.
|
||||
|
||||
For example, if you have a stripe set consisting of two partitions /dev/hda5
|
||||
and /dev/hdb1 your /etc/raidtab would look like this:
|
||||
and /dev/hdb1 your /etc/raidtab would look like this::
|
||||
|
||||
raiddev /dev/md0
|
||||
raid-level 0
|
||||
nr-raid-disks 2
|
||||
nr-spare-disks 0
|
||||
persistent-superblock 0
|
||||
chunk-size 64k
|
||||
device /dev/hda5
|
||||
raid-disk 0
|
||||
device /dev/hdb1
|
||||
raid-disk 1
|
||||
raiddev /dev/md0
|
||||
raid-level 0
|
||||
nr-raid-disks 2
|
||||
nr-spare-disks 0
|
||||
persistent-superblock 0
|
||||
chunk-size 64k
|
||||
device /dev/hda5
|
||||
raid-disk 0
|
||||
device /dev/hdb1
|
||||
raid-disk 1
|
||||
|
||||
For linear raid, just change the raid-level above to "raid-level linear", for
|
||||
mirrors, change it to "raid-level 1", and for stripe sets with parity, change
|
||||
@ -427,7 +440,9 @@ Once the raidtab is setup, run for example raid0run -a to start all devices or
|
||||
raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
|
||||
|
||||
Then just use the mount command as usual to mount the ntfs volume using for
|
||||
example: mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
|
||||
example::
|
||||
|
||||
mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
|
||||
|
||||
It is advisable to do the mount read-only to see if the md volume has been
|
||||
set up correctly to avoid the possibility of causing damage to the data on the
|
@ -1,5 +1,8 @@
|
||||
OCFS2 online file check
|
||||
-----------------------
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================
|
||||
OCFS2 file system - online file check
|
||||
=====================================
|
||||
|
||||
This document describes the OCFS2 online file check feature.
|
||||
|
||||
@ -40,7 +43,7 @@ When there are errors in the OCFS2 filesystem, they are usually accompanied
|
||||
by the inode number which caused the error. This inode number would be the
|
||||
input to check/fix the file.
|
||||
|
||||
There is a sysfs directory for each OCFS2 file system mounting:
|
||||
There is a sysfs directory for each mounted OCFS2 file system::
|
||||
|
||||
/sys/fs/ocfs2/<devname>/filecheck
|
||||
|
||||
@ -50,34 +53,36 @@ communicate with kernel space, tell which file(inode number) will be checked or
|
||||
fixed. Currently, three operations are supported, which include checking
|
||||
inode, fixing inode and setting the size of result record history.
|
||||
|
||||
1. If you want to know what error exactly happened to <inode> before fixing, do
|
||||
1. If you want to know what error exactly happened to <inode> before fixing, do::
|
||||
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/check
|
||||
|
||||
The output is like this:
|
||||
INO DONE ERROR
|
||||
39502 1 GENERATION
|
||||
The output is like this::
|
||||
|
||||
<INO> lists the inode numbers.
|
||||
<DONE> indicates whether the operation has been finished.
|
||||
<ERROR> says what kind of errors was found. For the detailed error numbers,
|
||||
please refer to the file linux/fs/ocfs2/filecheck.h.
|
||||
INO DONE ERROR
|
||||
39502 1 GENERATION
|
||||
|
||||
2. If you determine to fix this inode, do
|
||||
<INO> lists the inode numbers.
|
||||
<DONE> indicates whether the operation has been finished.
|
||||
<ERROR> says what kind of error was found. For the detailed error numbers,
|
||||
please refer to the file linux/fs/ocfs2/filecheck.h.
|
||||
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
2. If you decide to fix this inode, do::
|
||||
|
||||
The output is like this:
|
||||
INO DONE ERROR
|
||||
39502 1 SUCCESS
|
||||
# echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
# cat /sys/fs/ocfs2/<devname>/filecheck/fix
|
||||
|
||||
The output is like this::
|
||||
|
||||
INO DONE ERROR
|
||||
39502 1 SUCCESS
|
||||
|
||||
This time, the <ERROR> column indicates whether this fix is successful or not.
|
||||
|
||||
3. The record cache is used to store the history of check/fix results. Its
|
||||
default size is 10, and it can be adjusted within the range 10 to 100. You can
|
||||
adjust the size like this:
|
||||
adjust the size like this::
|
||||
|
||||
# echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
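The check and fix files are plain text, so their output can be filtered with
standard tools. The sketch below is one possible way to script the steps
above. The function names are hypothetical, and it assumes that a record
without problems reports SUCCESS in the ERROR column, as the fix output does.

```sh
# Print "<inode> <error>" for each finished record that did not end in SUCCESS.
# Reads the text of the check (or fix) file on standard input.
filecheck_failed() {
    awk 'NR > 1 && $2 == 1 && $3 != "SUCCESS" { print $1, $3 }'
}

# Hypothetical wrapper: submit INODE for checking on DEVNAME, then report.
# Needs root and a mounted OCFS2 volume, so it is only defined here, not called.
filecheck_check() {
    f=/sys/fs/ocfs2/$1/filecheck/check
    echo "$2" > "$f" && filecheck_failed < "$f"
}

# The sample output from step 1 above:
filecheck_failed <<'EOF'
INO DONE ERROR
39502 1 GENERATION
EOF
```

For the sample input this prints "39502 GENERATION". Records whose DONE
column is not 1 are skipped, since their result is not final yet.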
|
||||
|
@ -1,5 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
OCFS2 filesystem
|
||||
==================
|
||||
================
|
||||
|
||||
OCFS2 is a general purpose extent based shared disk cluster file
|
||||
system with many similarities to ext3. It supports 64 bit inode
|
||||
numbers, and has automatically extending metadata groups which may
|
||||
@ -14,22 +18,26 @@ OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
|
||||
|
||||
All code copyright 2005 Oracle except when otherwise noted.
|
||||
|
||||
CREDITS:
|
||||
Credits
|
||||
=======
|
||||
|
||||
Lots of code taken from ext3 and other projects.
|
||||
|
||||
Authors in alphabetical order:
|
||||
Joel Becker <joel.becker@oracle.com>
|
||||
Zach Brown <zach.brown@oracle.com>
|
||||
Mark Fasheh <mfasheh@suse.com>
|
||||
Kurt Hackel <kurt.hackel@oracle.com>
|
||||
Tao Ma <tao.ma@oracle.com>
|
||||
Sunil Mushran <sunil.mushran@oracle.com>
|
||||
Manish Singh <manish.singh@oracle.com>
|
||||
Tiger Yang <tiger.yang@oracle.com>
|
||||
|
||||
- Joel Becker <joel.becker@oracle.com>
|
||||
- Zach Brown <zach.brown@oracle.com>
|
||||
- Mark Fasheh <mfasheh@suse.com>
|
||||
- Kurt Hackel <kurt.hackel@oracle.com>
|
||||
- Tao Ma <tao.ma@oracle.com>
|
||||
- Sunil Mushran <sunil.mushran@oracle.com>
|
||||
- Manish Singh <manish.singh@oracle.com>
|
||||
- Tiger Yang <tiger.yang@oracle.com>
|
||||
|
||||
Caveats
|
||||
=======
|
||||
Features which OCFS2 does not support yet:
|
||||
|
||||
- Directory change notification (F_NOTIFY)
|
||||
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
|
||||
|
||||
@ -37,8 +45,10 @@ Mount options
|
||||
=============
|
||||
|
||||
OCFS2 supports the following mount options:
|
||||
|
||||
(*) == default
|
||||
|
||||
======================= ========================================================
|
||||
barrier=1 This enables/disables barriers. barrier=0 disables it,
|
||||
barrier=1 enables it.
|
||||
errors=remount-ro(*) Remount the filesystem read-only on an error.
|
||||
@ -104,3 +114,4 @@ journal_async_commit Commit block can be written to disk without waiting
|
||||
for descriptor blocks. If enabled older kernels cannot
|
||||
mount the device. This will enable 'journal_checksum'
|
||||
internally.
|
||||
======================= ========================================================
|
112
Documentation/filesystems/omfs.rst
Normal file
@ -0,0 +1,112 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================
|
||||
Optimized MPEG Filesystem (OMFS)
|
||||
================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
|
||||
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
|
||||
block sizes from 2k to 8k, with hash-based directories. This
|
||||
filesystem driver may be used to read and write disks from these
|
||||
devices.
|
||||
|
||||
Note, it is not recommended that this FS be used in place of a general
|
||||
filesystem for your own streaming media device. Native Linux filesystems
|
||||
will likely perform better.
|
||||
|
||||
More information is available at:
|
||||
|
||||
http://linux-karma.sf.net/
|
||||
|
||||
Various utilities, including mkomfs and omfsck, are included with
|
||||
omfsprogs, available at:
|
||||
|
||||
http://bobcopeland.com/karma/
|
||||
|
||||
Instructions are included in its README.
|
||||
|
||||
Options
|
||||
=======
|
||||
|
||||
OMFS supports the following mount-time options:
|
||||
|
||||
============  ========================================
uid=n         make all files owned by specified user
gid=n         make all files owned by specified group
umask=xxx     set permission umask to xxx
fmask=xxx     set umask to xxx for files
dmask=xxx     set umask to xxx for directories
============  ========================================
|
||||
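To see what a given mask does, the following sketch computes the resulting
permission bits. It assumes the masks are octal and are applied to a base
mode of 0777 in the usual umask fashion; that base value and the helper name
``apply_mask`` are assumptions, not something stated in this document.

```sh
# Permission bits left after applying an octal mask to a base mode of 0777.
# usage: apply_mask MASK   (MASK is read as octal, with or without a leading 0)
apply_mask() {
    printf '%04o\n' $(( 0777 & ~0$1 ))
}

apply_mask 133    # e.g. fmask=133 -> 0644
apply_mask 022    # e.g. dmask=022 -> 0755
```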
|
||||
Disk format
|
||||
===========
|
||||
|
||||
OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
|
||||
group consists of super block information, file metadata, directory structures,
|
||||
and extents. Each sysblock has a header containing CRCs of the entire
|
||||
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
|
||||
have a smaller size than a data block, but since they are both addressed by the
|
||||
same 64-bit block number, any remaining space in the smaller sysblock is
|
||||
unused.
|
||||
|
||||
Sysblock header information::
|
||||
|
||||
    struct omfs_header {
        __be64 h_self;          /* FS block where this is located */
        __be32 h_body_size;     /* size of useful data after header */
        __be16 h_crc;           /* crc-ccitt of body_size bytes */
        char h_fill1[2];
        u8 h_version;           /* version, always 1 */
        char h_type;            /* OMFS_INODE_X */
        u8 h_magic;             /* OMFS_IMAGIC */
        u8 h_check_xor;         /* XOR of header bytes before this */
        __be32 h_fill2;
    };
|
||||
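Adding up the field sizes in the header gives 8 + 4 + 2 + 2 + 1 + 1 + 1 = 19
bytes before h_check_xor, so that field sits at byte offset 19. The sketch
below checks it from the shell. It assumes FILE is a raw dump that starts
with at least one complete 24-byte sysblock header; the function names are
hypothetical.

```sh
# XOR together every byte read from standard input; print the result in decimal.
xor_bytes() {
    x=0
    for b in $(od -An -v -tu1); do
        x=$(( x ^ b ))
    done
    echo "$x"
}

# Succeeds if byte 19 of FILE (h_check_xor) equals the XOR of bytes 0..18.
# The offset 19 is derived from the field sizes in struct omfs_header.
omfs_header_xor_ok() {
    want=$(dd if="$1" bs=1 count=19 2>/dev/null | xor_bytes)
    have=$(dd if="$1" bs=1 skip=19 count=1 2>/dev/null | xor_bytes)
    [ "$want" -eq "$have" ]
}

printf '\001\002\004' | xor_bytes    # 1 ^ 2 ^ 4 -> 7
```

This only covers the XOR byte; the CRC in h_crc protects the body and is not
verified here.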
|
||||
Files and directories are both represented by omfs_inode::
|
||||
|
||||
    struct omfs_inode {
        struct omfs_header i_head;      /* header */
        __be64 i_parent;                /* parent containing this inode */
        __be64 i_sibling;               /* next inode in hash bucket */
        __be64 i_ctime;                 /* ctime, in milliseconds */
        char i_fill1[35];
        char i_type;                    /* OMFS_[DIR,FILE] */
        __be32 i_fill2;
        char i_fill3[64];
        char i_name[OMFS_NAMELEN];      /* filename */
        __be64 i_size;                  /* size of file, in bytes */
    };
|
||||
|
||||
Directories in OMFS are implemented as a large hash table. Filenames are
|
||||
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
|
||||
Lookup requires hashing the filename, then seeking across i_sibling pointers
|
||||
until a match is found on i_name. Empty buckets are represented by block
|
||||
pointers with all-1s (~0).
|
||||
|
||||
A file is an omfs_inode structure followed by an extent table beginning at
|
||||
OMFS_EXTENT_START::
|
||||
|
||||
struct omfs_extent_entry {
|
||||
__be64 e_cluster; /* start location of a set of blocks */
|
||||
__be64 e_blocks; /* number of blocks after e_cluster */
|
||||
};
|
||||
|
||||
struct omfs_extent {
|
||||
__be64 e_next; /* next extent table location */
|
||||
__be32 e_extent_count; /* total # extents in this table */
|
||||
__be32 e_fill;
|
||||
struct omfs_extent_entry e_entry; /* start of extent entries */
|
||||
};
|
||||
|
||||
Each extent holds the block offset followed by the number of blocks allocated to
|
||||
the extent. The final extent in each table is a terminator with e_cluster
|
||||
being ~0 and e_blocks being the ones' complement of the total number of blocks
|
||||
in the table.
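The terminator rule can be checked with a short Python sketch. It assumes that the count passed to ``read_entries`` (playing the role of ``e_extent_count``) includes the terminator entry; that detail is an assumption, and ``build_entries`` and ``read_entries`` are hypothetical names.

```python
import struct

ENTRY = struct.Struct(">QQ")   # e_cluster, e_blocks (both __be64)
MASK64 = 0xFFFFFFFFFFFFFFFF    # ~0 as an unsigned 64-bit value


def build_entries(extents):
    """Serialize (cluster, blocks) pairs and append the terminator entry."""
    total = sum(blocks for _, blocks in extents)
    raw = b"".join(ENTRY.pack(c, n) for c, n in extents)
    return raw + ENTRY.pack(MASK64, ~total & MASK64)


def read_entries(raw, count):
    """Return the real extents after validating the terminator."""
    extents, total = [], 0
    for i in range(count):
        cluster, blocks = ENTRY.unpack_from(raw, i * ENTRY.size)
        if cluster == MASK64:  # terminator reached
            if blocks != (~total & MASK64):
                raise ValueError("terminator does not match block total")
            return extents
        extents.append((cluster, blocks))
        total += blocks
    raise ValueError("no terminator found")
```

For two extents of 8 and 4 blocks, the terminator's ``e_blocks`` is the 64-bit ones' complement of 12.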
|
||||
|
||||
If this table overflows, a continuation inode is written and pointed to by
|
||||
e_next. These have a header but lack the rest of the inode structure.
|
||||
|
@ -1,106 +0,0 @@
|
||||
Optimized MPEG Filesystem (OMFS)
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
|
||||
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
|
||||
block sizes from 2k to 8k, with hash-based directories. This
|
||||
filesystem driver may be used to read and write disks from these
|
||||
devices.
|
||||
|
||||
Note, it is not recommended that this FS be used in place of a general
|
||||
filesystem for your own streaming media device. Native Linux filesystems
|
||||
will likely perform better.
|
||||
|
||||
More information is available at:
|
||||
|
||||
http://linux-karma.sf.net/
|
||||
|
||||
Various utilities, including mkomfs and omfsck, are included with
|
||||
omfsprogs, available at:
|
||||
|
||||
http://bobcopeland.com/karma/
|
||||
|
||||
Instructions are included in its README.
|
||||
|
||||
Options
|
||||
=======
|
||||
|
||||
OMFS supports the following mount-time options:
|
||||
|
||||
uid=n - make all files owned by specified user
|
||||
gid=n - make all files owned by specified group
|
||||
umask=xxx - set permission umask to xxx
|
||||
fmask=xxx - set umask to xxx for files
|
||||
dmask=xxx - set umask to xxx for directories
|
||||
|
||||
Disk format
|
||||
===========
|
||||
|
||||
OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
|
||||
group consists of super block information, file metadata, directory structures,
|
||||
and extents. Each sysblock has a header containing CRCs of the entire
|
||||
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
|
||||
have a smaller size than a data block, but since they are both addressed by the
|
||||
same 64-bit block number, any remaining space in the smaller sysblock is
|
||||
unused.
|
||||
|
||||
Sysblock header information:
|
||||
|
||||
struct omfs_header {
|
||||
__be64 h_self; /* FS block where this is located */
|
||||
__be32 h_body_size; /* size of useful data after header */
|
||||
__be16 h_crc; /* crc-ccitt of body_size bytes */
|
||||
char h_fill1[2];
|
||||
u8 h_version; /* version, always 1 */
|
||||
char h_type; /* OMFS_INODE_X */
|
||||
u8 h_magic; /* OMFS_IMAGIC */
|
||||
u8 h_check_xor; /* XOR of header bytes before this */
|
||||
__be32 h_fill2;
|
||||
};
|
||||
|
||||
Files and directories are both represented by omfs_inode:
|
||||
|
||||
struct omfs_inode {
|
||||
struct omfs_header i_head; /* header */
|
||||
__be64 i_parent; /* parent containing this inode */
|
||||
__be64 i_sibling; /* next inode in hash bucket */
|
||||
__be64 i_ctime; /* ctime, in milliseconds */
|
||||
char i_fill1[35];
|
||||
char i_type; /* OMFS_[DIR,FILE] */
|
||||
__be32 i_fill2;
|
||||
char i_fill3[64];
|
||||
char i_name[OMFS_NAMELEN]; /* filename */
|
||||
__be64 i_size; /* size of file, in bytes */
|
||||
};
|
||||
|
||||
Directories in OMFS are implemented as a large hash table. Filenames are
|
||||
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
|
||||
Lookup requires hashing the filename, then seeking across i_sibling pointers
|
||||
until a match is found on i_name. Empty buckets are represented by block
|
||||
pointers with all-1s (~0).
|
||||
|
||||
A file is an omfs_inode structure followed by an extent table beginning at
|
||||
OMFS_EXTENT_START:
|
||||
|
||||
struct omfs_extent_entry {
|
||||
__be64 e_cluster; /* start location of a set of blocks */
|
||||
__be64 e_blocks; /* number of blocks after e_cluster */
|
||||
};
|
||||
|
||||
struct omfs_extent {
|
||||
__be64 e_next; /* next extent table location */
|
||||
__be32 e_extent_count; /* total # extents in this table */
|
||||
__be32 e_fill;
|
||||
struct omfs_extent_entry e_entry; /* start of extent entries */
|
||||
};
|
||||
|
||||
Each extent holds the block offset followed by number of blocks allocated to
|
||||
the extent. The final extent in each table is a terminator with e_cluster
|
||||
being ~0 and e_blocks being ones'-complement of the total number of blocks
|
||||
in the table.
|
||||
|
||||
If this table overflows, a continuation inode is written and pointed to by
|
||||
e_next. These have a header but lack the rest of the inode structure.
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========
|
||||
ORANGEFS
|
||||
========
|
||||
|
||||
@ -21,25 +24,25 @@ Orangefs features include:
|
||||
* Stateless
|
||||
|
||||
|
||||
MAILING LIST ARCHIVES
|
||||
Mailing List Archives
|
||||
=====================
|
||||
|
||||
http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
|
||||
|
||||
|
||||
MAILING LIST SUBMISSIONS
|
||||
Mailing List Submissions
|
||||
========================
|
||||
|
||||
devel@lists.orangefs.org
|
||||
|
||||
|
||||
DOCUMENTATION
|
||||
Documentation
|
||||
=============
|
||||
|
||||
http://www.orangefs.org/documentation/
|
||||
|
||||
|
||||
USERSPACE FILESYSTEM SOURCE
|
||||
Userspace Filesystem Source
|
||||
===========================
|
||||
|
||||
http://www.orangefs.org/download
|
||||
@ -48,16 +51,16 @@ Orangefs versions prior to 2.9.3 would not be compatible with the
|
||||
upstream version of the kernel client.
|
||||
|
||||
|
||||
RUNNING ORANGEFS ON A SINGLE SERVER
|
||||
Running ORANGEFS On a Single Server
|
||||
===================================
|
||||
|
||||
OrangeFS is usually run in large installations with multiple servers and
|
||||
clients, but a complete filesystem can be run on a single machine for
|
||||
development and testing.
|
||||
|
||||
On Fedora, install orangefs and orangefs-server.
|
||||
On Fedora, install orangefs and orangefs-server::
|
||||
|
||||
dnf -y install orangefs orangefs-server
|
||||
dnf -y install orangefs orangefs-server
|
||||
|
||||
There is an example server configuration file in
|
||||
/etc/orangefs/orangefs.conf. Change localhost to your hostname if
|
||||
@ -70,29 +73,29 @@ single line. Uncomment it and change the hostname if necessary. This
|
||||
controls clients which use libpvfs2. This does not control the
|
||||
pvfs2-client-core.
|
||||
|
||||
Create the filesystem.
|
||||
Create the filesystem::
|
||||
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
|
||||
Start the server.
|
||||
Start the server::
|
||||
|
||||
systemctl start orangefs-server
|
||||
systemctl start orangefs-server
|
||||
|
||||
Test the server.
|
||||
Test the server::
|
||||
|
||||
pvfs2-ping -m /pvfsmnt
|
||||
pvfs2-ping -m /pvfsmnt
|
||||
|
||||
Start the client. The module must be compiled in or loaded before this
|
||||
point.
|
||||
point::
|
||||
|
||||
systemctl start orangefs-client
|
||||
systemctl start orangefs-client
|
||||
|
||||
Mount the filesystem.
|
||||
Mount the filesystem::
|
||||
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
|
||||
|
||||
BUILDING ORANGEFS ON A SINGLE SERVER
|
||||
Building ORANGEFS on a Single Server
|
||||
====================================
|
||||
|
||||
Where OrangeFS cannot be installed from distribution packages, it may be
|
||||
@ -102,49 +105,51 @@ You can omit --prefix if you don't care that things are sprinkled around
|
||||
in /usr/local. As of version 2.9.6, OrangeFS uses Berkeley DB by
|
||||
default; we will probably be changing the default to LMDB soon.
|
||||
|
||||
./configure --prefix=/opt/ofs --with-db-backend=lmdb
|
||||
::
|
||||
|
||||
make
|
||||
./configure --prefix=/opt/ofs --with-db-backend=lmdb
|
||||
|
||||
make install
|
||||
make
|
||||
|
||||
Create an orangefs config file.
|
||||
make install
|
||||
|
||||
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
||||
Create an orangefs config file::
|
||||
|
||||
Create an /etc/pvfs2tab file.
|
||||
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
|
||||
|
||||
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
||||
/etc/pvfs2tab
|
||||
Create an /etc/pvfs2tab file::
|
||||
|
||||
Create the mount point you specified in the tab file if needed.
|
||||
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
|
||||
/etc/pvfs2tab
|
||||
|
||||
mkdir /pvfsmnt
|
||||
Create the mount point you specified in the tab file if needed::
|
||||
|
||||
Bootstrap the server.
|
||||
mkdir /pvfsmnt
|
||||
|
||||
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
|
||||
Bootstrap the server::
|
||||
|
||||
Start the server.
|
||||
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
|
||||
|
||||
/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
|
||||
Start the server::
|
||||
|
||||
/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
|
||||
|
||||
Now the server should be running. Pvfs2-ls is a simple
|
||||
test to verify that the server is running.
|
||||
test to verify that the server is running::
|
||||
|
||||
/opt/ofs/bin/pvfs2-ls /pvfsmnt
|
||||
/opt/ofs/bin/pvfs2-ls /pvfsmnt
|
||||
|
||||
If stuff seems to be working, load the kernel module and
|
||||
turn on the client core.
|
||||
turn on the client core::
|
||||
|
||||
/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
|
||||
/opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
|
||||
|
||||
Mount your filesystem.
|
||||
Mount your filesystem::
|
||||
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
|
||||
|
||||
|
||||
RUNNING XFSTESTS
|
||||
Running xfstests
|
||||
================
|
||||
|
||||
It is useful to use a scratch filesystem with xfstests. This can be
|
||||
@ -159,21 +164,23 @@ Then there are two FileSystem sections: orangefs and scratch.
|
||||
|
||||
This change should be made before creating the filesystem.
|
||||
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
::
|
||||
|
||||
To run xfstests, create /etc/xfsqa.config.
|
||||
pvfs2-server -f /etc/orangefs/orangefs.conf
|
||||
|
||||
TEST_DIR=/orangefs
|
||||
TEST_DEV=tcp://localhost:3334/orangefs
|
||||
SCRATCH_MNT=/scratch
|
||||
SCRATCH_DEV=tcp://localhost:3334/scratch
|
||||
To run xfstests, create /etc/xfsqa.config::
|
||||
|
||||
Then xfstests can be run
|
||||
TEST_DIR=/orangefs
|
||||
TEST_DEV=tcp://localhost:3334/orangefs
|
||||
SCRATCH_MNT=/scratch
|
||||
SCRATCH_DEV=tcp://localhost:3334/scratch
|
||||
|
||||
./check -pvfs2
|
||||
Then xfstests can be run::
|
||||
|
||||
./check -pvfs2
|
||||
|
||||
|
||||
OPTIONS
|
||||
Options
|
||||
=======
|
||||
|
||||
The following mount options are accepted:
|
||||
@ -193,32 +200,32 @@ The following mount options are accepted:
|
||||
Distributed locking is being worked on for the future.
|
||||
|
||||
|
||||
DEBUGGING
|
||||
Debugging
|
||||
=========
|
||||
|
||||
If you want the debug (GOSSIP) statements in a particular
|
||||
source file (inode.c for example) go to syslog:
|
||||
source file (inode.c for example) go to syslog::
|
||||
|
||||
echo inode > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
No debugging (the default):
|
||||
No debugging (the default)::
|
||||
|
||||
echo none > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
Debugging from several source files:
|
||||
Debugging from several source files::
|
||||
|
||||
echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
All debugging:
|
||||
All debugging::
|
||||
|
||||
echo all > /sys/kernel/debug/orangefs/kernel-debug
|
||||
|
||||
Get a list of all debugging keywords:
|
||||
Get a list of all debugging keywords::
|
||||
|
||||
cat /sys/kernel/debug/orangefs/debug-help
|
||||
|
||||
|
||||
PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
|
||||
Protocol between Kernel Module and Userspace
|
||||
============================================
|
||||
|
||||
Orangefs is a user space filesystem and an associated kernel module.
|
||||
@ -234,7 +241,8 @@ The kernel module implements a pseudo device that userspace
|
||||
can read from and write to. Userspace can also manipulate the
|
||||
kernel module through the pseudo device with ioctl.
|
||||
|
||||
THE BUFMAP:
|
||||
The Bufmap
|
||||
----------
|
||||
|
||||
At startup userspace allocates two page-size-aligned (posix_memalign)
|
||||
mlocked memory buffers, one is used for IO and one is used for readdir
|
||||
@ -250,7 +258,8 @@ copied from user space to kernel space with copy_from_user and is used
|
||||
to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
|
||||
then contains:
|
||||
|
||||
* refcnt - a reference counter
|
||||
* refcnt
|
||||
- a reference counter
|
||||
* desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
|
||||
partition size, which represents the filesystem's block size and
|
||||
is used for s_blocksize in super blocks.
|
||||
@ -259,17 +268,19 @@ then contains:
|
||||
* desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
|
||||
* total_size - the total size of the IO buffer.
|
||||
* page_count - the number of 4096 byte pages in the IO buffer.
|
||||
* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
|
||||
* page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
|
||||
of kcalloced memory. This memory is used as an array of pointers
|
||||
to each of the pages in the IO buffer through a call to get_user_pages.
|
||||
* desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
|
||||
* desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
|
||||
bytes of kcalloced memory. This memory is further initialized:
|
||||
|
||||
user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
|
||||
structure. user_desc->ptr points to the IO buffer.
|
||||
|
||||
pages_per_desc = bufmap->desc_size / PAGE_SIZE
|
||||
offset = 0
|
||||
::
|
||||
|
||||
pages_per_desc = bufmap->desc_size / PAGE_SIZE
|
||||
offset = 0
|
||||
|
||||
bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
|
||||
bufmap->desc_array[0].array_count = pages_per_desc = 1024
|
||||
@ -293,7 +304,8 @@ then contains:
|
||||
* readdir_index_lock - a spinlock to protect readdir_index_array during
|
||||
update.
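The arithmetic behind ``desc_shift`` and ``pages_per_desc`` can be reproduced with a few lines of Python. This is a sketch using only the numbers quoted above (a 4194304-byte descriptor and 4096-byte pages). ``desc_page_range`` is a hypothetical helper, and it assumes each descriptor's pages start where the previous descriptor's pages end.

```python
DESC_SIZE = 4194304  # PVFS2_BUFMAP_DEFAULT_DESC_SIZE, as quoted in the text
PAGE_SIZE = 4096     # page size assumed by the text

# log2 of a power of two: the position of its single set bit.
desc_shift = DESC_SIZE.bit_length() - 1
pages_per_desc = DESC_SIZE // PAGE_SIZE


def desc_page_range(index):
    """First and one-past-last page_array index for descriptor `index`
    (assumes descriptors are laid out back to back)."""
    start = index * pages_per_desc
    return start, start + pages_per_desc
```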
|
||||
|
||||
OPERATIONS:
|
||||
Operations
|
||||
----------
|
||||
|
||||
The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
|
||||
needs to communicate with userspace. Part of the op contains the "upcall"
|
||||
@ -308,13 +320,19 @@ in flight at any given time.
|
||||
|
||||
Ops are stateful:
|
||||
|
||||
* unknown - op was just initialized
|
||||
* waiting - op is on request_list (upward bound)
|
||||
* inprogr - op is in progress (waiting for downcall)
|
||||
* serviced - op has matching downcall; ok
|
||||
* purged - op has to start a timer since client-core
|
||||
* unknown
|
||||
- op was just initialized
|
||||
* waiting
|
||||
- op is on request_list (upward bound)
|
||||
* inprogr
|
||||
- op is in progress (waiting for downcall)
|
||||
* serviced
|
||||
- op has matching downcall; ok
|
||||
* purged
|
||||
- op has to start a timer since client-core
|
||||
exited uncleanly before servicing op
|
||||
* given up - submitter has given up waiting for it
|
||||
* given up
|
||||
- submitter has given up waiting for it
|
||||
|
||||
When some arbitrary userspace program needs to perform a
|
||||
filesystem operation on Orangefs (readdir, I/O, create, whatever)
|
||||
@ -389,10 +407,15 @@ union of structs, each of which is associated with a particular
|
||||
response type.
|
||||
|
||||
The several members outside of the union are:
|
||||
- int32_t type - type of operation.
|
||||
- int32_t status - return code for the operation.
|
||||
- int64_t trailer_size - 0 unless readdir operation.
|
||||
- char *trailer_buf - initialized to NULL, used during readdir operations.
|
||||
|
||||
``int32_t type``
|
||||
- type of operation.
|
||||
``int32_t status``
|
||||
- return code for the operation.
|
||||
``int64_t trailer_size``
|
||||
- 0 unless readdir operation.
|
||||
``char *trailer_buf``
|
||||
- initialized to NULL, used during readdir operations.
|
||||
|
||||
The appropriate member inside the union is filled out for any
|
||||
particular response.
|
||||
@ -449,18 +472,20 @@ Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
|
||||
made by the kernel side.
|
||||
|
||||
A buffer_list containing:
|
||||
|
||||
- a pointer to the prepared response to the request from the
|
||||
kernel (struct pvfs2_downcall_t).
|
||||
- and also, in the case of a readdir request, a pointer to a
|
||||
buffer containing descriptors for the objects in the target
|
||||
directory.
|
||||
|
||||
... is sent to the function (PINT_dev_write_list) which performs
|
||||
the writev.
|
||||
|
||||
PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
|
||||
|
||||
The first four elements of io_array are initialized like this for all
|
||||
responses:
|
||||
responses::
|
||||
|
||||
io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
|
||||
io_array[0].iov_len = sizeof(int32_t)
|
||||
@ -475,7 +500,7 @@ responses:
|
||||
of global variable vfs_request (vfs_request_t)
|
||||
io_array[3].iov_len = sizeof(pvfs2_downcall_t)
|
||||
|
||||
Readdir responses initialize the fifth element io_array like this:
|
||||
Readdir responses initialize the fifth element of io_array like this::
|
||||
|
||||
io_array[4].iov_base = contents of member trailer_buf (char *)
|
||||
from out_downcall member of global variable
|
||||
@ -517,13 +542,13 @@ from a dentry is cheap, obtaining it from userspace is relatively expensive,
|
||||
hence the motivation to use the dentry when possible.
|
||||
|
||||
The timeout values d_time and getattr_time are jiffy based, and the
|
||||
code is designed to avoid the jiffy-wrap problem:
|
||||
code is designed to avoid the jiffy-wrap problem::
|
||||
|
||||
"In general, if the clock may have wrapped around more than once, there
|
||||
is no way to tell how much time has elapsed. However, if the times t1
|
||||
and t2 are known to be fairly close, we can reliably compute the
|
||||
difference in a way that takes into account the possibility that the
|
||||
clock may have wrapped between times."
|
||||
"In general, if the clock may have wrapped around more than once, there
|
||||
is no way to tell how much time has elapsed. However, if the times t1
|
||||
and t2 are known to be fairly close, we can reliably compute the
|
||||
difference in a way that takes into account the possibility that the
|
||||
clock may have wrapped between times."
|
||||
|
||||
from course notes by instructor Andy Wang
|
||||
from course notes by instructor Andy Wang
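The idea in the quotation can be modelled in Python with a fixed-width counter. This is a sketch, not the kernel's code: the 32-bit width is an assumption made for the demo, and ``time_after`` is named after the kernel macro of the same name only to show the intent.

```python
BITS = 32                # assumed counter width for this sketch
MASK = (1 << BITS) - 1
HALF = 1 << (BITS - 1)


def wrapped_diff(t2, t1):
    """Signed t2 - t1, valid while the true gap is below half the counter range."""
    d = (t2 - t1) & MASK
    return d - (1 << BITS) if d >= HALF else d


def time_after(a, b):
    """True if a is later than b, tolerating a single wrap of the counter."""
    return wrapped_diff(a, b) > 0
```

A timestamp of 5 taken shortly after the counter wrapped still compares as later than one taken 5 ticks before the wrap, because the subtraction is done modulo the counter width.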
|
||||
|
File diff suppressed because it is too large
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================
|
||||
The QNX6 Filesystem
|
||||
===================
|
||||
|
||||
@ -14,10 +17,12 @@ Specification
|
||||
|
||||
qnx6fs shares many properties with traditional Unix filesystems. It has the
|
||||
concepts of blocks, inodes and directories.
|
||||
|
||||
On QNX it is possible to create little endian and big endian qnx6 filesystems.
|
||||
This makes it possible to create a filesystem on a host of one endianness
|
||||
for use on a target platform of the other endianness (QNX is used on quite
|
||||
a range of embedded systems).
|
||||
|
||||
The Linux driver handles both endiannesses (LE and BE) transparently.
|
||||
|
||||
Blocks
|
||||
@ -26,6 +31,7 @@ Blocks
|
||||
The space in the device or file is split up into blocks. These are a fixed
|
||||
size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
|
||||
created.
|
||||
|
||||
Blockpointers are 32bit, so the maximum space that can be addressed is
|
||||
2^32 * 4096 bytes or 16TB
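The limit follows directly from the pointer width and the block size. A minimal Python check, using only the figures given above (``max_bytes`` is a hypothetical helper name):

```python
POINTER_BITS = 32
BLOCK_SIZES = (512, 1024, 2048, 4096)


def max_bytes(block_size):
    """Largest byte count addressable with 32-bit block pointers."""
    return (1 << POINTER_BITS) * block_size


# Addressable space for every allowed block size, in TiB.
limits_tib = {bs: max_bytes(bs) // 2**40 for bs in BLOCK_SIZES}
```

With the smallest block size the same calculation gives 2 TiB rather than 16.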
|
||||
|
||||
@ -50,6 +56,7 @@ Each of these root nodes holds information like total size of the stored
|
||||
data and the addressing levels in that specific tree.
|
||||
If the level value is 0, up to 16 direct blocks can be addressed by each
|
||||
node.
|
||||
|
||||
Level 1 adds an additional indirect addressing level where each indirect
|
||||
addressing block holds up to blocksize / 4 pointers (4 bytes each) to data blocks.
|
||||
Level 2 adds an additional indirect addressing block level (so, already up
|
||||
@ -57,11 +64,13 @@ to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).
|
||||
|
||||
Unused block pointers are always set to ~0 - regardless of root node,
|
||||
indirect addressing blocks or inodes.
|
||||
|
||||
Data leaves are always on the lowest level. So no data is stored on upper
|
||||
tree levels.
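The capacity of such an addressing tree can be sketched as a one-line formula: 16 root pointers, multiplied by blocksize / 4 for every indirect level. The function below is an illustration with a hypothetical name; the 16 * 256 * 256 figure quoted for level 2 corresponds to a 1024-byte block size.

```python
def max_tree_blocks(levels, block_size, root_pointers=16, pointer_size=4):
    """Number of data blocks reachable through a tree with `levels`
    indirect levels below the root node."""
    pointers_per_block = block_size // pointer_size
    return root_pointers * pointers_per_block ** levels
```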
|
||||
|
||||
The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
|
||||
The Audi MMI 3G first superblock directly starts at byte 0.
|
||||
|
||||
Second superblock position can either be calculated from the superblock
|
||||
information (total number of filesystem blocks) or by taking the highest
|
||||
device address, zeroing the last 3 bytes and then subtracting 0x1000 from
|
||||
@ -84,6 +93,7 @@ Object mode field is POSIX format. (which makes things easier)
|
||||
|
||||
There are also pointers to the first 16 blocks, if the object data can be
|
||||
addressed with 16 direct blocks.
|
||||
|
||||
For more than 16 blocks an indirect addressing in form of another tree is
|
||||
used. (scheme is the same as the one used for the superblock root nodes)
|
||||
|
||||
@ -96,13 +106,18 @@ Directories
|
||||
A directory is a filesystem object and has an inode just like a file.
|
||||
It is a specially formatted file containing records which associate each
|
||||
name with an inode number.
|
||||
|
||||
'.' inode number points to the directory inode
|
||||
|
||||
'..' inode number points to the parent directory inode
|
||||
|
||||
Each filename record additionally has a filename length field.
|
||||
|
||||
One special case is long filenames or subdirectory names.
|
||||
|
||||
These have a filename length field of 0xff in the corresponding directory
|
||||
record plus the longfile inode number also stored in that record.
|
||||
|
||||
With that longfilename inode number, the longfilename tree can be walked
|
||||
starting with the superblock longfilename root node pointers.
|
||||
|
||||
@ -111,6 +126,7 @@ Special files
|
||||
|
||||
Symbolic links are also filesystem objects with inodes. They have a specific
|
||||
bit in the inode mode field identifying them as a symbolic link.
|
||||
|
||||
The directory entry file inode pointer points to the target file inode.
|
||||
|
||||
Hard links got an inode, a directory entry, but a specific mode bit set,
|
||||
@ -126,9 +142,11 @@ Long filenames
|
||||
|
||||
Long filenames are stored in a separate addressing tree. The starting point
|
||||
is the longfilename root node in the active superblock.
|
||||
|
||||
Each data block (tree leaves) holds one long filename. That filename is
|
||||
limited to 510 bytes. The first two starting bytes are used as length field
|
||||
for the actual filename.
|
||||
|
||||
Because that structure must fit in the smallest allowed blocksize (512 bytes), it is clear why there
|
||||
is a limit of 510 bytes for the actual filename stored.
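Reading one long filename block might look like the sketch below. It assumes that the two-byte length field uses the filesystem's own byte order, which the text does not state explicitly; ``parse_long_filename`` is a hypothetical name.

```python
import struct

MAX_NAME = 510  # smallest block size (512) minus the 2-byte length field


def parse_long_filename(block, big_endian=False):
    """Return the filename bytes stored in one long filename data block."""
    fmt = ">H" if big_endian else "<H"  # byte order is an assumption
    (length,) = struct.unpack_from(fmt, block)
    if length > MAX_NAME or 2 + length > len(block):
        raise ValueError("invalid long filename length")
    return block[2:2 + length]
```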
|
||||
|
||||
@ -138,6 +156,7 @@ Bitmap
|
||||
The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap
|
||||
root node in the superblock and each bit in the bitmap represents one
|
||||
filesystem block.
|
||||
|
||||
The first block is block 0, which starts 0x1000 after superblock start.
|
||||
So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical
|
||||
address at which block 0 is located.
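The address calculation above can be written as a small Python helper. This is a sketch based only on the offsets named in the text; ``block_offset`` and its parameter names are hypothetical.

```python
BOOTBLOCK_SIZE = 0x2000   # where the first superblock normally starts
SUPERBLOCK_AREA = 0x1000  # block 0 starts this far after the superblock


def block_offset(block, block_size, superblock_start=BOOTBLOCK_SIZE):
    """Physical byte offset of filesystem block number `block`."""
    return superblock_start + SUPERBLOCK_AREA + block * block_size
```

For an image whose first superblock starts at byte 0, as described earlier for the Audi MMI 3G, passing ``superblock_start=0`` places block 0 at 0x1000.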
|
||||
@ -149,11 +168,14 @@ Bitmap system area
|
||||
------------------
|
||||
|
||||
The bitmap itself is divided into three parts.
|
||||
|
||||
First the system area, that is split into two halves.
|
||||
|
||||
Then userspace.
|
||||
|
||||
The requirement for a static, fixed preallocated system area comes from how
|
||||
qnx6fs deals with writes.
|
||||
|
||||
Each superblock has its own half of the system area. So superblock #1
|
||||
always uses blocks from the lower half while superblock #2 just writes to
|
||||
blocks represented by the upper half bitmap system area bits.
|
@ -1,5 +1,11 @@
|
||||
ramfs, rootfs and initramfs
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
Ramfs, rootfs and initramfs
|
||||
===========================
|
||||
|
||||
October 17, 2005
|
||||
|
||||
Rob Landley <rob@landley.net>
|
||||
=============================
|
||||
|
||||
@ -99,14 +105,14 @@ out of that.
|
||||
All this differs from the old initrd in several ways:
|
||||
|
||||
- The old initrd was always a separate file, while the initramfs archive is
|
||||
linked into the linux kernel image. (The directory linux-*/usr is devoted
|
||||
to generating this archive during the build.)
|
||||
linked into the linux kernel image. (The directory ``linux-*/usr`` is
|
||||
devoted to generating this archive during the build.)
|
||||
|
||||
- The old initrd file was a gzipped filesystem image (in some file format,
|
||||
such as ext2, that needed a driver built into the kernel), while the new
|
||||
initramfs archive is a gzipped cpio archive (like tar only simpler,
|
||||
see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst). The
|
||||
kernel's cpio extraction code is not only extremely small, it's also
|
||||
see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).
|
||||
The kernel's cpio extraction code is not only extremely small, it's also
|
||||
__init text and data that can be discarded during the boot process.
|
||||
|
||||
- The program run by the old initrd (which was called /initrd, not /init) did
|
||||
@ -139,7 +145,7 @@ and living in usr/Kconfig) can be used to specify a source for the
|
||||
initramfs archive, which will automatically be incorporated into the
|
||||
resulting binary. This option can point to an existing gzipped cpio
|
||||
archive, a directory containing files to be archived, or a text file
|
||||
specification such as the following example:
|
||||
specification such as the following example::
|
||||
|
||||
dir /dev 755 0 0
|
||||
nod /dev/console 644 0 0 c 5 1
|
||||
@ -175,12 +181,12 @@ or extracting your own preprepared cpio files to feed to the kernel build
|
||||
(instead of a config file or directory).
|
||||
|
||||
The following command line can extract a cpio image (either by the above script
|
||||
or by the kernel build) back into its component files:
|
||||
or by the kernel build) back into its component files::
|
||||
|
||||
cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
|
||||
|
||||
The following shell script can create a prebuilt cpio archive you can
|
||||
use in place of the above config file:
|
||||
use in place of the above config file::
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
@ -202,14 +208,17 @@ use in place of the above config file:
|
||||
exit 1
|
||||
fi
|
||||
|
||||
Note: The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth option
|
||||
to minimize problems with permissions on directories that are unwritable or not
|
||||
searchable." Don't do this when creating initramfs.cpio.gz images, it won't
|
||||
work. The Linux kernel cpio extractor won't create files in a directory that
|
||||
doesn't exist, so the directory entries must go before the files that go in
|
||||
those directories. The above script gets them in the right order.
|
||||
.. Note::
|
||||
|
||||
The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth
|
||||
option to minimize problems with permissions on directories that are
|
||||
unwritable or not searchable." Don't do this when creating
|
||||
initramfs.cpio.gz images, it won't work. The Linux kernel cpio extractor
|
||||
won't create files in a directory that doesn't exist, so the directory
|
||||
entries must go before the files that go in those directories.
|
||||
The above script gets them in the right order.
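The ordering requirement can be checked before an archive is built. The following Python sketch flags entries in a file list whose parent directory has not appeared earlier in the list; ``missing_parents`` is a hypothetical helper, and it does not distinguish files from directories.

```python
import posixpath


def missing_parents(paths):
    """Return entries whose parent directory was not listed before them."""
    seen = {"", "."}  # the archive root always exists
    bad = []
    for path in paths:
        clean = path.rstrip("/")
        if posixpath.dirname(clean) not in seen:
            bad.append(path)
        seen.add(clean)
    return bad
```

A list in the order that ``find -depth`` produces (directory contents before the directory itself) is reported as broken, while the plain ``find`` order passes.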
|
||||
|
||||
External initramfs images:
|
||||
--------------------------
|
||||
@ -236,9 +245,10 @@ An initramfs archive is a complete self-contained root filesystem for Linux.
|
||||
If you don't already understand what shared libraries, devices, and paths
|
||||
you need to get a minimal root filesystem up and running, here are some
|
||||
references:
|
||||
http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
|
||||
http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
|
||||
http://www.linuxfromscratch.org/lfs/view/stable/
|
||||
|
||||
- http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
|
||||
- http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
|
||||
- http://www.linuxfromscratch.org/lfs/view/stable/
|
||||
|
||||
The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
|
||||
designed to be a tiny C library to statically link early userspace
|
||||
@ -255,7 +265,7 @@ name lookups, even when otherwise statically linked.)
|
||||
|
||||
A good first step is to get initramfs to run a statically linked "hello world"
|
||||
program as init, and test it under an emulator like qemu (www.qemu.org) or
|
||||
User Mode Linux, like so:
|
||||
User Mode Linux, like so::
|
||||
|
||||
cat > hello.c << EOF
|
||||
#include <stdio.h>
|
||||
@ -326,8 +336,8 @@ the above threads) is:
|
||||
|
||||
explained his reasoning:
|
||||
|
||||
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
|
||||
http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
|
||||
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
|
||||
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
|
||||
|
||||
and, most importantly, designed and implemented the initramfs code.
|
||||
|
@ -1,3 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================================
|
||||
relay interface (formerly relayfs)
|
||||
==================================
|
||||
|
||||
@ -108,6 +111,7 @@ The relay interface implements basic file operations for user space
|
||||
access to relay channel buffer data. Here are the file operations
|
||||
that are available and some comments regarding their behavior:
|
||||
|
||||
=========== ============================================================
|
||||
open() enables user to open an _existing_ channel buffer.
|
||||
|
||||
mmap() results in channel buffer being mapped into the caller's
|
||||
@ -136,13 +140,16 @@ poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
|
||||
close() decrements the channel buffer's refcount. When the refcount
|
||||
reaches 0, i.e. when no process or kernel client has the
|
||||
buffer open, the channel buffer is freed.
|
||||
=========== ============================================================
|
||||
|
||||
In order for a user application to make use of relay files, the
|
||||
host filesystem must be mounted. For example,
|
||||
host filesystem must be mounted. For example::
|
||||
|
||||
mount -t debugfs debugfs /sys/kernel/debug
|
||||
|
||||
NOTE: the host filesystem doesn't need to be mounted for kernel
|
||||
.. Note::
|
||||
|
||||
the host filesystem doesn't need to be mounted for kernel
|
||||
clients to create or use channels - it only needs to be
|
||||
mounted when user space applications need access to the buffer
|
||||
data.
|
||||
@ -154,7 +161,7 @@ The relay interface kernel API
|
||||
Here's a summary of the API the relay interface provides to in-kernel clients:
|
||||
|
||||
TBD(curr. line MT:/API/)
|
||||
channel management functions:
|
||||
channel management functions::
|
||||
|
||||
relay_open(base_filename, parent, subbuf_size, n_subbufs,
|
||||
callbacks, private_data)
|
||||
@ -162,17 +169,17 @@ TBD(curr. line MT:/API/)
|
||||
relay_flush(chan)
|
||||
relay_reset(chan)
|
||||
|
||||
channel management typically called on instigation of userspace:
|
||||
channel management typically called on instigation of userspace::
|
||||
|
||||
relay_subbufs_consumed(chan, cpu, subbufs_consumed)
|
||||
|
||||
write functions:
|
||||
write functions::
|
||||
|
||||
relay_write(chan, data, length)
|
||||
__relay_write(chan, data, length)
|
||||
relay_reserve(chan, length)
|
||||
|
||||
callbacks:
|
||||
callbacks::
|
||||
|
||||
subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
|
||||
buf_mapped(buf, filp)
|
||||
@ -180,7 +187,7 @@ TBD(curr. line MT:/API/)
|
||||
create_buf_file(filename, parent, mode, buf, is_global)
|
||||
remove_buf_file(dentry)
|
||||
|
||||
helper functions:
|
||||
helper functions::
|
||||
|
||||
relay_buf_full(buf)
|
||||
subbuf_start_reserve(buf, length)
|
||||
@ -215,41 +222,41 @@ the file(s) created in create_buf_file() and is called during
|
||||
relay_close().
|
||||
|
||||
Here are some typical definitions for these callbacks, in this case
|
||||
using debugfs:
|
||||
using debugfs::
|
||||
|
||||
/*
|
||||
* create_buf_file() callback. Creates relay file in debugfs.
|
||||
*/
|
||||
static struct dentry *create_buf_file_handler(const char *filename,
|
||||
struct dentry *parent,
|
||||
umode_t mode,
|
||||
struct rchan_buf *buf,
|
||||
int *is_global)
|
||||
{
|
||||
return debugfs_create_file(filename, mode, parent, buf,
|
||||
&relay_file_operations);
|
||||
}
|
||||
/*
|
||||
* create_buf_file() callback. Creates relay file in debugfs.
|
||||
*/
|
||||
static struct dentry *create_buf_file_handler(const char *filename,
|
||||
struct dentry *parent,
|
||||
umode_t mode,
|
||||
struct rchan_buf *buf,
|
||||
int *is_global)
|
||||
{
|
||||
return debugfs_create_file(filename, mode, parent, buf,
|
||||
&relay_file_operations);
|
||||
}
|
||||
|
||||
/*
|
||||
* remove_buf_file() callback. Removes relay file from debugfs.
|
||||
*/
|
||||
static int remove_buf_file_handler(struct dentry *dentry)
|
||||
{
|
||||
debugfs_remove(dentry);
|
||||
/*
|
||||
* remove_buf_file() callback. Removes relay file from debugfs.
|
||||
*/
|
||||
static int remove_buf_file_handler(struct dentry *dentry)
|
||||
{
|
||||
debugfs_remove(dentry);
|
||||
|
||||
return 0;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* relay interface callbacks
|
||||
*/
|
||||
static struct rchan_callbacks relay_callbacks =
|
||||
{
|
||||
.create_buf_file = create_buf_file_handler,
|
||||
.remove_buf_file = remove_buf_file_handler,
|
||||
};
|
||||
/*
|
||||
* relay interface callbacks
|
||||
*/
|
||||
static struct rchan_callbacks relay_callbacks =
|
||||
{
|
||||
.create_buf_file = create_buf_file_handler,
|
||||
.remove_buf_file = remove_buf_file_handler,
|
||||
};
|
||||
|
||||
And an example relay_open() invocation using them:
|
||||
And an example relay_open() invocation using them::
|
||||
|
||||
chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
|
||||
|
||||
@ -339,23 +346,23 @@ whether or not to actually move on to the next sub-buffer.
|
||||
|
||||
To implement 'no-overwrite' mode, the userspace client would provide
|
||||
an implementation of the subbuf_start() callback something like the
|
||||
following:
|
||||
following::
|
||||
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
unsigned int prev_padding)
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
unsigned int prev_padding)
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
|
||||
if (relay_buf_full(buf))
|
||||
return 0;
|
||||
if (relay_buf_full(buf))
|
||||
return 0;
|
||||
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
|
||||
return 1;
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
||||
If the current buffer is full, i.e. all sub-buffers remain unconsumed,
|
||||
the callback returns 0 to indicate that the buffer switch should not
|
||||
@ -370,20 +377,20 @@ ready sub-buffers will relay_buf_full() return 0, in which case the
|
||||
buffer switch can continue.
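The callback above stores each finished sub-buffer's padding count in the first sizeof(unsigned int) bytes of that sub-buffer. A reader could use that header to find the valid payload, roughly as sketched below. This is a userspace illustration only: subbuf_payload() is a hypothetical helper, not part of the relay API, and it assumes the reader already holds a complete copy of one finished sub-buffer and knows the sub-buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a relay API function).
 *
 * Assumed layout of a finished sub-buffer, following the subbuf_start()
 * example: [unsigned int padding][payload ...][padding unused bytes].
 *
 * Returns the payload length and stores a pointer to the payload in
 * *payload. Returns 0 and sets *payload to NULL if the header is
 * inconsistent with subbuf_size.
 */
static size_t subbuf_payload(const void *subbuf, size_t subbuf_size,
                             const unsigned char **payload)
{
        unsigned int padding;

        assert(subbuf != NULL && payload != NULL);
        *payload = NULL;

        if (subbuf_size < sizeof(padding))
                return 0;

        /* memcpy avoids alignment assumptions about the copied buffer. */
        memcpy(&padding, subbuf, sizeof(padding));
        if (padding > subbuf_size - sizeof(padding))
                return 0;

        *payload = (const unsigned char *)subbuf + sizeof(padding);
        return subbuf_size - sizeof(padding) - padding;
}
```

The bounds check matters because a sub-buffer whose header was never filled in (for example, one still being written) would otherwise yield a nonsensical length.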
|
||||
|
||||
The implementation of the subbuf_start() callback for 'overwrite' mode
|
||||
would be very similar:
|
||||
would be very similar::
|
||||
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
size_t prev_padding)
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
static int subbuf_start(struct rchan_buf *buf,
|
||||
void *subbuf,
|
||||
void *prev_subbuf,
|
||||
size_t prev_padding)
|
||||
{
|
||||
if (prev_subbuf)
|
||||
*((unsigned *)prev_subbuf) = prev_padding;
|
||||
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
subbuf_start_reserve(buf, sizeof(unsigned int));
|
||||
|
||||
return 1;
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
|
||||
In this case, the relay_buf_full() check is meaningless and the
|
||||
callback always returns 1, causing the buffer switch to occur
|
@ -1,4 +1,8 @@
|
||||
ROMFS - ROM FILE SYSTEM
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
ROMFS - ROM File System
|
||||
=======================
|
||||
|
||||
This is a quite dumb, read only filesystem, mainly for initial RAM
|
||||
disks of installation disks. It has grown up by the need of having
|
||||
@ -51,9 +55,9 @@ the 16 byte padding for the name and the contents, also 16+14+15 = 45
|
||||
bytes. This is quite rare however, since most file names are longer
|
||||
than 3 bytes, and shorter than 15 bytes.
|
||||
|
||||
The layout of the filesystem is the following:
|
||||
The layout of the filesystem is the following::
|
||||
|
||||
offset content
|
||||
offset content
|
||||
|
||||
+---+---+---+---+
|
||||
0 | - | r | o | m | \
|
||||
@ -84,9 +88,9 @@ the source. This algorithm was chosen because although it's not quite
|
||||
reliable, it does not require any tables, and it is very simple.
|
||||
|
||||
The following bytes are now part of the file system; each file header
|
||||
must begin on a 16 byte boundary.
|
||||
must begin on a 16 byte boundary::
|
||||
|
||||
offset content
|
||||
offset content
|
||||
|
||||
+---+---+---+---+
|
||||
0 | next filehdr|X| The offset of the next file header
|
||||
@ -114,7 +118,9 @@ file is user and group 0, this should never be a problem for the
|
||||
intended use. The mapping of the 8 possible values to file types is
|
||||
the following:
|
||||
|
||||
== =============== ============================================
|
||||
mapping spec.info means
|
||||
== =============== ============================================
|
||||
0 hard link link destination [file header]
|
||||
1 directory first file's header
|
||||
2 regular file unused, must be zero [MBZ]
|
||||
@ -123,6 +129,7 @@ the following:
|
||||
5 char device - " -
|
||||
6 socket unused, MBZ
|
||||
7 fifo unused, MBZ
|
||||
== =============== ============================================
|
||||
|
||||
Note that hard links are specifically marked in this filesystem, but
|
||||
they will behave as you can expect (i.e. share the inode number).
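The mapping above can be decoded from the first word of a file header along these lines. This is a minimal sketch with hypothetical names. It assumes that the low four bits of the big-endian "next filehdr" word carry the mode information (the X in the layout), with bits 0-2 holding the type and the bit with value 8 marking the file executable; this excerpt does not show that detail, and it is taken from my reading of the upstream romfs documentation and kernel header. Rows 3 and 4 of the table are cut off by the hunk boundary here; the sketch fills them in from the same upstream table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the numeric values follow the mapping table. */
enum romfs_file_type {
        ROMFS_HARD_LINK = 0,
        ROMFS_DIRECTORY = 1,
        ROMFS_REGULAR   = 2,
        ROMFS_SYMLINK   = 3,    /* from the upstream table, not this hunk */
        ROMFS_BLOCK_DEV = 4,    /* from the upstream table, not this hunk */
        ROMFS_CHAR_DEV  = 5,
        ROMFS_SOCKET    = 6,
        ROMFS_FIFO      = 7
};

#define ROMFS_TYPE_MASK  0x7u   /* assumed: bits 0-2 select the type */
#define ROMFS_EXEC_FLAG  0x8u   /* assumed: bit 3 marks an executable */
#define ROMFS_ALIGN_MASK 0xFu   /* headers start on 16 byte boundaries */

/* Read a big-endian 32-bit word, as stored on the romfs image. */
static uint32_t romfs_be32(const unsigned char *p)
{
        assert(p != NULL);
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offset of the next file header: the word with the mode bits cleared. */
static uint32_t romfs_next_offset(uint32_t next_field)
{
        return next_field & ~(uint32_t)ROMFS_ALIGN_MASK;
}

static enum romfs_file_type romfs_file_type(uint32_t next_field)
{
        return (enum romfs_file_type)(next_field & ROMFS_TYPE_MASK);
}

static int romfs_is_executable(uint32_t next_field)
{
        return (next_field & ROMFS_EXEC_FLAG) != 0;
}
```

Because every header begins on a 16 byte boundary, the low four bits of the offset are always zero, which is what makes them available for the type and flag.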
|
||||
@ -158,24 +165,24 @@ to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
|
||||
Pending issues:
|
||||
|
||||
- Permissions and owner information are pretty essential features of a
|
||||
Un*x like system, but romfs does not provide the full possibilities.
|
||||
I have never found this limiting, but others might.
|
||||
Un*x like system, but romfs does not provide the full possibilities.
|
||||
I have never found this limiting, but others might.
|
||||
|
||||
- The file system is read only, so it can be very small, but in case
|
||||
one would want to write _anything_ to a file system, he still needs
|
||||
a writable file system, thus negating the size advantages. Possible
|
||||
solutions: implement write access as a compile-time option, or a new,
|
||||
similarly small writable filesystem for RAM disks.
|
||||
one would want to write _anything_ to a file system, he still needs
|
||||
a writable file system, thus negating the size advantages. Possible
|
||||
solutions: implement write access as a compile-time option, or a new,
|
||||
similarly small writable filesystem for RAM disks.
|
||||
|
||||
- Since the files are only required to have alignment on a 16 byte
|
||||
boundary, it is currently possibly suboptimal to read or execute files
|
||||
from the filesystem. It might be resolved by reordering file data to
|
||||
have most of it (i.e. except the start and the end) lying at "natural"
|
||||
boundaries, thus it would be possible to directly map a big portion of
|
||||
the file contents to the mm subsystem.
|
||||
boundary, it is currently possibly suboptimal to read or execute files
|
||||
from the filesystem. It might be resolved by reordering file data to
|
||||
have most of it (i.e. except the start and the end) lying at "natural"
|
||||
boundaries, thus it would be possible to directly map a big portion of
|
||||
the file contents to the mm subsystem.
|
||||
|
||||
- Compression might be a useful feature, but memory is quite a
|
||||
limiting factor in my eyes.
|
||||
limiting factor in my eyes.
|
||||
|
||||
- Where is it used?
|
||||
|
||||
@ -183,4 +190,5 @@ limiting factor in my eyes.
|
||||
|
||||
|
||||
Have fun,
|
||||
|
||||
Janos Farkas <chexum@shadow.banki.hu>
|
@ -1,7 +1,11 @@
|
||||
SQUASHFS 4.0 FILESYSTEM
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Squashfs 4.0 Filesystem
|
||||
=======================
|
||||
|
||||
Squashfs is a compressed read-only filesystem for Linux.
|
||||
|
||||
It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
|
||||
directories. Inodes in the system are very small and all blocks are packed to
|
||||
minimise data overhead. Block sizes greater than 4K are supported up to a
|
||||
@ -15,31 +19,33 @@ needed.
|
||||
Mailing list: squashfs-devel@lists.sourceforge.net
|
||||
Web site: www.squashfs.org
|
||||
|
||||
1. FILESYSTEM FEATURES
|
||||
1. Filesystem Features
|
||||
----------------------
|
||||
|
||||
Squashfs filesystem features versus Cramfs:
|
||||
|
||||
============================== ========= ==========
|
||||
Squashfs Cramfs
|
||||
|
||||
Max filesystem size: 2^64 256 MiB
|
||||
Max file size: ~ 2 TiB 16 MiB
|
||||
Max files: unlimited unlimited
|
||||
Max directories: unlimited unlimited
|
||||
Max entries per directory: unlimited unlimited
|
||||
Max block size: 1 MiB 4 KiB
|
||||
Metadata compression: yes no
|
||||
Directory indexes: yes no
|
||||
Sparse file support: yes no
|
||||
Tail-end packing (fragments): yes no
|
||||
Exportable (NFS etc.): yes no
|
||||
Hard link support: yes no
|
||||
"." and ".." in readdir: yes no
|
||||
Real inode numbers: yes no
|
||||
32-bit uids/gids: yes no
|
||||
File creation time: yes no
|
||||
Xattr support: yes no
|
||||
ACL support: no no
|
||||
============================== ========= ==========
|
||||
Max filesystem size 2^64 256 MiB
|
||||
Max file size ~ 2 TiB 16 MiB
|
||||
Max files unlimited unlimited
|
||||
Max directories unlimited unlimited
|
||||
Max entries per directory unlimited unlimited
|
||||
Max block size 1 MiB 4 KiB
|
||||
Metadata compression yes no
|
||||
Directory indexes yes no
|
||||
Sparse file support yes no
|
||||
Tail-end packing (fragments) yes no
|
||||
Exportable (NFS etc.) yes no
|
||||
Hard link support yes no
|
||||
"." and ".." in readdir yes no
|
||||
Real inode numbers yes no
|
||||
32-bit uids/gids yes no
|
||||
File creation time yes no
|
||||
Xattr support yes no
|
||||
ACL support no no
|
||||
============================== ========= ==========
|
||||
|
||||
Squashfs compresses data, inodes and directories. In addition, inode and
|
||||
directory data are highly compacted, and packed on byte boundaries. Each
|
||||
@ -47,7 +53,7 @@ compressed inode is on average 8 bytes in length (the exact length varies on
|
||||
file type, i.e. regular file, directory, symbolic link, and block/char device
|
||||
inodes have different sizes).
|
||||
|
||||
2. USING SQUASHFS
|
||||
2. Using Squashfs
|
||||
-----------------
|
||||
|
||||
As squashfs is a read-only filesystem, the mksquashfs program must be used to
|
||||
@ -58,11 +64,11 @@ obtained from this site also.
|
||||
The squashfs-tools development tree is now located on kernel.org
|
||||
git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
|
||||
|
||||
3. SQUASHFS FILESYSTEM DESIGN
|
||||
3. Squashfs Filesystem Design
|
||||
-----------------------------
|
||||
|
||||
A squashfs filesystem consists of a maximum of nine parts, packed together on a
|
||||
byte alignment:
|
||||
byte alignment::
|
||||
|
||||
---------------
|
||||
| superblock |
|
||||
@ -229,15 +235,15 @@ location of the xattr list inside each inode, a 32-bit xattr id
|
||||
is stored. This xattr id is mapped into the location of the xattr
|
||||
list using a second xattr id lookup table.
|
||||
|
||||
4. TODOS AND OUTSTANDING ISSUES
|
||||
4. TODOs and Outstanding Issues
|
||||
-------------------------------
|
||||
|
||||
4.1 Todo list
|
||||
4.1 TODO list
|
||||
-------------
|
||||
|
||||
Implement ACL support.
|
||||
|
||||
4.2 Squashfs internal cache
|
||||
4.2 Squashfs Internal Cache
|
||||
---------------------------
|
||||
|
||||
Blocks in Squashfs are compressed. To avoid repeatedly decompressing
|
@ -1,32 +1,36 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
sysfs - _The_ filesystem for exporting kernel objects.
|
||||
=====================================================
|
||||
sysfs - _The_ filesystem for exporting kernel objects
|
||||
=====================================================
|
||||
|
||||
Patrick Mochel <mochel@osdl.org>
|
||||
|
||||
Mike Murphy <mamurph@cs.clemson.edu>
|
||||
|
||||
Revised: 16 August 2011
|
||||
Original: 10 January 2003
|
||||
:Revised: 16 August 2011
|
||||
:Original: 10 January 2003
|
||||
|
||||
|
||||
What it is:
|
||||
~~~~~~~~~~~
|
||||
|
||||
sysfs is a ram-based filesystem initially based on ramfs. It provides
|
||||
a means to export kernel data structures, their attributes, and the
|
||||
linkages between them to userspace.
|
||||
a means to export kernel data structures, their attributes, and the
|
||||
linkages between them to userspace.
|
||||
|
||||
sysfs is tied inherently to the kobject infrastructure. Please read
|
||||
Documentation/kobject.txt for more information concerning the kobject
|
||||
interface.
|
||||
interface.
|
||||
|
||||
|
||||
Using sysfs
|
||||
~~~~~~~~~~~
|
||||
|
||||
sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
|
||||
it by doing:
|
||||
it by doing::
|
||||
|
||||
mount -t sysfs sysfs /sys
|
||||
mount -t sysfs sysfs /sys
|
||||
|
||||
|
||||
Directory Creation
|
||||
@ -37,7 +41,7 @@ created for it in sysfs. That directory is created as a subdirectory
|
||||
of the kobject's parent, expressing internal object hierarchies to
|
||||
userspace. Top-level directories in sysfs represent the common
|
||||
ancestors of object hierarchies; i.e. the subsystems the objects
|
||||
belong to.
|
||||
belong to.
|
||||
|
||||
Sysfs internally stores a pointer to the kobject that implements a
|
||||
directory in the kernfs_node object associated with the directory. In
|
||||
@ -58,63 +62,63 @@ attributes.
|
||||
Attributes should be ASCII text files, preferably with only one value
|
||||
per file. It is noted that it may not be efficient to contain only one
|
||||
value per file, so it is socially acceptable to express an array of
|
||||
values of the same type.
|
||||
values of the same type.
|
||||
|
||||
Mixing types, expressing multiple lines of data, and doing fancy
|
||||
formatting of data is heavily frowned upon. Doing these things may get
|
||||
you publicly humiliated and your code rewritten without notice.
|
||||
you publicly humiliated and your code rewritten without notice.
|
||||
|
||||
|
||||
An attribute definition is simply:
|
||||
An attribute definition is simply::
|
||||
|
||||
struct attribute {
|
||||
char * name;
|
||||
struct module *owner;
|
||||
umode_t mode;
|
||||
};
|
||||
struct attribute {
|
||||
char * name;
|
||||
struct module *owner;
|
||||
umode_t mode;
|
||||
};
|
||||
|
||||
|
||||
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
|
||||
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
|
||||
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
|
||||
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
|
||||
|
||||
|
||||
A bare attribute contains no means to read or write the value of the
|
||||
attribute. Subsystems are encouraged to define their own attribute
|
||||
structure and wrapper functions for adding and removing attributes for
|
||||
a specific object type.
|
||||
a specific object type.
|
||||
|
||||
For example, the driver model defines struct device_attribute like:
|
||||
For example, the driver model defines struct device_attribute like::
|
||||
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
|
||||
int device_create_file(struct device *, const struct device_attribute *);
|
||||
void device_remove_file(struct device *, const struct device_attribute *);
|
||||
int device_create_file(struct device *, const struct device_attribute *);
|
||||
void device_remove_file(struct device *, const struct device_attribute *);
|
||||
|
||||
It also defines this helper for defining device attributes:
|
||||
It also defines this helper for defining device attributes::
|
||||
|
||||
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
||||
struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
|
||||
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
||||
struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
|
||||
|
||||
For example, declaring
|
||||
For example, declaring::
|
||||
|
||||
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
|
||||
static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
|
||||
|
||||
is equivalent to doing:
|
||||
is equivalent to doing::
|
||||
|
||||
static struct device_attribute dev_attr_foo = {
|
||||
.attr = {
|
||||
.name = "foo",
|
||||
.mode = S_IWUSR | S_IRUGO,
|
||||
},
|
||||
.show = show_foo,
|
||||
.store = store_foo,
|
||||
};
|
||||
static struct device_attribute dev_attr_foo = {
|
||||
.attr = {
|
||||
.name = "foo",
|
||||
.mode = S_IWUSR | S_IRUGO,
|
||||
},
|
||||
.show = show_foo,
|
||||
.store = store_foo,
|
||||
};
|
||||
|
||||
Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally
|
||||
considered a bad idea." so trying to set a sysfs file writable for
|
||||
@ -127,15 +131,21 @@ readable. The above case could be shortened to:
|
||||
static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
|
||||
|
||||
The list of helpers available to define your wrapper function is:
|
||||
__ATTR_RO(name): assumes default name_show and mode 0444
|
||||
__ATTR_WO(name): assumes a name_store only and is restricted to mode
|
||||
|
||||
__ATTR_RO(name):
|
||||
assumes default name_show and mode 0444
|
||||
__ATTR_WO(name):
|
||||
assumes a name_store only and is restricted to mode
|
||||
0200 that is root write access only.
|
||||
__ATTR_RO_MODE(name, mode): for more restrictive RO access currently
|
||||
__ATTR_RO_MODE(name, mode):
|
||||
for more restrictive RO access currently
|
||||
only use case is the EFI System Resource Table
|
||||
(see drivers/firmware/efi/esrt.c)
|
||||
__ATTR_RW(name): assumes default name_show, name_store and setting
|
||||
__ATTR_RW(name):
|
||||
assumes default name_show, name_store and setting
|
||||
mode to 0644.
|
||||
__ATTR_NULL: which sets the name to NULL and is used as end of list
|
||||
__ATTR_NULL:
|
||||
which sets the name to NULL and is used as end of list
|
||||
indicator (see: kernel/workqueue.c)
|
||||
|
||||
Subsystem-Specific Callbacks
|
||||
@ -143,12 +153,12 @@ Subsystem-Specific Callbacks
|
||||
|
||||
When a subsystem defines a new attribute type, it must implement a
|
||||
set of sysfs operations for forwarding read and write calls to the
|
||||
show and store methods of the attribute owners.
|
||||
show and store methods of the attribute owners::
|
||||
|
||||
struct sysfs_ops {
|
||||
ssize_t (*show)(struct kobject *, struct attribute *, char *);
|
||||
ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
|
||||
};
|
||||
struct sysfs_ops {
|
||||
ssize_t (*show)(struct kobject *, struct attribute *, char *);
|
||||
ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
|
||||
};
|
||||
|
||||
[ Subsystems should have already defined a struct kobj_type as a
|
||||
descriptor for this type, which is where the sysfs_ops pointer is
|
||||
@ -157,29 +167,29 @@ stored. See the kobject documentation for more information. ]
|
||||
When a file is read or written, sysfs calls the appropriate method
|
||||
for the type. The method then translates the generic struct kobject
|
||||
and struct attribute pointers to the appropriate pointer types, and
|
||||
calls the associated methods.
|
||||
calls the associated methods.
|
||||
|
||||
|
||||
To illustrate:
|
||||
To illustrate::
|
||||
|
||||
#define to_dev(obj) container_of(obj, struct device, kobj)
|
||||
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
|
||||
#define to_dev(obj) container_of(obj, struct device, kobj)
|
||||
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
|
||||
|
||||
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
struct device_attribute *dev_attr = to_dev_attr(attr);
|
||||
struct device *dev = to_dev(kobj);
|
||||
ssize_t ret = -EIO;
|
||||
static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
struct device_attribute *dev_attr = to_dev_attr(attr);
|
||||
struct device *dev = to_dev(kobj);
|
||||
ssize_t ret = -EIO;
|
||||
|
||||
if (dev_attr->show)
|
||||
ret = dev_attr->show(dev, dev_attr, buf);
|
||||
if (ret >= (ssize_t)PAGE_SIZE) {
|
||||
printk("dev_attr_show: %pS returned bad count\n",
|
||||
dev_attr->show);
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
if (dev_attr->show)
|
||||
ret = dev_attr->show(dev, dev_attr, buf);
|
||||
if (ret >= (ssize_t)PAGE_SIZE) {
|
||||
printk("dev_attr_show: %pS returned bad count\n",
|
||||
dev_attr->show);
|
||||
}
|
||||
return ret;
|
||||
}
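The translation step in dev_attr_show() relies on container_of() to get from the generic pointers back to the enclosing typed structures. The sketch below models that pattern in plain userspace C so it can be compiled and run outside the kernel. All types here are simplified stand-ins, not the kernel definitions; id_show() and BUF_SIZE are hypothetical, and ssize_t is replaced by long to keep the example portable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified userspace stand-ins for the kernel structures. */
struct attribute { const char *name; unsigned short mode; };
struct kobject { const char *name; };
struct device { int id; struct kobject kobj; };

struct device_attribute {
        struct attribute attr;
        long (*show)(struct device *dev, struct device_attribute *attr,
                     char *buf);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

#define to_dev(obj) container_of(obj, struct device, kobj)
#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)

#define BUF_SIZE 4096   /* stands in for PAGE_SIZE */

/* Hypothetical typed show() method: prints the device id. */
static long id_show(struct device *dev, struct device_attribute *attr,
                    char *buf)
{
        (void)attr;
        return snprintf(buf, BUF_SIZE, "%d\n", dev->id);
}

/* Generic entry point: translate the pointers, then call the typed method. */
static long dev_attr_show(struct kobject *kobj, struct attribute *attr,
                          char *buf)
{
        struct device_attribute *dev_attr;
        struct device *dev;
        long ret = -EIO;

        assert(kobj != NULL && attr != NULL && buf != NULL);
        dev_attr = to_dev_attr(attr);
        dev = to_dev(kobj);

        if (dev_attr->show)
                ret = dev_attr->show(dev, dev_attr, buf);
        if (ret >= (long)BUF_SIZE)
                fprintf(stderr, "dev_attr_show: bad count %ld\n", ret);
        return ret;
}
```

The generic layer only ever sees struct kobject and struct attribute; embedding those inside the typed structures is what lets one sysfs_ops implementation serve every attribute of a given object type.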
|
||||
|
||||
|
||||
|
||||
@ -188,11 +198,11 @@ Reading/Writing Attribute Data
|
||||
|
||||
To read or write attributes, show() or store() methods must be
|
||||
specified when declaring the attribute. The method types should be as
|
||||
simple as those defined for device attributes:
|
||||
simple as those defined for device attributes::
|
||||
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
|
||||
IOW, they should take only an object, an attribute, and a buffer as parameters.
|
||||
|
||||
@ -200,11 +210,11 @@ IOW, they should take only an object, an attribute, and a buffer as parameters.
|
||||
sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
|
||||
method. Sysfs will call the method exactly once for each read or
|
||||
write. This forces the following behavior on the method
|
||||
implementations:
|
||||
implementations:
|
||||
|
||||
- On read(2), the show() method should fill the entire buffer.
|
||||
- On read(2), the show() method should fill the entire buffer.
|
||||
Recall that an attribute should only be exporting one value, or an
|
||||
array of similar values, so this shouldn't be that expensive.
|
||||
array of similar values, so this shouldn't be that expensive.
|
||||
|
||||
This allows userspace to do partial reads and forward seeks
|
||||
arbitrarily over the entire file at will. If userspace seeks back to
|
||||
@ -218,10 +228,10 @@ implementations:
|
||||
|
||||
When writing sysfs files, userspace processes should first read the
|
||||
entire file, modify the values it wishes to change, then write the
|
||||
entire buffer back.
|
||||
entire buffer back.
|
||||
|
||||
Attribute method implementations should operate on an identical
|
||||
buffer when reading and writing values.
|
||||
buffer when reading and writing values.
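From the userspace side, the advice above amounts to reading an attribute in one read(2) and writing the new value back in one write(2). A minimal sketch using only POSIX calls follows. attr_read(), attr_write() and ATTR_BUF_SIZE are hypothetical names, and the 4096-byte buffer assumes a PAGE_SIZE of 4096, which varies by architecture.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define ATTR_BUF_SIZE 4096      /* assumed PAGE_SIZE; differs on some systems */

/*
 * Read the whole attribute with a single read(2) and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t attr_read(const char *path, char *buf, size_t size)
{
        ssize_t n;
        int fd;

        assert(path != NULL && buf != NULL && size > 0);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;
        n = read(fd, buf, size - 1);
        close(fd);
        if (n < 0)
                return -1;
        buf[n] = '\0';
        return n;
}

/*
 * Write the complete new value with a single write(2).
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t attr_write(const char *path, const char *value)
{
        ssize_t n;
        int fd;

        assert(path != NULL && value != NULL);
        fd = open(path, O_WRONLY);
        if (fd < 0)
                return -1;
        n = write(fd, value, strlen(value));
        close(fd);
        return n;
}
```

A caller would typically attr_read() the current contents, edit the string in memory, and pass the full result to attr_write(), so that the store() method sees the whole value in one call.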
|
||||
|
||||
Other notes:
|
||||
|
||||
@ -229,7 +239,7 @@ Other notes:
|
||||
file position.
|
||||
|
||||
- The buffer will always be PAGE_SIZE bytes in length. On i386, this
|
||||
is 4096.
|
||||
is 4096.
|
||||
|
||||
- show() methods should return the number of bytes printed into the
|
||||
buffer. This is the return value of scnprintf().
|
||||
@ -246,31 +256,31 @@ Other notes:
|
||||
through, be sure to return an error.
|
||||
|
||||
- The object passed to the methods will be pinned in memory via sysfs
|
||||
reference counting its embedded object. However, the physical
|
||||
entity (e.g. device) the object represents may not be present. Be
|
||||
sure to have a way to check this, if necessary.
|
||||
reference counting its embedded object. However, the physical
|
||||
entity (e.g. device) the object represents may not be present. Be
|
||||
sure to have a way to check this, if necessary.
|
||||
|
||||
|
||||
A very simple (and naive) implementation of a device attribute is:
|
||||
A very simple (and naive) implementation of a device attribute is::
|
||||
|
||||
static ssize_t show_name(struct device *dev, struct device_attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
|
||||
}
|
||||
static ssize_t show_name(struct device *dev, struct device_attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
|
||||
}
|
||||
|
||||
static ssize_t store_name(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count)
|
||||
{
|
||||
snprintf(dev->name, sizeof(dev->name), "%.*s",
|
||||
(int)min(count, sizeof(dev->name) - 1), buf);
|
||||
return count;
|
||||
}
|
||||
static ssize_t store_name(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count)
|
||||
{
|
||||
snprintf(dev->name, sizeof(dev->name), "%.*s",
|
||||
(int)min(count, sizeof(dev->name) - 1), buf);
|
||||
return count;
|
||||
}
|
||||
|
||||
static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
|
||||
static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
|
||||
|
||||
|
||||
(Note that the real implementation doesn't allow userspace to set the
|
||||
(Note that the real implementation doesn't allow userspace to set the
|
||||
name for a device.)
|
||||
|
||||
|
||||
@ -278,25 +288,25 @@ Top Level Directory Layout
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The sysfs directory arrangement exposes the relationship of kernel
|
||||
data structures.
|
||||
data structures.
|
||||
|
||||
The top level sysfs directory looks like:
|
||||
The top level sysfs directory looks like::
|
||||
|
||||
block/
|
||||
bus/
|
||||
class/
|
||||
dev/
|
||||
devices/
|
||||
firmware/
|
||||
net/
|
||||
fs/
|
||||
block/
|
||||
bus/
|
||||
class/
|
||||
dev/
|
||||
devices/
|
||||
firmware/
|
||||
net/
|
||||
fs/
|
||||
|
||||
devices/ contains a filesystem representation of the device tree. It maps
|
||||
directly to the internal kernel device tree, which is a hierarchy of
|
||||
struct device.
|
||||
struct device.
|
||||
|
||||
bus/ contains a flat directory layout of the various bus types in the
|
||||
kernel. Each bus's directory contains two subdirectories:
|
||||
kernel. Each bus's directory contains two subdirectories::
|
||||
|
||||
devices/
|
||||
drivers/
|
||||
@ -331,71 +341,71 @@ Current Interfaces
|
||||
The following interface layers currently exist in sysfs:
|
||||
|
||||
|
||||
- devices (include/linux/device.h)
|
||||
----------------------------------
|
||||
Structure:
|
||||
devices (include/linux/device.h)
|
||||
--------------------------------
|
||||
Structure::
|
||||
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
DEVICE_ATTR(_name, _mode, _show, _store);
|
||||
DEVICE_ATTR(_name, _mode, _show, _store);
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int device_create_file(struct device *dev, const struct device_attribute * attr);
|
||||
void device_remove_file(struct device *dev, const struct device_attribute * attr);
|
||||
int device_create_file(struct device *dev, const struct device_attribute * attr);
|
||||
void device_remove_file(struct device *dev, const struct device_attribute * attr);
|
||||
|
||||
|
||||
- bus drivers (include/linux/device.h)
|
||||
--------------------------------------
|
||||
Structure:
|
||||
bus drivers (include/linux/device.h)
|
||||
------------------------------------
|
||||
Structure::
|
||||
|
||||
struct bus_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct bus_type *, char * buf);
|
||||
ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
|
||||
};
|
||||
struct bus_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct bus_type *, char * buf);
|
||||
ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
static BUS_ATTR_RW(name);
|
||||
static BUS_ATTR_RO(name);
|
||||
static BUS_ATTR_WO(name);
|
||||
static BUS_ATTR_RW(name);
|
||||
static BUS_ATTR_RO(name);
|
||||
static BUS_ATTR_WO(name);
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int bus_create_file(struct bus_type *, struct bus_attribute *);
|
||||
void bus_remove_file(struct bus_type *, struct bus_attribute *);
|
||||
int bus_create_file(struct bus_type *, struct bus_attribute *);
|
||||
void bus_remove_file(struct bus_type *, struct bus_attribute *);
|
||||
|
||||
|
||||
- device drivers (include/linux/device.h)
|
||||
-----------------------------------------
|
||||
device drivers (include/linux/device.h)
|
||||
---------------------------------------
|
||||
|
||||
Structure:
|
||||
Structure::
|
||||
|
||||
struct driver_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device_driver *, char * buf);
|
||||
ssize_t (*store)(struct device_driver *, const char * buf,
|
||||
size_t count);
|
||||
};
|
||||
struct driver_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device_driver *, char * buf);
|
||||
ssize_t (*store)(struct device_driver *, const char * buf,
|
||||
size_t count);
|
||||
};
|
||||
|
||||
Declaring:
|
||||
Declaring::
|
||||
|
||||
DRIVER_ATTR_RO(_name)
|
||||
DRIVER_ATTR_RW(_name)
|
||||
DRIVER_ATTR_RO(_name)
|
||||
DRIVER_ATTR_RW(_name)
|
||||
|
||||
Creation/Removal:
|
||||
Creation/Removal::
|
||||
|
||||
int driver_create_file(struct device_driver *, const struct driver_attribute *);
|
||||
void driver_remove_file(struct device_driver *, const struct driver_attribute *);
|
||||
int driver_create_file(struct device_driver *, const struct driver_attribute *);
|
||||
void driver_remove_file(struct device_driver *, const struct driver_attribute *);
|
||||
|
||||
|
||||
Documentation
|
@ -1,25 +1,40 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
SystemV Filesystem
|
||||
==================
|
||||
|
||||
It implements all of
|
||||
- Xenix FS,
|
||||
- SystemV/386 FS,
|
||||
- Coherent FS.
|
||||
|
||||
To install:
|
||||
|
||||
* Answer the 'System V and Coherent filesystem support' question with 'y'
|
||||
when configuring the kernel.
|
||||
* To mount a disk or a partition, use
|
||||
* To mount a disk or a partition, use::
|
||||
|
||||
mount [-r] -t sysv device mountpoint
|
||||
The file system type names
|
||||
|
||||
The file system type names::
|
||||
|
||||
-t sysv
|
||||
-t xenix
|
||||
-t coherent
|
||||
|
||||
may be used interchangeably, but the last two will eventually disappear.
|
||||
|
||||
Bugs in the present implementation:
|
||||
|
||||
- Coherent FS:
|
||||
|
||||
- The "free list interleave" n:m is currently ignored.
|
||||
- Only file systems with no filesystem name and no pack name are recognized.
|
||||
(See Coherent "man mkfs" for a description of these features.)
|
||||
(See Coherent "man mkfs" for a description of these features.)
|
||||
|
||||
- SystemV Release 2 FS:
|
||||
|
||||
The superblock is only searched in the blocks 9, 15, 18, which
|
||||
correspond to the beginning of track 1 on floppy disks. No support
|
||||
for this FS on hard disk yet.
|
||||
@ -28,12 +43,14 @@ Bugs in the present implementation:
|
||||
These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
|
||||
* Linux fdisk reports on partitions
|
||||
|
||||
- Minix FS 0x81 Linux/Minix
|
||||
- Xenix FS ??
|
||||
- SystemV FS ??
|
||||
- Coherent FS 0x08 AIX bootable
|
||||
|
||||
* Size of a block or zone (data allocation unit on disk)
|
||||
|
||||
- Minix FS 1024
|
||||
- Xenix FS 1024 (also 512 ??)
|
||||
- SystemV FS 1024 (also 512 and 2048)
|
||||
@ -45,37 +62,51 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
all the block numbers (including the super block) are offset by one track.
|
||||
|
||||
* Byte ordering of "short" (16 bit entities) on disk:
|
||||
|
||||
- Minix FS little endian 0 1
|
||||
- Xenix FS little endian 0 1
|
||||
- SystemV FS little endian 0 1
|
||||
- Coherent FS little endian 0 1
|
||||
|
||||
Of course, this affects only the file system, not the data of files on it!
|
||||
|
||||
* Byte ordering of "long" (32 bit entities) on disk:
|
||||
|
||||
- Minix FS little endian 0 1 2 3
|
||||
- Xenix FS little endian 0 1 2 3
|
||||
- SystemV FS little endian 0 1 2 3
|
||||
- Coherent FS PDP-11 2 3 0 1
|
||||
|
||||
Of course, this affects only the file system, not the data of files on it!
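The two 32-bit byte orders listed above can be sketched in C as follows. This is only an illustration, not the kernel's own conversion code: the helper names read_le32() and read_pdp32() are hypothetical, and the sketch assumes that "0 1 2 3" and "2 3 0 1" give the significance of the byte found at each successive disk offset (0 being the least significant byte).

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit "long" stored little endian,
 * i.e. disk byte order 0 1 2 3 (Minix FS, Xenix FS, SystemV FS).
 */
static uint32_t read_le32(const unsigned char *d)
{
	return (uint32_t)d[0] |
	       (uint32_t)d[1] << 8 |
	       (uint32_t)d[2] << 16 |
	       (uint32_t)d[3] << 24;
}

/*
 * Hypothetical helper: decode a 32-bit "long" stored in PDP-11 order,
 * i.e. disk byte order 2 3 0 1 (Coherent FS). The high 16-bit word
 * comes first, and each 16-bit word is itself little endian.
 */
static uint32_t read_pdp32(const unsigned char *d)
{
	return (uint32_t)d[2] |
	       (uint32_t)d[3] << 8 |
	       (uint32_t)d[0] << 16 |
	       (uint32_t)d[1] << 24;
}
```

Under these assumptions, the on-disk bytes 0B 0A 0D 0C decode to 0x0A0B0C0D with read_pdp32() and to 0x0C0D0A0B with read_le32(), which is why a Coherent FS image cannot be read with the plain little-endian accessors.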
|
||||
|
||||
* Inode on disk: "short", 0 means non-existent, the root dir ino is:
|
||||
- Minix FS 1
|
||||
- Xenix FS, SystemV FS, Coherent FS 2
|
||||
|
||||
================================= ==
|
||||
Minix FS 1
|
||||
Xenix FS, SystemV FS, Coherent FS 2
|
||||
================================= ==
|
||||
|
||||
* Maximum number of hard links to a file:
|
||||
- Minix FS 250
|
||||
- Xenix FS ??
|
||||
- SystemV FS ??
|
||||
- Coherent FS >=10000
|
||||
|
||||
=========== =========
|
||||
Minix FS 250
|
||||
Xenix FS ??
|
||||
SystemV FS ??
|
||||
Coherent FS >=10000
|
||||
=========== =========
|
||||
|
||||
* Free inode management:
|
||||
- Minix FS a bitmap
|
||||
|
||||
- Minix FS
|
||||
a bitmap
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
There is a cache of a certain number of free inodes in the super-block.
|
||||
When it is exhausted, new free inodes are found using a linear search.
|
||||
|
||||
* Free block management:
|
||||
- Minix FS a bitmap
|
||||
|
||||
- Minix FS
|
||||
a bitmap
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
Free blocks are organized in a "free list". Maybe a misleading term,
|
||||
since it is not true that every free block contains a pointer to
|
||||
@ -86,13 +117,18 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.
|
||||
|
||||
* Super-block location:
|
||||
- Minix FS block 1 = bytes 1024..2047
|
||||
- Xenix FS block 1 = bytes 1024..2047
|
||||
- SystemV FS bytes 512..1023
|
||||
- Coherent FS block 1 = bytes 512..1023
|
||||
|
||||
=========== ==========================
|
||||
Minix FS block 1 = bytes 1024..2047
|
||||
Xenix FS block 1 = bytes 1024..2047
|
||||
SystemV FS bytes 512..1023
|
||||
Coherent FS block 1 = bytes 512..1023
|
||||
=========== ==========================
|
||||
|
||||
* Super-block layout:
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short s_ninodes;
|
||||
unsigned short s_nzones;
|
||||
unsigned short s_imap_blocks;
|
||||
@ -101,7 +137,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned short s_log_zone_size;
|
||||
unsigned long s_max_size;
|
||||
unsigned short s_magic;
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short s_firstdatazone;
|
||||
unsigned long s_nzones;
|
||||
unsigned short s_fzone_count;
|
||||
@ -120,23 +158,33 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
|
||||
char s_fname[6];
|
||||
char s_fpack[6];
|
||||
|
||||
then they differ considerably:
|
||||
Xenix FS
|
||||
|
||||
Xenix FS::
|
||||
|
||||
char s_clean;
|
||||
char s_fill[371];
|
||||
long s_magic;
|
||||
long s_type;
|
||||
SystemV FS
|
||||
|
||||
SystemV FS::
|
||||
|
||||
long s_fill[12 or 14];
|
||||
long s_state;
|
||||
long s_magic;
|
||||
long s_type;
|
||||
Coherent FS
|
||||
|
||||
Coherent FS::
|
||||
|
||||
unsigned long s_unique;
|
||||
|
||||
Note that Coherent FS has no magic.
|
||||
|
||||
* Inode layout:
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short i_mode;
|
||||
unsigned short i_uid;
|
||||
unsigned long i_size;
|
||||
@ -144,7 +192,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned char i_gid;
|
||||
unsigned char i_nlinks;
|
||||
unsigned short i_zone[7+1+1];
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short i_mode;
|
||||
unsigned short i_nlink;
|
||||
unsigned short i_uid;
|
||||
@ -155,38 +205,55 @@ These filesystems are rather similar. Here is a comparison with Minix FS:
|
||||
unsigned long i_mtime;
|
||||
unsigned long i_ctime;
|
||||
|
||||
* Regular file data blocks are organized as
|
||||
- Minix FS
|
||||
7 direct blocks
|
||||
1 indirect block (pointers to blocks)
|
||||
1 double-indirect block (pointer to pointers to blocks)
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
10 direct blocks
|
||||
1 indirect block (pointers to blocks)
|
||||
1 double-indirect block (pointer to pointers to blocks)
|
||||
1 triple-indirect block (pointer to pointers to pointers to blocks)
|
||||
|
||||
* Inode size, inodes per block
|
||||
- Minix FS 32 32
|
||||
- Xenix FS 64 16
|
||||
- SystemV FS 64 16
|
||||
- Coherent FS 64 8
|
||||
* Regular file data blocks are organized as
|
||||
|
||||
- Minix FS:
|
||||
|
||||
- 7 direct blocks
|
||||
- 1 indirect block (pointers to blocks)
|
||||
- 1 double-indirect block (pointer to pointers to blocks)
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS:
|
||||
|
||||
- 10 direct blocks
|
||||
- 1 indirect block (pointers to blocks)
|
||||
- 1 double-indirect block (pointer to pointers to blocks)
|
||||
- 1 triple-indirect block (pointer to pointers to pointers to blocks)
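As a rough illustration of what these layouts imply, the number of data blocks reachable from one inode can be estimated in C. The helper name max_file_blocks() is hypothetical, and the pointers-per-block values used in the note below are assumptions: 2-byte pointers for Minix FS (matching the unsigned short i_zone[] array) and 4-byte pointers in the indirect blocks of the other three filesystems.

```c
#include <stdint.h>

/*
 * Hypothetical helper: count the data blocks addressable through an
 * inode with `direct` direct pointers and `levels` levels of
 * indirection (1 = single indirect, 2 = adds double indirect, ...).
 * `ptrs_per_block` is the block size divided by the pointer size.
 */
static uint64_t max_file_blocks(unsigned int direct, unsigned int levels,
				uint64_t ptrs_per_block)
{
	uint64_t total = direct;
	uint64_t reach = 1;
	unsigned int i;

	for (i = 0; i < levels; i++) {
		reach *= ptrs_per_block;	/* blocks reachable at this level */
		total += reach;
	}
	return total;
}
```

With 1024-byte blocks, max_file_blocks(7, 2, 512) gives 262663 blocks for the Minix FS layout and max_file_blocks(10, 3, 256) gives 16843018 blocks for the Xenix FS, SystemV FS and Coherent FS layout. Other limits, such as the width of the size field, may cap the real maximum file size below these figures.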
|
||||
|
||||
|
||||
=========== ========== ================
|
||||
Inode size inodes per block
|
||||
=========== ========== ================
|
||||
Minix FS 32 32
|
||||
Xenix FS 64 16
|
||||
SystemV FS 64 16
|
||||
Coherent FS 64 8
|
||||
=========== ========== ================
|
||||
|
||||
* Directory entry on disk
|
||||
- Minix FS
|
||||
|
||||
- Minix FS::
|
||||
|
||||
unsigned short inode;
|
||||
char name[14/30];
|
||||
- Xenix FS, SystemV FS, Coherent FS
|
||||
|
||||
- Xenix FS, SystemV FS, Coherent FS::
|
||||
|
||||
unsigned short inode;
|
||||
char name[14];
|
||||
|
||||
* Dir entry size, dir entries per block
|
||||
- Minix FS 16/32 64/32
|
||||
- Xenix FS 16 64
|
||||
- SystemV FS 16 64
|
||||
- Coherent FS 16 32
|
||||
=========== ============== =====================
|
||||
Dir entry size dir entries per block
|
||||
=========== ============== =====================
|
||||
Minix FS 16/32 64/32
|
||||
Xenix FS 16 64
|
||||
SystemV FS 16 64
|
||||
Coherent FS 16 32
|
||||
=========== ============== =====================
|
||||
|
||||
* How to implement symbolic links such that the host fsck doesn't scream:
|
||||
|
||||
- Minix FS normal
|
||||
- Xenix FS kludge: as regular files with chmod 1000
|
||||
- SystemV FS ??
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====
|
||||
Tmpfs
|
||||
=====
|
||||
|
||||
Tmpfs is a file system which keeps all files in virtual memory.
|
||||
|
||||
|
||||
@ -14,7 +20,7 @@ If you compare it to ramfs (which was the template to create tmpfs)
|
||||
you gain swapping and limit checking. Another similar thing is the RAM
|
||||
disk (/dev/ram*), which simulates a fixed size hard disk in physical
|
||||
RAM, where you have to create an ordinary filesystem on top. Ramdisks
|
||||
cannot swap and you do not have the possibility to resize them.
|
||||
cannot swap and you do not have the possibility to resize them.
|
||||
|
||||
Since tmpfs lives completely in the page cache and on swap, all tmpfs
|
||||
pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
|
||||
@ -26,7 +32,7 @@ tmpfs has the following uses:
|
||||
|
||||
1) There is always a kernel internal mount which you will not see at
|
||||
all. This is used for shared anonymous mappings and SYSV shared
|
||||
memory.
|
||||
memory.
|
||||
|
||||
This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
|
||||
set, the user visible part of tmpfs is not built. But the internal
|
||||
@ -34,7 +40,7 @@ tmpfs has the following uses:
|
||||
|
||||
2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
|
||||
POSIX shared memory (shm_open, shm_unlink). Adding the following
|
||||
line to /etc/fstab should take care of this:
|
||||
line to /etc/fstab should take care of this::
|
||||
|
||||
tmpfs /dev/shm tmpfs defaults 0 0
|
||||
|
||||
@ -56,15 +62,17 @@ tmpfs has the following uses:
|
||||
|
||||
tmpfs has three mount options for sizing:
|
||||
|
||||
size: The limit of allocated bytes for this tmpfs instance. The
|
||||
========= ============================================================
|
||||
size The limit of allocated bytes for this tmpfs instance. The
|
||||
default is half of your physical RAM without swap. If you
|
||||
oversize your tmpfs instances the machine will deadlock
|
||||
since the OOM handler will not be able to free that memory.
|
||||
nr_blocks: The same as size, but in blocks of PAGE_SIZE.
|
||||
nr_inodes: The maximum number of inodes for this instance. The default
|
||||
nr_blocks The same as size, but in blocks of PAGE_SIZE.
|
||||
nr_inodes The maximum number of inodes for this instance. The default
|
||||
is half of the number of your physical RAM pages, or (on a
|
||||
machine with highmem) the number of lowmem RAM pages,
|
||||
whichever is the lower.
|
||||
========= ============================================================
|
||||
|
||||
These parameters accept a suffix k, m or g for kilo, mega and giga and
|
||||
can be changed on remount. The size parameter also accepts a suffix %
|
||||
@ -82,6 +90,7 @@ tmpfs has a mount option to set the NUMA memory allocation policy for
|
||||
all files in that instance (if CONFIG_NUMA is enabled) - which can be
|
||||
adjusted on the fly via 'mount -o remount ...'
|
||||
|
||||
======================== ==============================================
|
||||
mpol=default use the process allocation policy
|
||||
(see set_mempolicy(2))
|
||||
mpol=prefer:Node prefers to allocate memory from the given Node
|
||||
@ -89,6 +98,7 @@ mpol=bind:NodeList allocates memory only from nodes in NodeList
|
||||
mpol=interleave prefers to allocate from each node in turn
|
||||
mpol=interleave:NodeList allocates from each node of NodeList in turn
|
||||
mpol=local prefers to allocate memory from the local node
|
||||
======================== ==============================================
|
||||
|
||||
NodeList format is a comma-separated list of decimal numbers and ranges,
|
||||
a range being two hyphen-separated decimal numbers, the smallest and
|
||||
@ -98,9 +108,9 @@ A memory policy with a valid NodeList will be saved, as specified, for
|
||||
use at file creation time. When a task allocates a file in the file
|
||||
system, the mount option memory policy will be applied with a NodeList,
|
||||
if any, modified by the calling task's cpuset constraints
|
||||
[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed
|
||||
below. If the resulting NodeLists is the empty set, the effective memory
|
||||
policy for the file will revert to "default" policy.
|
||||
[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags,
|
||||
listed below. If the resulting NodeLists is the empty set, the effective
|
||||
memory policy for the file will revert to "default" policy.
|
||||
|
||||
NUMA memory allocation policies have optional flags that can be used in
|
||||
conjunction with their modes. These optional flags can be specified
|
||||
@ -109,6 +119,8 @@ See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
|
||||
all available memory allocation policy mode flags and their effect on
|
||||
memory policy.
|
||||
|
||||
::
|
||||
|
||||
=static is equivalent to MPOL_F_STATIC_NODES
|
||||
=relative is equivalent to MPOL_F_RELATIVE_NODES
|
||||
|
||||
@ -128,9 +140,11 @@ on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
|
||||
To specify the initial root directory you can use the following mount
|
||||
options:
|
||||
|
||||
mode: The permissions as an octal number
|
||||
uid: The user id
|
||||
gid: The group id
|
||||
==== ==================================
|
||||
mode The permissions as an octal number
|
||||
uid The user id
|
||||
gid The group id
|
||||
==== ==================================
|
||||
|
||||
These options do not have any effect on remount. You can change these
|
||||
parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
|
||||
@ -141,9 +155,9 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB
|
||||
RAM/SWAP in 10240 inodes and it is only accessible by root.
|
||||
|
||||
|
||||
Author:
|
||||
:Author:
|
||||
Christoph Rohland <cr@sap.com>, 1.12.01
|
||||
Updated:
|
||||
:Updated:
|
||||
Hugh Dickins, 4 June 2007
|
||||
Updated:
|
||||
:Updated:
|
||||
KOSAKI Motohiro, 16 Mar 2010
|
@ -1,3 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
:orphan:
|
||||
|
||||
.. UBIFS Authentication
|
||||
@ -92,11 +94,11 @@ UBIFS Index & Tree Node Cache
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
|
||||
of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file
|
||||
contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes.
|
||||
Almost all types of nodes share a common header (`ubifs_ch`) containing basic
|
||||
of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file
|
||||
contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
|
||||
Almost all types of nodes share a common header (``ubifs_ch``) containing basic
|
||||
information like node type, node length, a sequence number, etc. (see
|
||||
`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT
|
||||
``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
|
||||
and some less important node types like padding nodes which are used to pad
|
||||
unusable content at the end of LEBs.
|
||||
|
||||
|
@ -1,5 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
UBI File System
|
||||
===============
|
||||
|
||||
Introduction
|
||||
=============
|
||||
============
|
||||
|
||||
UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
|
||||
Block Images". UBIFS is a flash file system, which means it is designed
|
||||
@ -79,6 +85,7 @@ Mount options
|
||||
|
||||
(*) == default.
|
||||
|
||||
==================== =======================================================
|
||||
bulk_read read more in one go to take advantage of flash
|
||||
media that read faster sequentially
|
||||
no_bulk_read (*) do not bulk-read
|
||||
@ -98,6 +105,7 @@ auth_key= specify the key used for authenticating the filesystem.
|
||||
auth_hash_name= The hash algorithm used for authentication. Used for
|
||||
both hashing and for creating HMACs. Typical values
|
||||
include "sha256" or "sha512"
|
||||
==================== =======================================================
|
||||
|
||||
|
||||
Quick usage instructions
|
||||
@ -107,12 +115,14 @@ The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
|
||||
where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
|
||||
UBI volume name.
|
||||
|
||||
Mount volume 0 on UBI device 0 to /mnt/ubifs:
|
||||
$ mount -t ubifs ubi0_0 /mnt/ubifs
|
||||
Mount volume 0 on UBI device 0 to /mnt/ubifs::
|
||||
|
||||
$ mount -t ubifs ubi0_0 /mnt/ubifs
|
||||
|
||||
Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
|
||||
name):
|
||||
$ mount -t ubifs ubi0:rootfs /mnt/ubifs
|
||||
name)::
|
||||
|
||||
$ mount -t ubifs ubi0:rootfs /mnt/ubifs
|
||||
|
||||
The following is an example of the kernel boot arguments to attach mtd0
|
||||
to UBI and mount volume "rootfs":
|
||||
@ -122,5 +132,6 @@ References
|
||||
==========
|
||||
|
||||
UBIFS documentation and FAQ/HOWTO at the MTD web site:
|
||||
http://www.linux-mtd.infradead.org/doc/ubifs.html
|
||||
http://www.linux-mtd.infradead.org/faq/ubifs.html
|
||||
|
||||
- http://www.linux-mtd.infradead.org/doc/ubifs.html
|
||||
- http://www.linux-mtd.infradead.org/faq/ubifs.html
|
@ -1,6 +1,8 @@
|
||||
*
|
||||
* Documentation/filesystems/udf.txt
|
||||
*
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
UDF file system
|
||||
===============
|
||||
|
||||
If you encounter problems with reading UDF discs using this driver,
|
||||
please report them according to MAINTAINERS file.
|
||||
@ -18,8 +20,10 @@ performance due to very poor read-modify-write support supplied internally
|
||||
by drive firmware.
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following mount options are supported:
|
||||
|
||||
=========== ======================================
|
||||
gid= Set the default group.
|
||||
umask= Set the default umask.
|
||||
mode= Set the default file permissions.
|
||||
@ -34,6 +38,7 @@ The following mount options are supported:
|
||||
longad Use long ad's (default)
|
||||
nostrict Unset strict conformance
|
||||
iocharset= Set the NLS character set
|
||||
=========== ======================================
|
||||
|
||||
The uid= and gid= options need a bit more explaining. They will accept a
|
||||
decimal numeric value and all inodes on that mount will then appear as
|
||||
@ -47,13 +52,17 @@ the interactive user will always see the files on the disk as belonging to him.
|
||||
|
||||
The remaining are for debugging and disaster recovery:
|
||||
|
||||
novrs Skip volume sequence recognition
|
||||
===== ================================
|
||||
novrs Skip volume sequence recognition
|
||||
===== ================================
|
||||
|
||||
The following expect an offset from 0.
|
||||
|
||||
========== =================================================
|
||||
session= Set the CDROM session (default= last session)
|
||||
anchor= Override standard anchor location. (default= 256)
|
||||
lastblock= Set the last block of the filesystem/
|
||||
========== =================================================
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
@ -62,5 +71,5 @@ For the latest version and toolset see:
|
||||
https://github.com/pali/udftools
|
||||
|
||||
Documentation on UDF and ECMA 167 is available FREE from:
|
||||
http://www.osta.org/
|
||||
http://www.ecma-international.org/
|
||||
- http://www.osta.org/
|
||||
- http://www.ecma-international.org/
|
@ -1,4 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================================
|
||||
ZoneFS - Zone filesystem for Zoned block devices
|
||||
================================================
|
||||
|
||||
Introduction
|
||||
============
|
||||
@ -29,6 +33,7 @@ Zoned block devices
|
||||
Zoned storage devices belong to a class of storage devices with an address
|
||||
space that is divided into zones. A zone is a group of consecutive LBAs and all
|
||||
zones are contiguous (there are no LBA gaps). Zones may have different types.
|
||||
|
||||
* Conventional zones: there are no access constraints to LBAs belonging to
|
||||
conventional zones. Any read or write access can be executed, similarly to a
|
||||
regular block device.
|
||||
@ -158,6 +163,7 @@ Format options
|
||||
--------------
|
||||
|
||||
Several optional features of zonefs can be enabled at format time.
|
||||
|
||||
* Conventional zone aggregation: ranges of contiguous conventional zones can be
|
||||
aggregated into a single larger file instead of the default one file per zone.
|
||||
* File ownership: The owner UID and GID of zone files is by default 0 (root)
|
||||
@ -249,7 +255,7 @@ permissions.
|
||||
Further action taken by zonefs I/O error recovery can be controlled by the user
|
||||
with the "errors=xxx" mount option. The table below summarizes the result of
|
||||
zonefs I/O error processing depending on the mount option and on the zone
|
||||
conditions.
|
||||
conditions::
|
||||
|
||||
+--------------+-----------+-----------------------------------------+
|
||||
| | | Post error state |
|
||||
@ -275,6 +281,7 @@ conditions.
|
||||
+--------------+-----------+-----------------------------------------+
|
||||
|
||||
Further notes:
|
||||
|
||||
* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
|
||||
error processing if no errors mount option is specified.
|
||||
* With the "errors=remount-ro" mount option, the change of the file access
|
||||
@ -302,6 +309,7 @@ Mount options
|
||||
zonefs defines the "errors=<behavior>" mount option to allow the user to specify
|
||||
zonefs behavior in response to I/O errors, inode size inconsistencies or zone
|
||||
condition changes. The defined behaviors are as follows:
|
||||
|
||||
* remount-ro (default)
|
||||
* zone-ro
|
||||
* zone-offline
|
||||
@ -325,78 +333,78 @@ Examples
|
||||
--------
|
||||
|
||||
The following formats a 15TB host-managed SMR HDD with 256 MB zones
|
||||
with the conventional zones aggregation feature enabled.
|
||||
with the conventional zones aggregation feature enabled::
|
||||
|
||||
# mkzonefs -o aggr_cnv /dev/sdX
|
||||
# mount -t zonefs /dev/sdX /mnt
|
||||
# ls -l /mnt/
|
||||
total 0
|
||||
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
|
||||
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
|
||||
# mkzonefs -o aggr_cnv /dev/sdX
|
||||
# mount -t zonefs /dev/sdX /mnt
|
||||
# ls -l /mnt/
|
||||
total 0
|
||||
dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
|
||||
dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
|
||||
|
||||
The size of the zone files sub-directories indicates the number of files
|
||||
existing for each type of zones. In this example, there is only one
|
||||
conventional zone file (all conventional zones are aggregated under a single
|
||||
file).
|
||||
file)::
|
||||
|
||||
# ls -l /mnt/cnv
|
||||
total 137101312
|
||||
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0
|
||||
# ls -l /mnt/cnv
|
||||
total 137101312
|
||||
-rw-r----- 1 root root 140391743488 Nov 25 13:23 0
|
||||
|
||||
This aggregated conventional zone file can be used as a regular file.
|
||||
This aggregated conventional zone file can be used as a regular file::
|
||||
|
||||
# mkfs.ext4 /mnt/cnv/0
|
||||
# mount -o loop /mnt/cnv/0 /data
|
||||
# mkfs.ext4 /mnt/cnv/0
|
||||
# mount -o loop /mnt/cnv/0 /data
|
||||
|
||||
The "seq" sub-directory grouping files for sequential write zones has in this
|
||||
example 55356 zones.
|
||||
example 55356 zones::
|
||||
|
||||
# ls -lv /mnt/seq
|
||||
total 14511243264
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 0
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 1
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 2
|
||||
...
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 55354
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 55355
|
||||
# ls -lv /mnt/seq
|
||||
total 14511243264
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 0
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 1
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 2
|
||||
...
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 55354
|
||||
-rw-r----- 1 root root 0 Nov 25 13:23 55355
|
||||
|
||||
For sequential write zone files, the file size changes as data is appended at
|
||||
the end of the file, similarly to any regular file system.
|
||||
the end of the file, similarly to any regular file system::
|
||||
|
||||
# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
|
||||
1+0 records in
|
||||
1+0 records out
|
||||
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
|
||||
# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
|
||||
1+0 records in
|
||||
1+0 records out
|
||||
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
|
||||
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
|
||||
|
||||
The written file can be truncated to the zone size, preventing any further
|
||||
write operation.
|
||||
write operation::
|
||||
|
||||
# truncate -s 268435456 /mnt/seq/0
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
|
||||
# truncate -s 268435456 /mnt/seq/0
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
|
||||
|
||||
Truncation to 0 size allows freeing the file zone storage space and restarting
|
||||
append-writes to the file.
|
||||
append-writes to the file::
|
||||
|
||||
# truncate -s 0 /mnt/seq/0
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
|
||||
# truncate -s 0 /mnt/seq/0
|
||||
# ls -l /mnt/seq/0
|
||||
-rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
|
||||
|
||||
Since files are statically mapped to zones on the disk, the number of blocks of
|
||||
a file as reported by stat() and fstat() indicates the size of the file zone.
|
||||
a file as reported by stat() and fstat() indicates the size of the file zone::
|
||||
|
||||
# stat /mnt/seq/0
|
||||
File: /mnt/seq/0
|
||||
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
|
||||
Device: 870h/2160d Inode: 50431 Links: 1
|
||||
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
|
||||
Access: 2019-11-25 13:23:57.048971997 +0900
|
||||
Modify: 2019-11-25 13:52:25.553805765 +0900
|
||||
Change: 2019-11-25 13:52:25.553805765 +0900
|
||||
Birth: -
|
||||
# stat /mnt/seq/0
|
||||
File: /mnt/seq/0
|
||||
Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
|
||||
Device: 870h/2160d Inode: 50431 Links: 1
|
||||
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
|
||||
Access: 2019-11-25 13:23:57.048971997 +0900
|
||||
Modify: 2019-11-25 13:52:25.553805765 +0900
|
||||
Change: 2019-11-25 13:52:25.553805765 +0900
|
||||
Birth: -
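The same "Blocks" figure can be read programmatically. The following C sketch assumes that st_blocks is counted in 512-byte units, as it is on Linux; the helper names blocks_to_bytes() and allocated_bytes() are hypothetical and not part of zonefs.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Convert a block count in 512-byte units into bytes. */
static uint64_t blocks_to_bytes(uint64_t nr_blocks)
{
	return nr_blocks * 512;
}

/*
 * Hypothetical helper: store in *bytes the space that stat() reports
 * as allocated to `path`. Returns 0 on success, -1 if stat() fails.
 */
static int allocated_bytes(const char *path, uint64_t *bytes)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*bytes = blocks_to_bytes((uint64_t)st.st_blocks);
	return 0;
}
```

For the file shown above, blocks_to_bytes(524288) yields 268435456 bytes, which matches the 256 MB zone size used when the file was truncated to its maximum.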
|
||||
|
||||
The number of blocks of the file ("Blocks") in units of 512B blocks gives the
|
||||
maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
|