Presently, we use a tuple to attach a dict containing annotations
(comments and compile-time conditionals) to a tree node. This is
undesirable because dicts are difficult to strongly type; promoting it
to a real class allows us to name the values and types of the
annotations we are expecting.
In terms of typing, the Annotated<T> type serves as a generic container
where the annotated node's type is preserved, allowing for greater
specificity than we'd be able to provide without a generic.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-11-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The types will be used in forthcoming patches to add typing. These types
describe the layout and structure of the objects passed to
_tree_to_qlit, but lack the power to describe annotations until the next
commit.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-10-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This mimics how a typed object works, where 'if' and 'comment' are
always set, regardless of if they have a value set or not.
It is safe to do this because of the way that _tree_to_qlit processes
these values (using dict.get with a default of None), resulting in no
change of output from _tree_to_qlit. There are no other users of this
data.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-9-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This is only used to pass in a dictionary with a comment already set, so
skip the runaround and just accept the (optional) comment.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-8-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Returning two different types conditionally can be complicated to
type. Return one type for consistency.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-7-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
_tree_to_qlit is called recursively on dict values (isolated from their
keys); at such a point in generating output it is too late to apply an
ifcond. Similarly, comments do not necessarily have a "tidy" place they
can be printed in such a circumstance.
Forbid this usage by renaming "suppress_first_indent" to "dict_value" to
emphasize that indents are suppressed only for the benefit of dict
values; then add an assertion assuring we do not pass ifcond/comments
in this case.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-6-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Comment wrapped to conform to PEP 8]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
_make_tree might receive a dict (a SchemaInfo object) or some other type
(usually, a string) for its obj parameter. Adding features information
should arguably be performed by the caller at such a time when we know
the type of the object and don't have to re-interrogate it.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-5-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
At present, we open-code this in _make_tree itself; but if the structure
of the tree changes, this is brittle. Use an explicit recursive call to
_make_tree when appropriate to help keep the interior node typing
consistent.
A consequence of doing this is that the 'ifcond' key of the features
dict will be omitted when ifcond is false-ish, just like it is omitted
in top-level calls to _make_tree. This also increases consistency in our
handling of this property.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-4-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The introspect visitor is stateful, but expects that it will have a
schema to refer to. Add assertions that state this.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-3-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
It does happen to be a list (as of now), but we can describe it in more
general terms with no loss in accuracy to allow tuples and other
constructs.
In the future, we can write "ifcond: Sequence[str] = ()" as a default
parameter, which we could not do safely with a Mutable type like a List.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20210216021809.134886-2-jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
- expose vdev name in PCI memory registration
- new hwprofile plugin
- bunch of style cleanups to contrib/plugins
- fix call signature of inline instrumentation
- re-factor the io_recompile code to push specialisation into hooks
- add some acceptance tests for the plugins
- clean-up and remove CF_NOCACHE handling from TCG
- fix instrumentation of cpu_io_recompile sections
- expand tests to check inline and cb count the same
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmAuJFkACgkQ+9DbCVqe
KkTIGQf/e5FhBWxTIO3Y+N9CppnZM2gmo1U4kEOuCoJxa079/Jy9j7l7ZeWicmTR
hhatxPKcKzodHGCSTJ9m+g9T8ye/hBgDgsuM1MZMQNtdZ8enEhU+z9V6MVmbRf55
s31HUksf7kiY+ahBVCjEE41f/JGLNiOBltmNQtfDap7LvcEXXLp+CLvYfiMmtPW3
0aZfKFAbfaEj2x4WSv7tjFsnzFUTFYkwsgVmcuDbI6uD5sX864W0XPdEjTrB1pPO
r3qJ26yZeDrJEyMykQodWbKK12NIuLAtOCezc05UwpXmSKOueBdyEehNtraIQT1k
TnMW2Vs10jDWN0RdEY2II7KZbcwTDQ==
=/I6l
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stsquad/tags/pull-plugin-updates-180221-1' into staging
Plugin updates:
- expose vdev name in PCI memory registration
- new hwprofile plugin
- bunch of style cleanups to contrib/plugins
- fix call signature of inline instrumentation
- re-factor the io_recompile code to push specialisation into hooks
- add some acceptance tests for the plugins
- clean-up and remove CF_NOCACHE handling from TCG
- fix instrumentation of cpu_io_recompile sections
- expand tests to check inline and cb count the same
# gpg: Signature made Thu 18 Feb 2021 08:24:57 GMT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-plugin-updates-180221-1: (23 commits)
tests/acceptance: add a memory callback check
tests/plugin: allow memory plugin to do both inline and callbacks
tests/acceptance: add a new tests to detect counting errors
accel/tcg: allow plugin instrumentation to be disable via cflags
accel/tcg: remove CF_NOCACHE and special cases
accel/tcg: re-factor non-RAM execution code
accel/tcg: cache single instruction TB on pending replay exception
accel/tcg: actually cache our partial icount TB
tests/acceptance: add a new set of tests to exercise plugins
tests/plugin: expand insn test to detect duplicate instructions
target/sh4: Create superh_io_recompile_replay_branch
target/mips: Create mips_io_recompile_replay_branch
accel/tcg: Create io_recompile_replay_branch hook
exec: Move TranslationBlock typedef to qemu/typedefs.h
accel/tcg/plugin-gen: fix the call signature for inline callbacks
contrib: Open brace '{' following struct go on the same line
contrib: space required after that ','
contrib: Add spaces around operator
contrib: Fix some code style problems, ERROR: "foo * bar" should be "foo *bar"
contrib: Don't use '#' flag of printf format
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This test makes sure that the inline and callback based memory checks
count the same number of accesses.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210213130325.14781-24-alex.bennee@linaro.org>
This is going to be useful for acceptance tests that check both types
are being called the same number of times, especially when icount is
enabled.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210213130325.14781-23-alex.bennee@linaro.org>
The insn plugin has a simple heuristic to detect if an instruction is
detected running twice in a row. Check the plugin log after the run
and pass accordingly.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210213130325.14781-22-alex.bennee@linaro.org>
When icount is enabled and we recompile an MMIO access we end up
double counting the instruction execution. To avoid this we introduce
the CF_MEMI cflag which only allows memory instrumentation for the
next TB (which won't yet have been counted). As this is part of the
hashed compile flags we will only execute the generated TB while
coming out of a cpu_io_recompile.
While we are at it delete the old TODO. We might as well keep the
translation handy as it's likely you will repeatedly hit it on each
MMIO access.
Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Aaron Lindsay <aaron@os.amperecomputing.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-21-alex.bennee@linaro.org>
Now we no longer generate CF_NOCACHE blocks we can remove a bunch of
the special case handling for them. While we are at it we can remove
the unused tb->orig_tb field and save a few bytes on the TB structure.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-20-alex.bennee@linaro.org>
There is no real need to use CF_NOCACHE here. As long as the TB isn't
linked to other TBs or included in the QHT or jump cache then it will
only get executed once.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>
Again there is no reason to jump through the nocache hoops to execute
a single instruction block. We do have to add an additional wrinkle to
the cpu_handle_interrupt case to ensure we let through a TB where we
have specifically disabled icount for the block.
As the last user of cpu_exec_nocache we can now remove the function.
Further clean-up will follow in subsequent patches.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-18-alex.bennee@linaro.org>
When we exit a block under icount with instructions left to execute we
might need a shorter than normal block to take us to the next
deterministic event. Instead of creating a throwaway block on demand
we use the existing compile flags mechanism to ensure we fetch (or
compile and fetch) a block with exactly the number of instructions we
need.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-17-alex.bennee@linaro.org>
This is just a simple test to count the instructions executed by a
kernel. However a later test will detect a failure condition when
icount is enabled.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210213130325.14781-16-alex.bennee@linaro.org>
A duplicate insn is one that is appears to be executed twice in a row.
This is currently possible due to -icount and cpu_io_recompile()
causing a re-translation of a block. On it's own this won't trigger
any tests though.
The heuristics that the plugin use can't deal with the x86 rep
instruction which (validly) will look like executing the same
instruction several times. To avoid problems later we tweak the rules
for x86 to run the "inline" version of the plugin. This also has the
advantage of increasing coverage of the plugin code (see bugfix in
previous commit).
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-15-alex.bennee@linaro.org>
Move the code from accel/tcg/translate-all.c to target/sh4/cpu.c.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210208233906.479571-5-richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-14-alex.bennee@linaro.org>
Move the code from accel/tcg/translate-all.c to target/mips/cpu.c.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210208233906.479571-4-richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-13-alex.bennee@linaro.org>
Create a hook in which to split out the mips and
sh4 ifdefs from cpu_io_recompile.
[AJB: s/stoped/stopped/]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210208233906.479571-3-richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-12-alex.bennee@linaro.org>
This also means we don't need an extra declaration of
the structure in hw/core/cpu.h.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210208233906.479571-2-richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-11-alex.bennee@linaro.org>
A recent change to the handling of constants in TCG changed the
pattern of ops emitted for a constant add. We no longer emit a mov and
the constant can be applied directly to the TCG_op_add arguments. This
was causing SEGVs when running the insn plugin with arg=inline. Fix
this by updating copy_add_i64 to do the right thing while also adding
a comment at the top of the append section as an aide memoir if
something like this happens again.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Emilio G. Cota <cota@braap.org>
Message-Id: <20210213130325.14781-10-alex.bennee@linaro.org>
I found some style problems whil check the code using checkpatch.pl.
This commit fixs the issue below:
ERROR: that open brace { should be on the previous line
Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-6-zhouyang789@huawei.com>
Message-Id: <20210213130325.14781-9-alex.bennee@linaro.org>
I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: space required after that ','
Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-5-zhouyang789@huawei.com>
Message-Id: <20210213130325.14781-8-alex.bennee@linaro.org>
I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: spaces required around that '*'
Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-4-zhouyang789@huawei.com>
Message-Id: <20210213130325.14781-7-alex.bennee@linaro.org>
I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the issue below:
ERROR: "foo * bar" should be "foo *bar"
Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-3-zhouyang789@huawei.com>
Message-Id: <20210213130325.14781-6-alex.bennee@linaro.org>
I am reading contrib related code and found some style problems while
check the code using checkpatch.pl. This commit fixs the misuse of
'#' flag of printf format
Signed-off-by: zhouyang <zhouyang789@huawei.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210118031004.1662363-2-zhouyang789@huawei.com>
Message-Id: <20210213130325.14781-5-alex.bennee@linaro.org>
This is a plugin intended to help with profiling access to various
bits of system hardware. It only really makes sense for system
emulation.
It takes advantage of the recently exposed helper API that allows us
to see the device name (memory region name) associated with a device.
You can specify arg=read or arg=write to limit the tracking to just
reads or writes (by default it does both).
The pattern option:
-plugin ./tests/plugin/libhwprofile.so,arg=pattern
will allow you to see the access pattern to devices, eg:
gic_cpu @ 0xffffffc010040000
off:00000000, 8, 1, 8, 1
off:00000000, 4, 1, 4, 1
off:00000000, 2, 1, 2, 1
off:00000000, 1, 1, 1, 1
The source option:
-plugin ./tests/plugin/libhwprofile.so,arg=source
will track the virtual source address of the instruction making the
access:
pl011 @ 0xffffffc010031000
pc:ffffffc0104c785c, 1, 4, 0, 0
pc:ffffffc0104c7898, 1, 4, 0, 0
pc:ffffffc010512bcc, 2, 1867, 0, 0
You cannot mix source and pattern.
Finally the match option allow you to limit the tracking to just the
devices you care about.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Robert Foley <robert.foley@linaro.org>
Message-Id: <20210213130325.14781-4-alex.bennee@linaro.org>
This may well end up being anonymous but it should always be unique.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Clement Deschamps <clement.deschamps@greensocs.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-3-alex.bennee@linaro.org>
When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210213130325.14781-2-alex.bennee@linaro.org>
Vivek's support for new FUSE KILLPRIV_V2
and some smaller cleanups.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmAsEDgACgkQBRYzHrxb
/ee1Hg/+PXJXLzTZMPdM97BLGyE5k7jgUXfYSCC+VTC87PR/nOUV7LAU12Lqdmiz
NZG9GoqdRusw4vjc1FF/GcqJYWERrknWDjbpNB3rUfP1I87mWuMXk9HKfubnqP78
JeeLx6O2q7O8V9CLj6lFcX96fa8umSJjYvpHB7jGuvmfRUgNOa3f9QiU7I9ySThn
lYCcpqbd/k27eFGAzjy5T5l1SVZTnbVWsM0QoIjRZP27aublECDKJva9owX8m2AC
50QAoS5hhNIc1mYe45xAB3fibWFh8oD3Onfr1HPRjDLOyL9F8rXZeQibzLLIFwIc
uC2fYQ07Mywt91f6s6ns9IAxfbnUtujK2Wxcc38xr7Cs5WoHXFhqjPBYPuWTl3Mo
XFBRH2J5CQKsU0BswOTdYo6DRwJqASSXyD6aEm/Rl5GibIOPE86+UedV7mPLJNYM
6cWbC9zxlS7ImjUxrTHu4zaVCLJz4AO+Z12uT4KbbieDS6mRznrBwZbkiEho9wR1
A5UVTs7vAWpthdHJFvImV4SwjkmcYx+LhQcIqK7HdJIG9fRP4jnrnsHhCxaQQPIn
4yl+jgQbN45imRazdIiFTCIJ1MwGU7dxOX3OQI6FjTpGENHve3XuO1h1c025TtcD
7gheE0sgVSsddEMAvl8zQ1ZW/eQyL/aTI4ctgUkxiwN5gct9L+k=
=3eKx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210216' into staging
virtiofsd pull 2021-02-16
Vivek's support for new FUSE KILLPRIV_V2
and some smaller cleanups.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Tue 16 Feb 2021 18:34:32 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210216:
virtiofsd: Do not use a thread pool by default
viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2
virtiofsd: Save error code early at the failure callsite
tools/virtiofsd: Replace the word 'whitelist'
virtiofsd: vu_dispatch locking should never fail
virtiofsd: Allow to build it without the tools
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently we created a thread pool (With 64 max threads per pool) for
each virtqueue. We hoped that this will provide us with better scalability
and performance.
But in practice, we are getting better numbers in most of the cases
when we don't create a thread pool at all and a single thread per
virtqueue receives the request and processes it.
Hence, I am proposing that we switch to no thread pool by default
(equivalent of --thread-pool-size=0). This will provide out of
box better performance to most of the users. In fact other users
have confirmed that not using a thread pool gives them better
numbers. So why not use this as default. It can be changed when
somebody can fix the issues with thread pool performance.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210210182744.27324-2-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd
can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2".
By default this is enabled as long as client supports it
Enabling this option helps with performance in write path. Without this
option, currently every write is first preceeded with a getxattr() operation
to find out if security.capability is set. (Write is supposed to clear
security.capability). With this option enabled, server is signing up for
clearing security.capability on every WRITE and also clearing suid/sgid
subject to certain rules. This gets rid of extra getxattr() call for every
WRITE and improves performance. This is true when virtiofsd is run with
option -o xattr.
What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server implementation.
It needs to adhere to following rules. Thanks to Miklos for this summary.
- clear "security.capability" on write, truncate and chown unconditionally
- clear suid/sgid in case of following. Note, sgid is cleared only if
group executable bit is set.
o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
o setattr has FATTR_UID or FATTR_GID
o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
o write has FUSE_WRITE_KILL_SUIDGID
>From Linux VFS client perspective, here are the requirements.
- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
only if caller does not have CAP_FSETID as well as file has group execute
permission.
virtiofsd implementation has not changed much to adhere to above ruls. And
reason being that current assumption is that we are running on Linux
and on top of filesystems like ext4/xfs which already follow above rules.
On write, truncate, chown, seucurity.capability is cleared. And virtiofsd
drops CAP_FSETID if need be and that will lead to clearing of suid/sgid.
But if virtiofsd is running on top a filesystem which breaks above assumptions,
then it will have to take extra actions to emulate above. That's a TODO
for later when need arises.
Note: create normally is supposed to be called only when file does not
exist. So generally there should not be any question of clearing
setuid/setgid. But it is possible that after client checks that
file is not present, some other client creates file on server
and this race can trigger sending FUSE_CREATE. In that case, if
O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID
is also set.
v3:
- Resolved conflicts due to lo_inode_open() changes.
- Moved capability code in lo_do_open() so that both lo_open() and
lo_create() can benefit from common code.
- Dropped changes to kernel headers as these are part of qemu already.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210208224024.43555-3-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Change error code handling slightly in lo_setattr(). Right now we seem
to jump to out_err and assume that "errno" is valid and use that to
send reply.
But if caller has to do some other operations before jumping to out_err,
then it does the dance of first saving errno to saverr and the restore
errno before jumping to out_err. This makes it more confusing.
I am about to make more changes where caller will have to do some
work after error before jumping to out_err. I found it easier to
change the convention a bit. That is caller saves error in "saverr"
before jumping to out_err. And out_err uses "saverr" to send error
back and does not rely on "errno" having actual error.
v3: Resolved conflicts in lo_setattr() due to lo_inode_open() changes.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210208224024.43555-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Follow the inclusive terminology from the "Conscious Language in your
Open Source Projects" guidelines [*] and replace the words "whitelist"
appropriately.
[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210205171817.2108907-3-philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
pthread_rwlock_rdlock() and pthread_rwlock_wrlock() can fail if a
deadlock condition is detected or the current thread already owns
the lock. They can also fail, like pthread_rwlock_unlock(), if the
mutex wasn't properly initialized. None of these are ever expected
to happen with fv_VuDev::vu_dispatch_rwlock.
Some users already check the return value and assert, some others
don't. Introduce rdlock/wrlock/unlock wrappers that just do the
former and use them everywhere for improved consistency and
robustness.
This is just cleanup. It doesn't fix any actual issue.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210203182434.93870-1-groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This changed the Meson build script to allow virtiofsd be built even
though the tools build is disabled, thus honoring the --enable-virtiofsd
option.
Fixes: cece116c93 (configure: add option for virtiofsd)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-Id: <20210201211456.1133364-2-wainersm@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Record/replay provides REPLAY_CLOCK_LOCKED macro to access
the clock when vm_clock_seqlock is locked. This macro is
needed because replay internals operate icount. In locked case
replay use icount_get_raw_locked for icount request, which prevents
excess locking which leads to deadlock. But previously only
record code used *_locked function and replay did not.
Therefore sometimes clock access lead to deadlocks.
This patch fixes clock access for replay too and uses *_locked
icount access function.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <161347990483.1313189.8371838968343494161.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Otherwise the call to event_notifier_set() is a nop, which causes
the SLOF firmware on POWER to hang when booting from a virtio-scsi
device:
virtio_scsi_dataplane_start()
virtio_scsi_vring_init()
virtio_bus_set_host_notifier() <- assign == true
event_notifier_init() <- active == 1
event_notifier_set() <- fails right away if !e->initialized
Fixes: e34e47eb28 ("event_notifier: handle initialization failure better")
Cc: mlevitsk@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210216120247.1293569-1-groug@kaod.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The CPUID function 1 has a bit called OSXSAVE which tells user space the
status of the CR4.OSXSAVE bit. Our generic CPUID function injects that bit
based on the status of CR4.
With Hypervisor.framework, we do not synchronize full CPU state often enough
for this function to see the CR4 update before guest user space asks for it.
To be on the save side, let's just always synchronize it when we receive a
CPUID(1) request. That way we can set the bit with real confidence.
Reported-by: Asad Ali <asad@osaro.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210123004129.6364-1-agraf@csgraf.de>
[RB: resolved conflict with another CPUID change]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some guests (ex. Darwin-XNU) can attemp to read this MSR to retrieve and
validate CPU topology comparing it to ACPI MADT content
MSR description from Intel Manual:
35H: MSR_CORE_THREAD_COUNT: Configured State of Enabled Processor Core
Count and Logical Processor Count
Bits 15:0 THREAD_COUNT The number of logical processors that are
currently enabled in the physical package
Bits 31:16 Core_COUNT The number of processor cores that are currently
enabled in the physical package
Bits 63:32 Reserved
Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210113205323.33310-1-yaroshchuk2000@gmail.com>
[RB: reordered MSR definition and dropped u suffix from shift offset]
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hvf i386 has a few struct and cpp definitions that are never
used. Remove them.
Suggested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20210120224444.71840-3-agraf@csgraf.de>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For `-accel hvf` cpu_x86_cpuid() is wrapped with hvf_cpu_x86_cpuid() to
add paravirtualization cpuid leaf 0x40000010
https://lkml.org/lkml/2008/10/1/246
Leaf 0x40000010, Timing Information:
EAX: (Virtual) TSC frequency in kHz.
EBX: (Virtual) Bus (local apic timer) frequency in kHz.
ECX, EDX: RESERVED (Per above, reserved fields are set to zero).
On macOS TSC and APIC Bus frequencies can be readed by sysctl call with
names `machdep.tsc.frequency` and `hw.busfrequency`
This options is required for Darwin-XNU guest to be synchronized with
host
Leaf 0x40000000 not exposes HVF leaving hypervisor signature empty
Signed-off-by: Vladislav Yaroshchuk <yaroshchuk2000@gmail.com>
Message-Id: <20210122150518.3551-1-yaroshchuk2000@gmail.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This prevents illegal instruction on cpus that do not support xgetbv.
Buglink: https://bugs.launchpad.net/qemu/+bug/1758819
Reviewed-by: Cameron Esfahani <dirty@apple.com>
Signed-off-by: Hill Ma <maahiuzeon@gmail.com>
Message-Id: <X/6OJ7qk0W6bHkHQ@Hills-Mac-Pro.local>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>