In this mode, the DSPI controller uses PIO to transfer word by word. In
comparison, in EOQ mode the 4-word deep FIFO is being used, hence the
current logic will need some adaptation for which I do not have the
hardware (Coldfire) to test. It is not clear what is the timing of DMA
transfers and whether timestamping in the driver brings any overall
performance increase compared to regular timestamping done in the core.
Short phc2sys summary after 58 minutes of running on LS1021A-TSN with
interrupts disabled during the critical section:
offset: min -26251 max 16416 mean -21.8672 std dev 863.416
delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
lost servo lock 3 times
Summary of the same phc2sys service running for 120 minutes with
interrupts disabled:
offset: min -378 max 381 mean -0.0083089 std dev 101.495
delay: min 4720 max 5920 mean 5129.38 std dev 154.899
lost servo lock 0 times
The minimum delay (pre to post time) in nanoseconds is the same, but the
maximum delay is quite a bit higher, due to interrupts getting sometimes
executed and interfering with the measurement. Hence set disable_irqs
whenever possible (aka when the driver runs in poll mode - otherwise it
would be a contradiction in terms).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-4-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc). The fact that the SPI
bus is slow is not what the main problem is, but rather the fact that
drivers don't take a constant amount of time in transferring data over
SPI. When there is a high delay in the readout of time, there will be
uncertainty in the value that has been read out of the peripheral.
When that delay is constant, the uncertainty can at least be
approximated with a certain accuracy which is fine more often than not.
Timing jitter occurs all over in the kernel code, and is mainly caused
by having to let go of the CPU for various reasons such as preemption,
servicing interrupts, going to sleep, etc. Another major reason is CPU
dynamic frequency scaling.
It turns out that the problem of retrieving time from a SPI peripheral
with high accuracy can be solved by the use of "PTP system
timestamping" - a mechanism to correlate the time when the device has
snapshotted its internal time counter with the Linux system time at that
same moment. This is sufficient for having a precise time measurement -
it is not necessary for the whole SPI transfer to be transmitted "as
fast as possible", or "as low-jitter as possible". The system has to be
low-jitter for a very short amount of time to be effective.
This patch introduces a PTP system timestamping mechanism in struct
spi_transfer. This is to be used by SPI device drivers when they need to
know the exact time at which the underlying device's time was
snapshotted. More often than not, SPI peripherals have a very exact
timing for when their SPI-to-interconnect bridge issues a transaction
for snapshotting and reading the time register, and that will be
dependent on when the SPI-to-interconnect bridge figures out that this
is what it should do, aka as soon as it sees byte N of the SPI transfer.
Since spi_device drivers are the ones who'd know best how the peripheral
behaves in this regard, expose a mechanism in spi_transfer which allows
them to specify which word (or word range) from the transfer should be
timestamped.
Add a default implementation of the PTP system timestamping in the SPI
core. This is not going to be satisfactory performance-wise, but should
at least increase the likelihood that SPI device drivers will use PTP
system timestamping in the future.
There are 3 entry points from the core towards the SPI controller
drivers:
- transfer_one: The driver is passed individual spi_transfers to
execute. This is the easiest to timestamp.
- transfer_one_message: The core passes the driver an entire spi_message
(a potential batch of spi_transfers). The core puts the same pre and
post timestamp to all transfers within a message. This is not ideal,
but nothing better can be done by default anyway, since the core has
no insight into how the driver batches the transfers.
- transfer: Like transfer_one_message, but for unqueued drivers (i.e.
the driver implements its own queue scheduling).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-3-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This driver doesn't do anything with the match for the device node. The
logic is the same as looking to see if a device node exists or not
because this driver wouldn't probe unless there is a device node match
when the device is created from DT. Just test for the presence of the
device node to simplify and avoid referencing a potentially undefined
match table when CONFIG_OF=n.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: <linux-spi@vger.kernel.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20191004214334.149976-9-swboyd@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
With this patch, the "interrupts" property from the device tree bindings
is ignored, even if present, if the driver runs in TCFQ mode.
Switching to using the DSPI in poll mode has several distinct
benefits:
- With interrupts, the DSPI driver in TCFQ mode raises an IRQ after each
transmitted word. There is more time wasted for the "waitq" event than
for actual I/O. And the DSPI IRQ count can easily get the largest in
/proc/interrupts on Freescale boards with attached SPI devices.
- The SPI I/O time is both lower, and more consistently so. Attached to
some Freescale devices are either PTP switches, or SPI RTCs. For
reading time off of a SPI slave device, it is important that all SPI
transfers take a deterministic time to complete.
- In poll mode there is much less time spent by the CPU in hardirq
context, which helps with the response latency of the system, and at
the same time there is more control over when interrupts must be
disabled (to get a precise timestamp measurement, which will come in a
future patch): win-win.
On the LS1021A-TSN board, where the SPI device is a SJA1105 PTP switch
(with a bits_per_word=8 driver), I created a "benchmark" where I
periodically transferred a 12-byte message once per second, for 120
seconds. I then recorded the time before putting the first byte in the
TX FIFO, and the time after reading the last byte from the RX FIFO. That
is the transfer delay in nanoseconds.
Interrupt mode:
delay: min 125120 max 168320 mean 150286 std dev 17675.3
Poll mode:
delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
Both the mean latency and the standard deviation are more than 50% lower
in poll mode than in interrupt mode, and the 'max' in poll mode is lower
than the 'min' in interrupt mode. This is with an 'ondemand' governor on
an otherwise idle system - therefore running mostly at 600 MHz out of a
max of 1200 MHz.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20191001205216.32115-1-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Different platforms have different Master with different SourceID on
AHB bus. The 0X0E Master ID is used by cluster 3 in case of LS2088A.
So, patch introduce an invalid master id variable to fix invalid
mastered on different platforms.
Signed-off-by: Suresh Gupta <suresh.gupta@nxp.com>
Signed-off-by: Kuldeep Singh <kuldeep.singh@nxp.com>
Link: https://lore.kernel.org/r/1569920356-8953-1-git-send-email-kuldeep.singh@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This change provides the dspi_slave_abort() function, which is a callback
for slave_abort() method of SPI controller generic driver.
As in the SPI slave mode the transmission is driven by master, any
distortion may cause the slave to enter undefined internal state.
To avoid this problem the dspi_slave_abort() terminates all pending and
ongoing DMA transactions (with sync) and clears internal FIFOs.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Link: https://lore.kernel.org/r/20190924110547.14770-3-lukma@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Simplify this function implementation by using a known wrapper function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://lore.kernel.org/r/178bb78e-714f-645f-d819-5732870c4272@web.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Simplify this function implementation by using a known wrapper function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://lore.kernel.org/r/225b76ca-a367-4bef-d8ce-42c7af9242a5@web.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Simplify this function implementation by using a known wrapper function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://lore.kernel.org/r/478e0df1-e800-8cf1-f9b3-d72f8e26aa0b@web.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Simplify this function implementation by using a known wrapper function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://lore.kernel.org/r/230495a7-b754-bc6a-05e0-059a6b6c643d@web.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Make use of a core helper to ensure the desired width is respected
when calling spi-mem operators.
Suggested-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20190919202504.9619-2-miquel.raynal@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
AV32 support has been from the kernel a few release ago, but there was
still some specific macro for this architecture in this driver. Lets
remove it.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Link: https://lore.kernel.org/r/20190919154034.7489-1-gregory.clement@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The Renesas RZ/N1 SPI Controller is based on the Synopsys DW SSI, but has
additional registers for software CS control and DMA. This patch does not
address the changes required for DMA support, it simply adds the compatible
string. The CS registers are not needed as Linux can use gpios for the CS
signals.
Signed-off-by: Gareth Williams <gareth.williams.jx@renesas.com>
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Link: https://lore.kernel.org/r/1568793876-9009-5-git-send-email-gareth.williams.jx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Enable runtime PM so that the clock used to access the registers in the
peripheral is turned on using a clock domain.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Gareth Williams <gareth.williams.jx@renesas.com>
Link: https://lore.kernel.org/r/1568793876-9009-4-git-send-email-gareth.williams.jx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXYAIeQAKCRBiEGxRG/Sl
2/SzAQDEnoNxzV/R5kWFd+2kmFeY3cll0d99KMrWJ8om+kje6QD/cXxZHzFm+T1L
UPF66k76oOODV7cyndjXnTnRXbeCRAM=
=Szby
-----END PGP SIGNATURE-----
Merge tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"In this cycle we've finally managed to contribute the patch set
sorting out LED naming issues. Besides that there are many changes
scattered among various LED class drivers and triggers.
LED naming related improvements:
- add new 'function' and 'color' fwnode properties and deprecate
'label' property which has been frequently abused for conveying
vendor specific names that have been available in sysfs anyway
- introduce a set of standard LED_FUNCTION* definitions
- introduce a set of standard LED_COLOR_ID* definitions
- add a new {devm_}led_classdev_register_ext() API with the
capability of automatic LED name composition basing on the
properties available in the passed fwnode; the function is
backwards compatible in a sense that it uses 'label' data, if
present in the fwnode, for creating LED name
- add tools/leds/get_led_device_info.sh script for retrieving LED
vendor, product and bus names, if applicable; it also performs
basic validation of an LED name
- update following drivers and their DT bindings to use the new LED
registration API:
- leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114,
leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt
Other LED class improvements:
- replace {devm_}led_classdev_register() macros with inlines
- allow to call led_classdev_unregister() unconditionally
- switch to use fwnode instead of be stuck with OF one
LED triggers improvements:
- led-triggers:
- fix dereferencing of null pointer
- fix a memory leak bug
- ledtrig-gpio:
- GPIO 0 is valid
Drop superseeded apu2/3 support from leds-apu since for apu2+ a newer,
more complete driver exists, based on a generic driver for the AMD
SOCs gpio-controller, supporting LEDs as well other devices:
- drop profile field from priv data
- drop iosize field from priv data
- drop enum_apu_led_platform_types
- drop superseeded apu2/3 led support
- add pr_fmt prefix for better log output
- fix error message on probing failure
Other misc fixes and improvements to existing LED class drivers:
- leds-ns2, leds-max77650:
- add of_node_put() before return
- leds-pwm, leds-is31fl32xx:
- use struct_size() helper
- leds-lm3697, leds-lm36274, leds-lm3532:
- switch to use fwnode_property_count_uXX()
- leds-lm3532:
- fix brightness control for i2c mode
- change the define for the fs current register
- fixes for the driver for stability
- add full scale current configuration
- dt: Add property for full scale current.
- avoid potentially unpaired regulator calls
- move static keyword to the front of declarations
- fix optional led-max-microamp prop error handling
- leds-max77650:
- add of_node_put() before return
- add MODULE_ALIAS()
- Switch to fwnode property API
- leds-as3645a:
- fix misuse of strlcpy
- leds-netxbig:
- add of_node_put() in netxbig_leds_get_of_pdata()
- remove legacy board-file support
- leds-is31fl319x:
- simplify getting the adapter of a client
- leds-ti-lmu-common:
- fix coccinelle issue
- move static keyword to the front of declaration
- leds-syscon:
- use resource managed variant of device register
- leds-ktd2692:
- fix a typo in the name of a constant
- leds-lp5562:
- allow firmware files up to the maximum length
- leds-an30259a:
- fix typo
- leds-pca953x:
- include the right header"
* tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits)
leds: lm3532: Fix optional led-max-microamp prop error handling
led: triggers: Fix dereferencing of null pointer
leds: ti-lmu-common: Move static keyword to the front of declaration
leds: lm3532: Move static keyword to the front of declarations
leds: trigger: gpio: GPIO 0 is valid
leds: pwm: Use struct_size() helper
leds: is31fl32xx: Use struct_size() helper
leds: ti-lmu-common: Fix coccinelle issue in TI LMU
leds: lm3532: Avoid potentially unpaired regulator calls
leds: syscon: Use resource managed variant of device register
leds: Replace {devm_}led_classdev_register() macros with inlines
leds: Allow to call led_classdev_unregister() unconditionally
leds: lm3532: Add full scale current configuration
dt: lm3532: Add property for full scale current.
leds: lm3532: Fixes for the driver for stability
leds: lm3532: Change the define for the fs current register
leds: lm3532: Fix brightness control for i2c mode
leds: Switch to use fwnode instead of be stuck with OF one
leds: max77650: Switch to fwnode property API
led: triggers: Fix a memory leak bug
...
RST conversion is happily mostly behind us.
- A new document on reproducible builds.
- We finally got around to zapping the documentation for hardware support
that was removed in 2004; one doesn't want to rush these things.
- The usual assortment of fixes, typo corrections, etc.
You'll still find a handful of annoying conflicts against other trees,
mostly tied to the last RST conversions; resolutions are straightforward
and the linux-next ones are good.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl1/J4IACgkQF0NaE2wM
flhYogf9EgYozCe8RocSq+JjJpZOSFjIGDQv+GwTjOBIdqgO9tSIaY/p0wSkYKil
jYXyMDF+Xwr8podsUep2F7akBM7j9XJ+XBGJcfOna0ypC9xoejMgWt9fU3YvaWge
dQJxIQ/iwkDlKNx6uOYgKysLUWFS0EP/nzPhqBo4bZZzhugvrR46D/nQqFNmGihd
l9yLalJtP5mC0XRUv3hpdAFFFKxdC0R3BGOel2V+slSClp0LEgpdMAuMaKydEDI3
Ch9ZpIp8fB8kqONCs9/X6083WRsDOMe28KgeGrGHo4Jla6u51QBLQjSVKttFv7xk
051yNJwDWMxgl+A4gyNLDPXM7Gd7HQ==
=v4dp
-----END PGP SIGNATURE-----
Merge tag 'docs-5.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's a somewhat calmer cycle for docs this time, as the churn of the
mass RST conversion is happily mostly behind us.
- A new document on reproducible builds.
- We finally got around to zapping the documentation for hardware
support that was removed in 2004; one doesn't want to rush these
things.
- The usual assortment of fixes, typo corrections, etc"
* tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
Documentation: kbuild: Add document about reproducible builds
docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
Documentation: Add "earlycon=sbi" to the admin guide
doc🔒 remove reference to clever use of read-write lock
devices.txt: improve entry for comedi (char major 98)
docs: mtd: Update spi nor reference driver
doc: arm64: fix grammar dtb placed in no attributes region
Documentation: sysrq: don't recommend 'S' 'U' before 'B'
mailmap: Update email address for Quentin Perret
docs: ftrace: clarify when tracing is disabled by the trace file
docs: process: fix broken link
Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
Documentation/arm/sa1100: Remove some obsolete documentation
docs/zh_CN: update Chinese howto.rst for latexdocs making
Documentation: virt: Fix broken reference to virt tree's index
docs: Fix typo on pull requests guide
kernel-doc: Allow anonymous enum
Documentation: sphinx: Don't parse socket() as identifier reference
Documentation: sphinx: Add missing comma to list of strings
...
The branch contains driver changes that are tightly
connected to SoC specific code. Aside from smaller
cleanups and bug fixes, here is a list of the notable
changes.
New device drivers:
- The Turris Mox router has a new "moxtet" bus driver
for its on-board pluggable extension bus. The
same platform also gains a firmware driver.
- The Samsung Exynos family gains a new Chipid driver
exporting using the soc device sysfs interface
- A similar socinfo driver for Qualcomm Snapdragon
chips.
- A firmware driver for the NXP i.MX DSP IPC protocol
using shared memory and a mailbox
Other changes:
- The i.MX reset controller driver now supports the
NXP i.MX8MM chip
- Amlogic SoC specific drivers gain support for
the S905X3 and A311D chips
- A rework of the TI Davinci framebuffer driver to
allow important cleanups in the platform code
- A couple of device drivers for removed ARM SoC
platforms are removed. Most of the removals were
picked up by other maintainers, this contains
whatever was left.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdf6SUAAoJEJpsee/mABjZAfwP/01bXBOlGVusNH2zuh8IUSHb
//5sTdWpwa2ugRekLOJUOjo2p9Fu70yH6xr4RUHI0rcRjZA0xR3bZPx45gI8LRHQ
tfb25LaKqfgZjWMCJ8due1Lh7B6ffOQukryMtM/LoiCtqsy7b6aThEKaLpM9/Owl
t53o4wKaVQJK5He9JQom9NOZidkl7tYLHmDQTOXhX2UEA/i45vtfjdsEBvoFPbTx
+bYvlqs+SWlpDJk29j+oBOeKadPF+TFboLDiUCxH44MC3OsH51zjtKVBRTtbNMkb
ek/ci5x9hCeHcYSEigNq2EMzEln09Yxyvjk8U/jLiJ1h1kz3p5MjqJbVMF1rYXpe
ALuAwinM8Zv2o5/UOCkiQTWq79PtpOKHZKpNBXkaJ8kyqBLMSy8Fs3hCvXrDnjnQ
TC8jX7UBqHRV2rbQIYehAQAxTvcRgTbqusQGLkUJInlux6go57LoMYHPABpHftJV
kRdVeT0KzdCz1pvQwyekIog5hPLNTBi4jw6eQcOgeENvAea1MJa8lMMfKcVbIdS0
ZVvxLl+K6noEKAv5lSeHAzjXq+cQFr3zDCsWy351mJETDHmE8zjsaHN1SgbRYLEk
ZqzNwUYaPYBis38g85qaY/TSsJrWJ+jP8u7s9HTw3Oywg8SRy5vtW177s00/9VOd
PYZ2UpqUeX8cdvggqUUU
=lxFi
-----END PGP SIGNATURE-----
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"This contains driver changes that are tightly connected to SoC
specific code. Aside from smaller cleanups and bug fixes, here is a
list of the notable changes.
New device drivers:
- The Turris Mox router has a new "moxtet" bus driver for its
on-board pluggable extension bus. The same platform also gains a
firmware driver.
- The Samsung Exynos family gains a new Chipid driver exporting using
the soc device sysfs interface
- A similar socinfo driver for Qualcomm Snapdragon chips.
- A firmware driver for the NXP i.MX DSP IPC protocol using shared
memory and a mailbox
Other changes:
- The i.MX reset controller driver now supports the NXP i.MX8MM chip
- Amlogic SoC specific drivers gain support for the S905X3 and A311D
chips
- A rework of the TI Davinci framebuffer driver to allow important
cleanups in the platform code
- A couple of device drivers for removed ARM SoC platforms are
removed. Most of the removals were picked up by other maintainers,
this contains whatever was left"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (123 commits)
bus: uniphier-system-bus: use devm_platform_ioremap_resource()
soc: ti: ti_sci_pm_domains: Add support for exclusive and shared access
dt-bindings: ti_sci_pm_domains: Add support for exclusive and shared access
firmware: ti_sci: Allow for device shared and exclusive requests
bus: imx-weim: remove incorrect __init annotations
fbdev: remove w90x900/nuc900 platform drivers
spi: remove w90x900 driver
net: remove w90p910-ether driver
net: remove ks8695 driver
firmware: turris-mox-rwtm: Add sysfs documentation
firmware: Add Turris Mox rWTM firmware driver
dt-bindings: firmware: Document cznic,turris-mox-rwtm binding
bus: moxtet: fix unsigned comparison to less than zero
bus: moxtet: remove set but not used variable 'dummy'
ARM: scoop: Use the right include
dt-bindings: power: add Amlogic Everything-Else power domains bindings
soc: amlogic: Add support for Everything-Else power domains controller
fbdev: da8xx: use resource management for dma
fbdev: da8xx-fb: drop a redundant if
fbdev: da8xx-fb: use devm_platform_ioremap_resource()
...
The BCM2835 SPI driver currently sets the SPI_CONTROLLER_MUST_TX flag.
When performing an RX-only transfer, this flag causes the SPI core to
allocate and DMA-map a dummy buffer which is copied to the TX FIFO.
The dummy buffer is necessary because the chip is not capable of
automatically clocking out null bytes.
Avoid the overhead induced by the dummy buffer by preallocating a
reusable DMA transaction which fills the TX FIFO by cyclically copying
from the zero page. The transaction requires very little CPU time to
submit and generates no interrupts while running. Specifics are
provided in kerneldoc comments.
[Nathan Chancellor contributed a DMA mapping fixup for an early version
of this commit, hence his Signed-off-by.]
Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Link: https://lore.kernel.org/r/f45920af18dbf06e34129bbc406f53dc9c5d1075.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The BCM2835 SPI driver currently sets the SPI_CONTROLLER_MUST_RX flag.
When performing a TX-only transfer, this flag causes the SPI core to
allocate and DMA-map a dummy buffer into which the RX FIFO contents are
copied. The dummy buffer is necessary because the chip is not capable
of disabling the receiver or automatically throwing away received data.
Not reading the RX FIFO isn't an option either since transmission is
halted once it's full.
Avoid the overhead induced by the dummy buffer by preallocating a
reusable DMA transaction which cyclically clears the RX FIFO. The
transaction requires very little CPU time to submit and generates no
interrupts while running. Specifics are provided in kerneldoc comments.
With a ks8851 Ethernet chip attached to the SPI controller, I am seeing
a 30 us reduction in ping time with this commit (1.819 ms vs. 1.849 ms,
average of 100,000 packets) as well as a 2% reduction in CPU time
(75:08 vs. 76:39 for transmission of 5 GByte over the SPI bus).
The commit uses the TX DMA interrupt to signal completion of a transfer.
This interrupt is raised once all bytes have been written to the
TX FIFO and it is then necessary to busy-wait for the TX FIFO to become
empty before the transfer can be finalized. As an alternative approach,
I have explored using the SPI controller's DONE interrupt to detect
completion. This interrupt is signaled when the TX FIFO becomes empty,
avoiding the need to busy-wait. However latency deteriorates compared
to the present commit and surprisingly, CPU time is slightly higher as
well:
It turns out that in 45% of the cases, no busy-waiting is needed at all
and in 76% of the cases, less than 10 busy-wait iterations are
sufficient for the TX FIFO to drain. This was measured on an RT kernel.
On a vanilla kernel, wakeup latency is worse and thus fewer iterations
are needed. The measurements were made with an SPI clock of 20 MHz,
they may differ slightly for slower or faster clock speeds.
Previously we always used the RX DMA interrupt to signal completion of a
transfer. Using the TX DMA interrupt now introduces a race condition:
TX DMA is always started before RX DMA so that bytes are already clocked
out while RX DMA is still being set up. But if a TX-only transfer is
very short, then the TX DMA interrupt may occur before RX DMA is set up.
If the interrupt happens to occur on the same CPU, setup of RX DMA may
even be delayed until after the interrupt was handled.
I've solved this by having the TX DMA callback clear the RX FIFO while
busy-waiting for the TX FIFO to drain, thus avoiding a dependency on
setup of RX DMA. Additionally, I am using a lock-free mechanism with
two flags, tx_dma_active and rx_dma_active plus memory barriers to
terminate RX DMA either by the TX DMA callback or immediately after
setting it up, whichever wins the race. I've explored an alternative
approach which temporarily disables the TX DMA callback until RX DMA
has been set up (using tasklet_disable(), local_bh_disable() or
local_irq_save()), but the performance was minimally worse.
[Nathan Chancellor contributed a DMA mapping fixup for an early version
of this commit, hence his Signed-off-by.]
Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Link: https://lore.kernel.org/r/874949385f28251e2dcaa9494e39a27b50e9f9e4.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The BCM2835 SPI driver needs to set up the clock polarity in its
->prepare_message() hook before spi_transfer_one_message() asserts chip
select to avoid a gratuitous clock signal edge (cf. commit acace73df2
("spi: bcm2835: set up spi-mode before asserting cs-gpio")).
Precalculate the CS register value (which selects the clock polarity)
once in ->setup() and use that cached value in ->prepare_message() and
->transfer_one(). This avoids one MMIO read per message and one per
transfer, yielding a small latency improvement. Additionally, a
forthcoming commit will use the precalculated value to derive the
register value for clearing the RX FIFO, which will eliminate the need
for an RX dummy buffer when performing TX-only DMA transfers.
Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/d17c1d7fcdc97fffa961b8737cfd80eeb14f9416.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
__spi_alloc_controller() uses a single allocation to accommodate struct
spi_controller and the driver-private data, but places the latter behind
the former. This order does not guarantee cacheline alignment of the
driver-private data. (It does guarantee cacheline alignment of struct
spi_controller but the structure doesn't make any use of that property.)
Round up struct spi_controller to cacheline size. A forthcoming commit
leverages this to grant DMA access to driver-private data of the BCM2835
SPI master.
An alternative, less economical approach would be to use two allocations.
A third approach consists of reversing the order to conserve memory.
But Mark Brown is concerned that it may result in a performance penalty
on architectures that don't like unaligned accesses.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/01625b9b26b93417fb09d2c15ad02dfe9cdbbbe5.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The BCM2835 SPI driver uses a flag to keep track of whether a DMA
transfer is in progress.
The flag is used to avoid terminating DMA channels multiple times if a
transfer finishes orderly while simultaneously the SPI core invokes the
->handle_err() callback because the transfer took too long. However
terminating DMA channels multiple times is perfectly fine, so the flag
is unnecessary for this particular purpose.
The flag is also used to avoid invoking bcm2835_spi_undo_prologue()
multiple times under this race condition. However multiple *concurrent*
invocations can no longer happen since commit 2527704d84 ("spi:
bcm2835: Synchronize with callback on DMA termination") because the
->handle_err() callback now uses the _sync() variant when terminating
DMA channels.
The only raison d'être of the flag is therefore that
bcm2835_spi_undo_prologue() cannot cope with multiple *sequential*
invocations. Achieve that by setting tx_prologue to 0 at the end of
the function. Subsequent invocations thus become no-ops.
With that, the dma_pending flag becomes unnecessary, so drop it.
Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/062b03b7f86af77a13ce0ec3b22e0bdbfcfba10d.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 3bd7f6589f ("spi: bcm2835: Overcome sglist entry length
limitation") amended the BCM2835 SPI driver with support for DMA
transfers whose buffers are not aligned to 4 bytes and require more than
one sglist entry.
When testing this feature with upcoming commits to speed up TX-only and
RX-only transfers, I noticed that SPI transmission sometimes breaks.
A function introduced by the commit, bcm2835_spi_transfer_prologue(),
performs one or two PIO transmissions as a prologue to the actual DMA
transmission. It turns out that the breakage goes away if the DONE bit
in the CS register is set when ending such a PIO transmission.
The DONE bit signifies emptiness of the TX FIFO. According to the spec,
the bit is of type RO, so writing it should never have any effect.
Perhaps the spec is wrong and the bit is actually of type RW1C.
E.g. the I2C controller on the BCM2835 does have an RW1C DONE bit which
needs to be cleared by the driver. Another, possibly more likely
explanation is that it's a hardware erratum since the issue does not
occur consistently.
Either way, amend bcm2835_spi_transfer_prologue() to always write the
DONE bit.
Usually a transmission is ended by bcm2835_spi_reset_hw(). If the
transmission was successful, the TX FIFO is empty and thus the DONE bit
is set when bcm2835_spi_reset_hw() reads the CS register. The bit is
then written back to the register, so we happen to do the right thing.
However if DONE is not set, e.g. because transmission is aborted with
a non-empty TX FIFO, the bit won't be written by bcm2835_spi_reset_hw()
and it seems possible that transmission might subsequently break. To be
on the safe side, likewise amend bcm2835_spi_reset_hw() to always write
the bit.
Tested-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/edb004dff4af6106f6bfcb89e1a96391e96eb857.1564825752.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Simplify this function implementation by using a known function.
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/b2dd074a-1693-3aea-42b4-da1f5ec155c4@web.de
Signed-off-by: Mark Brown <broonie@kernel.org>
This helps a bit with line fitting now (the list_first_entry call) as
well as during the next patch which needs to iterate through all
transfers of ctlr->cur_msg so it timestamps them.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-2-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
drivers/spi/spi-npcm-fiu.c: In function npcm_fiu_read:
drivers/spi/spi-npcm-fiu.c:472:9: warning:
variable retlen set but not used [-Wunused-but-set-variable]
It is never used, so remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190905072436.23932-1-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-37-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-36-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-35-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-34-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-33-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-32-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-31-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-30-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-29-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-28-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-27-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-26-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-25-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-24-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-23-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-22-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-21-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190904135918.25352-20-yuehaibing@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>