2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-24 13:13:57 +08:00
Commit Graph

1015401 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
f274e29626 interconnect changes for 5.14
Here are changes for the 5.14-rc1 merge window consisting of interconnect
 driver updates.
 
 Driver changes:
 - New driver for SC7280 platforms.
 
 Signed-off-by: Georgi Djakov <djakov@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJg0c/OAAoJEIDQzArG2BZjNckP/jN5l3aYw/mbmffC1CUGa+nb
 OOzHoUdX8m9J5zC7ecjBBxsBnRPLY84CjzziM+RVab15N+QmQEW5tp4KiUfgJqIs
 bfmun8g+FxbSS71rCuykTH1/dbBGVixDFXVkquWNiBQwE+RRMGF0dOYs4BeXGf8B
 sqORnNrHSGjuXzg5CCVTJ6d45O46lFZHx2/4bxR1FbcrQqrhRiRkrlM6FlI/coyR
 +9cfZOd8+lDSC6j/S9XkDT75fHHvdpXNH/7wlZnyOlOHpE2CoEduu0OySrAEO9J9
 l1iEK9apgm+6hoC1xngbR84mYVLjlSaeih/fQJ5/0yrgbxWA6zyunEnUOuPDPOhZ
 2Ghgxwf8+TwhRI1PGO6rLxpQ7xldz36U6pJOPU91cJJjl71Ix8KtfbfpY5Phbj6R
 zM/TGLBuNEHN8+wSIyh766mnv6gLzfYbexyjIc3PoZgx8EytZOBPVcEfWlJxnDni
 k/EPPa/aUiytGCVl0WwDrMhJlITMpGaZfSxuTLyAblECcRxxk8gvMsFcEFk8HW8y
 WnonClC39kPK7+9CVzkPq1CMvetdbmCWsIiwvZohah5MDpg+ao8ZGe6uk8yZnOSQ
 3GME8Gv0OhUI1u9T9VNuRC4xrGmokKYcgOPsqgVKJMkmoyeh6oA1jE0w9IxhJkbZ
 AJgbpRg0hMvMFK+C+Wy8
 =ihE7
 -----END PGP SIGNATURE-----

Merge tag 'icc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next

Georgi writes:

interconnect changes for 5.14

Here are changes for the 5.14-rc1 merge window consisting of interconnect
driver updates.

Driver changes:
- New driver for SC7280 platforms.

Signed-off-by: Georgi Djakov <djakov@kernel.org>

* tag 'icc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
  interconnect: qcom: Add SC7280 interconnect provider driver
  dt-bindings: interconnect: Add Qualcomm SC7280 DT bindings
2021-06-22 22:03:25 +02:00
Tomas Winkler
4029238364 mei: revamp mei extension header structure layout.
The mei extension header was build as array of flexible structures
which will not work if actually more headers are added.
(Currently only vtag header was used).
Sparse reports:

drivers/misc/mei/hw.h:253:32: warning: array of flexible structures

Use basic type u8 for the variable sized extension.
Define explicitly mei_ext_hdr_vtag structure.
And also fix mei_ext_next() function to point correctly to the
end of the header.

Note: the headers are part of firmware interface and need to be __packed.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210621193756.134027-2-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-22 12:40:31 +02:00
Tamar Mashiah
09f8c33a4c mei: fix kdoc in the driver
Over time the functions were renamed,
but this was not always reflected in kdoc, fix that.

Signed-off-by: Tamar Mashiah <tamar.mashiah@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210621193756.134027-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-22 12:40:31 +02:00
Greg Kroah-Hartman
8254ee0e0a This tag contains habanalabs driver changes for v5.14:
- Change communication protocol with f/w. The new protocl allows better
   backward compatibility between different f/w versions and is more
   stable.
 - Send hard-reset cause to f/w after a hard-reset has happened.
 - Move to indirection when generating interrupts to f/w.
 - Better progress and error messages during the f/w load stage.
 - Recognize that f/w is with enabled security according to device ID.
 - Add validity check to event queue mechanism.
 - Add new event from f/w that will indicate a daemon has been terminated
   inside the f/w.
 
 - Move to TLB cache range invalidation in the device's MMU.
 - Disable memory scrubbing by default for performance.
 
 - Many fixes for sparse/smatch reported errors.
 - Enable by default stop-on-err in the ASIC.
 - Move to ASYNC device probing to speedup loading of driver in server
   with multiple devices.
 - Fix to stop using disabled NIC ports when doing collective operation.
 - Use standard error codes instead of positive values.
 - Add support for resetting device after user has finished using it.
 - Add debugfs option to avoid reset when a CS has got stuck.
 - Add print of the last 8 CS pointers in case of error in QMANs.
 - Add statistics on opening of the FD of a device.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE7TEboABC71LctBLFZR1NuKta54AFAmDRrdoTHG9nYWJiYXlA
 a2VybmVsLm9yZwAKCRBlHU24q1rngHvyCACheRh5ExpPOvFZkPT6l4pGekx1vJwy
 tJsYmILk5mWprRczeSskilxQMFZOGTQzQqN01c/Bl/94eyCcdmBoLNCFcrtDYcjh
 pBZft4tUWMKToPX3j5gpMospXg+CBsEIsltxKrrlQ1ZxgY5JWmcg1NZOTU32yMvC
 /9rpTxdpfda6870hY0kfoXjRfCAReENQCQkCNWi/DONmtneOmpDgJC7AQgW8gQcm
 pQBFwjvF3aweO5/R9pvJa3QhuwY5nWQDsLKGJvcPNThpEYJ230Yh6N33KQQUNsaz
 4Y5pUl5MS8Z6qz2Yd79bnRolWTSDP2QQhHRUnx7vh2rRsJKzr1QGP6Ck
 =MixT
 -----END PGP SIGNATURE-----

Merge tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.14:

- Change communication protocol with f/w. The new protocl allows better
  backward compatibility between different f/w versions and is more
  stable.
- Send hard-reset cause to f/w after a hard-reset has happened.
- Move to indirection when generating interrupts to f/w.
- Better progress and error messages during the f/w load stage.
- Recognize that f/w is with enabled security according to device ID.
- Add validity check to event queue mechanism.
- Add new event from f/w that will indicate a daemon has been terminated
  inside the f/w.

- Move to TLB cache range invalidation in the device's MMU.
- Disable memory scrubbing by default for performance.

- Many fixes for sparse/smatch reported errors.
- Enable by default stop-on-err in the ASIC.
- Move to ASYNC device probing to speedup loading of driver in server
  with multiple devices.
- Fix to stop using disabled NIC ports when doing collective operation.
- Use standard error codes instead of positive values.
- Add support for resetting device after user has finished using it.
- Add debugfs option to avoid reset when a CS has got stuck.
- Add print of the last 8 CS pointers in case of error in QMANs.
- Add statistics on opening of the FD of a device.

* tag 'misc-habanalabs-next-2021-06-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (72 commits)
  habanalabs/gaudi: refactor hard-reset related code
  habanalabs/gaudi: add support for NIC DERR
  habanalabs: add validity check for signal cs
  habanalabs: get lower/upper 32 bits via masking
  habanalabs: allow reset upon device release
  debugfs: add skip_reset_on_timeout option
  habanalabs: fix typo
  habanalabs/gaudi: correct driver events numbering
  habanalabs: remove a rogue #ifdef
  habanalabs/gaudi: print last QM PQEs on error
  habanalabs/goya: add '__force' attribute to suppress false alarm
  habanalabs: added open_stats info ioctl
  habanalabs/gaudi: set the correct rc in case of err
  habanalabs/gaudi: update coresight configuration
  habanalabs: remove node from list before freeing the node
  habanalabs: set rc as 'valid' in case of intentional func exit
  habanalabs: zero complex structures using memset
  habanalabs: print more info when failing to pin user memory
  habanalabs: Fix an error handling path in 'hl_pci_probe()'
  habanalabs: print firmware versions
  ...
2021-06-22 12:37:19 +02:00
Greg Kroah-Hartman
1730a594ac soundwire updates for 5.14-rc1
Updates for v5.14-rc1 are:
 
 - Core has odd updates including improving clock stop codes, write api,
   handling ENODATA etc
 
  - Drivers has Big move of Intel driver to be aux dev and minor updates
    to Intel/cadence driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmDRcwkACgkQfBQHDyUj
 g0eDQg//RfLrqVuEiLRk0D3shOvxJszEtZkBsEkgXOcB6FlQyh/WG63GE8bwdW1W
 hanPmKrJqwH3zqPwKZlocVnbJ6GaAg8lY31rOiMfoj71cRBG5odGA4DZDx1KYe0l
 JdJxFmgaOgny1hZgeWXdjoODYXg8+a6R/Cl0HoZexA7lk2bK7tehfBTftuejDIzx
 W5NmvjJ4MueHfANE3dHfY+4PGMOtEuuSg50VFaE+xfoMeBXzikvWshtDsuC9R7ym
 dLYXfnKlWp55jhk1e0KWTKDyfzsWptQSOThYiHVgU5WkaBI8lHLKctscIxfianT0
 9RIcGCwuQDz2ARn0KLbsIEVyJdNsGmUAGg/tnLvsgHsg+AyvATuBVS5Q3t0QFdal
 NyDaGMhrGa1U+thx9n99zurNrjU+rOUS8BxjHKioCpJ/+/DQ12FEO8wF5T8LwLlB
 0zQyXbFB0KFYRyWvfWLFC1Hua8NlBTgrfwPhtjPyCRpUtIx/rrW1hwiNFVTw5SC/
 qaGnOFXlZ3vSManHCFZHk9ZC5SAahBtfLb7gorVyJT5Sl2amwV+o0NWcoZg+cgoj
 EElZThYhKY5FjEhvhtfXE4/vmQs1KVaNQ5FYIq2mCPzTI9MEuc8uzYkomXTmHoAY
 LGD3MwbLRQcwaXEKwSCvg07bPDM891GiHOUEjP6RBMsBBvVb6Mg=
 =JGCd
 -----END PGP SIGNATURE-----

Merge tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next

Vinod writes:

soundwire updates for 5.14-rc1

Updates for v5.14-rc1 are:

- Core has odd updates including improving clock stop codes, write api,
  handling ENODATA etc

 - Drivers has Big move of Intel driver to be aux dev and minor updates
   to Intel/cadence driver

* tag 'soundwire-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: stream: Fix test for DP prepare complete
  soundwire: bus: Make sdw_nwrite() data pointer argument const
  soundwire: intel: move to auxiliary bus
  soundwire: cadence: remove the repeated declaration
  soundwire: dmi-quirks: remove duplicate initialization
  soundwire: cadence_master: always set CMD_ACCEPT
  soundwire: bus: add missing \n in dynamic debug
  soundwire: bus: handle -ENODATA errors in clock stop/start sequences
  soundwire: add missing kernel-doc description
  soundwire: bus: only use CLOCK_STOP_MODE0 and fix confusions
  soundwire: bandwidth allocation: improve error messages
  soundwire/ASoC: add leading zeroes in peripheral device name
2021-06-22 12:36:29 +02:00
Koby Elbaz
b7a71fddc0 habanalabs/gaudi: refactor hard-reset related code
There is code related to hard-reset, which is done in gaudi specific
code. However, this code can be used by future ASICs and therefore it
is better to move it to the common code section.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:21:51 +03:00
Ofir Bitton
6c31f494d8 habanalabs/gaudi: add support for NIC DERR
We add support for NIC DERR ECC error events, in case this error
is received a device reset will be performed.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:21:28 +03:00
farah kassabri
3817b352aa habanalabs: add validity check for signal cs
In preparation for a new feature that allows the user to reserve
signals ahead of submissions, we need to change a current assumption
in the code.

Currently, the driver uses 2 SOBs to support signal CS. When the first
SOB reaches max value, the driver switches to the other one and assumes
that when it will need to switch back to the first one, all of the
signals have already been handled.

This assumption won't hold when the new feature will be added, because
using signal reservation, the driver can reach the max SOB value very
fast.

The change is to add a validity check when submitting a signal CS, to
make sure the previous SOB is available (all the signals attached to
it indeed finished).

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:16:52 +03:00
Koby Elbaz
69dbbbadad habanalabs: get lower/upper 32 bits via masking
fix multiple similar occurrences of the following sparse warning:
'warning: cast truncates bits from constant value
(7ffc113000 becomes fc113000)'

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:16:29 +03:00
Ofir Bitton
23bace677a habanalabs: allow reset upon device release
We introduce a new type of reset which is reset upon device release.
This reset is very similar to soft reset except the fact it is
performed only upon device release and not upon user sysfs request
nor TDR.

The purpose of this reset is to make sure the device is returned to
IDLE state after the current user has finished working with the device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:14:34 +03:00
Richard Fitzgerald
3d3e88e336 soundwire: stream: Fix test for DP prepare complete
In sdw_prep_deprep_slave_ports(), after the wait_for_completion()
the DP prepare status register is read. If this indicates that the
port is now prepared, the code should continue with the port setup.
It is irrelevant whether the wait_for_completion() timed out if the
port is now ready.

The previous implementation would always fail if the
wait_for_completion() timed out, even if the port was reporting
successful prepare.

This patch also fixes a minor bug where the return from sdw_read()
was not checked for error - any error code with LSBits clear could
be misinterpreted as a successful port prepare.

Fixes: 79df15b7d3 ("soundwire: Add helpers for ports operations")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210618144745.30629-1-rf@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-06-20 16:46:18 +05:30
Richard Fitzgerald
031e668bc1 soundwire: bus: Make sdw_nwrite() data pointer argument const
Idiomatically, write functions should take const pointers to the
data buffer, as they don't change the data. They are also likely
to be called from functions that receive a const data pointer.

Internally the pointer is passed to function/structs shared with
the read functions, requiring a cast, but this is an implementation
detail that should be hidden by the public API.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210616145901.29402-1-rf@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-06-20 16:46:14 +05:30
Yuri Nudelman
4d041216c8 debugfs: add skip_reset_on_timeout option
To be able to debug long-running CS better, without changing the
userspace code, we are adding a new option through debugfs interface
to skip the reset of the device in case of CS timeout.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:43 +03:00
Zvika Yehudai
38e19d0b87 habanalabs: fix typo
fix a spelling error in comment

Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Ofir Bitton
7d5ba005cf habanalabs/gaudi: correct driver events numbering
Currently driver sends fc interrupt id to FW instead of using
cpu interrupt id. We intend to fix that and keep backward
compatibility by using the same interrupt values.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Oded Gabbay
5bdc657320 habanalabs: remove a rogue #ifdef
There was a rogue #ifdef that crept into the upstream code for
backwards compatibility which isn't needed of course.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Ohad Sharabi
2718e1d322 habanalabs/gaudi: print last QM PQEs on error
In case QMAN has an error and stop_on_err is true, print specific
information of the "offending" command buffer batch.

If the error occurred on one of the higher CPs, the CQ pointer and size
will be printed along with (up to) last 8 PQEs of the stream.

If the error occurred in the lower CP, the CQ pointer and size will be
printed along with (up to) last 8 PQEs of ALL upper CPs as we have no
way to know which upper CP sent the job there.

This is done so higher SW levels will be able to debug their CS by
extracting the raw data of the offending command buffer batch and
examine those offline to detect the issue.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Koby Elbaz
f18cb6b58e habanalabs/goya: add '__force' attribute to suppress false alarm
fix (suppress) the following sparse warnings:
'warning: cast removes address space of expression'

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Yuri Nudelman
e307b302be habanalabs: added open_stats info ioctl
In a system with multiple ASICs, there is a need to provide monitoring
tools with information on how long a device was opened and how many
times a device was opened.

Therefore, we add a new opcode to the INFO ioctl to provide that
information.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Koby Elbaz
1f7ef4bf41 habanalabs/gaudi: set the correct rc in case of err
fix the following smatch warnings:
gaudi_internal_cb_pool_init() warn: missing error code 'rc'

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Tal Albo
ba662265fe habanalabs/gaudi: update coresight configuration
Update STMTCSR and STMSYNCR values in order to reduce amount of sync
packets

Signed-off-by: Tal Albo <talbo@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Koby Elbaz
f5eb7bf0c4 habanalabs: remove node from list before freeing the node
fix the following smatch warnings:

goya_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list

gaudi_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Koby Elbaz
11d5cb8b95 habanalabs: set rc as 'valid' in case of intentional func exit
fix the following smatch warnings:
hl_fw_static_init_cpu() warn: missing error code 'rc'

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Koby Elbaz
b538888c3e habanalabs: zero complex structures using memset
fix the following sparse warnings:
'warning: Using plain integer as NULL pointer'
'warning: missing braces around initializer'

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Tomer Tayar
f5d6e39eb2 habanalabs: print more info when failing to pin user memory
pin_user_pages_fast() might fail and return a negative number, or pin
less pages than requested and return the number of the pages that were
pinned.
For the latter, it is informative to print also the memory size and the
number of requested pages.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Christophe JAILLET
3002f467a0 habanalabs: Fix an error handling path in 'hl_pci_probe()'
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.

Fixes: 2e5eda4681 ("habanalabs: PCIe Advanced Error Reporting support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Oded Gabbay
c9d2f5cf27 habanalabs: print firmware versions
Firmware in habanalabs devices is composed of several components.
During device initialization, we read these versions from the device.
Print them during device initialization to allow better visibility in
automated systems.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Omer Shpigelman
4efb6b2b46 habanalabs: add hard reset timeout for PLDM
Hard reset flow on PLDM might take more than 2 minutes.
Hence add a dedicated hard reset timeout of 6 minutes for PLDM.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Bharat Jauhari
4b09901cf7 habanalabs: enable dram scramble before linux f/w
In current code, for dynamic f/w loading flow, DRAM scrambling is
enabled post Linux fit image is loaded to the card. This can cause the
device CPU to go into reset state.

The correct sequence should be:
1. Load boot fit image
2. Enable scrambling
3. Load Linux fit image

This commit aligns the DRAM scrambling enabling with the static f/w load
flow.

Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
358526be82 habanalabs: enable stop on error for all QMANs and engines
If there is an error in the QMAN/engine, there is no point of trying
to continue running the workload. It is better to stop to allow the
user to debug the program.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ohad Sharabi
e1222c2794 habanalabs: report EQ fault during heartbeat
In case we have EQ fault we would like to know about it.
For this, a status bitmask was added in which EQ_FAULT bit is
set by FW in case of EQ fault.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Koby Elbaz
12d133deb3 habanalabs: small code refactoring
Use datatype defines instead of hard coded values,
and rename set_fixed_properties function.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Oded Gabbay
f1a29770b2 habanalabs/gaudi: use standard error codes
When there is an ECC error in the HBM, return a standard error code,
-EIO in this case, and not a positive value.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ohad Sharabi
0f37510ca3 habanalabs: fix mask to obtain page offset
When converting virtual address to physical we need to add correct
offset to the physical page.

For this we need to use mask that include ALL bits of page offset.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ohad Sharabi
6a785e368a habanalabs: skip valid test for boot_dev_sts regs
Get rid of the need to check if boot_dev_sts is valid on every access
to value read from these registers.

This is done by storing the register value in hdev props ONLY if
register is enabled.

This way if register is NOT enabled all capability bits will not be set.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
84586de496 habanalabs: reset device upon FD close if not idle
If device is not idle after user closes the FD we must reset device
as next user that will try to open FD will encounter a non-functional
device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Yuri Nudelman
8e8125f192 habanalabs: add debug flag to prevent failure on timeout
Sometimes it is useful to allow the command to continue running despite
the timeout occurred, to differentiate between really stuck or just very
time consuming commands. This can be achieved by passing a new debug
flag alongside the cs, HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT.

Anyway, if the timeout occurred, a warning print shall be issued,
however this shall not fail the submission.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
254fac6d1a habanalabs/gaudi: add FW alive event support
In order for driver to be aware of process or thread crashes inside
GAUDI's CPU, we introduce a new event which contains all relevant
information. Upon event reception, driver will dump information and
will reset the device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
a39725819c habanalabs/gaudi: don't use disabled ports in collective wait
In the collective wait, we put jobs on the QMANs of all the NICs. The
code takes into account if a port is disabled only in case of PCI card.
When this info arrives from the f/w, the code doesn't take it into
account, and it tries to schedule jobs on NICs that aren't enabled and
thats a bug.

To fix this, after the f/w sends us the list of disabled ports, we
update the state of the QMANs according to that list. In addition,
we need to update the HW_CAP bits so the collective wait operation
will not try to use those QMANs. We also need to update the collective
master monitor mask.

Moreover, we need to add a protection for such future cases and in case
the user will try to submit work to those QMANs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Oded Gabbay
5a967fb3a7 habanalabs/gaudi: update to latest f/w specs
Update the firmware interface files to their latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
5bc691d849 habanalabs/gaudi: split host irq interfaces towards FW
Current implementation uses a single interrupt interface towards
FW, this interface is causing races between interrupt types.
We split this interface to interface per interrupt type.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Oded Gabbay
135ade0c6a habanalabs: prefer ASYNC device probing
There is no dependency when probing multiple devices so indicate to the
kernel that it can probe our devices in ASYNC fashion.

This shortens insmod of the driver from ~2 minutes to 20 seconds on
a system with 8 devices.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Tomer Tayar
ae151bcfab habanalabs/gaudi: add ARB to QM stop on error masks
Update the QM stop on error masks to also stop on ARB errors.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Oded Gabbay
9081021029 habanalabs/gaudi: don't use nic_ports_mask in compute
nic_ports_mask is used by the networking part of the driver.
In the compute part, we use the HW_CAP bits to select what is active
and what is not.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Koby Elbaz
b92c637c5f habanalabs/gaudi: set the correct cpu_id on MME2_QM failure
This fix was applied since there was an incorrect reported CPU ID to GIC
such that an error in MME2 QMAN aliased to be an arriving from DMA0_QM.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Oded Gabbay
a60d075c81 habanalabs/gaudi: refactor reset code
After all the latest changes to the reset code, there were some
redundancy and errors in the flows.

If the Linux FIT is loaded to the ASIC CPU, we need to communicate
with it only via GIC. If it is not loaded, we need to either use
COMMS protocol (for newer f/w) or MSG_TO_CPU register (for older f/w).

In addition, if we halted the device CPU then we need to mark that
the driver will do the reset, regardless of the capabilities.

Also, to prevent false errors, we need to keep track whether the
device CPU was already halted. If so, we shouldn't try to halt it
again.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Ohad Sharabi
4cb4508c86 habanalabs: track security status using positive logic
Using negative logic (i.e. fw_security_disabled) is confusing.

Modify the flag to use positive logic (fw_security_enabled).

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Koby Elbaz
4080308e33 habanalabs/gaudi: use COMMS to reset device / halt CPU
This is needed because legacy FW 'communication' protocol will soon
become obsolete.
Because COMMS is a boot protocol, communicating through it is supported
only until Linux is loaded to the device CPU, where in that case we
will fallback to the former implementation.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Koby Elbaz
3649eaea27 habanalabs/gaudi: disable GIC usage if security is enabled
Security is set based on PCI ID, and after reading preboot status bits.
GIC usage is set in both scenarios since GIC can't be used when security
is enabled.
Moreover, writing to GIC/SP is enabled only after Linux is fully loaded.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Koby Elbaz
7feffb6815 habanalabs: read preboot status bits in an earlier stage
On newer releases, host won't be able to trigger an interrupt directly
to the ASIC GIC controller.
To be able to decide whether GIC can/not be used, we must read device's
preboot status bits in a stage that precedes the possible first use of
GIC (when device is in dirty state).

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00