This patch does some re-org on the kernel_queue structure. It takes out
all the function pointers from the structure and puts them in a new structure,
called kernel_queue_ops. Then, it puts an instance of that structure
inside kernel_queue.
This re-org is done to prepare the KQ module to support more than one AMD APU
(Kaveri).
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch starts to add support for the VI APU in the DQM module.
Because most (more than 90%) of the DQM code is shared among AMD's APUs, we
chose a design that performs most/all the code in the shared DQM file
(kfd_device_queue_manager.c). If there is H/W specific code to be executed,
than it is written in an asic-specific extension function for that H/W.
That asic-specific extension function is called from the shared function at the
appropriate time. This requires that for every asic-specific extension function
that is implemented in a specific ASIC, there will be an equivalent
implementation in ALL ASICs, even if those implementations are just stubs.
That way we achieve:
- Maintainability: by having one copy of most of the code, we only need to
fix bugs at one locations
- Readability: very clear what is the shared code and what is done per ASIC
- Extensibility: very easy to add new H/W specific files/functions
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch does some re-org on the device_queue_manager structure. It takes out
all the function pointers from the structure and puts them in a new structure,
called device_queue_manager_ops. Then, it puts an instance of that structure
inside device_queue_manager.
This re-org is done to prepare the DQM module to support more than one AMD APU
(Kaveri).
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Instead of creating a BUG if trying to free a NULL GART sub-allocation object,
just return 0 (success).
This is done to mirror behavior of kfree.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch rewrites destroy_queue_nocpsch() as the current logic that is
implemented in the function is completely flawed.
This function is used only in non-HWS mode.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
The MQDs for CI and VI are different. Therefore, the MQD manager module need to
be H/W specific.
This patch splits the current MQD manager into three files:
- kfd_mqd_manager.c, which contains common functions and initializes the
specific mqd manager module according to the H/W
- kfd_mqd_manager_cik.c, which contains Kaveri specific functions. This is
basically the old kfd_mqd_manager.c
- kfd_mqd_manager_vi.c, which will contain VI specific functions. Currently it
is not implemented except for returning NULL on initialization.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a new property to kfd_device_info structure. That structure
holds information that is H/W specific.
The new property is called asic_family and its purpose is to distinguish
between different asic families in amdkfd operations, mainly in QCM (queue
control & management)
This patch also adds a new enum, to select different ASICs. We set the current
kfd_device_info instance as Kaveri and create a new instance which describes
the new AMD APU, codenamed 'Carrizo'.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
As the MQD types are common across all AMD GPUs/APUs, let's remove the CIK part
from the name.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds new fields to the queue_properties structure. The new fields
are relevant only for queues running on AMD GPU VI architecture.
The eop_ring_buffer_address and eop_ring_buffer_size describe an
end-of-pipe queue which is assigned to the MQD. In CI, the EOP queue was per
pipeline and in VI it is per queue.
The ctx_save_restore_area_address and ctx_save_restore_area_size describe a
memory area that is designated to allow the CP to do context save/restore in
mid-wave state.
This patch also modifies the set_queue_properties_from_user() (called from
kfd_ioctl_create_queue()) to check and copy those new parameters.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Because amdkfd will need to work both with radeon and amdgpu, don't include
header files that are in radeon's folder.
Instead, use the common amd include folder and move amdkfd specific defines to
amdkfd header files.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch creates a new file, cik_structs.h, and puts the cik_mqd and
cik_sdma_rlc_registers structures in that file.
The new file is placed in a common include folder under the drm/amd folder, so
it will be shared among all amd drm drivers.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch removes a call to kfd-->kgd interface function that is doing H/W
initialization. That function is moved into radeon to be part of the common
H/W initialization sequence. The interface function will be deleted.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch moves to radeon the initialization of compute vmid.
That initializations was done in kfd-->kgd interface, but doing it in radeon
as part of radeon's H/W initialization routines is more appropriate.
In addition, this simplifies the kfd-->kgd interface.
The patch removes the function from the interface file and from the interface
declaration file.
The function initializes memory apertures to fixed base/limit address and non
cached memory types.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch change the calls throughout the amdkfd driver from the old kfd-->kgd
interface to the new kfd gtt sa inside amdkfd
v2: change the new call in sdma code that appeared because of the sdma feature
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch changes the calls to allocate the gart memory for amdkfd from the
old interface (radeon_sa) to the new one (kfd_gtt_sa)
The new gart sub-allocator is initialized with chunk size equal to 512 bytes.
This is because the KV MQD is 512 Bytes and most of the sub-allocations are
MQDs.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch makes the gart's buffer size calculation more accurate. This buffer
is needed per GPU.
It takes into account maximum number of MQDs, runlist packets, kernel queues
and reserves 512KB for other misc allocations.
The total size is just shy of 4MB, for 32 processes and 128 queues per
process, which are the defaults for amdkfd kernel module parameters.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds new kfd gtt sub-allocator functions that service the amdkfd
driver when it wants to use gtt memory.
The sub-allocator uses a bitmap to handle the memory area that was transferred
to it during init. It divides the memory area into chunks, according to chunk
size parameter.
The allocation function will allocate contiguous chunks from that memory area,
according to the requested size. If the requested size is smaller than the
chunk size, a single chunk will be allocated.
v2: Do some more verifications on parameters that are passed into
kfd_gtt_sa_init()
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds new fields to kfd_dev struct that are necessary for the new kfd
gtt sa module
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds two new functions to the kfd-->kgd interface:
init_gtt_mem_allocation, which allocate a large enough buffer on the amdkfd
needs, such as mqds, hpds, kernel queue, fence and runlists. This function
is only called once per GPU device. The size of the allocated buffer is
based on the maximum number of HSA processes and maximum number of queues
per HSA process (two amdkfd kernel module parameters).
free_gtt_mem, which frees a buffer that was allocated on the gart aperture.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alexey Skidanov <Alexey.skidanov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch passes the correct queue type to pqm_create_queue() instead of a
fixed KFD_QUEUE_TYPE_COMPUTE type.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a check to the create queue ioctl path, which identifies SDMA
queue type that is sent by userspace.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds support for SDMA user-mode queues to the QCM - the Queue
management system that manages queues-per-device and queues-per-process.
v2: Remove calls to interface function that initializes sdma engines.
v3: Use the new names of some of the defines.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds support for SDMA mqd operations:
- init_mqd_sdma
- uninit_mqd_sdma
- load_mqd_sdma
- update_mqd_sdma
- destroy_mqd_sdma
- is_occupied_sdma
It also adds SDMA queue information to some private structures of amdkfd.
v3: Use the new names of some of the defines.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds three new functions to the kfd2kgd interface:
- hqd_sdma_load() - Loads SDMA mqd to a H/W SDMA hqd slot. Used only in no HWS
mode.
- hqd_sdma_is_occupied() - Checks if an SDMA hqd slot is occupied. Used only
in no HWS mode.
- hqd_sdma_destroy() - Destructs and preempts the SDMA queue assigned to
that SDMA hqd slot. Used only in no HWS mode.
These functions are needed to support SDMA queues scheduling when using no HWS
mode (used for debug or bring-up).
v2: Removed init_sdma_engines() from interface. Initialization is done in
radeon.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch splits the current kfd_get_process_device_data() to two
functions, one that specifically creates a pdd and another one which
just do lookup.
This is done to enhance the readability and maintainability of the code.
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the number of watch points to the node capabilities in the
topology module
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Since the user space may call open() more that once from the same process,
the aperture initialization should be moved from kfd_open()
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch displays the firmware version of the microcode that is currently
running in the MEC.
This is needed for the HSA RT, so it could differentiate its behavior based on
fw version. e.g. workarounds for bugs in fw
v2: Send the KGD_ENGINE_MEC1 as a parameter to the get_fw_version()
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a new interface to the kfd-->kgd interface.
The new interface function retrieves the firmware version that is currently in
use by the MEC engine. The firmware was uploaded to the MEC engine by the kgd
(radeon).
v2: Added parameter of engine type to interface function
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch checks if the process that opens the /dev/kfd device is 32-bit
process. If so, it returns -EPERM and prints a warning message in dmesg.
This is done to prevent 32-bit user processes from using amdkfd, and hence, HSA
features.
AMD's HSA userspace stack will also support only 64-bit processes on Linux.
Reviewed-by: Alexey Skidanov <alexey.skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
In function acquire_packet_buffer() we may return -ENOMEM. In that case, we
should set the *buffer_ptr to NULL, so that calling functions which check the
*buffer_ptr value as a criteria for success, will know that
acquire_packet_buffer() failed.
Reviewed-by: Alexey Skidanov <alexey.skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
srcu callbacks are running in atomic context, we can't allocate using
__GFP_WAIT.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
All the bit operations (such as find_first_zero_bit()) read sizeof(long) bytes
at a time. If we allocated less than sizeof(long) bytes for the bitmask we
would be accessing invalid memory when working with the bitmask.
Change the allocator to allocate sizeof(long) multiples for the bitmask.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This is dead code. We don't need to unbind here, we can just return
directly.
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The mqds array members are not freed when dqm is uninitialized.
Reviewed-by: Ben Goz <Ben.Goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The call to kernel_queue_uninit(NULL) will trigger a BUG(), and also the
error code is incorrect.
Fixes: 45102048f7 ('amdkfd: Add process queue manager module')
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
There is a typo here so the errors from kfd_bind_process_to_device()
are not detected.
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch removes the dependency of amdkfd upon DRM_AMDGPU symbol in amdkfd's
Kconfig file.
This is done because amdgpu driver is not yet upstreamed and therefore,
DRM_AMDGPU symbol is not present in any Kconfig file.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch fixes a compilation error when using certain configuration by
including the file io.h in kfd_doorbell.c
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
amdkfd uses cpu_relax() in its sync_with_hw() function. Because cpu_relax() is
defined as 'REP; NOP' on x86_64, it will block the CPU from servicing
IOMMU PPR requests.
This may cause a deadlock, because sync_with_hw() won't be completed
until the PPR request has been served.
Therefore, we need to use schedule() instead of cpu_relax() as it is the
minimum requirement to allow other threads to execute.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
struct device_process_node was allocated during process registration but
not released at process deregistration.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch was done due to sparse warning. It changes the definition of
doorbell_ptr in queue_properties to be with __iomem attribute, so it would
match the type which the doorbell module functions are returning.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>