summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2023-08-16drm/amd: flush any delayed gfxoff on suspend entryMario Limonciello
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity is happening during entry. This is because GFXOFF was scheduled as delayed but RLC gets disabled in s2idle entry sequence which will hang GFX IP if not already in GFXOFF. To help this problem, flush any delayed work for GFXOFF early in s2idle entry sequence to ensure that it's off when RLC is changed. commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during suspend without delay") modified power gating flow so that if called in s0ix that it ensured that GFXOFF wasn't put in work queue but instead processed immediately. This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly called as part of the suspend entry code. Remove that dead code. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16drm/amdgpu: skip fence GFX interrupts disable/enable for S0ixTim Huang
GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are disabled when S0ix suspend but fails to re-enable when resume because of the GFX is in GFXOFF. [ 203.349571] [drm] Fence fallback timer expired on ring sdma0 [ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0 [ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0 For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers to configure the fence driver interrupts for rings that belong to GFX. The interrupts configuration will be restored by GFXOFF exit. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-16drm/amdgpu: skip xcp drm device allocation when out of drm resourceJames Zhu
Return 0 when drm device alloc failed with -ENOSPC in order to allow amdgpu drive loading. But the xcp without drm device node assigned won't be visiable in user space. This helps amdgpu driver loading on system which has more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public, which will support up to 2^20 nodes totally. kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/ libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amdgpu: disable mcbp if parameter zero is setJiadong Zhu
The parameter amdgpu_mcbp shall have priority against the default value calculated from the chip version. User could disable mcbp by setting the parameter mcbp as zero. v2: do not trigger preemption in sw ring muxer when mcbp is disabled. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOVAlex Deucher
This is only required for SR-IOV world switches, but it adds additional latency leading to reduced performance in some benchmarks. Disable for now on bare metal. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()Alex Deucher
Since the gang_size check is outside of chunk parsing loop, we need to reset i before we free the chunk data. Suggested by Ye Zhang (@VAR10CK) of Baidu Security. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-08-09drm/amdgpu: Match against exact bootloader statusLijo Lazar
On PSP v13.x ASICs, boot loader will set only the MSB to 1 and clear the least significant bits for any command submission. Hence match against the exact register value, otherwise a register value of all 0xFFs also could falsely indicate that boot loader is ready. Also, from PSP v13.0.6 and newer, bits[7:0] will be used to indicate command error status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09drm/amd: Disable S/G for APUs when 64GB or more host memoryMario Limonciello
Users report a white flickering screen on multiple systems that is tied to having 64GB or more memory. When S/G is enabled pages will get pinned to both VRAM carve out and system RAM leading to this. Until it can be fixed properly, disable S/G when 64GB of memory or more is detected. This will force pages to be pinned into VRAM. This should fix white screen flickers but if VRAM pressure is encountered may lead to black screens. It's a trade-off for now. Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: <stable@vger.kernel.org> # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter") Cc: <stable@vger.kernel.org> # 6.4.y Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Restore HQD persistent state registerLijo Lazar
On GFX v9.4.3, compute queue MQD is populated using the values in HQD persistent state register. Hence don't clear the values on module unload, instead restore it to the default reset value so that MQD is initialized correctly during next module load. In particular, preload flag needs to be set on compute queue MQD, otherwise it could cause uninitialized values being used at device reset state resulting in EDC. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amd: Fix an error handling mistake in psp_sw_init()Mario Limonciello
If the second call to amdgpu_bo_create_kernel() fails, the memory allocated from the first call should be cleared. If the third call fails, the memory from the second call should be cleared. Fixes: b95b5391684b ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Fix infinite loop in gfxhub_v1_2_xcc_gart_enable (v2)Victor Lu
An instance of for_each_inst() was not changed to match its new behaviour and is causing a loop. v2: remove tmp_mask variable Fixes: b579ea632fca ("drm/amdgpu: Modify for_each_inst macro") Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: use a macro to define no xcp partition caseGuchun Chen
~0 as no xcp partition is used in several places, so improve its definition by a macro for code consistency. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu/vm: use the same xcp_id from root PDGuchun Chen
Other PDs/PTs allocation should just use the same xcp_id as that stored in root PD. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_createGuchun Chen
Recent code set xcp_id stored from file private data when opening device to amdgpu bo for accounting memory usage etc, but not all VMs are attached to this fpriv structure like the vm cases in amdgpu_mes_self_test, otherwise, KASAN will complain below out of bound access. And more importantly, VM code should not touch fpriv structure, so drop fpriv code handling from amdgpu_vm_pt. [ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069 [ 77.294146] Call Trace: [ 77.294178] <TASK> [ 77.294208] dump_stack_lvl+0x49/0x63 [ 77.294260] print_report+0x16f/0x4a6 [ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200 [ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.297556] kasan_report+0xb4/0x130 [ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.299202] __asan_load4+0x6f/0x90 [ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu] [ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu] [ 77.303721] ? preempt_count_sub+0x18/0xc0 [ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu] [ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu] [ 77.306683] ? kasan_set_track+0x25/0x30 [ 77.306737] ? kasan_save_alloc_info+0x1b/0x30 [ 77.306795] ? __kasan_kmalloc+0x87/0xa0 [ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu] v2: without specifying xcp partition for PD/PT bo, the xcp id is -1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686 Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: Allocate root PD on correct partitionGuchun Chen
file_priv needs to be setup firstly, otherwise, root PD will always be allocated on partition 0, even if opening the device from other partitions. Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_taCandice Li
Allow the initramfs generator to automatically include psp_13_0_6_ta firmware to initramfs. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancelGuchun Chen
In below thousands of screen rotation loop tests with virtual display enabled, a CPU hard lockup issue may happen, leading system to unresponsive and crash. do { xrandr --output Virtual --rotate inverted xrandr --output Virtual --rotate right xrandr --output Virtual --rotate left xrandr --output Virtual --rotate normal } while (1); NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 ? hrtimer_run_softirq+0x140/0x140 ? store_vblank+0xe0/0xe0 [drm] hrtimer_cancel+0x15/0x30 amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu] drm_vblank_disable_and_save+0x185/0x1f0 [drm] drm_crtc_vblank_off+0x159/0x4c0 [drm] ? record_print_text.cold+0x11/0x11 ? wait_for_completion_timeout+0x232/0x280 ? drm_crtc_wait_one_vblank+0x40/0x40 [drm] ? bit_wait_io_timeout+0xe0/0xe0 ? wait_for_completion_interruptible+0x1d7/0x320 ? mutex_unlock+0x81/0xd0 amdgpu_vkms_crtc_atomic_disable It's caused by a stuck in lock dependency in such scenario on different CPUs. CPU1 CPU2 drm_crtc_vblank_off hrtimer_interrupt grab event_lock (irq disabled) __hrtimer_run_queues grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate amdgpu_vkms_disable_vblank drm_handle_vblank hrtimer_cancel grab dev->event_lock So CPU1 stucks in hrtimer_cancel as timer callback is running endless on current clock base, as that timer queue on CPU2 has no chance to finish it because of failing to hold the lock. So NMI watchdog will throw the errors after its threshold, and all later CPUs are impacted/blocked. So use hrtimer_try_to_cancel to fix this, as disable_vblank callback does not need to wait the handler to finish. And also it's not necessary to check the return value of hrtimer_try_to_cancel, because even if it's -1 which means current timer callback is running, it will be reprogrammed in hrtimer_start with calling enable_vblank to make it works. v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes tag as well v3: drop warn printing (Christian) v4: drop superfluous check of blank->enabled in timer function, as it's guaranteed in drm_handle_vblank (Christian) Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)") Cc: stable@vger.kernel.org Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-14Merge tag 'amd-drm-fixes-6.5-2023-07-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.5-2023-07-12: amdgpu: - SMU i2c locking fix - Fix a possible deadlock in process restoration for ROCm apps - Disable PCIe lane/speed switching on Intel platforms (the platforms don't support it) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230712184009.7740-1-alexander.deucher@amd.com
2023-07-12drm/amd: Move helper for dynamic speed switch check out of smu13Mario Limonciello
This helper is used for checking if the connected host supports the feature, it can be moved into generic code to be used by other smu implementations as well. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-07-12drm/amdgpu: avoid restore process run into dead loop.gaba
In restore process worker, pinned BO cause update PTE fail, then the function re-schedule the restore_work. This will generate dead loop. Signed-off-by: gaba <gaba@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-07-06Merge tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Lots of fixes, mostly i915 and amdgpu. It's two weeks of i915, and I think three weeks of amdgpu. fbdev: - Fix module infos on sparc panel: - Fix mode on Starry-ili9882t i915: - Allow DC states along with PW2 only for PWB functionality [adlp+] - Fix SSC selection for MPLLA [mtl] - Use hw.adjusted mode when calculating io/fast wake times [psr] - Apply min softlimit correctly [guc/slpc] - Assign correct hdcp content type [hdcp] - Add missing forward declarations/includes to display power headers - Fix BDW PSR AUX CH data register offsets [psr] - Use mock device info for creating mock device amdgpu: - Misc cleanups - GFX 9.4.3 fixes - DEBUGFS build fix - Fix LPDDR5 reporting - ASPM fixes - DCN 3.1.4 fixes - DP MST fixes - DCN 3.2.x fixes - Display PSR TCON fixes - SMU 13.x fixes - RAS fixes - Vega12/20 SMU fixes - PSP flashing cleanup - GFX9 MCBP fixes - SR-IOV fixes - GPUVM clear mappings fix for always valid BOs - Add FAMS quirk for problematic monitor - Fix possible UAF - Better handle monentary temperature fluctuations - SDMA 4.4.2 fixes - Fencing fix" * tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm: (83 commits) drm/i915: use mock device info for creating mock device drm/i915/psr: Fix BDW PSR AUX CH data register offsets drm/amdgpu: Fix potential fence use-after-free v2 drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation drm/amd/pm: expose swctf threshold setting for legacy powerplay drm/amd/display: 3.2.241 drm/amd/display: Take full update path if number of planes changed drm/amd/display: Create debugging mechanism for Gaming FAMS drm/amd/display: Add monitor specific edid quirk drm/amd/display: For new fast update path, loop through each surface drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2 drm/amd/display: Limit new fast update path to addr and gamma / color drm/amd/display: Fix the delta clamping for shaper LUT drm/amdgpu: Keep non-psp path for partition switch drm/amd/display: program DPP shaper and 3D LUT if updated Revert "drm/amd/display: edp do not add non-edid timings" drm/amdgpu: share drm device for pci amdgpu device with 1st partition device drm/amd/pm: Add GFX v9.4.3 unique id to sysfs drm/amd/pm: Enable pp_feature attribute drm/amdgpu/vcn: Need to unpause dpg before stop dpg ...
2023-06-30drm/amdgpu: Fix potential fence use-after-free v2shanzhulig
fence Decrements the reference count before exiting. Avoid Race Vulnerabilities for fence use-after-free. v2 (chk): actually fix the use after free and not just move it. Signed-off-by: shanzhulig <shanzhulig@gmail.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amd/pm: avoid unintentional shutdown due to temperature momentary ↵Evan Quan
fluctuation An intentional delay is added on soft ctf triggered. Then there will be a double check for the GPU temperature before taking further action. This can avoid unintended shutdown due to temperature momentary fluctuation. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: Keep non-psp path for partition switchLijo Lazar
When PSP block is not present, use direct programming. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: share drm device for pci amdgpu device with 1st partition deviceJames Zhu
To save render node resoure, share drm device setting for pci amdgpu device with 1st XCP partition device. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu/vcn: Need to unpause dpg before stop dpgEmily Deng
Need to unpause dpg first, or it will hit follow error during stop dpg: "[drm] Register(1) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n" Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: remove duplicated doorbell range init for sdma v4.4.2Le Ma
Handled in earlier phase Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: gpu recovers from fatal error in poison modeYiPeng Chai
Fatal error occurs in ras poison mode, mode1 reset is used to recover gpu. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: enable mcbp by default on gfx9Alex Deucher
It's required for high priority queues. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535 Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: make mcbp a per device settingAlex Deucher
So we can selectively enable it on certain devices. No intended functional change. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amd: Don't initialize PSP twice for Navi3xMario Limonciello
PSP functions are already set by psp_early_init() so initializing them a second time is unnecessary. No intended functional changes. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: port SRIOV VF missed changesZhigang Luo
port SRIOV VF missed changes from gfx_v9_0 to gfx_v9_4_3. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amd: Don't try to enable secure display TA multiple timesMario Limonciello
If the securedisplay TA failed to load the first time, it's unlikely to work again after a suspend/resume cycle or reset cycle and it appears to be causing problems in futher attempts. Fixes: e42dfa66d592 ("drm/amdgpu: Add secure display TA load for Renoir") Reported-by: Filip Hejsek <filip.hejsek@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2633 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: fix number of fence calculationsChristian König
Since adding gang submit we need to take the gang size into account while reserving fences. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: Modify for_each_inst macroLijo Lazar
Modify it such that it doesn't change the instance mask parameter. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu:Remove sdma halt/unhalt during frontdoor loadMangesh Gadre
sdma halt/unhalt is performed by psp when frontdoor loading used,so this can be skipped. v2: Instead of removing halt/unhalt completely, driver will do it only during backdoor load. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: check RAS irq existence for VCN/JPEGTao Zhou
No RAS irq is allowed. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30drm/amdgpu: remove vm sanity check from amdgpu_vm_make_computeXiaogang Chen
Since we allow kfd and graphic operate on same GPU VM to have interoperation between them GPU VM may have been used by graphic vm operations before kfd turns a GPU VM into a compute VM. Remove vm clean checking at amdgpu_vm_make_compute. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-29Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "There is one set of patches to misc for a i915 gsc/mei proxy driver. Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots of refactoring. core: - replace strlcpy with strscpy - EDID changes to support further conversion to struct drm_edid - Move i915 DSC parameter code to common DRM helpers - Add Colorspace functionality aperture: - ignore framebuffers with non-primary devices fbdev: - use fbdev i/o helpers - add Kconfig options for fb_ops helpers - use new fb io helpers directly in drivers sysfs: - export DRM connector ID scheduler: - Avoid an infinite loop ttm: - store function table in .rodata - Add query for TTM mem limit - Add NUMA awareness to pools - Export ttm_pool_fini() bridge: - fsl-ldb: support i.MX6SX - lt9211, lt9611: remove blanking packets - tc358768: implement input bus formats, devm cleanups - ti-snd65dsi86: implement wait_hpd_asserted - analogix: fix endless probe loop - samsung-dsim: support swapped clock, fix enabling, support var clock - display-connector: Add support for external power supply - imx: Fix module linking - tc358762: Support reset GPIO panel: - nt36523: Support Lenovo J606F - st7703: Support Anbernic RG353V-V2 - InnoLux G070ACE-L01 support - boe-tv101wum-nl6: Improve initialization - sharp-ls043t1le001: Mode fixes - simple: BOE EV121WXM-N10-1850, S6D7AA0 - Ampire AM-800480L1TMQW-T00H - Rocktech RK043FN48H - Starry himax83102-j02 - Starry ili9882t amdgpu: - add new ctx query flag to handle reset better - add new query/set shadow buffer for rdna3 - DCN 3.2/3.1.x/3.0.x updates - Enable DC_FP on loongarch - PCIe fix for RDNA2 - improve DC FAMS/SubVP support for better power management - partition support for lots of engines - Take NUMA into account when allocating memory - Add new DRM_AMDGPU_WERROR config parameter to help with CI - Initial SMU13 overdrive support - Add support for new colorspace KMS API - W=1 fixes amdkfd: - Query TTM mem limit rather than hardcoding it - GC 9.4.3 partition support - Handle NUMA for partitions - Add debugger interface for enabling gdb - Add KFD event age tracking radeon: - Fix possible UAF i915: - new getparam for PXP support - GSC/MEI proxy driver - Meteorlake display enablement - avoid clearing preallocated framebuffers with TTM - implement framebuffer mmap support - Disable sampler indirect state in bindless heap - Enable fdinfo for GuC backends - GuC loading and firmware table handling fixes - Various refactors for multi-tile enablement - Define MOCS and PAT tables for MTL - GSC/MEI support for Meteorlake - PMU multi-tile support - Large driver kernel doc cleanup - Allow VRR toggling and arbitrary refresh rates - Support async flips on linear buffers on display ver 12+ - Expose CRTC CTM property on ILK/SNB/VLV - New debugfs for display clock frequencies - Hotplug refactoring - Display refactoring - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake - Use large rings for compute contexts - HuC loading for MTL - Allow user to set cache at BO creation - MTL powermanagement enhancements - Switch to dedicated workqueues to stop using flush_scheduled_work() - Move display runtime init under display/ - Remove 10bit gamma on desktop gen3 parts, they don't support it habanalabs: - uapi: return 0 for user queries if there was a h/w or f/w error - Add pci health check when we lose connection with the firmware. This can be used to distinguish between pci link down and firmware getting stuck. - Add more info to the error print when TPC interrupt occur. - Firmware fixes msm: - Adreno A660 bindings - SM8350 MDSS bindings fix - Added support for DPU on sm6350 and sm6375 platforms - Implemented tearcheck support to support vsync on SM150 and newer platforms - Enabled missing features (DSPP, DSC, split display) on sc8180x, sc8280xp, sm8450 - Added support for DSI and 28nm DSI PHY on MSM8226 platform - Added support for DSI on sm6350 and sm6375 platforms - Added support for display controller on MSM8226 platform - A690 GPU support - Move cmdstream dumping out of fence signaling path - a610 support - Support for a6xx devices without GMU nouveau: - NULL ptr before deref fixes armada: - implement fbdev emulation as client sun4i: - fix mipi-dsi dotclock - release clocks vc4: - rgb range toggle property - BT601 / BT2020 HDMI support vkms: - convert to drmm helpers - add reflection and rotation support - fix rgb565 conversion gma500: - fix iomem access shmobile: - support renesas soc platform - enable fbdev mxsfb: - Add support for i.MX93 LCDIF stm: - dsi: Use devm_ helper - ltdc: Fix potential invalid pointer deref renesas: - Group drivers in renesas subdirectory to prepare for new platform - Drop deprecated R-Car H3 ES1.x support meson: - Add support for MIPI DSI displays virtio: - add sync object support mediatek: - Add display binding document for MT6795" * tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits) drm/i915: Fix a NULL vs IS_ERR() bug drm/i915: make i915_drm_client_fdinfo() reference conditional again drm/i915/huc: Fix missing error code in intel_huc_init() drm/i915/gsc: take a wakeref for the proxy-init-completion check drm/msm/a6xx: Add A610 speedbin support drm/msm/a6xx: Add A619_holi speedbin support drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching drm/msm/a6xx: Use "else if" in GPU speedbin rev matching drm/msm/a6xx: Fix some A619 tunables drm/msm/a6xx: Add A610 support drm/msm/a6xx: Add support for A619_holi drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations drm/msm/a6xx: Introduce GMU wrapper support drm/msm/a6xx: Move CX GMU power counter enablement to hw_init drm/msm/a6xx: Extend and explain UBWC config drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init drm/msm/a6xx: Add a helper for software-resetting the GPU drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions() drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off() ...
2023-06-27Merge tag 'hardening-v6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "There are three areas of note: A bunch of strlcpy()->strscpy() conversions ended up living in my tree since they were either Acked by maintainers for me to carry, or got ignored for multiple weeks (and were trivial changes). The compiler option '-fstrict-flex-arrays=3' has been enabled globally, and has been in -next for the entire devel cycle. This changes compiler diagnostics (though mainly just -Warray-bounds which is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_ coverage. In other words, there are no new restrictions, just potentially new warnings. Any new FORTIFY warnings we've seen have been fixed (usually in their respective subsystem trees). For more details, see commit df8fc4e934c12b. The under-development compiler attribute __counted_by has been added so that we can start annotating flexible array members with their associated structure member that tracks the count of flexible array elements at run-time. It is possible (likely?) that the exact syntax of the attribute will change before it is finalized, but GCC and Clang are working together to sort it out. Any changes can be made to the macro while we continue to add annotations. As an example of that last case, I have a treewide commit waiting with such annotations found via Coccinelle: https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b Also see commit dd06e72e68bcb4 for more details. Summary: - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko) - Convert strreplace() to return string start (Andy Shevchenko) - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook) - Add missing function prototypes seen with W=1 (Arnd Bergmann) - Fix strscpy() kerndoc typo (Arne Welzel) - Replace strlcpy() with strscpy() across many subsystems which were either Acked by respective maintainers or were trivial changes that went ignored for multiple weeks (Azeem Shaikh) - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers) - Add KUnit tests for strcat()-family - Enable KUnit tests of FORTIFY wrappers under UML - Add more complete FORTIFY protections for strlcat() - Add missed disabling of FORTIFY for all arch purgatories. - Enable -fstrict-flex-arrays=3 globally - Tightening UBSAN_BOUNDS when using GCC - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays - Improve use of const variables in FORTIFY - Add requested struct_size_t() helper for types not pointers - Add __counted_by macro for annotating flexible array size members" * tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits) netfilter: ipset: Replace strlcpy with strscpy uml: Replace strlcpy with strscpy um: Use HOST_DIR for mrproper kallsyms: Replace all non-returning strlcpy with strscpy sh: Replace all non-returning strlcpy with strscpy of/flattree: Replace all non-returning strlcpy with strscpy sparc64: Replace all non-returning strlcpy with strscpy Hexagon: Replace all non-returning strlcpy with strscpy kobject: Use return value of strreplace() lib/string_helpers: Change returned value of the strreplace() jbd2: Avoid printing outside the boundary of the buffer checkpatch: Check for 0-length and 1-element arrays riscv/purgatory: Do not use fortified string functions s390/purgatory: Do not use fortified string functions x86/purgatory: Do not use fortified string functions acpi: Replace struct acpi_table_slit 1-element array with flex-array clocksource: Replace all non-returning strlcpy with strscpy string: use __builtin_memcpy() in strlcpy/strlcat staging: most: Replace all non-returning strlcpy with strscpy drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy ...
2023-06-23drm/amd/pm: update the LC_L1_INACTIVITY setting to address possible noise issueEvan Quan
It is proved that insufficient LC_L1_INACTIVITY setting can cause audio noise on some platform. With the LC_L1_INACTIVITY increased to 4ms, the issue can be resolved. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Skip TMR for MP0_HWIP 13.0.6Zhigang Luo
For SRIOV VF, no TMR needed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amd/pm: revise the ASPM settings for thunderbolt attached scenarioEvan Quan
Also, correct the comment for NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT as 0x0000000E stands for 400ms instead of 4ms. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: fix clearing mappings for BOs that are always valid in VMSamuel Pitoiset
Per VM BOs must be marked as moved or otherwise their ranges are not updated on use which might be necessary when the replace operation splits mappings. This fixes random GPU hangs when replacing sparse mappings from the userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs are correctly handled there. Cc: stable@vger.kernel.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Add vbios attribute only if supportedLijo Lazar
Not all devices carry VBIOS version information. Add the device attribute only if supported. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu/atomfirmware: fix LPDDR5 width reportingAlex Deucher
LPDDR5 channels are 32 bit rather than 64, report the width properly in the log. v2: Only LPDDR5 are 32 bits per channel. DDR5 is 64 bits per channel Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2468 Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Remove CONFIG_DEBUG_FS guard around body of ↵Nathan Chancellor
amdgpu_rap_debugfs_init() After commit 8020f0f9316b ("drm/amd/amdgpu: enable W=1 for amdgpu"), there is an instance of -Wunused-const-variable when CONFIG_DEBUG_FS is disabled: drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c:110:37: error: unused variable 'amdgpu_rap_debugfs_ops' [-Werror,-Wunused-const-variable] 110 | static const struct file_operations amdgpu_rap_debugfs_ops = { | ^ 1 error generated. There is no reason for the body of this function to be guarded when CONFIG_DEBUG_FS is disabled, as debugfs_create_file() is a stub that just returns an error pointer in that situation. Remove the preprocessor guards so that the variable never appears unused, while not changing anything at run time. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Move calculation of xcp per memory nodeLijo Lazar
Its value is required for finding the memory id of xcp. Fixes: d26ea1b346e7 ("drm/amdgpu: Add xcp manager num_xcp_per_mem_partition") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23drm/amdgpu: Skip mark offset for high priority ringsJiadong Zhu
Only low priority rings are using chunks to save the offset. Bypass the mark offset callings from high priority rings. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Increase hmm range get pages timeoutPhilip Yang
If hmm_range_fault returns -EBUSY, we should call hmm_range_fault again to validate the remaining pages. On one system with NUMA auto balancing enabled, hmm_range_fault takes 6 seconds for 1GB range because CPU migrate the range one page at a time. To be safe, increase timeout value to 1 second for 128MB range. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>