PC hangs with AER errors

rhHeini
Posts: 2284
Registered: 20.04.2006 20:44:10


Post by rhHeini » 06.07.2020 19:44:57

When I came back to my PC after a break today, it had hung (completely idle; Firefox with a few open tabs and Evolution were running). The screen was black/in power-save mode, keyboard and mouse were dead. I had to switch it off with the power button to reboot. This isn't the first time; it happens roughly once every three months. This time I found something in the syslog.

AMD/ASUS X470-Pro with Ryzen 7 2700X, RX 570 graphics card, 32 GB RAM, NVMe SSD.
All of this on Devuan Beowulf (= Buster without systemd), MATE desktop, kernel 5.6 from backports.

I found the following in the syslog:

Code:

Jul  6 10:04:12 rh050 kernel: [ 6791.867957] pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0
Jul  6 10:04:12 rh050 kernel: [ 6791.867962] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Jul  6 10:04:12 rh050 kernel: [ 6791.867966] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00200000/04400000
Jul  6 10:04:12 rh050 kernel: [ 6791.867968] pcieport 0000:00:03.1: AER:    [21] ACSViol                (First)
Jul  6 10:04:12 rh050 kernel: [ 6791.867973] amdgpu 0000:0c:00.0: AER: can't recover (no error_detected callback)
Jul  6 10:04:12 rh050 kernel: [ 6791.867975] snd_hda_intel 0000:0c:00.1: AER: can't recover (no error_detected callback)
Jul  6 10:04:12 rh050 kernel: [ 6791.867999] pcieport 0000:00:03.1: AER: device recovery failed
Jul  6 10:04:18 rh050 kernel: [ 6798.237631] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Jul  6 10:04:18 rh050 kernel: [ 6798.237639] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Jul  6 10:04:18 rh050 kernel: [ 6798.237643] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00200000/04400000
Jul  6 10:04:18 rh050 kernel: [ 6798.237646] pcieport 0000:00:03.1: AER:    [21] ACSViol                (First)
Jul  6 10:04:18 rh050 kernel: [ 6798.237649] amdgpu 0000:0c:00.0: AER: can't recover (no error_detected callback)
Jul  6 10:04:18 rh050 kernel: [ 6798.237651] snd_hda_intel 0000:0c:00.1: AER: can't recover (no error_detected callback)
Jul  6 10:04:18 rh050 kernel: [ 6798.237675] pcieport 0000:00:03.1: AER: device recovery failed
Jul  6 10:04:22 rh050 kernel: [ 6802.005538] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=97712, emitted seq=97713
Jul  6 10:04:22 rh050 kernel: [ 6802.005622] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 4199 thread Xorg:cs0 pid 4398
Jul  6 10:04:22 rh050 kernel: [ 6802.005629] amdgpu 0000:0c:00.0: GPU reset begin!
Jul  6 10:04:22 rh050 kernel: [ 6802.494530] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul  6 10:04:22 rh050 kernel: [ 6802.494606] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jul  6 10:04:23 rh050 kernel: [ 6802.769119] cp is busy, skip halt cp
Jul  6 10:04:23 rh050 kernel: [ 6803.043292] rlc is busy, skip halt rlc
Jul  6 10:04:23 rh050 kernel: [ 6803.044306] amdgpu 0000:0c:00.0: GPU BACO reset
Jul  6 10:04:23 rh050 kernel: [ 6803.347265] amdgpu 0000:0c:00.0: GPU reset succeeded, trying to resume
Jul  6 10:04:23 rh050 kernel: [ 6803.349052] [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
Jul  6 10:04:23 rh050 kernel: [ 6803.349064] [drm] VRAM is lost due to GPU reset!
Jul  6 10:04:24 rh050 kernel: [ 6803.675372] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Jul  6 10:04:24 rh050 kernel: [ 6803.675439] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
Jul  6 10:04:24 rh050 kernel: [ 6803.675473] amdgpu 0000:0c:00.0: GPU reset(2) failed
Jul  6 10:04:24 rh050 kernel: [ 6803.675513] amdgpu 0000:0c:00.0: GPU reset end with ret = -110
Jul  6 10:04:34 rh050 kernel: [ 6813.771770] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

Code:

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge

00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge

00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Lite-On Technology Corporation Device 23f1 (rev 01)
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43d0 (rev 01)
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43c8 (rev 01)
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c6 (rev 01)
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43c7 (rev 01)
06:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03)
07:04.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
07:04.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
07:04.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
08:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
09:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
0a:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef)
0c:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
0d:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
0d:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] USB 3.0 Host controller
0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
0e:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0e:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller

I assume that the GPP bridge set off above (00:03.1) or the graphics card is the culprit.
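To see at a glance which devices the AER messages in the syslog excerpt involve, they can be grouped by driver and PCI address. The following is a minimal Python sketch, not something from the thread: the regular expression assumes exactly the line format shown in the excerpt above, and the names `summarize_aer`, `unrecoverable` and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format (as in the excerpt above):
#   "... kernel: [ 6791.867973] amdgpu 0000:0c:00.0: AER: can't recover ..."
AER_LINE = re.compile(
    r"\]\s+(?P<driver>\S+)\s+"
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+AER:\s+(?P<msg>.*)$"
)

# A few lines copied from the syslog excerpt, used as demo input.
SAMPLE = [
    "Jul  6 10:04:12 rh050 kernel: [ 6791.867957] pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0",
    "Jul  6 10:04:12 rh050 kernel: [ 6791.867973] amdgpu 0000:0c:00.0: AER: can't recover (no error_detected callback)",
    "Jul  6 10:04:12 rh050 kernel: [ 6791.867975] snd_hda_intel 0000:0c:00.1: AER: can't recover (no error_detected callback)",
    "Jul  6 10:04:22 rh050 kernel: [ 6802.005629] amdgpu 0000:0c:00.0: GPU reset begin!",
]


def summarize_aer(lines):
    """Group AER messages by (driver, PCI address); non-AER lines are ignored."""
    summary = defaultdict(list)
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            summary[(match["driver"], match["addr"])].append(match["msg"].strip())
    return dict(summary)


def unrecoverable(summary):
    """Return the PCI addresses whose driver reported "can't recover"."""
    return sorted(
        addr
        for (_driver, addr), messages in summary.items()
        if any("can't recover" in m for m in messages)
    )


if __name__ == "__main__":
    result = summarize_aer(SAMPLE)
    for (driver, addr), messages in result.items():
        print(f"{driver} {addr}: {len(messages)} AER message(s)")
    print("no error_detected callback:", unrecoverable(result))
```

On real data one could pass an open log file instead of `SAMPLE`, for example `/var/log/syslog`, assuming rsyslog writes the kernel messages there. For the sample lines, the reporting port is `pcieport 0000:00:03.1`, while `amdgpu` and `snd_hda_intel` on `0c:00.0`/`0c:00.1` are the functions that could not recover.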

What can I do to prevent this? Suppressing these AERs doesn't seem like a good approach to me, since these are uncorrectable errors.
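The pair `error status/mask=00200000/04400000` in the log holds two 32-bit register values of the reporting port (device `[1022:1453]`, the GPP bridge 00:03.1): the Uncorrectable Error Status register and the Uncorrectable Error Mask register. A minimal Python sketch, not from the thread, that decodes such a pair bit by bit is shown below. The bit-name table is modeled on the abbreviations the Linux AER driver prints (such as `[21] ACSViol`); the exact strings may differ between kernel versions, and the names `decode` and `decode_status_mask` are hypothetical.

```python
from __future__ import annotations

# PCIe Uncorrectable Error Status/Mask bits, labeled with the abbreviations
# the Linux AER driver uses in its log output (assumed, may vary by kernel).
UNCORRECTABLE_BITS = {
    0: "Undefined",
    4: "DLP",               # Data Link Protocol Error
    5: "SDES",              # Surprise Down Error
    12: "TLP",              # Poisoned TLP
    13: "FCP",              # Flow Control Protocol Error
    14: "CmpltTO",          # Completion Timeout
    15: "CmpltAbrt",        # Completer Abort
    16: "UnxCmplt",         # Unexpected Completion
    17: "RxOF",             # Receiver Overflow
    18: "MalfTLP",          # Malformed TLP
    19: "ECRC",             # ECRC Error
    20: "UnsupReq",         # Unsupported Request
    21: "ACSViol",          # ACS Violation
    22: "UncorrIntErr",     # Uncorrectable Internal Error
    23: "BlockedTLP",       # MC Blocked TLP
    24: "AtomicOpBlocked",  # AtomicOp Egress Blocked
    25: "TLPBlockedErr",    # TLP Prefix Blocked
    26: "PoisonTLPBlocked", # Poisoned TLP Egress Blocked
}


def decode(value: int) -> list[str]:
    """Return "[bit] name" for every bit set in a 32-bit register value."""
    return [
        f"[{bit}] {UNCORRECTABLE_BITS.get(bit, 'Unknown')}"
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_status_mask(field: str) -> dict[str, list[str]]:
    """Split the hex 'status/mask' pair from an AER log line and decode both."""
    status_hex, mask_hex = field.split("/")
    return {
        "status": decode(int(status_hex, 16)),
        "mask": decode(int(mask_hex, 16)),
    }


if __name__ == "__main__":
    result = decode_status_mask("00200000/04400000")
    print("status:", result["status"])
    print("mask:  ", result["mask"])
```

For this log entry it yields status `[21] ACSViol` and mask `[22] UncorrIntErr`, `[26] PoisonTLPBlocked`. A set mask bit means that error type is not reported, so the ACS Violation that triggered the message is not among the masked errors.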

Rolf

Blackbox
Posts: 4289
Registered: 17.09.2008 17:01:20
License of own posts: GNU Free Documentation License

Re: PC hangs with AER errors

Post by Blackbox » 07.07.2020 13:36:14

rhHeini wrote:
06.07.2020 19:44:57
All of this on Devuan Beowulf
Maybe the Devuan forum [0] would be a better place for you?

[0] https://dev1galaxy.org/viewforum.php?id=5
Self-built PC: Debian Sid - Kernel: 6.5.13 - Xfce 4.18 with sway
Desktop PC: Dell Inspiron 530 - Debian Sid - Kernel: 6.5.13 - Xfce 4.18 with sway
Notebook: TUXEDO BU1406 - Debian Sid - Kernel: 6.5.13 - Xfce 4.18 with sway
All minimal installations, without sudo/PA/PW.
Rootserver: Rocky Linux 9.3 - Kernel: 5.14

Support free software, strengthen fundamental rights!
