Server Reboot with Kernel Crash logged "Package temperature above threshold, cpu clock throttled"

alfredogotama
Posts: 2
Joined: 2020/05/06 07:55:59

Post by alfredogotama » 2020/06/24 10:28:25

Hi guys,

I need your help investigating what caused this kernel crash:

[6793867.206403] CPU41: Package temperature above threshold, cpu clock throttled (total events = 266864)
[6793867.206405] CPU85: Package temperature above threshold, cpu clock throttled (total events = 266871)
[6793867.206407] CPU39: Package temperature above threshold, cpu clock throttled (total events = 266867)
[6793867.206409] CPU79: Package temperature above threshold, cpu clock throttled (total events = 266871)
[6793867.206411] CPU35: Package temperature above threshold, cpu clock throttled (total events = 266871)
[6793867.206413] CPU26: Core temperature above threshold, cpu clock throttled (total events = 241152)
[6793867.206415] CPU70: Core temperature above threshold, cpu clock throttled (total events = 241550)
[6793867.206417] CPU24: Package temperature above threshold, cpu clock throttled (total events = 266866)
[6793867.206419] CPU68: Package temperature above threshold, cpu clock throttled (total events = 266854)
[6793867.206421] CPU80: Package temperature above threshold, cpu clock throttled (total events = 266873)
[6793867.206423] CPU36: Package temperature above threshold, cpu clock throttled (total events = 266867)
[6793867.206425] CPU74: Package temperature above threshold, cpu clock throttled (total events = 266869)
[6793867.206427] CPU30: Package temperature above threshold, cpu clock throttled (total events = 266865)
[6793867.206429] CPU25: Package temperature above threshold, cpu clock throttled (total events = 266863)
[6793867.206431] CPU27: Package temperature above threshold, cpu clock throttled (total events = 266872)
[6793867.206433] CPU86: Package temperature above threshold, cpu clock throttled (total events = 266870)
[6793867.206435] CPU42: Package temperature above threshold, cpu clock throttled (total events = 266870)
[6793867.206437] CPU84: Package temperature above threshold, cpu clock throttled (total events = 266871)
[6793867.206439] CPU37: Package temperature above threshold, cpu clock throttled (total events = 266874)
[6793867.206442] CPU81: Package temperature above threshold, cpu clock throttled (total events = 266870)
[6793867.206444] CPU40: Package temperature above threshold, cpu clock throttled (total events = 266872)
[6793867.206446] CPU71: Package temperature above threshold, cpu clock throttled (total events = 266864)
[6793867.206448] CPU29: Package temperature above threshold, cpu clock throttled (total events = 266872)
[6793867.206449] CPU26: Package temperature above threshold, cpu clock throttled (total events = 265357)
[6793867.206451] CPU70: Package temperature above threshold, cpu clock throttled (total events = 265812)
[6793867.208337] CPU70: Core temperature/speed normal
[6793867.208338] CPU26: Core temperature/speed normal
[6793867.208340] CPU41: Package temperature/speed normal
[6793867.208341] CPU27: Package temperature/speed normal
[6793867.208343] CPU85: Package temperature/speed normal
[6793867.208344] CPU74: Package temperature/speed normal
[6793867.208345] CPU30: Package temperature/speed normal
[6793867.208346] CPU66: Package temperature/speed normal
[6793867.208347] CPU22: Package temperature/speed normal
[6793867.208348] CPU71: Package temperature/speed normal
[6793867.208350] CPU32: Package temperature/speed normal
[6793867.208351] CPU40: Package temperature/speed normal
[6793867.208353] CPU73: Package temperature/speed normal
[6793867.208354] CPU29: Package temperature/speed normal
[6793867.208354] CPU84: Package temperature/speed normal
[6793867.208356] CPU83: Package temperature/speed normal
[6793867.208357] CPU39: Package temperature/speed normal
[6793867.208359] CPU38: Package temperature/speed normal
[6793867.208360] CPU86: Package temperature/speed normal
[6793867.208361] CPU42: Package temperature/speed normal
[6793867.208363] CPU43: Package temperature/speed normal
[6793867.208364] CPU87: Package temperature/speed normal
[6793867.208366] CPU35: Package temperature/speed normal
[6793867.208367] CPU79: Package temperature/speed normal
[6793867.208368] CPU69: Package temperature/speed normal
[6793867.208369] CPU25: Package temperature/speed normal
[6793867.208370] CPU76: Package temperature/speed normal
[6793867.208371] CPU70: Package temperature/speed normal
[6793867.208373] CPU36: Package temperature/speed normal
[6793867.208374] CPU80: Package temperature/speed normal
[6793867.208375] CPU23: Package temperature/speed normal
[6793867.208377] CPU75: Package temperature/speed normal
[6793867.208378] CPU31: Package temperature/speed normal
[6793867.208379] CPU81: Package temperature/speed normal
[6793867.208381] CPU37: Package temperature/speed normal
[6793867.208382] CPU68: Package temperature/speed normal
[6793867.208383] CPU24: Package temperature/speed normal
[6793867.208385] CPU82: Package temperature/speed normal
[6793867.208386] CPU78: Package temperature/speed normal
[6793867.208387] CPU34: Package temperature/speed normal
[6793867.208389] CPU77: Package temperature/speed normal
[6793867.208390] CPU33: Package temperature/speed normal
[6793867.208392] CPU28: Package temperature/speed normal
[6793867.208393] CPU72: Package temperature/speed normal
[6793867.208394] CPU26: Package temperature/speed normal
[6793867.216060] CPU67: Package temperature/speed normal
[6800031.218905] hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-6 SSDSmartPathCap- En- Exp=1
[6800032.303544] hpsa 0000:08:00.0: device is ready.
[6800032.303549] hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-6 SSDSmartPathCap- En- Exp=1
[6800128.461321] hpsa 0000:08:00.0: cmd_tagged_alloc: tag collision (tag=56)
[6800128.461329] scsi 0:0:0:0: [sg0] tag#45 CDB: Test Unit Ready 00 00 00 00 00 00
[6800128.561571] BUG: unable to handle kernel NULL pointer dereference at           (null)
[6800128.562038] IP: [<ffffffff922e129e>] scsi_eh_done+0x1e/0x60
[6800128.562287] PGD 0
[6800128.562520] Oops: 0000 [#1] SMP
[6800128.562760] Modules linked in: gsch(OE) redirfs(OE) dsa_filter(POE) binfmt_misc bonding nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi crc32_pclmul vfat fat ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg hpilo hpwdt lpc_ich i2c_i801 ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq wmi ioatdma acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm drm_panel_orientation_quirks ixgbe tg3 hpsa mdio dca ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
[6800128.565340] CPU: 0 PID: 1508 Comm: scsi_eh_0 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.18.1.el7.x86_64 #1
[6800128.566278] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/21/2019
[6800128.566797] task: ffff9601f9c49070 ti: ffff9601fbeb4000 task.ti: ffff9601fbeb4000
[6800128.567328] RIP: 0010:[<ffffffff922e129e>]  [<ffffffff922e129e>] scsi_eh_done+0x1e/0x60
[6800128.567880] RSP: 0018:ffff9601ff203df0  EFLAGS: 00010093
[6800128.568442] RAX: 0000000000000000 RBX: ffff96825c73c380 RCX: ffff9601ff3f5000
[6800128.569016] RDX: ffff96825c73c380 RSI: ffff9541efd0a800 RDI: ffff96825c73c380
[6800128.569602] RBP: ffff9601ff203df8 R08: 0000000000000001 R09: ffff9601fbeb7d18
[6800128.570201] R10: ffff9601f96a3800 R11: ffff95f4b0adcfc0 R12: ffff96825c73c380
[6800128.570814] R13: ffff9601fa2b0000 R14: 0000000000000000 R15: ffff9541efc20a80
[6800128.571433] FS:  0000000000000000(0000) GS:ffff9601ff200000(0000) knlGS:0000000000000000
[6800128.572072] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6800128.572715] CR2: 0000000000000000 CR3: 0000017d78b0c000 CR4: 00000000003607f0
[6800128.573380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[6800128.574044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[6800128.574709] Call Trace:
[6800128.575357]  <IRQ>
[6800128.575372]  [<ffffffffc027e10b>] hpsa_cmd_free_and_done+0x2b/0x30 [hpsa]
[6800128.576680]  [<ffffffffc02824d2>] complete_scsi_command+0x312/0x9c0 [hpsa]
[6800128.577356]  [<ffffffff91eade10>] ? __internal_add_timer+0x130/0x130
[6800128.578043]  [<ffffffffc028365f>] do_hpsa_intr_msi+0x7f/0x1d0 [hpsa]
[6800128.578737]  [<ffffffff91f4fe44>] __handle_irq_event_percpu+0x44/0x1c0
[6800128.579442]  [<ffffffff91f4fff2>] handle_irq_event_percpu+0x32/0x80
[6800128.580151]  [<ffffffff91f5007c>] handle_irq_event+0x3c/0x60
[6800128.580859]  [<ffffffff91f52e6f>] handle_edge_irq+0x7f/0x150
[6800128.581579]  [<ffffffff91e2f5f4>] handle_irq+0xe4/0x1a0
[6800128.582306]  [<ffffffff9259287d>] do_IRQ+0x4d/0xf0
[6800128.583031]  [<ffffffff9258436a>] common_interrupt+0x16a/0x16a
[6800128.583775]  <EOI>
[6800128.583785]  [<ffffffff922e7fa5>] ? scsi_request_fn+0x135/0x680
[6800128.585285]  [<ffffffff922e80ab>] ? scsi_request_fn+0x23b/0x680
[6800128.586053]  [<ffffffff922e8685>] ? scsi_end_request+0x135/0x1e0
[6800128.586827]  [<ffffffff9214f289>] __blk_run_queue+0x39/0x50
[6800128.587606]  [<ffffffff9214f306>] blk_run_queue+0x26/0x40
[6800128.588386]  [<ffffffff922e68e8>] scsi_run_queue+0x258/0x2f0
[6800128.589176]  [<ffffffff922de12b>] ? scsi_device_put+0x2b/0x30
[6800128.589946]  [<ffffffff922e8751>] scsi_run_host_queues+0x21/0x40
[6800128.590717]  [<ffffffff922e4455>] scsi_error_handler+0x1b5/0x8b0
[6800128.591480]  [<ffffffff922e42a0>] ? scsi_eh_get_sense+0x250/0x250
[6800128.592245]  [<ffffffff91ec6321>] kthread+0xd1/0xe0
[6800128.592996]  [<ffffffff91ec6250>] ? insert_kthread_work+0x40/0x40
[6800128.593728]  [<ffffffff9258dd37>] ret_from_fork_nospec_begin+0x21/0x21
[6800128.594435]  [<ffffffff91ec6250>] ? insert_kthread_work+0x40/0x40
[6800128.595117] Code: e8 38 08 fd ff e9 33 ff ff ff 0f 1f 00 0f 1f 44 00 00 8b 05 79 92 d7 00 55 48 89 e5 53 48 89 fb 83 e0 07 83 f8 03 77 1a 48 8b 03 <48> 8b 00 48 8b b8 a0 00 00 00 48 85 ff 74 05 e8 0e 28 bf ff 5b
[6800128.596547] RIP  [<ffffffff922e129e>] scsi_eh_done+0x1e/0x60
[6800128.597215]  RSP <ffff9601ff203df0>
[6800128.597852] CR2: 0000000000000000
Thank you!
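
For anyone who wants to condense the thermal part of a log like this, here is a minimal Python sketch. It assumes the throttle messages keep exactly the format shown in the log above; the regular expression and the function name `summarize_throttling` are illustrative, not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# [6793867.206403] CPU41: Package temperature above threshold, cpu clock throttled (total events = 266864)
# The pattern is an assumption based only on the message format seen in this thread.
THROTTLE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU(?P<cpu>\d+): "
    r"(?P<scope>Package|Core) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttling(lines):
    """Return (throttle messages per scope, highest 'total events' value per scope)."""
    counts = Counter()
    max_total = {}
    for line in lines:
        m = THROTTLE_RE.search(line)
        if not m:
            continue  # not a throttle message (e.g. "temperature/speed normal")
        scope = m.group("scope")
        counts[scope] += 1
        max_total[scope] = max(max_total.get(scope, 0), int(m.group("total")))
    return counts, max_total


# Four lines copied from the log above.
sample = [
    "[6793867.206403] CPU41: Package temperature above threshold, cpu clock throttled (total events = 266864)",
    "[6793867.206413] CPU26: Core temperature above threshold, cpu clock throttled (total events = 241152)",
    "[6793867.206415] CPU70: Core temperature above threshold, cpu clock throttled (total events = 241550)",
    "[6793867.208337] CPU70: Core temperature/speed normal",
]
counts, max_total = summarize_throttling(sample)
print(dict(counts), max_total)
```

Feeding it the full `dmesg` output instead of `sample` would show how many throttle messages were logged per scope and how high the kernel's own event counters have climbed.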

TrevorH
Site Admin
Posts: 33218
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Server Reboot with Kernel Crash logged "Package temperature above threshold, cpu clock throttled"

Post by TrevorH » 2020/06/24 10:31:52

Are the hundreds of warnings about overheating not enough of a clue?
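
One way to check how closely the overheating lines up with the crash is to compare the kernel timestamps. The sketch below is only an illustration: the helper names `timestamp` and `gap_seconds` are made up for this example, and the two lines are copied from the posted log, which may have been trimmed, so the result describes the excerpt only.

```python
import re

# dmesg prefixes each line with "[seconds.microseconds]" since boot.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp(line):
    """Return the dmesg timestamp of a line in seconds, or None if it has none."""
    m = TS_RE.match(line)
    return float(m.group(1)) if m else None


def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between two timestamped dmesg lines."""
    return timestamp(later_line) - timestamp(earlier_line)


# Last thermal message and the oops line, both taken from the log in the first post.
last_thermal = "[6793867.216060] CPU67: Package temperature/speed normal"
oops = "[6800128.561571] BUG: unable to handle kernel NULL pointer dereference at           (null)"

gap = gap_seconds(last_thermal, oops)
print(f"{gap:.1f} s (~{gap / 60:.0f} min)")
```

For the two lines quoted here the gap comes out at roughly 6261 seconds, so it is worth checking the complete log for thermal messages closer to the oops before drawing conclusions.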
