
monitoring CPU temp

Posted: Tue Jan 22, 2008 7:44 pm
by stack
So I turned on the monitor screen for my server today and saw this message on the login screen:
kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 3851 )
kernel: CPU0: Temperature / speed normal

The only thing that came to my mind was "what the hell...??" I monitor my hard drive temperatures religiously, but because I can never get CPU temperature monitoring to work right, I never bother with it (with my pick-of-the-litter luck, I always seem to get hardware that isn't supported). So I just keep my systems well cooled and always set the BIOS option "when temp gets above XX degrees C, turn it off".

When I saw this message, I was quite shocked, so I did some digging. Sure enough, /var/log/kern.log has some more info (the CPU throttled and then returned to normal 5 minutes later), but nothing about how hot it actually got or where I could find that information. So, off to Google. It turns out that the Linux kernel monitors such things for you now (learn something new every day). However, I can't find out where that information is stored.
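Since kern.log only records the throttle/normal transitions, one way to see how often this happens is to pull those lines out of the log. This is a minimal Python sketch: the regular expressions are written against the two messages quoted above and are an assumption, because the wording may differ between kernel versions. The log path is the one mentioned in the post, and some distros use /var/log/messages instead.

```python
import re
from typing import Iterable, List, Tuple

# Patterns modeled on the two kernel messages quoted in the post (assumption:
# other kernel versions may word them differently).
THROTTLED = re.compile(r"CPU(\d+): Temperature above threshold.*total events = (\d+)")
NORMAL = re.compile(r"CPU(\d+): Temperature / speed normal")


def throttle_events(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (state, cpu, event_count) tuples in log order.

    state is "throttled" or "normal"; event_count is -1 for "normal" lines,
    which carry no counter.
    """
    events = []
    for line in lines:
        m = THROTTLED.search(line)
        if m:
            events.append(("throttled", int(m.group(1)), int(m.group(2))))
            continue
        m = NORMAL.search(line)
        if m:
            events.append(("normal", int(m.group(1)), -1))
    return events


if __name__ == "__main__":
    path = "/var/log/kern.log"  # path from the post; distro-dependent
    try:
        with open(path, errors="replace") as log:
            for event in throttle_events(log):
                print(event)
    except OSError as err:
        print(f"could not read {path}: {err}")
```

Pairing each "throttled" line with the next "normal" line for the same CPU gives the length of each throttling episode, and that is about all the log can tell you.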

I dug around in /proc/acpi and it did have some information about the CPU, but nothing about the temperature. The processor is a 2.2 GHz P4 on a standard Intel motherboard, and as far as I can tell there is nothing special about either. So I caved and started installing more programs. (Hooray! Just what you want to do with your server, kids! Installing random packages helps you clutter your drive, which is important for servers. Remember, if it's not broke, you're not trying!)
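If the BIOS exposes any ACPI thermal zones, older kernels list them under /proc/acpi/thermal_zone/ and newer ones under /sys/class/thermal/. This Python sketch checks both places. Keep in mind that an ACPI zone is whatever the BIOS defines, so it may measure the board rather than the CPU die. The file formats in the comments ("45000" millidegrees in sysfs, "temperature: 45 C" in /proc) are assumptions based on common kernels. An empty result is consistent with what the post describes.

```python
import glob
import os
import re
from typing import Dict, Optional


def parse_sysfs_temp(text: str) -> float:
    """sysfs thermal zones report millidegrees Celsius, e.g. "45000"."""
    return int(text.strip()) / 1000.0


def parse_proc_acpi_temp(text: str) -> Optional[float]:
    """Older /proc/acpi files look like "temperature:   45 C" (assumed format)."""
    m = re.search(r"(-?\d+)\s*C\b", text)
    return float(m.group(1)) if m else None


def read_thermal_zones(root: str = "/") -> Dict[str, float]:
    """Return {path: degrees_C} for every ACPI thermal zone found under root."""
    temps = {}
    sources = [
        ("sys/class/thermal/thermal_zone*/temp", parse_sysfs_temp),
        ("proc/acpi/thermal_zone/*/temperature", parse_proc_acpi_temp),
    ]
    for pattern, parse in sources:
        for path in glob.glob(os.path.join(root, pattern)):
            try:
                with open(path) as f:
                    value = parse(f.read())
            except (OSError, ValueError):
                continue  # some zones refuse reads or return junk; skip them
            if value is not None:
                temps[path] = value
    return temps


if __name__ == "__main__":
    zones = read_thermal_zones()
    if not zones:
        print("no ACPI thermal zones exposed by this kernel/BIOS")
    for path, celsius in sorted(zones.items()):
        print(f"{path}: {celsius:.1f} C")
```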

The BIOS sees the fans, knows their speeds, and lists the three temperature zones. It shouldn't be too difficult to just read what it is already capturing, right?

First up: lm-sensors. I did the install, configured and compiled the modules, then modprobe'd them and ran it! ...Or not, because even though it finds 1 of the 6 fans in the case, it doesn't know what to do with it, and of course my motherboard is not supported for anything else....

Next up: MBmon. This is the program I have had the best results with in the past... and once again my motherboard is not recognized.

OK, let's try GKrellM. Can we guess the results? Not supported. Duh.... :x

OK, if the kernel can figure out that the temperature is too high, why can't at least one of three of the more popular CPU temperature monitoring programs figure it out? More importantly, does anyone know how to get this information out of the kernel? It has apparently figured it out, but it just isn't saying (or I am looking in the wrong spot).
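The part the kernel does expose is the throttle counter, which appears to be the same number as the "total events" figure in the log message. On many x86 kernels it is under /sys/devices/system/cpu/cpuN/thermal_throttle/. The attribute name is an assumption that depends on the kernel version: older kernels seem to use "count" and newer ones "core_throttle_count". That is why this sketch tries both. Note that it is an event count, not a temperature.

```python
import glob
import os
from typing import Dict

# Assumption: older kernels name the counter "count", newer ones
# "core_throttle_count"; check which one your kernel provides.
COUNTER_NAMES = ("core_throttle_count", "count")


def throttle_counts(root: str = "/") -> Dict[str, int]:
    """Return {cpu_name: throttle_event_count} read from sysfs under root."""
    counts = {}
    pattern = os.path.join(root, "sys/devices/system/cpu/cpu[0-9]*/thermal_throttle")
    for directory in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(directory))  # e.g. "cpu0"
        for name in COUNTER_NAMES:
            path = os.path.join(directory, name)
            try:
                with open(path) as f:
                    counts[cpu] = int(f.read().strip())
                break  # use the first counter name that exists
            except (OSError, ValueError):
                continue
    return counts


if __name__ == "__main__":
    for cpu, count in throttle_counts().items():
        print(f"{cpu}: {count} throttle events recorded")
```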

OK, well, it's dinner time, so I will start searching Google again when I get back. In the meantime, if anyone has any ideas, please let me know.

Thanks!
~Stack~

Re: monitoring CPU temp

Posted: Tue Jan 22, 2008 9:10 pm
by Randy
Have you tried running the sensors-detect script, as root, to see what hardware sensors it will detect?

Admittedly, even with the script, it's still kind of hit or miss. You can make a copy of its output and manually load any modules it recommends to test them out before putting that output in /etc/modprobe.conf (or wherever your distro looks for module info).

After you manually load the modules, just type sensors and see what kind of output you get. If it displays the values you're after, you may need to delve further into the correct formulas for computing the thresholds for your system. Look for a sensors.conf file somewhere on your system.
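Once a sensor module is loaded, the raw values that sensors formats usually also show up in sysfs. On current kernels that is under /sys/class/hwmon/hwmonN/, and some older drivers put them in hwmonN/device/. Temperatures are in millidegrees Celsius and fans are in RPM. This minimal Python sketch dumps those raw readings. Because it bypasses sensors.conf, none of the correction formulas mentioned above are applied.

```python
import glob
import os
from typing import Dict


def read_hwmon(root: str = "/") -> Dict[str, float]:
    """Return {path: value} for temp*_input (deg C) and fan*_input (RPM) files."""
    readings = {}
    base = os.path.join(root, "sys/class/hwmon")
    # Some older drivers put the attributes in the "device" subdirectory.
    patterns = ["hwmon*/temp*_input", "hwmon*/fan*_input",
                "hwmon*/device/temp*_input", "hwmon*/device/fan*_input"]
    for pattern in patterns:
        for path in glob.glob(os.path.join(base, pattern)):
            try:
                with open(path) as f:
                    raw = int(f.read().strip())
            except (OSError, ValueError):
                continue  # skip sensors that return an error when read
            name = os.path.basename(path)
            # Temperatures are millidegrees C; fan speeds are already RPM.
            readings[path] = raw / 1000.0 if name.startswith("temp") else float(raw)
    return readings


if __name__ == "__main__":
    for path, value in sorted(read_hwmon().items()):
        print(f"{path}: {value}")
```

If nothing but a fan shows up here, the loaded driver simply isn't reporting any temperature channels for that chip.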

Re: monitoring CPU temp

Posted: Tue Jan 22, 2008 10:06 pm
by stack
Yeah, I ran sensors-detect and it only detected the fan. The module loaded into the kernel just fine, but when I run sensors I get an error reading the fan. But I don't care about the fan :P .

I have done some more research and I still can't figure out how the kernel is determining the temperature. I did confirm my suspicion that the motherboard is not supported, so these other programs are not going to work :x

If I can figure out what the kernel is seeing, then maybe I can figure out a solution. So far I am not finding much... guess it's time to start reading the code....
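A likely explanation of what the kernel is "seeing" on an Intel CPU is that it never reads a temperature at all. The processor's built-in Thermal Monitor raises an interrupt when the die crosses its threshold. The kernel then checks the status bits in the IA32_THERM_STATUS model-specific register (0x19C) and logs the throttle/normal messages. On a P4 that register is believed to hold only "hot now" and "was hot" flags. A digital temperature readout field came with later Core-era CPUs. Treat that as an assumption to verify against Intel's documentation for your exact model. This sketch reads and decodes the two flags through the msr driver. It needs root and the msr module loaded.

```python
import os
import struct
from typing import Dict

IA32_THERM_STATUS = 0x19C  # Intel architectural MSR address


def decode_therm_status(value: int) -> Dict[str, bool]:
    """Decode the two low status bits of IA32_THERM_STATUS."""
    return {
        # Bit 0: the processor is currently above its thermal threshold.
        "hot_now": bool(value & 0x1),
        # Bit 1: sticky log bit, set if the threshold was crossed since last cleared.
        "was_hot": bool(value & 0x2),
    }


def read_msr(cpu: int, register: int) -> int:
    """Read one 64-bit MSR via /dev/cpu/N/msr (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, register)  # the file offset selects the MSR
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    try:
        print(decode_therm_status(read_msr(0, IA32_THERM_STATUS)))
    except OSError as err:
        print(f"could not read MSR: {err}")
```

If this holds for your CPU, it would explain why the log can say "above threshold" without ever giving a number.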