Profiling is an essential technique for assessing software application performance. It provides detailed insights into various aspects, such as function call frequency, routine execution times, and the duration spent in different code segments. This information helps identify performance bottlenecks and inefficient sections, enabling effective improvements.
In this guide, we will focus on measuring the relationship between performance and thermal metrics. This approach helps us understand the impact of system load on thermal performance, which is crucial for developing reliable products. By understanding how thermal performance changes over time, we can make informed decisions to improve system reliability.
There are numerous profiling tools available, including summary and sampling-based tools like perf. Some tools also provide a UI for visualization. However, for this guide, we will use Python scripts to log thermal data quickly and gain insights into how temperatures change over time.
The objective is to log all critical data into a single CSV file, which can later be processed with Python's matplotlib. The two Python scripts provided are minimal: the logger needs only the Python standard library and the top command, and the plotting script needs pandas, SciPy, and matplotlib.
The Logger
This script records CPU usage, temperatures, and CPU and GPU frequencies once per second. Note that the thermal zones, CPU frequency nodes, and GPU devfreq node vary depending on your hardware and system drivers, so adjust the script to match your system's configuration.
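Because the zone numbers and CPU indices are board-specific, it can help to list what your system exposes before editing the logger. The sketch below walks the standard Linux sysfs locations `/sys/class/thermal/thermal_zone*/type` and `/sys/devices/system/cpu/cpufreq/policy*/related_cpus`. The function names and the `root` parameters (which let you point the lookup at a copy of sysfs) are illustrative and not part of the original scripts.

```python
import glob
import os

def list_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_number: zone_type} for every thermal_zone* directory under root."""
    zones = {}
    for path in glob.glob(os.path.join(root, "thermal_zone*")):
        suffix = os.path.basename(path)[len("thermal_zone"):]
        if not suffix.isdigit():
            continue
        # The "type" file names the sensor, e.g. a CPU cluster or GPU zone.
        with open(os.path.join(path, "type")) as f:
            zones[int(suffix)] = f.read().strip()
    return dict(sorted(zones.items()))

def list_cpufreq_policies(root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: [cpu indices]} read from each policy's related_cpus file."""
    policies = {}
    for path in sorted(glob.glob(os.path.join(root, "policy*"))):
        # CPUs listed together share one clock, so one index per policy is enough.
        with open(os.path.join(path, "related_cpus")) as f:
            policies[os.path.basename(path)] = [int(c) for c in f.read().split()]
    return policies

if __name__ == "__main__":
    for zone, kind in list_thermal_zones().items():
        print(f"thermal_zone{zone}: {kind}")
    for name, cpus in list_cpufreq_policies().items():
        print(f"{name}: CPUs {cpus}")
```

The zone types tell you which numbers to pass to `get_cpu_thermal()`, and the policy groups suggest one representative CPU index per frequency cluster.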
Usage
Run the following logger on your target machine. Reading cpuinfo_cur_freq usually requires root privileges, so you may need to run it with sudo:
python {path_to_script}/thermal_logger.py
It logs to the /tmp folder:
/tmp/temperature_curve{date_and_time}.csv
thermal_logger.py
#!/usr/bin/env python3
import os
import time

# Thermal zone numbers, CPU indices, and the GPU devfreq node used below are
# board-specific; adjust them to match your hardware and drivers.

def get_cpu_usage():
    # Read the "%Cpu(s)" summary line from one batch-mode iteration of top.
    output = os.popen('top -bn1').read()
    cpu_line = [line for line in output.split('\n') if 'Cpu(s)' in line][0]
    # In the "%Cpu(s): x us, x sy, x ni, x id," layout, token 7 is the idle percentage.
    cpu_usage = 100 - float(cpu_line.split()[7].strip('%id,'))
    return round(cpu_usage, 2)

def get_cpu_thermal(zone):
    # Thermal zones report millidegrees Celsius.
    thermal_file = f"/sys/class/thermal/thermal_zone{zone}/temp"
    with open(thermal_file) as file:
        thermal_temp = int(file.read().strip())
    return round(thermal_temp / 1000, 1)

def get_cpu_frequency(cpu):
    # cpufreq reports kHz; convert to GHz. cpuinfo_cur_freq is usually readable only by root.
    with open(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/cpuinfo_cur_freq") as file:
        frequency = int(file.read().strip())
    return round(frequency / 1000000, 2)

def get_gpu_usage_and_frequency():
    # Expects the devfreq "load" node to contain "<load>@<frequency>Hz".
    with open("/sys/class/devfreq/fb000000.gpu/load") as file:
        gpu_info = file.read().strip('Hz\n').split('@')
    # Return the load in percent and the frequency converted from Hz to GHz.
    return round(int(gpu_info[0]), 2), round(int(gpu_info[1]) / 1000000000, 2)

def get_npu_usage_and_frequency():
    # Hypothetical placeholder: the NPU readers are not defined in this guide.
    # Replace the NaN values with reads from your board's NPU nodes; until then
    # the NPU frequency and usage columns are logged as "nan".
    nan = float("nan")
    return nan, [nan, nan, nan]

def main():
    filename = "/tmp/temperature_curve" + time.strftime('-%m-%d-%H-%M-%S') + ".csv"
    print("Saving thermal log to:")
    print(filename)
    header = (
        "Timestamp,CPU Usage(%),CPU Bigcore Thermal(C),CPU Littlecore Thermal(C),"
        "GPU Thermal(C),NPU Thermal(C),CPU L1-Core Frequency(GHZ),CPU L2-Core Frequency(GHZ),"
        "CPU S1-Core Frequency(GHZ),CPU S2-Core Frequency(GHZ),GPU Usage(%),GPU Frequency(GHZ),"
        "NPU Frequency(GHZ),NPU1 Usage(%),NPU2 Usage(%),NPU3 Usage(%)"
    )
    with open(filename, "w") as file:
        print(header, file=file)
        file.flush()
        while True:
            try:
                timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
                cpu_usage = get_cpu_usage()
                cpu_bigcore_thermal = get_cpu_thermal(1)
                cpu_littlecore_thermal = get_cpu_thermal(3)
                gpu_thermal = get_cpu_thermal(5)
                npu_thermal = get_cpu_thermal(6)
                # cpu0/cpu2 feed the S (small-core) columns, cpu4/cpu6 the L (large-core) columns.
                s1_cpu_frequency = get_cpu_frequency(0)
                s2_cpu_frequency = get_cpu_frequency(2)
                l1_cpu_frequency = get_cpu_frequency(4)
                l2_cpu_frequency = get_cpu_frequency(6)
                gpu_usage, gpu_frequency = get_gpu_usage_and_frequency()
                npu_frequency, npu_usage = get_npu_usage_and_frequency()
                log_entry = (
                    f"{timestamp},{cpu_usage:.2f},{cpu_bigcore_thermal:.2f},{cpu_littlecore_thermal:.2f},"
                    f"{gpu_thermal:.2f},{npu_thermal:.2f},{l1_cpu_frequency:.2f},{l2_cpu_frequency:.2f},"
                    f"{s1_cpu_frequency:.2f},{s2_cpu_frequency:.2f},{gpu_usage:.2f},{gpu_frequency:.2f},"
                    f"{npu_frequency:.2f},{npu_usage[0]:.2f},{npu_usage[1]:.2f},{npu_usage[2]:.2f}"
                )
                print(log_entry, file=file)
                file.flush()
            except Exception as e:
                # Report the failed sample and keep logging.
                print(e)
            time.sleep(1)

if __name__ == "__main__":
    main()
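`get_cpu_usage()` depends on the column layout of top's summary line, which differs between top versions and can run fields together when idle reaches 100.0 (for example `ni,100.0 id,`). A minimal alternative, assuming a standard Linux `/proc/stat`, computes usage from two snapshots of the aggregate `cpu` line: the first eight fields are user, nice, system, idle, iowait, irq, softirq, and steal (guest time is already counted in user), and usage is the non-idle share of the jiffies that elapsed between snapshots. The helper names are hypothetical, and counting iowait as idle is a design choice.

```python
import time

def read_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:9]]  # user .. steal
            idle = values[3] + values[4]  # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")

def cpu_usage_between(before, after):
    """Percentage of non-idle time between two (idle, total) snapshots."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return round(100.0 * (total_delta - idle_delta) / total_delta, 2)

def get_cpu_usage_procstat(interval=0.5):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return cpu_usage_between(before, after)
```

Swapping `get_cpu_usage_procstat()` in for `get_cpu_usage()` adds the sampling interval to each loop iteration, so you may want to shorten the logger's `time.sleep(1)` by the same amount.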
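Captured Data Example
Note that this sample's columns (four CPU thermal readings and one CPU frequency) differ from the header that thermal_logger.py writes; a log produced by the script above uses the script's header instead.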
Timestamp,CPU Usage(%),CPU Thermal 0(C),CPU Thermal 1(C),CPU Thermal 2(C),CPU Thermal 3(C),CPU Frequency(GHZ)
2024-04-17 15:56:26,9.30,60.10,60.10,60.10,60.10,0.60
2024-04-17 15:56:28,6.20,60.10,60.10,59.20,60.10,0.41
2024-04-17 15:56:29,6.30,60.10,60.10,60.10,60.10,0.41
.....
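For a quick numeric summary of a log, for example on a headless target without matplotlib, the standard csv module is enough. The hypothetical `summarize_log` helper below reports the peak of every temperature column and how many samples reached a threshold. The 90 °C default mirrors the "High Temp Warning" line in thermal_plot.py and is an assumption to adjust for your platform. Because it selects any column whose name ends in `(C)`, it reads both the sample above and the logger's own output.

```python
import csv

def summarize_log(lines, threshold=90.0):
    """Return {column: (max_temp, samples_at_or_above_threshold)} for every '(C)' column."""
    reader = csv.DictReader(lines)
    temp_columns = [name for name in (reader.fieldnames or []) if name.endswith("(C)")]
    summary = {name: (float("-inf"), 0) for name in temp_columns}
    for row in reader:
        for name in temp_columns:
            value = row.get(name)
            if not value:
                continue  # skip missing cells, e.g. a truncated last line
            temp = float(value)
            peak, hot = summary[name]
            summary[name] = (max(peak, temp), hot + 1 if temp >= threshold else hot)
    return summary
```

Usage: `with open(path, newline="") as f: print(summarize_log(f))`.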
Run Your Process to Increase System Load
Run the workload you want to profile to increase system usage and temperature. This can be a real application or a stress-testing tool. Examples of stress tests include:
- CPU stress tests
- Memory stress tests
- GPU stress tests
- I/O stress tests
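A CPU stress test can be approximated from Python when no dedicated tool is installed. This sketch (the names are illustrative) starts one busy-looping process per CPU with the standard multiprocessing module and lets them exit after a fixed duration. It assumes Linux, because it requests the "fork" start method explicitly, and it loads only the CPU, not memory, GPU, or I/O, so dedicated tools remain the better choice for those.

```python
import multiprocessing
import os
import time

def _spin(deadline):
    """Busy-loop on integer arithmetic until the monotonic clock passes deadline."""
    x = 0
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1000003

def cpu_stress(duration_s, workers=None):
    """Keep `workers` processes (default: one per CPU) busy for about duration_s seconds.

    Uses the "fork" start method, so this sketch targets Linux.
    Returns the exit code of each worker.
    """
    ctx = multiprocessing.get_context("fork")
    workers = workers or os.cpu_count() or 1
    deadline = time.monotonic() + duration_s
    procs = [ctx.Process(target=_spin, args=(deadline,)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

For example, calling `cpu_stress(300)` in one terminal while thermal_logger.py runs in another keeps every core busy for about five minutes.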
Plot the Data
After stopping the logger with Ctrl+C, plot the captured log:
python thermal_plot.py {logged_file_name}.csv
thermal_plot.py
import argparse
import os
import pandas as pd
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

def plot_data(log_path):
    data = pd.read_csv(log_path, parse_dates=['Timestamp'])
    fig, (ax_temperatures, ax_usage, ax_frequency) = plt.subplots(3, 1, figsize=(25, 18))
    # Add the log name and start time above the subplots
    fig.text(0.5, 0.95, "Data Name: " + str(log_path), ha='center', fontsize=20, fontweight='bold')
    fig.text(0.5, 0.92, "Log Time: " + str(data['Timestamp'][0]), ha='center', fontsize=20, fontweight='bold')

    # Plot CPU, GPU, and NPU temperatures
    ax_temperatures.plot(data['Timestamp'], data['CPU Bigcore Thermal(C)'], label='CPU L1-Core')
    ax_temperatures.plot(data['Timestamp'], data['CPU Littlecore Thermal(C)'], label='CPU S1-Core')
    ax_temperatures.plot(data['Timestamp'], data['GPU Thermal(C)'], label='GPU')
    ax_temperatures.plot(data['Timestamp'], data['NPU Thermal(C)'], label='NPU')
    ax_temperatures.set_ylabel('Temperature (C)')
    ax_temperatures.set_ylim(40, 120)
    # Reference lines; set them to your platform's thermal trip points
    ax_temperatures.axhline(y=90, color='orange', linestyle='--', label='High Temp Warning')
    ax_temperatures.axhline(y=115, color='r', linestyle='--', label='Panic Shutdown Temp')
    ax_temperatures.legend()
    ax_temperatures.set_title('CPU, GPU, and NPU Temperatures')

    # Plot CPU and GPU usage
    ax_usage.plot(data['Timestamp'], data['CPU Usage(%)'], label='CPU Usage')
    ax_usage.plot(data['Timestamp'], data['GPU Usage(%)'], label='GPU Usage')
    ax_usage.set_ylabel('Usage %')
    ax_usage.set_ylim(0, 105)
    # Add a dashed line at 80% usage
    ax_usage.axhline(y=80, color='orange', linestyle='--', label='High Load')
    ax_usage.legend()
    ax_usage.set_title('CPU and GPU Usage in %')

    # Smooth CPU frequencies with a Savitzky-Golay filter (11-sample window, cubic fit);
    # with SciPy's default mode this needs at least 11 rows in the log
    l1_smoothed_frequency = savgol_filter(data['CPU L1-Core Frequency(GHZ)'], window_length=11, polyorder=3)
    l2_smoothed_frequency = savgol_filter(data['CPU L2-Core Frequency(GHZ)'], window_length=11, polyorder=3)
    s1_smoothed_frequency = savgol_filter(data['CPU S1-Core Frequency(GHZ)'], window_length=11, polyorder=3)
    s2_smoothed_frequency = savgol_filter(data['CPU S2-Core Frequency(GHZ)'], window_length=11, polyorder=3)

    # Plot smoothed CPU frequencies and the raw GPU frequency
    ax_frequency.plot(data['Timestamp'], l1_smoothed_frequency, label='CPU L1-Core Frequency(GHZ)')
    ax_frequency.plot(data['Timestamp'], l2_smoothed_frequency, label='CPU L2-Core Frequency(GHZ)')
    ax_frequency.plot(data['Timestamp'], s1_smoothed_frequency, label='CPU S1-Core Frequency(GHZ)')
    ax_frequency.plot(data['Timestamp'], s2_smoothed_frequency, label='CPU S2-Core Frequency(GHZ)')
    ax_frequency.plot(data['Timestamp'], data['GPU Frequency(GHZ)'], label='GPU Frequency(GHZ)')
    ax_frequency.axhline(y=0.4, color='r', linestyle='--', label='Thermal Throttle Line')
    ax_frequency.set_ylabel('Frequency (GHz)')
    # Set the y-axis range to 0-3 GHz
    ax_frequency.set_ylim(0, 3)
    ax_frequency.legend()
    ax_frequency.set_title('CPU (Large/Small Core) and GPU Frequency (GHz)')

    # Save the plot as a PNG file next to the log
    plot_file_path = os.path.splitext(log_path)[0] + '.png'
    plt.savefig(plot_file_path)
    # Show the plot
    plt.show()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('log_path', help='Path to the log file')
    args = parser.parse_args()
    try:
        plot_data(args.log_path)
    except Exception as e:
        print("Parse/Plot error:", e)

if __name__ == "__main__":
    main()
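thermal_plot.py calls `savgol_filter(..., window_length=11, polyorder=3)`, and with SciPy's default `mode='interp'` the window may not exceed the number of samples, so a log shorter than 11 rows raises an error. The window must also be larger than `polyorder`. One way to handle short captures, sketched here as a hypothetical helper, is to shrink the window to the largest odd length the data allows and skip smoothing when no valid window remains.

```python
def savgol_window(n_samples, preferred=11, polyorder=3):
    """Return the largest odd window <= min(preferred, n_samples) that exceeds polyorder.

    Returns None when no such window exists (too few samples to smooth).
    """
    window = min(preferred, n_samples)
    if window % 2 == 0:
        window -= 1  # keep the window length odd
    return window if window > polyorder else None
```

Inside `plot_data` you might compute `window = savgol_window(len(data))` once, call `savgol_filter(column, window_length=window, polyorder=3)` only when `window` is not None, and plot the raw column otherwise.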
Output Plot
The example plot below demonstrates the thermal behavior of the system. Initially, the temperature rises until it reaches the thermal limit. At this point, the frequency is reduced to prevent overheating. The plot also shows how the system dissipates heat through passive cooling.
Conclusion
This guide demonstrated how to profile your system’s thermal and performance characteristics using Python scripts. By logging CPU and GPU usage, thermal metrics, and frequency changes, you can identify performance bottlenecks and improve system reliability. Visualizing the data helps understand the relationship between system load and thermal performance, aiding in heat management and avoiding thermal throttling.
What’s Next:
If your system suffers from throttling, it's advisable to conduct further testing with a power meter. Connect the system through a power meter, or measure the supply current with an ammeter and multiply it by the supply voltage, and log the power consumption data alongside the thermal metrics. By analyzing the relationship between power usage and thermal performance, you can gain valuable insights to enhance hardware thermal designs and optimize software for better system efficiency.
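If the meter reports supply voltage and current rather than watts, power is P = V × I. The sketch below pairs each thermal-log timestamp with the nearest power sample so the two series can be compared or plotted together. The function names and the power-sample format, a time-sorted list of `(seconds, volts, amps)` tuples, are assumptions; nearest-sample matching is one simple alignment choice, and interpolation would be another.

```python
import bisect
from datetime import datetime

def to_seconds(timestamp):
    """Convert the logger's 'YYYY-MM-DD HH:MM:SS' timestamp to POSIX seconds (local time)."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").timestamp()

def align_power(thermal_times, power_samples):
    """For each thermal time (seconds), return watts from the nearest (t, volts, amps) sample.

    power_samples must be non-empty and sorted by time.
    """
    times = [t for t, _, _ in power_samples]
    watts = [v * a for _, v, a in power_samples]  # P = V * I
    aligned = []
    for t in thermal_times:
        i = bisect.bisect_left(times, t)
        # Pick whichever neighbour (i - 1 or i) is closer; ties go to the earlier sample.
        if i == len(times) or (i > 0 and t - times[i - 1] <= times[i] - t):
            i -= 1
        aligned.append(watts[i])
    return aligned
```

Converting each log row's Timestamp with `to_seconds()` and passing the results to `align_power()` yields a power column that lines up row for row with the thermal log.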