Disk I/O (input/output) problems can significantly slow a Linux server, especially under resource-heavy workloads such as databases or file servers. Troubleshooting them means combining several tools and techniques to pinpoint bottlenecks, misconfigurations, or hardware failures. The steps below walk through that process.
1. Check Disk Usage with df
Before diving deeper into I/O-specific troubleshooting, check if the disk is full or nearing full capacity, as this could impact performance.
$ df -h
This command shows disk space usage for all mounted filesystems.
Example output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   30G   18G  63% /
tmpfs            16G  1.6G   15G  10% /dev/shm
/dev/sdb1       100G   25G   70G  27% /data
If the disk usage is high (especially the "Use%" value), consider freeing up space or expanding storage.
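When many filesystems are mounted, scanning the Use% column by eye is error-prone. The following is a minimal sketch, not a definitive script: the function name flag_full_filesystems and the default threshold of 80 are assumptions chosen for illustration. It reads the output of df -P (the POSIX layout, which keeps each filesystem on one line) and prints the mount points at or above the threshold.

```shell
# Print filesystems whose usage is at or above a threshold (default 80).
# Reads `df -P` output on stdin. The function name and the default
# threshold are illustrative assumptions.
flag_full_filesystems() {
    awk -v limit="${1:-80}" '
        NR == 1 { next }                 # skip the header row
        {
            pct = $5                     # fifth column is Capacity, e.g. "63%"
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0)
                print $6, pct "%"        # mount point and usage
        }
    '
}
```

A typical call would be df -P | flag_full_filesystems 80. Because the sketch prints only the sixth field, a mount point containing spaces would be truncated.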
2. Monitor Disk I/O with iostat
The iostat command, provided by the sysstat package, gives detailed information about disk performance, including read/write throughput, I/O operations per second, and CPU utilization.
$ iostat -x 5
The -x option gives extended statistics, and the 5 specifies a 5-second interval between reports.
Example output:
Linux 5.4.0-74-generic (hostname) 12/26/2024 _x86_64_ (8 CPU)
Device   r/s   w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  svctm  %util
sda     10.1   8.2   1040    750     0.0     0.0    0.1    0.1     10.0     15.0    1.2   25.0
sdb      3.5   2.2    400    250     0.0     0.0    0.1    0.0     30.0     35.0    1.5   10.0
- r/s: Reads per second
- w/s: Writes per second
- rkB/s: Kilobytes read per second
- wkB/s: Kilobytes written per second
- %util: Percentage of time the device was busy (a high value suggests the disk is heavily utilized)
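Look for %util values that stay high, or for large r_await and w_await values (the average time in milliseconds that read and write requests take to complete). Either one indicates a potential bottleneck.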
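To watch for busy devices without reading every report, the output can be filtered. This is a hedged sketch: the function name flag_busy_devices and the default threshold of 80 are assumptions for illustration. It finds the %util column by its header name instead of a fixed position, because the set of columns differs between sysstat versions.

```shell
# Print devices whose %util is at or above a threshold (default 80).
# Reads `iostat -x` output on stdin. The function name and threshold
# are illustrative assumptions.
flag_busy_devices() {
    awk -v limit="${1:-80}" '
        $1 == "Device" || $1 == "Device:" {
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "%util") col = i   # remember where %util sits
            next
        }
        NF == 0 { col = 0; next }            # a blank line ends a device table
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}
```

A typical call would be LC_ALL=C iostat -x 5 | flag_busy_devices 80. Setting LC_ALL=C keeps the decimal separator a dot so awk compares the numbers correctly. Note that the first report iostat prints covers the time since boot, so only later reports reflect current activity.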
3. Check for Disk Errors Using dmesg
Disk errors can significantly degrade performance. Use the dmesg command to check for any system messages related to disk errors or I/O issues.
$ dmesg | grep -i error
If there are disk-related errors, they will typically appear here, including issues like I/O timeouts or hardware failures.
Example output:
[42615.217683] sd 2:0:0:0: [sda] Unhandled sense code
[42615.217707] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[42615.217719] sd 2:0:0:0: [sda] Sense Key : Hardware Error [current]
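A plain grep for "error" also matches messages from unrelated subsystems such as USB or networking. The sketch below narrows the search to lines that look storage-related. The function name filter_disk_messages is hypothetical, and the pattern list is an assumption for illustration, not an exhaustive catalogue of kernel messages.

```shell
# Keep only kernel log lines that look storage-related.
# Reads `dmesg` output on stdin. The function name and the pattern list
# are illustrative assumptions; extend the patterns for your hardware.
filter_disk_messages() {
    grep -iE 'i/o error|medium error|hardware error|unhandled sense|blk_update_request|ata[0-9]+.*(error|timeout|failed)|nvme.*(timeout|error)'
}
```

A typical call would be sudo dmesg -T | filter_disk_messages, where -T prints human-readable timestamps. On systems that restrict the kernel log, dmesg needs root privileges.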
4. Measure Disk Latency with blktrace
blktrace is a low-level tool that traces block layer I/O operations. This tool provides insights into how long it takes for the system to read or write data to the disk.
- Install blktrace (Debian/Ubuntu):
$ sudo apt-get install blktrace
- Start tracing the disk (replace /dev/sda with your device):
$ sudo blktrace -d /dev/sda -o - | blkparse -i -
This will produce detailed information about I/O operations and their latencies.
Example output:
0,0 10.984523 563 I/O 4096 READ
0,0 10.984731 564 I/O 4096 WRITE
0,0 10.985035 565 I/O 4096 READ
Compare the timestamps of events that belong to the same request. Large gaps indicate that disk operations are taking longer than usual.
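Matching events by hand is tedious, so the gap between a request being issued and completing can be computed with awk. This is a rough sketch under stated assumptions: it expects blkparse's default per-event layout (device, CPU, sequence number, timestamp, PID, action, RWBS, sector, "+", block count), which carries more fields than the abbreviated sample above, and it pairs events by device and starting sector only. The function name dispatch_to_complete_latency is hypothetical.

```shell
# Report issue-to-completion latency per request.
# Reads default `blkparse` text output on stdin, assumed layout:
#   dev cpu seq timestamp pid action rwbs sector + blocks ...
# Action D = request issued to the driver, C = request completed.
dispatch_to_complete_latency() {
    awk '
        $6 == "D" && $9 == "+" { issued[$1, $8] = $4; next }
        $6 == "C" && $9 == "+" && (($1, $8) in issued) {
            # device, RWBS flags, sector, latency in milliseconds
            printf "%s %s %s %.3f ms\n", $1, $7, $8, ($4 - issued[$1, $8]) * 1000
            delete issued[$1, $8]
        }
    '
}
```

A typical call would be sudo blktrace -d /dev/sda -w 10 -o - | blkparse -i - | dispatch_to_complete_latency, where -w 10 stops the trace after 10 seconds. For deeper analysis, the btt tool shipped with blktrace produces latency statistics from the same traces.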
5. Analyze Disk Queue Length with sar
The sar command, part of the sysstat package, can show historical disk performance metrics, including disk queue length.
$ sar -d 5 5
This will display disk activity every 5 seconds for 5 intervals, including the average queue length.
Example output:
Linux 5.4.0-74-generic (hostname) 12/26/2024 _x86_64_ (8 CPU)
Time       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
15:30:01  30.0      1024      1024      68.0      0.20   12.0   10.0   25.0
15:30:06  28.0       900      1000      60.0      0.15   11.0    9.0   22.0
- avgrq-sz: Average request size
- avgqu-sz: Average queue size
- await: Average wait time per request
- %util: Percentage of time the disk is busy
A high average queue size or wait time can indicate disk performance issues.
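The queue-size check can be automated in the same spirit. The sketch below is illustrative only: the function name flag_deep_queues and the default threshold of 1 are assumptions. It looks up the queue column by header name, since older sysstat releases label it avgqu-sz while newer ones label it aqu-sz.

```shell
# Print sar -d rows whose average queue size is at or above a threshold
# (default 1). Reads `sar -d` output on stdin. The function name and
# threshold are illustrative assumptions.
flag_deep_queues() {
    awk -v limit="${1:-1}" '
        {
            # A header row names the queue column; remember its position.
            for (i = 1; i <= NF; i++)
                if ($i == "avgqu-sz" || $i == "aqu-sz") { col = i; next }
        }
        col && NF >= col && $col + 0 >= limit + 0 { print }
    '
}
```

A typical call would be LC_ALL=C sar -d 5 5 | flag_deep_queues 1. What counts as a deep queue depends on the device, so the threshold is something to tune, not a fixed rule.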
7. Check Disk Health with smartctl
Disk failures or health issues can cause I/O performance degradation. Use smartctl from the smartmontools package to check the health of your disks.
$ sudo smartctl -a /dev/sda
Example output:
SMART Status: OK
Temperature: 38 C (good)
Reallocated_Sector_Ct: 0 (good)
Power_On_Hours: 2400 (good)
If there are any SMART errors, it could indicate a failing disk.
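A few attributes are commonly watched as early warning signs. The following sketch assumes the ATA attribute table printed by smartctl -A, where the second column is the attribute name and the tenth is the raw value. The function name flag_smart_attributes and the choice of three attributes are assumptions for illustration. NVMe and SCSI drives report health in a different layout, which this sketch does not handle.

```shell
# Print selected SMART attributes whose raw value is non-zero.
# Reads `smartctl -A` output on stdin (ATA attribute table: column 2 is
# the attribute name, column 10 the raw value). The function name and
# the attribute selection are illustrative assumptions.
flag_smart_attributes() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) print $2, $10
        }
    '
}
```

A typical call would be sudo smartctl -A /dev/sda | flag_smart_attributes. Empty output means none of the three counters is above zero. For the drive's overall self-assessment, sudo smartctl -H /dev/sda prints a single health result.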
8. Review Disk Configuration
If you're using software RAID or LVM, ensure that the configuration is optimal. Check for degraded RAID arrays or improperly configured volume groups that might affect performance.
For RAID, use:
$ cat /proc/mdstat
For LVM, use:
$ sudo vgs
$ sudo lvs
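In /proc/mdstat, each array's status line ends with a member map such as [UU]. An underscore in that map marks a missing or failed member, so [U_] means a two-disk array is running degraded. The sketch below lists arrays in that state. The function name list_degraded_md_arrays is hypothetical.

```shell
# List md arrays whose member map contains a missing device ("_").
# Reads /proc/mdstat-style text on stdin. A healthy two-disk mirror
# shows [UU]; a degraded one shows [U_] or [_U].
# The function name is an illustrative assumption.
list_degraded_md_arrays() {
    awk '
        /^md/ { name = $1 }                # array header, e.g. "md0 : active raid1 ..."
        /\[[U_]*_[U_]*\]/ { print name }   # member map with at least one "_"
    '
}
```

A typical call would be list_degraded_md_arrays < /proc/mdstat. Empty output means no array reports a missing member.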
Conclusion
By using these tools and techniques, you can diagnose and troubleshoot disk I/O performance issues on your Linux server. Start with basic checks like disk space, then move on to more advanced tools such as iostat, dmesg, and blktrace. Identifying the root cause will help you optimize disk performance and prevent future issues.