10 Troubleshooting Commands for Linux Systems
1. How to view processes consuming the most CPU?
$ ps H -eo pid,pcpu | sort -nk2 | tail
31396 0.6
31396 0.6
31396 0.6
31396 0.6
31396 0.6
31396 0.6
31396 0.6
31396 0.6
30904 1.0
30914 1.0
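Judged by a single thread, the most CPU-intensive PID is 30914 (30904 ties it at 1.0%). Voiceover: Actually, it’s 31396. The H option prints one row per thread, and 31396 appears eight times at 0.6% each, so its total is higher.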
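Because `ps H` reports each thread separately, a per-process total needs the rows added up. A minimal sketch, assuming the two-column `pid,pcpu` output shown above; the helper name `sum_cpu_by_pid` is made up for illustration:

```shell
# Hypothetical helper: add up per-thread %CPU rows into one total per PID.
# stdin: "PID %CPU" rows, e.g. from `ps H -eo pid,pcpu`; the header row is skipped.
sum_cpu_by_pid() {
  awk '$1 ~ /^[0-9]+$/ { cpu[$1] += $2 }
       END { for (p in cpu) printf "%s %.1f\n", p, cpu[p] }' |
    sort -k2,2nr -k1,1n
}

# Replay the sample rows from the listing above.
{
  for i in 1 2 3 4 5 6 7 8; do echo "31396 0.6"; done
  echo "30904 1.0"
  echo "30914 1.0"
} | sum_cpu_by_pid
```

On a live system the same helper would be fed with `ps H -eo pid,pcpu | sum_cpu_by_pid | head`, and the figures will of course differ.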
2. What is the service name corresponding to the PID of the most CPU-intensive process?
Method One:
$ ps aux | fgrep 30914
work 30914 1.0 0.8 309568 71668 ? Sl Feb02 124:44 ./router2 --conf=rs.conf
The process is ./router2.
Method Two:
$ ll /proc/30914
lrwxrwxrwx 1 work work 0 Feb 10 13:27 cwd -> /home/work/im-env/router2
lrwxrwxrwx 1 work work 0 Feb 10 13:27 exe -> /home/work/im-env/router2/router2
Voiceover: Great, the full path is all there.
3. How to check the connection status of a specific port?
Method One:
$ netstat -lap | fgrep 22022
tcp 0 0 1.2.3.4:22022 *:* LISTEN 31396/imui
tcp 0 0 1.2.3.4:22022 1.2.3.4:46642 ESTABLISHED 31396/imui
tcp 0 0 1.2.3.4:22022 1.2.3.4:46640 ESTABLISHED 31396/imui
Method Two:
$ /usr/sbin/lsof -i :22022
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
router 30904 work 50u IPv4 69065770 TCP 1.2.3.4:46638->1.2.3.4:22022 (ESTABLISHED)
router 30904 work 51u IPv4 69065772 TCP 1.2.3.4:46639->1.2.3.4:22022 (ESTABLISHED)
router 30904 work 52u IPv4 69065774 TCP 1.2.3.4:46640->1.2.3.4:22022 (ESTABLISHED)
4. How to check the number of connections on a machine?
The SSH daemon (sshd) on 1.2.3.4 is listening on port 22. How can we count the connections in each state (TIME_WAIT, CLOSE_WAIT, ESTABLISHED) for the sshd service on 1.2.3.4?
$ netstat -n | grep -F '1.2.3.4:22 ' | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
$ netstat -lnpta | grep ssh | grep -E "TIME_WAIT|CLOSE_WAIT|ESTABLISHED"
Note: netstat is a commonly used tool for tracing network connection issues, and it becomes especially powerful when combined with grep and awk.
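A text match on the address can be loosened or tightened by accident (port 22 is also a prefix of 22022), so comparing the local-address column exactly is a safer way to count. A minimal sketch, assuming `netstat -n` output where the local address is the fourth field and the state is the last; `count_tcp_states` is a hypothetical helper name and the sample rows are made up:

```shell
# Hypothetical helper: count TCP states for one exact local endpoint.
# stdin: `netstat -n` rows, where the state is the last field.
count_tcp_states() {
  awk -v ep="$1" '/^tcp/ && $4 == ep { n[$NF]++ }
                  END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample rows; the 22022 line must not be counted for port 22.
printf '%s\n' \
  'tcp 0 0 1.2.3.4:22 5.6.7.8:50000 ESTABLISHED' \
  'tcp 0 0 1.2.3.4:22 5.6.7.8:50001 ESTABLISHED' \
  'tcp 0 0 1.2.3.4:22 5.6.7.9:50002 TIME_WAIT' \
  'tcp 0 0 1.2.3.4:22022 1.2.3.4:46642 ESTABLISHED' |
  count_tcp_states 1.2.3.4:22
```

With real data the call would look like `netstat -n | count_tcp_states 1.2.3.4:22`.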
5. Querying data from backed-up logs
From the backed-up log service.2022-06-26.log.bz2, how many entries contain the keyword 1.2.3.4?
$ bzcat service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l
$ bzgrep '1.2.3.4' service.2022-06-26.log.bz2 | wc -l
$ less service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l
Note: Production log files are generally archived with bz2 compression. Decompressing them just to run a query wastes space and time, so bzcat and bzgrep are essential tools for developers to master. The less variant works only where less is configured with an input preprocessor such as lesspipe.
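One caveat when the keyword is an IP address: in a regular expression each dot matches any character, so the pattern 1.2.3.4 can overcount. A minimal sketch of a literal count using `grep -F`; `count_literal` is a hypothetical helper name and the log lines are made up:

```shell
# Hypothetical helper: count lines containing a literal string (no regex).
# stdin: decompressed log text, e.g. from `bzcat service.2022-06-26.log.bz2`.
count_literal() {
  grep -cF -- "$1"
}

# Made-up log lines: the second one matches the regex 1.2.3.4 but not the literal.
printf '%s\n' 'client 1.2.3.4 ok' 'client 1a2b3c4 ok' 'client 5.6.7.8 ok' |
  count_literal '1.2.3.4'
```

Against the real archive this would be `bzcat service.2022-06-26.log.bz2 | count_literal '1.2.3.4'`, which also avoids the extra `wc -l` process.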
6. Backup service tips
Pack up the /opt/web/service_web directory for backup, excluding the logs directory within it, and store the packed file in the /opt/backup directory.
$ tar -zcvf /opt/backup/service_web.tar.gz \
--exclude=/opt/web/service_web/logs \
/opt/web/service_web
Note: This command is commonly used in production. When a project is packed up for migration, the log directory usually needs to be left out, which makes the `--exclude` option essential to master.
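Before relying on an exclusion for a real backup, it is worth confirming on a throwaway tree that the archive really omits the logs. A minimal sketch, assuming a tar that supports `--exclude` and `-C` (GNU tar and bsdtar both do); `backup_without_logs` and the scratch paths are hypothetical:

```shell
# Hypothetical helper: archive a directory but leave out its logs/ subdirectory.
# $1 = directory to back up, $2 = path of the .tar.gz to create
backup_without_logs() {
  name=$(basename "$1")
  tar -czf "$2" --exclude="$name/logs" -C "$(dirname "$1")" "$name"
}

# Try it on a throwaway tree rather than on real data.
work=$(mktemp -d)
mkdir -p "$work/service_web/logs" "$work/service_web/conf"
echo 'noise'   > "$work/service_web/logs/app.log"
echo 'port=80' > "$work/service_web/conf/app.conf"
backup_without_logs "$work/service_web" "$work/service_web.tar.gz"
tar -tzf "$work/service_web.tar.gz"   # lists conf/app.conf, nothing under logs/
rm -rf "$work"
```

Using `-C` stores relative member names, so the archive can be unpacked anywhere without recreating /opt/web.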
7. Querying thread count
Count the total number of threads running on a server. When the thread count exceeds the alert threshold, you need to quickly identify the processes and threads responsible.
$ ps -eLf | wc -l
$ pstree -p | wc -l
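The totals above say how many threads exist, not who owns them. A minimal sketch that groups threads by process, assuming input with one PID per thread as printed by `ps -eL -o pid=`; `threads_per_pid` is a hypothetical helper name and the sample PIDs are made up:

```shell
# Hypothetical helper: turn one-PID-per-thread rows into "PID THREADS", busiest first.
# stdin: one PID per line, one line per thread, e.g. from `ps -eL -o pid=`.
threads_per_pid() {
  awk '{ print $1 }' | sort -n | uniq -c |
    sort -k1,1nr -k2,2n | awk '{ print $2, $1 }'
}

# Made-up sample: PID 100 has three threads, PID 200 has one.
printf '%s\n' 100 200 100 100 | threads_per_pid
```

On a live machine, `ps -eL -o pid= | threads_per_pid | head` would put the processes with the most threads at the top.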
8. Disk alarm, empty the largest file
A running Tomcat server has generated a large volume of exception logs. Find the offending file and free the space. Suppose the file name contains the keyword “log” and the file is larger than 1GB.
Step 1: Find the file.
$ find / -type f -name "*log*" | xargs ls -lSh | more
$ du -a / | sort -rn | grep log | more
$ find / -name '*log*' -size +1000M -exec du -h {} \;
Step 2: Empty the file.
Assuming the found file is a.log, the correct way to empty it is:
$ echo "" > a.log
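This truncates the file in place and releases its space immediately. Note that echo leaves a single newline in the file; `: > a.log` or `truncate -s 0 a.log` leaves it completely empty.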
Many people might use:
$ rm -rf a.log
This removes the directory entry, but while Tomcat still holds the file open the kernel keeps its data blocks allocated, so the space is not released. You would need to restart Tomcat (or otherwise make it close the file) to free up the space.
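The difference between truncating a file and writing a lone newline into it is easy to check on a scratch file. A minimal sketch; `empty_file` is a hypothetical helper name and the file is a temporary one, not a real log:

```shell
# Hypothetical helper: truncate a file in place.
# The inode stays the same, so a process that has it open keeps a valid descriptor.
empty_file() {
  : > "$1"
}

demo=$(mktemp)
printf 'line 1\nline 2\n' > "$demo"
empty_file "$demo"
wc -c < "$demo"      # size after truncation: 0 bytes
echo "" > "$demo"
wc -c < "$demo"      # size after echo "": 1 byte (the newline)
rm -f "$demo"
```

If a log has already been removed with rm while its writer is still running, `lsof +L1` lists open files whose link count has dropped to zero, which helps locate the process still holding the space.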
9. Display file, filter comments
Display the server.conf file, filtering out comment lines that start with #.
$ sed -n '/^[#]/!p' server.conf
$ sed -e '/^#/d' server.conf
$ grep -v "^#" server.conf
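The three commands above drop only lines whose first character is #. Config files often also contain indented comments and blank lines; a slightly broader filter, offered as a sketch rather than a replacement, handles those too. `strip_comments` is a hypothetical helper name and the sample config is made up:

```shell
# Hypothetical helper: drop blank lines and comment lines, including indented ones.
# stdin: config text, e.g. `strip_comments < server.conf`.
strip_comments() {
  grep -Ev '^[[:space:]]*(#|$)'
}

# Made-up config snippet.
printf '%s\n' '# main settings' '' 'port=80' '  # indented comment' 'host=a' |
  strip_comments
```

Lines that carry a trailing comment after a setting are left untouched, since stripping those safely depends on the file's quoting rules.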
10. Disk IO exception troubleshooting
When disk IO misbehaves, for example writes are slow or utilization is high, how do you identify the process ID responsible?
Step 1:
$ iotop -o
The -o option limits the display to processes that are actually performing disk IO, so the IDs of the heavy writers stand out.
Step 2: If the write rates are low and nothing is writing heavily, check the disk itself. Look at the kernel and system logs with
$ dmesg
or
$ cat /var/log/messages
to see if there are any disk-related error messages. You can also touch an empty file on the slow disk to check whether a disk fault is preventing writes.
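The touch test can be wrapped so that it cleans up after itself and reports a clear result. A minimal sketch; `probe_write` is a hypothetical helper name, and the scratch directory stands in for a directory on the suspect disk:

```shell
# Hypothetical helper: check that a directory on the suspect disk accepts a new file.
# $1 = directory on the disk to test; returns non-zero if the file cannot be created.
probe_write() {
  probe="$1/.write_probe.$$"
  touch "$probe" && rm -f "$probe"
}

# Example against a scratch directory; point it at the slow disk's mount point instead.
scratch=$(mktemp -d)
if probe_write "$scratch"; then
  echo "write ok"
else
  echo "write failed"
fi
rmdir "$scratch"
```

A successful probe only shows that small writes go through; it says nothing about throughput, so the log checks above still matter.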

