10 Troubleshooting Commands for Linux Systems

1. How to view processes consuming the most CPU?

$ ps H -eo pid,pcpu | sort -nk2 | tail
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
30904  1.0
30914  1.0

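Sorted by the second column, the last line puts PID 30914 at the top with 1.0%. Voiceover: Actually, it's 31396. The H flag lists every thread on its own line, and the eight threads of PID 31396 shown here add up to 4.8%.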
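To make the voiceover's point concrete, the per-thread figures can be summed by PID. The following is a minimal sketch, not part of the original commands: `sum_cpu_by_pid` is a hypothetical helper name, and the sample data is copied from the output above.

```shell
# Sum per-thread CPU percentages by PID (input: "PID %CPU" per line),
# then sort so the heaviest process comes first.
sum_cpu_by_pid() {
    awk '{ cpu[$1] += $2 }
         END { for (p in cpu) printf "%s %.1f\n", p, cpu[p] }' |
        sort -k2,2nr
}

# Sample copied from the listing above.
sample='31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
31396  0.6
30904  1.0
30914  1.0'

top=$(printf '%s\n' "$sample" | sum_cpu_by_pid | head -n 1)
echo "$top"
```

On a live system the same helper could be fed with `ps H -eo pid=,pcpu= | sum_cpu_by_pid | head`, where the trailing `=` signs suppress the header line.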

2. What is the service name corresponding to the PID of the most CPU-intensive process?

Method One:

$ ps aux | fgrep 30914
work 30914  1.0  0.8 309568 71668 ?  Sl   Feb02 124:44 ./router2 conf=rs.conf

The process is ./router2.

Method Two:

$ ls -l /proc/30914
lrwxrwxrwx  1 work work 0 Feb 10 13:27 cwd -> /home/work/im-env/router2
lrwxrwxrwx  1 work work 0 Feb 10 13:27 exe -> /home/work/im-env/router2/router2

Voiceover: Great, the full path is all there.
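If only the two paths are needed, Method Two can be narrowed down with readlink. This is a small sketch under the assumption that you are allowed to read the target process's /proc entries. `proc_paths` is a hypothetical helper name, and the demo inspects the current shell rather than PID 30914.

```shell
# Print the executable and working directory of a PID by resolving
# the symlinks under /proc/<pid>.
proc_paths() {
    pid=$1
    printf 'exe: %s\n' "$(readlink "/proc/$pid/exe")"
    printf 'cwd: %s\n' "$(readlink "/proc/$pid/cwd")"
}

# Demo on the current shell; replace "$$" with the PID you are chasing.
proc_paths "$$"
```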

3. How to check the connection status of a specific port?

Method One:

$ netstat -lap | fgrep 22022
tcp        0      0 1.2.3.4:22022          *:*                         LISTEN      31396/imui
tcp        0      0 1.2.3.4:22022          1.2.3.4:46642          ESTABLISHED 31396/imui
tcp        0      0 1.2.3.4:22022          1.2.3.4:46640          ESTABLISHED 31396/imui

Method Two:

$ /usr/sbin/lsof -i :22022
COMMAND   PID USER   FD   TYPE   DEVICE SIZE NODE NAME
router  30904 work   50u  IPv4 69065770       TCP 1.2.3.4:46638->1.2.3.4:22022 (ESTABLISHED)
router  30904 work   51u  IPv4 69065772       TCP 1.2.3.4:46639->1.2.3.4:22022 (ESTABLISHED)
router  30904 work   52u  IPv4 69065774       TCP 1.2.3.4:46640->1.2.3.4:22022 (ESTABLISHED)

4. How to check the number of connections on a machine?

The SSH daemon (sshd) on 1.2.3.4 is listening on port 22. How can we count the number of connections in various states (TIME_WAIT/ CLOSE_WAIT/ ESTABLISHED) for the sshd service on 1.2.3.4?

$ netstat -n | grep -F '1.2.3.4:22 ' | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

$ netstat -lnpta | grep ssh | grep -E "TIME_WAIT|CLOSE_WAIT|ESTABLISHED"

Note: netstat is a commonly used tool for tracing network connection issues. Combined with grep and awk, it becomes especially powerful.
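A plain text match on 1.2.3.4:22 can also pick up other ports that start with 22, such as 22022. One way to avoid that is to compare the local address column exactly inside awk. The sketch below assumes the usual `netstat -n` column order (protocol, Recv-Q, Send-Q, local address, foreign address, state). `count_states` is a hypothetical helper, and the sample lines are made up for illustration.

```shell
# Count TCP connection states for one exact local address:port.
# Reads `netstat -n` style lines on stdin; $1 is the address to match.
count_states() {
    awk -v addr="$1" '
        $1 ~ /^tcp/ && $4 == addr { n[$NF]++ }
        END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample input; the last line belongs to port 22022 and is ignored.
sample='tcp 0 0 1.2.3.4:22 5.6.7.8:50001 ESTABLISHED
tcp 0 0 1.2.3.4:22 5.6.7.8:50002 ESTABLISHED
tcp 0 0 1.2.3.4:22 5.6.7.9:50003 TIME_WAIT
tcp 0 0 1.2.3.4:22022 1.2.3.4:46642 ESTABLISHED'

result=$(printf '%s\n' "$sample" | count_states 1.2.3.4:22)
echo "$result"
```

On a real machine the input would come from `netstat -n | count_states 1.2.3.4:22`.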

5. Querying data from backed-up logs

How many entries in the backed-up log service.2022-06-26.log.bz2 contain the keyword 1.2.3.4?

$ bzcat service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l

$ bzgrep '1.2.3.4' service.2022-06-26.log.bz2 | wc -l

$ less service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l

Note: Production log files are usually kept compressed with bz2. Decompressing them just to run a query costs a lot of space and time, so bzcat and bzgrep are essential tools for developers to master. The less variant only works on systems where less is configured with an input preprocessor (such as lesspipe) that understands bz2.
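One detail worth knowing when counting IP addresses: in a grep pattern each dot matches any character, so '1.2.3.4' can over-count. The sketch below shows the difference between a regex match and a fixed-string match (`-F`), with `-c` doing the counting instead of `wc -l`. The log lines are invented for the demo and stand in for the output of bzcat.

```shell
# Invented log lines standing in for `bzcat service.2022-06-26.log.bz2`.
sample='2022-06-26 10:00:01 login from 1.2.3.4
2022-06-26 10:00:02 login from 1.2.3.4
2022-06-26 10:00:03 job 1020304 done'

# Regex match: "." matches any character, so "1020304" also counts.
regex_count=$(printf '%s\n' "$sample" | grep -c '1.2.3.4')

# Fixed-string match: only the literal text 1.2.3.4 counts.
fixed_count=$(printf '%s\n' "$sample" | grep -cF '1.2.3.4')

echo "regex: $regex_count, fixed: $fixed_count"
```

Against the compressed file, the equivalent would be `bzcat service.2022-06-26.log.bz2 | grep -cF '1.2.3.4'`.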

6. Backup service tips

Pack up the /opt/web/service_web directory for backup, excluding the logs directory within it, and store the packed file in the /opt/backup directory.

$ tar -zcvf /opt/backup/service_web.tar.gz \
    --exclude /opt/web/service_web/logs \
    /opt/web/service_web

Note: This command is commonly used in production. When a project has to be packed up and migrated, the log directory usually needs to be left out, so the `--exclude` option is essential to master.
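Before relying on a backup, it is worth listing the archive to confirm that the exclusion actually took effect. The following sketch does this in a throwaway directory. The file names under service_web are invented for the demo, and it uses `-C` with relative paths rather than the absolute paths in the command above.

```shell
# Build a small throwaway tree that mimics service_web with a logs directory.
work=$(mktemp -d)
mkdir -p "$work/service_web/logs" "$work/service_web/conf"
echo "port=8080" > "$work/service_web/conf/app.conf"
echo "noise"     > "$work/service_web/logs/app.log"

# Pack everything except the logs directory.
tar -czf "$work/service_web.tar.gz" -C "$work" \
    --exclude='service_web/logs' service_web

# List the archive contents to verify what was packed.
listing=$(tar -tzf "$work/service_web.tar.gz")
echo "$listing"

rm -rf "$work"
```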

7. Querying thread count

Query the total number of threads running on a server. When the thread count on the machine exceeds the alert threshold, you need to quickly identify the processes and threads responsible.

$ ps -eLf | wc -l

$ pstree -p | wc -l
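The totals above do not say which process owns the threads. On Linux, each thread of a process appears as an entry under /proc/<pid>/task, so a per-process count can be sketched as follows. `thread_count` and `top_thread_procs` are hypothetical helper names.

```shell
# Number of threads of one process: one entry per thread in /proc/<pid>/task.
thread_count() {
    ls "/proc/$1/task" 2>/dev/null | wc -l
}

# Print "threads pid" for the N processes with the most threads (default 5).
top_thread_procs() {
    for d in /proc/[0-9]*; do
        pid=${d#/proc/}
        printf '%s %s\n' "$(thread_count "$pid")" "$pid"
    done | sort -rn | head -n "${1:-5}"
}

top_thread_procs 5
```

With procps, `ps -eo nlwp,pid,comm --sort=-nlwp | head` gives a similar view in one command.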

8. Disk alarm, empty the largest file

A running Tomcat server has generated a large volume of exception logs. Find the file and free the space it occupies. Assume the file name contains the keyword “log” and the file is larger than 1 GB.

Step 1: Find the file.

$ find / -type f -name "*log*" | xargs ls -lSh | more 

$ du -a / | sort -rn | grep log | more

$ find / -name '*log*' -size +1000M -exec du -h {} \;

Step 2: Empty the file.

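Assuming the file found is a.log, the correct way to empty it is to truncate it in place:

$ : > a.log

This releases the file's disk space immediately, even while Tomcat still has the file open. (echo "" > a.log also works, but leaves a single newline in the file.)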

Many people might use:

$ rm -rf a.log

This removes the file name, but if Tomcat is still running it keeps the deleted file open, so the space is not released. You would need to restart Tomcat to free it.
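The difference between the two approaches can be reproduced safely with temporary files. In this sketch, file descriptor 3 plays the role of the running service that holds the log open. The /proc lookup is Linux-specific, and the file contents are placeholders.

```shell
# Case 1: delete a file that is still held open.
f=$(mktemp)
echo "some log data" > "$f"
exec 3< "$f"                      # hold the file open, as a running service would
rm -f "$f"
held=$(readlink /proc/self/fd/3)  # readlink inherits descriptor 3 from the shell
echo "$held"                      # the old path, now marked as deleted
exec 3<&-                         # only closing the descriptor frees the space

# Case 2: truncate in place; the size drops to zero right away.
g=$(mktemp)
echo "some log data" > "$g"
: > "$g"
size=$(wc -c < "$g")
echo "size after truncation: $size"
rm -f "$g"
```

On a real server, `lsof +L1` lists files that have been deleted but are still held open by some process.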

9. Display file, filter comments

Display the server.conf file, hiding comment lines that start with #.

$ sed -n '/^[#]/!p' server.conf

$ sed -e '/^#/d' server.conf

$ grep -v "^#" server.conf
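The three commands above only drop lines whose first character is #. A slightly broader variant, sketched here, also drops indented comments and blank lines. `strip_comments` is a hypothetical helper, and the config content is invented for the demo.

```shell
# Remove blank lines and lines whose first non-blank character is "#".
# Inline comments after a setting are left untouched.
strip_comments() {
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

# Invented sample config fed on stdin.
cleaned=$(printf '%s\n' \
    '# top-level comment' \
    '   # indented comment' \
    '' \
    'port=8080' \
    'host=example.internal  # inline comment kept' | strip_comments)
echo "$cleaned"
```

With a file argument it would be called as `strip_comments server.conf`.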

10. Disk IO exception troubleshooting

How do you troubleshoot a disk I/O problem such as slow writes or high utilization? Start by identifying the ID of the process causing the heavy disk I/O.

Step 1:

$ iotop -o

With -o, iotop shows only the processes that are actually doing disk I/O right now.

Step 2: If the write figures are low and no process is doing heavy writes, the disk itself needs to be checked. You can check the system log with

$ dmesg

or

$ cat /var/log/messages

to see whether there are any disk-related error messages. You can also touch an empty file on the slow disk to see whether a disk failure is preventing writes.
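The touch test can be made a little stricter by forcing the data to disk, so that a write that only reaches the page cache does not count as a success. This is a rough sketch: `probe_write` is a hypothetical helper, and the directory you pass in should live on the disk under suspicion.

```shell
# Write one small block into a directory and flush it to disk.
# Prints the result and returns non-zero if the write fails.
probe_write() {
    dir=$1
    probe="$dir/.write_probe.$$"
    if dd if=/dev/zero of="$probe" bs=4k count=1 conv=fsync 2>/dev/null; then
        echo "write ok: $dir"
        rm -f "$probe"
    else
        echo "write FAILED: $dir"
        return 1
    fi
}

# Demo against a temporary directory; point it at the slow disk in practice.
demo_dir=$(mktemp -d)
probe_write "$demo_dir"
rmdir "$demo_dir"
```

Prefixing the call with `time` gives a rough idea of how long the flushed write takes.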
