Hello, I truly hope life is treating you awesomely on your side of the screen :)
Troubleshooting Linux servers is a crucial skill for system administrators and IT professionals. This process involves identifying and resolving issues that may arise in the operation of Linux server systems. From diagnosing hardware and software problems to addressing network and process issues, troubleshooting Linux servers ensures optimal performance, stability, and security. By employing a systematic approach and utilizing various diagnostic tools and techniques, administrators can effectively identify and rectify server-related problems, minimizing downtime and ensuring smooth server operations.
In this guide, we will explore common commands and tools used for troubleshooting Linux server environments.
Let's get started.
- systemctl, journalctl
- HDD
- Searching
- CPU (usage, info)
- RAM (usage, free, info)
- Process {strace -p 3569}
- Networking
- systemctl
You can verify that a service is running by using the status subcommand:
$ systemctl status <service_name>
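For scripted checks, a minimal sketch might wrap systemctl is-active, which prints the unit's state and exits non-zero when the unit is not active. The function name service_state is hypothetical (it is not part of systemd), and nginx is only an example unit.

```shell
# Print "<unit>: <state>" for a systemd unit.
# service_state is a hypothetical helper name, not part of systemd.
service_state() {
    unit="$1"
    # is-active prints the state (active, inactive, failed, ...) and
    # exits non-zero unless the unit is active, hence the "|| true".
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    echo "$unit: ${state:-unknown}"
}
```

Calling service_state nginx prints one line with the state that systemctl reports for that unit.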
- journalctl
Journalctl is a utility for querying and displaying logs from journald, systemd’s logging service. Since journald stores log data in a binary format instead of a plaintext format, journalctl is the standard way of reading log messages processed by journald.
When run without arguments, journalctl prints every collected entry, starting with a banner like the one below that shows the time span covered by the log.
$ journalctl
-- Logs begin at Tue 2023-02-28 08:21:01 CET, end at Tue 2023-02-28 15:11:18 CET. --
To see messages logged by a specific systemd unit, use the -u switch. The command below shows all messages logged by the Nginx web server.
$ journalctl -u nginx
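When a unit logs a lot, it can help to narrow the output by priority and time. The sketch below combines the -p, --since and --no-pager options of journalctl; the helper name recent_errors and the default one-hour window are assumptions made for illustration.

```shell
# Show recent error-level messages for one unit.
# recent_errors is a hypothetical helper; the default window is an assumption.
recent_errors() {
    unit="$1"
    since="${2:-1 hour ago}"
    # -p err     : priority "err" and more severe
    # --since    : only entries newer than the given time
    # --no-pager : print directly instead of opening a pager
    journalctl -u "$unit" -p err --since "$since" --no-pager
}
```

For example, recent_errors nginx today would show the errors Nginx has logged since midnight.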
- HDD
The first step is to run the df command, which reports the total and available space on each mounted file system. For example:
$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 898M     0  898M   0% /dev
tmpfs                    910M     0  910M   0% /dev/shm
tmpfs                    910M   10M  900M   2% /run
tmpfs                    910M     0  910M   0% /sys/fs/cgroup
/dev/mapper/centos-root   37G   19G   19G  50% /
/dev/sda1               1014M  239M  776M  24% /boot
tmpfs                    182M     0  182M   0% /run/user/1000
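To spot nearly full file systems without reading the whole table, the Use% column can be filtered. This is a rough sketch that reads the POSIX-format output of df -P; the name full_filesystems and the default threshold of 80 are assumptions, and mount points containing spaces are not handled.

```shell
# List mount points whose usage is at or above a threshold (percent).
# Reads `df -P` output on stdin; full_filesystems is a hypothetical name.
full_filesystems() {
    threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR > 1 {                 # skip the header line
            use = $5
            sub(/%/, "", use)    # "50%" -> "50"
            if (use + 0 >= limit) print $6, $5
        }'
}
```

Typical usage would be df -P | full_filesystems 90, which prints one line per mount point at or above 90% usage.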
- fdisk
fdisk (also known as "format disk") is a dialog-driven command in Linux used for creating and manipulating disk partition tables. You can use it to view, create, delete, and change partitions on a hard drive through its dialog-driven interface.
On disks with a DOS (MBR) partition table, fdisk allows you to create a maximum of four primary partitions; additional logical partitions live inside an extended partition. To list the partition tables of all disks, run fdisk -l (usually as root):
$ fdisk -l
Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b8270
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    41943039    19921920   8e  Linux LVM
/dev/sda3        41943040    83886079    20971520    5  Extended
/dev/sda5        41945088    83886079    20970496   8e  Linux LVM
Disk /dev/mapper/centos-root: 39.7 GB, 39720058880 bytes, 77578240 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Search and find files and folders
- find
The find command is one of the most useful Linux commands, especially when you're faced with the hundreds and thousands of files and folders on a modern computer.
Find a single file by name
When you know the name of a file but can't remember where you saved it, use find to search for it, starting from / to cover the whole system or from ~ to cover only your home directory. Append 2>/dev/null to silence permission errors (or use sudo to gain all permissions).
$ find / -name "Foo.txt" 2>/dev/null
/home/user/Documents/Foo.txt
Find files by type
You can display files, directories, symlinks, named pipes, sockets, and more using the -type option.
$ find ~ -type f
/home/user/.bash_logout
/home/user/.bash_profile
/home/user/.bashrc
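Tests such as -type can be combined with others, which is handy when hunting for whatever is filling a disk. The sketch below pairs -type f with -size; the helper name large_files and the default size are assumptions, and the M size suffix relies on GNU find behaviour.

```shell
# Find regular files under a directory that are larger than a given size.
# large_files is a hypothetical helper; the size uses find's -size syntax.
large_files() {
    dir="$1"
    size="${2:-100M}"
    # -type f restricts matches to regular files; -size +N means "larger than N".
    # 2>/dev/null hides "Permission denied" messages.
    find "$dir" -type f -size "+$size" 2>/dev/null
}
```

For instance, large_files /var/log 100M lists log files larger than 100 MiB.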
- grep
grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (globally search for a regular expression and print matching lines), which has the same effect.
Search for a string in text
The canonical use of grep is searching for a precise string of characters in some greater body of text, and returning the line or lines containing successful matches. Here's an example:
$ grep HERO example.txt
NetHERO
thisAtestHERO
Search for a string in a stream of text
Another common way to use grep is with a pipe, making it a sort of filter. This technique has some advantages. One is helping to narrow grep's scope by searching through only the results of another process. For example, this command searches for iana only in the last 10 lines of example.com's source code, instead of searching the whole page:
$ curl example.com -s | tail | grep iana
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
Search many files at once
$ grep Fedora distro.list example.txt fake.txt
distro.list:Fedora creates an innovative, free, and open source platform for hardware, clouds, and containers that enables software developers and community members to build tailored solutions for their users.
example.txt:Fedora Linux
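When the files to search are spread across a directory tree, grep can recurse on its own. The following sketch counts matching lines per file using the -r, -i and -c options; count_matches is a hypothetical helper name.

```shell
# Count matching lines per file under a directory, case-insensitively.
# count_matches is a hypothetical helper name.
count_matches() {
    pattern="$1"
    dir="$2"
    # -r recurse into the directory, -i ignore case, -c print a per-file count.
    # The final grep drops files that have zero matches.
    grep -ric -- "$pattern" "$dir" | grep -v ':0$'
}
```

Running count_matches error /var/log would print each file that mentions "error", followed by the number of matching lines.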
- locate
The locate command finds files in Linux by file name. It returns results almost instantly, which makes it an essential utility when speed is a priority. Instead of walking the file system, it searches a prebuilt database of file paths, so files created after the last database update (usually done with updatedb) will not appear.
Search for a File
$ locate mysql
/etc/my.cnf.d/mysql-clients.cnf
/home/user/.oh-my-zsh/plugins/mysql-macports
/home/user/.oh-my-zsh/plugins/mysql-macports/README.md
CPU
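Whether an application is running on a server or a local machine, monitoring CPU utilization and CPU load is essential for optimizing performance. While CPU utilization and load might sound similar, they're actually quite different: utilization is the percentage of time the CPU spends doing work, whereas load is the number of processes that are running or waiting to run.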
- top
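Running the top command opens a live, periodically refreshed view of the load average, overall CPU and memory usage, and a list of processes with their individual CPU and memory consumption.
man top: https://man7.org/linux/man-pages/man1/top.1.html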
- uptime
The uptime command is also useful for viewing the load average of the system. This command displays the current system time, the uptime of the machine, the number of users currently logged into the system, and the load averages for the last 1, 5 and 15-minute durations.
Running the uptime command will generate an output similar to the one shown below:
$ uptime
10:15:28 up 35 days, 19:06, 1 user, load average: 0.00, 0.00, 0.00
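A load average only means something relative to the number of CPUs: a load of 2.00 is busy on a 2-CPU machine and nearly idle on a 16-CPU one. As a rough sketch, the load can be divided by the CPU count; load_per_cpu is a hypothetical helper name, and the usage note assumes a Linux system with /proc/loadavg and the nproc command.

```shell
# Divide a load average by the number of CPUs.
# load_per_cpu is a hypothetical helper; values near or above 1.00
# suggest the CPUs are saturated.
load_per_cpu() {
    load="$1"
    cpus="$2"
    awk -v l="$load" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}
```

On a live system this could be called as load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)" to normalize the 1-minute load average.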
- ps
The ps command is a flexible and widely used tool for identifying the processes running on the system and the resources they are using. Its output varies according to the options you pass.
For example, we can list the processes using the most CPU by running the following command (the -n flag makes sort compare the CPU column numerically rather than as text):
$ ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -n 10
Get CPU info
- lscpu
lscpu is a command-line utility that displays CPU information for the system. It gathers the CPU architecture information from sysfs and /proc/cpuinfo and prints it in the terminal.
Running the lscpu command will generate an output similar to the one shown below:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2294.685
BogoMIPS:              4589.37
Hypervisor vendor:     AWS
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
RAM
When troubleshooting Linux RAM, there are a few steps you can take to diagnose and resolve the issue:
- free
Use the free command to check the amount of available memory on the system. If the available memory is low, it could be a sign of a memory leak or excessive memory usage by a process.
Running the free -h command will generate an output similar to the one shown below:
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       266Mi       5.5Gi       1.0Mi       2.0Gi       7.2Gi
Swap:             0B          0B          0B
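For monitoring scripts, the "available" column is usually the figure to watch, because it accounts for cache that the kernel can reclaim. The sketch below turns it into a percentage; available_pct is a hypothetical helper, and it assumes the column layout shown above, where "available" is the seventh field of the Mem: line.

```shell
# Print the percentage of memory still available, from `free -m` output on stdin.
# available_pct is a hypothetical helper; it assumes "available" is the
# 7th field of the "Mem:" line, as in the layout shown above.
available_pct() {
    awk '$1 == "Mem:" { printf "%d\n", $7 * 100 / $2 }'
}
```

It is meant to be used as free -m | available_pct, so that both values are in the same unit.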
Use the top command to identify processes that are using a lot of memory (see the description in the CPU section above).
Check for hardware issues: if you suspect a hardware problem, you can use a tool such as memtest to test your RAM for errors. Run the test for an extended period, since intermittent faults may not show up in a single short pass.
Check the system logs for memory-related errors or warnings. You can use the dmesg command to view kernel messages related to memory, such as out-of-memory (OOM) killer events.
Finally, if all else fails, restarting the system may resolve the issue temporarily by clearing leaked memory, although a leak will return unless the offending process is fixed.
Process
When it comes to troubleshooting Linux process issues, here are some steps and commands I personally use to diagnose and resolve the problem.
- ps (see the description in the CPU section above)
- top (see the description in the CPU section above). You can also use htop, which has a friendlier interface.
- kill
If you identify a problematic process, you can terminate it using the kill command. Use kill PID, where "PID" is the process ID of the process you want to terminate; by default this sends SIGTERM, which lets the process shut down cleanly. If the process does not respond, you can use kill -9 PID (SIGKILL) to force it to terminate.
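The "ask politely, then force" sequence can be scripted. The following is only a sketch: stop_process and its default 10-second timeout are illustrative choices, and it relies on kill -0, which sends no signal and merely tests whether the PID still exists.

```shell
# Ask a process to exit with SIGTERM, then force it with SIGKILL after a timeout.
# stop_process and the default 10-second timeout are illustrative choices.
stop_process() {
    pid="$1"
    timeout="${2:-10}"
    # Give up early if the signal cannot be sent (no such process, or not permitted).
    kill -TERM "$pid" 2>/dev/null || return 1
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    # Still present after the timeout: force termination.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null || true
    fi
    return 0
}
```

For example, stop_process 3569 5 would give process 3569 five seconds to exit before sending SIGKILL.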
- dmesg
Check system logs for any process-related errors or warnings. You can use the dmesg command to view kernel messages related to processes.
Network
Networking configuration and troubleshooting are crucial tasks that sysadmins need to perform regularly. Some of these tasks can be challenging. However, when dealing with connectivity issues, using the right tools will help you get results faster and more consistently. Here are some steps you can take to diagnose and resolve the problem.
- ifconfig
ifconfig is a command-line utility known for interface configuration in Linux/Unix operating systems. Network administrators also use it to query and manage interface parameters with the help of configuration scripts.
It helps you enable or disable a network interface and allows you to assign an IP address and netmask to the selected interface. You can also view all the available interfaces, IP addresses, hardware addresses, and maximum transmission unit size for active interfaces.
You can activate/deactivate any interface by using up/down parameters, as follows:
sudo ifconfig eth0 up
sudo ifconfig eth0 down
To assign an IP address to an interface:
sudo ifconfig eth0 192.168.120.5 netmask 255.255.255.0
- netstat
netstat is a command-line utility that helps discover connected and listening TCP, UDP, and UNIX sockets. It displays information about routing tables, listening ports, and interface statistics.
You can list all sockets, both listening and non-listening, by typing:
netstat -a
To list only listening TCP connections:
netstat -tl
- host
host is a minimal yet powerful CLI utility that performs DNS lookups, resolving hostnames to IP addresses and vice versa. In addition to troubleshooting DNS server problems, it can display and verify record types such as NS and MX.
To find the NS records for the Google website:
host -t ns google.com
You can also find MX records by running:
host -t mx google.com
- dig
dig, short for Domain Information Groper, gathers DNS-related information and helps troubleshoot DNS problems.
The dig command output shows the DNS records returned by the queried name server and helps network administrators verify that hostname-to-IP-address resolution is working.
You can perform the DNS lookup query as follows:
dig google.com
Similarly, you can request all record types associated with a domain with the ANY option, although many name servers now return only a minimal answer to ANY queries:
dig google.com ANY
- ping
The ping utility is a useful tool for identifying network and host availability, as well as checking for network connectivity issues such as high latency and packet drop. By sending ICMP (Internet Control Message Protocol) echo request messages and waiting for ICMP echo reply packets, ping can verify whether a host is reachable, although it cannot tell you whether a particular service on that host is running. The command's output includes the total number of messages sent and received, as well as the round-trip time of each packet.
ping 8.8.8.8
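In a script, ping is usually run with a fixed packet count (the -c option) and only the summary matters. This sketch extracts the packet-loss figure; packet_loss is a hypothetical helper, and it assumes the summary format printed by the common iputils version of ping ("N packets transmitted, M received, X% packet loss, ...").

```shell
# Extract the packet-loss percentage from ping's summary line on stdin.
# packet_loss is a hypothetical helper; it assumes the iputils summary format
# ("N packets transmitted, M received, X% packet loss, ...").
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

A typical call would be ping -c 4 8.8.8.8 | packet_loss, which prints only the loss percentage as a number.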
Troubleshooting is part of the day-to-day activities of a system administrator. Knowing which tool to reach for, and which alternative offers similar or broader functionality when your usual tool is not available, is equally important for effective troubleshooting of a variety of system conditions.
Resources
- https://letstalkaboutdevops.com/
- https://www.redhat.com/sysadmin/linux-find-command
- https://www.redhat.com/sysadmin/how-to-use-grep
- https://man7.org/linux/man-pages/man1/top.1.html
- https://www.makeuseof.com/best-network-troubleshooting-commands-linux/
- https://www.redhat.com/sysadmin/five-network-commands