How Do You Troubleshoot a Network Problem?

Note:  This posting should help you troubleshoot many different network problems (not just those described in the problem scenario below).  Possible solutions 1 through 5 are ideal for erratic nmap results (inconsistent or discrepant output). 

Problem scenario
A port seems blocked on a Linux server given the results of nmap. The host appears to be down. You know this port is not blocked by intermediate routers and/or firewalls.  You know the server is turned on.  Sometimes you get false negatives with nmap.  How do you troubleshoot a network problem with a port that seems blocked but is not truly blocked insofar as you can tell?

Solution
If there are commands in bold, the outermost (and possibly the only) quotes should be ignored when actually entering them into a command prompt.

Possible solution #1
Verify that the IP addresses are really what you think they are.  With routing tables, DNS issues, or NAT in play, it is easy for even an experienced professional to confuse one IP address with another.
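
One way to double-check a name-to-address assumption is to ask the resolver directly.  The following is a minimal Python sketch, not a definitive tool; the function names resolve_all and matches_expected are made up for this illustration, and it only shows what the local resolver returns (it knows nothing about NAT).

```python
import socket


def resolve_all(host):
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def matches_expected(host, expected_ip):
    """True if `expected_ip` is among the addresses `host` resolves to."""
    return expected_ip in resolve_all(host)
```

For example, matches_expected("serverB", "x.x.x.x") (with your real hostname and the IP address you believe it has) returns False if DNS or /etc/hosts points the name somewhere else.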

Possible solution #2
n.b.  This is only relevant for Docker containers.  If you are not running Docker, skip this possible solution.  A web service on the Linux server may map the inbound port to a different port, and this second port could be blocked by an intermediate [hardware or software] firewall.  This can happen when a web service runs in a Docker container: a .yml file (such as a Docker Compose file) maps the listening, inbound port to another port, which can create the problem scenario described above.  You can change the port mapping in the .yml file and restart the Docker container, or you can open up the firewall to allow connectivity on the second port involved.

Possible solution #3
Use nmap -Pn x.x.x.x.  The short reason why this may work is that -Pn skips a process called "host discovery," which nmap normally runs before it scans any ports (including when you use the -p flag to test a specific port).  If the "host discovery" process fails, nmap will report that the host appears down when it is not, and the port is never scanned.  The -Pn flag can also be combined with -p, as in nmap -Pn -p 22 x.x.x.x.  Remember that nmap -Pn x.x.x.x by default scans only the 1,000 most common ports. There are 65535 TCP ports total. To scan every port, try this: nmap -Pn -p 1-65535 x.x.x.x
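
To illustrate what "skipping host discovery" means, here is a rough Python sketch that tries each port directly without first deciding whether the host is up.  This is an assumption-laden stand-in for the idea behind -Pn, not how nmap itself is implemented; scan_ports is a hypothetical helper name.

```python
import socket


def scan_ports(host, ports, timeout=1.0):
    """Attempt a TCP connection to each port directly; return the ports that accepted."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, timed out, or unreachable: treat as not open and keep going.
            pass
    return open_ports
```

Calling scan_ports("x.x.x.x", [22, 80, 443]) checks just those three ports.  Passing range(1, 65536) would cover every TCP port, but a sequential scan like this one would be very slow against filtered ports.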

For more details, see the * below.

Possible solution #4
Use sudo nmap -p 55 x.x.x.x (where 55 is the port you are testing).  When nmap is run by a non-root user without the "sudo " in front, the host discovery phase relies on TCP connection attempts to just two ports (80 and 443).  This is an initial behind-the-scenes process.  If both of these ports are blocked, nmap will report that the host seems down.  With "sudo " in front, four probes are used: an ICMP echo request, a TCP SYN packet to port 443, a TCP ACK packet to port 80, and an ICMP timestamp request.  You have more chances of getting past the initial host discovery phase with a leading "sudo " invocation.**
Without sudo, you may get output such as this:
Starting Nmap 7.40 ( https://nmap.org ) at 2018-04-10 10:56 UTC
Note: Host seems down. If it is really up, but blocking our ping probes, try -Pn
Nmap done: 1 IP address (0 hosts up) scanned in 3.04 seconds

The above problem can often be circumvented if you use the sudo command before nmap.
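
The unprivileged host discovery logic described above can be approximated in a few lines of Python.  This is only a sketch under the assumption that any answer from ports 80 or 443, including a refused connection, counts as "host is up"; the names probe and host_seems_up are invented for this example and are not part of nmap.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify one TCP connect attempt as 'open', 'closed', or 'no-response'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"       # the host answered with a reset, so it is up
    except OSError:
        return "no-response"  # timeout, unreachable, or dropped by a filter


def host_seems_up(host, ports=(80, 443), timeout=2.0):
    """Mimic unprivileged discovery: any answer, even a refusal, means the host is up."""
    return any(probe(host, p, timeout) != "no-response" for p in ports)
```

If a firewall silently drops traffic to both 80 and 443, host_seems_up returns False even though the port you actually care about (such as 22) may be reachable.  That is the false negative this posting is about.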

Possible solution #5
Use the "-d" flag at the end of the nmap command.  The nmap utility has many processes that happen behind the scenes.  To see these processes printed verbosely in real time, add the -d (debugging) flag like this:
nmap -p 22 x.x.x.x -d

Possible solution #6
Try ping and traceroute.  These commands may reveal something new to help you find the problem.  This is a "back to basics" approach that can be overlooked by intermediate-level (not advanced) network engineers.  If traceroute provides only "* * *" as output, try the traceroute command with "sudo " in front and the " -I" flag (which sends ICMP echo requests instead of the default UDP probes) after the traceroute command but before the IP address, like this:  sudo traceroute -I x.x.x.x  You may also want to see this posting for details about TTL packets.

Possible solution #7
If you are using Windows, use tracert from a Command Prompt.  tracert works like traceroute.  You can use Test-NetConnection in newer versions of PowerShell.  Like nmap, this command can test connectivity to a specific TCP port (via its -Port parameter), although its options are different.  If you are using PowerShell version 3 (which does not have Test-NetConnection), you can see this posting.  You never know what testing from a Windows host may reveal.  These options may help. You may also want to see this posting for details about TTL packets.

Possible solution #8
You may want to install Cacti on a Linux server in your network.  If your network is flooded with packets from a malware infection or a DDoS attack, Cacti may help show you the congestion graphically and isolate the source.  Collisions in a network can degrade network performance.  Packets can flood the network from a malfunctioning device.  For a Windows environment, you may want to view this posting.

Possible solution #9
Be aware of operating system firewalls, intermediate firewalls, and intermediate intrusion prevention systems with active rules to drop packets or reject connections from certain IP addresses and not others.  Be aware of internal IP addresses and external IP addresses.  To find an internal IP address on a Linux server, run ip addr show.  To find an external IP address on a Linux server, run curl ipinfo.io (assuming the server has access to the internet).  To find an internal IP address for a Windows server, run ipconfig from a Command Prompt or PowerShell prompt.  To find an external IP address on a Windows server, open PowerShell and run curl ipinfo.io.  Consider that a server may have more than one IP address, and the one actually used for a successful connection may not be the one you were expecting. To find out if there is a firewall running on Linux, see this posting.
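
If a server has several interfaces, it can help to ask the operating system which source address it would use to reach a particular destination.  Below is a small Python sketch of that idea (IPv4 only); local_ip_toward is a hypothetical helper name, and the result reflects only the local routing table, not any NAT further along the path.

```python
import socket


def local_ip_toward(target, port=80):
    """Return the local source address the OS would pick to reach `target`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket sends no packets; it only asks the
        # kernel to choose a route and a source address for this target.
        s.connect((target, port))
        return s.getsockname()[0]
```

Running local_ip_toward("x.x.x.x") with the IP address of the server you are testing shows which of your own addresses that server (and any firewall rules in front of it) will see, assuming no NAT in between.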

Possible solution #10
The port might truly be blocked, but you have forgotten about some silent coworker implementing something new (e.g., applying updates to a router's flash memory or implementing a new ACL).  Was there an email that you received with the subject "Maintenance window"? If the server runs in AWS, remember that network ACLs (network access control lists) can also block traffic.

Possible solution #11
You might be on the wrong server (or a different server from the one you thought you were on).  False positives from networking utilities could be the result of being on the wrong server.  Sometimes DHCP in a highly automated or dynamic environment can contribute to what seems to be a false positive. 

Possible solution #12
Can you look at the routers and switches involved?  Do you see flashing amber or solid red lights on the network interface ports?  Did a network cable get unplugged?  Is there an electronic device that operates on the same frequency as the wireless routers?  There is a story about a network outage that happened every weekday at lunch time.  The kitchen was physically positioned between the workstations using the wireless network and the WiFi routers, and the microwaves, whenever they were running, interfered with the wireless connections.

Possible solution #13
Could the network be using IPv6 without your knowing about it?  Could the networking team have changed the routing protocols?  Was an OS patch recently applied to the servers involved in your network riddle?

Possible solution #14
If you are having trouble isolating a network problem, install tcpdump on the server that is not receiving connections properly and consistently, and run it there to see which packets actually arrive.  In some reported cases, simply running tcpdump makes pings work that otherwise would not (by default tcpdump puts the interface into promiscuous mode, which may be why).  These threads are examples of that:
https://unix.stackexchange.com/questions/65872/ping-receives-no-packets-but-tcpdump-can-see-them-coming-in
https://security.stackexchange.com/questions/124394/nmap-says-host-down-when-host-is-up

Possible solution #15
Use the netstat command.  You can use the "-anlp" flags to produce relevant details.  You can use "grep" to find only network activity that is associated with a certain string pattern.  For example, if you can guess what port should be active, and you are running on Linux, run a command like this: sudo netstat -anlp | grep 8080  You should replace "8080" with the port you surmise there is activity on.  If you see no results, then that port is not active.  Because grep matches the string anywhere on the line, a hit can also come from a process ID or from a longer port number such as 18080, so read the matching lines carefully.
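
If you want a stricter match than grep gives you, a short script can compare only the port in the "Local Address" column.  The following Python sketch assumes the column layout that Linux netstat -anlp normally prints; the sample text is made up for illustration (it is not output from a real server), and lines_for_local_port is a hypothetical helper name.

```python
# Illustrative sample only; real output from "sudo netstat -anlp" will differ.
SAMPLE_NETSTAT_OUTPUT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1234/java
tcp        0      0 127.0.0.1:18080         0.0.0.0:*               LISTEN      8080/python3
tcp6       0      0 :::22                   :::*                    LISTEN      900/sshd
"""


def lines_for_local_port(netstat_output, port):
    """Return the tcp/udp lines whose Local Address ends in exactly `port`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/udp socket line.
        if len(fields) < 4 or not fields[0].startswith(("tcp", "udp")):
            continue
        # The local address is the fourth column; the port follows the last colon.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            matches.append(line)
    return matches
```

With the sample above, a plain grep 8080 would match two lines (the listener on 18080 happens to have PID 8080), while lines_for_local_port(SAMPLE_NETSTAT_OUTPUT, 8080) keeps only the line for the socket actually bound to port 8080.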

Possible solution #16
If your problem is that you cannot reach the internet at all with an available WiFi connection, see this link.

Possible solution #17
If you do not want to use or install nmap, and you want to test connectivity over a specific port, do the following.

If you can open two terminal sessions, this can work. If you cannot have more than one, use the screen command before you start the first nc command (below) so you can simulate a second terminal afterward. If you need directions installing screen, see this posting. Install the nc utility. On CentOS/RHEL/Fedora you would run sudo yum -y install nc to install it. Once nc is installed, run this command: nc -l -p 9999 > /tmp/contint.txt  (Some netcat variants, such as the OpenBSD one, expect nc -l 9999 without the -p flag.)

From a second terminal run these two commands:
date > /tmp/orig.txt
nc 127.0.0.1 9999 < /tmp/orig.txt

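Now check your work from either terminal: cat /tmp/contint.txt  If the file contains the same date that was written to /tmp/orig.txt, the connection over port 9999 worked.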
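
If nc is not available either, the same listen-and-send test can be sketched with Python's standard library.  This is a minimal loopback version under the assumption that both ends run on the same machine; receive_once and loopback_transfer are names invented for this example.

```python
import socket
import threading


def receive_once(server_sock, received):
    """Accept one connection and collect everything sent until the peer closes."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    received.append(b"".join(chunks))


def loopback_transfer(payload, host="127.0.0.1", port=0):
    """Send `payload` to a local listener and return the bytes it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))  # port=0 lets the OS choose a free port
        server.listen(1)
        actual_port = server.getsockname()[1]
        received = []
        listener = threading.Thread(target=receive_once, args=(server, received))
        listener.start()
        with socket.create_connection((host, actual_port), timeout=5) as client:
            client.sendall(payload)  # closing the socket signals end of data
        listener.join(timeout=5)
    return received[0] if received else b""
```

Calling loopback_transfer(b"test\n", port=9999) mirrors the nc commands above.  To test between two servers you would run the listening half on one host and the sending half on the other, which this sketch does not do.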


*  It is possible to have a situation like this:
Ping from server A to server B works.  
Ping from server C to server B works.
From server C, this works:  nmap -p 22 serverB
From server B, this works:  nmap -p 22 serverB
From server A, this does not work: nmap -p 22 serverB
From server A, this works: nmap -Pn serverB
This irregularity (or anomaly) seems elusive.  What is different about using nmap to test one specific port?

The -Pn flag bypasses host discovery.  This option with nmap can help you understand why there appears to be an inconsistency.

Host discovery is a process that is part of most nmap commands (depending on which flags you use) in nmap's initial stages of running.  If you are not using "sudo" before the nmap commands, or running the commands as root, the host discovery process is limited to two ports (80 and 443).   

This was taken from the nmap 7.1 man page:  "If no host discovery options are given, Nmap sends an ICMP echo request, a TCP SYN packet to port 443, a TCP ACK packet to port 80, and an ICMP timestamp request." 

** Using "nmap -p 22" (or a different port number) will be different from "nmap -Pn" or "sudo nmap -p 22".  Using the "-Pn" flag will bypass the host discovery stage altogether.  With sudo before the nmap command (or running nmap as the root user), the host discovery process happens, but it happens differently.  With a sudoer running nmap, the host discovery process uses the four probes listed in the man page excerpt above (two of them ICMP) and not just TCP connection attempts to two ports.  Thus the scan process (or nmap run) will continue and not appear blocked in the initial stage of host discovery if any of the extra probes gets a response.
