{"id":183,"date":"2001-03-25T12:32:41","date_gmt":"2001-03-25T20:32:41","guid":{"rendered":"http:\/\/www.jeffcarl.com\/?p=183"},"modified":"2020-07-08T19:04:32","modified_gmt":"2020-07-09T02:04:32","slug":"the-web-server-first-aid-kit","status":"publish","type":"post","link":"https:\/\/www.jeffcarl.com\/index.php\/2001\/03\/25\/the-web-server-first-aid-kit\/","title":{"rendered":"The Web Server First Aid Kit"},"content":{"rendered":"\n<p class=\"has-medium-font-size\"><strong>By Jeffrey Carl<\/strong><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright\"><img loading=\"lazy\" decoding=\"async\" width=\"350\" height=\"109\" src=\"http:\/\/www.jeffcarl.com\/wp-content\/uploads\/2020\/04\/bwatch.gif\" alt=\"Boardwatch Magazine\" class=\"wp-image-22\"\/><figcaption>Boardwatch Magazine, March 2001<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"has-background has-light-gray-background-color\"><em>Boardwatch Magazine was the place to go for Internet Service Provider industry news, opinions and gossip for much of the 1990s. It was founded by the iconoclastic and opinionated Jack Rickard in the commercial Internet&#8217;s early days, and by the time I joined it had a niche following but an influential one among ISPs, particularly for its annual ranking of Tier 1 ISPs and through the ISPcon tradeshow. Writing and speaking for Boardwatch was one of my fondest memories of the first dot-com age.<\/em><\/p>\n\n\n\n<p>It\u2019s a sad fact that most system administration learning is done in the minutes and hours after you say the words, \u201cWow. I\u2019ve never seen something get broken&nbsp;<em>that<\/em>&nbsp;way before.\u201d Learning to be a sysadmin means that you discover how to fix all the problems that pop up, until you find a problem you\u2019ve never run into before. Then you scramble to learn how to fix&nbsp;<em>that<\/em>, and you\u2019re fine until the next new Unidentified Weird Thing\u2122 happens. 
And so on.<\/p>\n\n\n\n<p>Fortunately, about 90 percent of Unix web\/mail\/etc. server problems can be diagnosed or fixed with just a few tools \u2013 much like 90 percent of all household repairs can be done with a screwdriver, a wrench or a baseball bat. Knowing just a few likely trouble spots and troubleshooting tools can help you resolve a lot of that unidentified weirdness without getting so frustrated that you want to rip the hard drives out of the computer and make refrigerator magnets out of them.<\/p>\n\n\n\n<p>The key here is that unlike cars or girlfriends, everything that goes wrong on a Unix system happens for a clearly defined reason. While that reason may sometimes be freakish or undocumented, it\u2019s almost always one of a few fairly common issues.&nbsp;<\/p>\n\n\n\n<p>So, with that in mind, we\u2019re going to take a look at the Handy Tools and the Usual Suspects \u2013 the top commands and tools to use, and common places to look that will at least shed some light on most server problems. I\u2019m going to use FreeBSD as the example system \u2013 but most other BSDs and Linux distributions offer the same tools, even if they act slightly differently or are located in a different place in the filesystem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Handy Tools<\/h2>\n\n\n\n<p>\u2022 When a server is responding slowly, you need to figure out whether the problem is on the server or in the network. After you\u2019ve logged in to the server and become root, your first stop should be&nbsp;<strong>uptime<\/strong>.<\/p>\n\n\n\n<p>The important part of the information it provides is the server\u2019s load averages \u2013 shown for the last one, five and 15 minutes. If the load average is high (above two or three), the most likely cause of the slowness is one or more \u201crunaway\u201d processes or some other processes making heavy use of the system. 
If the load average isn\u2019t high, then you\u2019re probably looking at a networking issue that is slowing access to the server.<\/p>\n\n\n\n<p>\u2022 If you\u2019ve found that a high load average is the likely culprit, turn to&nbsp;<strong>top<\/strong>. The top command lists the server\u2019s processes in order of CPU and memory utilization.<\/p>\n\n\n\n<p>By default, top shows the top 10 processes, or you can use it in the form&nbsp;top N, where&nbsp;<em>N<\/em>&nbsp;is the number of processes you wish it to show. If you have one or more \u201crunaway\u201d processes (such as a tcsh process left behind by an improperly terminated login session), you can quickly identify it and issue a&nbsp;kill&nbsp;or&nbsp;kill -9&nbsp;(which effectively means \u201cI don\u2019t care&nbsp;<em>what<\/em>&nbsp;you think you\u2019re doing, just shut up and go away\u201d) command to the process ID number (PID) of the runaway.<\/p>\n\n\n\n<p>\u2022 For a more complete listing of the processes that are running on your computer, use&nbsp;<strong>ps<\/strong>. The&nbsp;ps -auxw&nbsp;command (on BSD-based systems;&nbsp;ps -ef&nbsp;on System V-based systems; the ps on most Linuxes will accept either) will show all system processes owned by all users, whether active or background.<\/p>\n\n\n\n<p>You can use this to find any active process and get its PID if you need to \u201cre-HUP\u201d or kill it. You can find processes for a single server by using ps in combination with the venerable&nbsp;<strong>grep<\/strong>, such as finding all Apache processes by using&nbsp;ps -auxw | grep httpd | grep -v grep. Compare the number of active web processes to the server\u2019s \u201chard\u201d and \u201csoft\u201d limits (the \u201chard\u201d limit is set when Apache is compiled; the \u201csoft\u201d limit is set in the&nbsp;<em>[apache_dir]<\/em>\/conf\/httpd.conf&nbsp;file for recent versions). 
If those numbers are close to being equal, consider either upgrading your hardware or reconfiguring\/recompiling Apache with higher limits.<\/p>\n\n\n\n<p>\u2022 If you\u2019re worried that a user on your system is running an unauthorized program, hacking the system or otherwise foobaring things, then&nbsp;<strong>w<\/strong>&nbsp;is a simple check.<\/p>\n\n\n\n<p>The w command lists active users on the system and what they\u2019re doing. If any of them are performing unauthorized activities, simply kill that user\u2019s shell and use&nbsp;<strong>vipw<\/strong>&nbsp;to either give them a password (the second colon-separated field, immediately after the username) of \u201c*\u201d or assign them a shell (the last field of each user\u2019s line) of&nbsp;\/sbin\/nologin&nbsp;until you have sorted out what they were doing and whether it violated your policies. A&nbsp;kill -9&nbsp;may be necessary for \u201cphantom\u201d or \u201czombie\u201d processes that were left running after improper logouts.<\/p>\n\n\n\n<p>\u2022 If your problem is a crashed or non-starting Apache webserver, use the built-in&nbsp;<strong>apachectl<\/strong>&nbsp;command to work out the issue. It\u2019s generally installed in the&nbsp;bin&nbsp;subdirectory of the Apache installation; if this isn\u2019t in your shell\u2019s command path, you may need to specify the full path to this command. Aside from the basic&nbsp;apachectl start&nbsp;and&nbsp;apachectl stop&nbsp;commands, one of the more useful options is the&nbsp;apachectl configtest&nbsp;command, which performs a basic evaluation of Apache\u2019s&nbsp;httpd.conf&nbsp;configuration file (where almost all options are specified for Apache 1.3.4 and later).&nbsp;<\/p>\n\n\n\n<p>Unfortunately, apachectl is notorious for providing \u201cokay\u201d readings when some configuration problems are still present (most notably when a directory specified for a virtual host is not found or not readable, which causes Apache to fail). 
For these situations, you\u2019ll need to consult your Apache error logs (see below). Also, apachectl consults the file&nbsp;\/var\/run\/httpd.pid&nbsp;to find its originating process; if this PID is different, the&nbsp;apachectl stop&nbsp;command won\u2019t work. In these cases, find the httpd process owned by root using ps (this will be the \u201cparent\u201d Apache process) and kill that process.<\/p>\n\n\n\n<p>\u2022 Your first tool for diagnosing whether a problem may be in the server\u2019s network connection rather than on the server itself is&nbsp;<strong>ping<\/strong>. Using ping to test the connection to a server is a common test, but some problems (such as an error in speed or duplex settings between a server and its switch) may not show up using ping normally. If a ping to a server appears normal but you suspect a network error is involved, try using ping with larger-than-normal packet sizes. The default size of the data packet used by ping is only 56 bytes, but many errors will only show up when large ping packets (2048 bytes or greater) are used. Use the&nbsp;-s&nbsp;flag with ping to specify a larger packet size (use the&nbsp;-c&nbsp;option to specify the number or \u201ccount\u201d of pings to send).<\/p>\n\n\n\n<p>With large packet sizes, a longer-than-usual round-trip time is normal, but excessively long times or packet loss are good indicators that there is a network configuration problem present. Try sending large ping packets for at least a count of 50, and compare the results with a long-count ping with normal packet sizes.<\/p>\n\n\n\n<p>\u2022 If a network misconfiguration between a server and its switch (or router) is possible, then you\u2019ll want to check the status of your server\u2019s network connections with&nbsp;<strong>netstat -f inet<\/strong>. 
Netstat will show you which ports are open on your server or which services are active, as well as what foreign host is connecting to the port or service in question.&nbsp;<\/p>\n\n\n\n<p>If you\u2019re concerned that your server is being attacked across the network, this will generally show up in excessive usage of the memory that the kernel has allocated to networking. To find this out, use the&nbsp;-m&nbsp;(memory buffer, or \u201cmbuf\u201d) flag for netstat. If you find that normal services like httpd aren\u2019t heavily burdened but the percentage of memory allocated to networking is still high (90 percent or more), consider shutting down network services or ports that are open and may be under attack or being misused.<\/p>\n\n\n\n<p>\u2022 If a network issue is the likely cause of your problem, use&nbsp;<strong>ifconfig<\/strong>&nbsp;(the interface configuration command) to check how the NICs (Network Interface Cards) on the server are set up.<\/p>\n\n\n\n<p>You can ignore the&nbsp;lo0&nbsp;(loopback) interface; what really matters are the settings for your server\u2019s NIC(s) as specified by their driver type. These will show its IP address(es), netmask, duplex and speed, as well as which driver is in use.&nbsp;<\/p>\n\n\n\n<p>Very frequently, a server which otherwise boots up and appears fine but has a problematic or nonexistent network connection can be fixed with a check of its network interface configuration. Double-check the options set for your default ifconfig startup settings in the file&nbsp;\/etc\/rc.conf&nbsp;(at least in recent versions of FreeBSD). Frequently, a slow network connection is the result of a NIC configured for a different speed or duplex than its switch\/router port, especially when \u201cautosense\u201d options are set but fail for whatever reason. 
This can frequently be remedied by resetting the connection with a simple&nbsp;ifconfig&nbsp;<em>[interface]<\/em>&nbsp;down&nbsp;followed by an&nbsp;ifconfig&nbsp;<em>[interface]<\/em>&nbsp;<em>[options]<\/em>&nbsp;up&nbsp;command.<\/p>\n\n\n\n<p>\u2022 Weird errors with files or services may sometimes be caused by a full hard drive (preventing the system from writing logfiles or other operations). Use the&nbsp;<strong>df<\/strong>&nbsp;command to show your server\u2019s mounted partitions and their available capacity.<\/p>\n\n\n\n<p>\u2022 A whole nasty horde of seemingly inexplicable problems are caused by simple issues with file permissions. In these cases, the humble&nbsp;<strong>ls<\/strong>&nbsp;command can be your best ally. Using&nbsp;ls -l&nbsp;will show you the permissions settings for files in any directory. Common issues include missing \u201cx\u201d (executable) permissions on CGI scripts or applications, or directory permissions which don\u2019t allow \u201cr\u201d (reading) or \u201cx\u201d (entering).&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Usual Suspects<\/h2>\n\n\n\n<p>\u2022 When bizarre things are happening, the&nbsp;<strong>system logfiles<\/strong>&nbsp;are the first place to check. Under BSD, you\u2019ll find these in&nbsp;\/var\/log; the first place to look is&nbsp;\/var\/log\/messages, where syslog deposits all the messages that aren\u2019t specified to go into another logfile. In fact, the entire&nbsp;\/var\/log&nbsp;directory is home to the messages for different services \u2013 from telnet\/SSH or FTP logins to SMTP and POP connections to system errors and kernel messages.&nbsp;<\/p>\n\n\n\n<p>Checking these files can often provide the answers to 90 percent of \u201cI can\u2019t do&nbsp;<em>X<\/em>\u201d messages from desperate system users. 
Check&nbsp;\/etc\/syslog.conf&nbsp;to see where the syslog daemon is sending the errors it receives; check the config files for individual applications or services to see which logfiles they\u2019re writing to.<\/p>\n\n\n\n<p>\u2022 If the webserver won\u2019t start, but there aren\u2019t any clues elsewhere, immediately look at the&nbsp;<strong>webserver logfiles<\/strong>. With Apache, these are generally located in the file&nbsp;<em>[<\/em><em>apache_dir]<\/em>\/logs\/error_log&nbsp;or something similar. Even if apachectl runs and fails while printing a simple message like \u201chttpd: could not be started\u201d (this message is the winner of the \u201cFrontPage Memorial \u2019Duh\u2019 Award for Unhelpful Error Handling\u201d three years in a row), the problem will almost certainly be logged to Apache\u2019s errors file.<\/p>\n\n\n\n<p>For problems with specific virtual hosts on a server, check wherever their logfiles are located. This is generally specified inside that domain\u2019s&nbsp;&lt;VirtualHost&gt;&nbsp;\u2026&nbsp;&lt;\/VirtualHost&gt;&nbsp;directive in the&nbsp;httpd.conf&nbsp;file. If no error logfile is specified for that virtual host, then errors will be logged to the main Apache error file.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Getting a Second Opinion on First Aid<\/h2>\n\n\n\n<p>Of course, all of the above are merely a few recommendations derived from my experience; if you have found other \u201cFirst Aid Tools\u201d or \u201cUsual Suspects\u201d that you rely on for server administration, please let me know at&nbsp;<a href=\"mailto:me@schnell.net\">me@schnell.net<\/a>&nbsp;and I\u2019ll include them in an upcoming column.&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Jeffrey Carl Boardwatch Magazine was the place to go for Internet Service Provider industry news, opinions and gossip for much of the 1990s. 
It was founded by the iconoclastic and opinionated Jack Rickard in the commercial Internet&#8217;s early days, and by the time I joined it had a niche following but an influential among &hellip; <a href=\"https:\/\/www.jeffcarl.com\/index.php\/2001\/03\/25\/the-web-server-first-aid-kit\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The Web Server First Aid Kit<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":22,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,66],"tags":[],"class_list":["post-183","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-boardwatch-writing","category-tech"],"jetpack_featured_media_url":"https:\/\/www.jeffcarl.com\/wp-content\/uploads\/2020\/04\/bwatch.gif","_links":{"self":[{"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/posts\/183","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/comments?post=183"}],"version-history":[{"count":1,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/posts\/183\/revisions"}],"predecessor-version":[{"id":184,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/posts\/183\/revisions\/184"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/media\/22"}],"wp:attachment":[{"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/media?parent=183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/categories?post=183"},{"ta
xonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jeffcarl.com\/index.php\/wp-json\/wp\/v2\/tags?post=183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}