AppGate Security Server Troubleshooting 101

Updated 30-03-2016 – Version 1.0

Before you begin – prerequisite
Overview
Required skills and audience
Required system access
AGS System components
Server management interface (iLOM/iDRAC/iRMC)
How to handle a reported issue efficiently
How to work on a reported issue
Surround the reported issue
    Surround the issue
    Base-line client side
    User account to replicate issue
    Use the right tools in the right way
Troubleshooting the issue
    Network issues on client side
    Network issues on AppGate Server
    Client-server behavior issue
Escalate the identified issue
Typical resources for troubleshooting
    Documentation
    Knowledgebase and Cryptzone Support Website
    Search engines and man pages
Software tools used for isolating issues
    Site info
    Logcat
    Snoop and Wireshark

Before you begin – prerequisite

Anger is an acid that can do more harm to the vessel in which it is stored than to anything on which it is poured.

[Mark Twain].

Overview

Troubleshooting 101 is a guide that helps AppGate Security Server (AGS) administrators handle incidents efficiently. It contains troubleshooting steps to resolve AGS issues: identify faults, isolate symptoms, restore the system to operation, and/or communicate with a higher support level.

Required skills and audience

The guide is intended for helpdesk staff trying to resolve a suspected AGS issue. The person handling issues should preferably have:

• Administration knowledge of the AGS and how it is integrated in the environment
• Strong knowledge of, and administrative experience with, Unix and Windows
• Good understanding of the overall computing environment (network, operating systems, client platforms, firewalls)
• Knowledge of packet “sniffer” and protocol capture utilities (snoop, tcpdump, wireshark)

Required system access

AGS integrates with different parts of the IT infrastructure. You might come across issues where you need access to:

• End point system configuration/logs
• Firewall configuration/logs
• LDAP/Active Directory configuration/logs
• Radius configuration/logs

AGS System components

Dividing the entire AGS system into components helps to isolate and identify a potential cause of a system issue. Basically, you can divide the system into three areas, as depicted below:

1. Client's network
2. AppGate attached networks
3. AppGate operation (client and/or server)

Server management interface (iLOM/iDRAC/iRMC)

Make sure you have set up, and have access to, the onboard management platform of the physical appliances. You will depend on it during maintenance and also while troubleshooting. This is your lifesaver when the appliance is in a remote data center, but in any case, always, always, always (always) attach the appliance to the management network. In the worst case you will need to be at the appliance in person, with a VGA screen and a USB keyboard. If you are running virtual appliances, make sure you have access to the console for the same reasons.

How to handle a reported issue efficiently

A reported issue can be one of two types: (1) system down and (2) degraded system operation.

A system down usually means:

• No user can log in
• There might still be users in the AGS system, but no new user can log in
• The AppGate system has faulted/broken down and does not boot (usually a disk/HW fault, or disk/virtual memory full)

A degraded system usually means:

• Some features on the AGS are not working as expected, such as a daemon failing to start
• Some users cannot log in while others can
• A node in a multi-server AGS cluster is down, but the others are still operational
• Temporary network issues cause connections to account sources and/or authentication methods to time out

In most cases you will encounter type (2), and will work to bring the AGS back to full operation. As Figure 1 depicts, you will need to have an emergency recovery (ER) plan ready. This includes a backup of the configuration and also backup hardware or failover tolerance (a clustered system). Emergency plans are the responsibility of the customer's business; Cryptzone will help to the extent of the SLA in such a case, so only a proper ER plan reduces the downtime of your system.

Figure 1 Action plan dealing with issues

How to work on a reported issue

In most cases, the responsible operations support person, also known as the System Administrator, will start by surrounding (narrowing down) the possible issue and then try to resolve it or find a workaround to keep production up. In either case, the person is required to work in an organized and analytical manner to make sure production is not affected.

Figure 2 Working an issue

Figure 2 shows a basic workflow for handling an issue. There are tasks to be performed by the administrator, but the workflow also requires a suitable level of information to be collected in case you cannot resolve or work around the issue. In that case you must escalate the issue to 2nd level support, your supervisor and any administrators who might need to be involved to help fix the issue (such as network, AD, firewall etc.).

Surround the reported issue

If it is not obvious where the problem lies, get as much information as possible about the reported issue.

• Who is the reporter and what is their role (end user vs admin)?
• Find out if there have been, or are, ongoing changes in the environment:
    o Network changes
    o Application changes
    o AD or DNS changes
    o Upgrades on client side, either software or driver updates

Summarize the observed issue in a few sentences, describe how it should behave, and give the issue a meaningful title. Ask questions if you cannot fully understand the problem or if there are logical gaps preventing you from establishing a summary.

Surround the issue

Establish a way to surround the issue, e.g. which of the areas to start with:

• Server side:
    o Network issue
    o AppGate Server issue
    o DNS or directory/account source issue
    o Service or Data issue (the resource behind AppGate)
• Client side:
    o Network issue
    o Client related issue
    o OS related such as installed software (AV/HIDS) and OS.

Base-line client side

When the client side is involved, always base-line it:

• Reboot the client's operating system.
• The version of the client software must match the AppGate server version.
• If possible, reboot the network equipment on the client side.

User account to replicate issue

If the issue is related to a certain user account, try to replicate the issue with a new user account that resembles that user account as much as possible. This (1) lets you work and test independently of the AppGate end user, who might otherwise be at a remote location, (2) allows you to use different techniques with that account without tampering or interfering with the user/group experiencing the issue, and (3) either establishes a test case or proves that the issue cannot be reproduced, which means the problem is related to that specific user account.

Use the right tools in the right way

It’s easy to believe a tool will show you where the problem is, but this is not always the case. Choose your tools wisely; sometimes using ‘a’ tool just takes too much time when you should have been looking somewhere else:

1. Think about what the goal of your troubleshooting step is, e.g. to prove or disprove an assumption.
2. Think about how you can reach that goal.
3. If necessary, choose the appropriate tool to help you reach that goal.

Note: Troubleshooting always gives you two choices: prove or disprove an assumption. Sometimes you do not have a choice, and sometimes you have to do both. It depends on the importance of the issue and on how you can find a resolution or workaround. Just keep in mind that “you” are the one in charge and you will need to choose appropriately.

Troubleshooting the issue

So you have an issue? We now assume you are aware of everything written before this chapter, and have confirmed with your supervisor that you:

• Have the right skills
• Have the right privileges
• Know who to contact and how to reach the system admins of the dependent systems
• Have agreed on an internal process
    o and specifically have agreed on an emergency plan
• Have made sure that management access (or console access) to the appliances is configured
• Know how to base-line, gather info, work the issue and use the tools

That’s great. Then we can start to troubleshoot. You are in!

Network issues on client side

Whatever prevents the AppGate client from communicating with the server, you have to find it on the client's machine or in the client's environment. Possible starting points or checks to perform:

• Is the client connected to any network?
• Mobile network: weak signal (UMTS/3G/4G/5G), high packet loss?
• Misbehaving network adapters (HW/SW). Look into the logs on the OS and the network equipment.
• Routing issue between several network adapters.
• Connected to a network, but the network has issues such as no route, high latency or unreliability. Proxy settings that interfere.
• DNS is not working. On Windows, check the per-adapter settings and the search base.
• Firewall on the client machine that blocks outgoing traffic.
• Antivirus/malware detection software on the client machine.
• Malware/virus on the client machine.
• Another service program occupies port(s) on all IP addresses, such as Skype or the MS local HTTP server (Port Forward Mode).
• Other network adapter/VPN products changing routing and network configurations.
• LAN/WAN blocking traffic.

Sometimes the above list is not conclusive, or you instinctively have doubts and want to identify quickly whether the problem is the machine or the network. A fast way to find out whether something is wrong with the HW/SW/network on the client side is to run the exclude tests:

• Exclude the current network: connect from another network with the same machine.
• Exclude the client machine: connect from another machine in the same network.
• Exclude both client machine and network: connect from another network and another machine.
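The reasoning behind the three exclude tests can be written down as a small decision helper. The following Python sketch is only an illustration of that logic; the function name and the wording of the suspects are assumptions made for this example and are not part of the AppGate product.

```python
def suspects(same_machine_other_network, other_machine_same_network,
             other_machine_other_network):
    """Interpret the three exclude tests.

    Each argument is True if the connection SUCCEEDED in that test.
    Returns a list of areas that deserve a closer look.
    """
    found = []
    if same_machine_other_network:
        # The machine works elsewhere, so the original network is suspect.
        found.append("current network")
    if other_machine_same_network:
        # The network works for another machine, so the original machine is suspect.
        found.append("client machine")
    if not other_machine_other_network:
        # Even a completely different client fails: not specific to this client.
        found.append("server side or user account")
    if not found:
        # Only the fully replaced setup works: machine and network may both contribute.
        found.append("client machine and current network combined")
    return found


print(suspects(True, False, True))
```

Treat the result as a hint for where to look next, not as proof; the tests only narrow the search.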

On some occasions you might also want to find out whether another version of the client software works, which can be either:

• Newer client software
• Older client software
• Installable client vs. Java Webstart client (comparing built-in Java to OS-installed Java)
• A native SSH client/PuTTY

When it comes to the SSL frontend, verify:

• Log in with a different browser.
• Check the proxy settings in the browser and in the operating system.

Network issues on AppGate Server

This is a systematic, bottom-up checklist.

• Check that you have a link connection (up/down) between the interfaces and the switch.
• Check the link settings (FD/HD) on both ends for a mismatch.
• On the command line, run dladm show-phys, study the result, and verify against your cabling and network setup that it makes sense.
• Check the MTU settings of the interfaces with dladm show-link.
• Check whether there are errors on the wire:
    kstat <if:nr>|grep -i errors
• Use dmesg to see if any hardware related errors are present (network card).
• Start to ping or telnet from the AppGate server towards a resource.
    o If the results are not conclusive, run a snoop in parallel to understand where the traffic disappears.
    o Find routing issues: do a traffic capture at the resource and ping from the AppGate server, to see if the traffic comes in and where the traffic is sent back to.
    o If there is a firewall or router in between, check its state and logs, or run a packet capture on it.
• Use the ARP table and dmesg to rule out IP conflicts.

If you found issues from the above (or need to broaden the possibilities), then:

• Check the ARP table against known IP addresses (ping them, then check the table). This is useful for identifying duplicate IP addresses.
• Use snoop to identify where traffic does or does not appear, and from where an answer is coming back.
• Check the routing table on the AppGate server. Is the default GW present?
    o Use route get <ip-address> to verify that packets will be sent out correctly on the defined interface.
• In rare cases you might check the ipfilter tables with ipmon -ion.

Common issues that are network related, or thought to be network related:

§ DNS is slow or timing out: verify that the DNS server is responsive by pinging it from the AppGate server. Check the logs for error messages regarding slow DNS.
§ LDAP lookup is slow or timing out: ping the LDAP server and verify how responsive it is. Check the logs for indications of slow LDAP connections/time-outs.
§ LDAP cannot bind: the bind-user used by the AppGate server is either blocked, or its password has changed or must be changed.
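Ping only shows that a host answers ICMP. To get a rough feeling for how responsive name resolution or an LDAP port actually is, you can time a lookup and a TCP connect from any machine that has Python available. This is a minimal sketch; the host name ldap.example.com and port 389 in the comments are placeholders, and it is an assumption that you run it from a host in the same network position as the AppGate server.

```python
import socket
import time


def timed(fn, *args):
    """Run fn(*args) and return (ok, seconds, detail)."""
    start = time.monotonic()
    try:
        detail = fn(*args)
        return True, time.monotonic() - start, detail
    except OSError as exc:  # covers DNS failures, refused connections, time-outs
        return False, time.monotonic() - start, exc


def resolve(name):
    """Resolve a host name to an IPv4 address using the system resolver."""
    return socket.gethostbyname(name)


def tcp_connect(host, port, timeout=3.0):
    """Open and immediately close a TCP connection to host:port."""
    with socket.create_connection((host, port), timeout=timeout):
        return f"{host}:{port} accepted the connection"


# Example use (placeholder names, adjust to your environment):
#   print(timed(resolve, "ldap.example.com"))
#   print(timed(tcp_connect, "ldap.example.com", 389))
```

A lookup or connect that regularly takes seconds instead of milliseconds is a good reason to look at the server logs for DNS or LDAP time-outs. A successful TCP connect says nothing about whether the bind-user can authenticate.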

Client-server behavior issue

You will end up in this area when:

• There is client communication with the server
• The client throws an error
• The login with the client works or fails, or both in that order
• The services work or not, or both in that order

Reproduce the issue to analyze what is working and what is not:

• Baseline the client:
    o Install the same version of the client software as on the server
    o Reboot the client operating system
    o Webstart client: enable the Java debug console in the Java settings, and clear the Java cache
• Prepare the client for a client-debug log, see Documentation.
• Repeat the problem. Isolate this session in the AppGate server logs and examine both the server log and the client debug log; look for clues and work out further actions.
• Maybe you need to raise the debug level for a daemon (to level 4) on the server to get more information:
    o ag_userd: issues related to user authentication
    o ag_webproxy: issues related to web-access components
    o ag_rdpproxy: issues related to the RDP component
    o ag_filesysd: issues related to the file access component
    o There are many more, please check the chapter ‘Documentation and Knowledgebase’.
• iptunnel or hosts-file writer related issues can also be debugged by looking closer into what these are reporting. You will find the how-to in Documentation and Knowledgebase.
• If the client is the SSL frontend, refer to the troubleshooting section ‘Getting debug files from the web proxy or SSL gateway’ in the Documentation.
• Follow the signal: even if you have checked that there is no network issue on the client or the server, it is time to test between the client and the resource:
    o If you are using iptunneling:
        § Check that the interface is up and configured with an IP address (and DNS on Windows) on the client, using ipconfig /all or ifconfig
        § Check the routing table with netstat -rn
        § Ping, telnet or tcp-connect to the resource at IPName:port and IPAddress:port
    o If you are using port-forwarding:
        § Check with netstat -an whether you find the 127.0.0.x:Port listener, and check against the client-debug log.
        § Check the hosts file to see whether the host entries were added for 127.0.0.x, and compare against the client debug log.
        § Note: tcp-connect/telnet against a local port forward will only prove whether the local forward is accepting new connections or not. It will not prove that the traffic is forwarded.
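The hosts-file check for port-forwarding can be done by eye, but with many forwards a small script is less error prone. The sketch below parses the text of a hosts file and lists the names mapped to 127.0.0.x; the list of expected names is something you would take from the client debug log by hand. The function names and the sample host names are assumptions made for this illustration.

```python
import ipaddress

LOOPBACK_RANGE = ipaddress.ip_network("127.0.0.0/24")


def loopback_entries(hosts_text):
    """Return {hostname: ip} for hosts-file lines that map names to 127.0.0.x."""
    entries = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a valid address, ignore the line
        if addr.version == 4 and addr in LOOPBACK_RANGE:
            for name in names:
                entries[name] = ip
    return entries


def missing_forwards(hosts_text, expected_names):
    """Names expected from the client debug log that have no 127.0.0.x entry."""
    present = loopback_entries(hosts_text)
    return [name for name in expected_names if name not in present]


sample = """127.0.0.1 localhost
127.0.0.2 intranet.example.com  # written by the client
10.0.0.5 fileserver
"""
print(missing_forwards(sample, ["intranet.example.com", "mail.example.com"]))
```

On a real client you would read the hosts file from its usual location (for example /etc/hosts on Unix-like systems) and pass its content to the functions.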

If you cannot do anything with the information gathered above, then:

• Read the chapter: Error! Reference source not found.
• Move lateral:
    o Use another client machine
    o Use the other client, Java Webstart vs. installable client
    o Use another user with the same properties to start a session
    o Explain the issue to one of your colleagues, or finally escalate.

Escalate the identified issue

You escalate either to higher-level support or to internal colleagues to get an issue resolved. In such situations it is crucial to provide as much information as possible to the next level. This implies, again, that you have been working in a structured manner from the beginning. Now it’s time to summarize your findings and provide them so that next level support can work efficiently together with you on the case; it is key to work “together” on the case.

• Summarise in a few sentences what is happening, and also describe how it should work.
• Give the issue a meaningful title describing it in a condensed form.
• Put your analysis together from the information gathered during troubleshooting.
• Provide the versions of the OS, the AppGate client and the AppGate server.
• On top of all that information:
    o Always include a “site_info” (see section Site info) of all nodes in the cluster.
    o Provide the logfile, copied from the server as is, for the day the issue was present.
• When escalating to Cryptzone or a Cryptzone reseller, make sure you have a valid support contract.
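If your team escalates often, it can help to keep the checklist above as a small structure that tells you what is still missing before you send the case. The following Python sketch is one possible way to do that; the class and field names are made up for this example and do not correspond to any Cryptzone form or API.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Escalation:
    title: str = ""            # condensed, meaningful title
    summary: str = ""          # what is happening
    expected: str = ""         # how it should work
    analysis: str = ""         # findings gathered during troubleshooting
    os_version: str = ""
    client_version: str = ""
    server_version: str = ""
    attachments: list = field(default_factory=list)  # site_info files, logfile of the day


def missing_items(esc):
    """Return the names of all checklist items that are still empty."""
    return [f.name for f in fields(esc) if not getattr(esc, f.name)]


case = Escalation(title="Some users cannot log in since Monday",
                  summary="LDAP users time out at login, local users work.")
print(missing_items(case))
```

An empty result means every item of the checklist has at least some content; whether that content is good enough remains your judgment.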

Typical resources for troubleshooting

Documentation

This is what you need to do: read the manual, and particularly, when an issue arises, the section ‘Troubleshooting and System Recovery’:

• Server Manual on https://www.cryptzone.com/downloadcenter/appgate

The table of contents is:

• Troubleshooting an unresponsive system
    o Baseline Testing
• Troubleshooting a cluster
    o Secmsg checks
• Reset System to factory defaults
    o The GRUB menu
    o Factory default shell
• How-to
    o Getting debug files from the web proxy or SSL gateway
    o Provide a site_info
    o Capture debug output from the AppGate Client
    o Capture debug output from the IP Tunneling Driver
    o Getting debug files from the RDP proxy

You should bookmark the current manual. It is also available directly from the AppGate console.

Knowledgebase and Cryptzone Support Website

The knowledge base reflects common practice and contains use cases, announcements, compatibility references and known issues with workarounds. Use it when you cannot answer a question or proceed in an incident; you might very well find information that lets you progress quickly in your troubleshooting. Also, when you need to escalate an identified issue, you will use the website:

• https://cryptzone.force.com/support/pkb_home

In the knowledge base you will find a page called “AppGate – Software Compatibility requirements”. Check this page to find out whether your setup (client, OS, protocols and so on) is compatible with the AppGate system.

• https://cryptzone.force.com/support/articles/Customer/AppGate-software-requirements

You should bookmark the knowledge center for your daily work with AppGate.

Search engines and man pages

AppGate sits in the middle between the end user and the resource. Quite often it turns out that an issue is not related to the AppGate system but lies somewhere before the traffic enters or after the traffic leaves the system. Search engines do a great job of finding relevant troubleshooting information about other commercial systems and products that sit behind the AppGate; the same goes for the client machines that sit in front of the AppGate (Windows, Linux, OS X etc.). Man pages might be difficult to make use of, but they are the pure truth. In some cases no search-engine result is good enough. So don’t be scared of man pages: take your time and do exactly what they say.

Software tools used for isolating issues

Site info

The AppGate security server has a built-in command to gather information about the system.

1. Start a Terminal (in the AppGate Console, start Run commands->Terminal).
2. Become root (type "su" and enter the root password).
3. Type siteinfo in the Terminal. The system will now collect information. When this is done, it will ask you whether to send the result to AppGate support or instruct you how to retrieve the result. Do not send the result; the functionality is broken.
4. You can now analyse the generated text file with ‘more’, ‘vi’ or ‘nano’. Alternatively, use the ‘File Transfer’ under System Settings in the console to copy the file to your desktop.

Logcat

The AppGate log files are stored in a binary format. To convert the files to a human-readable format, the tool logcat is provided. logcat can also convert the log files to formats that are more suitable as input for other programs. logcat is a command line tool that runs on the AppGate appliance; you can use it from a command prompt on the server, as outlined below. A number of options can be given to logcat to make it select only those entries that match the given options. Selectable items are: events, severity, session IDs, time, user, host, authentication used, and which process the entry originates from. It is also possible to view only the beginning or end of the log file, and to view new entries as they appear.

    logcat [options] -l /appgate/log/appgate.log* -e [expressions]

A complete description of all options can be found in the manual or in the logcat help (logcat -h).

Note: to be able to transport the logfile and read it in a text editor, you first need to convert it to a text file:

    logcat > /tmp/logfile.txt

or, for a specific file:

    logcat -l /appgate/log/appgate.log.YYYYMMDD > /tmp/logfile.YYYYMMDD.txt

The text file can be very large, so zip it.
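Any compression tool will do for this. If you have already copied the text file to a workstation with Python installed, a few lines are enough to gzip it before attaching it to a case. This is a sketch under that assumption; it is not an AppGate tool, and the function name is made up for this example.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path):
    """Compress path to path + '.gz' and return the new file's path.

    The original file is left in place.
    """
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams the data, so large files are fine
    return dst


# Example use (path taken from the logcat example above):
#   gzip_file("/tmp/logfile.txt")
```

Log text is highly repetitive, so the compressed file is usually much smaller than the original.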

Snoop and Wireshark

Capturing traffic is one of your skills, and that is why you are using it. The traffic capture program on OpenSolaris (the operating system of the AppGate system) is called snoop, and it produces a capture file that Wireshark can read. Here is an article about how to use it, with an example:

• https://cryptzone.force.com/support/articles/Customer/Snooping-traffic-on-the-appgate-server/

You should also capture traffic at other points of the network (in certain cases, at least). Wireshark and tcpdump are very good tools. Use Wireshark to analyse the traffic captured with either snoop or tcpdump.