This document details common practices in building High-End Enterprise Internet DMZs, engineered for security, scalability, and fault tolerance. It presumes a basic understanding of system infrastructure, including Firewalls and the concept of a network DMZ.
Copyright March 2003, Brandon Gillespie
Topics:
Monitoring at all levels is critical to maintaining a highly available DMZ. Monitoring should include system security, utilization and capacity, service availability, and environmental stability. Processes should exist to alert and notify the necessary parties in a timely manner.
Fault Tolerance is the ability to handle both planned and unplanned faults in hardware, software, and even the environment of the servers, gracefully and in as short a time as possible. This includes building redundant systems as well as having an appropriate level of monitoring and a sound backup and restore policy.
Clustering is simply the act of grouping multiple servers so that one can take over from another if it fails, and/or so that they distribute load. High Availability and Load Balancing are the two facets of Clustering.
Clustering a group of servers for High Availability has been a common practice for some time. There are three ways to do this.
The first is to set up a standby system (also known as a Cold Cluster or, confusingly, a Hot Standby), where two identical systems exist, with one system providing all the services live and the other sitting idle in a standby state, waiting to take over if the primary system fails. This is the least preferred method of High Availability clustering, because it does not fully utilize all available resources. However, it is sometimes the easiest way to implement a cluster for services such as Databases, where only one instance can exist at a time.
The second way to cluster is to have multiple systems all actively providing the service. This type of cluster is also known as a Hot Cluster. If the service being clustered can tolerate multiple discrete servers (such as HTTP), this is the preferred mechanism, because it also allows for Load Balancing.
Another approach, often used in parallel to having greater capacity, is to build in load balancing, where multiple smaller servers are clustered and the load is divided among them. This allows for scaling upwards as needed, without having to have the extra capacity sitting idle.
Load Balancing also provides a level of Fault Tolerance, because each individual system is a mirror of the others. This allows one node in a load balanced group to fail without affecting the overall service being provided (the additional load is simply distributed among the other nodes).
Load Balancing is also highly useful for system maintenance, for the same reason (a single system can be removed from the service without affecting the availability of the service).
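As a rough illustration of the behavior described above, the following Python sketch rotates requests across a pool of nodes and skips any node that has been taken out of service. The class name `RoundRobinPool`, its methods, and the node names are hypothetical; real load balancers add health checks, weighting, and session handling that are not shown here.

```python
from itertools import count


class RoundRobinPool:
    """Distribute requests across the healthy nodes of a pool in rotation."""

    def __init__(self, nodes):
        self.nodes = list(nodes)        # every node configured in the pool
        self.healthy = set(self.nodes)  # nodes currently accepting traffic
        self._counter = count()         # ever-increasing request index

    def mark_down(self, node):
        """Take a node out of rotation (failure or planned maintenance)."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Return a configured node to rotation."""
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        """Pick the next healthy node, keeping the configured order."""
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        return candidates[next(self._counter) % len(candidates)]
```

Calling `mark_down("web2")` before maintenance and `mark_up("web2")` afterwards would keep the service available while that single node is offline, assuming the remaining nodes have enough capacity to absorb its share of the load.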
One drawback of Load Balancing is the need to keep each node an exact mirror of the others. How the nodes are managed and mirrored should always be addressed as part of the implementation of a Load Balancing scenario.
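One possible way to detect nodes that have drifted from the mirror is to compare a content fingerprint of each node against a reference copy. The sketch below assumes the deployed content of each node is already available as a mapping of path to bytes; the function names `fingerprint` and `drifted_nodes` are illustrative, and how the content is actually collected from each node is left out.

```python
import hashlib


def fingerprint(files):
    """Reduce a mapping of path -> file contents (bytes) to one digest."""
    digest = hashlib.sha256()
    for path in sorted(files):
        # Hash path and content separately so field boundaries are unambiguous.
        digest.update(hashlib.sha256(path.encode("utf-8")).digest())
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def drifted_nodes(reference, nodes):
    """Return the names of nodes whose content differs from the reference."""
    expected = fingerprint(reference)
    return sorted(name for name, files in nodes.items()
                  if fingerprint(files) != expected)
```

A check like this could run after every deployment, with any node it reports being pulled from the load balanced group until it is resynchronized.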
Having a defined backup and recovery process is vital to any infrastructure, let alone an enterprise deployment. However, it is often overlooked when considering all other issues. Backups should be performed at least once a day. A retention policy should exist, defined by the organization owning the services. The recommendation is a minimum of one week of retention, and preferably multiple weeks.
Periodically (every few months, and no less often than once a year) backups should be run through a test recovery scenario, to verify they are working properly.
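The daily schedule and retention window described above can be checked mechanically. The following Python sketch assumes backups are identified by the calendar date on which they were taken; the function names and the default of seven days are illustrative and should follow whatever retention policy the organization defines.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, retention_days=7):
    """Return backup dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


def missing_daily_backups(backup_dates, today, retention_days=7):
    """Return days inside the retention window that have no backup."""
    have = set(backup_dates)
    window = [today - timedelta(days=n) for n in range(retention_days)]
    return sorted(d for d in window if d not in have)
```

The second function is the more important of the two in practice: a gap it reports means the "at least once a day" requirement was missed and should raise an alert.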
Backups must be rotated offsite at a regular interval, to handle the possibility of complete site failure (such as a fire). Without an offsite backup process the risk exists of a complete loss of all data.
Archiving data is different from backing up data. Data is backed up to handle failures and disasters, not to preserve points in time. If there is a need to recover information from arbitrary points in time, a Data Archiving process should be created, separate from the Backup Process.
For an internet services DMZ, it is recommended to subdivide it into two zones: a Public Zone and a Private Zone. The Public Zone is where all servers are placed that directly communicate with the internet. These include web servers, ftp servers and any edge-service server. The Private Zone is where all servers are placed that provide a secondary role. These include web application servers and Database servers. The Local Network is commonly a factor in this equation, and is generally the common corporate network, where additional fulfillment and accounting systems may reside.
Inbound communication to the Public Zone is traditionally opened with very few limits on a service-by-service basis (for instance, all web traffic). Outbound communication from the Public Zone should be severely limited to only the remote hosts required, such as partners. This is to reduce the risk of an indirect connection, where an application on a public server is coerced into opening a connection to an unauthorized server, providing administrative access on services normally blocked by the firewall (this is also known as a reverse connection).
Communication between the Public Zone and the Local Network should be restricted in the same manner as communication with the Internet. Connections should only be allowed unrestricted inbound to the Public Zone, and should be relayed through the Private Zone when sending information to Local Network systems. This is to avoid the "crunchy edge, soft center" syndrome, where a public server protected by a firewall is compromised, and then has full access to all other corporate servers which would not normally be accessible from the Internet. Forcing connections through the Private Zone likely changes protocols (requiring a different avenue of attack) and adds another step to the process, slowing the intruder down.
Inbound connections to the Private Zone should be limited on an as-needed, service-by-service basis. For instance, access to a Web Application Server's communication port should be limited to the systems which need to access it, such as the Internet Web servers and possibly maintenance and monitoring systems on the Local Network. Equally, access to the Database ports should be restricted in the same manner, including connections originating from the Local Network. This not only reduces attacks from the LAN, but can also help reduce failures caused by worms and similar trojans, which more frequently cripple a Local Network. Internet users are less tolerant of downtime, and internet servers traditionally have fewer maintenance windows than corporate servers.
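The zone rules above amount to a default-deny policy with a short list of explicitly permitted flows. The Python sketch below models that idea only; it is not the configuration syntax of any firewall product. The zone names, the port numbers chosen for the application server and the relay, and the `is_allowed` helper are all assumptions made for illustration.

```python
# Hypothetical rule table: (source zone, destination zone, destination port).
ALLOWED_FLOWS = {
    ("internet", "public", 80),    # web traffic into the Public Zone
    ("internet", "public", 443),
    ("public", "private", 8009),   # web servers to the application server (assumed port)
    ("local", "private", 8009),    # maintenance and monitoring from the Local Network
    ("private", "local", 25),      # relay from the Private Zone to Local Network systems (assumed service)
}


def is_allowed(src_zone, dst_zone, port):
    """Default deny: a flow passes only if it is listed explicitly."""
    return (src_zone, dst_zone, port) in ALLOWED_FLOWS
```

Under this table a compromised public server has no direct path to the Local Network, and nothing on the Internet can reach the Private Zone at all, which is the property the two-zone layout is meant to provide.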
It is easiest to implement a DMZ with a single firewall on the front and multiple networks behind it (one Private, one Public, one Local Network). However, this is not a very secure design, because the firewall is a single point of failure. If the front-end firewall is compromised, all zones are opened. It is recommended to have two separate firewalls: one which handles traffic on the internet edge, allowing and regulating access into the Public Zone, and one which sits behind the Public Zone firewall and bridges the Public Zone, the Private Zone and the Local Network.
The secondary firewall is an added layer of protection. If the front edge firewall is compromised, at worst only the public servers are exposed, and they should still be the most hardened at the operating system level.
In addition, it is recommended that the second layer firewall be a different product from the front-end firewall. This is to avoid both firewalls falling to the same vulnerability. If an exploit is found which can compromise the front-end firewall, it can likely compromise the back-end firewall as well, if they are from the same vendor. With this in mind, there really is not much value in having the two firewalls be from the same vendor.
Security Sensors should be placed in front of and behind each firewall, monitoring all network traffic. Sensors should do both traffic analysis and signature based intrusion detection. Traffic analysis is important to have alongside signature based detection because when a new exploit is found, the signatures are not yet up to date and will not catch it. However, if the exploit causes enough traffic on its own (as some of the notorious worms have), it will appear in the traffic analysis.
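A very simple form of traffic analysis compares the current traffic level against a recent baseline and flags large deviations. The sketch below uses a standard-deviation threshold as one possible approach; the function name, the sample values, and the threshold of three standard deviations are assumptions, and production sensors use considerably richer models.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` when it exceeds the historical mean by more than
    `threshold` population standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return current > baseline  # perfectly flat history: any rise stands out
    return (current - baseline) / spread > threshold
```

Here `history` might hold packets per second sampled over the previous hour. A worm generating a sudden flood would be flagged even though no signature for it exists yet.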
It is also essential to automate the updating of signatures. If signatures get too old (more than a month or so), the value of the signature based intrusion detection is lessened.
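Monitoring can also confirm that the automated update is actually running. The following sketch flags a signature set as stale once it passes a maximum age; the 30-day default reflects the "month or so" mentioned above, and the function name and the way the last update time is obtained are assumptions.

```python
from datetime import datetime, timedelta


def signatures_stale(last_update, now, max_age=timedelta(days=30)):
    """Return True when the signature set is older than `max_age`."""
    return now - last_update > max_age
```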
Host Security involves hardening the Host operating system by removing unnecessary services, adjusting the network stack, and configuring applications in a manner that reveals less information and allows for fewer on-the-fly changes. Most servers ship by default with a fairly open configuration. This allows for easy deployment by administrators who may or may not know a lot about the server and/or applications being provided. For the general case this works well. It also leaves open a lot of avenues for attack. Even though these open services may be blocked by a Firewall, they should still be disabled, because they can be indirectly exploited through a service which is allowed through the firewall.
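One way to audit a host for unnecessary services is to compare the ports it is actually listening on against the short list it is supposed to offer. The sketch below assumes the list of listening ports has already been gathered by some other means (for example from the output of a tool such as `netstat`); the function name and port numbers are illustrative.

```python
def unexpected_listeners(listening_ports, allowed_ports):
    """Return listening ports that are not on the host's approved list."""
    return sorted(set(listening_ports) - set(allowed_ports))
```

For a web server approved to offer only ports 22 and 80, passing `[22, 80, 111, 6000]` as the listening ports would report 111 and 6000 as services to disable.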
Generally speaking, all hosts running in an Internet DMZ should follow these guidelines:
Applications providing services to the Internet should be hardened and constantly maintained at the latest recommended service patch. The application providing the service is the most vulnerable point for site violation, and should be given the most scrutiny. In the case where vulnerabilities exist but cannot be resolved, workarounds should be put in place to deny general access to the vulnerability.
Copyright © 2004, Protos LLC