Anti-pollution Engine System

Table of Contents

Introduction

Installation

Directories

Source Code Specification

Configuration File

Log Files

Introduction

The Anti-pollution Engine (AE) system is an add-on program for the Squid Web cache that counters cache-targeted DoS attacks. It detects pollution objects and attackers' IP addresses, then blocks them.

The system consists of two major parts: the AE and the AE Interface (AEI). The core part of the AE is called the AE Daemon. The AE Daemon communicates with the AEI through a pair of pipes. The AE Daemon operates in blocking mode for both pipe reading and writing, while the AEI operates in non-blocking mode. The AEI intercepts access information from Squid, such as the client IP, requested URL, object size and reference count, and sends it to the AE Daemon. If pollution objects are detected, the AE Daemon issues "block-entry" commands to the AEI through the pipe. If attackers' IP addresses are detected, the AE Daemon issues "block-client" commands to the AEI. Upon receiving a "block-entry" or "block-client" command, the AEI executes the corresponding operation to counter the attack.

The AE Daemon spawns one or more Triggered Modules at startup and sends signals to them whenever necessary. A Triggered Module is designed to execute a time-consuming detection algorithm, such as the SKETCH. Upon receiving a signal from the AE Daemon, a Triggered Module reads data from a file generated by the AE Daemon, runs the detection algorithm and writes the result to another file. When it has finished processing, it sends a signal to the AE Daemon, which in turn reads the output file and performs the corresponding operations. Lightweight online detection modules, such as the PCSA, are embedded directly in the AE Daemon so that they can work efficiently.

 
Figure: Structure of the AE system

In this version, the AE Daemon is implemented as a process spawned by Squid at startup, and the AEI is a module whose code is embedded directly in several Squid source files. The pair of pipes between the AE Daemon and the AEI could easily be replaced with a TCP/UDP socket so that the AE could run on a separate machine.

Installation

1. Download the source code tarballs of Squid and the AE and the installation script.

2. Install Squid and the AE.

Log in as "root". Put the tarballs and the installation script in the same directory and run the installation script by typing:

. install.sh

To customize the installation, modify the installation script. The script is heavily commented, so it should be easy to see what each step does.

3. Configure Squid.

Configure Squid by customizing the configuration file /usr/local/squid/etc/squid.conf. Set the effective user of Squid to "squid" by adding the following line in the configuration file:

cache_effective_user squid

To customize the rest of the configuration file, please refer to the configuration guide provided at http://www.squid-cache.org/.

4. Initialize the cache directory of Squid by typing:

/usr/local/squid/sbin/squid -z

5. Run Squid.

Now, you can run Squid by typing:

/usr/local/squid/sbin/squid

If you want to run Squid in debugging mode to check whether it works correctly, type:

/usr/local/squid/sbin/squid -d <level>

Directories

/usr/local/squid/sbin/

Binary executables of the AE and Squid are located in this directory.

/usr/local/squid/etc/

The configuration files of the AE and Squid are located in this directory.

/usr/local/squid/var/logs

Log files of the AE and Squid are located in this directory.

/usr/local/squid/var/AE_data

Data files used by the AE are stored in this directory.

Source Code Specification

Major Source File List

Here is a list of the major source files of the AE system:

Source file                      Description
main.c, client_side.c, store.c   The revised Squid source files with the AEI code embedded. All embedded code is enclosed in comments of a specific format.
AEI.h                            Data types and function declarations of the AEI module.
(file name not given)            Data types and function declarations related to IPC (Inter-process Communication) between the AEI and the AE Daemon.
AE.h                             The header file of the AE Daemon.
AE.cpp                           Main program and function implementations of the AE Daemon.
AE_PCSA.h                        The header file of the AE specialized PCSA module.
AE_PCSA.cpp                      Function implementations of the AE specialized PCSA module.
PCSA.h                           The header file of the general purpose PCSA module.
PCSA.cpp                         Function implementations of the general purpose PCSA module.
Hash_PCSA.h                      Data types and function declarations related to PCSA records, which are stored in a hash table.
Hash_PCSA.cpp                    Implementations of functions related to PCSA records, which are stored in a hash table.
(file name not given)            A hash table template.
(file name not given)            Non-template functions used in the hash table implementation.
Sketch/                          Source files of the SKETCH module. Sketch/IDS_rev32.cpp is the main program.
(file name not given)            The header file of the AEM (AE Monitor) tool. (AEM is a debugging tool.)
(file name not given)            Implementation of the AEM tool.

AE Interface

Summary

The AEI is a module embedded directly in Squid. The embedded code is distributed across three Squid source files: main.c, client_side.c and store.c. The AEI serves as an interface between Squid and the AE Daemon and communicates with the AE Daemon through a pair of pipes. It intercepts access information from Squid, such as the client IP, requested URL, object size and reference count, and sends it to the AE Daemon. Upon receiving feedback from the AE Daemon, the AEI performs the appropriate operations to counter pollution attacks, for example removing pollution objects from the cache and blocking them.

Data types and function declarations of the AEI module are located in AEI.h. Function implementations and other embedded code of the AEI module are located in main.c, client_side.c and store.c. All embedded code in these three files is inserted in the following format so that it is easy to locate:

// ************* Begin *************
// IPC_AE module code

<embedded codes>

// ************* End ***************

Specification of Major Functions

Initialize_AE_Interface()

The Initialize_AE_Interface() function is called when Squid starts up. It performs the following operations:

  1. Open the log file of the AEI ("IPC_AE.log").
  2. Spawn the AE Daemon process.
  3. Open a pair of pipes and set non-blocking mode for both pipe reading and writing.

Definition of Initialize_AE_Interface() is located in "main.c" and it is called in "main.c".
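The spawn-and-pipe steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual AEI code: the names DaemonLink, SetNonBlocking and SpawnDaemon are hypothetical, the pipes are created before the fork for simplicity, and wiring the daemon's stdout to the return pipe is an assumption (the text only states that the daemon reads its input pipe on stdin).

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical holder for the AEI-side ends of the two pipes.
struct DaemonLink {
    pid_t pid = -1;        // process id of the spawned daemon
    int to_daemon = -1;    // AEI writes access information here
    int from_daemon = -1;  // AEI reads block commands here
};

// Put a descriptor into non-blocking mode; returns false on failure.
bool SetNonBlocking(int fd) {
    const int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Fork a child whose stdin/stdout are the far ends of the pipes, then exec
// `path`. The child's ends keep the default blocking mode.
bool SpawnDaemon(const char* path, DaemonLink* link) {
    assert(path != nullptr && link != nullptr);
    int down[2];  // parent -> child
    int up[2];    // child -> parent
    if (pipe(down) != 0) return false;
    if (pipe(up) != 0) {
        close(down[0]); close(down[1]);
        return false;
    }
    const pid_t pid = fork();
    if (pid < 0) {
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        return false;
    }
    if (pid == 0) {  // child process
        dup2(down[0], STDIN_FILENO);
        dup2(up[1], STDOUT_FILENO);
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        execl(path, path, static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    close(down[0]);  // parent keeps only its own ends
    close(up[1]);
    link->pid = pid;
    link->to_daemon = down[1];
    link->from_daemon = up[0];
    return SetNonBlocking(link->to_daemon) && SetNonBlocking(link->from_daemon);
}
```

Because O_NONBLOCK is set only on the parent's descriptors, the child can still use blocking reads and writes on its ends, which matches the blocking/non-blocking split described in the Introduction.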

Close_AE_Interface()

The Close_AE_Interface() function is called when Squid exits. It performs the following operations:

  1. Terminate the AE Daemon process.
  2. Close the pipes.
  3. Close the log files.

Definition of Close_AE_Interface() is located in "main.c" and it is called in "main.c".

AEI_DB_Init()

The AEI_DB_Init() function is called right after Initialize_AE_Interface() is called. It initializes the two hash tables used to keep track of blocked entries and blocked clients ("blocked_entry_table" and "blocked_client_table").

Definition of AEI_DB_Init() is located in "client_side.c" and it is called in "main.c".

AEI_DB_FreeMemory()

The AEI_DB_FreeMemory() function is called just before Close_AE_Interface() is called. It frees the two hash tables used to keep track of blocked entries and blocked clients ("blocked_entry_table" and "blocked_client_table").

Definition of AEI_DB_FreeMemory() is located in "client_side.c" and it is called in "main.c".

ProcessAccessInfo()

The ProcessAccessInfo() function is called each time an HTTP request has been processed by Squid. It performs the following operations:

  1. Collect useful access information from the intercepted data of Squid.
  2. Preprocess the access information.
  3. Send the access information to the AE Daemon by calling SendAccessInfo().

Definition of ProcessAccessInfo() is located in "client_side.c" and it is called in the httpRequestFree() function in "client_side.c".

Process_AE_Feedback()

The Process_AE_Feedback() function checks the reading pipe for feedback from the AE Daemon and performs the corresponding operations. Currently, there are two types of feedback from the AE Daemon: a block-entry request and a block-client request.

  • When a block-entry request is received, the function tries to release the corresponding object from the Squid cache and inserts it into the "blocked_entry_table", so that later requests for the same object will be blocked.
  • When a block-client request is received, the function inserts the corresponding client into the "blocked_client_table", so that all later requests from this client will be blocked.

Definition of Process_AE_Feedback() is located in "client_side.c" and it is called in the httpRequestFree() function in "client_side.c". It is also called when Squid exits, right before AEI_DB_FreeMemory() is called in "main.c".

SendReleaseNotification()

The SendReleaseNotification() function sends a notification to the AE Daemon each time a stored entry is released by Squid. (Note that a stored entry is not necessarily associated with a cached object, but a cached object is always associated with a stored entry.) This notification lets the AE Daemon catch every eviction from the Squid cache, so that it can keep its own database consistent with the Squid cache by removing the evicted entry.

Definition of SendReleaseNotification() is located in "main.c" and it is called in the storeRelease() function in "store.c".

clientAccessCheck()

clientAccessCheck() is an original Squid function provided in "client_side.c". Code was inserted into this function to perform an additional access check based on the entries in the "blocked_client_table" and the "blocked_entry_table". If an HTTP request matches an entry in either table, the request is blocked. The blocking operation is accomplished by the following code:

clientAccessCheckDone( ACCESS_DENIED, http );
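The lookup that precedes this call can be illustrated with a small standalone sketch. It is not the Squid code: the BlockTables struct, the Verdict type and CheckRequest() are hypothetical, and keying entries by URL and clients by IPv4 address is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>

// Stand-ins for "blocked_entry_table" and "blocked_client_table".
struct BlockTables {
    std::unordered_set<std::string> blocked_entries;    // keyed by URL (assumption)
    std::unordered_set<std::uint32_t> blocked_clients;  // keyed by IPv4 address (assumption)
};

enum class Verdict { kAllowed, kDenied };

// Deny the request if either the client or the requested object is blocked.
Verdict CheckRequest(const BlockTables& tables, std::uint32_t client_ip,
                     const std::string& url) {
    if (tables.blocked_clients.count(client_ip) != 0) return Verdict::kDenied;
    if (tables.blocked_entries.count(url) != 0) return Verdict::kDenied;
    return Verdict::kAllowed;
}
```

In the real module a kDenied verdict would correspond to the clientAccessCheckDone( ACCESS_DENIED, http ) call shown above.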

SendAccessInfo()

The SendAccessInfo() function sends access information to the AE Daemon through a pipe.

Definition of SendAccessInfo() is located in "main.c" and it is called in the ProcessAccessInfo() function in "client_side.c".
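As an illustration of sending one record over a non-blocking pipe, here is a minimal sketch. The wire format is not documented here, so the tab-separated text line, the AccessInfo struct and the names EncodeAccessInfo and SendAccessInfoLine are all assumptions made for the example. Dropping a record when the pipe is full is likewise only one possible policy.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <limits.h>
#include <string>
#include <unistd.h>

// Hypothetical record holding the fields named in the text.
struct AccessInfo {
    std::string client_ip;
    std::string url;
    std::uint64_t object_size;
    std::uint32_t ref_count;
};

// Assumed wire format: one tab-separated line per request.
std::string EncodeAccessInfo(const AccessInfo& info) {
    return info.client_ip + '\t' + info.url + '\t' +
           std::to_string(info.object_size) + '\t' +
           std::to_string(info.ref_count) + '\n';
}

enum class SendResult { kSent, kDropped, kError };

// Write one record to a non-blocking pipe. Writes of at most PIPE_BUF bytes
// are atomic on a pipe, so the record is either written whole or not at all.
SendResult SendAccessInfoLine(int fd, const AccessInfo& info) {
    const std::string line = EncodeAccessInfo(info);
    if (line.size() > static_cast<std::size_t>(PIPE_BUF)) return SendResult::kError;
    const ssize_t n = write(fd, line.data(), line.size());
    if (n == static_cast<ssize_t>(line.size())) return SendResult::kSent;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return SendResult::kDropped;  // pipe full: skip rather than stall Squid
    }
    return SendResult::kError;
}
```

Keeping each record within PIPE_BUF avoids partial writes, which would otherwise leave the blocking reader with a truncated line.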

AE Daemon

Summary

The AE Daemon is a program spawned by Squid at startup. It communicates with the AEI through a pair of pipes. It receives access information sent by the AEI and processes it by running detection algorithms. When pollution objects or attackers' IP addresses are detected, it sends the corresponding commands to the AEI to perform counterattack operations. Currently, there are only two commands: a block-entry request and a block-client request. Please refer to Process_AE_Feedback() for details about the two commands.

Data types and function declarations of the AE Daemon are located in AE.h. Main program and major function implementations of the AE Daemon are located in AE.cpp.

Specification of Major Functions

main()

The main program of the AE Daemon works in the following way:

  1. Open the log file of the AE Daemon ("AE.log").
  2. Parse the configuration file ("AE.conf").
  3. Install handlers for detection modules and maintenance modules.
  4. Enter the main loop: Repeatedly read data from the input pipe (stdin) in blocking mode and perform corresponding operations.

Sig_Term()

The handler for the SIGTERM signal. If the AE Daemon is not busy, the handler terminates the AE Daemon immediately by calling Quit(); otherwise, it marks a bit in a global flag to postpone the operation until the AE Daemon is free.

Sig_Alarm()

The handler for the SIGALRM signal. SIGALRM is sent periodically by the AE Daemon itself to trigger the data processing of the SKETCH module. On receiving this signal, the handler first resets the signal timer for the next period. It then checks the state of the AE Daemon: if the daemon is not busy, the handler triggers the SKETCH module by calling Run_SKETCH(); otherwise, it marks a bit in a global flag to postpone the operation until the AE Daemon is free.

Sig_User1()

The handler for the SIGUSR1 signal. SIGUSR1 is sent by the SKETCH module when it has finished processing an input file. On receiving this signal, the AE Daemon processes the SKETCH's feedback by reading its output file. This is accomplished by calling the Process_SKETCH_Feedback() function. If the AE Daemon is currently busy, the handler will postpone the operation until the AE Daemon is free.

Sig_User2()

The handler for the SIGUSR2 signal. SIGUSR2 is sent by a debugging tool called AE Monitor (AEM) to perform some maintenance operations.

CheckPendingProcess()

The CheckPendingProcess() function is called when the AE Daemon is free. It checks whether any operations triggered by signals have been postponed and, if so, performs them.
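The "act now if free, otherwise mark a bit" pattern shared by the signal handlers and CheckPendingProcess() can be sketched as below. This is a simplified illustration, not the AE Daemon code: the bit names, the globals and the Perform() stand-in (which only records what would have run) are hypothetical.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical bit assignments for postponed operations.
enum PendingBit {
    kPendingQuit = 1 << 0,            // SIGTERM arrived while busy
    kPendingRunSketch = 1 << 1,       // SIGALRM arrived while busy
    kPendingSketchFeedback = 1 << 2,  // SIGUSR1 arrived while busy
};

volatile sig_atomic_t g_busy = 0;     // set while a request is being processed
volatile sig_atomic_t g_pending = 0;  // the "global flag" of postponed work
volatile sig_atomic_t g_done = 0;     // stand-in: records which operations ran

// Stand-in for Quit(), Run_SKETCH() and Process_SKETCH_Feedback().
void Perform(int bit) { g_done = g_done | bit; }

int BitForSignal(int signo) {
    switch (signo) {
        case SIGTERM: return kPendingQuit;
        case SIGALRM: return kPendingRunSketch;
        case SIGUSR1: return kPendingSketchFeedback;
        default: return 0;
    }
}

// Common handler body: act now if free, otherwise mark the bit.
extern "C" void OnSignal(int signo) {
    const int bit = BitForSignal(signo);
    if (g_busy) {
        g_pending = g_pending | bit;
    } else {
        Perform(bit);
    }
}

// Called when the daemon is free: run whatever was postponed.
void CheckPending() {
    assert(!g_busy);
    sigset_t all, old;
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);  // do not lose a bit set mid-swap
    const int pending = g_pending;
    g_pending = 0;
    sigprocmask(SIG_SETMASK, &old, nullptr);
    if (pending & kPendingSketchFeedback) Perform(kPendingSketchFeedback);
    if (pending & kPendingRunSketch) Perform(kPendingRunSketch);
    if (pending & kPendingQuit) Perform(kPendingQuit);  // quit last
}
```

Signals are blocked briefly while the flag is read and cleared, so a signal that arrives at that moment is delivered afterwards instead of being overwritten.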

Quit()

The Quit() function terminates the AE Daemon and all its child processes.

PCSA_Process()

The PCSA_Process() function is called each time new access information is received. It updates PCSA records according to the access information. If a record becomes "positive" according to the PCSA detection rule, the function issues a block-entry request to the AEI by calling the SendBlockEntryRequest() function. In addition, the corresponding PCSA record is deleted from the PCSA database (a hash table).

Install_SKETCH_Handler()

The Install_SKETCH_Handler() function initializes the SKETCH module by performing the following operations:

  1. Spawn the SKETCH process.
  2. Get rid of the spawned process's "stdin" and "stdout". Redirect its "stderr" to a log file (Sketch.log).
  3. Open a temporary file to write input data for the SKETCH module.
  4. Install the signal handler for SIGALRM and start the signal timer.
  5. Install the signal handler for SIGUSR1, which is used to handle the SKETCH's feedback.

If an error occurs during the process, the function terminates the AE Daemon.

Append_SKETCH_Record()

The Append_SKETCH_Record() function is called each time new access information is received. It appends a record to the SKETCH input file based on the new access information.

Run_SKETCH()

The Run_SKETCH() function triggers the data processing of the SKETCH module by sending the SIGUSR1 signal to the SKETCH module. Before sending the signal, it renames the temporary input file for the SKETCH to its formal name and resets the temporary input file.
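The rename-then-signal sequence can be sketched as follows. The function names PublishInputFile and TriggerSketch and the file paths passed to them are hypothetical, and error handling is reduced to a boolean result.

```cpp
#include <cassert>
#include <cstdio>
#include <signal.h>
#include <string>
#include <sys/types.h>

// Publish the records collected so far under the formal name and start a
// fresh, empty temporary file for the next period.
bool PublishInputFile(const std::string& tmp_path, const std::string& final_path) {
    if (std::rename(tmp_path.c_str(), final_path.c_str()) != 0) return false;
    std::FILE* fresh = std::fopen(tmp_path.c_str(), "w");  // recreate empty
    if (fresh == nullptr) return false;
    std::fclose(fresh);
    return true;
}

// Rotate the input file first, then wake the SKETCH process with SIGUSR1.
bool TriggerSketch(pid_t sketch_pid, const std::string& tmp_path,
                   const std::string& final_path) {
    assert(sketch_pid > 0);
    if (!PublishInputFile(tmp_path, final_path)) return false;
    return kill(sketch_pid, SIGUSR1) == 0;
}
```

Renaming before signalling means the SKETCH process only ever sees a complete file. A daemon that keeps the temporary file open for appending would also need to reopen it after the rename, since the old handle follows the renamed file.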

Process_SKETCH_Feedback()

The Process_SKETCH_Feedback() function processes feedback from the SKETCH module. This is accomplished by reading the output file of the SKETCH module and sending a block-client request to the AEI for each entry specified in the output file.

SendBlockEntryRequest()

The SendBlockEntryRequest() function sends a block-entry request to the AEI so that later requests for the specified object will be blocked.

SendBlockClientRequest()

The SendBlockClientRequest() function sends a block-client request to the AEI so that all later requests from the specified client IP will be blocked.

Install_AEM_Handler()

The Install_AEM_Handler() function installs the handler for the AEM tool, a debugging tool used to perform maintenance operations.

ParseConfiguration()

This function parses the configuration file (AE.conf) and sets system parameters according to the configuration. If an error occurs, it terminates the AE Daemon.
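A minimal sketch of parsing "name=value" lines, matching the examples in the Configuration File section, is shown below. It is not the actual ParseConfiguration(): the names ConfigMap, ParseConfigStream and GetLong are hypothetical, blank lines are skipped as an assumption, and no comment syntax is handled because none is documented.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using ConfigMap = std::map<std::string, std::string>;

// Parse "name=value" lines into a map. Returns false on a malformed line,
// which the caller could treat as a fatal configuration error.
bool ParseConfigStream(std::istream& in, ConfigMap* out) {
    assert(out != nullptr);
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        line = line.substr(first, last - first + 1);
        const auto eq = line.find('=');
        if (eq == std::string::npos || eq == 0) return false;
        (*out)[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return true;
}

// Look up an integer parameter, falling back to a default when absent.
// std::stol throws if the stored value is not a number.
long GetLong(const ConfigMap& config, const std::string& key, long fallback) {
    const auto it = config.find(key);
    return it == config.end() ? fallback : std::stol(it->second);
}
```

For example, GetLong(config, "SKETCH_interval", 1800) returns the configured period, or the documented default of 1800 seconds when the parameter is missing.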

PCSA Module

Summary

The PCSA is a detection module that deals with false-locality attacks. It records cached objects that are referenced frequently and, for each of them, keeps track of the reference count and the number of unique source IPs that request the object. If "number of unique IPs" / "reference count" exceeds a certain threshold, the object is judged "positive" and will be blocked.

To keep track of the number of unique source IPs, the PCSA first uses accurate counting, which records every unique IP in a set. When the number of unique IPs exceeds a certain threshold (specified by the parameter PCSA_max_ac_set_size), the PCSA switches from accurate counting to probabilistic counting. The probabilistic counting algorithm used is called Probabilistic Counting with Stochastic Averaging (PCSA), which is where the name of this module comes from.

Data types and function declarations for the interface of the PCSA module are located in AE_PCSA.h, and the corresponding function implementations are located in AE_PCSA.cpp.

Data types and function declarations of the general purpose PCSA algorithm are located in PCSA.h, and the corresponding function implementations are located in PCSA.cpp.

In addition, the source files Hash_PCSA.h and Hash_PCSA.cpp contain data types and functions related to PCSA records, which are stored in a hash table that serves as the core database of the PCSA module.

Specification of Major Functions

PCSA_MAIN::PCSA_MAIN()

This is the constructor of the PCSA_MAIN class. It initializes a hash table that stores the PCSA records for cached objects.

PCSA_MAIN::ReferenceEntry()

This function is called each time new access information is received. It updates the PCSA records according to the new access information.

PCSA_MAIN::Positive()

This function checks whether a PCSA record is "positive" according to the detection rule.

PCSA_MAIN::DeleteEntry()

This function deletes a PCSA record from the hash table.

PCSA_MAIN::ACtoPC()

This function switches a PCSA record from accurate counting to probabilistic counting.

PCSA::PCSA()

This is the constructor of the PCSA class, which implements the general purpose PCSA algorithm. The constructor performs the following operations:

  • Set the m_bit parameter and the hash function to the values passed by the arguments of the constructor.
  • Initialize the bitmaps used for probabilistic counting according to the m_bit parameter.

PCSA::InputValue()

This function inputs an element of the multiset into the probabilistic counting system.

PCSA::GetCardinality()

This function calculates the PCSA cardinality of the input multiset.

PCSA_RECORD::Cardinality()

This function retrieves the number of unique source IPs associated with a PCSA record. If a record has switched to probabilistic counting, the function returns the PCSA cardinality of the source IPs; otherwise, it returns the size of the set used for accurate counting of unique IPs.
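To make the counting scheme concrete, here is a compact sketch of a PCSA estimator combined with the accurate-to-probabilistic switch. It is an illustration only and not the code in PCSA.cpp or Hash_PCSA.cpp: the class names PcsaSketch and UniqueIpCounter are hypothetical, the hash is a splitmix64-style mixer chosen as a stand-in, and the estimate uses the Flajolet-Martin formula m / 0.77351 * 2^(sum of R / m), where R is the index of the lowest zero bit of each bitmap.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// splitmix64-style finalizer, used here as a stand-in hash function.
std::uint64_t MixHash(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

class PcsaSketch {
public:
    explicit PcsaSketch(unsigned m_bit)
        : m_bit_(m_bit), bitmaps_(BitmapCount(m_bit), 0) {}

    // Low m_bit bits pick a bitmap; the rank of the lowest 1-bit in the
    // remaining bits picks the bit to set.
    void InputValue(std::uint64_t value) {
        const std::uint64_t h = MixHash(value);
        const std::size_t index = h & (bitmaps_.size() - 1);
        std::uint64_t rest = h >> m_bit_;
        unsigned pos = 0;
        while (pos < 31 && (rest & 1u) == 0) { rest >>= 1; ++pos; }
        bitmaps_[index] |= (std::uint32_t{1} << pos);
    }

    double GetCardinality() const {
        double sum = 0.0;
        for (std::uint32_t bitmap : bitmaps_) {
            unsigned r = 0;  // index of the lowest zero bit
            while (r < 32 && ((bitmap >> r) & 1u)) ++r;
            sum += r;
        }
        const double m = static_cast<double>(bitmaps_.size());
        return m / 0.77351 * std::pow(2.0, sum / m);
    }

private:
    static std::size_t BitmapCount(unsigned m_bit) {
        assert(m_bit <= 16);
        return std::size_t{1} << m_bit;  // m = 2 ** m_bit
    }
    unsigned m_bit_;
    std::vector<std::uint32_t> bitmaps_;
};

// Exact set first; switches to the sketch once the set grows past max_exact.
class UniqueIpCounter {
public:
    UniqueIpCounter(std::size_t max_exact, unsigned m_bit)
        : max_exact_(max_exact), sketch_(m_bit) {}

    void AddIp(std::uint32_t ip) {
        if (probabilistic_) { sketch_.InputValue(ip); return; }
        exact_.insert(ip);
        if (exact_.size() > max_exact_) {  // the accurate-to-probabilistic step
            for (std::uint32_t seen : exact_) sketch_.InputValue(seen);
            exact_.clear();
            probabilistic_ = true;
        }
    }

    double Cardinality() const {
        return probabilistic_ ? sketch_.GetCardinality()
                              : static_cast<double>(exact_.size());
    }
    bool probabilistic() const { return probabilistic_; }

private:
    std::size_t max_exact_;
    PcsaSketch sketch_;
    std::set<std::uint32_t> exact_;
    bool probabilistic_ = false;
};
```

Replaying the exact set into the bitmaps at the switch keeps the IPs seen so far in the estimate. Repeated inputs set the same bit again, so duplicates never change the result.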

SKETCH Module

Summary

The SKETCH is a detection module that identifies abnormal heavy hitters. It is implemented as a Triggered Module of the AE, which is spawned by the AE Daemon at startup and triggered by signals from the AE Daemon.

The main program of the SKETCH module is located in Sketch/IDS_rev32.cpp. The core of the program is a loop that repeatedly waits for signals. Once a SIGUSR1 signal is received, the SKETCH module reads data from an input file generated by the AE Daemon, processes it and writes the result to another file. When it has finished processing, it sends a SIGUSR1 signal to the AE Daemon. The AE Daemon then reads the output file of the SKETCH and issues a "block-client" command to the AEI for each entry specified in the file.
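One way to write such a wait-process-notify loop is with sigwait(), as sketched below. Whether the SKETCH program uses sigwait() or a conventional signal handler is not stated, so this is only an assumption. The function names, the rounds parameter (added so the example terminates) and the process callback are hypothetical.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Block SIGUSR1 so that it can be collected synchronously with sigwait().
bool BlockTriggerSignal(sigset_t* set) {
    assert(set != nullptr);
    sigemptyset(set);
    sigaddset(set, SIGUSR1);
    return sigprocmask(SIG_BLOCK, set, nullptr) == 0;
}

// Wait for one trigger; returns true if SIGUSR1 arrived.
bool WaitForTrigger(const sigset_t* set) {
    int signo = 0;
    return sigwait(set, &signo) == 0 && signo == SIGUSR1;
}

// Wait for a trigger, process the input file, then notify the daemon.
// A real module would loop forever instead of counting rounds.
template <typename ProcessFn>
void ServeTriggers(pid_t daemon_pid, int rounds, ProcessFn process) {
    sigset_t set;
    if (!BlockTriggerSignal(&set)) return;
    for (int i = 0; i < rounds && WaitForTrigger(&set); ++i) {
        process();                   // read input, run detection, write output
        kill(daemon_pid, SIGUSR1);   // tell the daemon the output is ready
    }
}
```

Blocking the signal before waiting means a trigger that arrives while a previous batch is still being processed stays pending and is picked up by the next sigwait() call.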

Configuration File

The configuration file of the AE is located at /usr/local/squid/etc/AE.conf. You can change the parameters of the AE by customizing the configuration file. The configuration includes the following parameters:

1. PCSA_enable

Whether to enable the PCSA module or not. Set it to 1 to enable the module and set it to 0 to disable it.

Example: PCSA_enable=1

2. PCSA_refcount_threshold

The threshold of the "refcount" provided by Squid, above which PCSA starts to keep track of a cache object.

Example: PCSA_refcount_threshold=1000

3. PCSA_m_bit

The number of bits used for the bitmap index. The number of bitmaps is m = 2 ** m_bit. (default value: 5, i.e., m = 32.)

Example: PCSA_m_bit=5

4. PCSA_max_ac_set_size

The maximum size of the set used for accurate counting of the unique source IPs associated with each PCSA record. Above this size, accurate counting is switched to probabilistic counting. (default value: 50)

Example: PCSA_max_ac_set_size=50

5. PCSA_positive_threshold

The positive threshold of the PCSA detection algorithm. If "number of unique IPs" / "reference count" of a PCSA record exceeds this threshold, the record is judged as "positive" and the corresponding object will be blocked.

Example: PCSA_positive_threshold=200.0

6. SKETCH_enable

Whether to enable the SKETCH module or not. Set it to 1 to enable the module and set it to 0 to disable it.

Example: SKETCH_enable=0

7. SKETCH_delta

Delta, a parameter used in the SKETCH detection algorithm.

Example: SKETCH_delta=5.0

8. SKETCH_gamma

Gamma, a parameter used in the SKETCH detection algorithm.

Example: SKETCH_gamma=0.1

9. SKETCH_interval

The interval, in seconds, at which the SKETCH module is triggered. (default value: 1800, i.e., 30 minutes.)

Example: SKETCH_interval=1800
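Putting the examples above together, a complete AE.conf might look like the fragment below. The values are simply the ones used in the examples, and the one-parameter-per-line layout is an assumption.

```
PCSA_enable=1
PCSA_refcount_threshold=1000
PCSA_m_bit=5
PCSA_max_ac_set_size=50
PCSA_positive_threshold=200.0
SKETCH_enable=0
SKETCH_delta=5.0
SKETCH_gamma=0.1
SKETCH_interval=1800
```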

Log Files

To be written...