About Data Loss Prevention
Data loss prevention (DLP) ensures that sensitive content is not allowed to leave your network. The prevention process detects this content and blocks traffic going out to the web accordingly.
The following elements are involved in this process:
- Data loss prevention rules that control the process
- Default classifications and a dictionary that you fill with entries for data loss prevention
- Data loss prevention modules, which the rules call to detect sensitive content
You can also use data loss prevention rules to keep inappropriate content from entering your network. However, this can have an impact on performance.
The data loss prevention process can be applied to text contained in the body that is sent with a request or response or to any other text that is contained in requests or responses, for example, URL parameters or headers.
When you are running the appliance together with a DLP solution that uses an ICAP server for the filtering process, you can implement a rule set to ensure the smooth flow of data between the appliance and the ICAP server.
Data loss prevention rules
Data loss prevention is not implemented by default on the appliance, but you can import the Data Loss Prevention rule set from the library.
You can then review the rules of this rule set, modify or delete them, and also create your own rules.
A data loss prevention rule blocks, for example, a request if the text that is sent as its body includes sensitive content. To find out whether this is true for a given request body, the rule calls a module that inspects the body. To know what is considered sensitive, the module refers to the default classifications on the system lists or to dictionary entries, according to what is configured.
When a request or response is processed, its body text is stored as the value of the Body.Text property. Before body text can be stored and inspected, it must be extracted. The Composite Opener module performs this opening job. By default, the opener is enabled by a rule in a rule set that is nested in the Common Rules rule set.
A request body could, for example, be a text file that a user requests to upload to the web. The value of a suitable body-related property in the rule criteria would then have to be true for the rule to apply and execute the blocking.
The following rule uses the DLP.Classification.BodyText.Matched property in this way. If a request includes sensitive content in its body, this is detected by the data loss prevention module. The value of the property is set to true, and the request is blocked.
|Block files with SOX information|
|DLP.Classification.BodyText.Matched<SOX> equals true||–> Block<DLP.Classification.Block>|
When this rule is processed, the data loss prevention module knows from its settings that it has to look for content that is sensitive with regard to the SOX (Sarbanes-Oxley) regulations, which deal with the responsibilities of public companies.
Events can be added to the rule to log information on data loss prevention or to increment a counter that records how often a request has been blocked by this rule.
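The interplay of rule criteria, module, action, and event described above can be sketched in Python. This is only an illustration under stated assumptions, not the appliance's implementation: the patterns standing in for the SOX classification, the function names, and the counter are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for a classification: a few regex patterns.
# Real classifications are maintained on the appliance's system lists.
SOX_PATTERNS = [
    re.compile(r"\bquarterly earnings forecast\b", re.IGNORECASE),
    re.compile(r"\binternal audit report\b", re.IGNORECASE),
]

# Hypothetical counter, playing the role of an event that counts blocks.
block_counter = Counter()


def body_text_matched(body_text, patterns):
    """Play the role of DLP.Classification.BodyText.Matched:
    True if any pattern occurs in the body text."""
    return any(p.search(body_text) for p in patterns)


def process_request(body_text):
    """Evaluate the sample rule: block on a match, otherwise allow."""
    if body_text_matched(body_text, SOX_PATTERNS):
        block_counter["Block files with SOX information"] += 1
        return "Block"
    return "Allow"
```

Calling process_request with a body that contains one of the patterns returns "Block" and increments the counter; any other body returns "Allow" and leaves the counter unchanged.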
Default classifications and dictionary entries
Default classifications and dictionary entries are used in data loss prevention to specify sensitive content that should be prevented from leaving your network.
However, you can also use system lists and dictionary entries to specify inappropriate content, such as discriminatory or offensive language, that should not be allowed to enter your network. Inappropriate content could, for example, be specified this way to let a rule block content sent from web servers in response to requests.
The library rule set for data loss prevention contains a nested rule set for processing body text in the response cycle.
Default classifications and dictionary entries differ in the following ways:
- Default classifications — Provide information for detecting different kinds of sensitive or inappropriate content, for example, credit card numbers, social security numbers, or medical diagnosis data.
Default classifications are contained in folders and subfolders on system lists and updated by the appliance system. You can view the system lists under DLP Classification in the System Lists branch of the lists tree, but you cannot edit or delete them.
When you edit the settings of the module that handles classifications, you can select suitable subfolders from the folders on these lists and create a list with classifications for data loss prevention in your network.
- Dictionary entries — Specify sensitive or inappropriate content, for example, names of persons or keywords indicating content that should not leave your network
The dictionary is created as part of the settings for the module that handles this list.
Creating a dictionary and filling it with entries for sensitive or inappropriate content is a means to configure the data loss prevention process beyond what is possible by using the default classifications on the system lists. This way you can adapt the process to the requirements of your network.
Data loss prevention modules
The job of the data loss prevention modules (also known as engines) is to detect sensitive or inappropriate content in the body text of requests and responses and also in any other text that is sent with requests and responses.
When composite objects, such as archive documents, bodies of POST requests, and others, are sent with requests or responses, they are also included in the data loss prevention process. To account for such objects, the data loss prevention rules are also processed in the embedded objects cycle.
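The idea of inspecting embedded objects can be illustrated with a short Python sketch. It assumes ZIP archives as the only composite type and a hypothetical term list. The appliance's opener handles many more formats and works differently in detail.

```python
import io
import zipfile

# Hypothetical list of sensitive terms, used only for this illustration.
SENSITIVE_TERMS = ["confidential"]


def text_matches(text):
    """True if any sensitive term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in SENSITIVE_TERMS)


def inspect_body(data):
    """Inspect a body given as bytes. If it is a ZIP archive, open it and
    inspect every member recursively, otherwise inspect it as text."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            return any(
                inspect_body(archive.read(name)) for name in archive.namelist()
            )
    return text_matches(data.decode("utf-8", errors="replace"))
```

Because inspect_body calls itself for each archive member, an archive nested inside another archive is opened as well. A production implementation would also need limits on nesting depth and extracted size, which this sketch omits.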
Depending on what the data loss prevention modules find out, body-related properties in rule criteria are set to true or false, so web traffic is eventually blocked or allowed.
There are two modules that differ in their use of lists for detecting relevant content:
- Data Loss Prevention (Classifications) — Uses default classifications on system lists for data loss prevention
- Data Loss Prevention (Dictionaries) — Uses dictionaries with entries for sensitive and inappropriate content that you provide yourself for data loss prevention
When configuring settings for the modules, you let them know which content to look for. The default classifications and dictionary entries that specify the content are among the settings parameters.
Search methods for data loss prevention
There are different methods of searching for content that should be prevented from leaving or entering your network.
- A search can aim at finding out whether a given request or response body includes portions of content that are specified as sensitive or inappropriate.
- A search can begin with a portion of content, for example, a URL parameter or header, and find out whether it is sensitive or inappropriate according to what you have configured.
For the first method, you can use the DLP.Classification.BodyText.Matched property that was already shown in a sample rule.
For the second, you can use the DLP.Classification.AnyText.Matched property. This property takes a string parameter for the content portion that is checked for being on a system list or in a dictionary.
Depending on what you are working with, use the two properties already mentioned with system lists, and use DLP.Dictionary.BodyText.Matched and DLP.Dictionary.AnyText.Matched with the dictionary.
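The difference between the two search methods can be sketched as follows. This is a simplified Python illustration. The dictionary entries, the request structure, and the function names are assumptions made for the example, and simple substring matching stands in for whatever matching the modules actually perform.

```python
# Hypothetical dictionary entries, stored in lower case.
DICTIONARY = {"project falcon", "dr. alice example"}


def find_terms(text, entries):
    """Return the dictionary entries that occur in the text, sorted."""
    lowered = text.lower()
    return sorted(entry for entry in entries if entry in lowered)


def body_text_matched(request, entries):
    """First method: the text to search is implicit, namely the body
    of the request (compare the BodyText properties)."""
    return bool(find_terms(request["body"], entries))


def any_text_matched(text, entries):
    """Second method: the caller passes the text portion to check, for
    example a URL parameter or header (compare the AnyText properties)."""
    return bool(find_terms(text, entries))


# Hypothetical request with a harmless body but a sensitive header.
request = {
    "body": "Weekly status update",
    "headers": {"X-Subject": "Notes on Project Falcon"},
}
```

With this request, body_text_matched(request, DICTIONARY) is false, while any_text_matched(request["headers"]["X-Subject"], DICTIONARY) is true, which shows why text outside the body needs the second method.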
Logging data loss prevention
Additional properties are provided for logging the results of the data loss prevention process. They allow you to log this data, for example, using an event in a rule.
When the value of DLP.Classification.BodyText.Matched is true for the body text of a request or response that was processed, the following applies for the relevant logging properties:
- DLP.Classification.BodyText.MatchedTerms contains a list of the matching terms from the body text
- DLP.Classification.BodyText.MatchedClassifications contains a list of the matching classifications
When the value of DLP.Dictionary.BodyText.Matched is true, DLP.Dictionary.BodyText.MatchedTerms contains a list of all matching terms.
Similarly, matching terms and classifications can be logged for the search method that looks for matches of a given text string.
When the value of DLP.Classification.AnyText.Matched is true:
- DLP.Classification.AnyText.MatchedTerms contains a list of matching terms found in text other than body text.
- DLP.Classification.AnyText.MatchedClassifications contains a list of matching classifications found in text other than body text.
When the match is in a dictionary, DLP.Dictionary.AnyText.Matched is true and DLP.Dictionary.AnyText.MatchedTerms contains a list of matching terms.
Information on data loss prevention results is also shown on the dashboard.
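The relationship between a Matched property and its logging companions, MatchedTerms and MatchedClassifications, can be sketched in Python. The classification contents, the MatchResult type, and the log format below are hypothetical and serve only to illustrate which data becomes available after a match.

```python
from dataclasses import dataclass, field

# Hypothetical classifications with a few example terms each.
CLASSIFICATIONS = {
    "HIPAA": ["diagnosis", "patient id"],
    "SOX": ["audit report"],
}


@dataclass
class MatchResult:
    matched: bool = False
    matched_terms: list = field(default_factory=list)
    matched_classifications: list = field(default_factory=list)


def classify(text, classifications):
    """Collect matching terms and the classifications they belong to."""
    lowered = text.lower()
    result = MatchResult()
    for name, terms in classifications.items():
        hits = [term for term in terms if term in lowered]
        if hits:
            result.matched_terms.extend(hits)
            result.matched_classifications.append(name)
    result.matched = bool(result.matched_terms)
    return result


def log_line(result):
    """Format the match data as a rule event might write it to a log."""
    return "terms={}; classifications={}".format(
        ", ".join(result.matched_terms),
        ", ".join(result.matched_classifications),
    )
```

A result with matched set to false carries empty lists, so a logging event is only useful when the corresponding Matched value is true.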
Preventing loss of medical data
The following is an example of data loss prevention that assumes medical data must be prevented from leaving the network of an American hospital.
Default classifications for preventing the loss of medical data are contained in the HIPAA (Health Insurance Portability and Accountability Act) folder. In addition to this default information, the names of the doctors who are working in the hospital are entered in a dictionary to ensure they also do not leave the network.
The following activities need to be completed for configuring data loss prevention in this example:
- Configure settings for the Data Loss Prevention (Classifications) module that include the default HIPAA classifications
- Configure settings for the Data Loss Prevention (Dictionaries) module that include the doctors' names as dictionary entries
- Make sure the rule that activates the Composite Opener is enabled
In the default rule set system, this rule is contained in the Enable Opener rule set, which is nested in the Common Rules rule set.
- Create a rule that checks content according to the configured settings
The rule must be included in a rule set that applies in the request cycle for requests to upload data from the hospital network to the web.
This rule set can be a nested rule set of the default rule set for data loss prevention or a rule set that you create yourself.
In this example, the rule checks only text contained in the body of a request. It could look as follows:
|Prevent loss of HIPAA data and doctors' names|
|DLP.Classification.BodyText.Matched<HIPAA> equals true AND
DLP.Dictionary.BodyText.Matched<Doctors'Names> equals true