Skyhigh Security

Data Identifier Rules

For data identifier rules, you can either use Skyhigh CASB's predefined data identifier categories or create custom data identifiers, all in one step of the wizard.

Use a Predefined Category

Predefined data identifier categories detect many common patterns, such as Social Security numbers and credit card numbers, and apply advanced validation to improve accuracy. For example, credit card number matches can be validated with the Luhn check.
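As an illustration of the kind of validation mentioned above, the following is a minimal sketch of the standard Luhn checksum in Python. It is not Skyhigh's implementation; the function name luhn_valid is hypothetical and is used here only to show why a checksum rejects most random 16-digit strings.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Illustrative only: spaces and dashes are ignored, and no card-brand
    or length rules are applied.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # widely used test card number
print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

A pattern match that fails this check is very unlikely to be a real card number, which is how checksum validation reduces false positives.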

This rule allows you to define:

  • Data Identifier. Select Use a Predefined Category. 
  • Location. Specify if the match is in:
    • All
    • Email Subject and File Metadata
    • Email Subject, Body, Attachments, and File Content
  • Match Count. Specify the number of unique matches and perform extra keyword validation.
  • Count each match only one time. Select the checkbox to count repeated matches only once, or clear it to count every occurrence.
  • Exclude. Add specific data identifier values to an allowlist so that they do not trigger an incident. For instance, company-owned credit card numbers (CCNs) can be added to an allowlist and excluded from the match when the policy is evaluated.
  • Keyword Validation. Validates a predefined set of keywords. 
  • Keyword List. Select from Skyhigh default keywords or create a list of custom keywords of your own. 
    • Skyhigh Default. Select to use Skyhigh default keywords for your data identifier. 
    • Custom Only. Select to use custom keywords only. For custom keywords, you can use a predefined dictionary or manually enter keywords. The maximum number of custom keywords allowed is 10. 
    • Skyhigh Default and Custom List. Select to use both Skyhigh default keywords and custom keywords. The maximum number of custom keywords allowed is 10. 
  • Proximity Distance. Keyword validation looks for a predefined set of keywords within a radius of 200 characters (about 30 words) of a matched pattern.
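The interaction between match count, unique counting, exclusions, and keyword proximity can be sketched as follows. This is a hedged illustration in Python, not the product's evaluation logic: the function names validated_matches and triggers_incident are hypothetical, the keyword comparison is assumed to be case-insensitive, and the match count is assumed to act as a minimum threshold.

```python
import re


def validated_matches(text, pattern, keywords, radius=200, unique=True):
    """Return pattern matches that have a keyword within `radius` characters."""
    lowered = text.lower()
    hits = []
    for m in re.finditer(pattern, text):
        # The window extends `radius` characters on each side of the match.
        start = max(0, m.start() - radius)
        end = min(len(text), m.end() + radius)
        window = lowered[start:end]
        if any(k.lower() in window for k in keywords):
            hits.append(m.group())
    if unique:
        # Count each distinct value once, keeping first-seen order.
        hits = list(dict.fromkeys(hits))
    return hits


def triggers_incident(text, pattern, keywords, match_count=1,
                      exclude=(), radius=200, unique=True):
    """Assumed semantics: fire when enough non-excluded matches remain."""
    excluded = set(exclude)
    hits = [h for h in validated_matches(text, pattern, keywords, radius, unique)
            if h not in excluded]
    return len(hits) >= match_count


ssn = r"\b\d{3}-\d{2}-\d{4}\b"
sample = "SSN 123-45-6789 and again SSN 123-45-6789"
print(validated_matches(sample, ssn, ["ssn"]))                # one unique value
print(validated_matches(sample, ssn, ["ssn"], unique=False))  # both occurrences
```

With unique counting on, the repeated value above counts once, so a match count of 2 would not be reached; with it off, both occurrences count.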

For details about data identifier definitions, validation, and Skyhigh default keywords, see Data Identifiers.

Custom Data Identifier

Create a Custom Data Identifier using regular expressions, keyword validation, and proximity distance. You can use up to 5 regex rules.


To define your Custom Data Identifier, enter the following:

  • Name. Enter a name for your Custom Data Identifier.
  • RegEx. Define custom regular expression patterns to match arbitrary text. We strongly suggest that you develop and test your regular expressions in a tool such as RegexBuddy before deploying your Skyhigh CASB policy. You can define up to 5 regex rules; the rules are combined with OR conditions.
  • Location. Specify if the match is in:
    • All 
    • Email Subject and File Metadata
    • Email Subject, Body, Attachments, and File Content
  • Match Count. Specify the number of unique matches and perform extra keyword validation.
  • Count each match only one time. Select the checkbox to count repeated matches only once, or clear it to count every occurrence.
  • Exclude. Add specific data identifier values to an allowlist so that they do not trigger an incident. For instance, company-owned credit card numbers (CCNs) can be added to an allowlist and excluded from the match when the policy is evaluated.
  • Keyword Validation. Validates a predefined set of keywords. 
  • Keyword List. Select a predefined dictionary or manually enter a list of custom keywords. (Limit of 10 custom keywords.)
  • Proximity Distance. Keyword validation looks for a predefined set of keywords within a radius of 200 characters (about 30 words) of a matched pattern.
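Before entering regex rules in the wizard, it can help to prototype them locally. The sketch below, in Python, models the stated limit of 5 rules and the OR relationship between them. The helper names and the EMP-/CTR- identifier formats are hypothetical, and Python's re module may differ in detail from the regex engine Skyhigh CASB uses.

```python
import re

MAX_REGEX_RULES = 5  # limit stated for custom data identifiers


def compile_custom_identifier(patterns):
    """Compile between 1 and MAX_REGEX_RULES patterns, else raise ValueError."""
    if not 1 <= len(patterns) <= MAX_REGEX_RULES:
        raise ValueError(f"expected 1 to {MAX_REGEX_RULES} regex rules")
    return [re.compile(p) for p in patterns]


def find_matches(text, compiled):
    """OR semantics: collect the matches of every rule, in rule order."""
    found = []
    for rx in compiled:
        found.extend(m.group() for m in rx.finditer(text))
    return found


# Hypothetical internal ID formats, for illustration only.
rules = compile_custom_identifier([r"\bEMP-\d{6}\b", r"\bCTR-\d{4}\b"])
print(find_matches("Badge EMP-004211 issued; contractor CTR-9921 pending.", rules))
```

Because the rules are OR'd, text that satisfies any one of them contributes to the match count.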

Boundary Validation in Custom Data Identifiers

Custom data identifiers do not support boundary validation. Boundary validation must be explicitly captured in the regex rule.

For example, \bREGEX\b matches only where the pattern is delimited by boundaries such as line breaks, tabs, white space, and special characters. Using REGEX alone, without the \b anchors, also returns matches that sit in the middle of a longer string (partial matches).

The match highlights reported for custom data identifier incidents follow the pattern exactly as specified, so they include word boundaries when the pattern specifies them.
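The difference between a bounded and an unbounded pattern is easy to see in a quick local test. The following Python sketch uses a hypothetical five-digit identifier. Python's \b semantics are assumed to be close to, but not necessarily identical with, those of the engine used by Skyhigh CASB.

```python
import re

text = "Order 1234567890123 shipped; case ID 12345 opened."

# Without boundaries, the 13-digit order number yields partial matches.
loose = re.findall(r"\d{5}", text)

# With \b on both sides, only a standalone 5-digit token matches.
strict = re.findall(r"\b\d{5}\b", text)

print(loose)   # ['12345', '67890', '12345']
print(strict)  # ['12345']
```

The unbounded pattern reports three matches, two of which are fragments of the longer order number; the bounded pattern reports only the standalone case ID.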
