Fingerprints allow you to monitor your organization's data, build indexes of rolling hashes of that data on-premises, and prevent sensitive or confidential information from leaving the organization by creating compliance policies around it. You can create and use fingerprints for structured data or unstructured data. Structured, or Exact Data Matching (EDM) fingerprints, allow you to monitor your organization's data in a row and column format typically extracted from a database in CSV format. Unstructured or Index Document Matching (IDM) fingerprints work by content matching indexed documents and images. Once your data is fingerprinted, you can add a DLP Policy rule to leverage that indexed data.
To create a fingerprint, Skyhigh CASB extracts text from the input folder or file, normalizes it, and then secures it using multiple overlapping hashes. Then, Skyhigh CASB generates an index that consists of the hashes. Using multiple hashes allows partial and derivative matches to be detected. For example, for unstructured fingerprints, if 20% (or the configured percent as part of the DLP policy) of the content of any fingerprinted document appears in a document uploaded to a cloud service, then Skyhigh CASB generates a policy violation.
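The idea of overlapping hashes and percentage-based partial matching can be illustrated with a small sketch. This is not Skyhigh CASB's actual algorithm, which is not described here; it assumes word-level normalization, a five-word sliding window, and SHA-256, and all function names (`normalize`, `overlapping_hashes`, `percent_matched`, `violates`) are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlapping_hashes(text: str, window: int = 5) -> set[str]:
    """Hash every run of `window` consecutive words; adjacent runs overlap."""
    words = normalize(text)
    if not words:
        return set()
    if len(words) < window:
        # Text shorter than one window: hash it as a single unit.
        return {hashlib.sha256(" ".join(words).encode()).hexdigest()}
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode()).hexdigest()
        for i in range(len(words) - window + 1)
    }


def percent_matched(fingerprinted: str, uploaded: str, window: int = 5) -> float:
    """Percentage of the fingerprinted document's hashes found in the upload."""
    source = overlapping_hashes(fingerprinted, window)
    if not source:
        return 0.0
    shared = source & overlapping_hashes(uploaded, window)
    return 100.0 * len(shared) / len(source)


def violates(fingerprinted: str, uploaded: str, threshold_percent: float = 20.0) -> bool:
    """True when the matched share reaches the configured threshold (20% by default)."""
    return percent_matched(fingerprinted, uploaded) >= threshold_percent
```

Because each hash covers only a short window of text, an upload that copies part of a fingerprinted document still shares some hashes with it, which is what makes partial and derivative matches detectable.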
Skyhigh CASB creates and saves indexes locally at the location that is configured when you create a fingerprint: the Folder Path to Locally Generated Hashed Files. Indexes remain there until they are deleted. The terms fingerprint, index, and hash are used roughly interchangeably, and each file usually generates multiple indexes. The fingerprint is then uploaded to Skyhigh CASB and used to detect policy violations. A fingerprint is available for use with a DLP Policy only when its status is Active.
IMPORTANT: Before you can use the Fingerprint feature, you must download and install the DLP Integrator. See the install instructions for information about account permissions required by the DLP Integrator.
Unstructured or Index Document Matching (IDM) fingerprints work by content matching indexed documents and images.
Unstructured fingerprints support the following:
- Supported data includes the file types listed in Supported File Formats.
- Archive files such as ZIP files are automatically excluded.
- Each individual input file is limited to 50 MB, but multiple input files can be indexed together.
- Indexes generated are 10% to 15% of the total input data of all individual input files. For example, if the total input data is 100 GB, the indexes are about 10-15 GB.
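The limits above can be turned into a quick pre-check before indexing. The following is a minimal sketch, not part of the product: it assumes "50 MB" means 50 x 1024 x 1024 bytes, treats only `.zip` as an archive (the only archive type named here), and applies the 10% to 15% estimate to the files that remain. The names `split_inputs` and `estimate_index_bytes` are hypothetical.

```python
from pathlib import PurePath

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB
ARCHIVE_SUFFIXES = {".zip"}        # assumption: only ZIP is named in the text


def split_inputs(files: dict[str, int]) -> tuple[dict[str, int], dict[str, str]]:
    """Split {file name: size in bytes} into eligible files and skipped files with a reason."""
    eligible: dict[str, int] = {}
    skipped: dict[str, str] = {}
    for name, size in files.items():
        if PurePath(name).suffix.lower() in ARCHIVE_SUFFIXES:
            skipped[name] = "archive"
        elif size > MAX_FILE_BYTES:
            skipped[name] = "over 50 MB"
        else:
            eligible[name] = size
    return eligible, skipped


def estimate_index_bytes(total_input_bytes: int) -> tuple[float, float]:
    """Return the (low, high) index size estimate: 10% to 15% of the input size."""
    return total_input_bytes * 0.10, total_input_bytes * 0.15
```

For 100 GB of eligible input, `estimate_index_bytes` returns a range of roughly 10 GB to 15 GB, which matches the example above.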
Structured, or Exact Data Matching (EDM), fingerprints allow you to monitor your organization's data in a row and column format, typically extracted from a database in CSV format.
IMPORTANT: Skyhigh CASB Structured fingerprints only work with ASCII characters. Any other language with Unicode characters outside the ASCII range is not supported.
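Because of the ASCII-only restriction, it can help to scan a CSV export for non-ASCII values before creating a structured fingerprint. The sketch below is an illustration only; the function name `find_non_ascii_cells` is hypothetical, and it assumes the CSV content has already been read into a string.

```python
import csv
import io


def find_non_ascii_cells(csv_text: str) -> list[tuple[int, int, str]]:
    """Return (row, column, value) for each cell with a non-ASCII character (1-based)."""
    problems = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for column_number, value in enumerate(row, start=1):
            if not value.isascii():
                problems.append((row_number, column_number, value))
    return problems
```

An empty result means every cell is within the ASCII range; any reported cell would need to be cleaned or excluded before indexing.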
Structured fingerprints support the following:
- Alphabetic. Alphabetic characters.
- Number. Numbers supported with decimals.
- Alphanumeric. Alphanumeric characters.
- Zip Code. ##### or #####-####
- Email. LOCAL_SUBPART ('.' LOCAL_SUBPART)* '@' DOMAIN_SUBPART ('.' DOMAIN_SUBPART)*;
- Date. DD/MM/YYYY DD-MM-YYYY MM/DD/YYYY MM-DD-YYYY YYYY/MM/DD YYYY-MM-DD
- Phone. (###)###-#### or ###-###-####
- Credit Card Number. CCNs formatted with decimals, spaces, or underscores, separated by pipes or semicolons.
- Social Security Number. ###-##-####, or ### ## ####, or ###.##.####, separated by pipes or semicolons.
- Identifier. Letters and numbers with hyphens and decimals.
- Indexes generated are 1x-2x the input data. For example, if input data is 50 MB, the indexes are about 50–100 MB.
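Some of the formats listed above can be checked against a CSV column with regular expressions before indexing. This is a sketch under assumptions, not the validation Skyhigh CASB performs: it covers only the Zip Code, Phone, and Date shapes, checks the shape of a date rather than whether it is a real calendar date, and assumes both separators in a date are the same. `PATTERNS` and `matches_type` are hypothetical names.

```python
import re

# Shape-only patterns derived from the formats listed for each column type.
PATTERNS = {
    # ##### or #####-####
    "zip": re.compile(r"\d{5}(-\d{4})?"),
    # (###)###-#### or ###-###-####
    "phone": re.compile(r"(\(\d{3}\)|\d{3}-)\d{3}-\d{4}"),
    # DD/MM/YYYY, MM-DD-YYYY, YYYY/MM/DD and so on; the backreference
    # requires the second separator to equal the first.
    "date": re.compile(r"\d{2}([/-])\d{2}\1\d{4}|\d{4}([/-])\d{2}\2\d{2}"),
}


def matches_type(column_type: str, value: str) -> bool:
    """True when the whole value has the shape expected for the column type."""
    return PATTERNS[column_type].fullmatch(value) is not None
```

A day-first and a month-first date have the same shape, so a check like this cannot tell `DD/MM/YYYY` from `MM/DD/YYYY`; it only flags values that fit none of the listed layouts.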
IMPORTANT: In the Policy Incidents email, the match count for Structured Fingerprints (EDM) always appears as at least the minimum match count based on the rules of the policy. Once the minimum specified threshold in policy is met, Skyhigh CASB stops looking for matches. This is the expected behavior.
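The reported count follows from stopping at the threshold, which a short sketch can make concrete. This is an illustration of early termination only, not the product's matching logic: it compares plain values against a set rather than hashed index entries, assumes a minimum match count of at least 1, and the name `count_matches_until` is hypothetical.

```python
from collections.abc import Iterable


def count_matches_until(
    cells: Iterable[str],
    indexed_values: set[str],
    minimum_match_count: int,
) -> int:
    """Count cells found in the index, stopping once the minimum is reached."""
    matches = 0
    for cell in cells:
        if cell in indexed_values:
            matches += 1
            if matches >= minimum_match_count:
                break  # threshold met: no further cells are examined
    return matches
```

With a minimum match count of 2, a document that contains four indexed values is reported with a count of 2, because scanning stops as soon as the policy threshold is satisfied.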
The Fingerprints page contains the tools you need to create and manage all Fingerprints (Structured and Unstructured).
You will find the page at Policy > DLP Policies > Fingerprints.
The Fingerprints page displays the following actions and information:
- Create New Fingerprint. Click to create a fingerprint. Then select Unstructured Data Fingerprint or Structured Data Fingerprint.
- Fingerprint Name. Name of the fingerprint, which is used to associate the fingerprint with the fingerprint rule in the DLP policy. Click the name to edit the fingerprint.
- Type. Displays whether it is a Structured or Unstructured fingerprint.
- Ver #. The version number, which indicates the number of times the indexing operation has been done. The last successful Active version is used for policy evaluation.
- Items Indexed.
  - For unstructured fingerprints, this is the number of items indexed out of the total number.
  - For structured fingerprints, this is the number of rows in the CSV input.
- Status. Some of the following statuses are transitional and appear only briefly. Possible statuses include:
  - Active. The index is active and can be associated with a DLP policy and used for policy evaluation.
  - Indexing Pending. Sending a request to the Index.
  - In Progress. The request is received. Preparing to Index.
  - Update In Progress. The update is processing.
  - Optimizing. Optimizing indexes.
  - Indexing Completed. The Indexing process is complete.
  - Uploading. Indexing is complete, and the Index is uploading.
  - Publishing. Index upload is complete, and the Index is being published.
  - Canceling. Cancelation is in progress.
  - No Digest Exists. No indexes exist.
  - Deleted. The Index with all versions is deleted.
  - Not Indexed Since Last Config Update. Not indexed since the last config update.
  - Critical Errors. The indexing operation was canceled because of a critical error. Click Critical Errors to see a dialog with more information. Correct the errors, and then regenerate the index.
  - Non-critical Issues. The indexing operation was not canceled and ran to completion, but issues were found. To see the error dialog, click the link.
- Index Last Generated. The date and time of the latest index.