Skyhigh Security

Log Sources

Content Security Reporter (CSR) uses log sources to collect the data displayed in dashboards and reports.

Log source modes

The available log source modes depend on the ability of your network filtering device to send log data to CSR.

You can configure a log source by importing a single log file, or by using these options:

  • Accept incoming log files: CSR accepts log data from network filtering devices.
    NOTE: Log files are accepted over FTP, FTPS (FTP over SSL), HTTP, and HTTPS.
  • Collect log files from: CSR collects log data from network filtering devices or log data storage devices.
    NOTE: Log files are collected over FTP, FTPS (FTP over SSL), and SFTP (SSH File Transfer Protocol).

Log formats

Log formats determine how CSR processes (parses) data from log files, and how the data is stored in the database.

CSR recognizes the structure of auto-discover and fixed-field log formats. You can view log processing statistics in CSR whenever you need them.

User-defined columns

You can configure up to four user-defined columns for each log source. During log file processing, these columns can substitute column data or obtain data from columns that are normally skipped.

User-defined columns are also used when repopulating database columns during database maintenance.

User-defined columns do the following:

  • Include skipped log field data: During log file processing, some log file fields are skipped. For example, log file processing skips the Secure Web Gateway Referrer and Policy name fields. Up to four user-defined columns are available to pull data from the skipped fields.
  • Assign a custom value to column data: To make data easier to find and review in reports, substitute standard column data with a custom string value. For example, you can assign test-lab to all IP addresses beginning with 115 and assign other to all remaining IP addresses. In the report, the user-defined column displays either test-lab or other in place of the numeric IP address values. CSR treats a newly created user-defined column as an additional column and leaves the original column and its data unchanged. In the IP address example, the original IP address column data is still available to use in reports.
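The IP address substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how CSR implements it: the function names, the record layout, and the column name user_defined_1 are hypothetical, and "beginning with 115" is read here as "first octet equals 115".

```python
def label_ip(ip_address: str) -> str:
    """Return the custom value for a user-defined column (hypothetical helper)."""
    # Assumption: "beginning with 115" means the first octet is 115.
    first_octet = ip_address.split(".")[0]
    return "test-lab" if first_octet == "115" else "other"


def add_user_defined_column(record: dict) -> dict:
    """Return a copy of the record with one extra column.

    The original "ip" column is left untouched, mirroring the statement that
    the original column data stays available for reports.
    """
    enriched = dict(record)
    enriched["user_defined_1"] = label_ip(record["ip"])  # hypothetical column name
    return enriched


row = add_user_defined_column({"ip": "115.20.3.4", "url": "www.example.com"})
print(row["ip"], row["user_defined_1"])
```

The extra column is added next to the original one rather than overwriting it, so a report can use either value.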

NOTE: When entering a value in the Log file header value box, do not use quotation marks.

Processing and post-processing

When configuring a log source, use the Processing and Post-Processing tabs to determine how CSR handles the data pulled from log files.

Page views setting

The Condense log records into page views setting on the Processing tab for a log source affects queries and disk space requirements for the reporting database.

Each line of a log file is a separate HTTP request for a webpage element. Viewing one webpage can result in multiple records in the log file.

The Condense log records into page views option consolidates multiple records from a log file into a single page view, or "hit," in reports. Condensing log records into page views generates a concise report view when using either summary or detailed queries. It also reduces the amount of data stored: for example, the records from a 1 GB log file could condense to 100 MB.

By default, the Condense log records into page views option is enabled. If you disable this option, each webpage you visit, and each element on that page, is logged as a separate HTTP request. For example, if you visit www.example.com, which contains multiple elements, the log data is generated as follows:

www.example.com

www.example.com/rss.xml

www.example.com/advertisement.js

adserver.example.com/ad1.jpg

adserver.example.com/ad2.jpg

adserver.example.com/ad3.jpg

When you enable Condense log records into page views, your log data shows only one HTTP request as a page view: www.example.com.
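The effect of condensing can be illustrated with a small Python sketch. The text does not describe how CSR decides which requests belong to one page view, so the rule used here is purely an assumption: requests from the same user that arrive within a short window of the first request are folded into that page view. The function name, the record layout, the user name jlock, and the 5-second window are all hypothetical.

```python
from datetime import datetime, timedelta


def condense_into_page_views(records, window_seconds=5):
    """Collapse bursts of requests from one user into single page views.

    Assumption (not CSR's documented behavior): a request starts a new page
    view only if it arrives more than `window_seconds` after the start of the
    same user's current page view.
    """
    window = timedelta(seconds=window_seconds)
    view_start = {}  # user -> timestamp at which the current page view began
    page_views = []
    for user, timestamp, url in sorted(records, key=lambda r: r[1]):
        start = view_start.get(user)
        if start is None or timestamp - start > window:
            view_start[user] = timestamp
            page_views.append((user, timestamp, url))
    return page_views


# The six requests from the example above, spaced 200 ms apart.
t0 = datetime(2024, 1, 1, 15, 0, 0)
urls = [
    "www.example.com",
    "www.example.com/rss.xml",
    "www.example.com/advertisement.js",
    "adserver.example.com/ad1.jpg",
    "adserver.example.com/ad2.jpg",
    "adserver.example.com/ad3.jpg",
]
records = [
    ("jlock", t0 + timedelta(milliseconds=200 * i), url)
    for i, url in enumerate(urls)
]
views = condense_into_page_views(records)
print([url for _, _, url in views])
```

With these sample timestamps, the six records collapse to the single page view www.example.com, which matches the outcome described above.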

Directory

Directories provide the group information for the users that appear in the log file.

When you select multiple directories, CSR prioritizes them in the order they appear in the Selected directories list.

If the configured directories are Secure Sockets Layer (SSL) enabled, log parsing might be slower.
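One way to picture the directory ordering is a first-match lookup, sketched below in Python. This is an assumption about what "prioritizes" means, not documented CSR behavior: the directory at the top of the list that knows the user wins. The directory names, user names, and data layout are hypothetical.

```python
def lookup_groups(user, selected_directories):
    """Return (directory_name, groups) from the first directory that lists the user.

    `selected_directories` is an ordered list of (name, {user: [groups]}) pairs,
    standing in for the Selected directories list. First match wins (assumption).
    """
    for directory_name, members in selected_directories:
        if user in members:
            return directory_name, members[user]
    return None, []


selected_directories = [
    ("corp-ldap", {"jlock": ["engineering"]}),
    ("legacy-ad", {"jlock": ["contractors"], "asmith": ["finance"]}),
]
print(lookup_groups("jlock", selected_directories))
print(lookup_groups("asmith", selected_directories))
```

Under this reading, reordering the list changes which groups are reported for a user who exists in more than one directory.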

Custom columns

Custom columns substitute the data in the browser and cache columns in your log files with a word or phrase that better identifies the browser or cache value.

Custom columns are based on predefined rules. For example, instead of containing Mozilla/4.0 (compatible; MSIE 7.0…), your reports contain Internet Explorer 7.0. The original data value is still retained in your database.

Each custom column uses a configured rule set to replace technical data values from the browser or cache columns with common identifiers, which makes the browser and cache data in your reports more recognizable.

Custom rule sets

Rule sets are customized instructions that tell CSR to look for a specific string of data during log file processing and replace it with a different string. The resulting string appears in reports and is more recognizable to users. Rule sets are what make your custom columns and user-defined columns work: you can configure a rule set to find any string that appears in a log file and replace it with a string that you define. The string can contain letters, numbers, and symbols. A test function is available to validate the result of a rule set.
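The find-and-replace behavior of a rule set can be sketched in Python as follows. The Rule class and apply_rule_set function are hypothetical names, and two details are assumptions rather than documented behavior: rules are tried in order, and a match replaces the whole value (as in the browser example) rather than only the matched substring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    find: str     # string to look for in the log value
    replace: str  # string to show in reports instead


def apply_rule_set(value, rules):
    """Return the replacement of the first rule whose `find` string occurs in value.

    If no rule matches, the original value is returned unchanged.
    """
    for rule in rules:
        if rule.find in value:
            return rule.replace
    return value


browser_rules = [Rule(find="MSIE 7.0", replace="Internet Explorer 7.0")]

# Browser value truncated exactly as in the example text.
raw = "Mozilla/4.0 (compatible; MSIE 7.0"
print(apply_rule_set(raw, browser_rules))
```

Calling the function on a sample value before using the rules in a report plays the same role as the test function mentioned above: it shows the string a rule set would produce.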

Custom column rule sets

Custom columns are predefined for the browser and cache columns. Each custom column has a corresponding rule set. You can modify the rule sets, but you cannot add or delete rule sets for the custom columns.

User-defined column rule sets

User-defined columns are customized by you for any available log record or header. You create the rule sets for these columns, which can be edited, deleted, copied, and used by more than one user-defined column at a time.

Browse time

You can specify the length of time for the browse time threshold.

CSR estimates a user's browse time by calculating the difference between the time stamps of two log lines.

For example, if the log file shows that Jon Lock visits www.example.com at 03:00:00 p.m. and news.example.com at 04:30:00 p.m., the browse time is the 1 hour 30 minutes that occurred between the time he visited www.example.com and news.example.com. However, Jon Lock probably did not spend more than one hour viewing a single webpage. To compensate for this, CSR overrides the estimated browse time with a default browse time.

The browse time threshold option specifies the maximum length of time you expect a user to spend viewing a single webpage. The default is three minutes. When a user exceeds the browse time threshold, the default browse time is recorded in the database instead.
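The threshold logic can be sketched in Python as follows. The three-minute threshold comes from the text; the value of the default browse time is not stated there, so the one-minute figure below is only a placeholder, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def estimate_browse_time(
    current_hit,
    next_hit,
    threshold=timedelta(minutes=3),            # default threshold from the text
    default_browse_time=timedelta(minutes=1),  # placeholder value (assumption)
):
    """Estimate browse time as the gap between two consecutive log lines.

    If the gap exceeds the threshold, the default browse time is returned
    instead of the raw difference.
    """
    elapsed = next_hit - current_hit
    return default_browse_time if elapsed > threshold else elapsed


# Jon Lock's visits from the example: 03:00:00 p.m. and 04:30:00 p.m.
first = datetime(2024, 1, 1, 15, 0, 0)
second = datetime(2024, 1, 1, 16, 30, 0)
print(estimate_browse_time(first, second))

# A short gap stays as measured.
print(estimate_browse_time(first, first + timedelta(seconds=45)))
```

In the Jon Lock example, the 1 hour 30 minute gap exceeds the threshold, so the placeholder default is recorded; the 45-second gap is kept as measured.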
