Content Discovery Administration

This page describes the steps required to enable, set up and manage Content Discovery and DLP to detect information of interest within documents and other files. Troubleshooting guidance and a discussion of information security considerations are also included.

Applies to:

  • Enterprise File Fabric Appliance [some editions] (since v1808)

Configuration

Configuring the Appliance

The Appliance Administrator (appladmin) is responsible for ensuring that the content search engine for the appliance is enabled, and that Content Discovery is enabled for an Organization.

1. Enabling the Content Search Engine

Content search must be enabled for Content Discovery to work. Starting with v1808 it is enabled by default for new installations. See Content Search and PDF Burn Service for more information.

2. Enabling Content Discovery for an Organization

Content Discovery is only available to Organizations that have been assigned a User Package in which the feature is enabled. This can be set for a Package by:

  1. Choosing “User Packages” from the hamburger menu;
  2. Clicking on the pencil icon next to the package name to load the “User Packages :: Edit Package” page;
  3. Activating “Content Discovery” on the “Extra options (add-ons)” list (use the Ctrl key to avoid de-selecting previously selected options);
  4. Saving the change.

Assign the Package to the User if it has not already been assigned.

Configuring the Organization

After the appliance is configured, Content Discovery must be enabled for each organization, and content search must be enabled for each participating storage provider. Users who will be using Content Discovery features also need authorization.

1. Enabling Content Discovery

Enable this feature under Policies > Content Discovery:

2. Enabling Content Search for a Provider

Search must be enabled for a provider data source when it is added. This is done by selecting “Index content for search” on the page that gathers authentication information:

Files that existed on the storage before the provider was added are indexed during the initial provider synchronization. Subsequently files are indexed when created or updated, or if a provider cloud sync is executed and new or updated files are discovered.

Search cannot currently be enabled for an existing provider. To determine whether content search is enabled for a provider, go to the Dashboard as an organizational administrator and select the Settings gear icon to see the data source provider detail. The “Content index for search” setting must be set to Yes.

3. Authorizing Users for Content Discovery

Users with the Administrator role can use the features that are available with Content Discovery, such as search and notifications. Other users must belong to a role where the Content Discovery permission is enabled.

To enable a user to use Content Discovery features, the org. admin assigns the “review Content Discovery” permission to a role on the “Organization roles” page.

The org. admin then assigns the role to the user on the “Organization staff :: Edit User” page.

Note that the “review Content Discovery” permission replaces the “PII” permission that was available in v1803 of the File Fabric.
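The authorization rule described above can be sketched as follows. This is only an illustration of the logic, not the File Fabric's implementation: the `Role` and `User` classes, the `is_admin` flag and the way the permission is stored are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Permission label as shown on the "Organization roles" page; storing it
# as a plain string is an assumption made for this sketch.
REVIEW_CONTENT_DISCOVERY = "review Content Discovery"


@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)


@dataclass
class User:
    name: str
    is_admin: bool = False
    roles: list = field(default_factory=list)


def can_use_content_discovery(user: User) -> bool:
    """Administrators always qualify; others need a role with the permission."""
    if user.is_admin:
        return True
    return any(REVIEW_CONTENT_DISCOVERY in role.permissions for role in user.roles)


reviewers = Role("Compliance reviewers", {REVIEW_CONTENT_DISCOVERY})
alice = User("alice", is_admin=True)
bob = User("bob", roles=[reviewers])
carol = User("carol", roles=[Role("Staff")])
```

Here `alice` qualifies as an administrator, `bob` qualifies through his role, and `carol` does not qualify because none of her roles carries the permission.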

Administration

The Content Discovery module, accessible by administrators from the Organisation Menu, allows users to describe the content they want to detect as a set of text strings or patterns. Content Detectors match pieces of content, and are grouped into Content Detection Categories for classification. Content Detectors can be reused between different Content Detection Categories.

Creating a Content Detection Category

You can create a new Content Detection Category by entering the name and pressing the Create Content Discovery Category button.

You can also select an existing template to start with.

When you create a Content Detection Category this way it will be pre-populated with the content detectors in the template.

Editing a Content Detection Category

Content Detection Categories appear in a list on the Content Detection page:

To delete a Content Detection Category click on the ‘x’ next to its name. A category cannot be removed if matching content was detected in one or more files.

To change the name of a Content Detection Category, click on the pencil icon. To expose the set of content detectors in the category, click on the triangle.

To remove a content detector from a category, click on the ‘x’ next to the detector’s name.

To add a detector from the list of available detectors, click in the empty box at the bottom of the list of detectors in your category definition.

Then select the detector from the drop-down list and click on the Add button. You can add more than one detector at a time by clicking on each prior to clicking the Add button:

To remove a detector from your category, click on the ‘X’ next to the detector’s name.

You can use the expansion arrow next to the detector’s name to open the detector editor, which is discussed in the Content Detectors section later on this page.

Creating Content Detectors

A set of content detectors is supplied. You can use these detectors as-is or change them. You can also add your own detectors, and remove detectors from the set available.

To see the list of available content detectors, click on the triangle next to Available Content Detectors on the Content Detection page:

Recall that when matching content is found during a document scan, the document is tagged using the category as the tag classification and the detector that detected the matching content as the tag value. To change a detector’s name or its tag value, click on the pencil next to its name to open a dialog box where these values can be changed.
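The tagging rule can be illustrated with a small sketch. The dictionary layout and the category, detector and tag names below are invented for the example and do not reflect the File Fabric's internal data model. The sketch also assumes that a detector reused in two categories produces one tag per category.

```python
# Hypothetical configuration: category name -> detector names.
categories = {
    "Financial": ["Credit Card", "IBAN"],
    "Personal": ["Email Address", "Credit Card"],  # detectors can be reused
}

# Hypothetical mapping: detector name -> tag value applied to the file.
detector_tags = {
    "Credit Card": "credit-card",
    "IBAN": "iban",
    "Email Address": "email",
}


def tags_for_matches(matched_detectors):
    """Return (classification, value) pairs for the detectors that matched."""
    tags = []
    for category, detectors in categories.items():
        for detector in detectors:
            if detector in matched_detectors:
                # Category is the tag classification, detector tag is the value.
                tags.append((category, detector_tags[detector]))
    return tags


result = tags_for_matches({"Credit Card"})
```

With only the "Credit Card" detector matching, `result` holds one pair for each category that contains that detector.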

To remove a detector from any categories in which it is used and delete it from your org’s list of available detectors, click on the ‘x’ next to its name:

To inspect or edit the detector definition, click on the triangle to the left of its name. This will expose the list of rules in the detector.

To create a new detector, click on the Create New Detector button under the list of available detectors:

The New Content Detector window will open, allowing you to set the new detector’s Name and Tag:

Creating Filters

Detectors contain filters which contain criteria for finding matching content.

If any of the filters match content during a file scan, then that file is tagged with the detector’s tag value.

Each filter has a title, a type and a value. The type can be either RegEx or Built-in.

Filters of type RegEx can be created and edited by the org. admin. The value of a RegEx filter is a regular expression in PCRE syntax.

The value of a built-in filter is the name of a small piece of content detection code that is provided with the File Fabric for use inside of filters.
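A minimal sketch of how a detector with RegEx filters might be evaluated is shown below. It uses Python's `re` module as a stand-in for PCRE, so some PCRE-specific syntax will behave differently. The `Filter` and `Detector` classes and the two patterns are illustrative assumptions, not the supplied detectors or the product's code.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    title: str
    type: str   # "RegEx" or "Built-in"
    value: str  # a regular expression when type is "RegEx"


@dataclass
class Detector:
    name: str
    tag: str
    filters: list


def detector_matches(detector: Detector, text: str) -> bool:
    """A detector matches when any of its RegEx filters matches the text."""
    for f in detector.filters:
        # Built-in filters are skipped here: their code ships with the product.
        if f.type == "RegEx" and re.search(f.value, text):
            return True
    return False


# Illustrative patterns only.
ssn = Detector(
    name="US SSN (example)",
    tag="ssn",
    filters=[
        Filter("Dashed", "RegEx", r"\b\d{3}-\d{2}-\d{4}\b"),
        Filter("Spaced", "RegEx", r"\b\d{3} \d{2} \d{4}\b"),
    ],
)

hit = detector_matches(ssn, "Employee SSN: 123-45-6789")
miss = detector_matches(ssn, "Order number 12345")
```

Testing a new pattern against sample text in this way before saving it as a filter can help catch expressions that match too much or too little.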

To edit a filter, align the cursor with the name of the filter you wish to edit, and click on the arrow that appears to the right:

The filter editor will open.

Here you can change the name, type and value of the filter. If you choose the Built-in type and then click in the Value field, a pick list of built-in filters will be displayed.

You can pick the built-in that you want from this list.

To add a filter, click the “Add a new filter” button under the bottom right of the Filters list.

This will open the Create New Filter window.

To remove a filter from a detector, click on the ‘x’ that appears to the right when the cursor is aligned with the filter name.

Detectors that are not assigned to any category are assigned to a Default Category, which causes files to be scanned for them every time.

Exporting Content Discovery Configuration

The content discovery configuration (categories, detectors and filters) can be exported to a text file in JSON format. To export the current configuration, click on the Export Content Discovery Configuration button at the bottom of the Content Discovery page.

Importing Content Discovery Configuration

A content discovery configuration file, in JSON format, can be imported. First click on the triangle next to the Import Configuration label on the Content Discovery page. Then you can paste (or type) the JSON into the pane on the left, or drag a file containing the JSON configuration into the pane on the right. Finally, click on the Import Content Discovery Configuration button to complete the operation.

When a new configuration is imported, categories and content detectors that are in the current configuration but not in the imported configuration are not removed from the current configuration. Similarly, assignments of detectors to categories are not removed. New assignments of detectors to categories are, however, applied.
Changes to the contents of detectors are handled differently. If a content detector exists in the current configuration and also in the imported configuration file, then the new definition replaces the old.
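The import behaviour can be sketched as a merge over two in-memory configurations. The dictionary shape used here is hypothetical and is not the schema of the exported JSON file. Only the merge rules are taken from the description above.

```python
import copy


def merge_configuration(current: dict, imported: dict) -> dict:
    """Merge an imported configuration into the current one.

    Both arguments use a hypothetical shape:
      {"categories": {category_name: [detector_name, ...]},
       "detectors": {detector_name: definition}}
    """
    merged = copy.deepcopy(current)
    merged_detectors = merged.setdefault("detectors", {})
    merged_categories = merged.setdefault("categories", {})

    # A detector present in both configurations takes the imported definition;
    # detectors that exist only in the current configuration are kept.
    for name, definition in imported.get("detectors", {}).items():
        merged_detectors[name] = copy.deepcopy(definition)

    # Categories and assignments are only added, never removed.
    for category, detectors in imported.get("categories", {}).items():
        assigned = merged_categories.setdefault(category, [])
        for detector in detectors:
            if detector not in assigned:
                assigned.append(detector)

    return merged


current = {
    "categories": {"Financial": ["Credit Card", "IBAN"]},
    "detectors": {"Credit Card": {"filters": ["old"]}, "IBAN": {"filters": ["iban"]}},
}
imported = {
    "categories": {"Financial": ["Credit Card", "SWIFT"]},
    "detectors": {"Credit Card": {"filters": ["new"]}, "SWIFT": {"filters": ["swift"]}},
}
merged = merge_configuration(current, imported)
```

In this example the "IBAN" detector and its assignment survive the import, "SWIFT" is added to the "Financial" category, and the "Credit Card" definition is replaced by the imported one. Because nothing is removed by an import, exporting the current configuration first gives a reference copy to compare against afterwards.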

Information Security Considerations

Shared Team Folders

The File Fabric provides a feature to allow team members to have shared access to folders. Whether or not a Shared Team Folder contained files with matching content at the time that it was shared, if new files that contain matching content are uploaded, or if existing files are updated to contain matching content, then users with access to the folder will gain access to the matching content in those files.

The File Fabric provides an option which, if enabled, allows users to generate links for sharing files by download with external recipients. Files containing matching content can be shared this way, allowing link recipients to gain access to the matching content in the files.

Depending on the settings that are in force when a file sharing link is created, a link may remain active for a long time. These links point to the shared file by name; if the file is updated then a recipient who uses the link to download the file after it has been updated will get a copy of the updated file. Whether or not the original file contained matching content, it is possible that the updated file will contain matching content. In that case the recipient of the link will gain access to the matching content.

The File Fabric also provides an option which, if enabled, allows users to generate links for sharing folders with external recipients. When a folder has been shared this way, the recipient of the link can download any of the files in the folder. If any of those files contain matching content then the recipient can gain access to the matching content by downloading those files.

As with file sharing links, a folder sharing link may remain active for a long time. Whether or not the folder contained files with matching content when the link was generated, it is possible that files with matching content will be added to the folder, or that files in the folder will be updated to include matching content. If matching content has been added to the folder contents while the link is still active, then link recipients may gain access to the matching content by downloading the files that contain it.

Business Groups

Another File Fabric feature, if enabled, allows the creation of groups of internal and external users with a shared workspace. If files containing matching content are placed into a Business Group’s workspace, then members of the Business Group will be able to access the matching content by downloading those files.

Troubleshooting

Files Not Being Scanned on Upload

If no files are being scanned, check the following:

  • Is content search enabled on the File Fabric?
  • Has Content Discovery been enabled in the package assigned to the Organization?
  • Is content search enabled on the provider(s) to which files are being uploaded?
  • Are the files of one of the supported file types?
  • Is the file within the configured size limit for content indexing?

If uploaded files are scanned in some cases and not in others, check the following:

  • Is the file that is not being scanned of one of the types for which Content Discovery is supported?
  • Is the file that is not being scanned within the size limit for content indexing?
  • Was content search enabled in the settings of the storage provider to which the file was uploaded?
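The checklists above amount to a set of conditions that must all hold before an uploaded file is scanned. A rough sketch of that reasoning follows. The `UploadContext` fields, the extension list and the size limit are hypothetical placeholders; the actual supported types and limit depend on the installation.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration only.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_INDEX_SIZE_BYTES = 10 * 1024 * 1024


@dataclass
class UploadContext:
    appliance_search_enabled: bool
    package_has_content_discovery: bool
    provider_indexing_enabled: bool
    extension: str
    size_bytes: int


def reasons_not_scanned(ctx: UploadContext) -> list:
    """Return the checklist items that would prevent a scan (empty if none)."""
    reasons = []
    if not ctx.appliance_search_enabled:
        reasons.append("content search is disabled on the appliance")
    if not ctx.package_has_content_discovery:
        reasons.append("Content Discovery is not enabled in the package")
    if not ctx.provider_indexing_enabled:
        reasons.append("content search is not enabled on the provider")
    if ctx.extension.lower() not in SUPPORTED_EXTENSIONS:
        reasons.append("file type is not supported")
    if ctx.size_bytes > MAX_INDEX_SIZE_BYTES:
        reasons.append("file exceeds the size limit for content indexing")
    return reasons


upload = UploadContext(True, True, False, ".PDF", 20 * 1024 * 1024)
problems = reasons_not_scanned(upload)
```

When all files fail to scan, the first three conditions are the likely causes; when only some files fail, the last three (which vary per provider and per file) are the ones to check.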

Files Being Scanned but Matching Content Not Being Detected

If matching content is not being detected, confirm that the org.’s Content Discovery configuration includes categories with detectors for the content you expect to be found.

Info Pane Doesn’t Show Matching Content

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see the Content Discovery button in the Info pane.

If the button appears but some or all of the Content Discovery values are missing, check to see if the content detectors for the missing values are still in the org.’s Content Discovery configuration. If they have been removed then the corresponding tags and values won’t be displayed.

Search Screen Doesn’t Show the Content Discovery Controls

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see the “Content Detection Categories” and “Detected Content” controls on the Search screen.

Content Discovery Classifications Not Listed in Dropdown on Tags Tab

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see Content Discovery classifications on the Tags tab dropdown.