
PII Scanning and Detection

(available in v1803)

This page covers the identification and classification of PII (Personally Identifiable Information).

The Enterprise File Fabric's PII Scanning and Detection feature helps enterprise customers manage personal information by automatically detecting PII in documents and alerting the organisation’s information security specialists or other designated users to its presence.

Under GDPR and other data privacy regulations, businesses face increasing regulatory responsibilities to secure the personal data they collect from customers and others. For an organisation with substantial data assets, one of the challenges associated with ensuring that personally identifiable information (PII) is managed appropriately is knowing where that data resides in their systems.

Feature Summary

Scanning

The PII Scanning and Detection feature works by scanning documents when they are added or updated. Documents are searched using a configurable set of rules, looking for personal information such as telephone numbers, email addresses and national identity numbers.

Tagging

Files in which personal information is found are classified as PII with the types of PII data that they contain. Users with appropriate permissions can see the PII that has been found in a document on the “info” tab for that document:

Notifications

Users are notified when documents with PII are uploaded or shared.

Users with PII permission, including administrators, receive an email:

The file owner (the user who uploaded the file) receives an email and a message:

Notifications are also sent to administrators and PII users when a sharing link is generated for a PII file.

Users with appropriate permission are able to easily search for and retrieve documents tagged as containing PII:

Workflow

Uploading

When a file is uploaded, updated or synchronized, the File Fabric recognizes it as containing new content and treats it as a candidate for PII scanning.

To be scanned, the file must be located on a storage provider that has content search enabled (this is set when the provider is created).

Files with the following extensions will be scanned: 'txt', 'doc', 'docx', 'rtf', 'pdf', 'htm', 'html', 'xls', 'xlsx', 'ppt' and 'pptx'.

The file size limit for content indexing, which is set on the appliance administrator’s Site Search Integration page, also applies to PII scanning and detection.
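
The conditions above can be summarised in a short Python sketch. It is illustrative only and is not File Fabric code: the function name `scan_blockers` and its parameters are hypothetical stand-ins for the settings described here, and treating a file that is exactly at the size limit as eligible is an assumption. Only the extension list is taken from this page.

```python
from pathlib import PurePosixPath

# Extensions listed on this page as eligible for PII scanning.
SCANNABLE_EXTENSIONS = {
    "txt", "doc", "docx", "rtf", "pdf", "htm", "html",
    "xls", "xlsx", "ppt", "pptx",
}


def scan_blockers(filename, size_bytes, provider_indexed, size_limit_bytes):
    """Return the reasons a file would NOT be scanned (empty list = candidate).

    All parameter names are hypothetical; they stand in for the provider's
    content search setting and the content indexing size limit.
    """
    reasons = []
    if not provider_indexed:
        reasons.append("content search is not enabled on the storage provider")
    # Compare extensions case-insensitively (an assumption about the product).
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    if extension not in SCANNABLE_EXTENSIONS:
        reasons.append("unsupported file extension")
    if size_bytes > size_limit_bytes:
        reasons.append("file exceeds the content indexing size limit")
    return reasons
```

An empty result means none of the three conditions rules the file out; a non-empty result names the setting to look at first.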

Scanning for PII

While a file is being scanned for PII, a visual indicator that the scan is in progress appears next to the file name in the File Manager.

A warning message that a scan is in progress is also shown at the top of the directory listing in the File Manager.

Tagging of PII Files

When PII is detected in a file, a tag is added to the file indicating the type of PII that was detected. For example, if the File Fabric is configured to scan for US Social Security Numbers (SSNs) and one or more data values that match the US SSN detection rule are found when the file is scanned, then a tag with the value “US Social Security Number” will be added to the file's metadata under the PII classification.

Notifications

Administrators and users with PII permission are notified when a file that matches the PII rules has been detected.

Users with PII permission, including administrators, receive a notification by email:

The file owner (the user who uploaded the file) receives both an email and a message.

Messages are delivered through the Cloud File Manager and other applications.

Sharing

A confirmation dialog is presented to users who share documents that contain PII:

When the file is shared, notifications are sent by email to users with PII permission, including administrators:

File and Folder Indicators

The folder icons for folders that contain PII files - either directly or in a child folder - are marked with a special decoration in the File Manager:

File icons for PII files also have a special decoration in the File Manager:

When the contents of a folder that contains PII files - either directly or in a child folder - are displayed in the right-hand panel of the File Manager, a notification about the presence of PII files is added to the display:
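
The "directly or in a child folder" rule amounts to propagating a flag up the folder tree. Here is a minimal Python sketch of that idea. It assumes files are identified by slash-separated paths and that the rule applies at any nesting depth; both are assumptions, and the function name is hypothetical rather than part of the File Fabric.

```python
from pathlib import PurePosixPath


def folders_with_pii(pii_file_paths):
    """Return every folder that holds a PII file directly or further down."""
    marked = set()
    for file_path in pii_file_paths:
        # .parents yields the containing folder, then each ancestor up to the root.
        for folder in PurePosixPath(file_path).parents:
            marked.add(str(folder))
    return marked
```

For a single PII file at `/hr/2018/payroll.xlsx`, the folders `/hr/2018`, `/hr` and `/` would all carry the decoration under these assumptions.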

Searching

Content searches through the web-based File Manager can filter for specific types of PII. The option is available for users with PII permission, including administrators. Select one or more checkboxes to filter for documents with those types of PII.

File Information

If a file contains PII, a “Show PII matches” button is displayed on the File Manager Info tab for the file. This is available to users with PII or administration permissions.

Pressing the button will cause the PII tags and values to be displayed:


Tag Cloud

As with other classifications, the tags belonging to the PII classification can be displayed in a tag cloud by selecting the PII classification from the classifications dropdown list on the File Manager’s Tags tab:

Also as with other classifications, a list of the files to which a specific tag has been attached can be displayed by clicking on the tag in the tag cloud.

Configuration

These are the configuration steps that are required to make PII Discovery available:

Appliance Administrator:

  • Enable the Content Search Engine
  • Enable PII Scanning and Detection in User Packages

(Organization) Administrator:

  • Enable the Policy “PII Scanning & Detection”
  • Add Storage Providers with Content Search
  • Give Users PII Authorization
  • Configure PII Detection Rules (optional)
  • Change the Name of the PII Classification (optional)

1. Enabling the Content Search Engine

Content search must be enabled for PII scanning and detection to work. The content search engine scans documents for PII as they are uploaded or synchronized. The search engine is available only with the Enterprise File Fabric appliance and must be explicitly enabled.

Here is a link to instructions for configuring the content search engine: Content Search and PDF Burn Service

2. Enabling PII Scanning and Detection in User Packages

PII Scanning and Detection is only available to Organizations that have been assigned a User Package in which the feature is enabled. The appliance administrator (appladmin) can set this option for a Package by:

  • choosing “User Packages” from the hamburger menu;
  • clicking on the pencil icon next to the package name to load the “User Packages :: Edit Package” page;
  • activating “PII Scanning & Detection” on the “Extra options” list (use the Ctrl key to avoid de-selecting previously selected options);
  • and saving the change.

3. Enabling the Policy “PII Scanning & Detection”

An administrator can enable this feature under Policies > PII Scanning & Detection.

4. Adding Storage Providers with Content Search

Search must be enabled for a provider data source when it is added. This is done by selecting “Index content for search” on the page that gathers authentication information:

Files that existed before the provider was added are indexed during the initial provider synchronization. Subsequently, files are indexed when they are created or updated, or when a provider cloud sync discovers new or updated files.

Search cannot be enabled for an existing provider data source. To verify that content search is enabled for a provider, go to the Dashboard as an organization administrator and select the settings gear icon to see the data source provider detail. The “Content index for search” setting must be set to Yes.

5. Giving Users PII Authorization

Once PII has been enabled for an organization and content search has been enabled on a storage provider, any file of a supported type that is uploaded to that storage provider will be scanned for PII. There are, however, tag- and content-driven PII searching capabilities that are only available to users who have been given PII permissions.

To give PII authorization to a user, an administrator assigns the PII permission to a role on the “Organization roles” page:

That role is assigned to the user on the Edit User page:

Another way to give a user PII authorization is to assign the Admin role. Assigning the Admin role to a user gives several other administrative privileges and should not be done without a complete understanding of the implications.
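
The two routes to PII authorization can be expressed as a single check. The Python sketch below only models the logic described in this section. The `Role` class, the permission key `"pii"` and the function name are hypothetical, and the role name `"Admin"` is used as a simple stand-in for the Admin role.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical model of an organization role and its permissions."""
    name: str
    permissions: set = field(default_factory=set)


def has_pii_authorization(roles):
    """True if any assigned role is Admin or carries the PII permission."""
    return any(
        role.name == "Admin" or "pii" in role.permissions
        for role in roles
    )
```

A dedicated role that carries only the PII permission satisfies the check without the wider privileges that come with the Admin role.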

6. Configuring PII Detection Rules

A set of rules for detecting different kinds of PII is provided with the Enterprise File Fabric. These rules can be used as provided, or the administrator can remove or change rules to meet the organization’s specific requirements.

PII detection rules are defined in a JSON document that is presented on the PII administration tab of the organization’s Policies page:

{  
    "id":"creditcard",
    "tag":"credit card",
    "title":"Credit card numbers",
    "filters":[  
        {  
            "name":"The main credit card filter",
            "code":"creditcard"
        }
    ]
}

The contents of this document must conform to a JSON schema specification that is included with the File Fabric appliance and can be downloaded from the same page:

Prior to editing the JSON document that contains the PII detection rules, make a safe copy of the current version by copying the contents to a text file. That way you can easily revert the changes if needed.

The JSON document consists of an array of structures, each of which describes a rule. Each rule is identified by an id. The id must be unique within the JSON document.

Each rule contains a list of filters.

The JSON schema describes two styles of filters that the JSON document can contain. Only the code filter is currently supported. Here is an example:

{  
    "id":"us_ssn",
    "tag":"US Social Security Number",
    "title":"Social Security Numbers (US)",
    "filters":[  
        {  
            "name":"The main SSN filter",
            "code":"usSsn"
        }
    ]
}

The “tag” value will be the name of the tag in the File Fabric’s tagging system. Tag values must be unique within the JSON document.

The “title” will be the name of the data type in the “Contains PII” tick list on the File Manager’s search screen and in the PII list for a file in the File Manager’s Info panel.

When you try to save your changes to the JSON document on the “PII Detection & Scanning” tab of the “Policies” page, the edited JSON is validated. If your edits have introduced an error then the document will not be saved.
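
Before pasting an edited document back into the page, it can help to check it locally. The Python sketch below is an informal pre-check, not a substitute for the schema validation performed on save. It tests only the constraints stated on this page (well-formed JSON, a top-level array, unique "id" and "tag" values). The function name `check_rules` is hypothetical.

```python
import json
from collections import Counter


def check_rules(document_text):
    """Return a list of problems found in a rules document (empty = none found)."""
    try:
        rules = json.loads(document_text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg} at line {error.lineno}"]
    if not isinstance(rules, list):
        return ["top level must be an array of rules"]

    problems = []
    for key in ("id", "tag"):
        values = [rule.get(key) if isinstance(rule, dict) else None for rule in rules]
        # Count rules where the key is absent or is not a string.
        missing = sum(1 for value in values if not isinstance(value, str))
        if missing:
            problems.append(f"{missing} rule(s) lack a string '{key}'")
        # Report each value that is used by more than one rule.
        counts = Counter(value for value in values if isinstance(value, str))
        for value, count in counts.items():
            if count > 1:
                problems.append(f"duplicate {key}: {value!r}")
    return problems
```

A stray trailing comma, which is easy to leave behind when deleting the last rule, shows up here as a "not valid JSON" problem with a line number.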

You may also want to remove from the JSON document rules that scan for data items that are not of interest to your organization. In that case, remove the entire section starting with the curly brace before the id, and ending with the comma preceding the next rule (unless you are removing the final rule in the document, in which case there is no comma). For example, if you don’t want the File Fabric to scan for Australian tax file numbers, you would remove this text (including the trailing comma) from the JSON document:

 {  
    "id":"au_taxfilenum",
    "tag":"Australia Tax File Number",
    "title":"Australia Tax File Number",
    "filters":[  
       {  
          "name":"The main Australia Tax File Number filter",
          "code":"auTaxFileNumber"
       }
    ]
 },
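
Removing a rule by hand makes it easy to leave a stray comma behind. As an alternative, the rule can be removed programmatically from a copy of the document, so that the commas are regenerated rather than edited. This is a minimal Python sketch that assumes the document is the array of rules described above. The function name `remove_rule` is hypothetical, and the output is re-indented, so it will not match the original layout character for character.

```python
import json


def remove_rule(document_text, rule_id):
    """Return the rules document as JSON text with the rule `rule_id` removed."""
    rules = json.loads(document_text)
    remaining = [rule for rule in rules if rule.get("id") != rule_id]
    if len(remaining) == len(rules):
        # Nothing matched: fail loudly instead of returning an unchanged copy.
        raise KeyError(f"no rule with id {rule_id!r}")
    # ensure_ascii=False keeps any non-ASCII tag or title text readable.
    return json.dumps(remaining, indent=4, ensure_ascii=False)
```

For example, `remove_rule(text, "au_taxfilenum")` returns the document without the Australian tax file number rule, ready to paste back into the page, where it is validated on save as usual.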

7. Changing the Name of the PII Classification

The PII classification is used to organize PII tags that are applied to files by the scanning process. The classification name appears on a pulldown list on the File Manager’s Tags tab:

The default classification name is “PII”. For on-premises File Fabric installations, the appliance administrator can change this name on the Classifications page that is accessed from appladmin’s Settings menu:

The classification cannot be deleted, and a second classification of type “pii” cannot be added.

Information Security Considerations

Shared Team Folders

The File Fabric provides a feature to allow team members to have shared access to folders. Whether or not a Shared Team Folder contained PII files at the time that it was shared, if new files that contain PII are uploaded, or existing files are updated to contain PII, then users with access to the folder will gain access to the PII in those files.

The File Fabric provides an option which, if enabled, allows users to generate links for sharing files by download with external recipients. Files containing PII can be shared this way, allowing link recipients to gain access to the PII in the file.

Depending on the settings that are in force when a file sharing link is created, a link may remain active for a long time. These links point to the shared file by name; if the file is updated, a recipient who uses the link to download the file after the update will get a copy of the updated file. Whether or not the original file contained PII, it is possible that the updated file will contain PII. In that case the recipient of the link will gain access to the PII.

The File Fabric also provides an option which, if enabled, allows users to generate links for sharing folders with external recipients. When a folder has been shared this way, the recipient of the link can download any of the files in the folder. If any of those files contain PII then the recipient can gain access to the PII by downloading those files.

As with file sharing links, a folder sharing link may remain active for a long time. Whether or not the folder contained files with PII when the link was generated, it is possible that files with PII will be added to the folder or that files in the folder will be updated to include PII. If PII has been added to the folder contents while the link is still active, then link recipients may gain access to the PII by downloading the files that contain it.

Business Groups

Another File Fabric feature, if enabled, allows the creation of groups of internal and external users with a shared workspace. If files containing PII are placed into a Business Group’s workspace then members of the Business Group will be able to access the PII by downloading those files.

Troubleshooting

Files Not Being Scanned on Upload

If no files are being scanned, check the following:

  • Is content search enabled on the File Fabric?
  • Has PII scanning and detection been enabled in the package assigned to the Organization?
  • Is content search enabled on the provider(s) to which files are being uploaded?
  • Are the files of one of the supported file types?
  • Is the file within the configured size limit for content indexing?

If uploaded files are scanned in some cases and not in others, check the following:

  • Is the file of one of the types for which PII scanning is supported?
  • Is the file within the size limit for content indexing?
  • Was content search enabled in the settings of the storage provider to which the file was uploaded?

Files Being Scanned but PII Not Being Detected

If PII is not being detected, confirm that the JSON document that contains the scanning rules includes rules for the PII you expect to be found.

Info Panel Doesn’t Show PII Values

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII button in the Info panel.

If the button appears but some or all of the PII values are missing, check to see if the rules for the missing values are still in the JSON document. If they have been removed then the corresponding tags and values won’t be displayed.

Search Screen Doesn’t Show PII Search Panel

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII search on the Search screen.

PII Classification Not Listed in Dropdown on Tags Tab

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII classification on the Tags tab dropdown.