PII Scanning and Detection

REPLACED BY CONTENT DISCOVERY

Version 1808 of the Enterprise File Fabric introduced Content Discovery, which supersedes this PII Scanning and Detection feature. Content Discovery incorporates all of the PII feature’s functionality and is upwardly compatible with version 1803’s PII detection rules. Content Discovery organises content detectors into one or more Content Detection Categories, each of which contains a set of detectors. PII is (or can be) one such Content Detection Category.

This page covers the identification and classification of PII (Personally Identifiable Information).

The Enterprise File Fabric's PII Scanning and Detection feature helps enterprise customers manage personal information by automatically detecting PII in documents and alerting the organisation’s information security specialists or other designated users to its presence.

Under GDPR and other data privacy regulations, businesses face increasing regulatory responsibilities to secure the personal data they collect from customers and others. For an organisation with substantial data assets, one of the challenges in ensuring that personally identifiable information (PII) is managed appropriately is knowing where that data resides in its systems.

Applied to:

  • Enterprise File Fabric Appliance [Add-On] (until v1803)

Scanning

The PII Scanning and Detection feature works by scanning documents when they are added or updated. Documents are searched using a configurable set of rules, looking for personal information such as telephone numbers, email addresses and national identity numbers.

Tagging

Files in which personal information is found are classified as PII with the types of PII data that they contain. Users with appropriate permissions can see the PII that has been found in a document on the “info” tab for that document:

Notifications

Users are notified when documents with PII are uploaded or shared.

Users with PII permission, including administrators, receive an email:

The file owner (the user who uploaded the file) receives an email and a message:

Notifications are also sent to administrators and PII users when a sharing link is generated for a PII file.

Users with appropriate permission are able to easily search for and retrieve documents tagged as containing PII:

Uploading

When a file is uploaded, updated or synchronized, the File Fabric recognizes it as containing new content, which makes it a candidate for PII scanning.

To be scanned, the file must be located on a storage provider that has content search enabled (this is set when the provider is created).

Files with the following extensions will be scanned: 'txt', 'doc', 'docx', 'rtf', 'pdf', 'htm', 'html', 'xls', 'xlsx', 'ppt' or 'pptx'.

The file size limit for content indexing that can be set on the appliance administrator’s Site Search Integration page also applies to PII scanning and detection.
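The three conditions above (a provider with content search enabled, a supported extension, and a size within the indexing limit) can be summarised in a short Python sketch. This is only an illustration of the documented conditions, not the File Fabric's own code: the function name, the `max_index_bytes` parameter, the case-insensitive extension comparison and the inclusive size comparison are all assumptions.

```python
from pathlib import PurePosixPath

# Extensions listed on this page as eligible for PII scanning.
SCANNABLE_EXTENSIONS = {
    "txt", "doc", "docx", "rtf", "pdf", "htm", "html",
    "xls", "xlsx", "ppt", "pptx",
}


def is_scan_candidate(filename, size_bytes, provider_indexes_content, max_index_bytes):
    """Return True if a file meets the three documented conditions.

    max_index_bytes is a hypothetical stand-in for the limit configured on
    the Site Search Integration page; the real value is site-specific.
    """
    # Assumption: extensions are compared without regard to case.
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    return bool(
        provider_indexes_content
        and extension in SCANNABLE_EXTENSIONS
        and size_bytes <= max_index_bytes
    )
```

For example, `is_scan_candidate("report.pdf", 1024, True, 10_000)` is true, while the same file on a provider without content search is not a candidate.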

Scanning for PII

While a file is being scanned for PII, a visual indicator that the scan is in progress appears next to the file name in the File Manager.

A warning message that a scan is in progress is also shown at the top of the directory listing in the File Manager.

Tagging of PII Files

When PII is detected in a file, a tag is added to the file indicating the type of PII that was detected. For example, if the File Fabric is configured to scan for US Social Security Numbers (SSNs) and one or more data values that match the US SSN detection rule are found when the file is scanned, then a tag with the value “US Social Security Number” will be added to the file's metadata under the PII classification.
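The tagging behaviour described above can be pictured with a small Python sketch. It is a simplified model under stated assumptions: the metadata dictionary, the function names and both patterns are hypothetical, and the SSN pattern is a rough stand-in for illustration rather than the product's built-in detection rule.

```python
import re

# Hypothetical, simplified rules as (tag, pattern) pairs. These patterns are
# illustrative stand-ins, not the File Fabric's built-in detectors.
RULES = [
    ("US Social Security Number", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("Email", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]


def pii_tags_for(text):
    """Return the tag of every rule that matches the text at least once."""
    return [tag for tag, pattern in RULES if pattern.search(text)]


def tag_file(metadata, text, classification="PII"):
    """Record matching tags under the PII classification of a metadata dict."""
    for tag in pii_tags_for(text):
        tags = metadata.setdefault(classification, [])
        if tag not in tags:
            # One tag per PII type, however many values matched.
            tags.append(tag)
    return metadata
```

A file containing several Social Security Numbers still receives a single “US Social Security Number” tag, because the tag records the type of PII found rather than each value.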

Notifications

Administrators and users with PII permission are notified when a file that matches the PII rules has been detected.

Users with PII permission, including administrators, receive a notification by email:

The file owner (the user who uploaded the file) receives both an email and a message.

Messages are delivered through the Cloud File Manager and other applications.

Sharing

A confirmation dialog is presented to users who share documents that contain PII:

When the file is shared, notifications are sent by email to users with PII permission, including administrators:

File and Folder Indicators

The folder icons for folders that contain PII files - either directly or in a child folder - are marked with a special decoration in the File Manager:

File icons for PII files also have a special decoration in the File Manager:

When the contents of a folder that contains PII files - either directly or in a child folder - are displayed in the right hand panel of the File Manager, a notification about the presence of PII files is added to the display:
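The rule that a folder is marked when it contains a PII file “either directly or in a child folder” amounts to marking every ancestor of each PII file. A minimal Python sketch of that idea follows; the function name and the POSIX-style paths are hypothetical, and whether the root folder itself is decorated in the File Manager is an assumption.

```python
from pathlib import PurePosixPath


def folders_to_decorate(pii_file_paths):
    """Return every folder that holds a PII file directly or in a child folder."""
    folders = set()
    for path in pii_file_paths:
        # parents yields the containing folder, then each ancestor up to the root.
        for parent in PurePosixPath(path).parents:
            folders.add(str(parent))
    return folders
```

For a single PII file at `/hr/2018/payroll.xlsx`, the sketch returns `/hr/2018`, `/hr` and `/`.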

Searching

Content searches through the web-based File Manager can filter for specific PII information. The option is available for users with PII permission, including administrators. Select one or more checkboxes to filter for documents with those types of PII.

File Information

If a file contains PII, a “Show PII matches” button is displayed on the File Manager Info tab for the file. This is available to users with PII or administration permissions.

Pressing the button will cause the PII tags and values to be displayed:

Tag Cloud

As with other classifications, the tags belonging to the PII classification can be displayed in a tag cloud by selecting the PII classification from the classifications dropdown list on the File Manager’s Tags tab:

Also as with other classifications, a list of the files to which a specific tag has been attached can be displayed by clicking on the tag in the tag cloud.

These are the configuration steps that are required to make PII Discovery available:

Appliance Administrator:

  • Enable the Content Search Engine
  • Enable PII Scanning and Detection in User Packages

(Organization) Administrator:

  • Enable the Policy “PII Scanning & Detection”
  • Add Storage Providers with Content Search
  • Give Users PII Authorization
  • Configure PII Detection Rules (optional)
  • Change the Name of the PII Classification (optional)

1. Enabling the Content Search Engine

Content search must be enabled for PII scanning and detection to work. The content search engine scans documents for PII as they are uploaded or synchronized. The search engine is available only with the Enterprise File Fabric appliance and must be explicitly enabled.

For instructions on configuring the content search engine, see: Enabling Deep Content Search and PDF Burn Service

2. Enabling PII Scanning and Detection in User Packages

PII Scanning and Detection is only available to Organizations that have been assigned a User Package in which the feature is enabled. The appliance administrator (appladmin) can set this option for a Package by:

  • choosing “User Packages” from the hamburger menu;
  • clicking on the pencil icon next to the package name to load the “User Packages :: Edit Package” page;
  • activating “PII Scanning & Detection” on the “Extra options” list (use the Ctrl key to avoid de-selecting previously selected options);
  • and saving the change.

3. Enabling the Policy “PII Scanning & Detection”

An administrator can enable this feature under Policies > PII Scanning & Detection.

4. Adding Storage Providers with Content Search

Search must be enabled for a provider data source when it is added. This is done by selecting “Index content for search” on the page that gathers authentication information:

Files that existed before the provider was added are indexed during the initial provider synchronization. Subsequently files are indexed when created or updated, or if a provider cloud sync is executed and new or updated files are discovered.

Search cannot be enabled for an existing provider data source. To verify that content search is enabled for a provider, go to the Dashboard as an organization administrator and select the Settings gear icon to see the data source provider details. The “Content index for search” setting must be set to Yes.

5. Giving Users PII Authorization

Once PII has been enabled for an organization and content search has been enabled on a storage provider, any file of a supported type that is uploaded to that storage provider will be scanned for PII. There are, however, tag- and content-driven PII searching capabilities that are only available to users who have been given PII permissions.

To give PII authorization to a user, an administrator assigns the PII permission to a role on the “Organization roles” page:

That role is assigned to the user on the Edit User page:

Another way to give a user PII authorization is to assign the Admin role. Assigning the Admin role to a user gives several other administrative privileges and should not be done without a complete understanding of the implications.

6. Configuring PII Detection Rules

A set of rules for detecting different kinds of PII is provided with the Enterprise File Fabric. These rules can be used as provided, or the administrator can add, remove or change them.

The PII Detection Rules are defined in a JSON document that is accessible from the PII Scanning & Detection tab of the organization’s Policies page. Prior to editing the PII Detection Rules, make a safe copy of the JSON document by copying the contents to a text file. That way you can easily revert the changes if needed.

The PII Detection Rules JSON document is an array of objects with each object describing one rule. A rule has the following properties:

  • id - A unique identifier.
  • title - The name of the rule shown in the user interface.
  • tag - Files found with this rule are tagged with this value.
  • filters - An array of PII filter objects with matching criteria.
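Before pasting an edited document back into the Policies page, a saved copy can be given a rough structural pre-check. The Python sketch below only confirms that each rule object carries the four properties listed above. It is not the product's JSON schema, which remains the authoritative check, and the function names are hypothetical.

```python
REQUIRED_PROPERTIES = ("id", "title", "tag", "filters")


def missing_properties(rule):
    """Return the required properties that a rule object lacks."""
    return [name for name in REQUIRED_PROPERTIES if name not in rule]


def check_rules(rules):
    """Map the array position of each incomplete rule to its missing properties."""
    problems = {}
    for index, rule in enumerate(rules):
        missing = missing_properties(rule)
        if missing:
            problems[index] = missing
    return problems
```

An empty result means every rule has all four properties; it says nothing about whether their values are acceptable to the schema.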

The document is validated against a JSON schema on update. If there is an error the document will not be saved:

The JSON schema can be downloaded from the same page:

Rule Id

To add a scanning rule, create a new unique id. An id must contain only the characters A-Z, a-z, 0-9 and _ (underscore). It is only used internally and should not be changed.
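The id constraints can be checked mechanically on a saved copy of the rules. The following Python sketch is an assumption-laden helper, not part of the product: the function name is hypothetical, and it simply applies the character set and uniqueness requirement stated above.

```python
import re

# Character set stated on this page: A-Z, a-z, 0-9 and underscore.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def invalid_or_duplicate_ids(rules):
    """Return ids that use other characters or repeat an earlier rule's id."""
    bad = []
    seen = set()
    for rule in rules:
        rule_id = rule.get("id", "")
        valid = isinstance(rule_id, str) and ID_PATTERN.fullmatch(rule_id)
        if not valid or rule_id in seen:
            bad.append(rule_id)
        if isinstance(rule_id, str):
            seen.add(rule_id)
    return bad
```

An id such as `US VIN` is reported because it contains a space, while `USVIN` and `us_ssn` pass.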

Rule Title

The title will be the name of the data type in the “Contains PII” checklist on the File Manager’s search screen and in the PII list for a file in the File Manager’s Info panel.

Rule Tag

The tag value is the name of one tag. It does not have to be predefined. Tag values should be unique within the JSON document.
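Because tag values should be unique within the document, it can help to look for repeats before saving. A minimal sketch, assuming the rules have already been loaded into a Python list of dictionaries (the function name is hypothetical):

```python
from collections import Counter


def duplicate_tags(rules):
    """Return, in sorted order, every tag value used by more than one rule."""
    counts = Counter(rule["tag"] for rule in rules if "tag" in rule)
    return sorted(tag for tag, count in counts.items() if count > 1)
```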

Rule Filters

Two types of matching filters are supported. Regular expression filters support the detection of PII content through search patterns. Code filters are predefined filters in the product that match common types of PII.

Regular Expression Filters

Rules created by users (admins) can each contain one user-supplied regular expression filter.

The regex property is the regular expression that will be used to detect data of the type described by the rule when a file is scanned. The regular expression must be delimited by slashes (‘/’). For more information on syntax see Regexp Reference.
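A candidate pattern can be tried out before it is placed in a rule. The Python sketch below strips the slash delimiters and tests the VIN pattern used in this page's example rule against sample text. Python's `re` module is only a stand-in here: the File Fabric's regular expression engine may differ in syntax details, and the helper name and sample text are hypothetical. Modifiers after the closing slash are not handled.

```python
import re


def compile_rule_regex(delimited):
    """Strip the surrounding slashes from a rule's regex value and compile it."""
    if len(delimited) < 2 or not (delimited.startswith("/") and delimited.endswith("/")):
        raise ValueError("regular expression must be delimited by slashes")
    return re.compile(delimited[1:-1])


vin = compile_rule_regex("/([A-HJ-NPR-Z0-9]{17})/")
# findall returns the text captured by the pattern's single group.
matches = vin.findall("VIN 1HGCM82633A004352 recorded at intake")
```

Note that the pattern is unanchored, so any run of 17 permitted characters inside a longer string will also match; testing against realistic documents helps gauge false positives.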

This is an example of a rule using a regular expression filter:

 {  
    "id":"USVIN",
    "tag":"US VIN",
    "title":"US Vehicle Identification Number",
    "filters":[  
       {  
          "name":"VIN filter",
          "regex":"/([A-HJ-NPR-Z0-9]{17})/"
       }
    ]
 }

Code Filters

This is an example of a rule using a code filter:

 {  
    "id":"us_ssn",
    "tag":"US Social Security Number",
    "title":"Social Security Numbers (US)",
    "filters":[  
       {  
          "name":"The main SSN filter",
          "code":"usSsn"
       }
    ]
 }
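Rules for other predefined code filters follow the same shape. As an illustration only, this Python sketch assembles such a rule and prints it as JSON. The helper name, the id `uk_nhs`, the tag and the title are hypothetical; `ukNhsNumber` is one of the predefined code filters listed on this page.

```python
import json


def code_filter_rule(rule_id, tag, title, code, filter_name=None):
    """Build a rule object with a single code filter."""
    return {
        "id": rule_id,
        "tag": tag,
        "title": title,
        "filters": [{"name": filter_name or f"{title} filter", "code": code}],
    }


rule = code_filter_rule("uk_nhs", "UK NHS Number", "NHS Numbers (UK)", "ukNhsNumber")
print(json.dumps(rule, indent=3))
```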

Adding new code filters to this version of the File Fabric requires paid professional services support from Storage Made Easy. Users wishing to add their own code filters should contact their SME sales representatives.

The following predefined code filters are included with the File Fabric:

  • General
    • bankIban - Bank account numbers (IBAN)
    • bankSwift - SWIFT
    • creditcard - Credit cards
    • email - Email
    • Icd10cm - ICD 10-CM Code rule
    • Icd9cm - ICD 9-CM Code rule
    • Ip - IPv4 and IPv6 addresses
  • Australia
    • auMedicare - Australian Medicare account number
    • auTaxFileNumber - Australian Tax File number
  • Brazil
    • brCpfNumber - Brazilian CPF Number rule
  • Canada
    • caBritishColumbiaInsuranceNumber - British Columbian Personal Health Number (PHN)
    • caOntarioInsuranceNumber - Ontario Health Insurance Plan number
    • caPassport - Canadian Passport
    • caQuebecInsuranceNumber - Quebec Health Insurance Number
    • caSin - Canadian Social Insurance Number (SIN)
  • China
    • cnPassport - Chinese passport
  • Germany
    • dePassport - German passport
  • Spain
    • esNie - Spanish NIE Number rule
    • esNif - Spanish NIF Number rule
    • esPassport - Spanish passport
  • France
    • frIDCard - French National ID Card
    • frPassport - French passport
    • frSsn - French social security number (NIR)
  • India
    • inPersonalNumber - Indian Personal Permanent Account Number
  • Japan
    • jpPassport - Japanese passport
  • South Korea
    • krPassport - South Korean passport
  • Mexico
    • mxNationalNumber - Mexican National Identification Number
    • mxPassport - Mexican passport
  • Netherlands
    • nlIdNumber - Dutch national identification number (BSN)
  • United Kingdom
    • ukDrivingLicense - UK Driving License rule
    • ukNationalInsuranceNumber - UK National Insurance Number rule
    • ukNhsNumber - UK NHS Number rule
    • ukNumberPlate - UK Number Plate
    • ukPassport - UK passport
    • ukTaxpayerNumber - UK Taxpayer Identification Number
    • ukTelephone - UK telephone number
  • United States

Removing Rules

You may also want to remove rules that scan for data items that are not of interest to your organization. To do so, remove the entire rule object from the JSON document, starting with the curly brace before the id and ending with the comma preceding the next rule. If you are removing the final rule in the document there is no trailing comma; remove the comma that precedes the rule instead, so that the array remains valid JSON. For example, if you don’t want the File Fabric to scan for Australian tax file numbers, you would remove this text (including the trailing comma) from the JSON document:

 {  
    "id":"au_taxfilenum",
    "tag":"Australia Tax File Number",
    "title":"Australia Tax File Number",
    "filters":[  
       {  
          "name":"The main Australia Tax File Number filter",
          "code":"auTaxFileNumber"
       }
    ]
 },
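Editing the array by hand makes it easy to leave a stray or missing comma. One alternative, sketched here in Python, is to remove the rule from a saved copy of the document programmatically and paste the result back. The function name is hypothetical, and the sketch assumes the document is a JSON array of rule objects as described on this page.

```python
import json


def remove_rule(rules_json, rule_id):
    """Return the rules document as JSON text without the rule that has rule_id."""
    rules = json.loads(rules_json)
    kept = [rule for rule in rules if rule.get("id") != rule_id]
    # Re-serialising guarantees correct commas wherever the rule sat in the array.
    return json.dumps(kept, indent=3)
```

For example, `remove_rule(document_text, "au_taxfilenum")` drops the Australian tax file number rule whether it is the first, a middle, or the final entry.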

7. Changing the Name of the PII Classification

The PII classification is used to organize PII tags that are applied to files by the scanning process. The classification name appears on a pulldown list on the File Manager’s Tags tab:

The default classification name is “PII”. For on-premises File Fabric installations, the appliance administrator can change this name on the Classifications page that is accessed from appladmin’s Settings menu:

The classification cannot be deleted, and a second classification of type “pii” cannot be added.

Shared Team Folders

The File Fabric provides a feature to allow team members to have shared access to folders. Whether or not a Shared Team Folder contained PII files at the time that it was shared, if new files that contain PII are uploaded or existing files are updated to contain PII then users with access to the folder will gain access to the PII in the files.

The File Fabric provides an option which, if enabled, allows users to generate links for sharing files by download with external recipients. Files containing PII can be shared this way, allowing link recipients to gain access to the PII in the file.

Depending on the settings that are in force when a file sharing link is created, a link may remain active for a long time. These links point to the shared file by name; if the file is updated then a recipient who uses the link to download the file after it has been updated will get a copy of the updated file. Whether or not the original file contained PII, it is possible that the updated file will contain PII. In that case the recipient of the link will gain access to the PII.

The File Fabric also provides an option which, if enabled, allows users to generate links for sharing folders with external recipients. When a folder has been shared this way, the recipient of the link can download any of the files in the folder. If any of those files contain PII then the recipient can gain access to the PII by downloading those files.

As with file sharing links, a folder sharing link may remain active for a long time. Whether or not the folder contained files with PII when the link was generated, it is possible that files with PII will be added to the folder or that files in the folder will be updated to include PII. If PII has been added to the folder contents while the link is still active then link recipients may gain access to the PII by downloading the files that contain it.

Business Groups

Another File Fabric feature, if enabled, allows the creation of groups of internal and external users with a shared workspace. If files containing PII are placed into a Business Group’s workspace then members of the Business Group will be able to access the PII by downloading those files.

See Also

Files Not Being Scanned on Upload

If no files are being scanned, check the following:

  • Is content search enabled on the File Fabric?
  • Has PII scanning and detection been enabled in the package assigned to the Organization?
  • Is content search enabled on the provider(s) to which files are being uploaded?
  • Are the files of one of the supported file types?
  • Are the files within the configured size limit for content indexing?

If uploaded files are scanned in some cases and not in others, check the following:

  • Is the file of one of the types for which PII scanning is supported?
  • Is the file within the size limit for content indexing?
  • Was content search enabled in the settings of the storage provider to which the file was uploaded?

Files Being Scanned but PII Not Being Detected

If PII is not being detected, confirm that the JSON document that contains the scanning rules includes rules for the PII you expect to be found.

Info Panel Doesn’t Show PII Values

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII button in the Info panel.

If the button appears but some or all of the PII values are missing, check to see if the rules for the missing values are still in the JSON document. If they have been removed then the corresponding tags and values won’t be displayed.

Search Screen Doesn’t Show PII Search Panel

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII search on the Search screen.

PII Classification Not Listed in Dropdown on Tags Tab

Confirm that the user has PII authorization. Only administrators and members to whom PII authorization has been given will see the PII classification on the Tags tab dropdown.