# Content Discovery Administration

Last updated on July 20, 2023

This page describes the steps required to enable, set up and manage [[contentdiscovery]] to detect information of interest within documents and other files. Troubleshooting and a discussion of information security considerations are also included.

See also:

* [[contentdiscovery]]
* [[automationrules]]
* [[server/solr]]

## Configuration

### Configuring the Appliance

The Appliance Administrator (appladmin) is responsible for ensuring that the content search engine for the appliance is enabled, and that Content Discovery is enabled for an Organization.

#### 1. Enabling the Content Search Engine

Content search must be enabled for Content Discovery to work. Starting with v1808 it is enabled by default for new installations. See [[server/solr]] for more information.

#### 2. Enabling Content Discovery for an Organization

Content Discovery is only available to Organizations that have been assigned a User Package in which the feature is enabled. This can be set for a Package by:

1. Choosing “User Packages” from the hamburger menu;
2. Clicking on the pencil icon next to the package name to load the “User Packages :: Edit Package” page;
3. Activating “Content Detection” on the “Extra options (add-ons)” list (use the Ctrl key to avoid de-selecting previously selected options);
4. Saving the change.

{{ :contentdiscoveryconfig:enable_cd_in_package.png?direct&300 |}}

Assign the Package to the User if it has not already been assigned.

### Configuring the Organization

After the appliance is configured, Content Discovery must be enabled for each organization, and content search must be enabled for each participating storage provider. Users who will be using Content Discovery features will also need authorization.

#### 1. Enabling Content Discovery

Enable this feature under Organization > Content Discovery:

{{ :contentdiscoveryconfig:org_cd_on_-_off.png?400 |}}

#### 2. Adding Storage Providers for Content Search

Search must be enabled for a provider data source when it is added. This is done by selecting “Index content for search” on the page that gathers authentication information:

{{ ::contentdiscoveryconfig:cos_info.png?500 |}}

Files that existed on the storage before the provider was added are indexed during the initial provider synchronization. Subsequently, files are indexed when they are created or updated, or when a provider cloud sync is executed and new or updated files are discovered.

To determine whether content search is enabled for a provider, go to the Dashboard as an organization administrator. Select the Settings gear icon to see the data source provider detail. The //Content index for search// setting must be set to //Yes//.

{{ :piidiscovery:provider_info.png?200 |}}

#### 3. Authorizing Users for Content Discovery

Users with the Administrator role can use the features, such as search and notifications, that are available with Content Discovery. Other users must belong to a role in which the //Content Discovery// permission is enabled.

To enable a user to use Content Discovery features, the org. admin assigns the “review Content Discovery” permission to a role on the “Organization roles” page.

{{ :contentdiscoveryconfig:roles_editor.png?direct&550 |}}

The org. admin then assigns the role to the user on the “Organization staff :: Edit User” page.

{{ :contentdiscoveryconfig:edit_user_roles.png?direct&250 |}}

Note that the “review Content Discovery” permission replaces the “PII” permission that was available in v1803 of Access Anywhere.

## Administration

The Content Discovery module, accessible by administrators from the Organization menu, allows users to describe the content they want to detect as a set of text strings or patterns. //Content Detectors// match pieces of content, and are grouped into //Content Detection Categories// for classification.
Content Detectors can be reused between different Content Detection Categories.

{{ :contentdiscovery:detectors_can_be_shared.png?direct |}}

### Creating a Content Detection Category

You can create a new Content Detection Category by entering a name and pressing the Create Content Discovery Category button.

{{ :contentdiscoveryconfig:new_cd_category.png?direct&400 |}}

You can also select an existing template to start with:

{{ :contentdiscoveryconfig:cd_categories_templates.png?direct&400 |}}

When you create a Content Detection Category this way, it is pre-populated with the content detectors in the template.

### Editing a Content Detection Category

Content Detection Categories appear in a list on the Content Detection page:

{{ :contentdiscoveryconfig:cd_categories_list.png?direct&600 |}}

To delete a Content Detection Category, click on the ‘x’ next to its name. A category cannot be removed if matching content was detected in one or more files.

To change the name of a Content Detection Category, click on the pencil icon. To expose the set of content detectors in the category, click on the triangle.

{{ :contentdiscoveryconfig:cd_category_edit_and_delete.png |}}

To remove a content detector from a category, click on the ‘x’ next to the detector’s name.

To add a detector from the list of available detectors, click in the empty box at the bottom of the list of detectors in your category definition.

{{ :contentdiscoveryconfig:list_of_content_detectors.png?direct&600 |}}

Then select the detector from the drop-down list and click on the Add button. You can add more than one detector at a time by clicking on each prior to clicking the Add button:

{{ :contentdiscoveryconfig:add_detectors.png?direct |}}

To remove a detector from your category, click on the ‘x’ next to the detector’s name. You can use the expansion arrow next to the detector’s name to open the detector editor, which is discussed in the Creating Content Detectors section later on this page.
### Creating Content Detectors

A set of content detectors is supplied. You can use these detectors as-is or change them. You can also add your own detectors, and remove detectors from the available set.

To see the list of available content detectors, click on the triangle next to //Available Content Detectors// on the Content Detection page.

Recall that when matching content is found during a document scan, the document is tagged using the category as the tag classification and the detector that detected the matching content as the tag value.

To change a detector’s name or its tag value, click on the pencil next to its name to open a dialog box where these values can be changed.

{{ :contentdiscovery:content_detector_name_and_tag_editor.png?direct&400 |}}

To remove a detector from any categories in which it is used and delete it from your org’s list of available detectors, click on the ‘x’ next to its name:

{{ :contentdiscovery:delete_content_detector.png?direct |}}

To inspect or edit the detector definition, click on the triangle to the left of its name. This will expose the list of rules in the detector.

{{ :contentdiscovery:detector_inspector_-_editor.png?direct&400 |}}

To create a new detector, click on the Create New Detector button under the list of available detectors:

{{ :contentdiscoveryconfig:create_new_detector.png?direct&150 |}}

The New Content Detector window will open, allowing you to set the new detector’s Name and Tag:

{{ :contentdiscoveryconfig:new_content_detector.png?direct&400 |}}

### Creating Filters

Detectors contain filters, which contain the criteria for finding matching content.

{{ :contentdiscovery:detectors_contain_filters.png?direct&400 |}}

If any of the filters match content during a file scan, then that file is tagged with the detector’s tag value.

Each filter has a title, a type and a value. The type can be either RegEx or Built-in. Filters of type RegEx can be created and edited by the org. admin.
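
As a rough illustration of the matching rule described above (a file is tagged when any one of a detector's filters matches), the following sketch models detectors and RegEx filters in Python. It is not Access Anywhere code: the `Filter` and `Detector` classes, the `tags_for` function, the category name and both example patterns are hypothetical. Python's `re` module stands in for the product's regular expression engine, and Built-in filters are skipped because their implementation is not described on this page.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Filter:
    title: str
    type: str   # "RegEx" or "Built-in"
    value: str  # for a RegEx filter, the pattern itself


@dataclass
class Detector:
    name: str
    tag: str  # tag value applied to files with matching content
    filters: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        # A detector matches if ANY of its RegEx filters matches.
        # Built-in filters are ignored in this sketch (assumption).
        return any(
            f.type == "RegEx" and re.search(f.value, text) is not None
            for f in self.filters
        )


def tags_for(text: str, category: str, detectors: list) -> list:
    # The category is the tag classification; the detector's tag is the value.
    return [(category, d.tag) for d in detectors if d.matches(text)]


# Hypothetical detector with two illustrative patterns.
EXAMPLE_DETECTOR = Detector(
    name="Contact details",
    tag="contact",
    filters=[
        Filter("Email address", "RegEx", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        Filter("SSN-like number", "RegEx", r"\b\d{3}-\d{2}-\d{4}\b"),
    ],
)
```

Calling `tags_for("Write to jane@example.com", "Personal Data", [EXAMPLE_DETECTOR])` in this sketch yields one (classification, value) pair, because the first filter alone is enough to tag the text. Patterns that rely on PCRE-only syntax may behave differently under Python's `re`, so test real filters in the product itself.
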
The value of a RegEx filter is a regular expression in [[https://www.pcre.org/current/doc/html/pcre2syntax.html|PCRE syntax]]. The value of a Built-in filter is the name of a small piece of content detection code that is provided with Access Anywhere for use inside filters.

To edit a filter, hover the cursor over the name of the filter you wish to edit, and click on the arrow that appears to the right:

{{ :contentdiscovery:edit_filter.png?direct&600 |}}

The filter editor will open. Here you can change the name, type and value of the filter. If you choose the Built-in type and then click in the Value field, a pick list of built-in filters will be displayed.

{{ :contentdiscovery:list_of_built-ins.png?direct&400 |}}

You can pick the built-in that you want from this list.

To add a filter, click the “Add a new filter” button under the bottom right of the Filters list.

{{ :contentdiscovery:add_new_filter.png?direct&600 |}}

This will open the Create New Filter window.

{{ :contentdiscovery:create_new_filter.png?direct&400 |}}

To remove a filter from a detector, click on the ‘x’ that appears to the right when the cursor is over the filter name.

Unassigned detectors are placed in a Default Category, which means that files are scanned for them every time.

### Removing Categories

If a Content Discovery Category is in use then it cannot be deleted. By "in use" we mean that there are files with values assigned to classifications within that Category.

If a Content Discovery Category is not in use (no files have a value assigned to a classification within that Category) then the Category can be removed from the Content Discovery configuration, in which case the Category will no longer appear in the list of tags. (You may have to wait for cached information to be removed, or log out and back in, to see the change.)

### Removing Detectors from Categories

If a Content Detector is in use within a Content Discovery Category then it cannot be removed from that Category.
By "in use" we mean that there are files with values assigned to that Detector’s classification.

If a Content Detector within a Content Discovery Category is not in use (no files have a value assigned to that Detector’s classification) then the Detector can be removed from the Content Discovery Category.

#### Exporting Content Discovery Configuration

The Content Discovery configuration (categories, detectors and filters) can be exported to a text file in JSON format. To export the current configuration, click on the Export Content Discovery Configuration button at the bottom of the Content Discovery page.

#### Importing Content Discovery Configuration

A Content Discovery configuration file, in JSON format, can be imported. First click on the triangle next to the Import Configuration label on the Content Discovery page. Then paste (or type) the JSON into the pane on the left, or drag a file containing the JSON configuration into the pane on the right. Finally, click on the Import Content Discovery Configuration button to complete the operation.

{{ :contentdiscoveryconfig:cd_import_configuration.png?direct&600 |}}

When a new configuration is imported, categories and content detectors that are in the current configuration but not in the imported configuration are not removed from the current configuration. Similarly, assignments of detectors to categories are not removed. New assignments of detectors to categories are, however, applied.

Changes to the contents of detectors are handled differently. If a content detector exists in both the current configuration and the imported configuration file, then the new definition replaces the old.

## Information Security Considerations

### Shared Team Folders

Access Anywhere provides a feature that allows team members to have shared access to folders.
Whether or not a Shared Team Folder contained files with matching content at the time that it was shared, if new files that contain matching content are uploaded, or if existing files are updated to contain matching content, then users with access to the folder will gain access to the matching content in the files.

### File Sharing Links

Access Anywhere provides an option which, if enabled, allows users to generate links for sharing files by download with external recipients. Files containing matching content can be shared this way, allowing link recipients to gain access to the matching content in the files.

Depending on the settings that are in force when a file sharing link is created, a link may remain active for a long time. These links point to the shared file by name; if the file is updated then a recipient who uses the link to download the file after it has been updated will get a copy of the updated file. Whether or not the original file contained matching content, it is possible that the updated file will contain matching content. In that case the recipient of the link will gain access to the matching content.

### Folder Sharing Links

Access Anywhere also provides an option which, if enabled, allows users to generate links for sharing folders with external recipients. When a folder has been shared this way, the recipient of the link can download any of the files in the folder. If any of those files contain matching content then the recipient can gain access to the matching content by downloading those files.

As with file sharing links, a folder sharing link may remain active for a long time. Whether or not the folder contained files with matching content when the link was generated, it is possible that files with matching content will be added to the folder, or that files in the folder will be updated to include matching content.
If matching content has been added to the folder contents while the link is still active then link recipients may gain access to the matching content by downloading the files that contain it.

### Business Groups

Another Access Anywhere feature, if enabled, allows the creation of groups of internal and external users with a shared workspace. If files containing matching content are placed into a Business Group’s workspace then members of the Business Group will be able to access the matching content by downloading those files.

## Troubleshooting

### Files Not Being Scanned on Upload

If no files are being scanned, check the following:

* Is content search enabled on Access Anywhere?
* Has Content Discovery been enabled in the package assigned to the Organization?
* Is content search enabled on the provider(s) to which files are being uploaded?
* Are the files of one of the supported file types?
* Are the files within the configured size limit for content indexing?

If uploaded files are scanned in some cases and not in others, check the following:

* Is the file that is not being scanned of one of the types for which Content Discovery is supported?
* Is the file that is not being scanned within the size limit for content indexing?
* Was content search enabled in the settings of the storage provider to which the file was uploaded?

### Files Being Scanned but Matching Content Not Being Detected

If matching content is not being detected, confirm that the org.’s Content Discovery configuration includes categories with detectors for the content you expect to be found.

### Info Pane Doesn’t Show Matching Content

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see the Content Discovery button in the Info pane.
If the button appears but some or all of the Content Discovery values are missing, check whether the content detectors for the missing values are still in the org.’s Content Discovery configuration. If they have been removed then the corresponding tags and values won’t be displayed.

### Search Screen Doesn’t Show the Content Discovery Controls

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see the “Content Detection Categories” and “Detected Content” controls on the Search screen.

### Content Discovery Classifications Not Listed in Dropdown on Tags Tab

Confirm that the user has Content Discovery authorization. Only administrators and members to whom Content Discovery authorization has been given will see Content Discovery classifications on the Tags tab dropdown.