SMB/CIFS Connector (Single User)

last updated on April 21, 2022

For the multi-user SMB/CIFS connector click here.

Introduction

The SMB connector is a local storage connector for Access Anywhere. It enables companies to use local network storage with Access Anywhere by indexing and accessing unstructured data through the CIFS/SMB protocol.

This document covers some of the configuration choices for the CIFS Access Anywhere connector that customers should be aware of.

For enabling the connector see Enabling the SMB/CIFS Connector.

Terminology

CIFS stands for “Common Internet File System” and is a dialect of SMB created by Microsoft. SMB stands for “Server Message Block” and is a protocol that was designed to allow PCs to read and write files to a remote host over a local area network. The terms CIFS and SMB are often used interchangeably by technologists and organizations.

Access Anywhere uses the SMB protocol to connect to, index, and subsequently perform other operations on POSIX-style file systems that may be remote to Access Anywhere, including CIFS-based Windows systems. File metadata is indexed (that is, cached) by Access Anywhere. Although ACLs are not cached by Access Anywhere, they are implicitly observed during file access.

Ports

The SMB connector will work with SMB systems that use port 445 or port 139.
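Before configuring the connector, it can help to confirm that the storage host accepts TCP connections on one of these ports from the machine running Access Anywhere. The following is a minimal sketch using only the Python standard library; the function name `reachable_ports` is illustrative and is not part of Access Anywhere.

```python
import socket

# The two ports named above.
SMB_PORTS = (445, 139)


def reachable_ports(host, ports=SMB_PORTS, timeout=3.0):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # create_connection resolves the host name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, timed out, unroutable, or name lookup failed:
            # treat the port as unreachable.
            pass
    return open_ports
```

For example, `reachable_ports("fileserver.example.com")` (a placeholder host name) returns a list containing whichever of 445 and 139 could be reached. An empty list suggests a firewall or routing problem rather than a connector misconfiguration. A successful TCP connection only shows that the port is open; it does not verify credentials or the SMB protocol version.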

Authentication & Authorization

Access Anywhere can be integrated with corporate Identity and Access Management (IAM) systems, such as LDAP, Active Directory, and SAML, for authentication of users and granting of roles. Authorization of access to resources can be set up using Access Anywhere’s permissions management GUI. As Access Anywhere is a multi-cloud solution, the permissions model is applied consistently across all of the storage that is attached to Access Anywhere, making Access Anywhere ideal for deployments with multiple storage providers. This is especially useful with object storage, as users are able to manage permissions using a familiar hierarchical user/group/permission model.

Access Anywhere supports connecting to CIFS/SMB shares using the following SMB protocol versions:

  • 1.0
  • 2.0
  • 2.1
  • 3.0

Preserving Permissions

In our experience, customers who want Access Anywhere to connect to their CIFS/SMB storage would like Access Anywhere to retain and honor the permissions that have historically been set up on the shares.

There are generally two main approaches to this:

  • IT Managed Permissions
  • Private Providers (connectors)

IT Managed Permissions

For corporate shared folders where IT maintains access permissions based on groups, it is manageable to keep the permissions of the CIFS/SMB storage and Access Anywhere in sync manually. Each time a user logs in to Access Anywhere, their group/role membership is refreshed against the identity management system. In this case all access is through a single Access Anywhere provider (a provider is a connector that has been configured for use with a storage account), and every operation is attributed in Access Anywhere's audit logs to the org. admin.

Private Providers

In this scenario, each Access Anywhere user has a private connector to access the CIFS/SMB storage, akin to their own account and home directory on a conventional CIFS/SMB system. This allows the user to see only what they would see if they logged in and accessed their CIFS share directly. When operations such as editing and moving files (as authorized by CIFS/SMB) are performed, they are attributed in Access Anywhere's audit logs to the user who performs them.

This functionality is powered by Access Anywhere’s Auto Provisioned Provider. The administrator inputs the basic configuration information, such as the UNC path to the data that is to be exposed to the user, and when a user logs into Access Anywhere, the provider is automatically added to their Access Anywhere account using their credentials. This approach can be used to mount users' home directories. Using this model, the provider configured for each user implicitly falls back to the user's predefined permissions.

As Access Anywhere caches a unique metadata index for each user, you should be aware of the following:

  1. Sizing – Sizing calculations need to take into account the size of the metadata generated, which will affect the DB sizing. A rough estimate is that you will require at least 500 GB for 1 billion files that are indexed, plus some head-room above this for new data.
  2. Indexing – The index strategy needs to take into account the initiation and re-sync of providers for each user. In particular, indexing of file systems using the CIFS/SMB connector is not as fast as with REST-based API connectors. Speed is impacted by the directory structure: the number of files per directory, the number of nested subfolders, etc.
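The sizing rule of thumb above can be turned into a quick calculation. This is a rough sketch only: the 500 GB per 1 billion files figure comes from the text, while the `headroom` parameter and its 20% default are assumptions made for illustration, since the text does not say how much head-room to allow.

```python
# Rule of thumb from the text: at least 500 GB of database space
# per 1 billion indexed files.
GB_PER_BILLION_FILES = 500


def estimate_metadata_gb(file_count, headroom=0.2):
    """Estimate database space in GB for `file_count` indexed files.

    `headroom` is a hypothetical growth allowance (20% by default);
    adjust it to match your own expected data growth.
    """
    base_gb = file_count / 1_000_000_000 * GB_PER_BILLION_FILES
    return base_gb * (1 + headroom)
```

Because each user has their own metadata index, `file_count` should be the total across all per-user providers. For instance, 250 million files with 20% head-room works out to roughly 150 GB.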

A rough estimate of the indexing speed to expect is circa 1 million files per hour for an initial synchronization. However, this is heavily dependent on the directory structure, vCPUs, memory, and number of sync threads, and we have seen it drop to 500,000 files per hour for very heavily nested directory structures.
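As a planning aid, the two rates quoted above can be used to bracket the duration of an initial synchronization. The sketch below simply divides the file count by each rate; the function name is illustrative, and actual throughput depends on the factors already listed.

```python
# Rates quoted in the text, in files indexed per hour.
TYPICAL_FILES_PER_HOUR = 1_000_000
NESTED_FILES_PER_HOUR = 500_000  # observed for very heavily nested structures


def estimate_sync_hours(file_count):
    """Return (typical_hours, heavily_nested_hours) for an initial sync."""
    return (
        file_count / TYPICAL_FILES_PER_HOUR,
        file_count / NESTED_FILES_PER_HOUR,
    )
```

For example, `estimate_sync_hours(10_000_000)` returns `(10.0, 20.0)`, that is, somewhere between 10 and 20 hours for 10 million files.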

Strategies for Indexing

There are a number of indexing strategies that can be used when deploying the CIFS/SMB connector:

Index on Demand

In this model, shares (via the Auto-config provider) are indexed on demand as users visit directories, rather than requiring an initial synchronisation up-front. Non-user home shares can be indexed up-front and made available to end users using the sharing capabilities of Access Anywhere.

The advantage of this model is that, for very large datasets, indexing does not have to happen all at once. It is done gradually over time, which can be more efficient from an onboarding perspective.

The disadvantage of this model is that search and folder downloads are only available for the data that has been indexed. Such functionality can be disabled for users by the administrator.
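The trade-off described above can be illustrated with a toy index that is filled only when a directory is visited. This is a conceptual sketch, not Access Anywhere's implementation: the class `OnDemandIndex` is hypothetical, and a local directory tree stands in for the SMB share.

```python
import os


class OnDemandIndex:
    """Toy metadata index that is filled only as directories are visited."""

    def __init__(self):
        # directory path -> list of (name, size_in_bytes, is_dir)
        self._entries = {}

    def visit(self, directory):
        """Return the listing for `directory`, indexing it on first visit."""
        if directory not in self._entries:
            with os.scandir(directory) as it:
                self._entries[directory] = [
                    (e.name, e.stat().st_size, e.is_dir()) for e in it
                ]
        return self._entries[directory]

    def search(self, text):
        """Search file names, but only in directories indexed so far."""
        return sorted(
            os.path.join(directory, name)
            for directory, items in self._entries.items()
            for name, _size, is_dir in items
            if not is_dir and text in name
        )
```

A file in a subdirectory that nobody has opened yet does not appear in `search` results until `visit` has been called on that subdirectory, which mirrors why search and folder downloads cover only the data indexed so far.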

Index on First Login

Access Anywhere can be set up so that, on corporate authentication (AD, LDAP, SAML), a user’s home directory is indexed as they log in.

The advantage of this model is that Access Anywhere does not need to know about, or be set up for, domain users in advance. It can be deployed and set up with whatever the corporate IAM solution is, and when users log in their home directories start to be indexed. Other shares can be indexed separately and made available to end users or user roles.

The disadvantage of this model is that the index has to complete before a user’s entire home directory is available. However, a user does not need to wait for this before using the system; they can start to work with the data that is already indexed.

Further Tuning of Indexing

Above and beyond pure compute requirements for indexing, customers can choose to use the following:

  • Dedicated Sync Node – From an architectural topology perspective, a dedicated Access Anywhere sync node can be used to index CIFS/SMB metadata. A sync node is a regular Access Anywhere application node that is only used for sync. The advantage is that it provides dedicated resources to the sync job and leaves the resources on the other Access Anywhere application nodes free for end-user activity.
  • Sync Threads – Sync can be set up to use more than one thread. This can speed up the sync process. The optimum number of threads depends on factors such as the vCPUs available and whether a dedicated sync node is being used.
  • Concurrent Syncs – Out of the box, Access Anywhere will only index one provider per user at a time. The number of concurrent syncs can be increased if required.
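To give a feel for why the number of sync threads matters, here is a small, generic sketch of a directory scan spread across a thread pool. It is not Access Anywhere's sync engine: the names `count_files` and `sync_threads` are hypothetical, and a local directory tree stands in for the SMB share.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _scan(directory):
    """List one directory; return (number_of_files, list_of_subdirectories)."""
    files, subdirs = 0, []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files += 1
    return files, subdirs


def count_files(root, sync_threads=4):
    """Walk `root` level by level, scanning each level's directories in parallel."""
    total = 0
    pending = [root]
    with ThreadPoolExecutor(max_workers=sync_threads) as pool:
        while pending:
            # Scan every directory at this depth concurrently,
            # then queue the subdirectories found for the next pass.
            results = list(pool.map(_scan, pending))
            pending = []
            for files, subdirs in results:
                total += files
                pending.extend(subdirs)
    return total
```

In this sketch, extra threads only help when a level of the tree contains several directories to scan at once. A deep, narrow tree leaves most threads idle, which is one way to picture why heavily nested structures index more slowly.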

Other Tips

Understand the strategies and think about what is best for your company or organisation.

It is often better to have several discrete shares than one large share. The increased granularity makes them easier to manage from an Access Anywhere perspective.