SMB/CIFS Connector (Single User)
last updated on April 21, 2022
For the multi-user SMB/CIFS connector, see the multi-user SMB/CIFS connector documentation.
The SMB connector is a local storage connector for the Enterprise File Fabric. It enables companies to use local network storage with the Enterprise File Fabric by indexing and accessing unstructured data over the CIFS/SMB protocol.
This document covers some of the configuration choices for the CIFS File Fabric connector that customers should be aware of.
For instructions on enabling the connector, see Enabling the SMB/CIFS Connector.
CIFS stands for “Common Internet File System” and is a dialect of SMB created by Microsoft. SMB stands for “Server Message Block” and is a protocol designed to allow PCs to read and write files on a remote host over a local area network. Technologists and organizations often use the terms CIFS and SMB interchangeably.
The Enterprise File Fabric uses the SMB protocol to connect to, index, and subsequently perform other operations on POSIX-style file systems that may be remote to the File Fabric, including CIFS-based Windows systems. File metadata is indexed (cached) by the File Fabric. ACLs are not cached by the File Fabric, but they are implicitly observed during file access.
The SMB connector will work with SMB systems that use port 445 or port 139.
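Before configuring the connector, it can help to confirm which of the two ports the SMB server actually accepts connections on. The sketch below is a generic TCP reachability check, not part of the File Fabric: it tries port 445 (SMB directly over TCP) and then port 139 (SMB over the NetBIOS session service) and returns the first one that accepts a connection. The function name and the default timeout are illustrative assumptions, and a successful connection only shows that something is listening, not that it speaks SMB.

```python
import socket
from typing import Iterable, Optional

# 445: SMB directly over TCP; 139: SMB over the NetBIOS session service.
SMB_PORTS = (445, 139)


def first_reachable_smb_port(host: str, ports: Iterable[int] = SMB_PORTS,
                             timeout: float = 2.0) -> Optional[int]:
    """Return the first port in `ports` that accepts a TCP connection, else None.

    Only TCP reachability is tested; no SMB negotiation takes place.
    """
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:  # refused, unreachable, or timed out
            continue
    return None
```

For example, `first_reachable_smb_port("fileserver.example.com")` (a hypothetical host name) returns 445 or 139, or None if neither port is reachable from the File Fabric host.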
Authentication & Authorization
The File Fabric can be integrated with corporate Identity and Access Management (IAM) systems, such as LDAP, Active Directory, and SAML, for authenticating users and granting roles. Authorization of access to resources can be set up using the File Fabric’s permissions management GUI. As the File Fabric is a multi-cloud solution, the permissions model is applied consistently across all of the storage attached to the File Fabric, making it well suited to deployments with multiple storage providers. This is especially useful with object storage, as users can manage permissions using a familiar hierarchical user/group/permission model.
The File Fabric supports connecting to CIFS/SMB shares using the following SMB protocol versions:
In our experience, customers who want the File Fabric to connect to their CIFS/SMB storage want it to retain and honor the permissions that have historically been set up on the shares.
There are generally two main approaches to this:
- IT Managed Permissions
- Private Providers (connectors)
IT Managed Permissions
For corporate shared folders where IT maintains access permissions based on groups, it is manageable to keep the permissions of the CIFS/SMB storage and the File Fabric in sync manually. Each time a user logs in to the File Fabric, their group/role membership is refreshed against the identity management system. In this case, all access goes through a single File Fabric provider (a provider is a connector that has been configured for use with a storage account), and every operation is attributed in the File Fabric's audit logs to the organization administrator.
Private Providers
In this scenario, each File Fabric user has a private connector to the CIFS/SMB storage, akin to having their own account and home directory directly on a conventional CIFS/SMB system. The user sees only what they would see if they accessed their CIFS share directly. When operations such as editing and moving files (as authorized by CIFS/SMB) are performed, they are attributed in the File Fabric's audit logs to the user who performs them.
This functionality is powered by the File Fabric’s Auto Provisioned Provider. The administrator enters basic configuration information, such as the UNC path to the data to be exposed to the user, and when a user logs in to the File Fabric, the provider is automatically added to their File Fabric account using their credentials. This approach can be used to mount users' home directories. With this model, the provider configured for each user is implicitly limited to that user's existing permissions on the share.
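The auto-provisioning flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the File Fabric's implementation: the `{username}` placeholder in the administrator's UNC template, the `ProvisionedProvider` record, and both helper functions are hypothetical, and credential handling is left out entirely. It shows how one administrator-supplied UNC path (of the form `\\server\share\path`) can expand into a separate provider per user at login.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProvisionedProvider:
    """Hypothetical per-user provider record; field names are illustrative only."""
    owner: str
    server: str
    share: str
    path: str


def split_unc(unc: str) -> Tuple[str, str, str]:
    r"""Split a UNC path such as \\server\share\dir into (server, share, path)."""
    if not unc.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc!r}")
    parts = [p for p in unc[2:].split("\\") if p]
    if len(parts) < 2:
        raise ValueError(f"UNC path needs a server and a share: {unc!r}")
    server, share, *rest = parts
    return server, share, "\\".join(rest)


def provision_on_login(unc_template: str, username: str) -> ProvisionedProvider:
    """Expand the (assumed) {username} placeholder for the user who just logged in."""
    server, share, path = split_unc(unc_template.format(username=username))
    return ProvisionedProvider(owner=username, server=server, share=share, path=path)
```

With a template such as `\\fileserver\home\{username}`, a login by `alice` yields a provider rooted at `\\fileserver\home\alice`, so each user's view starts at their own home directory and the SMB server enforces their permissions from there.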
As the File Fabric caches a separate metadata index for each user, consider the following:
- Sizing – Sizing calculations need to take into account the size of the generated metadata, which affects database sizing. As a rough estimate, you will need at least 500 GB for every 1 billion indexed files, plus some headroom above this for new data.
- Indexing – The indexing strategy needs to account for the initial indexing and re-sync of providers for each user. In particular, indexing file systems through the CIFS/SMB connector is not as fast as with REST-based API connectors. Speed is affected by the directory structure: the number of files per directory, the number of nested subfolders, etc.
A rough estimate of the indexing speed to expect is around 1 million files per hour for an initial synchronization. However, this depends heavily on the directory structure, vCPUs, memory and number of sync threads, and we have seen it drop to 500K files per hour for very deeply nested directory structures.
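The two rules of thumb above can be turned into a quick back-of-the-envelope calculator. This is a sketch only: the 500 bytes per file figure is derived from "500 GB per 1 billion files", the 1.25 headroom factor is an assumption (the document only says "some head-room"), and the function names are hypothetical. Because each user gets their own index, per-user file counts are summed even when several users can see the same files.

```python
from typing import Iterable

# ~500 bytes per indexed file, derived from "at least 500 GB per 1 billion files".
BYTES_PER_INDEXED_FILE = 500e9 / 1e9


def estimate_index_db_gb(files_per_user: Iterable[int], headroom: float = 1.25) -> float:
    """Rough metadata DB size in GB.

    Each user has a separate index, so per-user file counts are summed even when
    users share the same files. `headroom` is an assumed growth factor.
    """
    total_files = sum(files_per_user)
    return total_files * BYTES_PER_INDEXED_FILE * headroom / 1e9


def estimate_initial_sync_hours(total_files: int, files_per_hour: float = 1_000_000) -> float:
    """Initial-sync time at the quoted rate; try 500_000 for deeply nested trees."""
    return total_files / files_per_hour
```

For example, 2,000 users with 250,000 files each is 500 million indexed files: about 250 GB before headroom (312.5 GB with the assumed 1.25 factor), and roughly 500 hours of initial sync at 1 million files per hour, or 1,000 hours at 500K per hour. Figures like these are what make the indexing strategies and tuning options below worth weighing.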
Strategies for Indexing
There are a number of indexing strategies that can be used when deploying the CIFS/SMB connector:
Index on Demand
In this model, shares (via the auto-config provider) are indexed on demand as users visit directories, rather than through an up-front initial synchronisation. Shares other than user home directories can be indexed up front and made available to end users through the File Fabric's sharing capabilities.
The advantage of this model is that, for very large datasets, indexing does not happen all at once but gradually over time, which can make onboarding more efficient.
The disadvantage of this model is that search and folder downloads only cover data that has already been indexed. The administrator can disable this functionality for users.
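The trade-off of indexing on demand can be seen in a small model. The sketch below is not the File Fabric's indexer: `OnDemandIndex` is a hypothetical class, and `list_remote` is an injected callable standing in for an SMB directory listing. A directory is listed and cached the first time it is visited, and search only covers directories that have been visited so far, which is exactly the limitation described above.

```python
from typing import Callable, Dict, List


class OnDemandIndex:
    """Index a directory only the first time a user visits it (illustrative sketch)."""

    def __init__(self, list_remote: Callable[[str], List[str]]):
        self._list_remote = list_remote  # stands in for an SMB directory listing
        self._index: Dict[str, List[str]] = {}

    def visit(self, directory: str) -> List[str]:
        """Return the directory's entries, listing the remote share only on first visit."""
        if directory not in self._index:
            self._index[directory] = self._list_remote(directory)
        return self._index[directory]

    def search(self, name: str) -> List[str]:
        """Search only covers directories that have already been visited."""
        return [f"{d}/{n}" for d, entries in self._index.items()
                for n in entries if name in n]
```

Until a user opens a folder, its files are invisible to `search`; repeat visits are served from the cache without touching the share again.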
Index on First Login
The File Fabric can be set up so that, when a user authenticates through the corporate identity system (AD, LDAP, SAML), their home directory is indexed as they log in.
The advantage of this model is that the File Fabric does not need to know about, or be set up for, domain users in advance. It can be deployed and integrated with whatever corporate IAM solution is in place, and users' home directories start being indexed when they log in. Other shares can be indexed separately and made available to end users or user roles.
The disadvantage of this model is that indexing has to complete before a user's entire home directory is available. However, users do not need to wait while this happens; they can start working with the data that has already been indexed.
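The first-login trigger can be modelled as a queue that is fed by successful logins and drained in the background. This is a simplified sketch with hypothetical names (`FirstLoginIndexer`, `home_dir_for`, `run_background_step`); it works at whole-directory granularity, whereas real indexing is incremental, which is why users can already see partially indexed data. It shows why no user list is needed up front: the first successful IAM login is what enqueues a home directory, and later logins do not enqueue it again.

```python
from collections import deque
from typing import Callable, Deque, Set


class FirstLoginIndexer:
    """Queue a user's home directory for background indexing on first login (sketch)."""

    def __init__(self, home_dir_for: Callable[[str], str]):
        self._home_dir_for = home_dir_for  # maps a username to its home share path
        self._known_users: Set[str] = set()
        self.pending: Deque[str] = deque()

    def on_login(self, username: str) -> None:
        """Call after a successful AD/LDAP/SAML login; only the first login enqueues."""
        if username not in self._known_users:
            self._known_users.add(username)
            self.pending.append(self._home_dir_for(username))

    def run_background_step(self, index_fn: Callable[[str], None]) -> bool:
        """Index one queued home directory; return False when nothing is queued."""
        if not self.pending:
            return False
        index_fn(self.pending.popleft())
        return True
```

A background worker would call `run_background_step` in a loop while the user keeps working, so logins never block on indexing.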
Further Tuning of Indexing
Beyond adding compute resources for indexing, customers can choose to use the following:
- Dedicated sync node – From an architectural perspective, a dedicated File Fabric sync node can be used to index CIFS/SMB metadata. A sync node is a regular File Fabric application node that is used only for sync. This provides dedicated resources to sync jobs and keeps the resources of the other File Fabric application nodes free for end-user activity.
- Sync Threads – Sync can be set up to use more than one thread, which can speed up the sync process. The optimum number of threads depends on factors such as the number of vCPUs available and whether a dedicated sync node is being used.
- Concurrent Syncs – Out of the box, the File Fabric indexes only one provider per user at a time. The number of concurrent syncs can be increased if required.
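The interaction between the two knobs above, a pool of sync threads and a per-user limit on concurrent syncs, can be illustrated with Python's standard `concurrent.futures`. This is a conceptual sketch, not how the File Fabric schedules syncs: `SyncScheduler`, `sync_threads`, `concurrent_syncs_per_user`, and the `sync_fn` callback are hypothetical names, not actual File Fabric settings. The per-user semaphore defaults to 1, mirroring the out-of-the-box behaviour of syncing one provider per user at a time.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class SyncScheduler:
    """Run provider syncs on a fixed thread pool, limiting concurrent syncs per user."""

    def __init__(self, sync_threads: int = 4, concurrent_syncs_per_user: int = 1):
        self._pool = ThreadPoolExecutor(max_workers=sync_threads)
        self._limit = concurrent_syncs_per_user
        self._slots: Dict[str, threading.Semaphore] = {}
        self._lock = threading.Lock()

    def _slot_for(self, user: str) -> threading.Semaphore:
        # One semaphore per user, created lazily under a lock.
        with self._lock:
            if user not in self._slots:
                self._slots[user] = threading.Semaphore(self._limit)
            return self._slots[user]

    def submit(self, user: str, provider: Any,
               sync_fn: Callable[[str, Any], Any]) -> Future:
        slot = self._slot_for(user)

        def run() -> Any:
            with slot:  # waits while this user already has `limit` syncs running
                return sync_fn(user, provider)

        return self._pool.submit(run)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

One design caveat: a queued sync that is waiting on its user's semaphore still occupies a pool thread, so a production scheduler would more likely keep a per-user queue. The sketch only shows why adding threads helps when many users sync at once, while raising the per-user limit is what speeds up a single user with several providers.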
Review these strategies and choose the one that best suits your company or organisation.
It is often better to have several smaller, discrete shares than one large share. The increased granularity makes them easier to manage from a File Fabric perspective.