CIFS/SMB Connector Overview

The CIFS connector is a local storage connector for the Enterprise File Fabric. It enables companies to use local network storage with Enterprise File Fabric by indexing and accessing unstructured data through the CIFS/SMB protocol.

This document covers some of the configuration choices for the CIFS File Fabric connector that customers should be aware of.

To enable the connector, see Enabling the CIFS Provider.

CIFS stands for “Common Internet File System” and is a dialect of SMB created by Microsoft. SMB stands for “Server Message Block”, a protocol designed to allow PCs to read and write files on a remote host over a local area network. The terms CIFS and SMB are often used interchangeably, both in technical documentation and within organizations.

The Enterprise File Fabric uses the SMB protocol to connect to, index, and subsequently operate on POSIX-style file systems that may be remote to the File Fabric, including CIFS-based Windows systems. Currently, file metadata is indexed (that is, cached). ACLs are not indexed, but they are implicitly observed during file access.

The File Fabric can be integrated with corporate Identity and Access Management (IAM) systems, such as LDAP, Active Directory, and SAML, for authenticating users and granting roles. Authorization of access to resources can be set up using the File Fabric’s visual permissions feature. As the File Fabric is a multi-cloud solution, this model is not tied to the permissions management of any one storage silo. It is therefore well suited to deployments with multiple storage providers, and with object storage in particular, because users can manage permissions using a familiar hierarchical user/group/permission model.

The File Fabric supports connecting to CIFS/SMB shares using the following protocol versions:

  • 1.0
  • 2.0
  • 2.1
  • 3.0

In our experience, customers who want the File Fabric to connect to their CIFS/SMB storage would like the File Fabric to retain and honor the permissions that have historically been set up on the shares.

There are generally two main approaches to this:

  • IT Managed Permissions
  • Private Providers (connectors)

IT Managed Permissions

For corporate shared folders, where IT maintains access permissions based on groups, it is manageable to keep the permissions of the CIFS/SMB share and the File Fabric in sync manually. Each time a user logs in to the File Fabric, their group/role membership is refreshed against the identity management system.

Private Providers

In this scenario, each File Fabric user has a private connector to the CIFS/SMB storage, akin to having their own account and home directory on it. The user sees only what they would see if they accessed their CIFS share directly, and operations such as editing and moving files are performed as that user and authorized by the CIFS/SMB server. The audit trail (if set up) also logs activity under that user.

This functionality is powered by the File Fabric’s Auto Provisioned Provider. The administrator inputs the basic configuration details, such as the UNC path of the share, and when a user logs in to the File Fabric, the provider is automatically added to their account using their credentials. One such use case is mounting a user’s home directory. Using this model, the provider configured for each user implicitly falls back to the user’s predefined permissions.
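As an illustration of the per-user provisioning idea only, the sketch below expands a share path template into a user-specific UNC path at login time. The `{username}` placeholder, the function name `expand_unc_template`, and the server name are all hypothetical; they are not taken from the File Fabric's configuration syntax.

```python
import re

# Assumption: usernames are restricted to a conservative character set so
# that a login name can never inject extra path segments into the UNC path.
_SAFE_USERNAME = re.compile(r"[A-Za-z0-9._-]+")


def expand_unc_template(template: str, username: str) -> str:
    """Return the UNC path for one user from a template with {username}."""
    if "{username}" not in template:
        raise ValueError("template must contain the {username} placeholder")
    if not _SAFE_USERNAME.fullmatch(username) or username in {".", ".."}:
        raise ValueError(f"unsafe username: {username!r}")
    return template.replace("{username}", username)


if __name__ == "__main__":
    # Hypothetical home-directory share on a server called "fileserver".
    print(expand_unc_template(r"\\fileserver\home\{username}", "alice"))
    # prints \\fileserver\home\alice
```

Rejecting unexpected characters up front is a design choice in this sketch: the CIFS/SMB server still enforces the user's real permissions, but the provider never points at a path the administrator did not intend.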

As the File Fabric caches a unique metadata index for each user, you should be aware of the following:

  1. Sizing – Sizing calculations need to take into account the size of the metadata generated, which affects the DB sizing. A rough estimate is that you will require at least 500 GB for 1 billion indexed files, plus some headroom above this for new data.
  2. Indexing – The index strategy needs to take into account the initial sync and resync of providers for each user. In particular, indexing file systems through the CIFS/SMB connector is not as fast as indexing through REST-based API connectors. Speed is affected by the directory structure: the number of files per directory, the number of nested subfolders, and so on.
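The sizing rule of thumb above can be turned into a quick calculation. The sketch below is only a planning aid: the 500 GB per billion files figure comes from the text, while the function name and the default 20% headroom are assumptions chosen for illustration. Because each user has their own index, the file count should be the total across all per-user providers.

```python
GB_PER_BILLION_FILES = 500  # rough figure quoted in the text above


def estimate_metadata_db_gb(file_count: int, headroom: float = 0.2) -> float:
    """Rough metadata DB size in GB for `file_count` indexed files.

    `headroom` is the fractional allowance for new data (0.2 = 20%).
    The 20% default is an assumption, not a vendor recommendation.
    """
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    base_gb = file_count / 1_000_000_000 * GB_PER_BILLION_FILES
    return base_gb * (1 + headroom)


if __name__ == "__main__":
    # 250 million files with no headroom.
    print(estimate_metadata_db_gb(250_000_000, headroom=0.0))  # 125.0
```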

A rough estimate of the indexing speed to expect is around 1 million files per hour for an initial synchronization. However, this depends heavily on the directory structure, vCPUs, memory, and number of sync threads, and we have seen it drop to 500,000 files per hour for very heavily nested directory structures.
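To plan an onboarding window, the two throughput figures above can be used as a best-case and a slow-case bound. This is a minimal sketch under that assumption; the function names are hypothetical, and real throughput depends on the factors listed in the text.

```python
TYPICAL_FILES_PER_HOUR = 1_000_000  # typical initial sync rate from the text
NESTED_FILES_PER_HOUR = 500_000     # rate seen for heavily nested structures


def estimate_initial_sync_hours(file_count: int, files_per_hour: int) -> float:
    """Hours needed to index `file_count` files at a given throughput."""
    if files_per_hour <= 0:
        raise ValueError("files_per_hour must be positive")
    return file_count / files_per_hour


def sync_window_hours(file_count: int) -> tuple:
    """Return (typical, heavily nested) duration estimates in hours."""
    return (
        estimate_initial_sync_hours(file_count, TYPICAL_FILES_PER_HOUR),
        estimate_initial_sync_hours(file_count, NESTED_FILES_PER_HOUR),
    )


if __name__ == "__main__":
    print(sync_window_hours(12_000_000))  # (12.0, 24.0)
```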

There are a number of indexing strategies that can be used when deploying the CIFS/SMB connector:

Index on Demand

In this model, shares (via the Auto-config provider) are indexed on demand as users visit directories, rather than requiring an initial synchronisation up front. Shares that are not user home directories can be indexed up front and made available to end users using the sharing capabilities of the File Fabric.

The advantage of this model is that, for very large datasets, indexing does not happen all at once but gradually over time, which can make onboarding more efficient.

The disadvantage of this model is that search and folder downloads are only available for the data that has been indexed. Such functionality can be disabled for users by the administrator.
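The trade-off described above can be seen in a small sketch. It uses a local directory as a stand-in for a CIFS/SMB share, and the class `OnDemandIndex` is hypothetical; it does not reflect how the File Fabric stores its index. A directory is listed and cached only when it is first visited, so search can only find entries in directories that have already been visited.

```python
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    path: str     # path relative to the share root
    is_dir: bool
    size: int     # bytes; 0 for directories


class OnDemandIndex:
    """Caches directory listings lazily, one directory per visit."""

    def __init__(self, root: str) -> None:
        self.root = os.path.abspath(root)
        self._by_dir: Dict[str, List[Entry]] = {}

    def visit(self, rel_dir: str = "") -> List[Entry]:
        """List one directory level, indexing it on first access."""
        key = os.path.normpath(rel_dir) if rel_dir else "."
        if key not in self._by_dir:
            entries = []
            with os.scandir(os.path.join(self.root, key)) as it:
                for item in it:
                    is_dir = item.is_dir(follow_symlinks=False)
                    size = 0 if is_dir else item.stat(follow_symlinks=False).st_size
                    rel = os.path.normpath(os.path.join(key, item.name))
                    entries.append(Entry(rel, is_dir, size))
            self._by_dir[key] = sorted(entries, key=lambda e: e.path)
        return self._by_dir[key]

    def search(self, term: str) -> List[Entry]:
        """Search names in already indexed directories only."""
        term = term.lower()
        return [
            e
            for entries in self._by_dir.values()
            for e in entries
            if term in os.path.basename(e.path).lower()
        ]
```

In this sketch, a file inside a subfolder is invisible to `search` until `visit` has been called for that subfolder, which mirrors why search and folder downloads cover only indexed data in this model.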

Index on First Login

The File Fabric can be set up so that, on corporate authentication (AD, LDAP, SAML), a user’s home directory is indexed as they log in.

The advantage of this model is that the File Fabric does not need to know about, or be set up for, domain users in advance. It can be deployed and connected to whatever the corporate IAM solution is, and when users log in, their home directories start to be indexed. Other shares can be indexed separately and made available to end users or user roles.

The disadvantage of this model is that the index has to complete before a user’s entire home directory is available. However, users do not need to wait to use the system while this occurs; they can start working with the data that is already indexed.

Above and beyond pure compute requirements for indexing, customers can choose to use the following options:

  • Dedicated sync node – From an architectural topology perspective, a dedicated File Fabric sync node can be used to index CIFS/SMB metadata. A sync node is a regular application node of the File Fabric that is used only for sync. The advantage is that it provides dedicated resources to the sync job and keeps the resources on other File Fabric application nodes free for end user activity.
  • Sync Threads – Sync can be set up to use more than one thread, which can speed up the sync process. The optimum number of threads depends on factors such as the vCPUs available and whether a dedicated sync node is being used.
  • Concurrent Syncs – Out of the box, the File Fabric indexes only one provider per user at a time. The number of concurrent syncs can be increased if required.
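To give a feel for what multiple sync threads buy, the sketch below walks a directory tree with a configurable number of worker threads, one top-level subfolder per task. It again uses a local directory as a stand-in for a share, and the `sync_threads` parameter and function names are hypothetical; this is not the File Fabric's sync implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_subtree(path: str) -> int:
    """Count the files found under one subtree (stand-in for indexing it)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(path):
        total += len(filenames)
    return total


def index_share(root: str, sync_threads: int = 4) -> int:
    """Index `root`, handing each top-level folder to a worker thread."""
    if sync_threads < 1:
        raise ValueError("sync_threads must be at least 1")
    top_level_files = 0
    subdirs = []
    with os.scandir(root) as it:
        for item in it:
            if item.is_dir(follow_symlinks=False):
                subdirs.append(item.path)
            else:
                top_level_files += 1
    # Directory listing is I/O bound, so threads can overlap the waits.
    with ThreadPoolExecutor(max_workers=sync_threads) as pool:
        counts = list(pool.map(count_subtree, subdirs))
    return top_level_files + sum(counts)
```

Splitting work by top-level folder is the simplest partitioning and is an assumption of this sketch. It also hints at why several smaller shares or folders parallelize better than one deep tree: a single very large subfolder ends up on one thread regardless of how many are configured.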

Understand the strategies and think about what is best for your company or organisation.

It is often better to index several smaller, discrete shares than one large share. The increased granularity makes them easier to manage from a File Fabric perspective.