last updated April 10, 2022


Please follow this document's guidance closely. If it is not followed, Solr index data may be lost while failing back after an outage. No actual file data would be lost, and lost index data can be recreated; however, until it is recreated, the affected files will be missing from search results.


The Enterprise File Fabric as shipped is configured for deployment on a single virtual machine. However, a common scenario for production deployments is a set of redundant web frontends in front of a Highly Available Stateful Metadata server pair.

This guide will step through the setup of a Leader-Follower Solr database pair, which allows for automatic failover without any loss of data. When the leader returns online, additional work is required to migrate any new index data back to the former leader, so automatic failback is not supported.

Part I


This guide assumes you have working knowledge and an understanding of Linux operating systems, databases, etc. If any questions come up, please contact your account manager or SME support.

For this guide we use the following hostnames: smeweb01, smeweb02, smesql01, smesql02, and smesearch (the VIP). Setup of MySQL database replication and the HA web servers is covered in this document: mastermasterdb. You are of course free to choose your own names to match your naming scheme.

In addition, you should have DNS configured and verified for the above 5 DNS records and IP addresses, and you should have opened any internal firewalls that could restrict necessary traffic between the systems, including multicast traffic for keepalived.
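
One way to confirm the DNS prerequisite before going further is a short loop that tries to resolve each name. The following is only a sketch: it assumes the example hostnames used in this guide and that getent is available on the appliance. The function names resolve and unresolved_hosts are hypothetical helpers, not part of the product.

```shell
#!/bin/sh
# Sketch: report which hostnames do not resolve.
# Assumption: getent is installed; swap in another resolver if it is not.

# resolve NAME: succeeds if NAME resolves through the system resolver.
resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# unresolved_hosts NAME...: print each name that fails to resolve, one per line.
unresolved_hosts() {
    for name in "$@"; do
        resolve "$name" || printf '%s\n' "$name"
    done
}

# Example (run manually on each appliance):
#   unresolved_hosts smeweb01 smeweb02 smesql01 smesql02 smesearch
```

Empty output means every name resolved. Any name that is printed should be fixed in DNS before continuing.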

Initial State

This guide assumes you have set up the four appliances in an HA configuration following the instructions in the Appliance Installation guide: mastermasterdb


Before you start, please be sure to collect / prepare the necessary information.

  • 4 SME Appliances deployed
  • SME linux root password
  • SME linux smeconfiguser password
  • 1 additional IP address for your LAN - for the new Solr search VIP
  • 1 DNS name for the VIP

Linux Login

For Linux command line operations, you must run the commands shown in this document as the root user unless otherwise specified. However, for security reasons you cannot ssh to the machine directly as root. Instead, ssh to the box as smeconfiguser and then su to root:

ssh smeconfiguser@smeweb01

Enter the smeconfiguser password at the prompt. Once logged in, elevate your privileges to root.

su -

Part II

Configuring Solr

You must perform these steps to create a specialized Solr server from the standard SME appliance distribution.

Install Solr Replica Containers

The standard Solr containers deployed by default in the appliance do not support replication. Instead, we will install containers designed for Leader/Follower replication:

yum install sme-containers-solr-replicas

We can then stop the existing Solr container:

cd /var/www/smestorage/containers/solr && docker-compose down

After finishing the configuration, we will start up the new replicas version.

Solr configuration for HA

Solr Database Configuration

Configure Database Replication

Update Configuration to enable ReplicationHandler

We will edit the following file in order to turn on the Replication Handler within Solr, which will handle all Solr index replication.

Add the following into this file: /var/solr/data/sme/conf/solrconfig.xml

Add this after the "<!-- Request Handlers" section, like so:

<!-- Request Handlers


       Incoming queries will be dispatched to a specific handler by name
       based on the path specified in the request.

       If a Request Handler is declared with startup="lazy", then it will
       not be initialized until the first request that uses it.

  -->

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="leader">
        <str name="enable">${enable.leader:false}</str>
        <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
    </lst>

    <lst name="follower">
        <str name="enable">${enable.follower:false}</str>

        <!-- Fully qualified URL to the leader core. It is possible to pass this as a request param for the fetchindex command. -->
        <str name="leaderUrl">http://smesearch:8983/solr/sme</str>

        <!-- Interval at which the follower should poll the leader. Format is HH:mm:ss. If this is absent, the follower does not poll automatically,
             but a fetchindex can be triggered from the admin UI or the HTTP API. -->
        <str name="pollInterval">00:00:20</str>

        <!-- The following values are used when the follower connects to the leader to download the index files.
             Default values are implicitly set to 5000ms and 10000ms respectively. The user DOES NOT need to specify
             these unless the bandwidth is extremely low or the latency is extremely high. -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>

        <!-- If HTTP Basic authentication is enabled on the leader, then the follower can be configured with the following. -->
        <str name="httpBasicAuthUser">solr</str>
        <str name="httpBasicAuthPassword">drom6etsh9Onk</str>
    </lst>
</requestHandler>

Please note the use of the smesearch DNS name for leaderUrl. If you use a different DNS name, update the above configuration accordingly.

Define Leader and Follower

Each Solr instance is configured to be able to act as either a leader or a follower. In order to define the state of each we will use core properties.

On smesql01, to make it the leader, add the following two lines at the bottom of /var/solr/data/sme/core.properties

enable.leader=true
enable.follower=false

On smesql02 to make it a follower, add the following two lines at the bottom of /var/solr/data/sme/core.properties

enable.leader=false
enable.follower=true
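
Because the same two properties are edited again during recovery, it can help to script the change so that repeated runs never leave duplicate or conflicting lines. The following is a minimal sketch, not a shipped tool: set_solr_role is a hypothetical helper name, and it assumes core.properties holds one key=value pair per line.

```shell
#!/bin/sh
# set_solr_role ROLE FILE: ROLE is "leader" or "follower".
# Hypothetical helper; rewrites the two enable.* lines in a core.properties file.
set_solr_role() {
    role=$1
    props=$2
    case "$role" in
        leader)   is_leader=true;  is_follower=false ;;
        follower) is_leader=false; is_follower=true ;;
        *) echo "usage: set_solr_role leader|follower FILE" >&2; return 2 ;;
    esac
    [ -f "$props" ] || { echo "no such file: $props" >&2; return 1; }

    tmp=$(mktemp) || return 1
    # Keep every line except earlier role settings (grep exits 1 if nothing is left).
    grep -v -e '^enable\.leader=' -e '^enable\.follower=' "$props" > "$tmp" || true
    printf 'enable.leader=%s\nenable.follower=%s\n' "$is_leader" "$is_follower" >> "$tmp"
    # Write through the existing file so its owner and permissions are kept.
    cat "$tmp" > "$props"
    rm -f "$tmp"
}

# Example (as root):
#   set_solr_role leader   /var/solr/data/sme/core.properties   # on smesql01
#   set_solr_role follower /var/solr/data/sme/core.properties   # on smesql02
```

The helper only edits the file. Solr still has to be restarted before a changed role takes effect.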

Allow replication whitelist

Next we will configure the whitelist to allow the Solr containers to replicate:

We will edit /var/solr/data/solr.xml to add the following configuration at the bottom:

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:smesql01:8983/solr/sme,smesql02:8983/solr/sme}</str>
</shardHandlerFactory>

Replace smesql01 and smesql02 in the whitelist with their respective IP addresses.

Start solr containers

Finally, we will start the Solr replica containers on both hosts in order to have those changes take effect:

cd /var/www/smestorage/containers/solr-replicas/ && docker-compose up -d
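
The containers can take a little while to start answering requests. A small polling loop, sketched below, can confirm that each host is up before you move on. It assumes curl is installed and that the sme core answers on Solr's ping handler at /solr/sme/admin/ping. SOLR_AUTH is a hypothetical variable holding the basic auth credentials as user:password, and both function names are illustrative.

```shell
#!/bin/sh
# solr_ping HOST: succeeds if the (assumed) ping handler answers with HTTP 2xx.
# SOLR_AUTH is a hypothetical variable; "solr:changeme" is only a placeholder.
solr_ping() {
    curl -sf -o /dev/null -u "${SOLR_AUTH:-solr:changeme}" \
        "http://$1:8983/solr/sme/admin/ping"
}

# wait_for_solr HOST [TRIES] [DELAY]: poll until solr_ping succeeds
# or the number of tries is used up.
wait_for_solr() {
    host=$1
    tries=${2:-30}
    delay=${3:-2}
    while [ "$tries" -gt 0 ]; do
        if solr_ping "$host"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_solr smesql01 && wait_for_solr smesql02 && echo "both Solr hosts answer"
```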

Part III

Using Keepalived to manage VIP and automatic failover

As with the MySQL database failover, we will use the open-source application keepalived to manage the VIP and to provide automated failover in the case of an outage of the server.

We will update our existing configuration in /etc/keepalived/keepalived.conf to add the new VIP, like so:

Note: In the configuration below, replace all items in < > with your environment-specific entries.

smesql01 keepalived.conf

! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chkmariadb {
   script "/sbin/pidof mysqld"
   interval 2
   rise 5
   fall 5
}

vrrp_script chkmemcache {
   script "/sbin/pidof memcached"
   interval 2
   rise 5
   fall 5
}

#### update to add solr check script here ####
vrrp_script chksolr {
   script "/sbin/pidof java"
   interval 2
   rise 5
   fall 5
}

vrrp_instance DB {
   state MASTER
   interface eth0
   virtual_router_id 51
   priority 105
   nopreempt
   virtual_ipaddress {
      <db VIP address>
   }
   track_script {
      chkmariadb
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
   state MASTER
   interface eth0
   virtual_router_id 61
   priority 105
   nopreempt
   virtual_ipaddress {
      <memcached VIP address>
   }
   track_script {
      chkmemcache
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
   state MASTER
   interface eth0
   virtual_router_id 71
   priority 105
   nopreempt
   virtual_ipaddress {
      <solr VIP address>
   }
   track_script {
      chksolr
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

Note: In the configuration below, replace all items in < > with your environment-specific entries.

smesql02 keepalived.conf

! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chkmariadb {
   script "/sbin/pidof mysqld"
   interval 2
   rise 5
   fall 5
}

vrrp_script chkmemcache {
   script "/sbin/pidof memcached"
   interval 2
   rise 5
   fall 5
}

#### update to add solr check script here ####
vrrp_script chksolr {
   script "/sbin/pidof java"
   interval 2
   rise 5
   fall 5
}

vrrp_instance DB {
   state BACKUP
   interface eth0
   virtual_router_id 51
   priority 100
   nopreempt
   virtual_ipaddress {
      <db VIP address>
   }
   track_script {
      chkmariadb
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
   state BACKUP
   interface eth0
   virtual_router_id 61
   priority 100
   nopreempt
   virtual_ipaddress {
      <memcached VIP address>
   }
   track_script {
      chkmemcache
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
   state BACKUP
   interface eth0
   virtual_router_id 71
   priority 100
   nopreempt
   virtual_ipaddress {
      <solr VIP address>
   }
   track_script {
      chksolr
   }
   authentication {
      auth_type PASS
      auth_pass <8 character password>
   }
   notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

Restart Keepalived

We will now restart keepalived to apply the new configuration.

If this is a running production environment, take care to shut down keepalived on the follower, restart the leader, and then start the follower. Otherwise there will be a re-election and a failover of MySQL and memcache during this restart.

systemctl restart keepalived

Keepalived Notes

State

For the State files: every running instance of keepalived will be in one of 3 states:

  • MASTER = currently responsible for that VIP, and actively responding to traffic directed to the VIP
  • BACKUP = in standby, waiting to take over the VIP if the master is no longer in a MASTER state
  • FAULT = the check scripts have failed (because the service is no longer running), so the instance is not eligible for a MASTER or BACKUP role

Check scripts

By default we use the following settings for the regular checks that validate the services are running:

vrrp_script chksolr {
   script "/sbin/pidof java"
   interval 2
   rise 5
   fall 5
}

This means that every 2 seconds (interval 2), we run a check to see if the java process is running (script "/sbin/pidof java"). If it fails 5 consecutive checks (fall 5), then that instance moves to a FAULT state, so a stopped service is detected after roughly 10 seconds. Likewise, if it passes 5 consecutive checks (rise 5), it moves out of the FAULT state.

In the vrrp_instance section, we also set the nopreempt attribute. This means that if smesql01 is in a MASTER state (as defined by the config file) and it moves to a FAULT state, it will move into a BACKUP state when it exits that FAULT state. smesql01 will not become master again until either 1) smesql02 enters a FAULT state (or the machine is no longer running), or 2) you restart keepalived in order to reset the state and force it back to MASTER status (# systemctl restart keepalived).

There is no additional benefit or risk in leaving smesql02 in a MASTER state, so it is recommended that you retain these default settings.

Part IV

Configure the application servers

We will now update the application servers to point to the new VIP for search.

Log in to your web interface as the appladmin user. Go to Settings > Search Integrations.

Replace the Solr URI as follows:

http://smesearch:8983/solr/sme

Then click “Test Settings” to verify and finally “Update Settings” to apply.

Part V

Failover and Recovery

In the case of an outage of the Solr service, or of smesql01 itself, keepalived will fail traffic over to the Solr instance running on smesql02. This process is fully automatic. All new Solr reads and writes will now occur on the smesql02 server without any intervention.

However, when the smesql01 server or its Solr service is available again, we will not fail traffic back in the other direction, because the smesql01 database will NOT contain any of the new index data created during the outage. Solr replication is set up to run in only one direction at a time, so unlike the MySQL setup, the smesql01 Solr service will not automatically copy back any changes.
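
Before starting any recovery work it helps to confirm which host currently holds the search VIP. The sketch below checks the output of ip -4 -o addr show for a given address. It assumes the eth0 interface named in the keepalived configuration and one address per output line; holds_vip is a hypothetical helper name.

```shell
#!/bin/sh
# holds_vip VIP: read `ip -4 -o addr show` output on stdin and succeed
# if VIP is assigned to an interface.
holds_vip() {
    vip=$1
    # Escape the dots so they only match literal dots in the address.
    pattern=$(printf '%s' "$vip" | sed 's/\./\\./g')
    # Require "inet <VIP>/<prefix>" so 10.0.0.5 does not match 10.0.0.50.
    grep -Eq "inet ${pattern}/[0-9]+( |\$)"
}

# Example (run on smesql01 and smesql02, with your own VIP):
#   ip -4 -o addr show dev eth0 | holds_vip <solr VIP address> && echo "VIP is here"
```

After a failover, the check is expected to succeed on smesql02 and fail on smesql01.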

In order to bring the former leader up to date, you will need to swap the leader/follower status of each host.

On smesql01, update the /var/solr/data/sme/core.properties file and change the leader/follower status:

enable.leader=false
enable.follower=true

On smesql02, do the opposite in /var/solr/data/sme/core.properties:

enable.leader=true
enable.follower=false

Finally, restart Solr on both hosts so that those changes take effect:

cd /var/www/smestorage/containers/solr-replicas/ && docker-compose down && docker-compose up -d

This will switch the status and start replicating data from smesql02 (the new leader) over to smesql01 (the new follower).

We can check replication status on the hosts via this web page: http://<smesql01_or_smesql02>:8983/solr/#/sme/replication

Once both hosts show the same Version/Gen number, they are in sync.

From there you can leave the hosts in their current leader/follower state, or revert by adjusting /var/solr/data/sme/core.properties on each host and restarting. Do not fail the keepalived VIP back over until replication is back in sync and you have made this update to make smesql01 the leader again.
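
The Version/Gen comparison can also be scripted against the replication handler instead of being read off the admin page. The following is a rough sketch under stated assumptions: it assumes the response to /replication?command=details contains numeric "indexVersion" and "generation" fields and takes the first occurrence of each, which may not hold for every Solr version. field_of and in_sync are hypothetical helper names, and PASSWORD is a placeholder.

```shell
#!/bin/sh
# field_of JSON FIELD: print the first numeric value of FIELD found in JSON
# (empty output if the field is missing).
field_of() {
    printf '%s' "$1" |
        grep -o "\"$2\":[[:space:]]*[0-9][0-9]*" |
        head -n 1 |
        tr -dc '0-9'
}

# in_sync JSON_A JSON_B: succeed only if both responses carry the same
# non-empty indexVersion and generation values.
in_sync() {
    for field in indexVersion generation; do
        val_a=$(field_of "$1" "$field")
        val_b=$(field_of "$2" "$field")
        [ -n "$val_a" ] && [ "$val_a" = "$val_b" ] || return 1
    done
    return 0
}

# Example (credentials and hosts are placeholders for your environment):
#   a=$(curl -s -u solr:PASSWORD "http://smesql01:8983/solr/sme/replication?command=details&wt=json")
#   b=$(curl -s -u solr:PASSWORD "http://smesql02:8983/solr/sme/replication?command=details&wt=json")
#   in_sync "$a" "$b" && echo "in sync" || echo "NOT in sync yet"
```

Treat a failed check as "not yet safe to fail back", and confirm the result on the replication page before moving the VIP.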