Solr Replication for Highly Available EFF Content Search
last updated April 10, 2022
Disclaimer
The information in this document is provided on an as-is basis. You use it at your own risk. We accept no responsibility for errors or omissions, nor do we have any obligation to provide support for implementing or maintaining the configuration described here. Furthermore, we do not warrant that the design presented here is appropriate for your requirements.
SME designs, implements and supports HA File Fabric solutions for customers on a paid professional services basis. For more information please contact sales@storagemadeeasy.com.
Please follow this document's guidance closely. If it is not followed, there is a risk of losing Solr index data when failing back after an outage. No actual file data would be lost, and lost index data can be recreated; however, until it is recreated, the affected files will be missing from search results.
Introduction
The Enterprise File Fabric as shipped is configured for deployment on a single virtual machine. However, a common production deployment scenario is redundant web frontends in front of a Highly Available stateful metadata server pair.
This guide will step through the setup of a leader/follower (master/slave) Solr pair, which allows for automatic failover without any loss of data. When the former leader comes back online, additional work is required to migrate any new index data back to it, so automatic failback is not supported.
Part I
Assumptions
This guide assumes you have working knowledge and an understanding of Linux operating systems, databases, etc. If any questions come up, please contact your account manager or SME support.
For this guide we are using the following hostnames: smeweb01, smeweb02, smesql01, smesql02, and smesearch (the VIP). Setup of MySQL database replication and of the HA web servers is covered in this document: mastermasterdb. You are of course free to select your own names to match your naming schema.
In addition, you should have DNS configured and verified for the above 5 DNS records and IP addresses, and have opened any internal firewalls that could restrict necessary traffic between the systems, including multicast traffic for keepalived.
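As a quick sanity check, the loop below confirms that each record resolves from the host you are working on. This is a minimal sketch that assumes the example hostnames used in this guide; substitute your own names if they differ.
# Verify that all five DNS records resolve (hostnames are the examples used in this guide)
for h in smeweb01 smeweb02 smesql01 smesql02 smesearch; do
    getent hosts "$h" || echo "WARNING: $h does not resolve"
done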
Initial State
This guide assumes you have set up the four appliances in an HA configuration following the instructions in the Appliance Installation guide: mastermasterdb
Preparation
Before you start, please be sure to collect / prepare the necessary information.
- 4 SME Appliances deployed
- SME linux root password
- SME linux smeconfiguser password
- 1 additional IP address on your LAN - for the new Solr search VIP
- 1 DNS name for the VIP
Linux Login
For Linux command line operations, run the commands shown in this document as the root user unless otherwise specified. However, for security reasons you cannot ssh to the machine directly as root. Instead, ssh to the box as smeconfiguser and then su to root:
ssh smeconfiguser@smeweb01
Enter the smeconfiguser password at the prompt. Once logged in, elevate your privileges to root.
su -
Part II
Configuring Solr
You must perform these steps to create a specialized Solr server from the standard SME appliance distribution.
Install Solr Replica Containers
The standard Solr containers deployed by default in the appliance do not support replication. Instead, we will install containers designed for leader/follower replication:
yum install sme-containers-solr-replicas
We can then stop the existing Solr container:
cd /var/www/smestorage/containers/solr && docker-compose down
After finishing the configuration below, we will start up the new replicas version.
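Before moving on, you can confirm that the standard Solr container is no longer running. This is a quick check that assumes the container name contains "solr":
# Should return no running containers once the standard Solr container is down
docker ps --filter "name=solr"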
Solr configuration for HA
Solr Database Configuration
Configure Database Replication
Update Configuration to enable ReplicationHandler
We will edit the following file in order to turn on the Replication Handler within Solr, which will handle all Solr index replication.
Add the following into this file: /var/solr/data/sme/conf/solrconfig.xml
Add this after the "<!-- Request Handlers" comment, like so:
<!-- Request Handlers

     http://wiki.apache.org/solr/SolrRequestHandler

     Incoming queries will be dispatched to a specific handler by name
     based on the path specified in the request.

     If a Request Handler is declared with startup="lazy", then it will
     not be initialized until the first request that uses it.
-->
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="leader">
    <str name="enable">${enable.leader:false}</str>
    <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="follower">
    <str name="enable">${enable.follower:false}</str>
    <!-- Fully qualified URL of the leader core. It is also possible to pass this as a request param for the fetchindex command -->
    <str name="leaderUrl">http://smesearch:8983/solr/sme/replication</str>
    <!-- Interval at which the follower should poll the leader. Format is HH:mm:ss.
         If this is absent the follower does not poll automatically, but a fetchindex
         can be triggered from the admin UI or the HTTP API -->
    <str name="pollInterval">00:00:20</str>
    <!-- The following values are used when the follower connects to the leader to download
         the index files. Default values are implicitly set to 5000ms and 10000ms respectively.
         These do not need to be specified unless bandwidth is extremely low or latency extremely high -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
    <!-- If HTTP Basic authentication is enabled on the leader, then the follower can be configured with the following -->
    <str name="httpBasicAuthUser">solr</str>
    <str name="httpBasicAuthPassword">drom6etsh9Onk</str>
  </lst>
</requestHandler>
Please note the use of the smesearch DNS name for leaderUrl. If you use a different DNS name, please update the above configuration accordingly.
Define Leader and Follower
Each Solr instance is configured so that it can act as either a leader or a follower. To define the role of each instance we will use core properties.
On smesql01, to make it the leader, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties:
enable.leader=true
enable.follower=false
On smesql02, to make it the follower, add the following two lines at the bottom of /var/solr/data/sme/core.properties:
enable.leader=false
enable.follower=true
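If you prefer to make these changes from the shell, a minimal sketch for appending the properties is shown below. The paths are the ones referenced above for each host, so adjust them if your core.properties lives elsewhere, and run each block only on the host named in the comment.
# On smesql01 (leader) - path as referenced above
cat >> /smedata/sme_solr/sme/core.properties <<'EOF'
enable.leader=true
enable.follower=false
EOF
# On smesql02 (follower) - path as referenced above
cat >> /var/solr/data/sme/core.properties <<'EOF'
enable.leader=false
enable.follower=true
EOF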
Allow replication whitelist
Next we will configure the whitelist to allow the Solr containers to replicate.
We will edit /var/solr/data/solr.xml to add the following configuration at the bottom:
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">${socketTimeout:600000}</int>
  <int name="connTimeout">${connTimeout:60000}</int>
  <str name="shardsWhitelist">${solr.shardsWhitelist:smesql01:8983/solr/sme,smesql02:8983/solr/sme}</str>
</shardHandlerFactory>
Replace smesql01/smesql02 with their respective IP addresses.
Start solr containers
Finally, we will start the Solr replica containers on both hosts in order to have those changes take effect:
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose up -d
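Once the containers are up on both hosts, you can confirm that they are running and that the ReplicationHandler responds, using the standard Solr replication API. This is a sketch run locally on each host; if HTTP Basic authentication is enabled on your Solr (as the httpBasicAuthUser settings above suggest it may be), add -u solr:<password> to the curl call.
# Confirm the replica containers are running
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose ps
# Ask the ReplicationHandler for its current details (works on both leader and follower)
curl -s "http://localhost:8983/solr/sme/replication?command=details"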
Part III
Using Keepalived to manage VIP and automatic failover
As with the MySQL database failover, we will use the open-source application keepalived to manage the VIP and to provide automated failover in the case of a server outage.
We will update our existing configuration in /etc/keepalived/keepalived.conf to add the new VIP, like so:
Note: Update the configuration below, replacing all items in < > with your environment-specific entries
smesql01 keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chk_mariadb {
    script "/sbin/pidof mysqld"
    interval 2
    rise 5
    fall 5
}

vrrp_script chk_memcache {
    script "/sbin/pidof memcached"
    interval 2
    rise 5
    fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}

vrrp_instance DB {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 105
    nopreempt
    virtual_ipaddress {
        <db VIP address - ex: '10.10.10.1'>
    }
    track_script {
        chk_mariadb
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
    state MASTER
    interface eth0
    virtual_router_id 61
    priority 105
    nopreempt
    virtual_ipaddress {
        <memcached VIP address - ex: '10.10.10.2'>
    }
    track_script {
        chk_memcache
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
    state MASTER
    interface eth0
    virtual_router_id 71
    priority 105
    nopreempt
    virtual_ipaddress {
        <solr VIP address - ex: '10.10.10.3'>
    }
    track_script {
        chk_solr
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
Note: Update the configuration below, replacing all items in < > with your environment-specific entries
smesql02 keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chk_mariadb {
    script "/sbin/pidof mysqld"
    interval 2
    rise 5
    fall 5
}

vrrp_script chk_memcache {
    script "/sbin/pidof memcached"
    interval 2
    rise 5
    fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}

vrrp_instance DB {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    nopreempt
    virtual_ipaddress {
        <db VIP address - ex: '10.10.10.1'>
    }
    track_script {
        chk_mariadb
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
    state BACKUP
    interface eth0
    virtual_router_id 61
    priority 100
    nopreempt
    virtual_ipaddress {
        <memcached VIP address - ex: '10.10.10.2'>
    }
    track_script {
        chk_memcache
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
    state BACKUP
    interface eth0
    virtual_router_id 71
    priority 100
    nopreempt
    virtual_ipaddress {
        <solr VIP address - ex: '10.10.10.3'>
    }
    track_script {
        chk_solr
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
Restart Keepalived
We will now restart keepalived to apply the new configuration. If this is a running production environment, take care to shut down keepalived on the backup node first, restart it on the master, and then start it again on the backup; otherwise there will be a re-election and failover of MySQL and memcache during this restart.
systemctl restart keepalived
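For a production environment, the restart order described above would look like the sketch below; run each command on the host named in the comment.
# 1. On smesql02 (backup) - stop keepalived so it cannot win an election during the restart
systemctl stop keepalived
# 2. On smesql01 (master) - restart to load the new configuration
systemctl restart keepalived
# 3. On smesql02 (backup) - start keepalived again with the new configuration
systemctl start keepalived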
Keepalived Notes
State
All running instances of keepalived will be in one of three states:
- MASTER = currently responsible for the VIP; actively responding to traffic directed to the VIP
- BACKUP = in standby, waiting to take over the VIP if the master is no longer in the MASTER state
- FAULT = entered after the check script has failed (because the service is no longer running); an instance in this state is not eligible for the MASTER or BACKUP role
Check scripts
By default we use the following settings for the regular checks that validate the services are running:
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}
This means that every 2 seconds (interval 2), keepalived runs a check to see whether the java process is running (script "/sbin/pidof java"). If the check fails for 5 consecutive runs (fall 5), the instance moves to the FAULT state. Likewise, once it passes for 5 consecutive runs (rise 5), it moves back out of the FAULT state.
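keepalived only looks at the exit code of the check script, so you can test the check by hand; for example:
# Exit code 0 means a java (Solr) process was found; a non-zero code is what keepalived counts as a failed check
/sbin/pidof java; echo "exit code: $?"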
In the vrrp_instance sections, we also set the nopreempt attribute. This means that if smesql01 is in the MASTER state (as defined by the config file) and it moves into a FAULT state, then when it exits the FAULT state it will move into the BACKUP state.
smesql01 will not become master again until either 1) smesql02 enters a FAULT state (or the machine is no longer running), or 2) you restart keepalived in order to reset the state and force it back to MASTER status (# systemctl restart keepalived).
There is no additional benefit or risk in leaving smesql02 in the MASTER state, so it is recommended that you retain these default settings.
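At any time you can see which node currently holds a given VIP by checking whether the address is configured on the interface; this sketch uses eth0, the interface name from the configuration above, with the VIP placeholder to be replaced by your address.
# The node that is MASTER for the Solr VIP will list the address on eth0
ip addr show eth0 | grep "<solr VIP address>"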
Part IV
Configure the application servers
We will now update the application servers to point to the new VIP for search.
Login to your web interface as the appladmin user.
Go to Settings > Search Integrations
Replace the Solr URI as follows:
http://smesearch:8983/solr/sme
Then click “Test Settings” to verify and finally “Update Settings” to apply.
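You can also verify from each application server that the Solr core answers through the VIP before testing in the UI. A simple query returning only the match count is enough; this sketch assumes the smesearch name configured above.
# Run from smeweb01 and smeweb02: returns a JSON response with numFound if Solr is reachable via the VIP
curl -s "http://smesearch:8983/solr/sme/select?q=*:*&rows=0"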
Part V
Failover and Recovery
In the case of an outage of the Solr service or of the smesql01 server, keepalived will fail traffic over to the Solr instance running on smesql02.
This process is entirely automatic. All new Solr reads and writes will now occur on the smesql02 server without any intervention.
However, when the smesql01 server/Solr service becomes available again, traffic is not failed back in the other direction, because the smesql01 index will NOT contain any of the new index data created during the outage. Solr replication is set up to run in only one direction at a time, so unlike the MySQL setup, the smesql01 Solr service will not automatically copy back any changes.
In order to bring the former leader (smesql01) up to date, you will need to swap the leader/follower status of each host.
On smesql01 we will update the /var/solr/data/sme/core.properties file to change its leader/follower status:
enable.leader=false
enable.follower=true
On smesql02, in /var/solr/data/sme/core.properties, we will do the opposite:
enable.leader=true
enable.follower=false
Finally, we will restart Solr on both hosts in order to have those changes take effect:
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose down && docker-compose up -d
This will switch the status and start replicating data from smesql02 (the new leader) over to smesql01 (new follower).
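If you do not want to wait for the 20-second poll interval, the follower can be told to pull the index immediately using the ReplicationHandler's fetchindex command (mentioned in the solrconfig.xml comments above). This sketch assumes smesql01 is now the follower.
# Trigger an immediate index fetch on the new follower (smesql01)
curl -s "http://smesql01:8983/solr/sme/replication?command=fetchindex"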
We can check replication status on the hosts via this webpage:
http://<smesql01_or_smesql02>:8983/solr/#/sme/replication
Once both hosts show the same Version/Gen numbers, they are in sync.
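The same information is available from the command line via the replication API; when the indexversion and generation values match on both hosts, replication has caught up.
# Compare index version and generation on both hosts
curl -s "http://smesql01:8983/solr/sme/replication?command=indexversion"
curl -s "http://smesql02:8983/solr/sme/replication?command=indexversion"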
From there you can either leave the hosts in their current leader/follower roles, or revert by adjusting /var/solr/data/sme/core.properties on each host again and restarting.
Do not fail the keepalived VIP back until replication is in sync and you have made this change to make smesql01 the leader again.