Solr Replication for Highly Available EFF Content Search
last updated April 10, 2022
Disclaimer
The information in this document is provided on an as-is basis. You use it at your own risk. We accept no responsibility for errors or omissions, nor do we have any obligation to provide support for implementing or maintaining the configuration described here. Furthermore, we do not warrant that the design presented here is appropriate for your requirements.
SME designs, implements and supports HA File Fabric solutions for customers on a paid professional services basis. For more information please contact sales@storagemadeeasy.com.
Please follow this document's guidance closely. If it is not followed, there is a risk of losing Solr index data when failing back after an outage. No actual file data would be lost, and lost index data can be recreated; however, until it is recreated, the affected files will be missing from search results.
Introduction
The Enterprise File Fabric as shipped is configured for deployment on a single virtual machine. However, a common production deployment scenario is redundant web frontends in front of a Highly Available stateful metadata server pair.
This guide will step through the setup of a leader/follower (master/slave) Solr pair, which allows for automatic failover without any loss of data. When the former leader comes back online, additional work is required to migrate any new index data back to it, so automatic failback is not supported.
Part I
Assumptions
This guide assumes you have working knowledge and an understanding of Linux operating systems, databases, etc. If any questions come up, please contact your account manager or SME support.
For this guide we are using the following hostnames: smeweb01, smeweb02, smesql01, smesql02, and smesearch (the VIP). Setup of MySQL database replication and of the HA web servers is covered in this document: mastermasterdb. You are of course free to select your own names to match your naming schema.
In addition, you should have DNS configured and verified for the above 5 DNS records and IP addresses, and have opened any internal firewalls that could restrict necessary traffic between the systems, including multicast traffic for keepalived.
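As a quick sanity check, the loop below confirms that each record resolves from the host you are working on. This is a minimal sketch that assumes the example hostnames used in this guide; substitute your own names if they differ.
# Verify that all five DNS records resolve (hostnames are the examples used in this guide)
for h in smeweb01 smeweb02 smesql01 smesql02 smesearch; do
    getent hosts "$h" || echo "WARNING: $h does not resolve"
done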
Initial State
This guide assumes you have set up the four appliances in an HA configuration following the instructions in the Appliance Installation guide: mastermasterdb
Preparation
Before you start, please be sure to collect / prepare the necessary information.
- 4 SME Appliances deployed
- SME linux root password
- SME linux smeconfiguser password
- 1 additional IP address on your LAN - for the new Solr search VIP
- 1 DNS name for the VIP
Linux Login
For Linux command line operations, run the commands shown in this document as the root user unless otherwise specified. However, for security reasons you cannot ssh to the machine directly as root. Instead, ssh to the box as smeconfiguser and then su to root:
ssh smeconfiguser@smeweb01
Enter the smeconfiguser password at the prompt. Once logged in, elevate your privileges to root.
su -
Part II
Configuring Solr
You must perform these steps to create a specialized Solr server from the standard SME appliance distribution.
Install Solr Replica Containers
The standard Solr containers deployed by default in the appliance do not support replication. Instead, we will install containers designed for leader/follower replication:
yum install sme-containers-solr-replicas
We can then stop the existing Solr container:
cd /var/www/smestorage/containers/solr && docker-compose down
After finishing the configuration below, we will start up the new replicas version.
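Before moving on, you can confirm that the standard Solr container is no longer running. This is a quick check that assumes the container name contains "solr":
# Should return no running containers once the standard Solr container is down
docker ps --filter "name=solr"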
Solr configuration for HA
Solr Database Configuration
Configure Database Replication
Update Configuration to enable ReplicationHandler
We will edit the following file in order to turn on the Replication Handler within Solr, which will handle all Solr index replication.
Add the following into this file: /var/solr/data/sme/conf/solrconfig.xml
Add this after the "<!-- Request Handlers" comment, like so:
<!-- Request Handlers

     http://wiki.apache.org/solr/SolrRequestHandler

     Incoming queries will be dispatched to a specific handler by name
     based on the path specified in the request.

     If a Request Handler is declared with startup="lazy", then it will
     not be initialized until the first request that uses it.
-->
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="leader">
    <str name="enable">${enable.leader:false}</str>
    <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="follower">
    <str name="enable">${enable.follower:false}</str>
    <!-- Fully qualified URL of the leader core. It is also possible to pass this as a request param for the fetchindex command -->
    <str name="leaderUrl">http://smesearch:8983/solr/sme/replication</str>
    <!-- Interval at which the follower should poll the leader. Format is HH:mm:ss.
         If this is absent the follower does not poll automatically, but a fetchindex
         can be triggered from the admin UI or the HTTP API -->
    <str name="pollInterval">00:00:20</str>
    <!-- The following values are used when the follower connects to the leader to download
         the index files. Default values are implicitly set to 5000ms and 10000ms respectively.
         These do not need to be specified unless bandwidth is extremely low or latency extremely high -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
    <!-- If HTTP Basic authentication is enabled on the leader, then the follower can be configured with the following -->
    <str name="httpBasicAuthUser">solr</str>
    <str name="httpBasicAuthPassword">drom6etsh9Onk</str>
  </lst>
</requestHandler>
Please note the use of the smesearch DNS name for leaderUrl. If you use a different DNS name, please update the above configuration accordingly.
Define Leader and Follower
Each Solr instance is configured so that it can act as either a leader or a follower. To define the role of each instance we will use core properties.
On smesql01, to make it the leader, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties:
enable.leader=true
enable.follower=false
On smesql02, to make it the follower, add the following two lines at the bottom of /var/solr/data/sme/core.properties:
enable.leader=false
enable.follower=true
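If you prefer to make these changes from the shell, a minimal sketch for appending the properties is shown below. The paths are the ones referenced above for each host, so adjust them if your core.properties lives elsewhere, and run each block only on the host named in the comment.
# On smesql01 (leader) - path as referenced above
cat >> /smedata/sme_solr/sme/core.properties <<'EOF'
enable.leader=true
enable.follower=false
EOF
# On smesql02 (follower) - path as referenced above
cat >> /var/solr/data/sme/core.properties <<'EOF'
enable.leader=false
enable.follower=true
EOF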
Allow replication whitelist
Next we will configure the whitelist to allow the Solr containers to replicate.
We will edit /var/solr/data/solr.xml to add the following configuration at the bottom:
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">${socketTimeout:600000}</int>
  <int name="connTimeout">${connTimeout:60000}</int>
  <str name="shardsWhitelist">${solr.shardsWhitelist:smesql01:8983/solr/sme,smesql02:8983/solr/sme}</str>
</shardHandlerFactory>
Replace smesql01/smesql02 with their respective IP addresses.
Start solr containers
Finally, we will start the Solr replica containers on both hosts in order to have those changes take effect:
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose up -d
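Once the containers are up on both hosts, you can confirm that they are running and that the ReplicationHandler responds, using the standard Solr replication API. This is a sketch run locally on each host; if HTTP Basic authentication is enabled on your Solr (as the httpBasicAuthUser settings above suggest it may be), add -u solr:<password> to the curl call.
# Confirm the replica containers are running
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose ps
# Ask the ReplicationHandler for its current details (works on both leader and follower)
curl -s "http://localhost:8983/solr/sme/replication?command=details"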
Part III
Using Keepalived to manage VIP and automatic failover
As with the MySQL database failover, we will use the open-source application keepalived to manage the VIP and to provide automated failover in the case of a server outage.
We will update our existing configuration in /etc/keepalived/keepalived.conf to add the new VIP, like so:
Note: Update the configuration below, replacing all items in < > with your environment-specific entries
smesql01 keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chk_mariadb {
    script "/sbin/pidof mysqld"
    interval 2
    rise 5
    fall 5
}

vrrp_script chk_memcache {
    script "/sbin/pidof memcached"
    interval 2
    rise 5
    fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}

vrrp_instance DB {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 105
    nopreempt
    virtual_ipaddress {
        <db VIP address - ex: '10.10.10.1'>
    }
    track_script {
        chk_mariadb
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
    state MASTER
    interface eth0
    virtual_router_id 61
    priority 105
    nopreempt
    virtual_ipaddress {
        <memcached VIP address - ex: '10.10.10.2'>
    }
    track_script {
        chk_memcache
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
    state MASTER
    interface eth0
    virtual_router_id 71
    priority 105
    nopreempt
    virtual_ipaddress {
        <solr VIP address - ex: '10.10.10.3'>
    }
    track_script {
        chk_solr
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
Note: Update the configuration below, replacing all items in < > with your environment-specific entries
smesql02 keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
   }
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
   enable_script_security
}

vrrp_script chk_mariadb {
    script "/sbin/pidof mysqld"
    interval 2
    rise 5
    fall 5
}

vrrp_script chk_memcache {
    script "/sbin/pidof memcached"
    interval 2
    rise 5
    fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}

vrrp_instance DB {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    nopreempt
    virtual_ipaddress {
        <db VIP address - ex: '10.10.10.1'>
    }
    track_script {
        chk_mariadb
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
    state BACKUP
    interface eth0
    virtual_router_id 61
    priority 100
    nopreempt
    virtual_ipaddress {
        <memcached VIP address - ex: '10.10.10.2'>
    }
    track_script {
        chk_memcache
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
    state BACKUP
    interface eth0
    virtual_router_id 71
    priority 100
    nopreempt
    virtual_ipaddress {
        <solr VIP address - ex: '10.10.10.3'>
    }
    track_script {
        chk_solr
    }
    authentication {
        auth_type PASS
        auth_pass <8 character password>
    }
    notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
Restart Keepalived
We will now restart keepalived to apply the new configuration. If this is a running production environment, take care to shut down keepalived on the backup node first, restart it on the master, and then start it again on the backup; otherwise there will be a re-election and failover of MySQL and memcache during this restart.
systemctl restart keepalived
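For a production environment, the restart order described above would look like the sketch below; run each command on the host named in the comment.
# 1. On smesql02 (backup) - stop keepalived so it cannot win an election during the restart
systemctl stop keepalived
# 2. On smesql01 (master) - restart to load the new configuration
systemctl restart keepalived
# 3. On smesql02 (backup) - start keepalived again with the new configuration
systemctl start keepalived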
Keepalived Notes
State
All running instances of keepalived will be in one of three states:
- MASTER = currently responsible for the VIP; actively responding to traffic directed to the VIP
- BACKUP = in standby, waiting to take over the VIP if the master is no longer in the MASTER state
- FAULT = entered after the check script has failed (because the service is no longer running); an instance in this state is not eligible for the MASTER or BACKUP role
Check scripts
By default we use the following settings for the regular checks that validate the services are running:
vrrp_script chk_solr {
    script "/sbin/pidof java"
    interval 2
    rise 5
    fall 5
}
This means that every 2 seconds (interval 2), keepalived runs a check to see whether the java process is running (script "/sbin/pidof java"). If the check fails for 5 consecutive runs (fall 5), the instance moves to the FAULT state. Likewise, once it passes for 5 consecutive runs (rise 5), it moves back out of the FAULT state.
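keepalived only looks at the exit code of the check script, so you can test the check by hand; for example:
# Exit code 0 means a java (Solr) process was found; a non-zero code is what keepalived counts as a failed check
/sbin/pidof java; echo "exit code: $?"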
In the vrrp_instance sections, we also set the nopreempt attribute. This means that if smesql01 is in the MASTER state (as defined by the config file) and it moves into a FAULT state, then when it exits the FAULT state it will move into the BACKUP state.
smesql01 will not become master again until either 1) smesql02 enters a FAULT state (or the machine is no longer running), or 2) you restart keepalived in order to reset the state and force it back to MASTER status (# systemctl restart keepalived).
There is no additional benefit or risk in leaving smesql02 in the MASTER state, so it is recommended that you retain these default settings.
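At any time you can see which node currently holds a given VIP by checking whether the address is configured on the interface; this sketch uses eth0, the interface name from the configuration above, with the VIP placeholder to be replaced by your address.
# The node that is MASTER for the Solr VIP will list the address on eth0
ip addr show eth0 | grep "<solr VIP address>"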
Part IV
Configure the application servers
We will now update the application servers to point to the new VIP for search.
Login to your web interface as the appladmin user.
Go to Settings > Search Integrations
Replace the Solr URI as follows:
http://smesearch:8983/solr/sme
Then click “Test Settings” to verify and finally “Update Settings” to apply.
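You can also verify from each application server that the Solr core answers through the VIP before testing in the UI. A simple query returning only the match count is enough; this sketch assumes the smesearch name configured above.
# Run from smeweb01 and smeweb02: returns a JSON response with numFound if Solr is reachable via the VIP
curl -s "http://smesearch:8983/solr/sme/select?q=*:*&rows=0"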
Part V
Failover and Recovery
In the case of an outage of the Solr service or of the smesql01 server, keepalived will fail traffic over to the Solr instance running on smesql02.
This process is entirely automatic. All new Solr reads and writes will now occur on the smesql02 server without any intervention.
However, when the smesql01 server/Solr service becomes available again, traffic is not failed back in the other direction, because the smesql01 index will NOT contain any of the new index data created during the outage. Solr replication is set up to run in only one direction at a time, so unlike the MySQL setup, the smesql01 Solr service will not automatically copy back any changes.
In order to bring the former leader (smesql01) up to date, you will need to swap the leader/follower status of each host.
On smesql01 we will update the /var/solr/data/sme/core.properties file to change its leader/follower status:
enable.leader=false
enable.follower=true
On smesql02, in /var/solr/data/sme/core.properties, we will do the opposite:
enable.leader=true
enable.follower=false
Finally, we will restart Solr on both hosts in order to have those changes take effect:
cd /var/www/smestorage/containers/solr-replicas/ && docker-compose down && docker-compose up -d
This will switch the status and start replicating data from smesql02 (the new leader) over to smesql01 (new follower).
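If you do not want to wait for the 20-second poll interval, the follower can be told to pull the index immediately using the ReplicationHandler's fetchindex command (mentioned in the solrconfig.xml comments above). This sketch assumes smesql01 is now the follower.
# Trigger an immediate index fetch on the new follower (smesql01)
curl -s "http://smesql01:8983/solr/sme/replication?command=fetchindex"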
We can check replication status on the hosts via this webpage:
http://<smesql01_or_smesql02>:8983/solr/#/sme/replication
Once both hosts show the same Version/Gen numbers, they are in sync.
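The same information is available from the command line via the replication API; when the indexversion and generation values match on both hosts, replication has caught up.
# Compare index version and generation on both hosts
curl -s "http://smesql01:8983/solr/sme/replication?command=indexversion"
curl -s "http://smesql02:8983/solr/sme/replication?command=indexversion"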
From there you can either leave the hosts in their current leader/follower roles, or revert by adjusting /var/solr/data/sme/core.properties on each host again and restarting.
Do not fail the keepalived VIP back until replication is in sync and you have made this change to make smesql01 the leader again.