
last updated March 06, 2020

Disclaimer

The information in this document is provided on an as-is basis. You use it at your own risk. We accept no responsibility for errors or omissions, nor do we have any obligation to provide support for implementing or maintaining the configuration described here. Furthermore, we do not warrant that the design presented here is appropriate for your requirements.

SME designs, implements and supports HA File Fabric solutions for customers on a paid professional services basis. For more information please contact sales@storagemadeeasy.com.

Please follow this document's guidance. If it is not followed, there is a chance of losing Solr index data while failing back after an outage. No actual file data would be lost, and lost index data can be recreated; however, until it is recreated, files will be missing from search results.

Introduction

The Enterprise File Fabric as shipped is configured for deployment on a single virtual machine. However, a common scenario for production deployments is redundant web frontends in front of a Highly Available, stateful metadata server pair.

This guide steps through the setup of a Master-Slave Solr pair, which allows for automatic failover without any loss of data. When the master comes back online, additional work is required to migrate any new index data back to the former master, so automatic failback is not supported.

Part I

Assumptions

This guide assumes you have working knowledge and an understanding of Linux operating systems, databases, etc. If any questions come up, please contact your account manager or SME support.

For this guide we are using the following hostnames: smeweb01, smeweb02, smesql01, smesql02, and smesearch (the VIP). Setup of MySQL database replication and the HA web servers is handled in this document: mastermasterdb. You are of course free to select your own names to match your naming schema.

In addition, you should have DNS configured and verified for the above 5 DNS records and IP addresses, and have opened any internal firewalls that could restrict necessary traffic between the systems, including multicast traffic for keepalived.
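If an internal firewall sits between smesql01 and smesql02, keepalived's VRRP traffic (IP protocol 112, multicast address 224.0.0.18) must be allowed, or each node may claim the VIPs independently. A minimal sketch, assuming the same RH-Firewall-1-INPUT chain used later in this guide:

# Allow VRRP advertisements between the keepalived peers (adjust the chain name to your setup)
iptables -I RH-Firewall-1-INPUT -p vrrp -d 224.0.0.18 -j ACCEPT
iptables-save > /etc/sysconfig/iptables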

Initial State

This guide assumes you have set up the four appliances in the HA configuration, following the instructions in the Appliance Installation guide: mastermasterdb

Preparation

Before you start, please be sure to collect / prepare the necessary information.

  • 4 SME Appliances deployed
  • SME linux root password
  • SME linux smeconfiguser password
  • 1 additional IP address for your LAN - for the new Solr search VIP
  • 1 DNS name for the VIP
Linux Login

For Linux command line operations, you must run the commands shown in this document as the root user unless otherwise specified. However, for security reasons you cannot ssh directly to the machine as root. Instead, ssh to the box as smeconfiguser and then su to root:

ssh smeconfiguser@smeweb01

Enter the smeconfiguser password at the prompt. Once logged in, elevate your privileges to root.

su -

Part II

Configuring Solr

You must perform these steps to create a specialized solr server from the standard SME appliance distribution.

Restrict external access

The Solr server does not serve web pages and does not need to be accessible from the WAN. The only traffic you need to allow is TCP port 7070 from all web frontend servers.

iptables for dbservers

On both smesql01 and smesql02, you must update iptables to allow incoming connections to Solr on TCP port 7070. Do the following.

As root:

# Back up the current rules before making changes
iptables-save > /var/tmp/iptables_backup_`date -I`
# Find the position of the final REJECT rule so the new rule can be inserted above it
ipt_line=`iptables -L RH-Firewall-1-INPUT -n --line-numbers | grep REJECT | awk '{print $1}'`
insert_line=`expr $ipt_line - 1`
# Allow new TCP connections to port 7070 (Solr)
iptables -I RH-Firewall-1-INPUT $insert_line -p tcp -m state --state NEW -m tcp --dport 7070 -j ACCEPT
# Persist the updated rules across reboots
iptables-save > /etc/sysconfig/iptables
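To confirm the rule landed ahead of the REJECT rule, you can list the chain again; this verification step is a suggestion, not part of the original procedure:

iptables -L RH-Firewall-1-INPUT -n --line-numbers | grep -E '7070|REJECT'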

Solr configuration for HA

Solr Database Configuration

The settings for Solr startup are stored in /home/sme/sme_jetty. We will update jetty.host so that Solr listens on all IPs on the host. Perform this on both smesql01 and smesql02.

jetty.host=0.0.0.0
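For example, assuming /home/sme/sme_jetty already contains a jetty.host entry, the change can be made in place like so (a sketch; verify the resulting line before restarting):

# Set jetty.host to listen on all interfaces, then confirm the change
sed -i 's/^jetty.host=.*/jetty.host=0.0.0.0/' /home/sme/sme_jetty
grep '^jetty.host' /home/sme/sme_jetty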

Then restart Solr for this change to take effect:

systemctl restart jetty
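To confirm Solr is now listening on all interfaces on port 7070 (the port used throughout this guide), you can check for the listener; this step is optional:

ss -tlnp | grep 7070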

Configure Database Replication

Update Configuration to enable ReplicationHandler

We will edit the following file in order to turn on the Replication Handler within solr, which will handle all solr index replication.

Add the following into this file: /smedata/sme_solr/sme/conf/solrconfig.xml

Add this after the "<!-- Request Handlers" section like so:

<!-- Request Handlers

       http://wiki.apache.org/solr/SolrRequestHandler

       Incoming queries will be dispatched to a specific handler by name
       based on the path specified in the request.

       Legacy behavior: If the request path uses "/select" but no Request
       Handler has that name, and if handleSelect="true" has been specified in
       the requestDispatcher, then the Request Handler is dispatched based on
       the qt parameter.  Handlers without a leading '/' are accessed this way
       like so: http://host/app/[core/]select?qt=name  If no qt is
       given, then the requestHandler that declares default="true" will be
       used or the one named "standard".

       If a Request Handler is declared with startup="lazy", then it will
       not be initialized until the first request that uses it.

    -->

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <!--Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
        
        <!--The default value of reservation is 10 secs.See the documentation below . Normally , you should not need to specify this -->
        <str name="commitReserveDuration">00:00:10</str>
    </lst>

    <lst name="slave">

        <str name="enable">${enable.slave:false}</str>

        <!--fully qualified url to the master core. It is possible to pass on this as a request param for the fetchindex command-->
        <str name="masterUrl">http://smesearch:7070/sme/replication</str>

        <!--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically.
                                   But a fetchindex can be triggered from the admin or the http API -->
        <str name="pollInterval">00:00:20</str>
        <!--The following values are used when the slave connects to the master to download the index files.
                                   Default values implicitly set as 5000ms and 10000ms respectively. The user DOES NOT need to specify
         these unless the bandwidth is extremely low or if there is an extremely high latency-->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>

        <!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following -->
        <str name="httpBasicAuthUser">solr</str>
        <str name="httpBasicAuthPassword"> <SOLR USER PASSWORD>  </str>

     </lst>
</requestHandler>

Please note the use of the smesearch DNS name for masterUrl. If you have a different DNS name, please update the above configuration accordingly.

Define Master and Slaves

Each solr instance is configured to be able to act as either a master or a slave. In order to define the state of each we will use core properties.

On smesql01, to make it the master, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties:

enable.master=true
enable.slave=false

On smesql02, to make it the slave, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties:

enable.master=false
enable.slave=true
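If you prefer to make these edits from the shell, the same lines can be appended with a heredoc (a sketch; skip this if you have already edited the files by hand):

# On smesql01 (master):
cat >> /smedata/sme_solr/sme/core.properties <<'EOF'
enable.master=true
enable.slave=false
EOF

# On smesql02 (slave):
cat >> /smedata/sme_solr/sme/core.properties <<'EOF'
enable.master=false
enable.slave=true
EOF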

Finally, we will restart solr on both hosts in order to have those changes take effect:

systemctl restart jetty
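To verify that each node has picked up its role, you can query the replication handler directly; the isMaster/isSlave flags in the details response should match the core.properties settings above (if HTTP Basic authentication is enabled on your Solr, add -u solr:<SOLR USER PASSWORD>):

curl 'http://smesql01:7070/sme/replication?command=details'
curl 'http://smesql02:7070/sme/replication?command=details'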

Part III

Using Keepalived to manage VIP and automatic failover

As with the MySQL database failover, we will use the open-source application keepalived to manage the VIP and to provide automated failover in the case of a server outage.

We will update our existing configuration in /etc/keepalived/keepalived.conf to add the new VIP, like so:

Note: In the configuration below, replace all items in < > with your environment-specific values

smesql01 keepalived.conf
! Configuration File for keepalived

global_defs {
  notification_email {
  }
  vrrp_skip_check_adv_addr
  vrrp_strict
  vrrp_garp_interval 0
  vrrp_gna_interval 0
  enable_script_security
}

vrrp_script chk_mariadb {
  script "/sbin/pidof mysqld"
  interval 2
  rise 5
  fall 5
}

vrrp_script chk_memcache {
  script "/sbin/pidof memcached"
  interval 2
  rise 5
  fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
  script "/sbin/pidof java"
  interval 2
  rise 5
  fall 5
}

vrrp_instance DB {
  state MASTER
  interface eth0
  virtual_router_id 51
  priority 105
  nopreempt
  virtual_ipaddress {
    <db VIP address - ex: '10.10.10.1'>
  }
  track_script {
    chk_mariadb
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
  state MASTER
  interface eth0
  virtual_router_id 61
  priority 105
  nopreempt
  virtual_ipaddress {
    <memcached VIP address - ex: '10.10.10.2'>
  }
  track_script {
    chk_memcache
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
  state MASTER
  interface eth0
  virtual_router_id 71
  priority 105
  nopreempt
  virtual_ipaddress {
    <solr VIP address - ex: '10.10.10.3'>
  }
  track_script {
    chk_solr
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

Note: In the configuration below, replace all items in < > with your environment-specific values

smesql02 keepalived.conf
! Configuration File for keepalived

global_defs {
  notification_email {
  }
  vrrp_skip_check_adv_addr
  vrrp_strict
  vrrp_garp_interval 0
  vrrp_gna_interval 0
  enable_script_security
}

vrrp_script chk_mariadb {
  script "/sbin/pidof mysqld"
  interval 2
  rise 5
  fall 5
}

vrrp_script chk_memcache {
  script "/sbin/pidof memcached"
  interval 2
  rise 5
  fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
  script "/sbin/pidof java"
  interval 2
  rise 5
  fall 5
}

vrrp_instance DB {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority 100
  nopreempt
  virtual_ipaddress {
    <db VIP address - ex: '10.10.10.1'>
  }
  track_script {
    chk_mariadb
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
	notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
vrrp_instance MEMCACHE {
  state BACKUP
  interface eth0
  virtual_router_id 61
  priority 100
  nopreempt
  virtual_ipaddress {
    <memcached VIP address - ex: '10.10.10.2'>
  }
  track_script {
    chk_memcache
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
  state BACKUP
  interface eth0
  virtual_router_id 71
  priority 100
  nopreempt
  virtual_ipaddress {
    <solr VIP address - ex: '10.10.10.3'>
  }
  track_script {
    chk_solr
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password> 
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
Restart Keepalived

We will now restart keepalived to apply the new configuration. If this is a running production environment, take care to shut down keepalived on the slave, restart the master, and then start the slave; otherwise there will be a re-election and failover of MySQL and memcache during this restart.

systemctl restart keepalived
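For a running production environment, the safe order described above looks like this, run on the indicated hosts:

# On smesql02 (the slave) first:
systemctl stop keepalived

# Then on smesql01 (the master):
systemctl restart keepalived

# Finally, on smesql02 again:
systemctl start keepalived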
Keepalived Notes
State

All running keepalived instances will be in one of three states:

  • MASTER = currently responsible for the VIP and actively responding to traffic directed to it
  • BACKUP = in standby, waiting to take over the VIP if the master is no longer in the MASTER state
  • FAULT = entered after the check script has failed (because the service is no longer running); the instance is not eligible for the MASTER or BACKUP role until the check passes again

Check scripts

By default we are using the following details for our regular checks to validate that the services are running

vrrp_script chk_solr {
	script "/sbin/pidof java"
	interval 2
	rise 5
	fall 5
}

This means that every 2 seconds (interval 2), we run a check to see if the java process is running (script "/sbin/pidof java"). If the check fails 5 consecutive times (fall 5), that instance moves to a FAULT state. Once the check passes 5 consecutive times (rise 5), it moves out of the FAULT state.

In the vrrp_instance section, we also set the attribute of nopreempt. This means that if smesql01 is in a MASTER state (as defined by the config file), and it moves to a FAULT state, when it exits that FAULT state it will move into BACKUP state.

smesql01 will not become master again until either 1) smesql02 enters a FAULT state (or the machine is no longer running) or 2) you restart keepalived in order to reset the state and force it back to MASTER status (# systemctl restart keepalived).

There is no additional benefit or risk to leaving smesql02 in the MASTER state, so it is recommended you retain these default settings.
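To see which node currently holds a given VIP, check whether the address is assigned to the tracked interface (eth0 in the configuration above); the node in the MASTER state will list the VIP as an additional address:

ip -4 addr show eth0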

Part IV

Configure the application servers

We will now update the application servers to point to the new VIP for search.

Log in to your web interface as the appladmin user.

Go to Settings > Search Integrations

Replace the Solr URI as follows:

http://smesearch:7070/sme/

Then click “Test Settings” to verify and finally “Update Settings” to apply.

Part V

Failover and Recovery

In the case of an outage of the Solr service, or of the smesql01 server itself, keepalived will fail traffic over to the Solr instance running on smesql02.

This process is entirely automatic. All new Solr reads and writes will now occur on the smesql02 server without any intervention.

However, when the smesql01 server/Solr service is available again, we will not fail traffic back over in the other direction, as the smesql01 index will NOT contain any of the new index data created during the outage. Solr replication is set up to run in only one direction at a time, so unlike the MySQL setup, the smesql01 Solr service will not automatically copy back any changes.

In order to bring the former master up to date, you will need to swap the master/slave status of each host.

On smesql01, we will update the /smedata/sme_solr/sme/core.properties file and change its master/slave status:

enable.master=false
enable.slave=true

On smesql02, in /smedata/sme_solr/sme/core.properties, we will do the opposite:

enable.master=true
enable.slave=false

Finally, we will restart solr on both hosts in order to have those changes take effect:

systemctl restart jetty

This will switch the status and start replicating data from smesql02 (the new master) over to smesql01 (new slave).

We can check replication status on the hosts via this web page: http://<smesql01 or smesql02>:7070/#/sme/replication. Once both hosts show the same Version/Gen number, they are in sync. From there you can either leave the hosts in their current master/slave state, or revert by adjusting /smedata/sme_solr/sme/core.properties and restarting. Do not fail the keepalived VIP back over until replication is back in sync and you have made this change to make smesql01 the master again.
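Replication progress can also be compared from the command line (a sketch; if HTTP Basic authentication is enabled, add -u solr:<SOLR USER PASSWORD>). The hosts are in sync when the indexversion and generation values match:

curl 'http://smesql01:7070/sme/replication?command=indexversion'
curl 'http://smesql02:7070/sme/replication?command=indexversion'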