===== Solr Replication for Highly Available EFF Content Search =====
== last updated March 06, 2020 ==

==== Disclaimer ====
The information in this document is provided on an as-is basis. You use it at your own risk. We accept no responsibility for errors or omissions, nor do we have any obligation to provide support for implementing or maintaining the configuration described here. Furthermore, we do not warrant that the design presented here is appropriate for your requirements.

SME designs, implements and supports HA File Fabric solutions for customers on a paid professional services basis. For more information please contact [[mailto:sales@storagemadeeasy.com?subject=SOLR consultancy enquiry|sales@storagemadeeasy.com]].

<WRAP center round important 100%>
Please follow this document's guidance. If it is not followed, there is a risk of losing Solr index data when failing back after an outage. No actual file data would be lost, and lost index data can be recreated; however, until it is recreated, files will be missing from search results.
</WRAP>
==== Introduction ====


The Enterprise File Fabric as shipped is configured for deployment on a single virtual machine. However, a common production deployment scenario is redundant web frontends in front of a highly available, stateful metadata server pair.

This guide steps through the setup of a master-slave Solr database pair, which allows automatic failover without any loss of data. When the master comes back online, additional work is required to migrate any new index data back to the former master, so automatic failback is not supported.

==== Part 1 ====
=== Assumptions ===


This guide assumes you have a working knowledge and understanding of Linux operating systems, databases, etc. If any questions come up, please contact your account manager or SME support.

For this guide we are using the following hostnames: smeweb01, smeweb02, smesql01, smesql02, and smesearch (the Solr VIP). Setup of MySQL database replication and HA web servers is handled in this document: [[cloudappliance/mastermasterdb|]]
You are of course free to select your own names to match your naming scheme.

In addition, you should have DNS configured and verified for the above five DNS records and IP addresses, as well as opened any internal firewalls that might restrict necessary traffic between the systems, including multicast traffic for keepalived. A quick DNS check is shown below.
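
A minimal sketch for verifying forward DNS resolution of the example hostnames used in this guide (substitute your own names if they differ):

<code>
# Check that each hostname resolves; getent uses the system resolver configuration
for h in smeweb01 smeweb02 smesql01 smesql02 smesearch; do
  getent hosts "$h" || echo "WARNING: $h did not resolve"
done
</code>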

== Initial State ==
This guide assumes you have set up the four appliances in an HA configuration following the instructions in the Appliance Installation guide: [[cloudappliance/mastermasterdb|]]

== Preparation ==
Before you start, please be sure to collect / prepare the necessary information.
  * 4 SME Appliances deployed
  * SME linux root password
  * SME linux smeconfiguser password
  * 1 additional IP address for your LAN - for the new Solr search VIP
  * 1 DNS name for the VIP


== Linux Login ==
For Linux command line operations, you must run the commands shown in this document as the root user unless otherwise specified. However, for security reasons you cannot connect to the machine directly over ssh as root. Instead, ssh to the box as smeconfiguser and then su to root:

<code>
ssh smeconfiguser@smeweb01
</code>

Enter the smeconfiguser password at the prompt. Once logged in, elevate your privileges to root.
<code>
su -
</code>


==== Part II ====
=== Configuring Solr ===

You must perform these steps to create a specialized Solr server from the standard SME appliance distribution.

=== Restrict external access ===
The Solr server does not serve web pages and does not need to be accessible from the WAN. The only traffic you need to allow is TCP port 7070 from all web frontend servers.
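
Once the firewall and listener changes in the following sections are in place, a simple reachability check can be run from each web frontend. This is only a sketch; any HTTP response (even a 404) proves that port 7070 is open between the hosts:

<code>
# From smeweb01 / smeweb02: confirm TCP 7070 is reachable on both Solr hosts
for h in smesql01 smesql02; do
  curl -s -o /dev/null -w "$h: HTTP %{http_code}\n" "http://$h:7070/sme/"
done
</code>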


=== iptables for dbservers ===

On both smesql01 and smesql02, you must update iptables to allow incoming connections to Solr on TCP port 7070. Do the following.

As root:

<code>
# Back up the current rules, then insert an ACCEPT rule for TCP 7070 just above the final REJECT rule
iptables-save > /var/tmp/iptables_backup_`date -I`
ipt_line=`iptables -L RH-Firewall-1-INPUT -n --line-numbers | grep REJECT | awk '{print $1}'`
insert_line=`expr $ipt_line - 1`
iptables -I RH-Firewall-1-INPUT $insert_line -p tcp -m state --state NEW -m tcp --dport 7070 -j ACCEPT
# Persist the updated rules so they survive a reboot
iptables-save > /etc/sysconfig/iptables
</code>
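
To confirm the rule was added where expected (a quick check, assuming the chain name used in the commands above):

<code>
# The new ACCEPT for dpt:7070 should appear just before the REJECT rule
iptables -L RH-Firewall-1-INPUT -n --line-numbers | grep -E '7070|REJECT'
</code>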

=== Solr configuration for HA ===
== Solr Database Configuration ==
The settings for Solr startup are stored in /home/sme/sme_jetty. We will update jetty.host so that Solr listens on all IPs on the host.
Perform this change on both smesql01 and smesql02.

<code>
jetty.host=0.0.0.0
</code>


Then restart Solr for this change to take effect:

<code>
systemctl restart jetty
</code>
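
To verify that Solr is now listening on all interfaces (a sketch; the jetty process serves Solr on port 7070 on these appliances):

<code>
# Expect the listener to show 0.0.0.0:7070 (or :::7070) rather than 127.0.0.1:7070
ss -tlnp | grep 7070
</code>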

=== Configure Database Replication ===

== Update Configuration to enable ReplicationHandler ==

We will edit the following file in order to turn on the ReplicationHandler within Solr, which handles all Solr index replication.

Add the following to this file: /smedata/sme_solr/sme/conf/solrconfig.xml

Add it after the "<!-- Request Handlers" section like so:
<code>

<!-- Request Handlers

       http://wiki.apache.org/solr/SolrRequestHandler

       Incoming queries will be dispatched to a specific handler by name
       based on the path specified in the request.

       Legacy behavior: If the request path uses "/select" but no Request
       Handler has that name, and if handleSelect="true" has been specified in
       the requestDispatcher, then the Request Handler is dispatched based on
       the qt parameter.  Handlers without a leading '/' are accessed this way
       like so: http://host/app/[core/]select?qt=name  If no qt is
       given, then the requestHandler that declares default="true" will be
       used or the one named "standard".

       If a Request Handler is declared with startup="lazy", then it will
       not be initialized until the first request that uses it.

    -->

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>

        <!-- The default reservation is 10 seconds. See the documentation below. Normally, you should not need to specify this. -->
        <str name="commitReserveDuration">00:00:10</str>
    </lst>

    <lst name="slave">

        <str name="enable">${enable.slave:false}</str>

        <!-- Fully qualified URL to the master core. It is possible to pass this as a request param for the fetchindex command. -->
        <str name="masterUrl">http://smesearch:7070/sme/replication</str>

        <!-- Interval at which the slave should poll the master. Format is HH:mm:ss. If this is absent the slave does not poll automatically,
             but a fetchindex can be triggered from the admin UI or the HTTP API. -->
        <str name="pollInterval">00:00:20</str>
        <!-- The following values are used when the slave connects to the master to download the index files.
             Default values are implicitly set to 5000ms and 10000ms respectively. You do not need to specify
             these unless the bandwidth is extremely low or the latency is extremely high. -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>

        <!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following -->
        <str name="httpBasicAuthUser">solr</str>
        <str name="httpBasicAuthPassword"> <SOLR USER PASSWORD>  </str>

     </lst>
</requestHandler>
</code>
Please note the use of the smesearch DNS name for masterUrl. If you use a different DNS name, please update the above configuration accordingly.

== Define Master and Slaves ==
Each Solr instance is configured so that it can act as either a master or a slave. To define the role of each we will use core properties.

On smesql01, to make it the master, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties

<code>
enable.master=true
enable.slave=false
</code>

On smesql02, to make it the slave, add the following two lines at the bottom of /smedata/sme_solr/sme/core.properties

<code>
enable.master=false
enable.slave=true
</code>

Finally, restart Solr on both hosts for those changes to take effect:

<code>
systemctl restart jetty
</code>
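
To sanity check that each node has picked up its role, you can query the ReplicationHandler directly (a sketch; the details command is part of Solr's standard replication API and reports, among other things, whether the core is acting as master or slave):

<code>
# Run against each host; inspect the output for the master/slave flags
curl -s "http://smesql01:7070/sme/replication?command=details"
curl -s "http://smesql02:7070/sme/replication?command=details"
</code>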


==== Part III ====
=== Using Keepalived to manage the VIP and automatic failover ===

As with the MySQL database failover, we will use the open-source application [[http://www.keepalived.org/|keepalived]] to manage the VIP and to provide automated failover in the case of a server outage.

We will update our existing configuration in /etc/keepalived/keepalived.conf to add the new VIP, like so:

** Note: Replace all items in < > below with your environment-specific entries **

== smesql01 keepalived.conf ==
<code>
! Configuration File for keepalived

global_defs {
  notification_email {
  }
  vrrp_skip_check_adv_addr
  vrrp_strict
  vrrp_garp_interval 0
  vrrp_gna_interval 0
  enable_script_security
}

vrrp_script chk_mariadb {
  script "/sbin/pidof mysqld"
  interval 2
  rise 5
  fall 5
}

vrrp_script chk_memcache {
  script "/sbin/pidof memcached"
  interval 2
  rise 5
  fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
  script "/sbin/pidof java"
  interval 2
  rise 5
  fall 5
}

vrrp_instance DB {
  state MASTER
  interface eth0
  virtual_router_id 51
  priority 105
  nopreempt
  virtual_ipaddress {
    <db VIP address - ex: '10.10.10.1'>
  }
  track_script {
    chk_mariadb
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

vrrp_instance MEMCACHE {
  state MASTER
  interface eth0
  virtual_router_id 61
  priority 105
  nopreempt
  virtual_ipaddress {
    <memcached VIP address - ex: '10.10.10.2'>
  }
  track_script {
    chk_memcache
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
  state MASTER
  interface eth0
  virtual_router_id 71
  priority 105
  nopreempt
  virtual_ipaddress {
    <solr VIP address - ex: '10.10.10.3'>
  }
  track_script {
    chk_solr
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
</code>


** Note: Replace all items in < > below with your environment-specific entries **
== smesql02 keepalived.conf ==
<code>
! Configuration File for keepalived

global_defs {
  notification_email {
  }
  vrrp_skip_check_adv_addr
  vrrp_strict
  vrrp_garp_interval 0
  vrrp_gna_interval 0
  enable_script_security
}

vrrp_script chk_mariadb {
  script "/sbin/pidof mysqld"
  interval 2
  rise 5
  fall 5
}

vrrp_script chk_memcache {
  script "/sbin/pidof memcached"
  interval 2
  rise 5
  fall 5
}

#### update to add solr check script here ####
vrrp_script chk_solr {
  script "/sbin/pidof java"
  interval 2
  rise 5
  fall 5
}

vrrp_instance DB {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority 100
  nopreempt
  virtual_ipaddress {
    <db VIP address - ex: '10.10.10.1'>
  }
  track_script {
    chk_mariadb
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
vrrp_instance MEMCACHE {
  state BACKUP
  interface eth0
  virtual_router_id 61
  priority 100
  nopreempt
  virtual_ipaddress {
    <memcached VIP address - ex: '10.10.10.2'>
  }
  track_script {
    chk_memcache
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}

#### update to add solr vip configuration here ####
vrrp_instance SOLR {
  state BACKUP
  interface eth0
  virtual_router_id 71
  priority 100
  nopreempt
  virtual_ipaddress {
    <solr VIP address - ex: '10.10.10.3'>
  }
  track_script {
    chk_solr
  }
  authentication {
    auth_type PASS
    auth_pass <8 character password>
  }
  notify "/usr/libexec/keepalived/keepalived_state.sh" root
}
</code>

== Restart Keepalived ==
We will now restart keepalived to apply the new configuration.
If this is a running production environment, please take care to shut down keepalived on the slave, restart the master, and then start the slave; otherwise there will be a re-election and failover of MySQL and memcache during this restart.

<code>
systemctl restart keepalived
</code>
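
Once keepalived is back up, a quick check that the expected node is holding the VIPs (a sketch; substitute your actual VIP addresses when reading the output):

<code>
# On the MASTER node the db, memcache and solr VIPs should be listed as additional addresses on eth0
ip addr show eth0 | grep inet
</code>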


== Keepalived Notes ==

== State ==
Regarding state: all running instances of keepalived will be in one of three states:
  * MASTER = currently responsible for the VIP, and actively responding to traffic directed to the VIP
  * BACKUP = in standby, waiting to take over the VIP if the master is no longer in a MASTER state
  * FAULT = the check scripts have failed (because the service is no longer running); the instance is not eligible for the MASTER or BACKUP role

== Check scripts ==

By default we use the following settings for the regular checks that validate the services are running:

<code>
vrrp_script chk_solr {
  script "/sbin/pidof java"
  interval 2
  rise 5
  fall 5
}
</code>
This means that every 2 seconds (//interval 2//) we run a check to see if the java process is running (//script "/sbin/pidof java"//). If the check fails 5 consecutive times (//fall 5//), that instance moves to a FAULT state.
Likewise, if the check passes 5 consecutive times (//rise 5//), the instance moves out of the FAULT state.

In the //vrrp_instance// section we also set the //nopreempt// attribute. This means that if smesql01 is in a MASTER state (as defined by the config file) and moves to a FAULT state, when it exits that FAULT state it will move into the BACKUP state.

smesql01 will not become master again until either
1) smesql02 enters a FAULT state (or the machine is no longer running), or
2) you restart keepalived in order to reset the state and force it back to MASTER status (//# systemctl restart keepalived//).

There is no additional benefit or risk to leaving smesql02 in the MASTER state, so it is recommended that you retain these default settings.

==== Part IV ====


=== Configure the application servers ===
We will now update the application servers to point to the new VIP for search.

Log in to your web interface as the **appladmin** user.

Go to
Settings > Search Integrations

Replace the Solr URI as follows:
<code>
http://smesearch:7070/sme/
</code>

Then click "Test Settings" to verify, and finally "Update Settings" to apply.
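
As an optional command-line cross-check from a web frontend, you can confirm the VIP answers on the Solr port before relying on the UI test (a sketch; any HTTP response shows the VIP is routing to a live Solr node):

<code>
curl -s -o /dev/null -w "smesearch: HTTP %{http_code}\n" "http://smesearch:7070/sme/"
</code>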

==== Part V ====

=== Failover and Recovery ===

In the case of an outage of the Solr service, or of the smesql01 server, keepalived will fail traffic over to the Solr instance running on smesql02.

This process is entirely automatic. All new Solr reads and writes will now occur on the smesql02 server without any intervention.

However, when the smesql01 server/Solr service is available again, we will not fail traffic back in the other direction, as the smesql01 database will NOT contain any of the new indexes created during the outage. Solr replication is set up to run in only one direction at a time, so unlike the MySQL setup, the smesql01 Solr service will not automatically copy back any changes.

In order to bring the former master up to date you will need to change the master/slave status of each host.

On smesql01, update the /smedata/sme_solr/sme/core.properties file and change the master/slave status:

<code>
enable.master=false
enable.slave=true
</code>

On smesql02, in /smedata/sme_solr/sme/core.properties, do the opposite:

<code>
enable.master=true
enable.slave=false
</code>

Finally, restart Solr on both hosts for those changes to take effect:

<code>
systemctl restart jetty
</code>

This switches the roles and starts replicating data from smesql02 (the new master) over to smesql01 (the new slave).

We can check replication status on the hosts via this web page:
http://<smesql01/smesql02>:7070/#/sme/replication

Once both have the same Version/Gen number they are in sync.
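
The same check can be done from the command line (a sketch; the indexversion command is part of Solr's replication API and returns the index version and generation for the core):

<code>
# Compare the output from both hosts; matching indexversion/generation values mean the slave has caught up
curl -s "http://smesql01:7070/sme/replication?command=indexversion"
curl -s "http://smesql02:7070/sme/replication?command=indexversion"
</code>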

From there you can either leave the hosts in their current master/slave roles, or revert back by adjusting /smedata/sme_solr/sme/core.properties and restarting.

Do not fail the keepalived VIP back over until replication is back in sync and you have made this change to make smesql01 the master again.
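
If you choose to revert to the original roles once replication is in sync, the change amounts to flipping the two properties on each host and restarting Solr. A minimal sketch (assumes the enable.master/enable.slave lines already exist in core.properties as added earlier):

<code>
# On smesql01 - make it the master again
sed -i -e 's/^enable.master=.*/enable.master=true/' -e 's/^enable.slave=.*/enable.slave=false/' /smedata/sme_solr/sme/core.properties
systemctl restart jetty

# On smesql02 - return it to the slave role
sed -i -e 's/^enable.master=.*/enable.master=false/' -e 's/^enable.slave=.*/enable.slave=true/' /smedata/sme_solr/sme/core.properties
systemctl restart jetty
</code>

Only after both hosts report the same Version/Gen number should you move the keepalived VIP back to smesql01.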