cloudappliance/appliance-troubleshooting [2020_05_04 16:13] – [Error Log] steven
cloudappliance:appliance-troubleshooting [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1
-# Appliance Troubleshooting 
-Last updated on April 6, 2020. 
- 
-This document covers troubleshooting of the Enterprise File Fabric Server. The Server runs as a virtual machine which may be scaled out horizontally (run as multiple instances). 
- 
-The virtual appliance is a hardened CentOS image with: 
-   * Enterprise File Fabric Engine components 
-   * MariaDB database server 
-   * Apache Solr search engine 
- 
-The most common deployment scenarios are: 
-  * Single server with database - The File Fabric Engine and database server run on a single instance.
-  * Single server with separate database - The File Fabric Engine runs on one instance. The database runs separately, either as an additional instance or through a database-as-a-service.
-  * High availability - The File Fabric Engine runs on multiple instances behind a load balancer. A high availability database cluster or database-as-a-service is used.
-
-The Apache Solr service is optional, but required to support content search. It should run on a separate machine instance.
- 
-## Prerequisites 
- 
-Before you begin troubleshooting and checking the Enterprise File Fabric Server you should have the following information: 
- 
-   * Server domain name 
-   * Are multiple servers being used? 
-     * Load balancer domain name 
-     * All server instance hostnames or IP addresses 
-  * Is Cloud FTP being used? 
-  * Is ClamAV being used? 
-  * Passwords: 
-    * smeconfigure password(s) 
-    * root password(s) 
-    * database credentials (optional) 
- 
-## Accessible from Network 
- 
-The web server should start up automatically and be accessible when the appliance server starts. From a remote machine use a browser or the following command to test connectivity to the web server. Curl is available on Linux and Mac. 
- 
-    curl -k https://hostname | head -n 20 
- 
-If port 443 is not responding, or if you want to check basic network reachability and latency, run ping from a remote machine. Ping is available on Linux, Windows and Mac.
- 
-    ping hostname 
- 
-## Command Line Access 
- 
-If the server is reachable, open a remote shell on the system as smeconfiguser.
-
-Run this command from the command line of a machine that has the ssh utility installed (Linux or Mac), or run the equivalent using a Windows tool like PuTTY. You may be prompted about the authenticity of the host. If so, answer 'yes'. You will then be prompted for the password.
- 
-    ssh smeconfiguser@hostname 
- 
-On success you will see a Linux command prompt. Unless otherwise noted commands in this document can be run as smeconfiguser.  
- 
-You cannot log into the machine remotely as root. To open a shell as root or change to the user smestorage, use the command su:
- 
-    su root 
-    su smestorage 
- 
-### Connection Refused 
- 
-If the password fails several times, your IP address is locked out and further connection attempts are refused:
- 
-    ssh: connect to host 10.0.10.194 port 22: Connection refused 
- 
-To verify, log in via the console as root and execute: 
- 
-    iptables -L f2b-SSH -n 
- 
-If your IP address is locked you can unlock via fail2ban (as root): 
- 
-    fail2ban-client set ssh-iptables unbanip 192.168.1.1 
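To pick the banned addresses out of the chain listing, a small filter can help. This is a minimal sketch and not part of the appliance: the function name `banned_ips` is hypothetical, and it assumes the jail bans with a `REJECT` or `DROP` target, which may differ on your system.

```shell
#!/bin/sh
# List the source addresses that a fail2ban chain currently blocks.
# Reads `iptables -L <chain> -n` output on stdin. The REJECT/DROP target
# names are an assumption about how the jail is configured.
banned_ips() {
    awk '$1 == "REJECT" || $1 == "DROP" { print $4 }'
}

# Example (as root):
#   iptables -L f2b-SSH -n | banned_ips
```

Each address printed can then be passed to the `fail2ban-client ... unbanip` command shown above.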
- 
-## Appliance Logs 
- 
-### Log 
- 
-This is a general log file for the appliance. 
- 
-    tail -f /var/www/smestorage/sitelogs/logits.txt 
- 
-You should see the last few lines of the log file, and new lines should appear from time to time as the appliance is used. Lines containing the word "Error" indicate a possible problem with the way the appliance has been set up or is being used. 
- 
-The tail -f command will run until you terminate it (Ctrl-c). 
- 
-### Error Log 
- 
-These files are created the first time an error is received: 
- 
-    tail -f /var/www/smestorage/sitelogs/errorlogs.txt 
-    tail -f /var/www/smestorage/sitelogs/errorlogs_trace.txt 
- 
-You may see the last few lines of the log file, and new lines may appear from time to time as the appliance is used. Lines containing the word "Error" indicate a possible problem with the appliance. The file <code>errorlogs_trace.txt</code> contains a full trace of errors in <code>errorlogs.txt</code>. 
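For a quick overview rather than a live tail, the lines containing "Error" can be counted per file. The following is a rough sketch. The helper name `count_error_lines` is hypothetical, and it treats a missing file as "not created yet" because the error logs only appear after the first error.

```shell
#!/bin/sh
# Count lines containing "Error" in each log file given as an argument.
# Files that do not exist are reported separately, since the error logs
# are only created when the first error occurs.
count_error_lines() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '%s: %s\n' "$f" "$(grep -c 'Error' "$f")"
        else
            printf '%s: not created yet\n' "$f"
        fi
    done
}

# Example:
#   count_error_lines /var/www/smestorage/sitelogs/logits.txt \
#                     /var/www/smestorage/sitelogs/errorlogs.txt
```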
- 
-### Upload Log 
- 
-This log provides information on upload threads including M-Stream. 
- 
-    tail -f /var/www/smestorage/tmp/tempstorage/log.txt 
-     
- 
- 
-### Email Log 
- 
-    tail -f /var/www/smestorage/sitelogs/allemails.txt 
- 
-The “To” address and “Subject” of each sent email are logged here.
- 
-### Cron Job Log 
- 
-Cron jobs kick off housekeeping services and background tasks:
- 
-    tail -f /var/www/smestorage/cron/log.txt 
- 
- 
- 
-### Log Archive 
- 
-An archive of logs can be found at: 
- 
-    /var/www/smestorage/tmp/logsarchive 
- 
-### Log Rotation 
- 
-A log rotation and archive script runs under cron.  Logs are removed after 30 days. 
- 
-    /var/www/smestorage/cron/logroller.pl 
- 
-The configuration file is at: 
- 
-    /var/www/smestorage/cron/config_logroller.conf 
- 
-## Ports In Use 
- 
-To resolve a port conflict or to determine what ports are in use by what service use: 
- 
-    netstat -plnt 
- 
-## CPU Check 
- 
-Check CPU usage or check for a runaway process using top. Investigate processes maxing out CPU over three refreshes. 
- 
-    top 
- 
-In the third line, which is labelled "%Cpu(s):", the fourth number (labelled "id") shows the percentage of CPU cycles that are idle. If this number is less than 10% then your CPU is very busy. If it remains at less than 10% for more than a few seconds then your CPU may be overloaded. Sometimes this indicates that a program is in an error state.
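The idle figure can also be read non-interactively by running top in batch mode. This is a sketch only. The function name `cpu_idle` is hypothetical, and the parsing assumes the summary layout `..., 98.0 id, ...`. Versions of top that print `98.0%id` would need a different pattern.

```shell
#!/bin/sh
# Print the idle percentage from each "%Cpu(s):" summary line on stdin.
# Assumes the layout "..., 98.0 id, ..." with comma-separated fields.
cpu_idle() {
    awk -F',' '/Cpu\(s\)/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ / id$/) {
                sub(/ id$/, "", $i)   # drop the "id" label
                gsub(/ /, "", $i)     # drop padding spaces
                print $i
            }
    }'
}

# Example: sample three refreshes and print one idle value per refresh.
#   top -bn3 | cpu_idle
```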
- 
-## Memory Check 
- 
-Look for memory issues with top. 
- 
-    top 
- 
-In the fourth line, which is labelled "KiB Mem :" or "Mem:" depending on the version of top, the number labelled "free" shows the amount of free memory in kilobytes (its position in the line varies with the version of top). If this number is less than 150,000 then your server is probably low on memory.
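The same check can be scripted against `/proc/meminfo`, which reports free memory as `MemFree` on Linux. The helper names `mem_free_kb` and `low_memory` below are hypothetical, and the default threshold is simply the 150,000 kB figure from the text above.

```shell
#!/bin/sh
# Print MemFree in kilobytes from /proc/meminfo-style input on stdin.
mem_free_kb() {
    awk '$1 == "MemFree:" { print $2 }'
}

# Print "low" if the given free memory (kB) is under the threshold,
# otherwise "ok". The threshold defaults to 150000 kB.
low_memory() {
    free_kb=$1
    threshold=${2:-150000}
    if [ "$free_kb" -lt "$threshold" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Example:
#   low_memory "$(mem_free_kb < /proc/meminfo)"
```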
- 
-## Disk Space Check 
- 
-    df -h 
- 
-In the "Use%" column, a value close to or at 100% indicates a severe lack of space. Generally a value above 89% indicates that space should be cleared.
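To flag only the filesystems that cross the threshold, the `df` output can be filtered. This is a minimal sketch: the function name `disks_over` is hypothetical, and it assumes mount points without spaces. It uses `df -P` so that each filesystem stays on one line.

```shell
#!/bin/sh
# Print mount points whose usage exceeds a limit (default 89 percent).
# Reads `df -P` output on stdin. Column 5 is the usage percentage and
# column 6 is the mount point.
disks_over() {
    awk -v limit="${1:-89}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit + 0) print $6, use "%"
    }'
}

# Example:
#   df -P | disks_over 89
```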
- 
-## Disk Space Procedure
-
-You can check the disk space by running the command:
-
-    df -kh
-
-If a table has run out of memory you will see errors in the SME error logs:
-
-    /var/www/smestorage/sitelogs/errorlogs.txt
-
-If you have configured a notification email, you will receive a notification email with the errors.
-
-If you ran out of disk space, follow the instructions below. A typical error is:
-
-    DB! Table './smestorage/TABLE' is marked as crashed and should be repaired
-
-Symptom - You open the configured SME appliance URL in a browser and see an empty page.
- 
-## Action to Bring up the Appliance 
- 
-### Increase Disk Size 
- 
-To increase the disk space on the SME appliance, see the recipe at: https://storagemadeeasy.com/wiki/cloudappliance/appladmin
- 
-### Repair the Database 
- 
-SSH into the appliance as smeconfiguser.
-
-Back up the database. This is the easiest way to find the crashed tables.
-
-    mysqldump -u smestore -p --opt smestorage > smestorage.sql
-
-If you get an error message indicating a crashed table, open a database session on the appliance:
-
-    mysql -u smestore -p smestorage
-
-Enter the password.
-
-Make sure the database is smestorage:
-
-    use smestorage
-
-Then repair the table that has crashed:
-
-    repair table <TABLE_NAME>;
-
-Repeat the database backup step until the backup completes without errors.
- 
-### Delete compiled templates 
- 
-SME uses compiled templates. If disk space is low the templates can become corrupted. To fix this: 
- 
-SSH into the appliance as smeconfiguser.
-
-Then switch to root, and from there to the smestorage Linux user:
- 
-    su - root 
-    su - smestorage 
- 
-Go to the templates directory 
- 
-    cd /var/www/smestorage/public_html/smarty/site/templates_c 
- 
-Delete all the compiled templates by executing the following command 
- 
-    rm *.tpl.php 
- 
-This should help you get your Appliance back online. 
- 
-## Find Problem Files 
- 
-This command lists the 50 largest files over 10 MB on the root filesystem:
- 
-    find / -xdev -type f -size +10M -exec du -sh {} ';' | sort -rh | head -n50 
- 
-## Process Check 
- 
-Check that the following services are healthy (more detail in the sections below):
- 
-    systemctl status php-fpm 
-    systemctl status httpd 
-    systemctl status jetty 
-    systemctl status crond 
-    systemctl status mariadb   # if running locally 
-    systemctl status memcached 
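The individual status checks above can be collapsed into a one-line-per-service summary. This sketch uses `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The function name `report_services` and the `SERVICE_CHECK` override are hypothetical and exist only so the loop can be dry-run without systemd.

```shell
#!/bin/sh
# Command used to test a single service. Override it for dry runs.
SERVICE_CHECK=${SERVICE_CHECK:-"systemctl is-active --quiet"}

# Print one status line for each service name given as an argument.
report_services() {
    for svc in "$@"; do
        # Intentionally unquoted so the command and its options split.
        if $SERVICE_CHECK "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
        fi
    done
}

# Example (omit mariadb if the database is remote):
#   report_services php-fpm httpd jetty crond mariadb memcached
```

A service reported as not active can then be examined with `systemctl status` as described in the sections below.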
- 
-## PHP-FPM Service 
- 
-If you see this error, check the status of PHP-FPM:
- 
-    The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later. 
- 
-You can do this by issuing this command: 
- 
-    systemctl status php-fpm 
- 
-This shows the reported status of the service. The service may have hung even though the status is active, so either way it is a good idea to restart the PHP-FPM service. You will need root access (or use sudo).
- 
-    sudo systemctl restart php-fpm 
- 
-### Logs 
- 
-    /var/log/php-fpm/error.log 
- 
-## Apache HTTP Server (Web Server) 
- 
-Check that the Apache HTTP Server is running. Switch to root with su first.
-
-    systemctl status httpd
-
-To start the HTTP Server:
-
-    systemctl start httpd
-
-To stop the HTTP Server:
-
-    systemctl stop httpd
- 
-### Configuration 
- 
-HTTPD server configuration files are located in the following two directories: 
- 
-    /etc/httpd/conf 
- 
-    /etc/httpd/conf.d 
- 
-### Logs 
- 
-Apache Httpd server logs are located at: 
- 
-    /etc/httpd/logs 
- 
-    tail /etc/httpd/logs/access_log 
-    tail /etc/httpd/logs/filefabric-error_log 
-    tail /etc/httpd/logs/filefabric-access_log 
-    tail /etc/httpd/logs/ssl_access_log
-    tail /etc/httpd/logs/ssl_error_log 
-    tail /etc/httpd/logs/webdav.filefabric-error_log 
-    tail /etc/httpd/logs/webdav.filefabric-access_log 
-    tail /etc/httpd/logs/ssl_webdavfilefabric_log 
- 
-## Memcache 
- 
-If memcached stops working or hangs, users will be unable to upload files.
-
-When you try to upload a file you will see the message:
-
-**Can not find uploading process meta data.**
-
-This means a record could not be added to memcached, and because of that the upload failed.
-
-To solve this, run the following command as root:
- 
-    systemctl restart memcached 
- 
-After that you can also check the service status: 
- 
-    systemctl status memcached 
- 
-You should see something similar to the below: 
- 
-    Active: active (running) since Thu 2016-08-25 13:30:00 BST; 1s ago 
- 
-## Jetty / Apache Solr / PDF Annotation 
- 
-The Jetty service is used for Apache Solr and PDF Annotation. It runs as a Java process, by default listening on localhost port 7070. 
- 
-Check the health of jetty using the command: 
- 
-    systemctl status jetty 
- 
-You should see a few lines of output including one that says, "Active: active (exited) ". If you do not then the service is not running. This will prevent content search from working. 
- 
-To check that Apache Solr is running and responsive on the appliance run: 
- 
-    curl -u solr:drom6etsh9Onk "http://127.0.0.1:7070/sme/select?q=the&start=0&rows=100&wt=json&indent=true" 
- 
-### Configuration Files 
- 
-    /home/sme/sme_jetty/start.ini 
- 
-    /smedata/sme_solr/solr.xml 
- 
-See https://docs.storagemadeeasy.com/cloudappliance/solr for more information. 
- 
-### Logs 
- 
-    /home/sme/sme_jetty/logs/solr.log 
- 
-### Production 
- 
-Note: For production Apache Solr should be running on a separate instance to the Enterprise File Fabric Server (Web Tier). 
- 
-### Access Solr Admin GUI Remotely 
- 
-To access the Solr admin from another machine: 
- 
-Add this line to /etc/sysconfig/iptables: 
- 
-    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 7070 -j ACCEPT 
- 
-Reload iptables:
- 
-    systemctl reload iptables 
- 
-Comment out the line in /home/sme/sme_jetty/start.ini: 
- 
-    #jetty.host=127.0.0.1 
- 
-From a browser: 
- 
-    http://hostname:7070/  with user solr password drom6etsh9Onk 
- 
-## Cloud FTP Service (CloudFTP/CloudFTPS) 
- 
-By default the File Fabric Cloud FTP service is configured to run an FTP service on port 21 and FTPS (FTP over SSL) service on port 990. See [[cloudappliance/sftpsetup]] for more information. 
- 
-Check the health of the Cloud FTP service using the service command: 
- 
-    systemctl status cloudftp 
- 
-You should see a few lines of output including one that says, "Active: active (exited)". 
- 
-To start the service: 
- 
-    systemctl start cloudftp 
- 
-To stop the service: 
- 
-    systemctl stop cloudftp 
- 
-### Client Testing 
- 
-    sftp -v  user@hostname 
- 
-### Configuration 
- 
-    /var/www/smestorage/ftpserver/ftpserver.conf 
- 
-### Log 
- 
-    /var/www/smestorage/ftpserver/ftpserver.txt 
- 
-## Cloud SFTP Service 
- 
-The Cloud SFTP service implements the SSH File Transfer Protocol for File Fabric. By default it's available on port 2200. See [[cloudappliance/sftpsetup]] for more information. 
- 
-### Configuration 
- 
-    /var/www/smestorage/ftpserver/sftpserver/sftpserver.conf 
- 
-### Restart 
-     
-To restart the service after a configuration change:
-
-    systemctl restart cloudftp
- 
-### Log 
- 
-    /var/www/smestorage/ftpserver/sftpserver/log.txt 
- 
-## Local FTP Service 
- 
-For production use this service should be removed along with the demo accounts. 
- 
-An appliance FTP server listens by default on IP Address 127.0.0.1 and port 2001. It is used for default storage for the clouduser. 
- 
-Status of local ftp service 
- 
-    systemctl status vsftpd 
- 
-Start FTP Server 
- 
-    systemctl start vsftpd 
- 
-To stop 
- 
-    systemctl stop vsftpd 
- 
-### FTP Server Configuration 
- 
-    /etc/vsftpd/vsftpd.conf 
- 
-## Cron Service 
- 
-The cron service executes cron jobs that roll logs and kick off periodic tasks for the system such as daily maintenance tasks. These scripts should be run only once in a multi-server environment. 
- 
-For version 1705.00 and above cron runs on all instances using cronmutex.php to make sure only one is executed: 
- 
-    php /usr/bin/cronmutex.php default 716f3 900 && /var/www/smestorage/cron/scheduler_daily.pl 
- 
-Check the health of cron using the service command: 
- 
-    systemctl status crond 
- 
-You should see a few lines of output including one that says, "Active: active (running) ". If you do not then the service is not running. This will prevent some functions from working. 
- 
-If it is not running, start it:
- 
-    systemctl start crond 
- 
-### Logs 
- 
-    /var/log/cron 
-    /var/www/smestorage/cron/log.txt 
- 
-### Configuration 
- 
-To see cron jobs for a server: 
- 
-    crontab -u smestorage -l       
- 
-### Note 
- 
-These cron jobs use scripts in: 
- 
-    /var/www/smestorage/cron 
-    /var/www/smestorage/config/cron/config.conf 
- 
-To see the cron jobs that run as root (currently only freshclam):
- 
-    cat /etc/crontab             
- 
-## CloudDAV 
- 
-CloudDAV is our implementation of WebDAV on top of the File Fabric. It runs as a CGI script from /var/www/smestorage/webdav_html/cgi-bin. 
- 
-### Log 
- 
-    /var/www/smestorage/webdav_html/cgi-bin/log.txt 
- 
-### Configuration 
- 
-    /var/www/smestorage/config/webdav_html/configuration 
- 
-## Cloud S3 
- 
-Cloud S3 is our implementation of an Amazon S3 compatible API on top of the File Fabric. 
- 
-### Log 
- 
-    /var/www/smestorage/ftpserver/sftpserver/log.txt 
- 
-### Configuration 
- 
-    # See <VirtualHost *:80> 
-    #  DocumentRoot /var/www/smestorage/s3_html 
-    /etc/httpd/conf/httpd.conf  
- 
- 
-## ClamAV Virus Scanner 
- 
-The Antivirus scanner ClamAV is included with the appliance and can be used to check all uploaded files. 
- 
-In order to be used virus scanning must be enabled on a per-organization (tenant) basis through Organization Policies. Scanning of individual uploaded files can be verified through the audit log if logging of File add/updates is turned on. 
- 
-The ClamAV process is called <code>/usr/sbin/clamd</code> and runs as a system daemon. When the daemon is running it creates a filesystem socket that is used to communicate to and from the file upload process. 
- 
-Check the health of the ClamAV scanner using the service command: 
- 
-    systemctl status clamd@scan 
- 
-In High Availability configurations each appliance leverages its local copy of ClamAV. 
- 
-### Error Messages 
- 
-If a file is uploaded, antivirus scanning is enabled, and the daemon is not running, the user will see the following message: 
- 
-> Uploading of 1 files failed 
-> 
-> [Restart] [Cancel] 
-> 
-> Seems file is not uploaded. Uploading in progress? 
-> 
-> [Close] 
- 
-Verify that scans are succeeding through the audit trail. The policy “Audit File add/update” must be enabled. A successful scan is logged like this:
- 
-File sme-solution-brief.pdf uploaded to My Cloud files/mybucket. Scanned with antivirus ClamAV 0.99.2/24143/ 
- 
-### Log 
- 
-    /var/log/sme-clamd.log 
- 
-### Configuration 
- 
-    /etc/clamd.d/scan.conf 
- 
-    https://linux.die.net/man/5/clamd.conf  
- 
-### ClamAV Antivirus Database Updater 
- 
-Virus definitions are updated once an hour (see /etc/crontab) with freshclam. To check connection to the online database (and update definitions) run as root: 
- 
-    freshclam 
- 
-### Log 
- 
-    /var/log/freshclam.log 
- 
-### Configuration 
- 
-    /var/www/smestorage/config/clamd/freshclam.conf 
- 
-## Email 
- 
-The appliance includes an email server. Its log is at:
-
-    /var/log/maillog
- 
-## License (Enterprise File Fabric) 
- 
-This error on attempted login indicates a problem with the license key, such as a missing or expired key.
- 
-> Sorry, Organization accounts are not supported. No valid key. Contact with your administrator. 
- 
-The license is configured for each appliance through the Appliance Administration interface under Settings > License Key. This is reached by logging in as the appladmin user at https://hostname which you can do without a valid license. 
- 
-The appliance license can also be viewed and changed from within the appliance at: 
- 
-    /var/www/smestorage/config/public_html/license.txt 
- 
-High Availability: The license must be configured on every instance. 
- 
-## Version 
- 
-There are several ways to check the version of the appliance.  
- 
-### Appliance Administration 
- 
-Log in as the appliance admin. From the hamburger menu under the menu “Admin” see the appliance version and build number. 
- 
-    System version: 1803.02 
- 
-    Version build: 2018022700008 
-     
-(The hotfix number (.xx) is only shown for versions 1803.00 and above)  
- 
-### Command line 
- 
-1) From the shell use the System Package Manager:
-
-    yum info sme-ff-filefabric.x86_64
-
-2) From the database:
-
-    mysql> SELECT * FROM smestorage.se_version;
-
-3) From the shell as smeconfiguser run the alias:
-
-    smeversion
- 
-## Upgrade Backups 
- 
-    /var/www/smestorage/patches 
- 
-This directory keeps a copy of public_html after upgrades.
- 
-## Database Service 
- 
-If the database is running locally check the service is running: 
- 
-    systemctl status mariadb 
- 
-You should see a few lines of output including one that says, "Active: active (running) ". If you do not then the service is not running. This will prevent the Cloud File Manager from working. 
- 
-You can log into the local database through: 
- 
-    mysql 
- 
-You should either log in successfully or see an error message like "ERROR 1045 (28000): Access denied for user 'smeconfiguser'@'localhost' (using password: NO)".
-
-If you see neither, the database is not accessible from this machine. For remote databases this may be a network issue.
- 
-### Local Database Service 
- 
-If there is something wrong with the MariaDB Server you will most likely see this page when attempting to access the cloud file manager: 
- 
-> It seems we encountered a problem. Please contact support and provide as much details as possible as to how this occurred. 
-> 
-> Thanks, and apologies for any inconvenience. 
- 
-First check the MariaDB service status:
-
-    systemctl status mariadb
-
-If the service is up and running, it is likely that some tables are corrupted, for example after a power outage. See Repair the Database above for how to fix them.
-
-Alternatively, restart the service. After a restart MariaDB checks the state of the tables and tries to repair them:
- 
-    systemctl restart mariadb 
- 
-### Backup 
- 
-You can backup the database using the following command: 
- 
-    mysqldump smestorage >smestorage.sql 
- 
-### Configuration 
- 
-    /etc/my.cnf 
- 
-## Fail2Ban 
- 
-The SME Appliance ships with a customized version of Fail2Ban (http://www.fail2ban.org/).  
- 
-Fail2Ban scans log files for malicious patterns, e.g. DoS attacks, too many password failures, SSH login attempts, exploit probing and attempts to scan for download links.
- 
-If a malicious pattern is detected it automatically updates the firewall rules to reject IP addresses for a specified amount of time (10 minutes). Fail2Ban is constantly working and scanning providing extra protection for the appliance. 
- 
-### Log 
- 
-    /var/log/fail2ban.log 
- 
-### Unbanning an IP Address 
- 
-As the root user, first find the IP address that was banned, and then unban it. To list the banned addresses run:
-
-    iptables -L f2b-SSH -n
-
-In that list you may see your IP address. Then run the following command, replacing the example IP address with your own:
- 
-    fail2ban-client set ssh-iptables unbanip 192.168.1.1 
- 
-Your IP address should now be unbanned. 
- 
-https://www.storagemadeeasy.com/wiki/cloudappliance/bestpractices/  
- 
-## Appliance Backups 
- 
-We recommend that customers use tools from the hypervisor vendor or third parties to back up the appliance and database.