# Appliance Troubleshooting
Last updated on April 6, 2020.

This document covers troubleshooting the Enterprise File Fabric Server. The server runs as a virtual machine that can be scaled out horizontally (run as multiple instances).

The virtual appliance is a hardened CentOS image with:
* Enterprise File Fabric Engine components
* MariaDB database server
* Apache Solr search engine

The most common deployment scenarios are:
* Single server with database: the File Fabric Engine and the database server run on a single instance.
* Single server with separate database: the File Fabric Engine runs on one instance. The database runs separately, either as an additional instance or as a database-as-a-service.
* High availability: the File Fabric Engine runs on multiple instances behind a load balancer, with a high availability database cluster or a database-as-a-service.

The Apache Solr service is optional, but it is required for content search. It should run on a separate machine instance.

## Prerequisites

Before you begin troubleshooting the Enterprise File Fabric Server, gather the following information:

* Server domain name
* Are multiple servers being used?
  * Load balancer domain name
  * All server instance hostnames or IP addresses
* Is Cloud FTP being used?
* Is ClamAV being used?
* Passwords:
  * smeconfiguser password(s)
  * root password(s)
  * database credentials (optional)

## Accessible from Network

The web server should start automatically and be accessible when the appliance server starts. To test connectivity to the web server from a remote machine, use a browser or the following command (curl is available on Linux and Mac):

    curl -k https://

If port 443 is not responding, or to check network performance, use ping:

    ping hostname
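
If curl is not installed, a TCP-level check can at least tell whether port 443 accepts connections at all. The bash function below is a minimal sketch, not an appliance tool: the name `port_open` and the 5 second limit are illustrative, and it assumes that bash's `/dev/tcp` redirection and the coreutils `timeout` command are available on the machine you test from.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a TCP port on a host accepts connections.
# Uses bash's /dev/tcp pseudo-device; the default port is 443 (HTTPS).
port_open() {
  local host="$1" port="${2:-443}"
  # bash -c receives the host as $0 and the port as $1; timeout stops the
  # attempt after 5 seconds so a filtered port does not hang the check.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example:
#   port_open hostname 443 && echo "443 open" || echo "443 closed"
```

If the port is open but curl still fails, the problem is more likely in the web server than in the network path.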

## Command Line Access

If the server is reachable, open a remote shell on it as smeconfiguser.

Run this command from a machine that has the ssh utility installed (Linux or Mac), or run the equivalent from a Windows tool such as PuTTY:

    ssh smeconfiguser@hostname

On success you will see a Linux command prompt. Unless otherwise noted, the commands in this document can be run as smeconfiguser.

You cannot log in remotely as root directly. To open a shell as root, or to switch to the smestorage user, use su:

    su root
    su smestorage

### Connection Refused

After several failed password attempts you will be locked out, and ssh reports:

    ssh: connect to host 10.0.10.194 port 22: Connection refused

To verify, log in through the console as root and run:

    iptables -L f2b-SSH -n

If your IP address is locked, unlock it through fail2ban (as root), substituting your own address:

    fail2ban-client set ssh-iptables unbanip 192.168.1.1
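
To check one specific address instead of reading the whole listing, the output of the iptables command above can be filtered. This is a hedged sketch: it assumes the usual `iptables -L -n` column layout (target, prot, opt, source, destination) and that fail2ban blocks addresses with REJECT or DROP rules; `banned_ips` and `is_banned` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: work with the addresses listed in `iptables -L f2b-SSH -n` output,
# read from stdin.

# Print the source address of every REJECT or DROP rule.
# Columns are: target prot opt source destination [extra].
banned_ips() {
  awk '$1 == "REJECT" || $1 == "DROP" { print $4 }'
}

# Succeed if the given address appears (as a whole line) among the bans.
is_banned() {
  banned_ips | grep -Fqx -- "$1"
}

# Example (as root):
#   ip=192.168.1.1
#   if iptables -L f2b-SSH -n | is_banned "$ip"; then
#     fail2ban-client set ssh-iptables unbanip "$ip"
#   fi
```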

## Appliance Logs

### Log

This is the general log file for the appliance.

    tail -f /

You should see the last few lines of the log file, and new lines should appear from time to time as the appliance is used. Lines containing the word "

The tail -f command runs until you stop it with Ctrl-C.

### Error Log

These files are created the first time an error is logged:

    tail -f /
    tail -f /

You may see the last few lines of the log file, and new lines may appear from time to time as the appliance is used. Lines containing the word "

### Upload Log

This log provides information on upload threads, including M-Stream.

    tail -f /

### Email Log

    tail -f /

The “To” address and “Subject” of each sent email are logged here.

### Cron Job Log

Cron jobs kick off housekeeping services and background tasks.

    tail -f /

### Log Archive

An archive of logs can be found at:

    /

### Log Rotation

A log rotation and archive script runs under cron. Logs are removed after 30 days.

    /

The configuration file is at:

    /

## Ports In Use

To resolve a port conflict, or to find out which ports are in use by which service, run the following as root (without root privileges, netstat cannot show the process behind sockets owned by other users):

    netstat -plnt
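
On a busy appliance the netstat listing can be long. The sketch below narrows it to the process listening on one port. It assumes the standard `netstat -plnt` column layout, and the helper name `port_owner` is hypothetical; when run without root privileges netstat prints `-` in the PID/Program column for other users' sockets.

```shell
#!/usr/bin/env bash
# Sketch: read `netstat -plnt` output on stdin and print the
# "PID/Program name" column for sockets listening on the given port.
port_owner() {
  local port="$1"
  awk -v port="$port" '
    $6 == "LISTEN" {
      # Local Address looks like 0.0.0.0:443, 127.0.0.1:7070 or :::22;
      # the port is whatever follows the last colon.
      n = split($4, parts, ":")
      if (parts[n] == port) print $7
    }
  '
}

# Example (as root): which process holds the Jetty port?
#   netstat -plnt | port_owner 7070
```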

## CPU Check

Use top to check CPU usage or to look for a runaway process. Investigate any process that keeps the CPU maxed out over three refreshes.

    top

In the third line, which is labelled "
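
To record the CPU figure without reading the interactive display, top can run in batch mode and its summary line can be parsed. This is a sketch under assumptions: it handles both the newer `%Cpu(s): ... 95.6 id,` layout and the older `Cpu(s): ... 95.6%id,` layout, and `cpu_busy_pct` is a hypothetical name. Because the first batch iteration of top reports averages since boot, the function keeps the last summary line it sees.

```shell
#!/usr/bin/env bash
# Sketch: read `top -b` output on stdin and print the overall CPU busy
# percentage (100 - idle) from the LAST "Cpu(s)" summary line.
cpu_busy_pct() {
  awk '
    /Cpu\(s\)/ {
      for (i = 1; i <= NF; i++) {
        v = ""
        if ($i ~ /^id,?$/) v = $(i - 1)                           # "95.6 id,"
        else if ($i ~ /%id,?$/) { v = $i; sub(/%id,?$/, "", v) }  # "95.6%id,"
        if (v != "") {
          sub(/.*,/, "", v)   # fields can run together, e.g. "ni,100.0"
          busy = 100 - v
          found = 1
        }
      }
    }
    END {
      if (!found) exit 1
      printf "%.1f\n", busy
    }
  '
}

# Example: three refreshes one second apart, report the last one.
#   top -b -n 3 -d 1 | cpu_busy_pct
```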

## Memory Check

Use top to look for memory issues.

    top

In the fourth line, which is labelled "KiB Mem :" or "
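
For a quick scripted memory figure, the kernel's own counters can be used instead of top. This sketch assumes a kernel that reports the `MemAvailable` field in /proc/meminfo (it returns a failure status if the field is missing); `mem_available_pct` is a hypothetical helper, not part of the appliance.

```shell
#!/usr/bin/env bash
# Sketch: print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo by default; pass another file, or "-" for stdin.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2; seen = 1 }
    END {
      if (total <= 0 || !seen) exit 1
      printf "%d\n", avail * 100 / total
    }
  ' "${1:-/proc/meminfo}"
}

# Example:
#   mem_available_pct
```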

## Disk Space Check

    df -h

In the "
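
To flag only the filesystems that need attention, the df output can be filtered against a threshold. This sketch reads `df -P` rather than `df -h`, because the POSIX format keeps each filesystem on one line with a fixed column order; the 90% default and the name `disk_over_threshold` are illustrative assumptions, and mount points containing spaces would be cut short.

```shell
#!/usr/bin/env bash
# Sketch: list filesystems at or above a usage threshold (default 90%).
# Expects `df -P` output on stdin: column 5 is Capacity, column 6 the mount.
disk_over_threshold() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%$/, "", use)
      if (use + 0 >= t + 0) printf "%s %s%%\n", $6, use
    }
  '
}

# Example:
#   df -P | disk_over_threshold 85
```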

## Disk Space procedure

You can check the disk space by running:

    df -kh

If a table has run out of space, you will see errors in the SME error logs:

    /

If you have configured a notification email, you will also receive the errors by email.

If you have run out of disk space, follow the instructions below.

    DB! Table '

Symptom: when you open the configured SME appliance URL in a browser, you see an empty page.

## Action to Bring up the Appliance

### Increase Disk Size

To increase the disk space on the SME appliance, see the recipe for increasing disk space: https://

### Repair the Database

ssh into the appliance as smeconfiguser.

Back up the database. This is the easiest way to find crashed tables, because mysqldump stops with an error when it reaches one:

    mysqldump -u smestore -p --opt smestorage > smestorage.sql

If you get an error message indicating a crashed table, open the MySQL client on the appliance:

    mysql -u smestore -p smestorage

Enter the password, then make sure the current database is smestorage:

    use smestorage

Then repair the table that has crashed:

    repair table <

Repeat from the backup step until the backup completes without errors.
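
When several tables have crashed, the backup and repair loop above can be shortened by extracting every crashed table from the error output at once. This is a sketch that assumes the error text has the form `Table './smestorage/<table>' is marked as crashed ...` (the form MyISAM tables typically report); the helper names are hypothetical, and the statements it prints are meant to be reviewed and then run in the mysql client as above.

```shell
#!/usr/bin/env bash
# Sketch: turn "is marked as crashed" errors into REPAIR TABLE statements.
# Assumes messages of the form: Table './<database>/<table>' is marked as crashed ...

# Print each crashed table name once.
crashed_tables() {
  sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" |
    awk -F/ '{ print $NF }' |
    sort -u
}

# Print one REPAIR TABLE statement per crashed table.
repair_statements() {
  crashed_tables | while IFS= read -r table; do
    printf 'REPAIR TABLE `%s`;\n' "$table"
  done
}

# Example: send mysqldump's errors to the pipe and the dump itself to a file.
#   mysqldump -u smestore -p --opt smestorage 2>&1 >smestorage.sql | repair_statements
```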

### Delete compiled templates

SME uses compiled templates. If disk space runs low, the templates can become corrupted. To fix this:

ssh into the appliance as smeconfiguser.

Then switch to root, and from root to the smestorage Linux user:

    su - root
    su - smestorage

Go to the templates directory:

    cd /

Delete all the compiled templates:

    rm *.tpl.php

This should bring the appliance back online.

## Find Problem Files

This command lists the 50 largest files above 10M on the root filesystem (run it as root so that every directory can be read):

    find / -xdev -type f -size +10M -exec du -sh {} + 2>/dev/null | sort -rh | head -n 50

## Process Check

Check that the following services are healthy (each is covered in more detail in the sections below):

    systemctl status php-fpm
    systemctl status httpd
    systemctl status jetty
    systemctl status crond
    systemctl status mariadb
    systemctl status memcached
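
The six checks above can be run in one pass. This sketch asks systemd for each unit's state with `systemctl is-active --quiet`, so it only reflects systemd's view: a hung PHP-FPM that still reports active (see PHP-FPM Service below) would not be caught. The helper name `check_services` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report services that systemd does not consider active.
# With no arguments it checks the list above; pass names to check a subset
# (for example, leave out mariadb when the database runs elsewhere).
check_services() {
  local svc failed=0
  if [ "$#" -eq 0 ]; then
    set -- php-fpm httpd jetty crond mariadb memcached
  fi
  for svc in "$@"; do
    # is-active --quiet prints nothing; exit status 0 means active.
    if ! systemctl is-active --quiet "$svc"; then
      echo "$svc: not active"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_services || echo "at least one service needs attention"
```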

## PHP-FPM Service

If you see the following error, check the status of PHP-FPM:

    The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

To check the status, run:

    systemctl status php-fpm

The output shows the status of the service. PHP-FPM can hang while still reporting itself as active, so either way it's a good idea to restart it. This requires root access (or sudo):

    sudo systemctl restart php-fpm

### Logs

    /

## Apache HTTP Server (Web Server)

Check that the Apache HTTP Server is running (as root):

    systemctl status httpd

To start the HTTP Server:

    systemctl start httpd

To stop the HTTP Server:

    systemctl stop httpd

### Configuration

HTTPD server configuration files are located in the following two directories:

    /

    /

### Logs

Apache HTTP Server logs are located at:

    /

    tail /
    tail /
    tail /
    tail /
    tail /
    tail /
    tail /
    tail /

## Memcache

If memcached stops working or hangs, users are unable to upload files.

When a user tries to upload a file, they see the message:

**Can not find uploading process meta data.**

This means a record could not be added to memcached, so the upload failed.

To fix this, run the following command as root:

    systemctl restart memcached

Then check the service status:

    systemctl status memcached

You should see something similar to:

    Active: active (running) since Thu 2016-08-25 13:30:00 BST; 1s ago

## Jetty / Apache Solr / PDF Annotation

The Jetty service is used for Apache Solr and PDF Annotation. It runs as a Java process that by default listens on localhost port 7070.

Check the health of Jetty with:

    systemctl status jetty

You should see a few lines of output, including one that says, "

To check that Apache Solr is running and responsive on the appliance, run:

    curl -u solr:

### Configuration Files

    /

    /

See https://

### Logs

    /

### Production

Note: For production, Apache Solr should run on a separate instance from the Enterprise File Fabric Server (Web Tier).

### Access Solr Admin GUI Remotely

To access the Solr admin from another machine:

Add this line to /

    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 7070 -j ACCEPT

Reload iptables:

    systemctl reload iptables

Comment out the line in /

    #

From a browser:

    http://

## Cloud FTP Service (CloudFTP/

By default the File Fabric Cloud FTP service runs an FTP service on port 21 and an FTPS (FTP over SSL) service on port 990. See [[cloudappliance/

Check the health of the Cloud FTP service with systemctl:

    systemctl status cloudftp

You should see a few lines of output, including one that says, "

To start the service:

    systemctl start cloudftp

To stop the service:

    systemctl stop cloudftp

### Client Testing

    sftp -v user@hostname

### Configuration

    /

### Log

    /

## Cloud SFTP Service

The Cloud SFTP service implements the SSH File Transfer Protocol for File Fabric. By default it's available on port 2200. See [[cloudappliance/

### Configuration

    /

### Restart

To restart the service after a configuration change:

    systemctl restart cloudftp

### Log

    /

## Local FTP Service

For production use, remove this service along with the demo accounts.

An appliance FTP server listens by default on IP address 127.0.0.1, port 2001. It provides the default storage for the clouduser account.

To check the status of the local FTP service:

    systemctl status vsftpd

To start the FTP server:

    systemctl start vsftpd

To stop it:

    systemctl stop vsftpd

### FTP Server Configuration

    /

## Cron Service

The cron service executes cron jobs that roll logs and kick off periodic tasks for the system, such as daily maintenance. In a multi-server environment each of these scripts must run only once.

In version 1705.00 and above, cron runs on all instances and uses cronmutex.php to make sure that each job is executed only once:

    php /

Check the health of cron with systemctl:

    systemctl status crond

You should see a few lines of output, including one that says, "

If it is not running, start it:

    systemctl start crond

### Logs

    /
    /

### Configuration

To see the cron jobs for a server:

    crontab -u smestorage -l

### Note

These cron jobs use scripts in:

    /
    /

To see the cron jobs that run as root (currently only freshclam):

    cat /

## CloudDAV

CloudDAV is our implementation of WebDAV on top of the File Fabric. It runs as a CGI script from /

### Log

    /

### Configuration

    /

## Cloud S3

Cloud S3 is our implementation of an Amazon S3 compatible API on top of the File Fabric.

### Log

    /

### Configuration

    # See <
    # DocumentRoot /
    /

## ClamAV Virus Scanner

The ClamAV antivirus scanner is included with the appliance and can be used to check all uploaded files.

To be used, virus scanning must be enabled on a per-organization (tenant) basis through Organization Policies. If logging of File add/updates is turned on, scanning of individual uploaded files can be verified through the audit log.

The ClamAV process is called <

Check the health of the ClamAV scanner with systemctl:

    systemctl status clamd@scan

In High Availability configurations each appliance uses its own local copy of ClamAV.

### Error Messages

If a file is uploaded while antivirus scanning is enabled and the daemon is not running, the user sees the following message:

> Uploading of 1 files failed
>
> [Restart] [Cancel]
>
> Seems file is not uploaded. Uploading in progress?
>
> [Close]

Use the audit trail to verify that scans succeed. The policy “Audit File add/

    File sme-solution-brief.pdf uploaded to My Cloud files/

### Log

    /

### Configuration

    /

https://

### ClamAV Antivirus Database Updater

Virus definitions are updated once an hour (see /

    freshclam

### Log

    /

### Configuration

    /


## Email Server

The appliance includes an email server:

    /

## License (Enterprise File Fabric)

This error on login indicates a problem with the license key, for example that the key is missing or has expired.

> Sorry, Organization accounts are not supported. No valid key. Contact with your administrator.

The license is configured for each appliance through the Appliance Administration interface under Settings > License Key. To reach it, log in as the appladmin user at https://

The appliance license can also be viewed and changed from within the appliance at:

    /

High Availability:

## Version

There are several ways to check the version of the appliance.

### Appliance Administration

Log in as the appliance admin. The appliance version and build number are shown in the hamburger menu under “Admin”, for example:

    System version: 1803.02

    Version build: 2018022700008

(The hotfix number (.xx) is shown only for versions 1803.00 and above.)
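
Several sections of this document depend on the version, for example cronmutex.php from 1705.00 and the hotfix number from 1803.00. Comparing version strings as plain text or as decimal numbers can give wrong answers once components differ in length, so the sketch below relies on GNU `sort -V` (version sort). `version_ge` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version $1 is greater than or equal to version $2.
# The smaller of the two sorts first under GNU sort -V.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   version_ge 1803.02 1705.00 && echo "cron uses cronmutex.php"
```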

### Command line

1) From the shell, use the system package manager:

    yum info sme-ff-filefabric.x86_64

2) From the database:

3) From the shell as smeconfiguser, run the alias:

## Upgrade Backups

    /

A copy of public_html is kept here after each upgrade.

## Database Service

If the database runs locally, check that the service is running:

    systemctl status mariadb

You should see a few lines of output, including one that says, "

You can log into the local database with:

    mysql

You should either log in successfully or see an error message like"

If you see neither, the database is not accessible from this machine. For remote databases this may be a network issue.
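
In a script, the same check can be done without an interactive session. This sketch assumes that `mysqladmin` picks up the same credentials as the `mysql` command above (for example from the user's option file); `db_alive` is a hypothetical helper name. It looks for the `mysqld is alive` reply rather than trusting the exit status, because `mysqladmin ping` also exits 0 when the server is running but refuses the login.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the local database server answers a ping
# with "mysqld is alive".
db_alive() {
  mysqladmin ping 2>/dev/null | grep -q 'mysqld is alive'
}

# Example:
#   db_alive || echo "database not reachable from this machine"
```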

### Local Database Service

If something is wrong with the MariaDB server, you will most likely see this page when you try to access the cloud file manager:

> It seems we encountered a problem. Please contact support and provide as much details as possible as to how this occurred.
>
> Thanks, and apologies for any inconvenience.
>
> Please first check the mysql status:

If the service is up and running, it is likely that some tables were corrupted, for example by a power outage. The steps under Repair the Database above show how to fix them.

Alternatively, try restarting the service. After a restart, MariaDB checks the state of the tables and tries to repair them:

    systemctl restart mariadb

### Backup

You can back up the database with the following command:

    mysqldump smestorage >
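
For repeated backups it helps to give each dump its own name and to compress it. This sketch assumes the same credentials setup as the `mysqldump` command above; the name `backup_db` and the `<database>-YYYYmmdd-HHMMSS.sql.gz` file name pattern are illustrative, not an appliance convention.

```shell
#!/usr/bin/env bash
# Sketch: write a compressed, timestamped dump of a database to the current
# directory and print its file name. Removes the partial file on failure.
backup_db() {
  local db="${1:-smestorage}" out
  out="${db}-$(date +%Y%m%d-%H%M%S).sql.gz"
  # pipefail in a subshell, so a mysqldump error is not hidden by gzip
  # succeeding, without changing the caller's shell options.
  if ( set -o pipefail; mysqldump "$db" | gzip > "$out" ); then
    echo "$out"
  else
    rm -f -- "$out"
    echo "backup of $db failed" >&2
    return 1
  fi
}

# Example:
#   backup_db smestorage
```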

### Configuration

    /etc/my.cnf

## Fail2Ban

The SME Appliance ships with a customized version of Fail2Ban (http://

Fail2Ban scans log files for malicious patterns, for example DoS attacks, too many password failures, SSH logins, exploit probing and scans for download links.

When a malicious pattern is detected, Fail2Ban automatically updates the firewall rules to reject the offending IP address for a set amount of time (10 minutes). It runs continuously, providing extra protection for the appliance.

### Log

    /

### Unbanning an IP Address

As root, first find the IP address that was banned:

    iptables -L f2b-SSH -n

If your IP address appears in the list, unban it (substitute your own address):

    fail2ban-client set ssh-iptables unbanip 192.168.1.1

Your IP address should now be unbanned.

https://

## Appliance Backups

We recommend that customers use tools from the hypervisor vendor or from third parties to back up the appliance and database.