Differences
This shows you the differences between two versions of the page.
| cloudappliance/appliance-troubleshooting [2020_05_04 16:13] – [Error Log] steven | cloudappliance:appliance-troubleshooting [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | # Appliance Troubleshooting | ||
| - | Last updated on April 6, 2020. | ||
| - | |||
| - | This document covers troubleshooting of the Enterprise File Fabric Server. The Server runs as a virtual machine which may be scaled out horizontally (run as multiple instances). | ||
| - | |||
| - | The virtual appliance is a hardened CentOS image with: | ||
| - | * Enterprise File Fabric Engine components | ||
| - | * MariaDB database server | ||
| - | * Apache Solr search engine | ||
| - | |||
| - | The most common deployment scenarios are: | ||
| - | * Single server with database - with the File Fabric Engine and database server on a single instance | ||
| - | * Single server separate database - The File Fabric Engine runs on one instance. The database runs separately either as an additional instance or through a database-as-a-service. | ||
| - | * High availability - The File Fabric Engine runs on multiple instances behind a load balancer. A high availability database cluster or database-as-a-service is used. | ||
| - | * The Apache Solr service is optional, but required to support content search. It should run on a separate machine instance. | ||
| - | |||
| - | ## Prerequisites | ||
| - | |||
| - | Before you begin troubleshooting and checking the Enterprise File Fabric Server you should have the following information: | ||
| - | |||
| - | * Server domain name | ||
| - | * Are multiple servers being used? | ||
| - | * Load balancer domain name | ||
| - | * All server instance hostnames or IP addresses | ||
| - | * Is Cloud FTP being used? | ||
| - | * Is ClamAV being used? | ||
| - | * Passwords: | ||
| - | * smeconfigure password(s) | ||
| - | * root password(s) | ||
| - | * database credentials (optional) | ||
| - | |||
| - | ## Accessible from Network | ||
| - | |||
| - | The web server should start up automatically and be accessible when the appliance server starts. From a remote machine use a browser or the following command to test connectivity to the web server. Curl is available on Linux and Mac. | ||
| - | |||
| - | curl -k https:// | ||
| - | |||
| - | If port 443 is not responding, and to validate network performance, | ||
| - | |||
| - | ping hostname | ||
| - | |||
| - | ## Command Line Access | ||
| - | |||
| - | If the server is reachable, open a remote shell on the system as smeconfiguser. | ||
| - | |||
| - | Run this command from the command line of a machine that has the ssh utility installed (Linux or Mac), or run the equivalent using a Windows tool like putty. | ||
| - | |||
| - | ssh smeconfiguser@hostname | ||
| - | |||
| - | On success you will see a Linux command prompt. Unless otherwise noted, commands in this document can be run as smeconfiguser. | ||
| - | |||
| - | To open a shell as root or change to the user smestorage use the command su. You cannot log into the machine remotely directly as root: | ||
| - | |||
| - | su root | ||
| - | su smestorage | ||
| - | |||
| - | ### Connection Refused | ||
| - | |||
| - | If the password fails several times you will be locked out. | ||
| - | |||
| - | ssh: connect to host 10.0.10.194 port 22: Connection refused | ||
| - | |||
| - | To verify, log in via the console as root and execute: | ||
| - | |||
| - | iptables -L f2b-SSH -n | ||
| - | |||
| - | If your IP address is locked you can unlock via fail2ban (as root): | ||
| - | |||
| - | fail2ban-client set ssh-iptables unbanip 192.168.1.1 | ||
| - | |||
| - | ## Appliance Logs | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | This is a general log file for the appliance. | ||
| - | |||
| - | tail -f / | ||
| - | |||
| - | You should see the last few lines of the log file, and new lines should appear from time to time as the appliance is used. Lines containing the word " | ||
| - | |||
| - | The tail -f command will run until you terminate it (Ctrl-c). | ||
| - | |||
| - | ### Error Log | ||
| - | |||
| - | These files are created the first time an error is received: | ||
| - | |||
| - | tail -f / | ||
| - | tail -f / | ||
| - | |||
| - | You may see the last few lines of the log file, and new lines may appear from time to time as the appliance is used. Lines containing the word " | ||
| - | |||
| - | ### Upload Log | ||
| - | |||
| - | This log provides information on upload threads including M-Stream. | ||
| - | |||
| - | tail -f / | ||
| - | | ||
| - | |||
| - | |||
| - | ### Email Log | ||
| - | |||
| - | tail -f / | ||
| - | |||
| - | The “To” addresses and “Subject” lines of sent emails are logged here. | ||
| - | |||
| - | ### Cron Job Log | ||
| - | |||
| - | Cron jobs kick off housekeeping services and background tasks | ||
| - | |||
| - | tail -f / | ||
| - | |||
| - | |||
| - | |||
| - | ### Log Archive | ||
| - | |||
| - | An archive of logs can be found at: | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Log Rotation | ||
| - | |||
| - | A log rotation and archive script runs under cron. Logs are removed after 30 days. | ||
| - | |||
| - | / | ||
| - | |||
| - | The configuration file is at: | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Ports In Use | ||
| - | |||
| - | To resolve a port conflict or to determine what ports are in use by what service use: | ||
| - | |||
| - | netstat -plnt | ||
| - | |||
| - | ## CPU Check | ||
| - | |||
| - | Check CPU usage or check for a runaway process using top. Investigate processes maxing out CPU over three refreshes. | ||
| - | |||
| - | top | ||
| - | |||
| - | In the third line, which is labelled " | ||
| - | |||
| - | ## Memory Check | ||
| - | |||
| - | Look for memory issues with top. | ||
| - | |||
| - | top | ||
| - | |||
| - | In the fourth line, which is labelled "KiB Mem :" or " | ||
| - | |||
| - | ## Disk Space Check | ||
| - | |||
| - | df -h | ||
| - | |||
| - | In the " | ||
| - | |||
| - | ## Disk Space Procedure | ||
| - | |||
| - | You can check the disk space by running the command | ||
| - | |||
| - | df -kh | ||
| - | | ||
| - | If a table has run out of memory you will see errors in the SME Error Logs | ||
| - | |||
| - | / | ||
| - | |||
| - | If you have configured a notification email address, you will receive a notification email with the errors. | ||
| - | |||
| - | If you have run out of disk space, see the instructions below: | ||
| - | |||
| - | DB! Table ' | ||
| - | Symptom - You open the configured SME appliance url in a browser and see an empty page | ||
| - | |||
| - | ## Action to Bring up the Appliance | ||
| - | |||
| - | ### Increase Disk Size | ||
| - | |||
| - | To increase the disk space on the SME appliance, see the recipe to increase disk space: https:// | ||
| - | |||
| - | ### Repair the Database | ||
| - | |||
| - | ssh into the appliance as smeconfiguser | ||
| - | |||
| - | Back up the database; this is the easiest way to find crashed tables. | ||
| - | |||
| - | mysqldump -u smestore -p --opt smestorage > smestorage.sql | ||
| - | |||
| - | IF YOU GET AN ERROR MESSAGE INDICATING A CRASHED TABLE: | ||
| - | |||
| - | ssh to appliance and run the following command | ||
| - | |||
| - | mysql -u smestore -p smestorage | ||
| - | |||
| - | Enter the password | ||
| - | |||
| - | Make sure the database is smestorage | ||
| - | |||
| - | use smestorage | ||
| - | |||
| - | Then repair the table that has crashed: | ||
| - | |||
| - | repair table < | ||
| - | |||
| - | Go back to the database backup step until the backup completes without errors. | ||
| - | |||
| - | ### Delete compiled templates | ||
| - | |||
| - | SME uses compiled templates. If disk space is low the templates can become corrupted. To fix this: | ||
| - | |||
| - | ssh in to appliance as smeconfiguser | ||
| - | |||
| - | Then switch to root, and then to the smestorage Linux user, using su: | ||
| - | |||
| - | su - root | ||
| - | su - smestorage | ||
| - | |||
| - | Go to the templates directory | ||
| - | |||
| - | cd / | ||
| - | |||
| - | Delete all the compiled templates by executing the following command | ||
| - | |||
| - | rm *.tpl.php | ||
| - | |||
| - | This should help you get your Appliance back online. | ||
| - | |||
| - | ## Find Problem Files | ||
| - | |||
| - | This command lists files larger than 10M on the root filesystem: | ||
| - | |||
| - | find / -xdev -type f -size +10M -exec du -sh {} ';' | ||
| - | |||
| - | ## Process Check | ||
| - | |||
| - | Check that the following services are healthy (more detail in sections below) | ||
| - | |||
| - | systemctl status php-fpm | ||
| - | systemctl status httpd | ||
| - | systemctl status jetty | ||
| - | systemctl status crond | ||
| - | systemctl status mariadb | ||
| - | systemctl status memcached | ||
| - | |||
| - | ## PHP-FPM Service | ||
| - | |||
| - | If you see this error check status of PHP-FPM: | ||
| - | |||
| - | The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later. | ||
| - | |||
| - | You can do this by issuing this command: | ||
| - | |||
| - | systemctl status php-fpm | ||
| - | |||
| - | This shows the reported status of the service. The service may have hung even though its status is active, so either way it is a good idea to restart PHP-FPM. You will need root access (or use sudo). | ||
| - | |||
| - | sudo systemctl restart php-fpm | ||
| - | |||
| - | ### Logs | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Apache HTTP Server (Web Server) | ||
| - | |||
| - | Check that the Apache HTTP Server is running. su to root. | ||
| - | |||
| - | systemctl status httpd | ||
| - | |||
| - | To start the HTTP Server | ||
| - | |||
| - | systemctl start httpd | ||
| - | |||
| - | To stop the HTTP Server | ||
| - | |||
| - | systemctl stop httpd | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | HTTPD server configuration files are located in the following two directories: | ||
| - | |||
| - | / | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Logs | ||
| - | |||
| - | Apache Httpd server logs are located at: | ||
| - | |||
| - | / | ||
| - | |||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | tail / | ||
| - | |||
| - | ## Memcache | ||
| - | |||
| - | If memcached stops working or hangs, users will be unable to upload files. | ||
| - | |||
| - | When you try to upload something you will get the message: | ||
| - | |||
| - | **Can not find uploading process meta data.** | ||
| - | |||
| - | This means a record could not be added to memcached and because of that the upload failed. | ||
| - | |||
| - | To solve this, issue the following command as root: | ||
| - | |||
| - | systemctl restart memcached | ||
| - | |||
| - | After that you can also check the service status: | ||
| - | |||
| - | systemctl status memcached | ||
| - | |||
| - | You should see something similar to the below: | ||
| - | |||
| - | Active: active (running) since Thu 2016-08-25 13:30:00 BST; 1s ago | ||
| - | |||
| - | ## Jetty / Apache Solr / PDF Annotation | ||
| - | |||
| - | The Jetty service is used for Apache Solr and PDF Annotation. It runs as a Java process, by default listening on localhost port 7070. | ||
| - | |||
| - | Check the health of jetty using the command: | ||
| - | |||
| - | systemctl status jetty | ||
| - | |||
| - | You should see a few lines of output including one that says, " | ||
| - | |||
| - | To check that Apache Solr is running and responsive on the appliance run: | ||
| - | |||
| - | curl -u solr: | ||
| - | |||
| - | ### Configuration Files | ||
| - | |||
| - | / | ||
| - | |||
| - | / | ||
| - | |||
| - | See https:// | ||
| - | |||
| - | ### Logs | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Production | ||
| - | |||
| - | Note: For production Apache Solr should be running on a separate instance to the Enterprise File Fabric Server (Web Tier). | ||
| - | |||
| - | ### Access Solr Admin GUI Remotely | ||
| - | |||
| - | To access the Solr admin from another machine: | ||
| - | |||
| - | Add this line to / | ||
| - | |||
| - | -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 7070 -j ACCEPT | ||
| - | |||
| - | Reload iptables: | ||
| - | |||
| - | systemctl reload iptables | ||
| - | |||
| - | Comment out the line in / | ||
| - | |||
| - | # | ||
| - | |||
| - | From a browser: | ||
| - | |||
| - | http:// | ||
| - | |||
| - | ## Cloud FTP Service (CloudFTP/ | ||
| - | |||
| - | By default the File Fabric Cloud FTP service is configured to run an FTP service on port 21 and FTPS (FTP over SSL) service on port 990. See [[cloudappliance/ | ||
| - | |||
| - | Check the health of the Cloud FTP service using the service command: | ||
| - | |||
| - | systemctl status cloudftp | ||
| - | |||
| - | You should see a few lines of output including one that says, " | ||
| - | |||
| - | To start the service: | ||
| - | |||
| - | systemctl start cloudftp | ||
| - | |||
| - | To stop the service: | ||
| - | |||
| - | systemctl stop cloudftp | ||
| - | |||
| - | ### Client Testing | ||
| - | |||
| - | sftp -v user@hostname | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Cloud SFTP Service | ||
| - | |||
| - | The Cloud SFTP service implements the SSH File Transfer Protocol for File Fabric. By default it's available on port 2200. See [[cloudappliance/ | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Restart | ||
| - | | ||
| - | To restart the service after a configuration change: | ||
| - | |||
| - | systemctl restart cloudftp | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Local FTP Service | ||
| - | |||
| - | For production use this service should be removed along with the demo accounts. | ||
| - | |||
| - | An appliance FTP server listens by default on IP Address 127.0.0.1 and port 2001. It is used for default storage for the clouduser. | ||
| - | |||
| - | Status of local ftp service | ||
| - | |||
| - | systemctl status vsftpd | ||
| - | |||
| - | Start FTP Server | ||
| - | |||
| - | systemctl start vsftpd | ||
| - | |||
| - | To stop | ||
| - | |||
| - | systemctl stop vsftpd | ||
| - | |||
| - | ### FTP Server Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Cron Service | ||
| - | |||
| - | The cron service executes cron jobs that roll logs and kick off periodic tasks for the system such as daily maintenance tasks. These scripts should be run only once in a multi-server environment. | ||
| - | |||
| - | For version 1705.00 and above cron runs on all instances using cronmutex.php to make sure only one is executed: | ||
| - | |||
| - | php / | ||
| - | |||
| - | Check the health of cron using the service command: | ||
| - | |||
| - | systemctl status crond | ||
| - | |||
| - | You should see a few lines of output including one that says, " | ||
| - | |||
| - | If it is not running, start it: | ||
| - | |||
| - | systemctl start crond | ||
| - | |||
| - | ### Logs | ||
| - | |||
| - | / | ||
| - | / | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | To see cron jobs for a server: | ||
| - | |||
| - | crontab -u smestorage -l | ||
| - | |||
| - | ### Note | ||
| - | |||
| - | These cron jobs use scripts in: | ||
| - | |||
| - | / | ||
| - | / | ||
| - | |||
| - | To see crontab jobs run as root (currently only freshclam): | ||
| - | |||
| - | cat / | ||
| - | |||
| - | ## CloudDAV | ||
| - | |||
| - | CloudDAV is our implementation of WebDAV on top of the File Fabric. It runs as a CGI script from / | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | ## Cloud S3 | ||
| - | |||
| - | Cloud S3 is our implementation of an Amazon S3 compatible API on top of the File Fabric. | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | # See < | ||
| - | # DocumentRoot / | ||
| - | / | ||
| - | |||
| - | |||
| - | ## ClamAV Virus Scanner | ||
| - | |||
| - | The Antivirus scanner ClamAV is included with the appliance and can be used to check all uploaded files. | ||
| - | |||
| - | To be used, virus scanning must be enabled on a per-organization (tenant) basis through Organization Policies. Scanning of individual uploaded files can be verified through the audit log if logging of File add/updates is turned on. | ||
| - | |||
| - | The ClamAV process is called < | ||
| - | |||
| - | Check the health of the ClamAV scanner using the service command: | ||
| - | |||
| - | systemctl status clamd@scan | ||
| - | |||
| - | In High Availability configurations each appliance leverages its local copy of ClamAV. | ||
| - | |||
| - | ### Error Messages | ||
| - | |||
| - | If a file is uploaded, antivirus scanning is enabled, and the daemon is not running, the user will see the following message: | ||
| - | |||
| - | > Uploading of 1 files failed | ||
| - | > | ||
| - | > [Restart] [Cancel] | ||
| - | > | ||
| - | > Seems file is not uploaded. Uploading in progress? | ||
| - | > | ||
| - | > [Close] | ||
| - | |||
| - | Verify that scans are succeeding through the audit trail. The policy “Audit File add/ | ||
| - | |||
| - | File sme-solution-brief.pdf uploaded to My Cloud files/ | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | https:// | ||
| - | |||
| - | ### ClamAV Antivirus Database Updater | ||
| - | |||
| - | Virus definitions are updated once an hour (see / | ||
| - | |||
| - | freshclam | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | / | ||
| - | |||
| - | |||
| - | |||
| - | The appliance includes an email server | ||
| - | |||
| - | / | ||
| - | |||
| - | ## License (Enterprise File Fabric) | ||
| - | |||
| - | This error on attempted login indicates a problem with the license key, such as a missing or expired key. | ||
| - | |||
| - | > Sorry, Organization accounts are not supported. No valid key. Contact with your administrator. | ||
| - | |||
| - | The license is configured for each appliance through the Appliance Administration interface under Settings > License Key. This is reached by logging in as the appladmin user at https:// | ||
| - | |||
| - | The appliance license can also be viewed and changed from within the appliance at: | ||
| - | |||
| - | / | ||
| - | |||
| - | High Availability: | ||
| - | |||
| - | ## Version | ||
| - | |||
| - | There are several ways to check the version of the appliance. | ||
| - | |||
| - | ### Appliance Administration | ||
| - | |||
| - | Log in as the appliance admin. From the hamburger menu under the menu “Admin” see the appliance version and build number. | ||
| - | |||
| - | System version: 1803.02 | ||
| - | |||
| - | Version build: 2018022700008 | ||
| - | | ||
| - | (The hotfix number (.xx) is only shown for versions 1803.00 and above) | ||
| - | |||
| - | ### Command line | ||
| - | |||
| - | 1) From the shell use the System Package Manager | ||
| - | |||
| - | yum info sme-ff-filefabric.x86_64 | ||
| - | |||
| - | 2) From the database | ||
| - | |||
| - | | ||
| - | |||
| - | 3) From the shell as smeconfiguser run the alias: | ||
| - | |||
| - | | ||
| - | |||
| - | ## Upgrade Backups | ||
| - | |||
| - | / | ||
| - | |||
| - | Keeps a copy of public_html after upgrades. | ||
| - | |||
| - | ## Database Service | ||
| - | |||
| - | If the database is running locally, check that the service is running: | ||
| - | |||
| - | systemctl status mariadb | ||
| - | |||
| - | You should see a few lines of output including one that says, " | ||
| - | |||
| - | You can log into the local database through: | ||
| - | |||
| - | mysql | ||
| - | |||
| - | You should be successful or see an error message like" | ||
| - | |||
| - | If you do not then the database is not accessible from this machine. For remote databases this may be a network issue. | ||
| - | |||
| - | ### Local Database Service | ||
| - | |||
| - | If there is something wrong with the MariaDB Server you will most likely see this page when attempting to access the cloud file manager: | ||
| - | |||
| - | > It seems we encountered a problem. Please contact support and provide as much details as possible as to how this occurred. | ||
| - | > | ||
| - | > Thanks, and apologies for any inconvenience. | ||
| - | > | ||
| - | > Please first check the mysql status: | ||
| - | > | ||
| - | > | ||
| - | |||
| - | If the service is up and running, it is likely that some tables are corrupted, for example after a power outage. See Repair the Database above for how to fix them. | ||
| - | |||
| - | Alternatively, try restarting the service. After a restart, mysql checks the state of the tables and tries to repair them: | ||
| - | |||
| - | systemctl restart mariadb | ||
| - | |||
| - | ### Backup | ||
| - | |||
| - | You can back up the database using the following command: | ||
| - | |||
| - | mysqldump smestorage > | ||
| - | |||
| - | ### Configuration | ||
| - | |||
| - | /etc/my.cnf | ||
| - | |||
| - | ## Fail2Ban | ||
| - | |||
| - | The SME Appliance ships with a customized version of Fail2Ban (http:// | ||
| - | |||
| - | Fail2Ban scans log files for malicious patterns, e.g. DoS attacks, too many password failures, SSH logins, exploit probing, and scanning for download links. | ||
| - | |||
| - | If a malicious pattern is detected it automatically updates the firewall rules to reject IP addresses for a specified amount of time (10 minutes). Fail2Ban is constantly working and scanning providing extra protection for the appliance. | ||
| - | |||
| - | ### Log | ||
| - | |||
| - | / | ||
| - | |||
| - | ### Unbanning an IP Address | ||
| - | |||
| - | As the root user, find the IP address that was banned and then unban it. To list the banned addresses, run: | ||
| - | |||
| - | iptables -L f2b-SSH -n | ||
| - | |||
| - | In that list you may see your IP address. Then run the following, replacing the IP address with your own: | ||
| - | |||
| - | fail2ban-client set ssh-iptables unbanip 192.168.1.1 | ||
| - | |||
| - | Your IP address should now be unbanned. | ||
| - | |||
| - | https:// | ||
| - | |||
| - | ## Appliance Backups | ||
| - | |||
| - | We recommend customers use tools from the hypervisor vendor or third parties to back up the appliance and database. | ||
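
As a supplement to the Disk Space Check section, here is a minimal shell sketch that flags filesystems at or above a usage threshold. The function name `flag_full_filesystems` and the 90% threshold in the usage note are illustrative assumptions, not part of the appliance. The function reads `df -P` (POSIX output format) on standard input and assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the appliance.
# Reads `df -P` output on stdin and prints "<mount point> <usage>" for every
# filesystem whose usage percentage is at or above the limit given as $1.
flag_full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                  # skip the df header line
            use = $5              # Capacity column, e.g. "95%"
            sub(/%/, "", use)     # strip the percent sign for comparison
            if (use + 0 >= limit + 0) print $6, $5
        }'
}
```

It could be run as `df -P | flag_full_filesystems 90`. Reading from standard input rather than calling `df` inside the function keeps the sketch easy to try against saved output from another server.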