Alerts keep you informed about the status of system and application metrics. Each alert has a set of user-defined trigger conditions pertaining to a particular metric (CPU, Disk, Network, Database, Web Server, etc.). When these trigger conditions occur, the alert state changes and a specified action takes place (such as sending a notification).
The alert state tells you the status of a system metric, relative to the trigger conditions defined for the alert. Smart Tools provides the following alert states:
- OK: Indicates that no trigger conditions for the alert currently exist.
- Problem: Indicates that the trigger conditions for the alert have been satisfied and your system might need attention.
- Unknown: Indicates that there is insufficient data to determine if the specified conditions of the alert have been satisfied.
You can send alert notifications to selected recipients through email, SMS, web hook, or any combination of these. Web hooks also let you set up notifications through an API call.
Under "If the above triggers happen, take the following actions", select Notify.
Then choose a notification delivery method, as follows:
- Email: If you wish to send notifications through email, choose "Email" from the dropdown, then select your email recipients from the list. To add or remove email recipients available for alert notifications, see Mass Edits.
- SMS: If you wish to send notifications through SMS, choose "SMS" from the dropdown, then select the SMS recipients from the list. You can add recipient phone numbers directly in the UI.
- Web Hook: A web hook is an API call that is triggered by a specific event. It lets you programmatically integrate monitoring alerts into your existing processes, such as ticketing, scaling, and so on. In Smart Tools, you can create web hooks that perform API calls based on user-defined thresholds for any alert: when threshold conditions are met, the alert state changes (for example, from OK to Problem), and the designated API call is performed. You can use web hooks to integrate the fine-grained monitoring and alerts functionality of Smart Tools with ticketing, auto-scaling, and a wide variety of internal processes that your company might have. For example, you might set up a web hook to send alert notification data to your ticketing system through HTTP POST and programmatically generate a new support ticket when an alert is triggered.
Note: You must have a server that can accept a POST request and a script that performs an action in response, such as generating a ticket. Smart Tools currently supports GET and POST HTTP methods.
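The receiving side described in the note above can be sketched as follows. This is a minimal illustration built on the Python standard library, not a Smart Tools component. The payload field names (`alert_name`, `state`, `server`), the in-memory `TICKETS` list, and the port are assumptions made for the example; check the data your web hook actually sends before relying on any of them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a ticketing system; a real script would
# call your ticketing tool here instead of appending to a list.
TICKETS = []


def create_ticket(alert):
    """Build a ticket record from alert data (field names are assumed)."""
    ticket = {
        "id": len(TICKETS) + 1,
        "title": "[{}] {}".format(
            alert.get("state", "UNKNOWN"),
            alert.get("alert_name", "unnamed alert"),
        ),
        "server": alert.get("server", "unknown"),
    }
    TICKETS.append(ticket)
    return ticket


class AlertWebHookHandler(BaseHTTPRequestHandler):
    """Accepts a POST with a JSON body and creates one ticket per call."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            alert = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            alert = None
        if not isinstance(alert, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        body = json.dumps(create_ticket(alert)).encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Start the listener; blocks until interrupted."""
    HTTPServer(("", port), AlertWebHookHandler).serve_forever()
```

Calling `run()` starts the listener. The URL of that server is what you would then enter as the web hook target.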
Run Script from Library
The Run Script from Library function lets you set up alert-based script jobs that run in response to changes in alert conditions. This lets you predefine automated scripted jobs that enable self-healing and auto-optimization of your systems. For example, you might set up self-healing for an Apache server by designating a "Restart Apache" script that is triggered when an "Apache is Down" alert changes to the Problem state. If the Apache server stops running, the script is executed and the server is automatically restarted. You can configure alert jobs on Windows machines that require user impersonation, such as a domain user or local user.
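As an illustration of the "Restart Apache" example, a library script could look roughly like the sketch below. It assumes a Linux host managed by systemd and a service named `apache2` (on some distributions the service is `httpd`). The `runner` parameter is a hypothetical hook added only so the function can be exercised without touching a real service.

```python
import subprocess


def restart_service(service="apache2", runner=subprocess.run):
    """Restart a systemd service and report whether the command succeeded.

    The service name and the use of systemctl are assumptions; adjust
    both to match the machine the alert is attached to.
    """
    result = runner(
        ["systemctl", "restart", service],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code instead of raising
    )
    return result.returncode == 0
```

Returning a boolean rather than raising lets the calling job decide how to report a failed restart, for example by exiting with a non-zero status.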
The Run Blueprint function lets you designate automated workflow blueprints that run in response to changes in alert conditions. A blueprint performs a series of interconnected IT actions and may include anything from launching cloud servers with pre-configured applications on board to deploying a complex multi-tier website. Blueprints can also include routine tasks, such as backups, log rotations, firewall checks, and so on, which can be scaled up or down as needed.
Note: A new alert appears in the Unknown state until it has been evaluated for the first time. If trigger conditions exist, the alert changes to the Problem state. If no trigger conditions exist, the alert changes to OK.
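The state transitions described in this section can be modeled with a small sketch. The class and method names are hypothetical and do not come from Smart Tools. The sketch assumes each evaluation yields `True` (trigger conditions exist), `False` (they do not), or `None` (insufficient data).

```python
from enum import Enum


class AlertState(Enum):
    UNKNOWN = "Unknown"
    OK = "OK"
    PROBLEM = "Problem"


class Alert:
    """Tracks the state of one alert across evaluations."""

    def __init__(self, name):
        self.name = name
        # A new alert stays Unknown until its first evaluation.
        self.state = AlertState.UNKNOWN

    def update(self, triggered):
        """Apply one evaluation result and return True if the state changed.

        triggered: True, False, or None when there is not enough data.
        """
        if triggered is None:
            new_state = AlertState.UNKNOWN
        elif triggered:
            new_state = AlertState.PROBLEM
        else:
            new_state = AlertState.OK
        changed = new_state is not self.state
        self.state = new_state
        return changed
```

The return value of `update` marks the moment a state change occurs, which is when an assigned action such as a notification would run.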
View alert messages
You can view alert messages related to system activities. Data is collected at regular intervals for each metric to determine if trigger conditions exist for your alerts.
- From the Deployments page, move the pointer over the deployment you want to view and click Alerts.
Alternatively, you can click Details and click the Alerts tab. All the active alert messages triggered for the selected deployment are listed.
- Under the Alerts tab, click a server name to view alert messages triggered for that particular machine or VM.
The alerts related to the selected machine or VM are displayed.
View and manage alerts
- Under the Alerts tab, click the server name for which you want to view and manage alerts.
- Move the pointer under the Alerts tab and click Configure.
The Alerts list appears, showing alert names and descriptions.
- To view alert details, move the pointer over the alert and click the greater-than symbol (>) before the alert name.
The alert details, such as the alert name, trigger descriptions, and alert actions, are displayed.
- To edit alert details, move the pointer over the alert you want to edit, click Actions and select Edit.
- Modify the alert details and click Save to update the modified alert details.
- To delete an existing alert, move the pointer over the alert, click Actions and select Delete. Click OK.
You cannot retrieve a deleted alert. You can create new alerts or modify existing alerts.
Add new alerts
Step 1: Enter basic information
- Under the Alerts tab, click the server name for which you want to add new alerts.
- Move the pointer under the Alerts tab and click Configure.
- Click Add new alert. The Add New Alert page appears.
- Enter a name for the alert.
- Enter a description that best describes the alert.
- Select one of the following statuses to communicate the severity of the alert:
Step 2: Define alert triggers
- From the If box, select One or All to specify whether one or all of the conditions must be met to trigger the alert.
- Select a metric item. Based on your requirements, you can select a metric item under categories such as IO Stats, Smart Tools Agent, Processes, CPU, Physical memory, General System, Network, and Disk Usage.
- Set up threshold conditions for the selected metric item. For example, to set conditions for a "Disk Almost Full" alert, set up the threshold conditions as follows: "dev/xvda1 Used % Average > 90 for 2 minutes". In this case, if the average disk space used is greater than 90% for 2 minutes, the alert is triggered.
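The trigger logic in the steps above can be sketched as follows. This is one plausible reading, not the Smart Tools implementation: it assumes "Average > 90 for 2 minutes" means the mean of the samples collected in the most recent 2-minute window exceeds 90. The function names and the `(timestamp, value)` sample format are hypothetical.

```python
from statistics import mean


def condition_met(samples, threshold, window_seconds, now):
    """Check one threshold condition against recent samples.

    samples: list of (timestamp_seconds, value) pairs.
    Returns True or False, or None when the window contains no data.
    """
    recent = [
        value for ts, value in samples
        if now - window_seconds <= ts <= now
    ]
    if not recent:
        return None
    return mean(recent) > threshold


def triggers_met(results, mode="One"):
    """Combine condition results according to the If box (One or All).

    Returns None if any condition lacks data (an assumption).
    """
    results = list(results)
    if not results or any(r is None for r in results):
        return None
    return all(results) if mode == "All" else any(results)
```

For example, with disk usage samples of 92, 95, and 93 percent inside the last 120 seconds, `condition_met(samples, 90, 120, now)` returns `True`, because the window average is above 90.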
Step 3: Assign actions
- Set up the actions to be taken when the alert is triggered. For each alert you can set up one or more of the following actions:
- Notify: Sends alert notifications to selected recipients through email, SMS, or web hook.
- Run Script from Library: Runs a pre-defined script job based on changes in alert condition.
- Run Blueprint: Runs a pre-defined process job based on changes in alert condition.
Note: You can assign multiple actions for each alert. Click the plus sign to configure an additional action.