Chaos Engineering with Gremlin is a powerful way to tune your monitoring so that you gather actionable data, and to train your teams on these tools so that observability expertise isn't siloed in your organization. AppDynamics is an application performance management tool used by companies worldwide to monitor their workloads. Used together, the two tools can help you lower your mean time to detection (MTTD) and increase the availability of your applications. This tutorial walks you through correlating a Gremlin attack with its impact in AppDynamics.
First, add a role that lets Gremlin post custom events to AppDynamics. In AppDynamics, go to Settings -> Administration.
Select Roles -> Create and provide a Name: Gremlin Events. Then select Applications, check “View”, click “Edit”, and check “Create Events”.
Click “Save”.
Now you need a user that has the Gremlin Events role. Go to Users -> Display Users from “AppDynamics” and click “Create”.
Enter Username: gremlin, Email: {your_email}, Name: Gremlin Events, Password: {password}.
Under Roles, add “Gremlin Events”.
Click “Save”.
Next, gather the account name and application ID that the webhooks need in order to send events to AppDynamics. In AppDynamics, go to Settings -> License.
Go to Account and take note of your Account Name next to “Name”; you’ll use it when you encode your credentials.
Open the Applications dashboard and select the application you wish to experiment on. In the URL, you’ll find application={app_id}. In my case the app_id is 10691. Note that number; you’ll need it for the webhook URLs.
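If you would rather not pick the number out of the address bar by eye, a small Python sketch like the one below can extract it. The function name `extract_app_id` and the example URL are hypothetical, and the sketch only assumes that the URL contains `application=` followed by digits, as described above.

```python
import re
from typing import Optional


def extract_app_id(dashboard_url: str) -> Optional[str]:
    """Return the digits that follow 'application=' in the URL, or None."""
    # \b keeps us from matching a longer parameter name that merely
    # ends in "application".
    match = re.search(r"\bapplication=(\d+)", dashboard_url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical URL for illustration; paste your own from the browser.
    url = (
        "https://example.saas.appdynamics.com/controller/"
        "#/location=APP_DASHBOARD&application=10691"
    )
    print(extract_app_id(url))
```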
Now that you’ve gathered that information, you’ll need to encode it for the Authorization header in the webhook. In your terminal (Mac or Linux) enter:
echo -n 'gremlin@{account_name}:{password}' | base64
or in PowerShell (Windows) enter:
[Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes('gremlin@{account_name}:{password}'))
Save that output for the next step.
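If neither shell is handy, the same encoding can be sketched in Python using only the standard library. The function name `build_basic_auth_value` and the sample values are hypothetical. Unlike the commands above, which print only the encoded key, this sketch returns the complete header value with the `Basic ` prefix already attached.

```python
import base64


def build_basic_auth_value(username: str, account_name: str, password: str) -> str:
    """Return an Authorization header value for username@account_name:password."""
    credentials = f"{username}@{account_name}:{password}"
    # Base64 works on bytes, so encode to UTF-8 first and decode the
    # result back to text for use in a header.
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own.
    print(build_basic_auth_value("gremlin", "customer1", "example-password"))
```

Keep in mind that Base64 is an encoding, not encryption, so treat the output with the same care as the password itself.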
The next step is to create two webhooks in Gremlin: one for when the attack starts and one for when it finishes. In Gremlin, go to Settings -> Team Settings.
Select Webhooks -> New Webhook. Enter the Name AppD Basic Webhook Start, your Description, and the following URL with your own controller’s address and app_id:
https://{controller_address}/controller/rest/applications/{app_id}/events?severity=INFO&summary=gremlinStart&eventtype=CUSTOM&customeventtype=gremlinStart
Then add a header with the key Authorization and, as its value, the encoded key generated in the previous step:
Basic {your_encoded_key}
Select “Attack Running” and click Save.
Add a second webhook with the Name AppD Basic Webhook Finish, your Description, and the following URL with your own controller’s address and app_id:
https://{controller_address}/controller/rest/applications/{app_id}/events?severity=INFO&summary=gremlinFinish&eventtype=CUSTOM&customeventtype=gremlinFinish
Then add a header with the key Authorization and, as its value, the same encoded key:
Basic {your_encoded_key}
Select “Attack Finished” and click Save.
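Before relying on the webhooks, it can help to see what each one sends. The Python sketch below assembles the same URL and Authorization header using only the standard library. The function name `build_event_request` is hypothetical, and the use of the POST method is an assumption about how the event is submitted; check it against your own webhook configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(controller_address, app_id, event_name, encoded_key):
    """Build (but do not send) a request that mirrors one of the webhooks."""
    # Same query parameters as the webhook URLs in this tutorial.
    query = urlencode({
        "severity": "INFO",
        "summary": event_name,
        "eventtype": "CUSTOM",
        "customeventtype": event_name,
    })
    url = (
        f"https://{controller_address}/controller/rest/"
        f"applications/{app_id}/events?{query}"
    )
    # Assumption: the event is submitted with POST.
    return Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {encoded_key}"},
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_event_request(
        "example.saas.appdynamics.com", 10691, "gremlinStart", "your_encoded_key"
    )
    print(request.get_method(), request.full_url)
    # To actually send it, pass the request to urllib.request.urlopen().
```

Sending one such request by hand and checking that a custom event shows up in AppDynamics is a quick way to confirm the role, user, and encoded key before running an attack.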
You’ll need a way to visualize the attack. In AppDynamics, go to Dashboards & Reports -> Create Dashboard. Enter the Name Gremlin Attack Dashboard.
Click “Add a Widget” -> “Time Series Graph” and click the + sign under Data. Under “Select Data Source” select “Servers” and under “Select a Metric” select “Hardware Resources|CPU|%Busy” and Save.
Under Events, select “Show Events” and, for the Data Source, select the application whose app_id you noted earlier. Under “Filter Criteria”, unselect all items and select “Custom”. Click Save.
Click “Add Widget” again and select “Health Rules & Events” then “Event List”.
Under Events, set Show As to “Timeline” and set the Data Source to the same application. Under Filter Criteria, unselect all items, then select “Custom”. Click Save.
Your simple dashboard is all set up.
In Gremlin, go to Attacks -> New Attack. Select the host(s) where you have the AppDynamics agent(s) installed. Select “Choose a Gremlin” and Resource -> CPU. Set the length to 300 seconds, the CPU Capacity to 60%, and select All Cores.
Click “Unleash Gremlin” and head back over to your AppDynamics dashboard. There you can see where the attack started and the CPU spiked, and where the attack finished and the CPU wound down.
The CPU attack is a great first attack, but with Gremlin and AppDynamics together you can run many more experiments. For example, you can trace how a small amount of backend latency propagates to the front end and watch for latency that grows exponentially. You can also use Gremlin to test your thresholds and tune your AppDynamics alerting to prevent noisy alerts: fire up an attack and make sure your alerts fire at the appropriate time. Target random hosts to make sure you cover your whole application.
We look forward to seeing what you build!