On one of my recent projects, I was added to a distribution list that receives alerts from our monitoring system. For the first few days I tried to read some of those notifications, but then one day I opened my email and it was flooded with about 500 messages. Some of them were more or less the same message arriving every few seconds or minutes. Over the last few weeks, I have been getting 100+ messages every day, most of them while I am asleep.
The most interesting things I noticed are:
- Most of these alerts are warnings, but they are not going to bring any of our services down anytime soon. Some of them never get actioned, and the underlying issue recovers by itself.
- The team actually acts on only a couple of these; everything else is more of an FYI.
- Inboxes get flooded during the night, when our core support team is asleep, and the team has no way of knowing whether something is about to fail.
It’s like getting into my car and finding every light on the dashboard lit up every time I look at it, to the point that one day I stop caring. Eventually something will fail; I just hope it’s not the day I am driving somewhere in an emergency.
When I reached out to the team and explained the issue I have with our notification strategy, the prompt response was to create a new DL, which I believe will become the go-to list for all notifications. Yes, I will receive fewer emails, maybe none. But it solves nothing.
This is a big symptom. If you see it in your organization, ask whether the team is really on top of knowing when something is going to fail. Or are you relying on a system that sends everything it considers wrong as a notification and lets a bunch of humans decide what to act on, or not? Also, you can’t avoid the fact that many of these notifications go over a channel that has no way to “push” an issue to a user.
Think of a car dashboard with all these lights sitting not in front of the driver, but in the glove box. Someone would have to open the glove box to see whether a light is on. The light may be on for hours before someone realizes something has gone wrong.
I don’t have a technical solution in place for my project yet; it is something I am going to speak to my team about. The analogy I will leave you with is this: think about what a notification/alerting system should be like.
- Have your car’s dashboard light up in a warm green to tell you something has happened (like a turn indicator that has been switched on and is blinking).
- Green with soft clicking sounds: eventually the driver will notice and turn it off (as was the case with cars in the 1990s with low- or no-sensitivity indicators). But you don’t want to alarm the driver; it’s not harmful.
- Have the car’s dashboard light up in yellow as a warning. My car lights up a fuel warning as soon as the level is dangerously low. I can still drive 80-100 km depending on how I drive, which is more than enough for me to eventually see it and get to a fuel station.
- Have the car’s dashboard flash a bright red, like the doors-open warning. You won’t want to drive your car with the doors or hood open, hence a bright red, sometimes accompanied by a few sounds.
- Or have a beep sound every few seconds. I like how my car alerts me every few seconds when I don’t have my seat belt on, or when I drive over 120 km/h. It’s like reminding me every 10 seconds that something potentially fatal is going on and I could die from it.
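One way to picture these tiers in code is a small routing table. This is only a rough sketch, not something the project actually runs: the `Severity` levels mirror the dashboard colours, while the channel names (`dashboard`, `email`, `pager`), the 10-second repeat interval and the `Router` class itself are all assumptions made up for illustration. The idea is that each tier goes to a channel that matches its urgency, a repeated alert stays quiet until it clears, and only the "seat belt" tier keeps nagging.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "green"       # something happened, nobody needs to act
    WARNING = "yellow"   # act eventually, e.g. low fuel
    CRITICAL = "red"     # act now, e.g. door open while driving
    FATAL = "beep"       # keep reminding until it is fixed


# Hypothetical channel names; swap in whatever your team actually uses.
ROUTES = {
    Severity.INFO: "dashboard",
    Severity.WARNING: "email",
    Severity.CRITICAL: "pager",
    Severity.FATAL: "pager",
}

# Seconds between repeats of the same alert. None means: notify once,
# then stay quiet until the alert has been resolved. Values are assumptions.
REPEAT_EVERY = {
    Severity.INFO: None,
    Severity.WARNING: None,
    Severity.CRITICAL: None,
    Severity.FATAL: 10,
}


@dataclass(frozen=True)
class Alert:
    key: str            # stable identifier, e.g. "disk-80pct"
    severity: Severity
    message: str


class Router:
    def __init__(self):
        self._last_sent = {}  # alert key -> time of last notification

    def route(self, alert, now):
        """Return the channel to notify, or None for a suppressed repeat."""
        last = self._last_sent.get(alert.key)
        interval = REPEAT_EVERY[alert.severity]
        if last is not None and (interval is None or now - last < interval):
            return None
        self._last_sent[alert.key] = now
        return ROUTES[alert.severity]

    def resolve(self, key):
        """Forget a cleared alert so that a recurrence notifies again."""
        self._last_sent.pop(key, None)


router = Router()
disk = Alert("disk-80pct", Severity.WARNING, "disk usage above 80%")
print(router.route(disk, now=0))    # first occurrence goes to email
print(router.route(disk, now=30))   # same warning again is suppressed
```

With something like this, the "same message every few seconds" problem turns into a single email, and only the top tier is allowed to interrupt someone repeatedly.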
How this will translate for me and my project team is something I don’t know yet. But as we go about fixing this, I will post it here. What have you done to address your alerting strategy, or is it still all dashboard lights flashing all the time?