
Incident.io co-founder Chris Evans on the power of automation in transforming incident response

A conversation about automating the micro frictions out of organizational incident response, so on-call work can take place in the calm of day

Slack Team, January 25, 2022

For most SaaS companies, a single software incident can send a shock wave through the ranks, igniting a flurry of late-night pages, phone calls and emails aimed at mitigating negative impact. Maybe a payment system has gone down, and customer orders aren’t going through. Or even worse, the whole service has gone offline, and entire teams need to mobilize.

Launched just last year, Incident.io, an incident-response platform that lets you manage and respond to incidents right within Slack, gives companies complete, premeditated control over their incident-response protocol. From what needs to be done to who needs to be brought into the conversation (or just kept in the loop), everything can be automated up front, or, as this co-founder put it, “in the calm of day.”

In this installment of our On the Platform interview series, Slack open-source engineer Alissa Renz connected with Incident.io co-founder and chief product officer Chris Evans to talk about the company, his background as a software engineer and his experience building an incident-response tool specifically for the Slack platform.

The following is a condensed transcript; answers have been edited for length and clarity.

Building Incident.io: “a sensible place to converge around an incident”

Alissa Renz: Incident.io was conceptualized from a team you worked with during your time at Monzo Bank. Can you tell us what led to the idea for Incident.io and how you evolved it from an idea into a company?

Chris Evans: I joined Monzo in the relatively early days. Back then they had a very small on-call rotation, only four engineers. They were the same engineers that had been there since almost day one, and they were all pretty burned out. On top of the small rotation, there was a lot of extra pressure that came from looking after a bank. When things go wrong, it could mean customers aren’t able to access their money, and there could be regulatory impacts.

There was also complexity introduced by the way the organization was set up. Many incidents required pulling customer support folks, or risk and compliance teams, into the loop. What they needed was a tool that could be used more widely across their organization for incident response.

I was tasked with taking over on-call. I ended up building what you could say was an early prototype for what we have now with Incident.io. It was purely to make on-call easier for engineers, a sensible place for people to converge around an incident. By the time I left in 2021, it was used in every corner of the company as a single source of truth for incidents across the board.

“Philosophically, we have this approach that the only time you should ever leave Slack is when you’re actually fixing the thing.”

Renz: Can you tell us more about Incident.io? How has the company grown and evolved since it first launched?

Evans: The first and most obvious thing is that it’s built on top of Slack. Users can use a single Slack slash command to declare an incident and manage the entirety of their response, including their existing tools, without having to leave their incident channel. Whether it’s PagerDuty or Opsgenie for getting a hold of people, Jira for logging actions or Statuspage and email for external communications, Incident.io pulls all of these different silos into a single layer within Slack.
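To make the slash-command flow concrete, here is a minimal Python sketch of what a handler might do with the text typed after a hypothetical `/incident` command: turn it into an incident record with its own dedicated channel name. The command name, the `inc-<number>-<slug>` naming scheme and the helper names are assumptions for illustration, not Incident.io's actual behaviour. The only Slack rule it leans on is that channel names must be lowercase, contain no spaces or periods, and be at most 80 characters.

```python
import re
from dataclasses import dataclass

MAX_CHANNEL_NAME = 80  # Slack's maximum channel-name length


@dataclass
class DeclaredIncident:
    number: int
    summary: str
    channel_name: str


def slugify(text: str) -> str:
    """Lowercase the text and collapse every run of non [a-z0-9] characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def declare_incident(command_text: str, next_number: int) -> DeclaredIncident:
    """Turn the text after a hypothetical `/incident` command into an incident record."""
    summary = command_text.strip()
    if not summary:
        raise ValueError("an incident needs a short summary")
    prefix = f"inc-{next_number}-"
    # Trim the slug so prefix + slug never exceeds the channel-name limit.
    slug = slugify(summary)[: MAX_CHANNEL_NAME - len(prefix)].rstrip("-")
    channel = prefix + slug if slug else f"inc-{next_number}"
    return DeclaredIncident(next_number, summary, channel)


incident = declare_incident("Payments API returning 500s", next_number=42)
print(incident.channel_name)  # inc-42-payments-api-returning-500s
```

A real app would then create the channel and invite responders; that part depends on the Slack Web API and is left out of this sketch.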

For example, if you have a partner document that says, “If this incident is a data breach, then page the data protection officer,” or if it’s a situation where execs need to be pulled in, you can configure those things in the calm of day. Then, when it’s 2 a.m. and there are pages going off, nobody has to think about the process. Incident.io makes sure that everyone gets into the right room and guides you along the way.
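The “configure it in the calm of day” idea amounts to writing rules ahead of time and evaluating them when an incident is declared. The sketch below assumes hypothetical incident types, a severity scale where 1 is most severe, and made-up paging targets; it illustrates the pattern and is not Incident.io's configuration format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Incident:
    summary: str
    incident_type: str  # hypothetical type labels, e.g. "data_breach" or "outage"
    severity: int       # assumed convention: 1 is the most severe


@dataclass
class Rule:
    description: str
    applies: Callable[[Incident], bool]
    page: List[str]     # hypothetical on-call targets to page when the rule matches


# Decided ahead of time, "in the calm of day", rather than at 2 a.m.
RULES = [
    Rule("Data breaches page the data protection officer",
         lambda inc: inc.incident_type == "data_breach",
         ["data-protection-officer"]),
    Rule("Severity 1 incidents pull in the executive team",
         lambda inc: inc.severity == 1,
         ["exec-on-call"]),
]


def who_to_page(incident: Incident, rules: List[Rule] = RULES) -> List[str]:
    """Collect targets from every matching rule, keeping order and dropping duplicates."""
    targets: List[str] = []
    for rule in rules:
        if rule.applies(incident):
            for target in rule.page:
                if target not in targets:
                    targets.append(target)
    return targets


print(who_to_page(Incident("Customer data exposed", "data_breach", 1)))
# ['data-protection-officer', 'exec-on-call']
```

Keeping the rules as data means the responder at 2 a.m. only declares the incident; deciding who to page has already been done.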

Renz: An incident can arise across many different services. Which services does Incident.io offer support for?

Evans: We’re integrated with all the major paging providers, such as PagerDuty and Opsgenie, for getting people together in the room. Then there’s Jira, GitHub and Clubhouse for logging actions, and Statuspage, so you can keep the public updated without leaving Slack. And Zoom and Google Meet if you need to jump right into a face-to-face chat.

An alert from incident.io in Slack

With Incident.io, you don’t have to jump off of Slack to do things. Philosophically, we have this approach that the only time you should ever leave Slack is when you’re actually fixing the thing. Everything else should be possible without leaving Slack.

On choosing Slack and building for the Slack platform

Renz: What aspects of the Slack platform make it a particularly good fit for something as critical as incident response?

Evans: From the beginning, we knew what kind of companies we wanted to go after: the kind that value the rich context in communication that Slack offers. As founders and engineers, we were all really familiar with Slack already, and we knew that it would be a natural fit to build on the platform.

When you look more generally, great incident response is founded on great communication. That’s the key tenet: everyone is able to communicate really clearly. Slack was a communication-first tool that was already as ubiquitous across organizations as calendars or email.

But we came to realize that Slack could give us a lot more than just communications. It’s a genuinely powerful platform that makes it easy to build applications that can be used all across the organization. It’s frictionless for engineers, who can get started with a single Slack command. Meanwhile, folks who have never used Incident.io before can jump right into the channel, and we’ll help them along with tips and messages.

Slack was simply the most sensible place to build.

“We came to realize that Slack could give us a lot more than just communications.”

Renz: What’s been an interesting technical challenge or problem you’ve faced and what role has Slack had in overcoming it?

Evans: The constant challenge is making technical integrations more friendly for non-technical folks. Our ambition is to become a tool for entire organizations. You need to be able to take someone who has never used a paging program like PagerDuty and give them a crystal-clear UX. Inside of Slack, everything looks similar and familiar, which makes it easier for everyone.

The pandemic and why the future is about “more tools, playing more nicely together”

Renz: As the way we work continues to evolve, how has this affected the nature of incidents and how organizations should approach their response strategies?

Evans: Before the pandemic, jumping into an incident typically meant jumping into a meeting room in an office together. We were still using Slack but were sitting across the desk from one another. Then a lot of companies went 100% remote, and many are staying that way. So there’s an increasing pressure on tooling like ours to bridge that gap.

Flipping to the customer side of that lens, online services have become much more critical. Grocery shopping, Amazon, banking … customers are more dependent on these services than they were before, and the providers of those services have recognized that. There’s a strong driver from both sides to make incidents as low-impact as possible, make them happen as little as possible and get them recovered as quickly as possible.

“Suddenly, we’re in a world where these tools are like Lego blocks that you can use to build an incredible organization.”

Renz: At Slack, we’ve evolved the platform to meet the needs of today’s developers. Looking forward, are there any tools or technologies you’re excited about?

Evans: For me, what’s incredible is the idea of more tools playing more nicely together.

When you look at the old SaaS models, it was all about building a thing and trying to get everyone to gravitate towards it, log in and then make it the center of their universe. Now, there’s such a rich ecosystem of SaaS products, and people want to use them all.

Suddenly, we’re in a world where these tools are like Lego blocks that you can use to build an incredible organization. Even 10 years ago you’d have to build something from scratch or just paper over the cracks by hiring more people. Now you can take the integrations you need and plug these things together in Slack.

Renz: Which tools or technologies do you find most useful when it comes to making work more efficient, effective or fun?

Evans: In terms of making life easier, it’s the fun things. 99.9% of people probably have Giphy installed in Slack. Being able to inject some levity into serious situations is great.

But the real superpower for people building on Slack is using Block Kit Builder to collaborate on design. It’s incredible. We all go into Block Kit Builder and decide exactly how we want everything to look and work. That way, you don’t have to write a single line of code until you’re really happy with it.

You can also use Block Kit Builder to compose really nice messages or announcements to your organization. You can embed nice images, section headings and design elements, then send it out straight from Block Kit Builder.
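For readers who haven't used Block Kit, the sketch below builds the kind of announcement described here as Block Kit JSON: a header, a formatted text section, a divider and an image. The block types (`header`, `section`, `divider`, `image`) are standard Block Kit blocks; the message text and the image URL are placeholders invented for this example. The printed JSON can be pasted into Block Kit Builder to preview it, and the same `blocks` list could be passed to the `chat.postMessage` Web API method.

```python
import json
from typing import Dict, List

HEADER_MAX_CHARS = 150  # Block Kit limit for header block text


def announcement_blocks(title: str, body: str, image_url: str, image_alt: str) -> List[Dict]:
    """Return a header, an mrkdwn body, a divider and an image, in that order."""
    if len(title) > HEADER_MAX_CHARS:
        raise ValueError("header text is limited to 150 characters")
    return [
        {"type": "header", "text": {"type": "plain_text", "text": title}},
        {"type": "section", "text": {"type": "mrkdwn", "text": body}},
        {"type": "divider"},
        {"type": "image", "image_url": image_url, "alt_text": image_alt},
    ]


payload = {
    "blocks": announcement_blocks(
        "Incident review: Friday 3 p.m.",
        "*Everyone is welcome.* Bring your questions about last week's incident.",
        "https://example.com/incident-timeline.png",  # placeholder image URL
        "Timeline of last week's incident",
    )
}

# Paste this JSON into Block Kit Builder to preview the layout.
print(json.dumps(payload, indent=2))
```

Designing the layout as plain data first is what lets a team agree on how a message looks before writing any handler code.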

Renz: What advice do you have for folks that are interested in working with or building on the Slack platform?

Evans: My advice would be to find all the things that are causing you even tiny little micro frictions. Then figure out if those things can go into Slack.

As an example, at Monzo we wanted to keep a record of all the decisions we made over time, so we could go back and take a look. Originally, we logged all that in a Notion database. But it didn’t really get used because it was a micro friction.

So I built a super-trivial little app that just brings up a modal that asks, “What was the context? What led you to this decision? Who was involved?” Once you submitted it, everything was sent to GitHub, and we didn’t have to worry about it anymore. All with just one little bit of code.
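The decision-log app can be sketched in two pieces: a Slack modal view with one input per question, and a function that turns the submitted values into a Markdown record. The view structure (`type: "modal"`, `input` blocks with `plain_text_input` elements) and the shape of `view.state.values` in a `view_submission` payload follow Slack's documented format. The callback ID, block IDs and Markdown layout are hypothetical, and committing the file to GitHub is left out because the original app's method isn't described.

```python
from typing import Dict, List, Tuple

# (block_id, question) pairs; the block IDs are hypothetical.
DECISION_FIELDS: List[Tuple[str, str]] = [
    ("context", "What was the context?"),
    ("reasoning", "What led you to this decision?"),
    ("people", "Who was involved?"),
]


def decision_modal() -> Dict:
    """Build a modal view with one multiline text input per question."""
    return {
        "type": "modal",
        "callback_id": "decision_log",  # hypothetical callback ID
        "title": {"type": "plain_text", "text": "Log a decision"},
        "submit": {"type": "plain_text", "text": "Submit"},
        "blocks": [
            {
                "type": "input",
                "block_id": block_id,
                "label": {"type": "plain_text", "text": question},
                "element": {"type": "plain_text_input", "action_id": "answer", "multiline": True},
            }
            for block_id, question in DECISION_FIELDS
        ],
    }


def decision_markdown(view_state: Dict) -> str:
    """Convert `view.state` from a view_submission payload into a Markdown record."""
    values = view_state["values"]  # block_id -> action_id -> {"type": ..., "value": ...}
    lines = ["# Decision record", ""]
    for block_id, question in DECISION_FIELDS:
        answer = values[block_id]["answer"]["value"] or ""  # empty inputs can arrive as null
        lines += [f"## {question}", answer.strip(), ""]
    return "\n".join(lines)
```

The resulting Markdown string is what a follow-up step might commit to a repository, giving the team a searchable history without anyone leaving Slack.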

