
Incident.io co-founder Chris Evans on the power of automation in transforming incident response

A conversation about automating the micro frictions out of organizational incident response, so on-call work can take place in the calm of day

By the Slack team, January 25, 2022

For most SaaS companies, a single software incident can send a shock wave through the ranks, igniting a flurry of late-night pages, phone calls and emails aimed at mitigating negative impact. Maybe a payment system has gone down, and customer orders aren’t going through. Or even worse, the whole service has gone offline, and entire teams need to mobilize.

Launched just last year, Incident.io, an incident-response platform that lets you manage and respond to incidents right within Slack, gives companies complete, premeditated control over their incident-response protocol. From what needs to be done to who needs to be brought into the conversation (or just kept in the loop), everything can be automated up front, or, as its co-founder put it, “in the calm of day.”

In this installment of our On the Platform interview series, Slack open-source engineer Alissa Renz connected with Incident.io co-founder and chief product officer Chris Evans to talk about the company, his background as a software engineer and his experience building an incident-response tool specifically for the Slack platform.

The following is a condensed transcript; answers have been edited for length and clarity.

Building Incident.io: “a sensible place to converge around an incident”

Alissa Renz: Incident.io was conceptualized from a team you worked with during your time at Monzo Bank. Can you tell us what led to the idea for Incident.io and how you evolved it from an idea into a company?

Chris Evans: I joined Monzo in the relatively early days. Back then they had a very small on-call rotation, only four engineers. They were the same engineers that had been there since almost day one, and they were all pretty burned out. On top of the small rotation, there was a lot of extra pressure that came from looking after a bank. When things go wrong, it could mean customers aren’t able to access their money, and there could be regulatory impacts.

There was also complexity introduced by the way the organization was set up. Many incidents required pulling in customer support folks, or risk and compliance. What they needed was a tool that could be used more widely across the organization for incident response.

I was tasked with taking over on-call. I ended up building what you could say was an early prototype for what we have now with Incident.io. It was purely to make on-call easier for engineers, a sensible place for people to converge around an incident. By the time I left in 2021, it was used in every corner of the company as a single source of truth for incidents across the board.

Renz: Can you tell us more about Incident.io? How has the company grown and evolved since it first launched?

Evans: The first and most obvious thing is that it’s built on top of Slack. Users can use a single Slack slash command to declare an incident and manage the entirety of their response, including their existing tools, without having to leave their incident channel. Whether it’s PagerDuty or Opsgenie for getting a hold of people, Jira for logging actions or Statuspage and email for external communications, Incident.io pulls all of these different silos into a single layer within Slack.

For example, if you have a policy document that says, “If this incident is a data breach, then page the data protection officer,” or rules for when execs need to be pulled in, you can configure those things in the calm of day. Then, when it’s 2 a.m. and pages are going off, nobody has to think about the process. Incident.io makes sure that everyone gets into the right room and guides you along the way.
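To make the idea of deciding the process “in the calm of day” concrete, here is a minimal sketch of rules written ahead of time and evaluated when an incident is declared. This is not Incident.io’s configuration format or API: the `Incident` and `Rule` types, the `incident_type` and `severity` fields and the role names are all hypothetical, chosen only to show how conditions agreed in advance can decide who gets paged.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical incident record; field names are illustrative only.
@dataclass
class Incident:
    title: str
    incident_type: str
    severity: str


# A rule pairs a condition with the (hypothetical) roles to page when it matches.
@dataclass
class Rule:
    name: str
    condition: Callable[[Incident], bool]
    page: List[str] = field(default_factory=list)


# Rules written "in the calm of day", before any incident happens.
RULES = [
    Rule(
        name="data breach",
        condition=lambda i: i.incident_type == "data_breach",
        page=["data-protection-officer"],
    ),
    Rule(
        name="major incident",
        condition=lambda i: i.severity == "critical",
        page=["exec-on-call"],
    ),
]


def responders_for(incident: Incident, rules: List[Rule]) -> List[str]:
    """Return everyone to page for this incident, in rule order, without duplicates."""
    responders: List[str] = []
    for rule in rules:
        if rule.condition(incident):
            for role in rule.page:
                if role not in responders:
                    responders.append(role)
    return responders


if __name__ == "__main__":
    breach = Incident("Customer data exposed", "data_breach", "critical")
    print(responders_for(breach, RULES))
    # → ['data-protection-officer', 'exec-on-call']
```

Keeping the rules as plain data means the 2 a.m. responder never has to consult the policy document; the decision was already made when the rules were written.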

Renz: An incident can arise across many different services. Which services does Incident.io offer support for?

Evans: We’re integrated with all the major paging providers, PagerDuty and Opsgenie, for getting people together in the room. Then there’s Jira, GitHub and Clubhouse, plus Statuspage, so you can keep the public updated without leaving Slack. And there’s Zoom and Google Meet if you need to jump right into a face-to-face chat.

An alert from Incident.io in Slack

With Incident.io, you don’t have to leave Slack to do things. Philosophically, we have this approach that the only time you should ever leave Slack is when you’re actually fixing the thing. Everything else should be possible without leaving Slack.

On choosing Slack and building for the Slack platform

Renz: What aspects of the Slack platform make it a particularly good fit for something as critical as incident response?

Evans: From the beginning, we knew what kind of companies we wanted to go after: companies that value the rich context in communication that Slack offers. As founders and engineers, we were all really familiar with Slack already, and we knew that it would be a natural fit to build on the platform.

When you look more generally, great incident response is founded on great communication. That’s the key tenet: everyone is able to communicate really clearly. Slack was a communication-first tool that was already as ubiquitous across organizations as calendars or email.

But we came to realize that Slack could give us a lot more than just communications. It’s a genuinely powerful platform that makes it easy to build applications that can be used all across the organization. It’s frictionless for engineers, who can get started with a single Slack command. Meanwhile, folks who have never used Incident.io before can jump right into the channel, and we’ll help them along with tips and messages.

Slack was simply the most sensible place to build.

Renz: What’s been an interesting technical challenge or problem you’ve faced and what role has Slack had in overcoming it?

Evans: The constant challenge is making technical integrations more friendly for non-technical folks. Our ambition is to become the tool for entire organizations. You need to be able to take someone who has never used a paging program like PagerDuty and give them a crystal-clear UX. Inside of Slack, everything looks similar and familiar, which makes it easier for everyone.

The pandemic and why the future is about “more tools, playing more nicely together”

Renz: As the way we work continues to evolve, how has this affected the nature of incidents and how organizations should approach their response strategies?

Evans: Before the pandemic, jumping into an incident typically meant jumping into a meeting room in an office together. We were still using Slack but were sitting across the desk from one another. Then a lot of companies went 100% remote, and many are staying that way. So there’s increasing pressure on tooling like ours to bridge that gap.

Flipping to the customer side of that lens, online services have become much more critical. Grocery shopping, Amazon, banking … customers are more dependent on these services than they were before, and the providers of those services have recognized that. There’s a strong driver from both sides to make incidents as low-impact as possible, make them happen as little as possible and get them recovered as quickly as possible.

Renz: At Slack, we’ve evolved the platform to meet the needs of today’s developers. Looking forward, are there any tools or technologies you’re excited about?

Evans: For me, what’s incredible is the idea of more tools playing more nicely together.

When you look at the old SaaS models, it was all about building a thing and trying to get everyone to gravitate towards it, log in and then make it the center of their universe. Now, there’s such a rich ecosystem of SaaS products, and people want to use them all.

Suddenly, we’re in a world where these tools are like Lego blocks that you can use to build an incredible organization. Even 10 years ago you’d have to build something from scratch or just paper over the cracks by hiring more people. Now you can take the integrations you need and plug these things together in Slack.

Renz: Which tools or technologies do you find most useful when it comes to making work more efficient, effective or fun?

Evans: In terms of making life easier, it’s the fun things. 99.9% of people probably have Giphy installed in Slack. Being able to inject some levity into serious situations is great.

But the real superpower for people building on Slack is using Block Kit Builder to collaborate on design. It’s incredible. We all go into Block Kit Builder and decide exactly how we want everything to look and work. That way, you don’t have to write a single line of code until you’re really happy with it.

You can also use Block Kit Builder to compose really nice messages or announcements to your organization. You can embed images, section headings and other design elements, then send the message out straight from Block Kit Builder.

Renz: What advice do you have for folks that are interested in working with or building on the Slack platform?

Evans: My advice would be to find everything that causes you even tiny micro frictions. Then figure out whether those things can go into Slack.

As an example, at Monzo we wanted to keep a record of all the decisions we made over time, so we could go back and take a look. Originally, we logged all that in a Notion database. But it didn’t really get used because it was a micro friction.

So I built a super-trivial little app that just brings up a modal asking, “What was the context? What led you to this decision? Who was involved?” When you submitted it, everything went straight to GitHub, and we didn’t have to worry about it anymore. All with just one little bit of code.
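The decision-log app described here is essentially a Slack modal with three questions. Below is a minimal sketch of what its view payload might look like, assuming Block Kit’s `modal` view type with `input` blocks, plus a helper that turns the submitted values into a Markdown record. It is not the original app’s code: the `callback_id`, `block_id` and `action_id` values are hypothetical, and opening the modal (for example with the `views.open` Web API method) and committing the record to GitHub are left out, since the interview doesn’t say how the original app did either.

```python
import json
from typing import Any, Dict, List

# Hypothetical identifiers for this sketch; any unique strings would do.
CALLBACK_ID = "log_decision"
QUESTIONS = [
    ("context", "What was the context?"),
    ("reasoning", "What led you to this decision?"),
]


def decision_modal() -> Dict[str, Any]:
    """Build a Block Kit modal view asking the three decision-log questions."""
    blocks: List[Dict[str, Any]] = [
        {
            "type": "input",
            "block_id": block_id,
            "label": {"type": "plain_text", "text": label},
            "element": {
                "type": "plain_text_input",
                "action_id": "value",
                "multiline": True,
            },
        }
        for block_id, label in QUESTIONS
    ]
    # A user picker answers "Who was involved?" with Slack user IDs.
    blocks.append(
        {
            "type": "input",
            "block_id": "people",
            "label": {"type": "plain_text", "text": "Who was involved?"},
            "element": {"type": "multi_users_select", "action_id": "value"},
        }
    )
    return {
        "type": "modal",
        "callback_id": CALLBACK_ID,
        "title": {"type": "plain_text", "text": "Log a decision"},
        "submit": {"type": "plain_text", "text": "Submit"},
        "blocks": blocks,
    }


def decision_record(state_values: Dict[str, Any]) -> str:
    """Turn a view_submission's view.state.values into a Markdown record."""
    context = state_values["context"]["value"]["value"]
    reasoning = state_values["reasoning"]["value"]["value"]
    people = state_values["people"]["value"]["selected_users"]
    return (
        f"## Context\n{context}\n\n"
        f"## Reasoning\n{reasoning}\n\n"
        f"## People involved\n{', '.join(people)}\n"
    )


if __name__ == "__main__":
    print(json.dumps(decision_modal(), indent=2))
```

Keeping the payload as plain data also makes it easy to paste into Block Kit Builder and adjust the layout before writing any handler code.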
