How engineers at Netflix and PagerDuty outsmart incidents with Slack

An open source incident management tool and intuitive new Slack integration help these companies quickly respond to software issues

By Jess Dawson, January 8, 2021

Any company with an online platform or website will inevitably deal with an incident or outage. But what sets the best organizations apart is the ability to speedily resolve these issues. That’s why companies across industries are coming up with new and inspired ways to identify and resolve all manner of incidents with Slack, the secure channel-based messaging platform.

Take PagerDuty, which helps more than 12,000 companies around the world pinpoint and tackle incidents with its real-time operations platform. A new integration combines PagerDuty with Slack to connect stakeholders, enabling them to manage and track issues before they escalate.

Then there’s Netflix, where engineers used the Slack API platform to build Dispatch, an open source incident management tool that works with Slack to reduce response times—and is now available for anyone to use on development platform GitHub.

At Slack Frontiers, our annual conference focused on transforming how everyone works, we explored both approaches with Andrew Marshall, the director of product marketing at PagerDuty, and Marc Vilanova, a senior security engineer at Netflix.

Leveling up incident response with PagerDuty and Slack

Traditionally, incident management is built on a command-and-control model: Decisions made at the top trickle down.

However, today’s incident response requires more of a swarm approach: connecting the right information to the right responders at the right time. During this response phase, teams rely on real-time communication to react to evolving issues, reassign or escalate incidents, and add responders. Enter the new PagerDuty and Slack integration, which was released at Slack Frontiers 2020.

“This was a major milestone for us because for many teams, Slack is where work happens,” Marshall says. “PagerDuty integrates with over 350 tools to ingest and contextualize signals. Slack then connects PagerDuty’s incident contact to the right team members so they can solve issues a lot faster.”

Marshall explains that with the integration, Slack essentially becomes a third interface for PagerDuty, alongside the desktop and mobile experiences. This frees developers from switching contexts unnecessarily, keeping teams engaged and unlocking productivity.

“You can drive PagerDuty actions directly through the Slack UI without wasting time toggling between apps,” Marshall says.

“Our integration connects Slack’s hub for communication with PagerDuty’s digital operation platform and the result powers real-time ops for modern businesses across the world.”

Andrew Marshall, Director of Product Marketing, PagerDuty

Bringing key stakeholders together quickly

A number of PagerDuty customers have what Marshall describes as a “hybrid ops environment,” where the new integration connects disparate teams. When all is well, they use PagerDuty and Slack as part of a well-oiled ecosystem to collaborate and make quick decisions. But when something goes awry:

  1. PagerDuty detects the incident
  2. The team is notified in Slack, where engineers work on a resolution
  3. Other stakeholders—sales managers, customer service, executives—are updated directly in Slack as needed, without unnecessary disruption
  4. Post-incident, information can be pulled from Slack to complete a PagerDuty postmortem
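
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PagerDuty's or Slack's actual integration code: the `Incident` class, the channel names and the `notify` helper are all hypothetical, and nothing is sent over the network. The sketch simply shows how one incident might fan out to responders and stakeholders while building a timeline for the postmortem.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    title: str
    responder_channel: str      # assumed channel where engineers work
    stakeholder_channel: str    # assumed channel for sales, support, executives
    timeline: list[str] = field(default_factory=list)


def notify(incident: Incident, channel: str, text: str) -> dict:
    """Log the message on the timeline and return the payload that would be posted."""
    incident.timeline.append(f"{channel}: {text}")
    return {"channel": channel, "text": text}


def run_flow(incident: Incident) -> list[dict]:
    """Steps 1 to 3: detection, responder notification, stakeholder update."""
    return [
        notify(incident, incident.responder_channel,
               f"Incident detected: {incident.title}"),
        notify(incident, incident.stakeholder_channel,
               f"Status update: {incident.title} is being investigated"),
    ]


def postmortem_notes(incident: Incident) -> str:
    """Step 4: collect everything that was said into notes for the postmortem."""
    return "\n".join(incident.timeline)
```

Keeping the timeline on the incident itself means the postmortem step needs no separate data gathering, which mirrors the idea of pulling information from Slack after the fact.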

“Slack and PagerDuty fulfill three major objectives,” Marshall explains. “Enabling efficient communication and collaboration, accelerating the rapid resolution of an incident and improving the overall resolution process.”

Equipping teams at Netflix to efficiently address incidents

At Netflix, Vilanova drives security incidents to resolution and develops programmatic solutions for crisis management. This includes Dispatch, a custom incident management automation framework that deeply integrates with Netflix’s existing tools, including Slack.

“The last part is particularly important, as we want to keep the learning curve for incident participants as flat as possible,” Vilanova says. “There’s no worse time to learn how to use a new tool than during an incident.”

It can take a lot of time just to engage the right people and bring them up to speed. With all this in mind, Vilanova knew Dispatch should accomplish four things:

  1. Reduce cognitive load on participants so they can focus on the resolution
  2. Maximize efficiency by providing a consistent experience
  3. Provide easy and intuitive ways to manage the incident
  4. Collect information for future learnings

Using a simple command in Slack, anyone at Netflix can instantly report a security incident with Dispatch. “The less friction the better. We want employees to report incidents as quickly as possible,” Vilanova says.
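
To make the idea of a "simple command" concrete, here is a minimal sketch of a handler for a Slack slash command. Slack delivers slash commands as a form-encoded request body with fields such as `command`, `text` and `user_id`; the command name `/report-incident` and the reply wording are assumptions made for illustration and are not taken from Dispatch.

```python
from urllib.parse import parse_qs


def parse_slash_command(body: str) -> dict:
    """Decode the form-encoded body Slack sends when a slash command is invoked."""
    fields = parse_qs(body)
    # parse_qs returns a list of values per key; keep the first of each.
    return {key: values[0] for key, values in fields.items()}


def handle_report(body: str) -> dict:
    """Build a reply for a hypothetical /report-incident command."""
    payload = parse_slash_command(body)
    title = payload.get("text", "").strip()
    if not title:
        # No description given: answer with a usage hint only the caller sees.
        return {
            "response_type": "ephemeral",
            "text": "Usage: /report-incident <short description>",
        }
    return {
        "response_type": "ephemeral",
        "text": f"Thanks <@{payload['user_id']}>, incident reported: {title}",
    }
```

A real handler would also verify the request signature and create the incident in a backing system; both are left out here to keep the sketch short.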

“At Netflix, we use Slack for real-time communications and to manage all aspects of an incident.”

Marc Vilanova, Senior Security Engineer, Netflix

Empowering teams to focus on the resolution, not the process

During an incident, the number of responders can grow exponentially, making it cumbersome to track who’s in the Slack channel. Dispatch announces participants as they join, including their name, team, location and role in the incident.

Participants joining the channel receive a welcome message with all relevant information, including links to resources. “This frees the incident commander and other participants from having to provide context, and allows everyone to start contributing right away,” Vilanova says.
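
As a rough sketch of what the join announcement and welcome message could look like, the Python below formats both as plain strings. It is not Dispatch's implementation: the `Participant` fields, the wording and the example resource are assumptions. The only Slack-specific detail relied on is the `<url|label>` link syntax in message text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant record; fields are illustrative only."""
    name: str
    team: str
    location: str
    role: str


def join_announcement(p: Participant) -> str:
    """Message posted to the channel when someone joins the incident."""
    return f"{p.name} joined the incident ({p.team}, {p.location}) as {p.role}."


def welcome_message(p: Participant, incident_title: str,
                    resources: dict[str, str]) -> str:
    """Message sent to the new participant with context and resource links."""
    lines = [f"Welcome, {p.name}. You are joining: {incident_title}", "Resources:"]
    # Slack renders <url|label> in message text as a clickable link.
    lines += [f"- <{url}|{label}>" for label, url in resources.items()]
    return "\n".join(lines)
```

Generating both messages from the same record keeps the channel announcement and the private welcome consistent, so nobody has to repeat context by hand.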

Through another Slack command, the incident commander and participants can engage and page the on-call team, which is defined in the Dispatch web UI in advance.

A consistent Dispatch experience translates to faster responses over time. Incident commanders can manage the entire incident lifecycle right in Slack, and collect metrics and metadata to inform reports and future decision-making.

Getting ahead of the incident curve

While different in their strategies, both Marshall and Vilanova integrate Slack with the best tools for their business, providing responders with exactly what they need to find a resolution, quickly. The beauty of these approaches lies in each team’s ability to gain insight with every incident—learning to solve future issues faster and get ahead of others before they even begin.
