
How engineers at Netflix and PagerDuty outsmart incidents with Slack

An open source incident management tool and intuitive new Slack integration help these companies quickly respond to software issues

By Jess Dawson, January 8, 2021

Any company with an online platform or website will inevitably deal with an incident or outage. But what sets the best organizations apart is the ability to speedily resolve these issues. That’s why companies across industries are coming up with new and inspired ways to identify and resolve all manner of incidents with Slack, the secure channel-based messaging platform.

Take PagerDuty, which helps more than 12,000 companies around the world pinpoint and tackle incidents with its real-time operations platform. A new integration combines PagerDuty with Slack to connect stakeholders, enabling them to manage and track issues before they escalate.

Then there’s Netflix, where engineers used the Slack API platform to build Dispatch, an open source incident management tool that works with Slack to reduce response times—and is now available for anyone to use on development platform GitHub.

At Slack Frontiers, our annual conference focused on transforming how everyone works, we explored both approaches with Andrew Marshall, the director of product marketing at PagerDuty, and Marc Vilanova, a senior security engineer at Netflix.

Leveling up incident response with PagerDuty and Slack

Traditionally, incident management is built on a command-and-control model: Decisions made at the top trickle down.

However, today’s incident response requires more of a swarm approach: connecting the right information to the right responders at the right time. During this response phase, teams rely on real-time communication to react to evolving issues, reassign or escalate incidents, and add responders. Enter the new PagerDuty and Slack integration, which was released at Slack Frontiers 2020.

“This was a major milestone for us because for many teams, Slack is where work happens,” Marshall says. “PagerDuty integrates with over 350 tools to ingest and contextualize signals. Slack then connects PagerDuty’s incident contact to the right team members so they can solve issues a lot faster.”

Marshall explains that with the integration, Slack essentially becomes a third interface for PagerDuty, along with the desktop and mobile experience. This frees developers from switching contexts unnecessarily, keeping teams engaged and unlocking productivity.

“You can drive PagerDuty actions directly through the Slack UI without wasting time toggling between apps,” Marshall says.


“Our integration connects Slack’s hub for communication with PagerDuty’s digital operation platform and the result powers real-time ops for modern businesses across the world.”

Andrew Marshall, Director of Product Marketing, PagerDuty

Bringing key stakeholders together quickly

A number of PagerDuty customers have what Marshall describes as a “hybrid ops environment,” where the new integration connects disparate teams. When all is well, they use PagerDuty and Slack as part of a well-oiled ecosystem to collaborate and make quick decisions. But when something goes awry:

  1. PagerDuty detects the incident
  2. The team is notified in Slack, where engineers work on a resolution
  3. Other stakeholders—sales managers, customer service, executives—are updated directly in Slack as needed, without unnecessary disruption
  4. Post-incident, information can be pulled from Slack to complete a PagerDuty postmortem
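
The four steps above can be sketched as a small timeline model. This is an illustrative sketch only, not the PagerDuty or Slack API: the `Incident` class, the `run_incident` and `postmortem_notes` helpers, and the audience labels are all hypothetical names chosen for this example. It shows how keeping responder and stakeholder updates in one timeline makes the postmortem step a simple read-back.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record; not a PagerDuty data structure."""
    title: str
    status: str = "triggered"
    timeline: list = field(default_factory=list)  # (audience, message) pairs

    def log(self, audience: str, message: str) -> None:
        self.timeline.append((audience, message))


def run_incident(title: str) -> Incident:
    incident = Incident(title)
    # Step 1: the incident is detected.
    incident.log("responders", f"Detected: {title}")
    # Step 2: engineers are notified and start working on a resolution.
    incident.status = "acknowledged"
    incident.log("responders", "Engineers investigating in the incident channel")
    # Step 3: other stakeholders get updates only when there is something to say.
    incident.log("stakeholders", "Fix in progress; no action needed")
    incident.status = "resolved"
    incident.log("stakeholders", "Resolved")
    return incident


def postmortem_notes(incident: Incident) -> list:
    # Step 4: the recorded timeline becomes raw material for the postmortem.
    return [f"[{audience}] {message}" for audience, message in incident.timeline]


if __name__ == "__main__":
    for line in postmortem_notes(run_incident("Checkout errors")):
        print(line)
```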

“Slack and PagerDuty fulfill three major objectives,” Marshall explains. “Enabling efficient communication and collaboration, accelerating the rapid resolution of an incident and improving the overall resolution process.”

Equipping teams at Netflix to efficiently address incidents

At Netflix, Vilanova drives security incidents to resolution and develops programmatic solutions for crisis management. This includes Dispatch, a custom incident management automation framework that deeply integrates with Netflix’s existing tools, including Slack.

“The last part is particularly important, as we want to keep the learning curve for incident participants as flat as possible,” Vilanova says. “There’s no worse time to learn how to use a new tool than during an incident.”

It can take a lot of time just to engage the right people and bring them up to speed. With all this in mind, Vilanova knew Dispatch should accomplish four things:

  1. Reduce cognitive load on participants so they can focus on the resolution
  2. Maximize efficiency by providing a consistent experience
  3. Provide easy and intuitive ways to manage the incident
  4. Collect information for future learnings

Using a simple command in Slack, anyone at Netflix can instantly report a security incident with Dispatch. “The less friction the better. We want employees to report incidents as quickly as possible,” Vilanova says.
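
To make the idea of a one-command report concrete, here is a minimal sketch of handling a Slack slash command. Slack delivers slash commands as a form-encoded POST body with fields such as `command`, `text`, `user_id` and `channel_id`; everything else here is an assumption for illustration, including the `/report-incident` command name and the `parse_slash_command` and `acknowledge` helpers. It is not how Dispatch itself is implemented.

```python
from urllib.parse import parse_qs


def parse_slash_command(body: str) -> dict:
    """Turn a form-encoded slash command body into a small report dict."""
    # parse_qs returns a list per key; a slash command sends one value each.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    return {
        "command": fields.get("command", ""),
        "reporter": fields.get("user_id", ""),
        "channel": fields.get("channel_id", ""),
        "description": fields.get("text", "").strip(),
    }


def acknowledge(report: dict) -> str:
    """Build the reply shown to the reporter (hypothetical wording)."""
    if not report["description"]:
        return "Please add a short description of the incident."
    # <@USER_ID> is Slack's mention syntax for a user.
    return f"Thanks <@{report['reporter']}>, your report was received: {report['description']}"


if __name__ == "__main__":
    # Hypothetical payload; the command name is an assumption.
    example_body = (
        "command=%2Freport-incident&text=Suspicious+login+alerts"
        "&user_id=U123&channel_id=C456"
    )
    print(acknowledge(parse_slash_command(example_body)))
```

Keeping the handler this small reflects the low-friction goal Vilanova describes: the reporter supplies one line of text and gets an immediate acknowledgment.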


“At Netflix, we use Slack for real-time communications and to manage all aspects of an incident.”

Marc Vilanova, Senior Security Engineer, Netflix

Empowering teams to focus on the resolution, not the process

During an incident, the number of responders can grow exponentially, making it cumbersome to track who’s in the Slack channel. Dispatch announces participants as they join, including their name, team location and role in the incident.

Participants joining the channel receive a welcome message with all relevant information, including links to resources. “This frees the incident commander and other participants from having to provide context, and allows everyone to start contributing right away,” Vilanova says.
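
The announcement and welcome message described above could look something like the following sketch. The `Participant` fields, the message wording, and the example.com links are hypothetical, and treating team and location as separate fields is an assumption; Dispatch's actual messages may differ.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant record used only for this example."""
    name: str
    team: str
    location: str
    role: str


def join_announcement(p: Participant) -> str:
    """Message posted to the channel when someone joins."""
    return f"{p.name} joined the incident ({p.team}, {p.location}) as {p.role}"


def welcome_message(p: Participant, incident_title: str, resources: dict) -> str:
    """Message sent to the new participant with context and resource links."""
    lines = [f"Welcome, {p.name}. You are joining: {incident_title}", "Resources:"]
    lines += [f"- {label}: {url}" for label, url in resources.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    ada = Participant("Ada", "Security", "Los Gatos", "Incident Commander")
    print(join_announcement(ada))
    print(welcome_message(
        ada,
        "Suspicious logins",
        {"Incident doc": "https://example.com/doc"},  # placeholder link
    ))
```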

Through another Slack command, the incident commander and participants can engage and page the on-call team, which is defined in the Dispatch web UI in advance.

A consistent Dispatch experience translates to faster responses over time. Incident commanders can manage the entire incident lifecycle right in Slack, and collect metrics and metadata to inform reports and future decision-making.

Getting ahead of the incident curve

While different in their strategies, both Marshall and Vilanova integrate Slack with the best tools for their business, providing responders with exactly what they need to find a resolution, quickly. The beauty of these approaches lies in each team’s ability to gain insight with every incident—learning to solve future issues faster and get ahead of others before they even begin.
