Staff Software Engineer, Observability
About the Team
The Monitoring team at Slack develops platforms and tools that provide insight into the availability, performance, and reliability of Slack production services. We develop configuration management tools for distributed applications and infrastructure, maintain datasets for business and system analytics, and build interfaces and backend systems to answer questions and infer behavioral patterns about our users and systems. Our toolset is varied. We work with open-source observability and monitoring technologies like Elastic Stack and Prometheus, use cloud providers such as AWS, and write software in Go, Python, and Java.
As part of the Monitoring team in San Francisco, you will work closely with other teams in engineering, product development, and customer experience to provide insights that drive decisions and ensure a positive experience for Slack customers. You will also help build distributed services that process tens of millions of data points per second, self-heal, and scale up or down to meet demand. We are an inclusive team with deep empathy for our colleagues and customers. You can see the team at work at Monitorama 2018.
Slack has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, why not say hello?
About the Role
This is a staff-level engineering position based in San Francisco, California.
What you will be doing
- Building observability tooling and infrastructure for Slack.
- Collaborating with an engineering team to write a client library that collects traces and metrics from customer-facing systems.
- Encouraging a culture of observability at Slack: helping identify problem areas and consulting on how to improve visibility into our systems.
- Prototyping tooling interfaces or building new features for engineering use cases.
- Improving auto-remediation in our telemetry infrastructure to avoid common failures.
- Teaching engineers how to use our tools to introspect their systems.
- Participating in the Monitoring on-call rotation, triaging and addressing production issues as they arise.
What you should have
- You are a strong communicator. Explaining complex technical concepts to designers, support, and other engineers is no problem for you.
- You enjoy helping onboard new team members, mentoring, and teaching others.
- You live for unit tests, code review, design documentation, debugging and solving problems.
- You have a deep curiosity about how things work under the hood.
- You are motivated by helping others succeed. When things break — and they will — you are eager and able to help fix things. You like thinking of ways to improve efficiency or bring delight to your coworkers.
- You know that the internet is a scary place, you understand security concepts deeply, and you can put them into action to protect us and our users.
- Firm grasp of computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent training, fellowship, or work experience.
- Experience with functional or imperative programming languages such as PHP, Python, Go, C, or Java (used without frameworks).
- Experience with creating interfaces, tooling or automation to help define a path for engineers to self-service.
- Experience deploying, operating, and debugging server software on Linux at scale.
- Solid competency with ELK, Prometheus, OpenTracing, Graphite, or any other widely used visibility tool.
- Prior experience with or knowledge of large-scale, high-volume distributed systems, distributed databases, and data pipelines.
- Experience with container orchestration platforms such as Kubernetes.
- Experience using deployment automation/configuration management, especially Terraform or Chef.
- Experience with AWS and other virtualized environments.
- Experience with message queue services, such as Kafka.
Slack is an Equal Opportunity Employer and participant in the U.S. Federal E-Verify program. Women, minorities, individuals with disabilities and protected veterans are encouraged to apply. Slack will consider qualified applicants with criminal histories in a manner consistent with the San Francisco Fair Chance Ordinance.
Slack is a layer of the business technology stack that brings together people, data, and applications – a single place where people can effectively work together, find important information, and access hundreds of thousands of critical applications and services to do their best work. From global Fortune 100 companies to corner markets, businesses and teams of all kinds use Slack to bring the right people together with all the right information. Slack is headquartered in San Francisco, CA and has offices around the world. For more information on how Slack makes teams better connected, visit slack.com.
Ensuring a diverse and inclusive workplace where we learn from each other is core to Slack’s values. We welcome people of different backgrounds, experiences, abilities and perspectives. We are an equal opportunity employer and a pleasant and supportive place to work.
Come do the best work of your life here at Slack.