Senior Software Engineer - Monitoring
About the Team
The Monitoring team at Slack practices observability concepts and develops platforms and tools that provide insights into the availability, performance and reliability of Slack production services and its customers. We develop configuration management tools for distributed applications and infrastructure, maintain data processing pipelines for business and system analytics, and build interfaces and backend systems to answer questions and infer behavioral patterns about our users and systems. Our toolset is varied. We work with open-source technologies like Elastic Stack and Prometheus and cloud providers such as AWS, and we write high-performance services in Go and automation in Python, shell script, or anything else your heart may desire.
As part of the Monitoring team you will work closely with other teams in engineering, product development and customer experience to provide observability to drive decisions and ensure a positive user experience for our Slack customers. You will also help build distributed services with the ability to self-heal and scale up or down to meet demand. We are an inclusive team with deep empathy for our colleagues and customers.
Slack has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, why not say hello?
About the Role
This is a senior engineering position based in San Francisco.
What you could be doing
- Collaborating with an engineering team to write a client library to collect traces and metrics from customer-facing systems
- Whiteboarding and soliciting feedback from peers for next-generation observability systems that will scale to meet growth, and then making them happen
- Prototyping tooling interfaces or building new features for engineering use cases
- Advocating for Slack at conferences, for example by networking or giving talks on best practices for observability or telemetry ingestion
- Building high-performance services capable of processing millions of events per second for engineering use cases
- Improving automation and management in our telemetry and monitoring infrastructure to avoid common failures
- Teaching engineers how to use our tools to inspect their services
- Participating in the Monitoring on-call rotation, triaging and addressing production issues as they arise
What you should have
- You are a strong communicator. You are able to explain complex technical concepts to designers, support, and other engineers with ease, and you are able to garner approval and consensus.
- You enjoy helping onboard new team members, mentoring, and teaching others.
- You are an excellent problem solver. You’re able to work on problems on the fly and act quickly but intently to reach resolution.
- You model best practices for design, coding, testing, code review, documentation, debugging, and troubleshooting.
- You have a passionate curiosity about how things work.
- You are motivated by helping others succeed. When things break — and they will — you are eager and able to help fix things. You like thinking of ways to improve efficiency or bring delight to your coworkers.
- You know that the internet is a scary place. You understand security concepts deeply and can put them into action to protect us and our users.
- Firm grasp of computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval.
- Bachelor's degree in Computer Science, Engineering or a related field. Equivalent training, fellowship, or work experience is also acceptable.
- Experience with functional or imperative programming languages, e.g., Go, Python, C, Scala or Java (used without frameworks).
- Experience deploying, operating and debugging services on Linux at scale.
- Experience using deployment automation/configuration management, especially Chef or Terraform.
- Experience with AWS.
- Solid competency with monitoring tools or high throughput services such as Prometheus, Elasticsearch, Kafka, Graphite, or Grafana.
- Experience as an open source contributor, especially to projects relevant to the team, e.g. Prometheus.
- Prior experience with or knowledge of large scale, high volume systems and data pipelines.
Slack is a layer of the business technology stack that brings together people, data, and applications – a single place where people can effectively work together, find important information, and access hundreds of thousands of critical applications and services to do their best work. From global Fortune 100 companies to corner markets, businesses and teams of all kinds use Slack to bring the right people together with all the right information. Slack is headquartered in San Francisco, CA and has ten offices around the world. For more information on how Slack makes teams better connected, visit slack.com.
Ensuring a diverse and inclusive workplace where we learn from each other is core to Slack’s values. We welcome people of different backgrounds, experiences, abilities and perspectives. We are an equal opportunity employer and a pleasant and supportive place to work.
Come do the best work of your life here at Slack.