Senior Database Reliability Engineer
Slack is looking for Senior Database Reliability Engineers to build tools, design and implement services, and improve the performance and reliability of our database systems as we rapidly scale our product and organization.
You will play a highly visible role leading projects for storage capacity forecasting and planning, efficient data backup strategies, and optimizing our sharding approach. You will partner with other software developers to understand data access patterns and tune our database systems for optimal performance, reliability, and availability.
Slack's Database Reliability Engineering team builds and operates the database platform powering Slack. We write software to manage thousands of stateful hosts, providing several petabytes of online database capacity. We are building one of the fastest-growing database platforms in the world. Our databases operate on MySQL and, more recently, on Vitess. You can read more about our migration to Vitess in "Migrating to Vitess at (Slack) Scale".
A taste of our scale and reach:
- Users spend over 10 hours connected and 2+ hours active in Slack every work day!
- 10M+ Daily Active Users in more than 150 countries.
- 1.5 billion messages are sent per month, half of those outside the United States!
- Every day we see over 8 million simultaneously connected users, over 3.5 billion web requests, and over 42 billion database queries, and our caching tier handles over 1 million queries/second.
- We have 8.8M+ app installations with 155,000 weekly active developers building on the developer platform.
- 90% of our paid teams on Slack actively use apps.
Slack has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, why not say hello?
What you will be doing
- Operating and enhancing our large, highly available database infrastructure, using technologies such as MySQL and Vitess.
- Developing tools to enable self-service and self-managing capabilities of our database infrastructure so that other teams can operate full stack while rapidly building new features for our customers.
- Collaborating with engineering teams on their database storage needs, and advising them throughout the development lifecycle.
- Writing code to capture database performance data, and creating tools and dashboards that provide actionable insight into that data.
- Participating in our on-call rotation and collaborating with our operations team to triage and resolve production issues.
- Mentoring other engineers and reviewing code in depth.
- Improving engineering standards, tooling, and processes.
What you should have
- You’ve worked in Database or Site Reliability Engineering for 5+ years, with increasing responsibility.
- Professional experience using Python, Ruby, Go, or Java.
- You’ve operated at least one distributed data storage system at scale and in a team environment. Examples include a relational database like MySQL, a search engine like Solr, or a streaming message bus like Kafka.
- You’ve deployed server software on Linux and then operated it at scale. You’ve debugged its problems, and analyzed and optimized its performance.
- Strong familiarity with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
- You possess experience with virtualized environments, especially Amazon Web Services.
- You’re able to lead technical architecture discussions and help drive technical decisions within your team.
- You’ve written understandable, testable code with an eye towards maintainability.
- Very strong communication skills: explaining complex technical concepts to designers, support, and other engineers is natural for you.
- You enjoy helping onboard new team members, mentoring, and teaching others.
- A Bachelor's degree in Computer Science, Engineering or related field, or equivalent training, fellowship, or work experience.
Slack is an Equal Opportunity Employer and participant in the U.S. Federal E-Verify program. Women, minorities, individuals with disabilities and protected veterans are encouraged to apply. Slack will consider qualified applicants with criminal histories in a manner consistent with the San Francisco Fair Chance Ordinance.
Slack is a layer of the business technology stack that brings together people, data, and applications – a single place where people can effectively work together, find important information, and access hundreds of thousands of critical applications and services to do their best work. From global Fortune 100 companies to corner markets, businesses and teams of all kinds use Slack to bring the right people together with all the right information. Slack is headquartered in San Francisco, CA and has offices around the world. For more information on how Slack makes teams better connected, visit slack.com.
Ensuring a diverse and inclusive workplace where we learn from each other is core to Slack’s values. We welcome people of different backgrounds, experiences, abilities and perspectives. We are an equal opportunity employer and a pleasant and supportive place to work.
Come do the best work of your life here at Slack.