Senior Site Reliability Engineer

 

Description:

About the job

You are responsible for:

  • Design, implementation and maintenance of public facing infrastructure and services
  • Use of configuration management and deployment tools
  • Architectural design and operation at scale
  • Monitoring of systems and services, optimization of performance and resource utilization
  • Common operating system level tasks such as logging and backup / restore
  • Cookbook / runbook implementation for common maintenance actions
  • Incident response, diagnosis and follow-up on system outages or alerts
  • Automation and streamlining of tasks as well as identifying process gaps
  • Collaborating with a global and asynchronously communicating team (don’t worry if you have never worked remotely, we’ll help you get used to it)
  • Mentoring peers in your areas of technical and operational strength

Skills and Experience:

  • Strong experience with automation and configuration management tools such as Terraform and Ansible. Proficient in at least one programming language (Python, Go, or similar).
  • Strong understanding of CI/CD pipelines and deployment strategies.
  • Experience managing cloud services (AWS, Azure, GCP) and identifying cost savings.
  • Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, or ELK stack.
  • Strong troubleshooting and problem-solving skills, and the ability to work effectively under pressure.
  • Excellent communication skills, with a strong emphasis on documenting processes and runbooks, and the ability to collaborate effectively with cross-functional teams.
  • Incident Management: Experience with incident management and on-call rotation practices, as well as tools like PagerDuty or Opsgenie.
  • SRE Best Practices: Understanding of SRE principles, such as Service Level Objectives (SLOs), error budgets, and blameless postmortems.
  • Familiarity with Wikimedia or other open source projects is a plus.
  • If you are passionate about building and maintaining reliable, scalable, and highly available infrastructure on AWS, and thrive in a dynamic and collaborative environment, we encourage you to apply for this exciting opportunity to join our team at Wikimedia Enterprise.

Qualities that are important to us:

  • Experience with operating highly available infrastructure
  • Experience with running applications and services at scale
  • Proficient with shell and a programming language used in an SRE/Operations engineering context (Python, Go, etc.)
  • Comfortable with open source configuration management and orchestration tools (Ansible, Terraform, etc.)
  • Ability to communicate technical topics clearly in English

Organization Wikimedia Foundation
Industry Engineering Jobs
Occupational Category Engineer
Job Location Doha, Qatar
Shift Type Morning
Job Type Full Time
Gender No Preference
Career Level Intermediate
Experience 2 Years
Posted at 2023-09-22 1:07 pm
Expires on 2024-10-22