Site Reliability Engineer

at Decagon
Location: Dar es Salaam, Tanzania
Date Posted: August 16, 2022
Category: Engineering
Job Type: Full-time
Currency: TZS

Description

About the job

Decagon is Nigeria’s leading tech talent accelerator, connecting the top 0.5% of software engineers with global companies looking to scale their engineering teams.

Driven by a passion to make Nigeria a top-10 software engineering nation within a decade, we recruit and place only the brightest software engineers.

Founded in 2018, Decagon has built a global network of 400+ software engineers and 120+ partner companies across Europe, the USA, and Africa, and is committed to connecting exceptional software engineers with global companies across that network.

The Role

  • Develop, deploy, and operate cloud-native infrastructure in support of the SaaS platform
  • Develop and improve instrumentation for understanding and troubleshooting the health and availability of services
  • Apply standards and best practices to build observability solutions that the team will want to adopt
  • Participate in an on-call rotation
  • Drive a culture of automation, both within the team and throughout the organization, in order to scale efficiently and reliably
  • Participate in technical discussions to aid system design, analysis, and troubleshooting
  • Help engineering teams develop, test, debug, and release scalable, resilient, and highly available cloud-native applications

Ideal Profile

  • 4+ years of experience with implementation, operations, and maintenance of cloud services
  • A drive to inspire adoption through enthusiasm
  • An understanding of the importance of a strong feedback loop with other teams and individuals across the organization
  • A deep understanding of cloud computing concepts and solutions, particularly Google Cloud Platform
  • A solid understanding of Identity and Access Management, as well as setting and auditing access policies
  • Experience with cloud-native approaches to security concerns
  • Hands-on experience with container and container orchestration technologies: Kubernetes, Docker, Podman, etc.
  • Experience working with Infrastructure-as-Code tools
  • In-depth knowledge of one or more of these monitoring and observability tools: Datadog, Prometheus, Grafana, Jaeger, Honeycomb
  • Very strong problem-solving and troubleshooting skills, including the ability to perform root cause analysis and preventive analysis

Nice To Have

  • Experience building systems in a microservice environment, and an understanding of the basic building blocks of resilient, scalable software
  • Experience with web applications developed in Python or Ruby
  • Knowledge of some or all of: web/network protocols, security, data persistence, and CI/CD pipelines
  • An understanding of modern software development practices: TDD/BDD, hexagonal design, etc.
  • An understanding of Linux primitives: process scheduling, signals, namespaces, authentication/authorization, etc.

What's on Offer?

  • Flexible working options
  • Excellent career development opportunities
  • Opportunity to make a positive impact