Site Reliability Engineer
Location | Dar es Salaam, United Republic of Tanzania |
Date Posted | August 16, 2022 |
Category | Engineering |
Job Type | Full-time |
Currency | TZS |
Description
About the job
Decagon is Nigeria’s leading tech talent accelerator, connecting the top 0.5% of software engineers with global companies looking to scale their engineering teams.
Driven by a passion to make Nigeria a top-10 software engineering nation within a decade, we recruit and place only the brightest software engineers.
Founded in 2018, Decagon has a global network of 400+ software engineers and 120+ partner companies spread across Europe, the USA, and Africa, and is committed to connecting exceptional software engineers with global companies across that network.
The Role
- Develop, deploy, and operate cloud-native infrastructure in support of a SaaS platform
- Develop and improve instrumentation for understanding and troubleshooting the health and availability of services
- Bring a mindset of standards and best practices to help create observability solutions that the team would want to adopt
- Participate in an on-call rotation
- Drive a culture of automation, both within the team and throughout the organization, in order to scale efficiently and reliably
- Participate in technical discussions to aid system design, analysis, and troubleshooting
- Help engineering teams develop, test, debug, and release scalable, resilient, and highly available cloud-native applications
Ideal Profile
- 4+ years of experience with implementation, operations, and maintenance of cloud services
- A drive to inspire adoption through enthusiasm
- An understanding of the importance of a strong feedback loop with other teams and individuals across the organization
- A deep understanding of cloud computing concepts and solutions, specifically with Google Cloud Platform
- A solid understanding of Identity and Access Management, as well as setting and auditing access policies
- Experience with cloud-native approaches to security concerns
- Hands-on experience with container and container orchestration technologies: Kubernetes, Docker, Podman, etc.
- Experience working with Infrastructure-as-Code tools
- Intimate understanding of one or more of these monitoring and observability tools: Datadog, Prometheus, Grafana, Jaeger, Honeycomb
- Very strong problem-solving and troubleshooting skills, including the ability to perform root cause analysis and preventative analysis
Nice To Have
- Experience building systems in a microservice environment, with an understanding of the basic building blocks of resilient and scalable software
- Experience with web applications developed in Python or Ruby
- Knowledge of some or all of: web/network protocols, security, data persistence, and CI/CD pipelines
- An understanding of modern software development practices: TDD/BDD, hexagonal design, etc.
- An understanding of Linux primitives: process scheduling, signals, namespaces, authentication/authorization, etc.
What's on Offer?
- Flexible working options
- Excellent career development opportunities
- Opportunity to make a positive impact