Site Reliability Engineer

Software Development

More information about this job


Life is short, work somewhere awesome.


The SRE team is the in-house authority on building reliable and maintainable systems. It plans infrastructure capacity to meet high-availability and uptime goals for all DrFirst products.


The DevOps/Site Reliability team eliminates inefficiencies and incompatibilities that jeopardize service availability, so that DrFirst’s clients receive a reliable and scalable software service. Key aspects of this role include automation, configuration management, and tools development, in collaboration with the engineering team on projects and products as the expert on reliability, performance, and efficiency.



As a part of the Systems team, you will:

  • Periodically assess all monitoring requirements and implement necessary enhancements to meet changing/growing business needs
  • Enhance current automation processes of managing capacity, safely deploying software and mitigating failures
  • Tune and troubleshoot full-stack software applications using OOP, Java, web services, Oracle DB, MongoDB, networking concepts and virtualization techniques
  • Proactively review, recommend and implement changes to the live infrastructure after ensuring the right validation has been carried out
  • Assist in the rollout and deployment of new product features and installations to facilitate rapid iteration
  • Confidently make informed, data-driven decisions in a fast-paced environment with competing priorities
  • Create and maintain Chef recipes for instance configuration management
  • Participate in a 24/7 on-call rotation and after-hours deployments


To be successful in this role, you must have:

  • Bachelor’s degree in Computer Science or a related discipline (Master’s preferred)
  • At least 3 years’ coding experience (with Java, JavaScript, Ruby on Rails, or Python)
  • Experience with production releases, maintenance, and monitoring
  • Experience with build tools, orchestration tools, and virtual machine frameworks
  • Skills with log analysis and troubleshooting
  • Strong background in Unix/Linux administration
  • Experience with automation/configuration management using Chef, Puppet, or equivalent
  • Ability to use a wide variety of open source technologies and cloud services (AWS required)
  • Understanding of coding and scripting with Shell, PowerShell, Python, Ruby and/or Perl
  • Strong experience with SQL and MySQL (NoSQL is a big plus)
  • Knowledge of best practices and IT operations in a 24/7 environment
  • Self-motivated and technically curious
  • Ability to work independently and prioritize effectively

Nice to Have:

  • Ability to perform merging, branching and configuration management in SCM systems
  • 2+ years’ experience with Maven and/or Gradle
  • Experience with source control management tools such as SVN and Git