We are storming the marketplace with the highly skilled, experienced, and certified professionals that businesses need.

Site Reliability Engineer (Closed)

SkillStorm is seeking a Site Reliability Engineer for our client in Atlanta, GA; Charlotte, NC; Addison, TX; or Richmond, VA. Candidates must be able to work on SkillStorm's W2; this is not a C2C position. EOE, including disability/vets.

Job Description:

Must be hands-on
10-15+ years of experience
Merchant Technology Engineering seeks a stellar, self-motivated problem solver and automator: an SRE who is responsible for establishing and maintaining service level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets for their systems, and for making sure these are met. SREs are expected to spend a certain amount of their time on operational work (making sure systems work as expected, and code deployment) and on improving processes and reducing toil.
Merchant is a brand-new business for the bank, and we’re looking for a special person who is a leader and is dedicated to delivering successful releases in alignment with program goals, while constantly improving quality. Are you looking to build your career? We have a huge opportunity on this team: we are growing explosively, doubling our team size in the next few months, and scaling our next-generation platform by 17,000% in the next year.

Objectives of this Role:

Run all pipeline environments by monitoring availability and taking a holistic view of system health
Build software and systems to manage platform infrastructure and applications
Improve reliability and quality of systems and applications
Measure and optimize system performance, with an eye toward pushing our capability forward
Provide primary operational support and engineering for multiple large distributed applications

Responsibilities:

Partner with development teams to improve services through rigorous testing and release procedures
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Participate in system design consulting, platform management and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives

Required Skills and Qualifications:

Proficiency with SRE concepts such as SLO, SLA and SLI
Expertise with automation tools such as Ansible, Jenkins, XLR, CircleCI, Terraform
Experience with logging and monitoring implementations like Splunk, Dynatrace, ELK, etc.
Ability to program (structured and OO) in one or more high-level languages, such as Python or Java
Experience with Bash and Linux scripting
Experience in Docker/Kubernetes orchestration and management in any cloud environment
Participate in the team’s on-call rotation to address complex problems in real time and keep services operational and highly available
Experience troubleshooting relational databases and distributed platforms
Experience with messaging and middleware products like MuleSoft and Kafka
A proactive approach to spotting problems, areas of improvement and performance bottlenecks
Understands Jenkins, Python Scripts, and Ansible.
Has experience in automating manual tasks.
