DevOps / Integration Engineer (Closed)
SkillStorm is seeking a DevOps / Integration Systems Engineer for our client in Richardson, TX. Candidates must be able to work on SkillStorm's W2; this is not a C2C position. EOE, including disability/vets.
Job Description:
- Our client is seeking a DevOps / Integration Systems Engineer for its GRID platform, with working experience in large enterprise Grid/Cloud infrastructure. The engineer will be responsible for implementing automation tools to support the large-scale migration of grid infrastructure to newer provisioning technology, and for maintaining the system currency baseline using continuous integration / delivery and automation.
Responsibilities Include:
- Support development and production environments using automation tools and processes
- Provide ongoing support of Grid infrastructure following best practices, including the ITSM process, monitoring, and triage of major issues
- Work collaboratively with a geographically dispersed team
- Gather and analyze requirements for automation and operation tools
- Work with the Engineering team to test and deploy automated workflows in support of the IT business
- Provide on-call coverage and support break-fix needs when required
- Coordinate change releases. Work with the Line of Business support team to ensure successful deployments and compliance with standard operating procedures
Required Job Skills:
- Minimum of 5 years of experience in Python and shell scripting (Bash/Ksh)
- Strong understanding of the Linux operating system
- Good experience in bare-metal provisioning
- Good experience in Continuous Integration and Delivery practices and automation tools (e.g., Jenkins, Ansible / Tower)
- Experience with SDLC development processes (Waterfall, Agile, Lean) and tools (e.g., Jira, Confluence, Git, etc.)
- Strong knowledge and understanding of network technologies (e.g., TCP/IP networking, SSL, Firewall, Proxy, Load Balancing)
- Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.
- Experience with IT core applications like DNS, Active Directory, Kerberos, SMTP, Transactional DBs, Apache, etc.
- Excellent problem solving & troubleshooting skills
- Good understanding of, and work experience in, an enterprise environment
- Familiarity with Agile development methodology, including Scrum and DevOps.
Desired Job Skills:
- Experience with configuration of HPC job management and scheduling software such as Spectrum, SLURM, or similar technology
- GPU computation and CUDA experience; experience configuring and running cluster management software
- Experience designing and building HPC application environments on Hybrid Cloud infrastructure, including services for big data analytics, scale out HPC applications, GPU-centric ML and deep learning
- Strong knowledge and understanding of x86 and RHEL platforms in relation to HPC on CPU and GPU (NVIDIA) platforms
- Machine learning and data science engineering skills with a focus on PyTorch, Jupyter, TensorFlow, Apache Spark, H2O, etc., a plus
- Ability to handle highly volatile support demands for a platform and clients spanning multiple time zones
- Experience with large, distributed environments (10,000+ servers).
Education/Experience
- Minimum of a 4-year degree in computer science or equivalent experience
- 5-7 years infrastructure or software engineering / development experience
- Candidates with exposure to large enterprise grid deployments and/or Cloud integration experience are preferred