Sr. Platform Engineer - GenAI

KLA Corporation

3/1/2025

Ann Arbor, MI

Full-time

Salary: $103,000.00 - $175,100.00 Annually


Job Description

This role develops and manages advanced AI/ML infrastructure solutions to improve efficiency and scalability.

Requirements

  • Bachelor's Degree or equivalent training/certifications in Computer Science or related IT field
  • Eight (8) years of experience implementing and maintaining on-premises AI/ML infrastructure
  • Strong experience with AI/ML infrastructure and tools, including GPU clusters and Kubernetes
  • Proficiency in deploying and managing open-source GenAI components and vector databases
  • Hands-on experience with high-performance computing (HPC) environments
  • Expertise in designing and managing on-premises, cloud, and hybrid ML platforms
  • Solid understanding of distributed storage systems, scheduling systems, and high availability capabilities

Responsibilities

  • Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable solutions
  • Develop advanced AI/ML infrastructure solutions that enhance the efficiency of skilled ML teams
  • Design and implement solutions for critical areas within large-scale GPU clusters
  • Monitor and optimize the performance of AI/ML infrastructure
  • Develop and deploy automation tools, monitoring solutions, and operational strategies
  • Work with various teams to create a cohesive and integrated AI/ML infrastructure ecosystem
  • Implement and manage GPU infrastructure within Kubernetes clusters
  • Deploy and manage open-source GenAI components and various AI/ML models
  • Evaluate and integrate new open-source GenAI tools and technologies
  • Collaborate with research and development teams to implement and optimize innovative AI/ML models and algorithms
  • Ensure security and compliance of open-source GenAI components within infrastructure
  • Leverage High-Performance Computing (HPC) experience to optimize and manage large-scale AI/ML workloads

Benefits

  • Performance incentive programs
  • Medical, dental, vision, and life insurance (plus other voluntary benefits)
  • 401(k) with company matching
  • Employee Stock Purchase Program (ESPP)
  • Student debt assistance
  • Tuition reimbursement program
  • Development and career growth opportunities
  • Financial planning benefits
  • Wellness benefits (including EAP)
  • Paid time off and paid company holidays
  • Family care and bonding leave
