Site Reliability Engineer, AI Infrastructure

Tesla, Inc.

10/6/2024

Palo Alto, CA

Full-time

Salary: $120,000 - $300,000


Job Description

As a Site Reliability Engineer, you will maintain and improve the AI infrastructure that supports the FSD, Tesla Bot, and Dojo teams. The role involves monitoring compute, GPU, and network metrics, troubleshooting Linux performance issues, and strengthening security. Your work will enable neural network training at scale and streamline development for Dojo, the most powerful supercomputer to date.

Requirements

  • Proficiency in Python, Golang, and/or Bash
  • Proficiency with Linux fundamentals and performance optimizations
  • Bachelor's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or equivalent experience
  • 3+ years of additional equivalent experience or evidence of exceptional ability related to the position

Responsibilities

  • Support AI/ML cluster infrastructure on GPU and Dojo platforms
  • Improve monitoring & self-healing pipelines
  • Optimize server, storage, and network performance
  • Develop new tools in Python, Golang, or Bash/Shell
  • Use Infrastructure as Code best practices
  • Participate in 24x7 on-call rotation

Benefits

  • Aetna PPO and HSA plans with $0 payroll deduction
  • Family-building benefits
  • Dental and vision plans with $0 contribution
  • Flexible spending accounts
  • 401(k) with employer match
  • Company-paid life insurance
  • Sick and vacation time
  • Employee discounts