Senior Observability Engineer, AI and HPC

NVIDIA Corporation

2/28/2025

US, CA, Santa Clara

Full-time

Salary: $184,000 - $356,500


Job Description

NVIDIA is seeking a Senior Observability Engineer to architect and implement distributed observability systems for AI and HPC clusters, collaborating with various engineering and research teams to improve efficiency and productivity.

Requirements

  • Experience developing large scale, distributed observability systems
  • Ability to collaborate with data scientists, researchers, and engineering teams
  • Experience with turning raw data into actionable reports
  • Experience with observability platforms such as Apache Spark, Elastic/OpenSearch, Grafana, and Prometheus
  • Python programming experience, including working with APIs
  • Passion for improving productivity, plus excellent planning and interpersonal skills
  • Flexibility to work in a dynamic environment with changing requirements
  • MS or BS in Computer Science, Electrical Engineering, or related field (or equivalent experience)
  • 8+ years of proven experience

Responsibilities

  • Collaborate with AI, hardware, and software engineering teams and with research teams to deliver observability solutions for AI/HPC clusters
  • Develop, test, and deploy data collectors, pipelines, visualization, and retrieval services
  • Define data collection and retention policies
  • Provide operational and strategic data to empower engineers and researchers
  • Continuously improve quality, workloads, and processes through better observability

Benefits

  • Multiple relocation packages
  • Two weeklong shutdowns (mid-summer and year-end) in the US (in addition to PTO)
  • 8-week parental leave
  • 9 Employee Resource Groups
  • Annual bonus offering
  • Flexible work arrangements
  • Up to 6% 401(k) matching