NVIDIA

Senior Software Architect, AI Networking

Israel, Tel Aviv Full time

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. Being an NVIDIAN means being part of a diverse and supportive environment that encourages everyone to perform at their peak. Come join the team and discover how you can make a lasting impact on the world.

NVIDIA is looking for a Senior Software Architect: a creative, forward-thinking, and practical researcher who will improve the infrastructure for large-scale LLM training and inference. As part of our dynamic E2E Architecture group, you will design and optimize systems driving generative AI workloads, working at the intersection of software and hardware on some of the most advanced GPU clusters worldwide. You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and compute scheduling to system-level optimization. This is an opportunity to collaborate with best-in-class engineers and researchers and shape the future of generative AI. Your work will make a lasting impact by bringing generative AI technologies to real-world applications and improving global computing capabilities.

What You’ll Be Doing: 

  • Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.  

  • Analyze current deployments, develop prototypes, and recommend architectural improvements. 

  • Stay abreast of the latest research; become the team’s authority in emerging networking techniques and technologies. 

  • Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.

  • Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1). 

  • Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features. 

  • File patents and present research at leading conferences.

What We Need to See: 

  • M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or a B.Sc. with research experience and publications.

  • 5+ years of relevant experience.

  • Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing). 

  • Strong software engineering skills in C++ and/or Python. 

  • Excellent system-level design and problem-solving abilities. 

  • Outstanding communication and collaboration skills across technical domains.  

Ways to Stand Out from the Crowd: 

  • Proven passion for solving sophisticated technical problems and delivering impactful solutions. 

  • Record of publications in top-tier conferences. 

  • Experience in designing and building large-scale AI training clusters. 

  • Postdoctoral research experience.

  • Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.