Senior Performance Architect

quadric.io, Burlingame, California, United States, 94012

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted at running neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can accelerate only a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

As a Senior Performance Architect, you will be the critical link between software and hardware, responsible for understanding how code executes on Quadric's architecture and identifying opportunities for optimization. You will analyze workloads from high-level C++ and Python source down through generated assembly to pinpoint performance bottlenecks. This is a hands-on role: beyond analysis, you will prototype solutions yourself, whether that means writing optimized code, modifying compiler passes, or building proof-of-concept implementations to validate proposed fixes before handing them off to the appropriate team for productization.

This role requires regular work from the Quadric office in Burlingame, CA, a minimum of 2-3 days per week, with some weeks requiring more days onsite based on business needs. Candidates must be able to commute to the office.

Responsibilities

- Analyze application performance across the full stack: C++/Python source, compiler output, assembly, and hardware execution
- Identify and localize performance bottlenecks to specific code regions, assembly sequences, or architectural limitations
- Implement proof-of-concept fixes and optimizations to validate proposed solutions before broader rollout
- Develop and maintain profiling infrastructure, benchmarks, and performance regression tests
- Collaborate with compiler engineers to improve code generation and optimization passes
- Work with hardware architects to identify microarchitectural improvements and validate performance models
- Create performance models that predict workload behavior and guide optimization priorities
- Document findings and communicate performance insights to both technical and non-technical stakeholders
- Support customer engagements by analyzing their workloads and recommending optimizations