Usamah Zaheer

Machine Learning Engineer · Cambridge, EN

Machine Learning Engineer with 4+ years of experience in deep learning, computer vision, and edge computing. Strong background in end-to-end ML development, model optimisation, and MLOps practices.

Usamah Zaheer is a Machine Learning Software Engineer at Arm, where he designs and optimises ML infrastructure for Arm architectures. He holds an MS in Artificial Intelligence from the University of Texas at Austin and an MS in Embedded Systems from the University of Leicester. His work spans deep learning, computer vision, edge computing, robotics, and MLOps across organisations including Arm, Dyson, and the University of Leicester.

Experience

Machine Learning Software Engineer — Arm

Mar 2025 – Present
  • Designed and optimised ML infrastructure to analyse and enhance performance of models and systems on Arm architectures.
  • Optimised ML compilers and libraries for Arm, improving inference performance and reducing latency.
  • Conducted deep kernel-level analysis to identify and eliminate inference bottlenecks.
  • Profiled and optimised runtime performance of ML models; developed scalable benchmarking solutions across cloud and edge environments.
  • Built automated pipelines for data collection, preprocessing, and model evaluation, streamlining production workflows.
  • Led cross-functional collaborations to align ML infrastructure with organisational goals, owning a new project from inception.

PyTorch · TensorFlow · JAX · FBGEMM · KleidiAI · ACL · ArmNN · oneDNN

Robotics Software Engineer — Dyson

Sep 2022 – Aug 2024
  • Developed CNN algorithms for segmentation, object detection, and classification, applying quantisation, pruning, and knowledge distillation for deployment on robot hardware.
  • Architected evaluation tools and robotics algorithms for planning and navigation using C++. Conducted log analysis, debugging, and on-robot testing.
  • Deployed models on diverse hardware and edge devices, optimising through profiling and bottleneck analysis using CUDA and cuDNN.
  • Developed a VLM solution that saved over £100,000 and boosted productivity by 20x.
  • Streamlined the ML lifecycle with model versioning, monitoring, and automated deployment using CI/CD pipelines.
  • Presented complex ML projects to senior leaders and the CEO.

PyTorch · MXNet · ONNX · CUDA · cuDNN · C++

ML Research Assistant — University of Leicester

Mar 2021 – Aug 2022
  • Spearheaded development of an end-to-end automated ML pipeline for processing high-resolution data in real time.
  • Integrated cutting-edge CNNs in PyTorch and TensorFlow for high-resolution satellite imagery analysis.
  • Led development of ML systems utilising Random Forest and SVM for predictive modelling.
  • Deployed AI solutions in cloud environments with Docker and Kubernetes.

PyTorch · TensorFlow · Scikit-learn · Docker · Kubernetes · R

Projects

AI Agent Systems — Stealth Startup

Aug 2024 – Feb 2025
  • Led design and implementation of AI agents for SDRs, integrating NLP and LLMs for accurate and scalable solutions.
  • Managed cloud infrastructure on Vertex AI with Databricks and Snowflake, utilising RAG to enhance AI response precision.
  • Orchestrated ML workflows and semantic search using LangChain and LlamaIndex.
  • Oversaw containerised deployment with Docker and Kubernetes, implementing MLOps practices with MLflow.

ML Model Deployment App

  • Developed an Android application for deploying ML models on edge devices for object detection using YOLO, Mask R-CNN, and SSD.
  • Optimised using TensorFlow Lite with quantisation and pruning for low-latency, on-device inference.

360 Vision Navigation — Dyson

  • Developed a robot utilising 26 sensors with SLAM technology and 360-degree vision for autonomous navigation.
  • Improved path-planning algorithms and wrote integration and unit tests in C++ and Python.

Air Purifiers Embedded Software — Dyson

  • Contributed to embedded software for all Dyson air purifiers including Pure Cool using C and Python.
  • Designed the backbone logic and behaviour system for hardware communication on FreeRTOS. Updated a fundamental library stack used by 10+ projects.

AI4EO & CNN Research — University of Leicester

  • Classified high-resolution satellite images for forest fire detection using CNNs, Random Forest, and SVM.
  • Evaluated CNN architectures for autonomous vehicle applications and optimised them with transfer learning and TensorRT.

Education

MS in Artificial Intelligence, University of Texas at Austin

MS in Embedded Systems, University of Leicester

Skills

Languages: Python · C++ · C · Rust · SQL · MATLAB

ML & Deep Learning: PyTorch · TensorFlow · JAX · Scikit-learn · LLMs · VLMs · Multimodal Models · CNNs · Transformers

Computing & Inference: TensorRT · CUDA · cuDNN · ArmNN · LLVM · GGML · OpenMP · ONNX/Runtime · FBGEMM · KleidiAI

Profiling & Debugging: NVIDIA Nsight · PyTorch Profiler · TensorFlow Profiler · Valgrind · gprof · cProfile · py-spy

Cloud & Deployment: AWS · GCP · Vertex AI · Kubernetes · Docker · GitHub Actions · Jenkins · MLflow · Kubeflow · Databricks · Snowflake

Data & Visualisation: Pandas · Apache Spark · Matplotlib · Plotly · Streamlit · Grafana · Tableau · Gradio
