Muhammad Haseeb

HPC Infrastructure and Performance Postdoc at National Energy Research Scientific Computing Center (NERSC)
Contact Information
us****@****om
(386) 825-5501
Location
San Francisco Bay Area
Languages
  • English Native or bilingual proficiency
  • Urdu Native or bilingual proficiency
  • Punjabi Native or bilingual proficiency

Experience

    • United States
    • Research Services
    • 1 - 100 Employee
    • HPC Infrastructure and Performance Postdoc
      • Apr 2023 - Present

      1. Develop cutting-edge GPU-accelerated scientific software using new technologies in programming models (MPI, CUDA, SYCL, Kokkos, OpenMP-offload, AMReX) and C++ (C++26, stdexec, parSTL).
      2. Investigate and model GPU-GPU communications in HPC applications over Perlmutter supercomputer interconnects.
      3. Develop and optimize GPU-accelerated algorithms for ECP-WarpX (DOE's flagship particle acceleration simulation code).
      Skills: HPC, modern C++, STL evolution, programming models, Senders/Receivers, CUDA, Python, AMReX, build systems, performance engineering, profiling tools.

    • United States
    • Software Development
    • 1 - 100 Employee
    • Graduate Research Assistant
      • Aug 2018 - Apr 2023

      Developed parallel algorithms, data structures, and GPU kernels that scalably accelerate computational proteomics algorithms by more than 40x on modern supercomputers.
      Skills: Modern C++, HPC, GPU Computing, Data Structures, OOP, Computational Biology

    • United States
    • Research Services
    • 1 - 100 Employee
    • Application Performance Intern
      • May 2021 - Aug 2021

      1. Designed and implemented DPC++/SYCL-based GPU-accelerated sequence alignment kernels of the ADEPT framework (10-30% improvement).
      2. Analyzed and optimized the performance of the SYCL code versus the native implementations on Intel, NVIDIA, and AMD GPUs.
      Contributed to bug fixing and performance optimization of the SYCL + NVIDIA 11 compiler. Developed Python bindings for the ADEPT code with zero-copy support for direct use from Python.
      Skills: GPU Computing, DPC++/SYCL, CUDA, NSight, Modern C++, Optimization

    • United States
    • Research Services
    • 1 - 100 Employee
    • Application Performance Intern
      • May 2020 - Aug 2020

      Developed core features including dynamic instrumentation, Python instrumentation, C/PyCtesting, and integration of an HPC instrumentation framework called Timemory.
      Skills: Performance Analysis, Modern C++, CRTP, SFINAE, CMake, Spack, CI

    • Graduate Student Researcher
      • Aug 2017 - Aug 2018

      Researched time- and memory-efficient indexing algorithms for database peptide search for peptide sequencing.

    • United States
    • Software Development
    • 700 & Above Employee
    • Senior Embedded Software Engineer
      • Dec 2016 - Aug 2017

      Developed core features for Mentor Embedded Multicore Framework (MEMF) and Nucleus RTOS Kernel. Skills: Embedded Systems, Embedded C, OS

    • Embedded Software Engineer
      • Aug 2015 - Nov 2016

      Developed core features of Mentor Embedded Multicore Framework (MEMF) and Nucleus RTOS Kernel. MEMF implements the unsupervised Multicore MultiOS software on ARM-based homogeneous and heterogeneous multicore SoCs. Skills: Embedded Systems, Embedded C, OS

Education

  • Florida International University
    Doctor of Philosophy - PhD, Computer Science
    2018 - 2023
  • University of Engineering and Technology, Lahore
    Bachelor’s Degree, Electrical Engineering
    2011 - 2015
