Saad Ashfaq
AI Compiler Engineer at Deeplite
Languages
- English: Native or bilingual proficiency
- Urdu: Native or bilingual proficiency
Experience
Deeplite
Canada | Software Development | 1-100 employees

AI Compiler Engineer
Mar 2021 - Present
• Developed Deeplite Runtime (DLRT), an inference engine built on the TVM compiler stack that enables deployment of ultra-low-precision quantized deep learning models on Arm Cortex-A-based platforms
• Implemented and optimized bitserial convolution kernels for the Armv7 and Armv8 architectures to accelerate quantized convolution layers with extremely low-bitwidth (1-2 bit) weights and activations
• Defined transformation passes that convert trained fake-quantized models into ultra-low-bit representations that execute efficiently on Arm target devices
• Designed a mixed-precision approach that selectively offloads convolution layers to 32-bit full-precision, 8-bit integer, or ultra-low-precision bitserial kernels to minimize the accuracy drop of quantized models
• Profiled and compared the end-to-end execution time of ultra-low-bit quantized models on DLRT against open-source inference engines and runtime frameworks, achieving up to a 7x latency improvement over the full-precision baseline

Huawei
China | Telecommunications | 700+ employees

Senior Software Engineer - DaVinci AI Compilers
Mar 2020 - Mar 2021
• Implemented compiler optimizations in the LLVM framework, with a focus on loop optimizations
• Developed middle-end and back-end passes to improve the low-level performance of the vector execution unit in Huawei's AI processor
• Defined passes and runtime libraries to support a tensor data structure in the C+T programming language
• Created regression and end-to-end tests to validate functionality and performance
• Collaborated regularly with teams across multiple geographic locations

AMD
United States | Semiconductor Manufacturing | 700+ employees

Senior Software Development Engineer - Platform Security
Jun 2018 - Mar 2020
• Developed the secure OS and bootloader components of a Trusted Execution Environment (TEE) for the Arm Cortex-A5 security coprocessor on AMD dGPUs
• Enabled virtualization security features based on the SR-IOV specification for key customers, including Google Stadia, Microsoft Project xCloud and Amazon Web Services
• Implemented low-level features such as single-GPU and multi-GPU ASIC reset, secure firmware loading, power saving and ECC error handling on server dGPU products
• Resolved issues in the security software stack during the pre-silicon, ASIC bring-up and post-release stages

Software Development Engineer - Multimedia
Aug 2016 - Jun 2018
• Developed user-mode and kernel-mode layers of the multimedia driver stack for dGPU and APU projects in accordance with the Windows Display Driver Model (WDDM)
• Optimized power management and job scheduling for the firmware running on the embedded multimedia microprocessor
• Participated in the bring-up of next-generation multimedia IP on FPGA and in simulation
• Designed internal tools to validate the software functionality of the driver stack
• Debugged and resolved issues reported by customers and internal validation teams

AMD
United States | Semiconductor Manufacturing | 700+ employees

System Validation Intern
May 2014 - Jul 2015
• Executed test procedures to validate features of dGPU products
• Created scripts in Ruby, bash and batch to automate manual validation test cases
• Performed clock profiling experiments, reducing typical runtime from 3 days to less than 5 hours
• Debugged post-silicon issues in collaboration with cross-functional teams

Education
University of Toronto
Master of Engineering (MEng), Computer Engineering

University of Toronto
Bachelor of Applied Science (BASc), Computer Engineering