Van Zyl van Vuuren

Lead machine learning engineer at ScienceIO
Contact Information
us****@****om
(386) 825-5501
Location
United States, US

Bio

Generated by
Topline AI

5.0 / 5.0, based on 2 ratings
  • (2)
  • (0)
  • (0)
  • (0)
  • (0)

Jon Kornik

It has been a pleasure working with Van Zyl. He brings a unique mix of skills, as both an experienced data scientist and a qualified AWS Solutions Architect. With these skills, he has efficiently built high-quality, secure, and scalable infrastructure that is purpose-built for deploying our data science applications and roadmap. Throughout our work together, the qualities that have stood out include his strong analytical and problem-solving skills, his ability to know how deep to dive, and the technical skills necessary to run data science analyses, develop ML models, and build infrastructure on AWS.

Dirk Badenhorst

I was fortunate enough to work alongside Van Zyl as a peer at Health Q technologies. He was extremely easy and pleasant to work with in a very difficult and high-pressure environment. I have always respected his ability to do deep, technical R&D work, as well as his ability to finish, polish, and productionize data-driven solutions. As a result, he was the primary developer of a world-class digital signal processing algorithm, from development through production. When I received an opportunity at 4G Capital as Chief Data Science Officer, Van Zyl was one of the first people I headhunted to join my team. There he proved to be so much more than I had originally expected. Unbeknown to me, he had spent his free time expanding his technical knowledge well beyond my original expectations. He very quickly became not only an invaluable resource, but someone I depended on to review my own work and to act as a sounding board. Van Zyl is an extremely valuable Data Engineer or Data Scientist for any company lucky enough to have him. I sincerely hope our paths cross again.

Credentials

  • AWS Certified Machine Learning – Specialty
    Amazon Web Services (AWS)
    Jan 2022 - Nov 2024
  • AWS Certified Data Analytics – Specialty
    Amazon Web Services (AWS)
    Jul 2021 - Nov 2024
  • AWS Certified Solutions Architect – Professional
    Amazon Web Services (AWS)
    Mar 2021 - Nov 2024
  • AWS Certified Solutions Architect – Associate
    Amazon Web Services (AWS)
    Dec 2020 - Nov 2024

Experience

    • United States
    • Biotechnology Research
    • 1 - 100 Employee
    • Lead machine learning engineer
      • Sep 2022 - Present

    • Machine Learning Engineer
      • Aug 2021 - Sep 2022

      ■ Deploy and manage infrastructure using AWS CDK (Cloud Development Kit, in Python)
      ■ Set up a framework to build container-based HTTP servers for model deployment on AWS SageMaker
      ■ Automate model building and testing using GitHub Actions/AWS CodeBuild and AWS CodePipeline
      ■ Set up a model deployment framework using AWS SageMaker, including setting the instance types, scaling policies, and blue/green deployments via config files
      ■ Load test model containers and scaling policies
      ■ Build a data provenance workflow tool for data engineering jobs using AWS SageMaker Processing, SageMaker Batch, AWS Fargate, AWS Batch, and AWS Step Functions

    • Renewable Energy Semiconductor Manufacturing
    • 1 - 100 Employee
    • Senior Data Scientist/Machine learning engineer
      • Sep 2020 - Aug 2021

      ■ Provision AWS infrastructure (Glue, SageMaker, Step Functions) for ML model (XGBoost) training, hyper-parameter optimisation, and deployment
      ■ Analyse, preprocess, and extract features from data in a PostgreSQL database
      ■ Develop a serverless analytics backend on AWS with a REST API (Lambda, API Gateway)
      ■ Develop a Python Dash dashboard and provision AWS infrastructure for deployment (Elastic Beanstalk) and authentication (Cognito)
      ■ Set up CI/CD pipelines with AWS CodeBuild and CodePipeline for dev and prod environments

    • Mauritius
    • Financial Services
    • 300 - 400 Employee
    • Senior Data Scientist/Machine learning engineer
      • Aug 2019 - Oct 2020

      ■ Train ML models to predict credit scores and affordability (XGBoost, AWS SageMaker)
      ■ Use genetic algorithms for feature selection and Bayesian optimisation for parameter tuning
      ■ Develop risk-based pricing strategies and report on key metrics
      ■ Design and implement a workflow on AWS to produce risk assessments for field agents (Glue, Lambda, Step Functions, and SageMaker)

    • United States
    • Wellness and Fitness Services
    • 1 - 100 Employee
    • Data Scientist
      • Oct 2013 - Aug 2019

      ■ Create detailed technical reports with results from models/algorithms (seaborn)
      ■ Regression analyses (linear, logistic, and decision tree models) to find relationships between features or with an outcome, feature significance analyses, and model performance analyses (confusion matrices, kappa, F1, ROC curves)
      ■ Data preprocessing: prune unusable data and impute missing data
      ■ Developed ML models/algorithms: motion-compensated PPG-based heart rate (integrated with companies like Garmin, TomTom, and Montblanc), device-on-skin detection, exercise-activity classification, sleep apnoea classification, and drowsiness detection
      ■ Python machine learning packages used: scikit-learn, statsmodels, imblearn, patsy, pyGAM, and Keras (TensorFlow)
      ■ ML models used: linear and logistic regression, SVM, decision trees, ensemble models (gradient boosting, random forests), feature selection (Pearson’s correlation, Spearman’s correlation, mutual information, PCA, ICA, recursive feature elimination, LASSO-based, tree-based, etc.), deep learning (CNN, LSTM), and clustering/Gaussian mixture models
      ■ Frequency- and time-domain analyses on time series data for feature extraction
      ■ Time series analysis techniques used: FFT, DWT, Lomb-Scargle, sparse signal reconstruction, filter design (FIR, IIR), adaptive filters (notch, comb, Kalman, LMS, RLS, QRLS), PCA, ICA, empirical mode decomposition, detrended fluctuation analysis, Hurst exponent, correlation dimension, and sample entropy

    • South Africa
    • Higher Education
    • 700 & Above Employee
    • External Examiner
      • Jan 2018 - Apr 2019

Education

  • Stellenbosch University/Universiteit Stellenbosch
    Bachelor of Engineering (BEng), Electrical and Electronics Engineering with Computer Science
    2008 - 2011
  • Stellenbosch University/Universiteit Stellenbosch
    Master of Engineering (MEng), Electrical and Electronics Engineering
    2012 - 2013
