Benjamin Gilbert

Lead Data Engineer at Flex
Contact Information
us****@****om
(386) 825-5501
Location
Madison, Wisconsin, United States

Bio

5.0 / 5.0 (based on 1 rating)

Jerry Q. Shi

Ben and I worked together under the big roof of the Analytics department while I was on the Advertiser side and Ben was on the modeling side. He has always been the go-to person for programming questions, modeling skills, company data, etc. He is so technically savvy and constantly ran group and 1-on-1 training sessions on Python, Hadoop, R, etc. In simple English, HE THE MAN!

Experience

    • United States
    • Software Development
    • 1 - 100 Employee
    • Lead Data Engineer
      • Oct 2022 - Present

    • United States
    • Technology, Information and Internet
    • 700 & Above Employee
    • Director, Data Platform
      • Apr 2022 - Oct 2022

      • Analytics Engineering
      • Machine Learning Engineering
      • Data Engineering
      • Computer Vision

    • Technical Manager, Data and Analytics Engineering
      • Jun 2021 - Apr 2022

    • DataOps Technical Lead
      • Nov 2020 - Jun 2021

    • Data Engineer
      • Feb 2020 - Nov 2020

    • United States
    • Software Development
    • Data Scientist Engineer
      • Sep 2019 - Jan 2020

      • Researched, designed, and coded a content diversification algorithm in JavaScript that balanced relevance, uniqueness, and novelty in product and service searches from the mobile app against the Drum database (specifically data stored in Algolia and Redis).
      • Assisted in the construction of an event-driven ETL pipeline built within the AWS ecosystem. Data moved from DynamoDB and third-party APIs into Kinesis, then Firehose, and was eventually stored in S3 and made queryable with Athena; schemas were defined with Glue. The pipeline was orchestrated by Python Lambdas triggered on specific events, such as data landing in DynamoDB or verification that a file conformed to an expected schema (a sketch of this trigger pattern appears below). Infrastructure was catalogued, tested, and deployed using CloudFormation's SAM templates and functionality.
      • Created a simplified ETL integration suite to test code changes against the ETL pipeline. Integration tests were triggered by pull requests merged into develop branches on GitHub, commit SHAs were used to track sample data moving through the pipeline, and Athena queries confirmed that data flowed through the pipeline as expected.
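
      A minimal sketch of the event-trigger pattern described above, not the original Drum code: a Python Lambda handler that forwards newly landed DynamoDB items to Kinesis. The stream name and key attribute are hypothetical, for illustration only.

          import json
          import boto3

          kinesis = boto3.client("kinesis")
          STREAM_NAME = "etl-events"  # hypothetical stream name, not from the original pipeline

          def handler(event, context):
              """Triggered by a DynamoDB Stream; forwards newly inserted items to Kinesis."""
              for record in event.get("Records", []):
                  if record.get("eventName") != "INSERT":
                      continue  # only forward items that just landed in DynamoDB
                  new_image = record["dynamodb"]["NewImage"]
                  kinesis.put_record(
                      StreamName=STREAM_NAME,
                      Data=json.dumps(new_image),
                      PartitionKey=record["dynamodb"]["Keys"]["id"]["S"],  # assumes a string key named "id"
                  )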

    • United States
    • Technology, Information and Internet
    • 1 - 100 Employee
    • Data Scientist
      • Feb 2019 - Sep 2019

      • Created a partner data importer in Python by setting up AWS S3 bucket notifications to trigger a Lambda function whenever business partners sent daily data files; output was written to Kafka for consumption by the backend development team.
      • Designed and assisted in the initial stages of a job recommendation engine after exploring literature published by Google and CareerBuilder. Leveraged the natural language processing (NLP) technique Doc2Vec to compare job descriptions and required skills between jobs; the output clustered similar jobs into leaves of a job hierarchy tree based on cosine similarity measurements (sketched below).
      • Developed an ETL pipeline orchestrated by AWS CloudWatch and run through AWS Batch, with resources defined and deployed using Terraform. Coded ETL tasks in Python and packaged them as Docker images; automated deployment of the updated codebase with Drone and the ECR Drone plugin.
      • Took over responsibilities as Snowflake database administrator and reduced month-over-month costs from May to June by two-thirds by restructuring process flow, restricting access, and migrating recurring jobs to cheap, scalable compute resources; a side effect was improved, well-documented Role-Based Access Control (RBAC).
      • Built proofs of concept for Airflow and Pachyderm on Kubernetes (with Minikube), testing different deployment configurations with Helm charts.
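
      A minimal sketch of the Doc2Vec similarity idea referenced above, not the original recommendation engine: gensim's Doc2Vec trained on toy job descriptions, with cosine similarity between document vectors as the clustering signal. The tags and corpus are made up for illustration.

          from gensim.models.doc2vec import Doc2Vec, TaggedDocument

          # Toy corpus standing in for real job descriptions.
          jobs = {
              "data_engineer": "build etl pipelines in python and spark on aws",
              "data_scientist": "train machine learning models in python and sql",
              "accountant": "prepare financial statements and manage ledgers",
          }
          corpus = [TaggedDocument(text.split(), [tag]) for tag, text in jobs.items()]

          model = Doc2Vec(vector_size=50, min_count=1, epochs=100)
          model.build_vocab(corpus)
          model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

          # Cosine similarity between document vectors drives the grouping into a job hierarchy.
          print(model.dv.similarity("data_engineer", "data_scientist"))
          print(model.dv.similarity("data_engineer", "accountant"))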

    • United States
    • Banking
    • 700 & Above Employee
    • Data Engineer
      • 2016 - 2019

      • Built a data pipeline that ingested auto loan applications: parsed XML documents with Scala, wrote CSVs to AWS S3, crawled S3 with AWS Glue, and provided data to the Risk line of business through AWS Redshift using its external schema functionality.
      • Created a web application using RShiny to display weekly audit logs from Oracle and MySQL databases for compliance reporting.
      • Developed an ETL pipeline orchestrated by Nomad that sourced data from an SFTP server, external APIs, and disparate internal data sources; ingestion and transformation were done with Scala Spark, data was stored on HDFS, and processes were launched in Docker containers with logs stored in AWS CloudWatch.
      • Assisted the DevOps engineer in developing Terraform, Packer, and Ansible code to create load balancers with LDAP authentication; testing was performed on a Vagrant box.
      • Documented and reviewed Python/PySpark setup and best practices with the Customer Intelligence team, including connecting to HDFS, running Spark jobs, creating Spark UDFs for data transformations (sketched below), and reading/writing files and Postgres from Jupyter Notebooks.
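
      A minimal sketch of the PySpark UDF pattern documented for the Customer Intelligence team, with a made-up column and transformation rather than the bank's actual code:

          from pyspark.sql import SparkSession
          from pyspark.sql.functions import udf
          from pyspark.sql.types import StringType

          spark = SparkSession.builder.appName("udf-example").getOrCreate()

          # Example transformation: normalize free-text status values before loading downstream.
          @udf(returnType=StringType())
          def normalize_status(status):
              return (status or "").strip().upper()

          df = spark.createDataFrame([("approved",), (" Denied ",)], ["status"])
          df.withColumn("status_clean", normalize_status("status")).show()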

    • United States
    • Advertising Services
    • 300 - 400 Employee
    • Senior Statistical Analyst
      • 2014 - 2016

      • Wrote a MapReduce job in Python that simulated ad impressions; the simulator was used to test and select a more competitive targeting strategy for 2016, opening up a 15% growth opportunity for Direct Ad sales.
      • Assisted in the data preparation, coding, and execution of Elasticsearch on 200+ million bank transaction strings.
      • Used Apache Drill and HBase to move transaction matching from weekly to hourly, improving overall match rates by 6%.
      • Leveraged k-means clustering to segment a user base of over 30 million users (sketched below), developing a partial factorial experimental design to test engagement strategies.
      • Improved the transaction match methodology with a Nearest Neighbor Search algorithm, reducing time and space requirements by over 90% while maintaining accuracy.
      • Stepped into a leadership role as a technical reference for new analysts and interns, and provided insights and recommendations to technical and non-technical internal clients.
      • Maximized customer response rates using optimization and predictive analytic techniques by determining Propensity to Buy.

      Sales Analyst
      • Automated and maintained code for test vs. control analysis used to evaluate campaign performance, which simplified the process and reduced user error.
      • Developed predictive analytic tools in SAS and R, including a Mover Model, Homeowner Model, and Competitor Classification Model for broad-reach targeting.
      • Responded to ad hoc requests from clients, sales, and account managers, including a churn analysis that Comcast requested be updated monthly.
      • Organized and led 'lunch and learn' sessions for the analytics group.
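
      A minimal sketch of the k-means segmentation approach mentioned above, using scikit-learn on synthetic data in place of the real 30-million-user base; the feature set and cluster count are illustrative only.

          import numpy as np
          from sklearn.cluster import MiniBatchKMeans
          from sklearn.preprocessing import StandardScaler

          # Synthetic stand-in for per-user engagement features (e.g. visits, clicks, spend, tenure).
          rng = np.random.default_rng(0)
          features = rng.normal(size=(10_000, 4))

          X = StandardScaler().fit_transform(features)
          # MiniBatchKMeans scales to tens of millions of rows far better than vanilla k-means.
          segments = MiniBatchKMeans(n_clusters=5, random_state=0).fit_predict(X)

          print(np.bincount(segments))  # users per segment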

Education

  • Clemson University
    Master's degree, Applied Economics
    2012 - 2014
  • University of North Carolina at Asheville
    Bachelor of Science (B.S.), Statistics
    2007 - 2012
  • University of North Carolina at Asheville
    Bachelor of Arts (B.A.), Economics
    2007 - 2012
