Benjamin Gilbert

Lead Data Engineer at Flex
Contact Information
us****@****om
(386) 825-5501
Location
Madison, Wisconsin, United States

Bio

5.0 / 5.0 (based on 1 rating)

Jerry Q. Shi

Ben and I worked together under the big roof of the Analytics department while I was on the Advertiser side and Ben was on the modeling side. He has always been the go-to person for programming questions, modeling skills, company data, etc. He is so technically savvy and constantly ran group and 1-on-1 training sessions on Python, Hadoop, R, etc. In simple English, HE THE MAN!

Experience

    • United States
    • Software Development
    • 1 - 100 Employee
    • Lead Data Engineer
      • Oct 2022 - Present

    • United States
    • Technology, Information and Internet
    • 700 & Above Employee
    • Director, Data Platform
      • Apr 2022 - Oct 2022

      • Analytics Engineering
      • Machine Learning Engineering
      • Data Engineering
      • Computer Vision

    • Technical Manager, Data and Analytics Engineering
      • Jun 2021 - Apr 2022

    • DataOps Technical Lead
      • Nov 2020 - Jun 2021

    • Data Engineer
      • Feb 2020 - Nov 2020

    • United States
    • Software Development
    • Data Scientist Engineer
      • Sep 2019 - Jan 2020

      • Researched, designed, and coded a content diversification algorithm in JavaScript that balanced relevance, uniqueness, and novelty in product and service searches from the mobile app against the Drum database (specifically data stored in Algolia and Redis).
      • Assisted in the construction of an event-driven ETL pipeline built within the AWS ecosystem. Data moved from DynamoDB and third-party APIs into Kinesis, then Firehose, and was eventually stored in S3 and made queryable with Athena; schemas were defined with Glue. The pipeline was orchestrated by Python Lambdas triggered on specific events, such as data landing in DynamoDB or verification that a file conformed to an expected schema (a sketch of this trigger pattern appears below). Infrastructure was catalogued, tested, and deployed using CloudFormation's SAM templates and functionality.
      • Created a simplified ETL integration suite to test code changes against the ETL pipeline. Integration tests were triggered by pull requests merged into develop branches on GitHub, commit SHAs were used to track sample data moving through the pipeline, and Athena queries confirmed that data flowed through the pipeline as expected.
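
      A minimal sketch of the event-trigger pattern described above, not the original Drum code: a Python Lambda handler that forwards newly landed DynamoDB items to Kinesis. The stream name and key attribute are hypothetical, for illustration only.

          import json
          import boto3

          kinesis = boto3.client("kinesis")
          STREAM_NAME = "etl-events"  # hypothetical stream name, not from the original pipeline

          def handler(event, context):
              """Triggered by a DynamoDB Stream; forwards newly inserted items to Kinesis."""
              for record in event.get("Records", []):
                  if record.get("eventName") != "INSERT":
                      continue  # only forward items that just landed in DynamoDB
                  new_image = record["dynamodb"]["NewImage"]
                  kinesis.put_record(
                      StreamName=STREAM_NAME,
                      Data=json.dumps(new_image),
                      PartitionKey=record["dynamodb"]["Keys"]["id"]["S"],  # assumes a string key named "id"
                  )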

    • United States
    • Technology, Information and Internet
    • 1 - 100 Employee
    • Data Scientist
      • Feb 2019 - Sep 2019

      • Created a partner data importer in Python by setting up AWS S3 bucket notifications to trigger a Lambda function whenever business partners sent daily data files; output was written to Kafka for consumption by the backend development team.
      • Designed and assisted in the initial stages of a job recommendation engine after exploring literature published by Google and CareerBuilder. Leveraged the natural language processing (NLP) technique Doc2Vec to compare job descriptions and required skills between jobs; the output clustered similar jobs into leaves of a job hierarchy tree based on cosine similarity measurements (sketched below).
      • Developed an ETL pipeline orchestrated by AWS CloudWatch and run through AWS Batch, with resources defined and deployed using Terraform. Coded ETL tasks in Python and packaged them as Docker images; automated deployment of the updated codebase with Drone and the ECR Drone plugin.
      • Took over responsibilities as Snowflake database administrator and reduced month-over-month costs from May to June by two-thirds by restructuring process flow, restricting access, and migrating recurring jobs to cheap, scalable compute resources; a side effect was improved, well-documented Role-Based Access Control (RBAC).
      • Built proofs of concept for Airflow and Pachyderm on Kubernetes (with Minikube), testing different deployment configurations with Helm charts.
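
      A minimal sketch of the Doc2Vec similarity idea referenced above, not the original recommendation engine: gensim's Doc2Vec trained on toy job descriptions, with cosine similarity between document vectors as the clustering signal. The tags and corpus are made up for illustration.

          from gensim.models.doc2vec import Doc2Vec, TaggedDocument

          # Toy corpus standing in for real job descriptions.
          jobs = {
              "data_engineer": "build etl pipelines in python and spark on aws",
              "data_scientist": "train machine learning models in python and sql",
              "accountant": "prepare financial statements and manage ledgers",
          }
          corpus = [TaggedDocument(text.split(), [tag]) for tag, text in jobs.items()]

          model = Doc2Vec(vector_size=50, min_count=1, epochs=100)
          model.build_vocab(corpus)
          model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

          # Cosine similarity between document vectors drives the grouping into a job hierarchy.
          print(model.dv.similarity("data_engineer", "data_scientist"))
          print(model.dv.similarity("data_engineer", "accountant"))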

    • United States
    • Banking
    • 700 & Above Employee
    • Data Engineer
      • 2016 - 2019

      • Built a data pipeline that ingested auto loan applications: parsed XML documents with Scala, wrote CSVs to AWS S3, crawled S3 with AWS Glue, and provided data to the Risk line of business through AWS Redshift using its external schema functionality.
      • Created a web application using RShiny to display weekly audit logs from Oracle and MySQL databases for compliance reporting.
      • Developed an ETL pipeline orchestrated by Nomad that sourced data from an SFTP server, external APIs, and disparate internal data sources; ingestion and transformation were done with Scala Spark, data was stored on HDFS, and processes were launched in Docker containers with logs stored in AWS CloudWatch.
      • Assisted the DevOps engineer in developing Terraform, Packer, and Ansible code to create load balancers with LDAP authentication; testing was performed on a Vagrant box.
      • Documented and reviewed Python/PySpark setup and best practices with the Customer Intelligence team, including connecting to HDFS, running Spark jobs, creating Spark UDFs for data transformations (sketched below), and reading/writing files and Postgres from Jupyter Notebooks.
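
      A minimal sketch of the PySpark UDF pattern documented for the Customer Intelligence team, with a made-up column and transformation rather than the bank's actual code:

          from pyspark.sql import SparkSession
          from pyspark.sql.functions import udf
          from pyspark.sql.types import StringType

          spark = SparkSession.builder.appName("udf-example").getOrCreate()

          # Example transformation: normalize free-text status values before loading downstream.
          @udf(returnType=StringType())
          def normalize_status(status):
              return (status or "").strip().upper()

          df = spark.createDataFrame([("approved",), (" Denied ",)], ["status"])
          df.withColumn("status_clean", normalize_status("status")).show()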

    • United States
    • Advertising Services
    • 300 - 400 Employee
    • Senior Statistical Analyst
      • 2014 - 2016

      • Wrote a MapReduce job in Python that simulated ad impressions; the simulator was used to test and select a more competitive targeting strategy for 2016, opening up a 15% growth opportunity for Direct Ad sales.
      • Assisted in the data preparation, coding, and execution of Elasticsearch on 200+ million bank transaction strings.
      • Used Apache Drill and HBase to move transaction matching from weekly to hourly, improving overall match rates by 6%.
      • Leveraged k-means clustering to segment a user base of over 30 million users (sketched below), developing a partial factorial experimental design to test engagement strategies.
      • Improved the transaction match methodology with a Nearest Neighbor Search algorithm, reducing time and space requirements by over 90% while maintaining accuracy.
      • Stepped into a leadership role as a technical reference for new analysts and interns, and provided insights and recommendations to technical and non-technical internal clients.
      • Maximized customer response rates using optimization and predictive analytic techniques by determining Propensity to Buy.

      Sales Analyst
      • Automated and maintained code for test vs. control analysis used to evaluate campaign performance, which simplified the process and reduced user error.
      • Developed predictive analytic tools in SAS and R, including a Mover Model, Homeowner Model, and Competitor Classification Model for broad-reach targeting.
      • Responded to ad hoc requests from clients, sales, and account managers, including a churn analysis that Comcast requested be updated monthly.
      • Organized and led 'lunch and learn' sessions for the analytics group.
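
      A minimal sketch of the k-means segmentation approach mentioned above, using scikit-learn on synthetic data in place of the real 30-million-user base; the feature set and cluster count are illustrative only.

          import numpy as np
          from sklearn.cluster import MiniBatchKMeans
          from sklearn.preprocessing import StandardScaler

          # Synthetic stand-in for per-user engagement features (e.g. visits, clicks, spend, tenure).
          rng = np.random.default_rng(0)
          features = rng.normal(size=(10_000, 4))

          X = StandardScaler().fit_transform(features)
          # MiniBatchKMeans scales to tens of millions of rows far better than vanilla k-means.
          segments = MiniBatchKMeans(n_clusters=5, random_state=0).fit_predict(X)

          print(np.bincount(segments))  # users per segment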

Education

  • Clemson University
    Master's degree, Applied Economics
    2012 - 2014
  • University of North Carolina at Asheville
    Bachelor of Science (B.S.), Statistics
    2007 - 2012
  • University of North Carolina at Asheville
    Bachelor of Arts (B.A.), Economics
    2007 - 2012
