Sumith Suresh
Lead/Senior Data Engineer at Citizen Bank
Experience
Citizen Bank
United States | Financial Services | 1 - 100 Employees
Lead/Senior Data Engineer
Sep 2021 - Present
✓ Developed Apache Spark applications in Python (PySpark) to process data from RDBMS and streaming sources.
✓ Designed scalable distributed data solutions with Hadoop and maintained Hadoop clusters on GCP using Google Cloud Storage, BigQuery, and Dataproc.
✓ Created data pipelines in GCP using Apache Airflow, incorporating various operators for ETL jobs (sketch below).
✓ Utilized GCP services for migrating on-premises applications to AWS, ensuring smooth data transfer and integration.
✓ Implemented Spark Streaming with Kafka for real-time data processing, capturing UI updates from XML messages.
✓ Worked with big data APIs and frameworks including Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
✓ Developed ETL programs in Netezza to load and transform data into NoSQL databases and MySQL, ensuring regular updates.
✓ Applied HiveQL partitioning and bucketing, ran queries on Parquet tables, and leveraged Cassandra for distributed data storage and management.
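The Airflow pipeline work above could look roughly like the minimal sketch below: a DAG that submits a PySpark transformation to a Dataproc cluster and then loads the Parquet output from Cloud Storage into BigQuery. All project, bucket, cluster, and table names are hypothetical placeholders, not details from this role.

```python
# Illustrative Airflow DAG (apache-airflow-providers-google): run a PySpark
# job on Dataproc, then load its output from GCS into BigQuery.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

PROJECT_ID = "example-project"         # hypothetical GCP project
REGION = "us-central1"
CLUSTER_NAME = "etl-dataproc-cluster"  # hypothetical Dataproc cluster

# PySpark job definition in the format the Dataproc API expects.
PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/transform_claims.py"},
}

with DAG(
    dag_id="rdbms_to_bigquery_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the Spark transformation on the Dataproc cluster.
    transform = DataprocSubmitJobOperator(
        task_id="transform_with_spark",
        project_id=PROJECT_ID,
        region=REGION,
        job=PYSPARK_JOB,
    )

    # Load the Parquet output from GCS into a BigQuery table.
    load = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="example-bucket",
        source_objects=["output/claims/*.parquet"],
        source_format="PARQUET",
        destination_project_dataset_table=f"{PROJECT_ID}.analytics.claims",
        write_disposition="WRITE_TRUNCATE",
    )

    transform >> load
```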

Apple
United States | Computers and Electronics Manufacturing | 700 & Above Employees
Senior Data Engineer
Mar 2019 - Dec 2020
✓ Implemented multi-node cloud cluster setup on AWS EC2 and handled AWS management tools (CloudWatch, CloudTrail).
✓ Developed Spark applications using Spark DataFrames and the Spark SQL API for batch processing.
✓ Designed and implemented GCP data solutions for enterprise data warehouses and data lakes.
✓ Built data pipelines in GCP using Airflow for ETL jobs with various operators.
✓ Worked on real-time data movement using Spark Structured Streaming and Kafka (sketch below).
✓ Migrated REST APIs from AWS Lambda and API Gateway to a microservices architecture using Docker and Kubernetes (GCP GKE).
✓ Deployed Spark jobs on GCP Dataproc clusters and utilized Cloudera Hadoop on GCP and AWS.
✓ Worked extensively with GitLab CI/CD, Jenkins, Terraform, and integration of Java, Spring Boot, Hibernate, and databases such as MongoDB and MySQL.
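A minimal sketch of the kind of real-time data movement described above: a Spark Structured Streaming job that reads JSON events from a Kafka topic, parses them, and lands micro-batches as Parquet. Broker addresses, topic, schema, and output paths are hypothetical, and the Kafka connector is assumed to be on the Spark classpath.

```python
# Illustrative PySpark Structured Streaming job: Kafka -> parsed JSON -> Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_ingest").getOrCreate()

# Assumed shape of each Kafka message value.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to the Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write each micro-batch to Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "gs://example-bucket/events/")
    .option("checkpointLocation", "gs://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```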

CVS Health
United States | Hospitals and Health Care | 700 & Above Employees
Data Engineer
Feb 2017 - Mar 2019
✓ Gathered data and business requirements, designed data migration solutions, and performed data validation using Hive, SQL, and Tableau/Power BI/Cognos.
✓ Automated and orchestrated tasks with AWS Step Functions and integrated Apache Airflow with AWS to monitor ML workflows (sketch below).
✓ Developed PL/SQL statements for database operations and used indexing, aggregation, and materialized views to improve query performance.
✓ Performed statistical analysis using SQL, Python, R, and Excel, including Excel VBA macros and Microsoft Access forms.
✓ Extracted, transformed, and loaded data from transaction systems using Python, SAS, and R.
✓ Used AWS for data storage, handling terabytes of data for customer BI reporting tools.
✓ Implemented data ingestion using Sqoop and HDFS, and worked with dimensional modeling and slowly changing dimensions.
✓ Used Apache Airflow for data pipeline authoring, scheduling, and monitoring.
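The Step Functions orchestration could look roughly like the boto3 sketch below, which starts a state-machine execution and polls it to a terminal state, the sort of call an Airflow task might wrap for monitoring. The state-machine ARN, region, and payload are hypothetical.

```python
# Illustrative boto3 usage: start an AWS Step Functions execution and poll it.
import json
import time

import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

# Hypothetical state machine for an ML feature pipeline.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:ml-feature-pipeline"


def run_pipeline(payload: dict, poll_seconds: int = 30) -> str:
    """Start one execution and block until it reaches a terminal state."""
    start = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(payload),
    )
    execution_arn = start["executionArn"]

    while True:
        status = sfn.describe_execution(executionArn=execution_arn)["status"]
        if status != "RUNNING":
            return status  # SUCCEEDED, FAILED, TIMED_OUT, or ABORTED
        time.sleep(poll_seconds)


if __name__ == "__main__":
    print(run_pipeline({"run_date": "2019-01-01"}))
```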

Tata Consultancy Services
India | IT Services and IT Consulting | 700 & Above Employees
Data Engineer
May 2015 - Dec 2016
✓ Responsible for the end-to-end software development lifecycle, including requirements analysis, design, coding, testing, maintenance, and support.
✓ Developed stored procedures, triggers, packages, and SQL scripts based on requirements.
✓ Created complex SQL queries using views, subqueries, and correlated subqueries.
✓ Conducted architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
✓ Automated data loading and preprocessing tasks with Oozie workflows, Pig, and HiveQL.
✓ Used ZooKeeper for cluster coordination and worked with Oozie workflows in the Cloudera environment.
✓ Implemented the Kafka consumer API for data consumption (sketch below) and used Sqoop to export analyzed data to relational databases.
✓ Built CI/CD pipelines using Terraform, Docker containers, and container orchestration tools such as Amazon EC2 Container Service and Kubernetes.
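A minimal sketch of the consumer-side work, assuming the kafka-python client (the actual client used in this role is not stated); topic, broker, and group names are placeholders, and the downstream relational export is left to Sqoop as described above.

```python
# Illustrative Kafka consumer (kafka-python): read JSON records in batches
# and commit offsets only after each batch has been handled.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                                  # hypothetical topic
    bootstrap_servers=["broker-1:9092"],
    group_id="etl-consumers",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 500:
        # In the real pipeline the batch would be landed in HDFS or a staging
        # table; here we only acknowledge the offsets once the batch is handled.
        print(f"processed {len(batch)} records up to offset {message.offset}")
        consumer.commit()
        batch.clear()
```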

Biocon
India | Biotechnology Research | 700 & Above Employees
Hadoop Developer
Mar 2013 - May 2015
✓ Installed and configured SQL Server 2005 and worked on development and optimization of a loans database.
✓ Built scalable distributed data solutions using Hadoop; installed and configured Hive, Pig, Sqoop, and Oozie.
✓ Designed and developed ETL jobs to load data into a Teradata database.
✓ Performed data extraction, aggregation, and analysis in HDFS using PySpark and Hive (sketch below).
✓ Created and deployed SSIS 2005 packages and reports using SSRS 2005.
✓ Developed real-time stream jobs using Spark Streaming and Spark SQL and loaded the results into HBase.
✓ Developed frontend and backend modules in Python on the Django web framework.
✓ Configured database maintenance plans, user management, and access permissions.
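The PySpark/Hive aggregation work could be sketched as a small batch job that reads a Hive table from HDFS, aggregates it, and writes a partitioned summary table back; the database, table, and column names are hypothetical.

```python
# Illustrative PySpark batch job: Hive table -> per-branch daily summary.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily_loan_summary")
    .enableHiveSupport()  # read/write tables registered in the Hive metastore
    .getOrCreate()
)

# Source table assumed to exist in the Hive metastore.
loans = spark.table("finance.loan_transactions")

# Aggregate transaction counts and totals per branch and day.
summary = (
    loans.groupBy("branch_id", "txn_date")
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Persist as a partitioned Hive table for downstream reporting.
(
    summary.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .saveAsTable("finance.loan_daily_summary")
)

spark.stop()
```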
Education
Vignana Bharathi Institute of Technology
Bachelor's degree, Electrical and Electronics Engineering