Sumith Suresh

Lead/Senior Data Engineer at Citizens Bank
Contact Information
us****@****om
(386) 825-5501
Location
Charlotte, North Carolina, United States

Experience

    • United States
    • Financial Services
    • 1 - 100 Employees
    • Lead/Senior Data Engineer
      • Sep 2021 - Present

      ✓ Developed Apache Spark applications in Python to process data from RDBMS and streaming sources.
      ✓ Designed scalable distributed data solutions with Hadoop and maintained Hadoop clusters on GCP using Google Cloud Storage, BigQuery, and DataProc.
      ✓ Created data pipelines in GCP using Apache Airflow, incorporating various operators for ETL jobs.
      ✓ Migrated on-premises applications to the cloud using GCP services, ensuring smooth data transfer and integration.
      ✓ Implemented Spark Streaming and used Kafka for real-time data processing and for capturing UI updates from XML messages.
      ✓ Worked with big data APIs and frameworks including Spark RDD, DataFrame, Dataset, and Data Source APIs, Spark SQL, and Spark Streaming.
      ✓ Developed ETL programs in Netezza to load and transform data into NoSQL databases and MySQL, ensuring regular updates.
      ✓ Applied HiveQL for partitioning and bucketing data, ran queries on Parquet tables, and leveraged Cassandra for distributed data storage and management.

    • United States
    • Computers and Electronics Manufacturing
    • 700 & Above Employees
    • Senior Data Engineer
      • Mar 2019 - Dec 2020

      ✓ Implemented a multi-node cloud cluster setup on AWS EC2 and handled AWS management tools (CloudWatch, CloudTrail).
      ✓ Developed Spark applications using the Spark DataFrame and Spark SQL APIs for batch processing.
      ✓ Designed and implemented GCP data solutions for enterprise data warehouses and data lakes.
      ✓ Built data pipelines in GCP using Airflow for ETL jobs with various operators.
      ✓ Worked on real-time data movement using Spark Structured Streaming and Kafka.
      ✓ Migrated REST APIs from AWS Lambda and API Gateway to a microservices architecture using Docker and Kubernetes (GCP GKE).
      ✓ Deployed Spark jobs on GCP DataProc clusters and used Cloudera Hadoop on GCP and AWS.
      ✓ Extensive experience with GitLab CI/CD, Jenkins, and Terraform; integrated Java, Spring Boot, and Hibernate with databases such as MongoDB and MySQL.

    • United States
    • Hospitals and Health Care
    • 700 & Above Employees
    • Data Engineer
      • Feb 2017 - Mar 2019

      ✓ Gathered data and business requirements, designed data migration solutions, and performed data validation using Hive, SQL, Tableau, Power BI, and Cognos.
      ✓ Automated and orchestrated tasks with AWS Step Functions and integrated Apache Airflow with AWS to monitor ML workflows.
      ✓ Developed PL/SQL statements for database operations and used indexing, aggregation, and materialized views to improve query performance.
      ✓ Performed statistical analysis using SQL, Python, R, and Excel, including Excel VBA macros and Microsoft Access forms.
      ✓ Extracted, transformed, and loaded data from transaction systems using Python, SAS, and R.
      ✓ Used AWS for data storage, handling terabytes of data for customer BI reporting tools.
      ✓ Implemented data ingestion using Sqoop and HDFS, and worked with dimensional modeling and slowly changing dimensions.
      ✓ Used Apache Airflow for data pipeline authoring, scheduling, and monitoring.

    • India
    • IT Services and IT Consulting
    • 700 & Above Employees
    • Data Engineer
      • May 2015 - Dec 2016

      ✓ Responsible for the end-to-end software development lifecycle, including requirements analysis, design, coding, testing, maintenance, and support.
      ✓ Developed stored procedures, triggers, packages, and SQL scripts based on requirements.
      ✓ Created complex SQL queries using views, subqueries, and correlated subqueries.
      ✓ Conducted architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
      ✓ Automated data loading and preprocessing tasks with Oozie workflows, Pig, and HiveQL.
      ✓ Used ZooKeeper for cluster coordination and worked with Oozie workflows in the Cloudera environment.
      ✓ Implemented the Kafka consumer API for data consumption and used Sqoop to export analyzed data to relational databases.
      ✓ Proficient in CI/CD pipelines using Terraform, Docker containers, and container orchestration tools such as EC2 Container Service (ECS) and Kubernetes.

    • India
    • Biotechnology Research
    • 700 & Above Employees
    • Hadoop Developer
      • Mar 2013 - May 2015

      ✓ Installed and configured SQL Server 2005 and worked on the development and optimization of a loans database.
      ✓ Built scalable distributed data solutions using Hadoop; installed and configured Hive, Pig, Sqoop, and Oozie.
      ✓ Designed and developed ETL jobs to load data into a Teradata database.
      ✓ Worked on data extraction, aggregation, and analysis in HDFS using PySpark and Hive.
      ✓ Created and deployed SSIS 2005 packages and built reports using SSRS 2005.
      ✓ Developed real-time streaming jobs using Spark Streaming and Spark SQL, loading data into HBase.
      ✓ Developed frontend and backend modules using Python on the Django web framework.
      ✓ Configured database maintenance plans, user management, and access permissions.

Education

  • Vignana Bharathi Institute of Technology
    Bachelor's degree, Electrical and Electronics Engineering
    2009 - 2013
