Vinay Karingula

Data Science Specialist at NewMarket Corporation
Contact Information
us****@****om
(386) 825-5501
Location
Richmond, Virginia, United States


Experience

    • NewMarket Corporation
    • United States
    • Chemical Manufacturing
    • 100 - 200 Employees
    • Data Science Specialist
      • Mar 2023 - Present

    • United States
    • Hospitality
    • 700 & Above Employees
    • Cloud/Big Data Engineer
      • Feb 2022 - Feb 2023

      • Gathered project documentation from the outgoing team during the transition phase.
      • Worked with Azure Databricks and Airflow to maintain and troubleshoot Airflow DAGs and Databricks logs.
      • Documented the integration between Azure Databricks and Airflow.
      • Created Spark jobs to maintain data integrity on pipelines landing in ADLS from various source systems, and automated them with a DAG running on a schedule.
      • Developed PySpark scripts to set up the data pipeline.
      • Involved in the design and build of the project from scratch.
      • Implemented a Master Patient Index (MPI) and its accompanying interfaces.
      • Implemented file and web service interfaces for data exchange.
      • Assisted partner agencies with providing data to a data warehouse.
      • Analyzed data provided by partner agencies to facilitate deduplication.
      • Exported data according to outside vendor specifications.
      • Developed DAGs, tuned DAG performance, and implemented tasks.
      • Worked closely with machine learning engineers to produce the desired output for customers.
      • Maintained inbound and outbound pipelines for data transfers from various source systems to target systems.
      • Delivered proofs of concept and production implementations in iterative Agile sprints.
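
A minimal sketch of the kind of scheduled Airflow DAG described above. The DAG id, task id, ADLS path, and the check logic are hypothetical placeholders, not details from the actual project.

    # Hypothetical Airflow DAG: runs a daily data-integrity check on files
    # landed in ADLS. Paths, ids, and the check itself are illustrative.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def check_data_integrity(**context):
        # Placeholder: the real job would read the landing zone (e.g. via
        # Spark or the Azure storage SDK) and validate row counts/schemas.
        print("validating abfss://container@account.dfs.core.windows.net/landing/")

    with DAG(
        dag_id="adls_integrity_check",      # hypothetical DAG id
        start_date=datetime(2022, 2, 1),
        schedule_interval="@daily",         # run on a schedule, as described above
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(
            task_id="check_integrity",      # hypothetical task id
            python_callable=check_data_integrity,
        )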

    • United States
    • Wellness and Fitness Services
    • 700 & Above Employees
      • Jun 2021 - Jan 2022

      • Worked with AWS cloud; created EMR clusters with Spark to analyze and process raw data accessed from S3 buckets.
      • Integrated the end-to-end data pipeline taking data from source systems to target data repositories, ensuring data quality and consistency at all times.
      • Delivered proofs of concept and production implementations in iterative Agile sprints.
      • Migrated data from TD to TDV.
      • Developed PySpark scripts to set up the data pipeline.
      • Developed ETL streams using Databricks.
      • Created a Spark application to load data into Athena tables.
      • Involved in the design and build of the project from scratch.
      • Migrated data from TD to S3.
      • Designed and created automated Airflow DAGs with parallel execution.
      • Developed DAGs, tuned DAG performance, and implemented tasks.
      • Environment: PySpark, Python, AWS Glue, Athena, Teradata, Airflow, S3, Databricks, IAM, SNS.
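
A minimal sketch, with hypothetical bucket names and paths, of the S3-to-Athena pattern described above: a PySpark job (for example an EMR step) reads raw data from S3 and writes partitioned Parquet, the usual layout an external Athena table is defined over.

    # Hypothetical PySpark job: raw JSON in S3 -> partitioned Parquet for Athena.
    # Bucket names, paths, and the partition column are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("raw-to-athena").getOrCreate()

    raw = spark.read.json("s3://example-raw-bucket/events/")    # hypothetical source
    cleaned = raw.dropDuplicates().withColumn("event_date", F.to_date("event_ts"))

    (cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")           # lets Athena prune partitions at query time
        .parquet("s3://example-curated-bucket/events/"))        # hypothetical target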

    • Sr. Data Engineer
      • Sep 2020 - May 2021

      • Worked as a Sr. Data Engineer with Hadoop ecosystems, Apache Spark, and AWS.
      • Designed and built robust services using streaming and batch data.
      • Key contributor in building identity services that enable sharing profiles across the organization in support of marketing and analytics.
      • Created Hive schemas using performance techniques such as partitioning and bucketing.
      • Developed analytical components using Kafka and Spark Streaming.
      • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
      • Created a Spark application to load data into Athena tables.
      • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
      • Used JSON schemas to define table and column mappings from S3 data to Redshift.
      • Developed PySpark scripts to set up the data pipeline.
      • Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
      • Collaborated with product teams, data analysts, and data scientists to design and build data-forward solutions.
      • Created Airflow jobs to orchestrate Spark workflows.
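
A minimal sketch of the Kafka-plus-Spark-Streaming pattern above, written with Structured Streaming; the brokers, topic, and sink paths are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

    # Hypothetical Structured Streaming job: consume a Kafka topic, land the
    # raw events in S3. Brokers, topic, and paths are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    events = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
        .option("subscribe", "profile-events")               # hypothetical topic
        .load()
        .selectExpr("CAST(value AS STRING) AS json"))

    (events.writeStream
        .format("parquet")
        .option("path", "s3://example-bucket/stream/")            # hypothetical sink
        .option("checkpointLocation", "s3://example-bucket/chk/") # recovery bookkeeping
        .start()
        .awaitTermination())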

    • United States
    • Electric Power Generation
    • 700 & Above Employees
    • Hadoop Developer
      • Oct 2015 - Jun 2018

      • Worked as a Java/Hadoop developer responsible for everything related to the clusters.
      • Responsible for building scalable distributed data solutions in a Hadoop cluster environment with the Hortonworks distribution.
      • Developed Spark scripts using Python shell commands as per the requirements.
      • Developed Spark scripts writing custom RDDs in Scala and Python for data transformations and actions on RDDs.
      • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
      • Involved in performance tuning of Spark jobs using caching, taking full advantage of the cluster environment.
      • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
      • Worked with file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
      • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
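
A minimal sketch, with hypothetical HDFS paths, of the RDD-style transformations and actions mentioned above: parse lines, filter malformed records, and aggregate with reduceByKey.

    # Hypothetical PySpark RDD job: key counting over tab-delimited text.
    # Input and output paths are illustrative placeholders.
    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-transformations")

    lines = sc.textFile("hdfs:///data/events.txt")        # hypothetical input
    counts = (lines
        .map(lambda line: line.split("\t"))               # parse rows
        .filter(lambda fields: len(fields) >= 2)          # drop malformed rows
        .map(lambda fields: (fields[0], 1))
        .reduceByKey(lambda a, b: a + b))                 # transformation; evaluated lazily

    counts.saveAsTextFile("hdfs:///out/event_counts")     # action triggers the job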

Education

  • New Jersey Institute of Technology
    Master's degree, Information Science/Studies
    2018 - 2019
  • Sreenidhi Institute of Science and Technology
    Bachelor's degree, Electrical and Electronics Engineering
    2011 - 2015
