Reethy Reddy

Senior Big Data Engineer at Drug Plastics and Glass Co
Contact Information
Location
Littleton, Massachusetts, United States

Experience

    • United States
    • Packaging & Containers
    • 1 - 100 Employee
    • Senior Big Data Engineer
      • Apr 2021 - Present

      • Developed Apache Spark applications in Scala for processing data from various streaming sources.
      • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in DynamoDB using Scala.
      • Created data classifiers that read from DynamoDB, bin the features, and store them back in DynamoDB.
      • Collected data with Spark Streaming from an AWS S3 bucket in near-real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (see the streaming sketch below).
      • Used DataStage Designer stages such as Lookup, Join, Merge, Funnel, Filter, Copy, Aggregator, and Sort.
      • Developed and maintained batch data flows using HiveQL and Unix scripting.
      • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
      • Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to Kafka.
      • Created AWS Lambda functions and assigned roles to run Python scripts, and built Java-based Lambda functions for event-driven processing.
      • Analyzed data using Pig scripts, Hive queries, Spark (Python), and Impala.
      • Developed Python scripts that call REST APIs to transfer and extract data from on-premises systems to AWS S3; implemented a microservices-based cloud architecture using Spring Boot.
      • Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
      • Developed Java applications that read data from Amazon MSK (Kafka) and write it to DynamoDB.
      • Developed applications that leverage Step Functions and CloudWatch event triggers to fetch data and generate features from it.
      • Helped create a research data lake by extracting customer data from various sources into S3, including Excel files, databases, and server log data.
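
      Illustrative example — a minimal sketch of the Kafka-to-HDFS streaming pattern described above, assuming Spark Structured Streaming in Scala; the broker address, topic name, event schema, and HDFS paths are placeholders, not details from the role:

          // Minimal sketch: read JSON events from Kafka, aggregate on the fly,
          // and persist to HDFS as Parquet. All names below are assumptions.
          import org.apache.spark.sql.SparkSession
          import org.apache.spark.sql.functions._
          import org.apache.spark.sql.types._

          object KafkaToHdfsStream {
            def main(args: Array[String]): Unit = {
              val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()
              import spark.implicits._

              // Hypothetical event schema; the real learner data model is not shown in the profile.
              val schema = new StructType()
                .add("userId", StringType)
                .add("feature", StringType)
                .add("value", DoubleType)
                .add("eventTime", TimestampType)

              val events = spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")   // assumed broker
                .option("subscribe", "learner-events")              // assumed topic
                .load()
                .select(from_json($"value".cast("string"), schema).as("e"))
                .select("e.*")

              // On-the-fly aggregation over a 5-minute event-time window (illustrative choice).
              val aggregated = events
                .withWatermark("eventTime", "10 minutes")
                .groupBy(window($"eventTime", "5 minutes"), $"feature")
                .agg(avg($"value").as("avgValue"), count(lit(1)).as("events"))

              aggregated.writeStream
                .format("parquet")
                .option("path", "hdfs:///data/learner_model")                 // assumed output path
                .option("checkpointLocation", "hdfs:///checkpoints/learner")  // assumed checkpoint path
                .outputMode("append")
                .start()
                .awaitTermination()
            }
          }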

    • Big Data Engineer
      • Jun 2019 - Mar 2021

      • Stored data in AWS S3 (used like HDFS) and ran EMR programs against the stored data.
      • Used the AWS CLI to suspend AWS Lambda functions and to automate backups of ephemeral data stores to S3 buckets and EBS.
      • Performed ETL testing activities such as running the jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.
      • Developed Hive UDFs to incorporate external business logic into Hive scripts and developed dataset-join scripts using Hive join operations.
      • Developed custom Kafka producers and consumers for publishing to and subscribing to different Kafka topics.
      • Migrated MapReduce jobs to Spark jobs to achieve better performance.
      • Designed MapReduce and YARN flows and wrote MapReduce scripts, including performance tuning and debugging.
      • Implemented a Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
      • Implemented machine learning algorithms in Python to predict the quantity a user might order for a specific item so suggestions can be made automatically, using Kinesis Firehose and an S3 data lake.
      • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
      • Developed a reusable framework, to be leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects (a minimal sketch follows below).
      • Designed data models for data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
      • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
      • Implemented a variety of AWS computing and networking services to meet application needs.
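
      Illustrative example — a minimal sketch of the RDBMS-to-data-lake ETL pattern mentioned above, assuming Spark's JDBC data source in Scala; the connection URL, table, partition bounds, and S3 path are placeholders:

          // Minimal sketch: read an RDBMS table in parallel over JDBC and land it
          // in the data lake as partitioned Parquet. All names are assumptions.
          import org.apache.spark.sql.SparkSession

          object RdbmsToDataLake {
            def main(args: Array[String]): Unit = {
              val spark = SparkSession.builder().appName("rdbms-to-datalake-sketch").getOrCreate()

              // Hypothetical source table and connection settings.
              val orders = spark.read
                .format("jdbc")
                .option("url", "jdbc:mysql://source-db:3306/sales")
                .option("dbtable", "orders")
                .option("user", sys.env.getOrElse("DB_USER", "reader"))
                .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
                .option("partitionColumn", "order_id")   // numeric key for parallel reads
                .option("lowerBound", "1")
                .option("upperBound", "10000000")
                .option("numPartitions", "8")            // illustrative parallelism
                .load()

              // Persist to the lake partitioned by date; bucket and layout are assumed.
              orders.write
                .mode("overwrite")
                .partitionBy("order_date")
                .parquet("s3a://research-data-lake/raw/orders")

              spark.stop()
            }
          }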

    • United States
    • Hospitals and Health Care
    • 700 & Above Employee
    • Big Data Engineer
      • Feb 2017 - May 2019

      • Worked in an Azure environment on the development and deployment of custom Hadoop applications.
      • Managed data coming from different sources through Kafka.
      • Used Spark DataFrame operations to perform the required validations and analytics on Hive data (see the validation sketch below).
      • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
      • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
      • Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
      • Deployed the initial Azure components, including Azure Virtual Networks, Azure Application Gateway, Azure Storage, and affinity groups.
      • Worked with big data technologies such as Spark, Scala, Hive, and Hadoop clusters (Cloudera platform).
      • Built data pipelines with Data Fabric jobs using Sqoop, Spark, Scala, and Kafka, working in parallel on the Oracle and MySQL side for source-to-target data design.
      • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, working with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
      • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member, clinical, biometrics, lab, and claims data into HDFS for data analytics.
      • Analyzed Teradata procedures and imported data from Teradata into a MySQL database, then developed HiveQL queries, including UDFs where Hive lacks the required built-in functions.
      • Provided design recommendations and thought leadership to sponsors and stakeholders, improving review processes and resolving technical problems; managed and reviewed Hadoop log files.
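
      Illustrative example — a minimal sketch of DataFrame-based validation over a Hive table, assuming Spark with Hive support in Scala; the database, table, and column names are placeholders, not the actual healthcare schema:

          // Minimal sketch: validate a Hive table and compute simple analytics on
          // the clean rows. Table and column names below are assumptions.
          import org.apache.spark.sql.SparkSession
          import org.apache.spark.sql.functions._

          object HiveClaimsValidation {
            def main(args: Array[String]): Unit = {
              val spark = SparkSession.builder()
                .appName("hive-validation-sketch")
                .enableHiveSupport()            // read tables registered in the Hive metastore
                .getOrCreate()

              // Hypothetical claims table ingested by the Sqoop/Pig/Hive pipelines.
              val claims = spark.table("healthcare.claims")

              // Basic validation: member ID present and claim amount non-negative.
              val valid   = claims.filter(col("member_id").isNotNull && col("claim_amount") >= 0)
              val invalid = claims.exceptAll(valid)

              // Simple analytics on the validated data: totals per plan type.
              val summary = valid
                .groupBy("plan_type")
                .agg(sum("claim_amount").as("total_amount"),
                     avg("claim_amount").as("avg_amount"),
                     count(lit(1)).as("claims"))

              summary.write.mode("overwrite").saveAsTable("healthcare.claims_summary")  // assumed output table
              println(s"Rejected ${invalid.count()} invalid rows")
            }
          }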

    • India
    • IT Services and IT Consulting
    • 1 - 100 Employee
    • Data Engineer
      • Aug 2014 - Nov 2016

      • Created processing pipelines covering transformations, estimation, and evaluation of analytical models.
      • Tuned SQL queries and stored procedures for speedy data extraction to resolve and troubleshoot issues in the OLTP environment.
      • Worked on the Oozie workflow engine for job scheduling.
      • Worked with the Avro data serialization system to handle JSON data formats.
      • Applied performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
      • Performed pre-processing on datasets prior to training, including standardization and normalization.
      • Worked with heterogeneous sources, extracting data from Oracle databases, XML, and flat files and loading it into a relational Oracle warehouse.
      • Used HiveContext along with Spark transformations and actions (map, flatMap, filter, reduce, reduceByKey); a small sketch follows below.
      • Developed Pig scripts for the analysis of semi-structured data.
      • Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
      • Worked with different file formats such as SequenceFiles, XML files, and MapFiles using MapReduce programs.
      • Developed Pig UDFs to manipulate data according to business requirements and built custom Pig loaders.
      • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
      • Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators.
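
      Illustrative example — a minimal sketch of some of the RDD transformations and actions listed above (map, filter, reduceByKey) applied to log parsing in Scala; the log path and line layout are placeholders:

          // Minimal sketch: parse server logs into (status, count) pairs with classic
          // RDD transformations and actions. Path and log format are assumptions.
          import org.apache.spark.sql.SparkSession

          object LogStatusCounts {
            def main(args: Array[String]): Unit = {
              val spark = SparkSession.builder().appName("log-counts-sketch").getOrCreate()
              val sc = spark.sparkContext

              // Hypothetical log layout: "2016-03-01T10:15:00 GET /orders 200 123ms"
              val lines = sc.textFile("hdfs:///logs/app/*.log")

              val statusCounts = lines
                .map(_.split("\\s+"))
                .filter(_.length >= 4)                 // drop malformed lines
                .map(fields => (fields(3), 1))         // key by HTTP status code
                .reduceByKey(_ + _)                    // count per status

              statusCounts
                .sortBy(_._2, ascending = false)       // most frequent statuses first
                .take(10)
                .foreach { case (status, count) => println(s"$status\t$count") }

              spark.stop()
            }
          }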

    • SQL Developer
      • Oct 2013 - Aug 2014

      • Used SSIS to create ETL packages (.dtsx files) to validate, extract, transform, and load data into data warehouse and data mart databases, and to process SSAS cubes that store data in OLAP databases.
      • Scheduled and maintained packages on daily, weekly, and monthly cadences using SQL Server Agent in SSMS.
      • Modified and maintained SQL Server stored procedures, views, ad-hoc queries, and SSIS packages used in the search engine optimization process.
      • Updated existing reports and created new ones using Microsoft SQL Server Reporting Services on a team of two developers.
      • Performed unit tests on all code and packages.
      • Improved the performance of long-running views and stored procedures.
      • Developed complex parameterized reports used for making current and future business decisions.
      • Performed and automated SQL Server version upgrades and patch installs, and maintained relational databases.
      • Performed front-line code reviews for other development teams.
      • Created and modified various stored procedures used in the application using T-SQL.
      • Monitored and tuned database resources and activity for SQL Server databases.
