Sarada Kurapati
Big Data Engineer at LendingClub
Experience
LendingClub · United States · Financial Services · 700 & Above Employees
Big Data Engineer
Aug 2021 - Present
• Developed Spark applications to implement data cleansing, validation, and processing of large-scale datasets ingested from traditional data warehouse systems.
• Migrated existing on-premises applications and scripts from Java code to a cloud-based platform, Azure Cloud Storage.
• Using PySpark to process and analyze large datasets in a distributed manner.
• Writing PySpark scripts to perform complex data transformations and aggregations with improved performance.
• Optimizing PySpark applications using techniques such as caching, broadcast variables, and partitioning (see the sketch after this list).
• Managing and monitoring Azure resources such as virtual machines, storage accounts, and network resources to ensure high availability and performance.
• Integrating Azure services such as Azure Stream Analytics and Azure Machine Learning to build real-time data processing pipelines.
• Designed and developed scalable big data solutions using Hadoop and Azure Cloud technologies.
• Implemented data processing pipelines using Apache Spark.
• Designed and developed data pipelines using Hive, Pig, and Spark to transform and analyze large datasets for real-time processing.
• Maintained and optimized Hadoop clusters for high availability and performance.
• Worked with Databricks and Oozie jobs to automate and schedule Hadoop workflows, resulting in a 25% increase in operational efficiency.
• Conducted performance tuning and capacity planning to optimize Hadoop clusters for various workloads.
• Defining workflows in Oozie to automate Hadoop jobs and other big data applications.
• Creating Oozie coordinators to schedule and manage multiple workflows.
• Monitoring and troubleshooting Oozie workflows to ensure successful completion of jobs.
• Collaborating with data scientists and analysts to understand their data needs and developing solutions to meet those needs using Hadoop and related technologies.
• Writing complex Hive queries to extract data from large datasets for analysis.
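The optimization bullet above names three standard Spark techniques: caching a dataset that is reused, broadcasting a small lookup table to avoid a shuffle join, and partitioning output so readers can prune files. The sketch below is a minimal, hypothetical illustration of all three using Spark's Java DataFrame API (the entry itself mentions PySpark, where the equivalent calls exist); the storage paths, columns, and join key are placeholders, not details from this role.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.col;

public class LoanCleansingJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("loan-cleansing")
                .getOrCreate();

        // Hypothetical inputs; the real datasets came from warehouse systems.
        Dataset<Row> loans = spark.read().parquet("abfss://raw@acct.dfs.core.windows.net/loans/");
        Dataset<Row> branches = spark.read().parquet("abfss://ref@acct.dfs.core.windows.net/branches/");

        // Cleansing/validation: drop rows missing the key, reject bad amounts.
        Dataset<Row> valid = loans
                .na().drop(new String[]{"loan_id"})
                .filter(col("amount").gt(0));

        // Caching: the validated set feeds several downstream aggregations.
        valid.cache();

        // Broadcast join: ship the small reference table to every executor
        // instead of shuffling the large dataset across the cluster.
        Dataset<Row> enriched = valid.join(broadcast(branches), "branch_id");

        // Partitioning: lay out output by date so downstream reads prune files.
        enriched.write()
                .mode("overwrite")
                .partitionBy("as_of_date")
                .parquet("abfss://curated@acct.dfs.core.windows.net/loans_enriched/");

        spark.stop();
    }
}
```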

Cloudflare · United States · Computer and Network Security · 700 & Above Employees
Big Data Engineer
Sep 2019 - Jul 2021
• Worked with Spark to create structured data from a pool of unstructured data received.
• Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities such as Apache Spark, written in Scala.
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Documented requirements, including the available code to be implemented using Spark, Hive, HDFS, and Elasticsearch.
• Maintained ELK (Elasticsearch, Kibana) and wrote Spark scripts using the Scala shell.
• Implemented Spark using Scala and utilized DataFrames and the Spark SQL API for faster processing of data.
• Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase (see the sketch after this list).
• Provided continuous discretized streams (DStreams) of data at a high level of abstraction with Spark Structured Streaming.
• Moved transformed data to a Spark cluster, where the data is set to go live on the application using Kafka.
• Created a Kafka producer to connect to different external sources and bring the data to a Kafka broker.
• Handled schema changes in the data stream using Kafka.
• Developed new Flume agents to extract data from Kafka.
• Used a Kafka broker as a Structured Streaming source to obtain structured data by schema.
• Analyzed and tuned the Cassandra data model for multiple internal projects and worked with analysts to model Cassandra tables from business rules and enhance/optimize existing tables.
• Designed and deployed new ELK clusters.
• Created log monitors and generated visual representations of logs using the ELK stack.
• Implemented upgrade, backup, and restore procedures for CI/CD tools.
• Played a key role in installing and configuring various big data ecosystem tools such as Elasticsearch, Logstash, Kibana, Kafka, and Cassandra.
• Reviewed functional and non-functional requirements on the Hortonworks Hadoop project.
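The Kafka streaming bullets describe a common Structured Streaming pattern: subscribe to a topic, cast the binary Kafka value to a string, parse it against a schema, and write each micro-batch to a sink. A minimal sketch of that pattern follows, in Spark's Java API rather than the Scala used in this role; the broker address, topic name, event schema, and checkpoint path are all hypothetical, and a console sink stands in for the HBase sink the entry describes.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class KafkaStreamJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-structured-streaming")
                .getOrCreate();

        // Hypothetical event schema; the real topics and schemas are not in the profile.
        StructType schema = new StructType()
                .add("event_id", DataTypes.StringType)
                .add("payload", DataTypes.StringType)
                .add("ts", DataTypes.TimestampType);

        // Subscribe to a Kafka topic as a streaming source.
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "events")
                .load();

        // Kafka delivers value as bytes; cast and parse JSON against the schema.
        Dataset<Row> events = raw
                .selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), schema).alias("e"))
                .select("e.*");

        // Console sink stands in for HBase here; checkpointing tracks offsets.
        StreamingQuery query = events.writeStream()
                .outputMode("append")
                .format("console")
                .option("checkpointLocation", "/tmp/checkpoints/events")
                .start();

        query.awaitTermination();
    }
}
```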

Merck · United States · Pharmaceutical Manufacturing · 700 & Above Employees
Big Data Engineer
Dec 2018 - Aug 2019
• Ingested data from sources such as MySQL, Oracle, and CSV files.
• Used Apache Sqoop to import and export data between HDFS and external RDBMS databases such as MySQL, as well as CSV files.
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Worked with Spark and Scala, mainly on Claims Invoice Ingestion Framework exploration for the transition from Hadoop/MapReduce to Spark.
• Wrote Spark jobs to transform the data, calculating and grouping vendor payment status on HDFS and storing the results in Hive tables and Kafka topics (see the sketch after this list).
• Performed Spark transformations using DataFrames and Spark SQL.
• Generated business reports from the data stored in Hive tables for display in the dashboard.
• Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
• Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it.
• Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
• Worked in an Agile development approach.
• Developed a strategy for full and incremental loads using Sqoop.
• Implemented a POC to migrate MapReduce jobs into Spark RDD transformations.
Tools Used: Python, Teradata, Netezza, Oracle 12c, PySpark, SQL Server 2012, UML, MS Visio, Oracle Designer, Cassandra, Azure
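The vendor-payment bullet describes a group-and-aggregate Spark job that lands results in Hive. Below is a minimal sketch of that shape using Spark's Java API with Hive support enabled; the database, table, and column names (staging.claims_invoices, vendor_id, payment_status, amount) are hypothetical stand-ins, since the real schema is not given.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class VendorPaymentStatusJob {
    public static void main(String[] args) {
        // Hive support lets saveAsTable write through the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("vendor-payment-status")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical staging table, e.g. data Sqoop-imported from Oracle/MySQL.
        Dataset<Row> invoices = spark.table("staging.claims_invoices");

        // Calculate and group payment totals per vendor and status.
        Dataset<Row> statusByVendor = invoices
                .groupBy(col("vendor_id"), col("payment_status"))
                .agg(sum(col("amount")).alias("total_amount"));

        // Store the grouped results in a Hive table for dashboard reports.
        statusByVendor.write()
                .mode("overwrite")
                .saveAsTable("reports.vendor_payment_status");

        spark.stop();
    }
}
```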

Vertex Computer Systems · United States · IT Services and IT Consulting · 200 - 300 Employees
Hadoop Developer
Apr 2017 - Nov 2018
• Consumed REST APIs and wrote source code for use in the Kafka program.
• Worked on various real-time and batch processing applications using Spark/Scala, Kafka, and Cassandra.
• Built Spark applications to perform data enrichment and transformations using Spark DataFrames with Cassandra lookups (see the sketch after this list).
• Used the DataStax Spark Cassandra Connector to extract and load data to and from Cassandra.
• Worked in a team to develop an ETL pipeline that extracted Parquet-serialized files from S3 and persisted them in HDFS.
• Developed a Spark application that uses Kafka consumer and broker libraries to connect to Apache Kafka, consume data from topics, and ingest it into Cassandra.
• Developed applications involving big data technologies such as Hadoop, Spark, MapReduce, YARN, Hive, Pig, Kafka, Oozie, Sqoop, and Hue.
• Worked on Apache Airflow, Apache Oozie, and Azkaban.
• Designed and implemented a data ingestion framework to load data into the data lake for analytical purposes.
• Developed data pipelines using Hive, Pig, and MapReduce.
• Wrote MapReduce jobs.
• Administered clusters in the Hadoop ecosystem.
• Installed and configured Hadoop clusters from major Hadoop distributions.
• Designed a reporting application that uses Spark SQL to fetch data and generate reports on Hive.
• Analyzed data using Hive and wrote user-defined functions (UDFs).
• Used AWS services such as EC2 and S3 for small-dataset processing and storage.
• Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
Tools Used: Spark, Scala, AWS, DBeaver, Zeppelin, SSIS, Cassandra, Workspace, C# scripting
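The enrichment bullets describe joining incoming records against Cassandra lookup tables through the DataStax Spark Cassandra Connector. The sketch below shows that read-join-write shape in Spark's Java API (the role used Spark/Scala); the connection host, keyspace, table, and join key are hypothetical, and it assumes the spark-cassandra-connector package is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraEnrichmentJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cassandra-enrichment")
                // Hypothetical Cassandra contact point.
                .config("spark.cassandra.connection.host", "cassandra-host")
                .getOrCreate();

        // Parquet files persisted in HDFS (originally extracted from S3).
        Dataset<Row> events = spark.read().parquet("hdfs:///data/landing/events/");

        // Lookup table read through the DataStax Spark Cassandra Connector;
        // keyspace and table names are placeholders.
        Dataset<Row> customers = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "crm")
                .option("table", "customers")
                .load();

        // Enrich events with customer attributes via a join on a shared key.
        Dataset<Row> enriched = events.join(customers, "customer_id");

        // Write the enriched rows back to a (pre-created) Cassandra table.
        enriched.write()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "crm")
                .option("table", "events_enriched")
                .mode("append")
                .save();

        spark.stop();
    }
}
```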

Thrymr Software · India · IT Services and IT Consulting · 100 - 200 Employees
Java Developer
Feb 2015 - Mar 2017
• Implemented a multi-threaded environment and used most of the interfaces in the Collections framework, applying core Java concepts.
• Involved in developing code using major Spring Framework concepts: Dependency Injection (DI) and Inversion of Control (IoC).
• Used the Spring MVC framework to implement RESTful web services, reducing integration complexity and easing maintenance (see the sketch after this list).
• Used Bootstrap to create responsive web pages that display properly at different screen sizes.
• Used Git as the version control tool to track work progress and attended daily Scrum sessions.
• Developed interactive web pages using Angular, HTML5, CSS, and JavaScript.
• Built REST web services in the backend to handle requests sent from the front end.
• Wrote stored procedures, user-defined functions, and views; implemented error handling in stored procedures and SQL objects; and modified existing stored procedures.
• Implemented functionality in HTML, CSS, JavaScript, JSON, and Bootstrap with a MySQL database as the backend.
• Involved in the design and development of a user-friendly enterprise application using Java, Spring, Hibernate, web services, and Eclipse.
• Developed and enhanced the application using Java and J2EE (Servlets, JSP, JDBC, JNDI, EJB), RESTful web services, HTML, JSON, XML, Maven, and a MySQL database.
• Used Git for source control, gaining a large speed advantage over centralized systems that must communicate with a server.
Tools Used: Java/J2EE, core Java, Spring, Hibernate, Git, MySQL database, Maven, RESTful web services, HTML, HTML5, CSS, JavaScript, Bootstrap, JSON, XML
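The Spring MVC bullet pairs two ideas: the container wires dependencies (DI/IoC), and annotated controllers expose REST endpoints. A minimal sketch of that pairing follows; it assumes a Spring Boot (or equivalent Spring MVC) context, and AccountService, the route, and the lookup logic are hypothetical, not taken from the application described.

```java
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical service bean; stands in for a Hibernate/MySQL-backed lookup.
@Service
class AccountService {
    String findOwner(long id) {
        return "owner-" + id;
    }
}

// REST endpoint exposed through Spring MVC annotations.
@RestController
@RequestMapping("/api/accounts")
class AccountController {
    private final AccountService accounts;

    // Constructor injection: the IoC container supplies the dependency,
    // so the controller never constructs its collaborators itself.
    AccountController(AccountService accounts) {
        this.accounts = accounts;
    }

    @GetMapping("/{id}/owner")
    String owner(@PathVariable long id) {
        return accounts.findOwner(id);
    }
}
```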

Education
KL University
Bachelor's degree, Electrical and Electronics Engineering