Pujan K.
Data Engineer at First Republic
-
English Full professional proficiency
-
Hindi Full professional proficiency
-
Nepali Native or bilingual proficiency
Experience
-
First Republic Bank
-
United States
-
Banking
-
700 & Above Employee
-
Data Engineer
-
Feb 2022 - Present
- Designed, built, and maintained data pipelines in Python and SQL to extract, transform, and load data from various sources into data warehouses and data lakes.
- Deployed and managed data pipelines with Docker containers and Kubernetes, ensuring scalability and reliability of the ETL process.
- Developed Spark jobs in Python to perform data transformations, creating DataFrames and using Spark SQL.
- Processed unstructured JSON data into structured Parquet format through several PySpark transformations (a sketch follows below).
- Developed Spark applications using Spark libraries to perform ETL transformations, eliminating the need for separate ETL tools.
- Monitored data pipeline performance and implemented optimizations in Python and SQL.
- Implemented Kafka-based ETL pipelines, designing and developing Kafka topics, producers, and consumers and integrating them with the ETL processes.
- Worked with data security teams to implement measures such as encryption and access controls to protect sensitive financial information.
- Documented the design and implementation of data pipelines, including data dictionaries, flow diagrams, and technical specifications, to ensure knowledge transferability.
- Worked on cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), and EC2.
- Designed, developed, and documented microservices and system components consisting of several objects working together to execute a business function of the larger system.
- Worked with business and IT teams to understand business problems and to design, implement, and deliver solutions using Agile methodology across the larger program.
- Developed code and test artifacts that reuse subroutines, are well structured, and are backed by automated tests.
- Worked hands-on across multiple software paradigms, supporting the team's technical infrastructure alongside the project lead.
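A minimal PySpark sketch of the JSON-to-Parquet flow described above; the paths, column names, and aggregation are illustrative assumptions rather than details of the actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative Spark session; real cluster settings are assumed, not shown.
spark = SparkSession.builder.appName("json-to-parquet-etl").getOrCreate()

# Read semi-structured JSON from a hypothetical source location.
raw = spark.read.json("s3://example-bucket/raw/transactions/")

# Example transformations: derive a date column, cast amounts, drop bad rows
# (column names are hypothetical).
cleaned = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("amount").isNotNull())
)

# Register a temp view so downstream steps can use Spark SQL.
cleaned.createOrReplaceTempView("transactions")
daily = spark.sql(
    "SELECT event_date, COUNT(*) AS txn_count, SUM(amount) AS total_amount "
    "FROM transactions GROUP BY event_date"
)

# Write structured output as Parquet, partitioned by date (hypothetical target path).
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_transactions/"
)
```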
-
-
-
HMS Holdings Corp (HMSY)
-
United States
-
1 - 100 Employee
-
Data Engineer
-
Aug 2021 - Feb 2022
- Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, coordinating tasks within the team.
- Led the development and implementation of Informatica PowerCenter 10.1 and Informatica BDM, taking them from inception to production.
- Implemented Informatica BDM mappings to extract data from the data warehouse into the data lake.
- Designed and implemented the layers of the data lake and designed the star schema in BigQuery.
- Used Cloud Functions with Python to load newly arrived CSV files from GCS buckets into BigQuery.
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
- Implemented multiple data pipeline DAGs and maintenance DAGs orchestrated in Airflow (a sketch follows below).
- Designed pipelines with Apache Beam, Kubeflow, and Dataflow and orchestrated the jobs on GCP.
- Developed and demonstrated a POC for migrating on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
- Documented the inventory of modules, infrastructure, storage, and components of the existing on-prem data warehouse to identify suitable technologies and strategies for the Google Cloud migration.
- Designed, developed, and implemented ETL pipelines using the Python API of Apache Spark (PySpark).
- Worked on a GCP POC to migrate data and applications from on-prem to Google Cloud; gained exposure to IAM roles in GCP.
- Created firewall rules to access Google Dataproc clusters from other machines.
- Set up GCP firewall rules for ingress and egress traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN to deliver content from cache locations, markedly improving user experience and latency.
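A minimal Airflow sketch of a GCS-to-BigQuery load of the kind described above; the DAG id, bucket, dataset, table, and schedule are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

# Hypothetical daily DAG that loads newly arrived CSV files from a GCS bucket
# into a BigQuery table.
with DAG(
    dag_id="gcs_csv_to_bigquery",
    start_date=datetime(2021, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_csv = GCSToBigQueryOperator(
        task_id="load_csv_to_bq",
        bucket="example-landing-bucket",             # assumed bucket name
        source_objects=["incoming/*.csv"],           # assumed object prefix
        destination_project_dataset_table="example_project.analytics.events",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_APPEND",
        autodetect=True,
    )
```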
-
-
-
General Motors
-
United States
-
Motor Vehicle Manufacturing
-
700 & Above Employee
-
Data Engineer
-
Jun 2020 - Aug 2021
- Implemented scalable infrastructure and platforms for large-scale data ingestion, aggregation, integration, and analytics in Hadoop using Spark and Hive.
- Developed streamlined workflows using high-performance API services that handle large amounts of structured and unstructured data.
- Developed Spark jobs in Python to perform data transformations, creating DataFrames and using Spark SQL.
- Processed unstructured JSON data into structured Parquet format through several PySpark transformations.
- Developed Spark applications using Spark libraries to perform ETL transformations, eliminating the need for separate ETL tools.
- Developed end-to-end data pipelines in Spark using Python to ingest, transform, and analyze data.
- Created Hive tables with HiveQL, loaded data into them, and analyzed the data through Hive queries.
- Created and executed unit test cases to validate that transformations and processing functions work as expected.
- Scheduled multiple jobs with the Control-M workflow engine and wrote shell scripts to automate application deployments.
- Implemented solutions to switch schemas based on dates so that the transformations run automatically.
- Developed custom functions and UDFs in Python to extend Spark functionality (a sketch follows below).
- Developed data validation scripts in Hive and Spark and performed validation in Jupyter Notebook by spinning up a query cluster on AWS EMR.
- Executed Hadoop and Spark jobs on AWS EMR against data stored in Amazon S3.
- Implemented Spark RDD transformations to map business analysis requirements and applied actions on top of the transformations.
- Worked with data serialization formats, converting complex objects into sequences of bits using Parquet, Avro, JSON, and CSV.
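A minimal sketch of a Python UDF registered with Spark, as described above; the function, column names, and sample data are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

# Hypothetical decoder that maps a VIN prefix to a build region; the mapping
# is illustrative only.
def region_from_vin(vin: str) -> str:
    if not vin:
        return "unknown"
    first = vin[0].upper()
    if first in "12345":
        return "north_america"
    if first in "SVWXYZ":
        return "europe"
    return "other"

region_udf = F.udf(region_from_vin, StringType())

# Apply the UDF to a small DataFrame and expose the result to Spark SQL.
vehicles = spark.createDataFrame(
    [("1GKS1234", "SUV"), ("WAUZZZ99", "sedan")], ["vin", "body_style"]
)
vehicles.withColumn("region", region_udf("vin")).createOrReplaceTempView("vehicles")
spark.sql("SELECT region, COUNT(*) AS n FROM vehicles GROUP BY region").show()
```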
-
-
-
Meijer
-
United States
-
Retail
-
700 & Above Employee
-
Data Engineer
-
Apr 2019 - May 2020
- Created and managed nodes that use Java JARs, Python, and shell scripts for scheduling jobs and customizing data ingestion.
- Developed Pig scripts for change data capture and delta-record processing between newly arrived data and data already in HDFS.
- Developed and ran Sqoop imports from Oracle to load data into HDFS.
- Created partitions and buckets based on state to support further processing with bucket-based Hive joins, and created Hive tables to store the processed results in tabular form.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler and Autosys.
- Developed Kafka producers and brokers for message handling (a sketch follows below).
- Imported data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Configured a Kafka ingestion pipeline to transmit web server logs into Hadoop.
- Worked on POCs for stream processing with Apache NiFi, including Hortonworks Hadoop solutions for real-time streaming.
- Analyzed Hadoop logs using Pig scripts to track errors raised by the team.
- Wrote MySQL queries in MySQL Workbench for efficient retrieval of ingested data.
- Implemented data ingestion and transformation through automated Oozie workflows.
- Created and generated audit reports to flag security threats and track user activity using various Hadoop components.
- Designed various plots showing HDFS analytics and other operations performed on the environment.
- Worked with the infrastructure team to test the environment after patches, upgrades, and migrations.
- Developed multiple Java scripts to deliver end-to-end support while maintaining product integrity.
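A minimal Kafka producer sketch in Python for shipping web server log events of the kind described above; the broker address, topic name, and record fields are illustrative assumptions (uses the kafka-python client).

```python
import json
import time

from kafka import KafkaProducer

# Hypothetical producer that ships web server log events to a Kafka topic.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],  # assumed broker address
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

log_event = {
    "host": "web-01",
    "path": "/checkout",
    "status": 200,
    "ts": int(time.time()),
}

# Send the event and block until the broker acknowledges it.
producer.send("webserver-logs", value=log_event).get(timeout=10)
producer.flush()
```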
-
-
-
RBS
-
United Kingdom
-
Banking
-
700 & Above Employee
-
Hadoop Developer
-
Mar 2017 - Aug 2018
- Installed and configured Hadoop MapReduce and HDFS; imported and exported data between HDFS and Hive using Sqoop.
- Defined job flows and managed and reviewed Hadoop log files.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Ran Hadoop streaming jobs to process terabytes of XML data (a sketch follows below).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and managed data coming from different sources.
- Supported MapReduce programs running on the cluster and loaded data from the UNIX file system into HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Conducted functional, system, data, and regression testing.
- Took part in bug review meetings and weekly meetings with the management team.
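A minimal Hadoop Streaming mapper/reducer pair in Python, illustrating the kind of streaming job mentioned above; the word-count logic and the invocation shown in the docstring are assumptions, not the actual jobs.

```python
#!/usr/bin/env python
"""Hypothetical Hadoop Streaming job (word count). Example invocation, assuming
the standard streaming jar:
  hadoop jar hadoop-streaming.jar \
    -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    -input /data/in -output /data/out
"""
import sys


def mapper():
    # Emit one tab-separated (word, 1) pair per token, as streaming expects.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer():
    # Input arrives sorted by key, so counts for a word are consecutive.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```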
-
-
-
Rimini Street
-
United States
-
IT Services and IT Consulting
-
700 & Above Employee
-
Junior Data Engineer
-
Apr 2015 - Feb 2017
- Designed and implemented code changes in existing Java, Python, and shell-script modules as enhancements.
- Designed the user interface and business logic for customer registration and maintenance.
- Integrated web services and worked with data across different servers; involved in designing and developing SOA services using web services.
- Created, developed, modified, and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources (a sketch follows below).
- Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
- Developed PL/SQL procedures and UNIX scripts to automate UNIX jobs and run files in batch mode.
- Used Informatica PowerCenter Designer to analyze source data and to extract and transform it from various source systems (Oracle, SQL Server, and flat files), incorporating business rules through the objects and functions the tool supports.
- Used Oracle OLTP databases as one of the main sources for ETL processing.
- Managed the ETL process by pulling large volumes of data from various sources, including MS Access and Excel, into a staging database using BCP.
- Detected and rectified errors in ETL operations and incorporated error redirection during ETL loads in SSIS packages.
- Implemented various SSIS transformations in packages, including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, and Derived Column.
- Implemented the master-child package technique to manage large ETL projects efficiently.
- Involved in unit testing and system testing of the ETL process.
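A minimal Python sketch of invoking a PL/SQL procedure as part of a batch automation step like those described above; the connection details, package, and procedure name are illustrative assumptions (uses the python-oracledb driver).

```python
import oracledb

# Hypothetical batch step that calls a PL/SQL procedure to refresh staging data.
connection = oracledb.connect(
    user="etl_user", password="example", dsn="dbhost:1521/ORCLPDB1"  # assumed DSN
)

with connection.cursor() as cursor:
    # The procedure is assumed to return the number of rows it loaded
    # through an OUT parameter.
    rows_loaded = cursor.var(int)
    cursor.callproc("staging_pkg.refresh_customers", [rows_loaded])
    print(f"rows loaded: {rows_loaded.getvalue()}")

connection.commit()
connection.close()
```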
-
-
Education
-
University of Arizona
Master's degree, Industrial Engineering
-
Visvesvaraya Technological University
Bachelor’s Degree, Mechanical Engineering