Pujan K.

Data Engineer at First Republic
Contact Information
us****@****om
(386) 825-5501
Location
United States
Languages
  • English Full professional proficiency
  • Hindi Full professional proficiency
  • Nepali Native or bilingual proficiency

Experience

    • United States
    • Banking
    • 700 & Above Employee
    • Data Engineer
      • Feb 2022 - Present

      • Designed, built, and maintained data pipelines in Python and SQL to extract, transform, and load data from various sources into data warehouses and data lakes.
      • Deployed and managed data pipelines with Docker containers and Kubernetes, ensuring scalability and reliability of the ETL process.
      • Developed Spark jobs in Python to perform data transformations using DataFrames and Spark SQL.
      • Processed unstructured JSON data into structured Parquet format through a series of PySpark transformations (a minimal sketch follows this list).
      • Developed Spark applications using Spark libraries to perform ETL transformations, eliminating the need for separate ETL tools.
      • Monitored data pipeline performance and implemented optimizations in Python and SQL.
      • Implemented Kafka-based ETL pipelines: designed and developed Kafka topics, producers, and consumers, and integrated them with the ETL processes.
      • Worked with data security teams to implement measures such as encryption and access controls to protect sensitive financial information.
      • Documented pipeline design and implementation, including data dictionaries, flow diagrams, and technical specifications, to ensure knowledge transfer.
      • Worked on cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), and EC2.
      • Designed, developed, and documented microservices and system components consisting of several objects working together to execute a business function of the larger system.
      • Worked with business and IT teams to understand business problems and to design, implement, and deliver solutions using Agile methodology across the larger program.
      • Developed code and test artifacts that reuse subroutines, are well structured, and are backed by automated tests.
      • Worked hands-on across multiple software paradigms to support the team's technical infrastructure with the project leader.
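
      A minimal sketch of the kind of PySpark JSON-to-Parquet transformation described above; the paths, column names, and partitioning scheme are illustrative assumptions rather than details from this role.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("json_to_parquet").getOrCreate()

      # Read semi-structured JSON records (source path is hypothetical)
      raw = spark.read.json("s3://example-bucket/raw/transactions/")

      # Flatten and clean a few illustrative fields
      cleaned = (
          raw.withColumn("event_date", F.to_date("event_ts"))
             .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
             .dropDuplicates(["transaction_id"])
             .filter(F.col("transaction_id").isNotNull())
      )

      # Write partitioned Parquet for downstream warehouse and lake consumers
      (cleaned.write
              .mode("overwrite")
              .partitionBy("event_date")
              .parquet("s3://example-bucket/curated/transactions/"))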

    • United States
    • 1 - 100 Employee
    • Data Engineer
      • Aug 2021 - Feb 2022

      • Built multiple end-to-end ETL and ELT data pipelines for data ingestion and transformation in GCP and coordinated tasks among the team.
      • Led the development and implementation of Informatica PowerCenter 10.1 and Informatica BDM, taking ownership from inception to production.
      • Implemented Informatica BDM mappings to extract data from the data warehouse into the data lake.
      • Designed and implemented the layers of the data lake and designed the star schema in BigQuery.
      • Used Cloud Functions with Python to load newly arrived CSV files from a GCS bucket into BigQuery (a minimal sketch follows this list).
      • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
      • Implemented multiple data pipeline DAGs and maintenance DAGs orchestrated in Airflow.
      • Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP.
      • Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
      • Documented the inventory of modules, infrastructure, storage, and components of the existing on-prem data warehouse to identify suitable technologies and strategies for the Google Cloud migration.
      • Designed, developed, and implemented ETL pipelines using the Python API of Apache Spark (PySpark).
      • Worked on a GCP POC to migrate data and applications from on-prem to Google Cloud.
      • Gained exposure to IAM roles in GCP.
      • Created firewall rules to allow access to Dataproc clusters from other machines.
      • Set up GCP firewall rules to control ingress and egress traffic to and from VM instances based on the specified configuration, and used Cloud CDN to deliver content from GCP cache locations, markedly improving user experience and latency.
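
      A minimal sketch of a GCS-triggered Cloud Function that loads an arriving CSV file into BigQuery, as described above; the project, dataset, and table names are illustrative assumptions.

      from google.cloud import bigquery

      TABLE_ID = "my-project.analytics.daily_sales"  # hypothetical destination table

      def load_csv_to_bigquery(event, context):
          """Background Cloud Function triggered by object finalization in a GCS bucket."""
          if not event["name"].endswith(".csv"):
              return  # ignore non-CSV objects
          uri = f"gs://{event['bucket']}/{event['name']}"

          client = bigquery.Client()
          job_config = bigquery.LoadJobConfig(
              source_format=bigquery.SourceFormat.CSV,
              skip_leading_rows=1,   # assume a header row
              autodetect=True,       # infer the schema from the file
              write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
          )

          load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
          load_job.result()  # block so load errors surface in the function logs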

    • United States
    • Motor Vehicle Manufacturing
    • 700 & Above Employee
    • Data Engineer
      • Jun 2020 - Aug 2021

      • Implemented scalable infrastructure and a platform for large-scale data ingestion, aggregation, integration, and analytics in Hadoop using Spark and Hive.
      • Developed streamlined workflows using high-performance API services that handle large amounts of structured and unstructured data.
      • Developed Spark jobs in Python to perform data transformations using DataFrames and Spark SQL.
      • Processed unstructured JSON data into structured Parquet format through a series of PySpark transformations.
      • Developed Spark applications using Spark libraries to perform ETL transformations, eliminating the need for separate ETL tools.
      • Developed end-to-end data pipelines in Spark using Python to ingest, transform, and analyze data.
      • Created Hive tables using HiveQL, loaded data into them, and analyzed the data by developing Hive queries (a minimal sketch follows this list).
      • Created and executed unit test cases to validate that transformations and processing functions work as expected.
      • Scheduled multiple jobs using the Control-M workflow engine.
      • Wrote shell scripts to automate application deployments.
      • Implemented date-based schema switching so that the transformations run automatically.
      • Developed custom functions and UDFs in Python to extend Spark functionality.
      • Developed data validation scripts in Hive and Spark and ran validations from Jupyter Notebooks against a query cluster spun up on AWS EMR.
      • Executed Hadoop and Spark jobs on AWS EMR against data stored in Amazon S3.
      • Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
      • Worked with data serialization formats (Parquet, Avro, JSON, CSV) to convert complex objects into byte sequences.
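
      A minimal sketch of the Hive-backed Spark SQL analysis and Python UDF work mentioned above; the database, table, and column names are illustrative assumptions.

      from pyspark.sql import SparkSession
      from pyspark.sql.types import StringType

      # enableHiveSupport lets Spark SQL create and query Hive-managed tables
      spark = (SparkSession.builder
               .appName("hive_analysis")
               .enableHiveSupport()
               .getOrCreate())

      # A simple Python UDF registered for use in Spark SQL (logic is illustrative)
      def normalize_vin(vin):
          return vin.strip().upper() if vin else None

      spark.udf.register("normalize_vin", normalize_vin, StringType())

      spark.sql("CREATE DATABASE IF NOT EXISTS vehicles")
      spark.sql("""
          CREATE TABLE IF NOT EXISTS vehicles.telemetry (
              vin STRING,
              reading_ts TIMESTAMP,
              sensor STRING,
              value DOUBLE
          )
          STORED AS PARQUET
      """)

      # Load staged Parquet data into the Hive table, then analyze it with HiveQL-style SQL
      spark.read.parquet("s3://example-bucket/staging/telemetry/") \
           .write.mode("append").insertInto("vehicles.telemetry")

      daily_avg = spark.sql("""
          SELECT normalize_vin(vin)  AS vin,
                 to_date(reading_ts) AS reading_date,
                 sensor,
                 AVG(value)          AS avg_value
          FROM vehicles.telemetry
          GROUP BY normalize_vin(vin), to_date(reading_ts), sensor
      """)
      daily_avg.show(20)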

    • United States
    • Retail
    • 700 & Above Employee
    • Data Engineer
      • Apr 2019 - May 2020

      • Created and managed nodes that use Java JARs, Python, and shell scripts to schedule jobs and customize data ingestion.
      • Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already in HDFS.
      • Developed and ran Sqoop imports from Oracle to load data into HDFS.
      • Created partitions and buckets based on state to enable bucket-based Hive joins in further processing.
      • Created Hive tables to store the processed results in tabular format.
      • Scheduled MapReduce jobs in the production environment using the Oozie scheduler and Autosys.
      • Developed Kafka producers and brokers for message handling (a minimal producer sketch follows this list).
      • Imported data into Hadoop using Kafka and implemented the Oozie job for daily imports.
      • Configured a Kafka ingestion pipeline to transmit web server logs to Hadoop.
      • Built POCs for stream processing using Apache NiFi.
      • Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
      • Analyzed Hadoop logs using Pig scripts to track errors introduced by the team.
      • Ran MySQL queries in MySQL Workbench for efficient retrieval of ingested data.
      • Implemented data ingestion and transformation through automated Oozie workflows.
      • Created and generated audit reports to flag security threats and track all user activity using various Hadoop components.
      • Designed various plots showing HDFS analytics and other operations performed on the environment.
      • Worked with the infrastructure team to test the environment after patches, upgrades, and migrations.
      • Developed multiple Java scripts delivering end-to-end support while maintaining product integrity.
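
      A minimal sketch of the kind of Kafka producer used to ship web server logs into the ingestion pipeline described above, using the kafka-python client; the broker addresses, topic name, and log path are illustrative assumptions.

      import json
      import time

      from kafka import KafkaProducer

      producer = KafkaProducer(
          bootstrap_servers=["broker1:9092", "broker2:9092"],
          value_serializer=lambda record: json.dumps(record).encode("utf-8"),
          acks="all",   # wait for full acknowledgement before a send counts as successful
          retries=3,
      )

      def tail_access_log(path="/var/log/httpd/access_log"):
          """Yield new lines appended to the web server access log."""
          with open(path) as log:
              log.seek(0, 2)  # start at the end of the file
              while True:
                  line = log.readline()
                  if not line:
                      time.sleep(0.5)
                      continue
                  yield line.rstrip("\n")

      for line in tail_access_log():
          producer.send("webserver-logs", {"raw": line, "ingest_ts": time.time()})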

    • United Kingdom
    • Banking
    • 700 & Above Employee
    • Hadoop Developer
      • Mar 2017 - Aug 2018

      • Installed and configured Hadoop MapReduce and HDFS.
      • Imported and exported data into HDFS and Hive using Sqoop.
      • Defined job flows; managed and reviewed Hadoop log files.
      • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
      • Ran Hadoop streaming jobs to process terabytes of XML-format data (a minimal streaming mapper sketch follows this list).
      • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
      • Responsible for managing data coming from different sources.
      • Supported MapReduce programs running on the cluster.
      • Loaded data from the UNIX file system into HDFS.
      • Installed and configured Hive and wrote Hive UDFs.
      • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
      • Conducted functional, system, data, and regression testing.
      • Took part in bug review meetings and weekly meetings with the management team.
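
      A minimal sketch of a Hadoop Streaming mapper in Python for the XML processing mentioned above; it assumes one small, well-formed XML record per input line and an element name that is purely illustrative. A matching reducer would sum the emitted counts, and the job would be submitted through the hadoop-streaming JAR.

      #!/usr/bin/env python
      # Hadoop Streaming mapper: read XML records from stdin, emit "type<TAB>1" pairs.
      import sys
      import xml.etree.ElementTree as ET

      for line in sys.stdin:
          line = line.strip()
          if not line:
              continue
          try:
              record = ET.fromstring(line)
          except ET.ParseError:
              continue  # skip malformed records rather than failing the task
          txn_type = record.findtext("type", default="unknown")  # element name is assumed
          sys.stdout.write(txn_type + "\t1\n")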

    • United States
    • IT Services and IT Consulting
    • 700 & Above Employee
    • Junior Data Engineer
      • Apr 2015 - Feb 2017

      • Designed and implemented code changes in existing Java, Python, and shell-script modules for enhancements.
      • Designed the user interface and business logic for customer registration and maintenance.
      • Integrated web services and worked with data across different servers.
      • Involved in the design and development of SOA services using web services.
      • Created, developed, modified, and maintained database objects, including PL/SQL packages, functions, stored procedures, triggers, views, and materialized views, to extract data from different sources (a minimal sketch of invoking such a procedure from Python follows this list).
      • Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
      • Developed PL/SQL procedures and UNIX scripts to automate UNIX jobs and run files in batch mode.
      • Using Informatica PowerCenter Designer, analyzed and extracted source data from various systems (Oracle, SQL Server, and flat files), applying business rules with the objects and functions the tool supports.
      • Used Oracle OLTP databases as one of the main sources for ETL processing.
      • Managed the ETL process by pulling large volumes of data from MS Access and Excel sources into a staging database using BCP.
      • Responsible for detecting and rectifying errors in ETL operations.
      • Incorporated error redirection during ETL loads in SSIS packages.
      • Implemented various SSIS transformations in packages, including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, and Derived Column.
      • Implemented the master-child package technique to manage large ETL projects efficiently.
      • Involved in unit testing and system testing of the ETL process.
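
      A minimal sketch of calling one of the PL/SQL procedures described above from a Python batch script, using the python-oracledb driver; the connection details, package, procedure, and parameters are all hypothetical.

      import oracledb

      # Connection details are placeholders, not real credentials
      with oracledb.connect(user="etl_user", password="changeme", dsn="dbhost:1521/ORCLPDB1") as conn:
          with conn.cursor() as cursor:
              rows_loaded = cursor.var(int)
              # Hypothetical procedure that loads one day's customer batch and reports the row count
              cursor.callproc("customer_pkg.load_daily_batch", ["2016-11-30", rows_loaded])
              conn.commit()
              print("Rows loaded:", rows_loaded.getvalue())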

Education

  • University of Arizona
    Master's degree, Industrial Engineering
    2018 - 2020
  • Visvesvaraya Technological University
    Bachelor's degree, Mechanical Engineering
    2011 - 2015
