Mohit K
Senior Data Engineer at Texas Medicaid & Healthcare Partnership
Experience
- Texas Medicaid & Healthcare Partnership
- Hospitals and Health Care
- 1 - 100 Employee
- Senior Data Engineer
- Sep 2021 - Present
• Performed inventory planning and retail data processing on AWS S3 using Spark; applied transformations, business rules, and lookups per business requirements
• Developed and managed ETL jobs and data pipelines
• Integrated and scheduled Airflow DAGs for real-time data integration pipelines
• Wrote Hive queries with partitioning over S3 locations and tuned joins on Hive tables for optimized performance
• Read multiple data formats (CSV, JSON, Parquet) on HDFS using PySpark, used them as lookup tables, and applied coalesce
• Tuned and optimized Spark jobs with partitioning/bucketing and driver/executor memory management
• Optimized Hive queries across file formats such as Parquet, JSON, and CSV
• Participated in sprint planning and standup meetings under the Agile Scrum methodology
• Created external Hive tables using partitioning and bucketing
• Automated end-to-end data processing pipelines and scheduled data workflows
• Scheduled Spark jobs using Airflow on the Hadoop cluster
• Scheduled and participated in calls with the Product Owner and BSAs to gather requirements
• Used Git and Jenkins to push and deploy code
• Persisted data in S3 using Kinesis Firehose, VPC, EMR, Lambda, and CloudWatch
• Handled MongoDB persistence and maintenance
• Maintained the EAP (Enterprise Application Platform) for persistence of real-time and batch data
• Implemented tokenization using DTAAS and Previtaar to support REST-level, file-level, and field-level encryption
• Configured Kafka Connect connectors: AWS S3 sink, Salesforce, Splunk, and HDFS
• Developed PySpark applications for inventory analytics and loaded data into tables for the data science team to analyze and chart
• Built end-to-end data pipelines using Spark and stored final data into …
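For illustration, a minimal PySpark sketch of the S3 processing pattern described above; the bucket names, paths, and column names are hypothetical placeholders, not the actual TMHP pipeline.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("inventory-retail-processing").getOrCreate()

# Read retail transactions (Parquet) and a small inventory lookup (CSV) from S3.
retail = spark.read.parquet("s3://example-bucket/retail/transactions/")
lookup = spark.read.option("header", True).csv("s3://example-bucket/inventory/sku_lookup.csv")

# Apply an example business rule and enrich rows via a broadcast lookup join.
enriched = (
    retail
    .filter(F.col("quantity") > 0)                    # hypothetical rule: drop empty lines
    .join(F.broadcast(lookup), on="sku", how="left")  # lookup table is small, so broadcast it
    .withColumn("load_date", F.to_date("event_ts"))
)

# Coalesce small output files and write partitioned Parquet back to S3.
(enriched
    .coalesce(8)
    .write.mode("overwrite")
    .partitionBy("load_date")
    .parquet("s3://example-bucket/curated/retail_inventory/"))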
- The College Board
- United States
- Education Administration Programs
- 700 & Above Employee
- Big Data Engineer
- Feb 2020 - Aug 2021
• Performed analytics on AWS S3 using Spark; applied transformations and actions per business requirements
• Integrated Apache Kafka with HDFS and S3 data pipelines for real-time data
• Tuned Hive jobs through partitioning, bucketing, and optimized joins on Hive tables
• Built ETL processes using Spark and stored final data in the Snowflake data warehouse
• Designed Hive schemas and developed normalized and denormalized data models
• Implemented the data lake and was responsible for data management within it
• Developed a Ruby script to map data to the production environment
• Developed Hive queries and used Sqoop to move data from RDBMS sources to the data lake staging area
• Handled warehouse data, created external Hive tables, and wrote reusable ingestion scripts shared across the project
• Wrote shell scripts to export log files to the Hadoop cluster through an automated process
• Participated in iteration planning under the Agile methodology
• Designed both managed and external Hive tables using partitioning and bucketing
• Tuned jobs via Hive table partitioning/bucketing and driver/executor memory management
• Developed Pig scripts to relate datasets stored in the Hadoop cluster
• Performed advanced procedures such as text analytics using Spark's in-memory computing capabilities with Python and Scala
• Worked closely with the data science team to clarify requirements and created Hive tables on HDFS
• Automated data processing pipelines and scheduled data flow jobs
• Scheduled Spark jobs using Oozie workflows in the Hadoop cluster
• Actively participated in code reviews and meetings, and troubleshot and resolved technical issues
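For illustration, a rough PySpark sketch of creating and querying a partitioned external Hive table over staged HDFS data, as described above; the database, table, column, and path names are assumptions, not actual College Board schemas.

from pyspark.sql import SparkSession

# Hive support lets spark.sql create and query Hive tables directly.
spark = (SparkSession.builder
         .appName("hive-external-table")
         .enableHiveSupport()
         .getOrCreate())

# Define a partitioned external table over data staged on HDFS
# (e.g. landed there by Sqoop). Names and locations are illustrative only.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.exam_scores (
        student_id STRING,
        score      INT
    )
    PARTITIONED BY (exam_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/lake/curated/exam_scores'
""")

# Register partitions that already exist under the table location,
# then query the table as usual.
spark.sql("MSCK REPAIR TABLE analytics.exam_scores")
spark.sql("""
    SELECT exam_date, AVG(score) AS avg_score
    FROM analytics.exam_scores
    GROUP BY exam_date
""").show()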
- Virginia Credit Union
- United States
- Banking
- 400 - 500 Employee
- Data Engineer
- Aug 2017 - Jan 2020
• Designed and developed data integration/engineering workflows on big data technologies and platforms: Hadoop, Spark, MapReduce, Hive, and HBase
• Worked in an Agile methodology; actively participated in standup calls and PI planning, with work reported in Rally
• Participated in requirements gathering and prepared design documents
• Imported data into HDFS and Hive using Sqoop; created Hive tables, loaded them with data, and wrote Hive queries
• Imported data from various sources, performed transformations using Hive, and loaded data into the data lake
• Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, transformations, and other operations
• Processed data stored in the data lake, created external Hive tables, and developed reusable scripts to ingest and repair tables across the project
• Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage
• Created Azure Data Factory pipelines, managed Data Factory policies, and used Blob Storage for storage and backup on Azure
• Migrated applications from internal data storage to Azure
• Tuned Hive and Spark with partitioning/bucketing of Parquet data and executor memory settings
• Developed Hive queries and used Sqoop to move data from RDBMS sources to the Hadoop staging area
• Developed data flows and processing logic using SQL (Spark SQL and DataFrames)
• Designed and developed MapReduce (Hive) programs to analyze and evaluate multiple solutions, weighing cost factors across the business as well as operational impact on flight historical data
• Participated in iteration planning under the Agile Scrum methodology
• Scheduled Spark jobs using Oozie workflows in the Hadoop cluster and generated detailed design documentation for source-to-target transformations
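For illustration, a brief Databricks-style PySpark sketch of the join/filter/pre-aggregate pattern over Azure Data Lake Storage described above; the storage account, container names, and columns are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-preaggregation").getOrCreate()

# ADLS Gen2 paths; the storage account, containers, and columns are placeholders.
raw = "abfss://raw@examplestorage.dfs.core.windows.net"
curated = "abfss://curated@examplestorage.dfs.core.windows.net"

transactions = spark.read.parquet(f"{raw}/transactions/")
accounts = spark.read.parquet(f"{raw}/accounts/")

# Join, filter, and pre-aggregate before persisting a curated daily summary.
daily_summary = (
    transactions
    .join(F.broadcast(accounts), on="account_id", how="inner")  # small dimension table, broadcast join
    .filter(F.col("status") == "POSTED")
    .groupBy("account_id", F.to_date("posted_ts").alias("posted_date"))
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("txn_count"))
)

(daily_summary
    .write.mode("overwrite")
    .partitionBy("posted_date")
    .parquet(f"{curated}/daily_account_summary/"))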
- Tulasi Technologies Pvt Ltd
- India
- IT Services and IT Consulting
- 1 - 100 Employee
- Data Engineer
- Dec 2015 - May 2017
• Improved workflow performance by tuning at the query and workflow levels using partitioning
• Enhanced the claim authorization process by developing a single module that works regardless of file type
• Participated in project meetings for requirements analysis
• Used different session partitions to improve performance without affecting the logic
• Performed peer reviews before code moved to higher environments
• Created scheduled jobs using ESP
• Participated in requirements gathering and analysis from business grooming sessions
• Performed unit testing and prepared project design documents
• Developed the ETL design and was responsible for the deliverables
• Responsible for defect tracking using ALM
- Bloom Soft Tech
- India
- IT Services and IT Consulting
- 1 - 100 Employee
- Data Analyst
- Jul 2014 - Nov 2015
• Worked on Informatica tools: Source Analyzer, Mapping Designer, Mapplet Designer, and Transformation Developer
• Created mappings using transformations such as Source Qualifier, Connected and Unconnected Lookup, Filter, Router, Update Strategy, Aggregator, Sequence Generator, Joiner, and Expression
• Prepared unit test cases for the mappings
• Developed mappings per the given mapping specifications
• Responsible for understanding dynamically changing requirements and accommodating those changes during development
• Extracted, transformed, and loaded data from source to staging and from staging to target according to business requirements
• Validated ETL code when moving it to other environments
• Implemented parameters at the mapping and workflow levels
• Scheduled sessions and batches on the Informatica server using Workflow Manager
Education
- GITAM Deemed University: Bachelor's, Computer Science