Umesh I
Sr. Big Data Engineer at GILEAD SCIENCES LTD
Experience
GILEAD SCIENCES LTD
United Kingdom
Biotechnology
1-100 employees
Sr. Big Data Engineer
Mar 2021 - Present
Translated data mapping specifications into detailed system test plans to verify the extraction, transformation, and transfer of data from internal data warehouses to external entities. Prepared documentation covering entities, attributes, data relationships, primary and foreign key structures, allowed values, business rules, and glossary updates, revising it as project needs evolved. Experienced in Data Governance practices, including Metadata Management, Master Data Management, Data Quality, and Data Security, applying data management best practices across the data lifecycle. Designed and built scalable data lake architectures on the Hadoop Distributed File System (HDFS) and cloud object storage, supporting efficient data ingestion, storage, and processing in complex environments. Designed, developed, and implemented complex ETL processes using Apache NiFi to move and transform data. Tested ETL processes end to end: validated data integrity before and after transformation, confirmed that the ETL tool published messages correctly, and verified that data loaded successfully into the target databases. Migrated legacy data infrastructure to Google Cloud Platform (GCP), moving data pipelines to Cloud Storage and BigQuery. Kept up with emerging practices such as DataOps, real-time data processing with Kafka, and the integration of machine learning for advanced data insights. Explored advanced analytics techniques, including graph analytics with Apache Spark GraphX and machine learning algorithms for predictive analytics within data pipelines. Used UNIX shell scripting to split large files into smaller ones and to automate file transfers.

Edward Jones
United States
Financial Services
700+ employees
AWS Data
May 2018 - Feb 2021
Responsibilities: Developed Sqoop jobs to load data from relational database management systems (RDBMS) into HDFS and Hive, supporting data migration and integration. Built Spark applications in PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats. Converted dynamic XML data for ingestion into HDFS. Used AWS services, including S3 and Redshift, to extract, transform, and load data from various sources into the cloud. Designed, implemented, and automated ETL workflows in AWS Glue, reducing manual intervention, improving processing efficiency, and maintaining data integrity. Optimized SQL queries to improve data retrieval and aggregation performance for reporting and analysis. Architected a data lake on AWS S3 to ingest and store a wide range of datasets for advanced analytics and reporting. Built a real-time data processing pipeline with AWS Lambda and Kinesis to analyze streaming data as it arrived. Designed and implemented a scalable data warehouse on AWS Redshift, optimizing data storage and query performance. Kept up with emerging technologies such as Apache Kafka for real-time data streaming, serverless computing with AWS Lambda, and modern data lake architecture patterns. Explored integrating machine learning with data pipelines, optimizing data processing for predictive modeling and insight generation.

Travelport
United Kingdom
Information Technology & Services
700+ employees
Senior Data Engineer
Oct 2017 - Apr 2018
Designed scalable, secure data pipelines for large datasets, maintaining performance and data integrity throughout. Led requirements gathering for new data sources, covering data lifecycle management, data quality validation, transformation logic, and metadata enrichment. Built data quality checks directly into data pipelines to ensure the reliability and accuracy of ingested data. Continuously improved the Data Ingestion Framework to strengthen security, improve efficiency, and maintain data integrity across all stages of the pipeline. Implemented data streaming with Kafka and Informatica to ingest real-time data from multiple sources while maintaining data integrity. Played a key role in implementing Sqoop for data loading between RDBMS sources and Hadoop systems. Used storage formats such as Avro and Parquet, along with databases such as Hive, to optimize data storage, retrieval, and query performance across environments, including cloud platforms such as Microsoft Azure SQL. Kept up with data engineering trends, including event-driven architectures, stream processing frameworks such as Apache Kafka Streams, and cloud-native data solutions. Explored emerging technologies such as Apache NiFi for data flow orchestration, Apache Beam for unified batch and stream processing, and Docker and Kubernetes for scalable, portable deployment. Worked in an agile, collaborative development environment using tools such as GitLab, Bitbucket, or Azure DevOps to improve team efficiency and project transparency.

Infosys
India
IT Services and IT Consulting
700+ employees
Data Engineer
Jan 2016 - Jun 2017
Led the migration of a large-scale Oracle database to Google BigQuery, using cloud-native capabilities for scalability and analytics and following data migration best practices to ensure a smooth transition and good performance in the cloud. Designed and implemented data pipelines on Google Cloud Platform (GCP) using Apache Airflow, with a range of Airflow operators to streamline ETL processes and ensure data reliability, quality, and orchestration. Transferred data between Google Cloud and Microsoft Azure using Azure Data Factory, keeping cross-cloud data movement secure and efficient for hybrid cloud architectures. Created Power BI reports backed by Azure Analysis Services to improve report performance and support interactive data exploration, applying data modeling techniques to provide actionable insights for decision-making. Used the Google Cloud SDK in Cloud Shell to manage and configure core services such as Dataproc, Cloud Storage, and BigQuery. Led collaborative work to automate daily ad hoc reports and extracts from large-scale enterprise data stored in BigQuery. Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, which helped uncover customer usage patterns and derive actionable insights. Kept up with cloud computing trends, including serverless architectures, microservices, and multi-cloud strategies. Integrated GitHub, Snowflake, AWS CodePipeline, and AWS Elastic Beanstalk to create a deployment pipeline.

PepsiCo
United States
Food and Beverage Services
700+ employees
Data Engineer
Oct 2013 - Dec 2015
Managed end-to-end data warehouse architecture, including schema design and implementation of SQL objects. Used Hive and HBase for data processing, supported by advanced scripting. Created and executed complex Hive and Pig scripts to transform data and derive behavioral insights. Scheduled tasks with Python scripts, automating report generation through Windows Task Scheduler and tools such as Apache Airflow. Engineered ETL pipelines using a combination of tools, including SSIS, PySpark, and pandas, following rigorous data manipulation standards. Used Oozie, Sqoop, and cloud-based solutions to orchestrate data workflows and perform essential preprocessing. Worked with stakeholders to translate their requirements into visualizations and provided training in Business Intelligence (BI) tools. Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage. Created a proof of concept (POC) using ML models and Cloud ML for table quality analysis in the batch process. Knowledge of Cloud Dataflow and Apache Beam. Good knowledge of using Cloud Shell for various tasks and for deploying services. Created BigQuery authorized views for row-level security or to expose data to other teams. Expertise in designing and deploying Hadoop clusters and Big Data analytics tools, including Pig, Hive, Sqoop, and Apache Spark, on the Cloudera Distribution. Continuously stayed updated with emerging cloud-native technologies such as serverless computing, Kubernetes for container

Education
Jawaharlal Nehru Technological University
Bachelor of Technology - BTech, Computer Science