Huazhi Fang
Big Data Engineer at Yahoo
Credentials
- Data Analytics & Data Science, Digi-Safari & Tredence Inc. (Aug 2019 - Oct 2024)
- Big Data 101, IBM
- Hadoop 101, IBM
- Simplifying Data Pipelines with Apache Kafka, IBM
- Spark Fundamentals I, IBM
- Spark Fundamentals II, IBM
- Using HBase for Real-time Access to Your Big Data, IBM
Experience
Yahoo - Australia - Online Media - 100-200 employees
Big Data Engineer
Sep 2019 - Present
• Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
• Worked on AWS Kinesis for processing large volumes of real-time data.
• Optimized storage in Hive using partitioning and bucketing on both managed and external tables.
• Used the Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
• Built the CI/CD pipeline for code deployment using Git, Jenkins, and CodePipeline, covering the process from developer check-in to production deployment.
• Used Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.
• Created, configured, and monitored shard sets; analyzed the data to be sharded and chose a shard key to distribute data evenly.
• Enforced YARN resource pools to share cluster resources among YARN jobs submitted by users.
• Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
• Used Spark to export transformed streaming datasets into Redshift on AWS.
• Created AWS Lambda functions to move data from S3 into Spark Structured Streaming, producing schema-structured output.
• Installed, configured, and monitored a Kafka cluster; architected a lightweight Kafka broker; integrated Kafka with Spark for real-time data processing.
• Extracted the needed data from the server into the Hadoop file system (HDFS) and bulk-loaded the cleaned data into HBase using Spark.
• Accessed HDFS using Spark and managed data in Hadoop data lakes.
• Worked with the Spark SQL context to create DataFrames that filter input data for model execution.
• Used the Spark DataFrame and Dataset APIs from Spark SQL extensively for data processing.
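The Hive bucketing mechanism mentioned above can be sketched in plain Python (not actual Hive): Hive routes each row to bucket `hash(bucket_column) % num_buckets`, so equal keys always land in the same bucket file. Column names and data below are hypothetical, and a stable CRC32 hash stands in for Hive's hash function.

```python
# Sketch of Hive-style bucketing. Hive assigns each row to bucket
# hash(bucket_column) % num_buckets; equal keys always co-locate,
# which is what makes bucket-based joins and sampling cheap.
import zlib

NUM_BUCKETS = 4

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic bucket assignment, analogous to Hive's hash % n.

    Python's built-in hash() is salted per process for strings, so a
    stable CRC32 is used here for reproducibility.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets

# Hypothetical rows; "user_id" plays the role of the bucketing column.
rows = [{"user_id": f"user{i}", "clicks": i} for i in range(10)]
buckets = {b: [] for b in range(NUM_BUCKETS)}
for row in rows:
    buckets[bucket_for(row["user_id"])].append(row)

# Every row lands in exactly one of the NUM_BUCKETS buckets.
assert sum(len(v) for v in buckets.values()) == len(rows)
```

Because the assignment is deterministic, two bucketed tables that share the bucket column and bucket count can be joined bucket-by-bucket without a full shuffle.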
Spotify - Sweden - Musicians - 700+ employees
Data Engineer
Dec 2017 - Sep 2019
• Installed and configured a Kafka producer to ingest data from a REST API.
• Installed and configured a Spark consumer to stream data from the Kafka producer.
• Wrote queries, stored procedures, functions, and triggers in SQL; supported development, testing, and operations teams during new system deployments.
• Wrote custom user-defined functions (UDFs) for complex Hive queries (HQL).
• Configured and deployed production-ready multi-node Hadoop services (Hive, Sqoop, Flume, Oozie) on the Hadoop cluster with the latest patches.
• Developed scripts for collecting high-frequency log data from various sources and integrating it into HDFS using Flume; staged data in HDFS for further analysis.
• Used Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.
• Configured a multi-node cluster of 10 nodes and 30 brokers for consuming high-volume, high-velocity data.
• Used Spark SQL to perform transformations and actions on data residing in Hive.
• Used ZooKeeper for centralized configuration and for Kafka offset management.
• Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
• Imported/exported data between HDFS and Hive using Sqoop and Kafka.
• Created partitions and buckets based on State for further processing with bucket-based Hive joins.
• Built a prototype for real-time analysis using Spark Streaming and Kafka.
• Wrote Flume and HiveQL scripts to extract, transform, and load data into the database.
• Loaded ingested data into Hive managed and external tables.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
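The micro-batch model behind Spark Streaming, mentioned above, can be illustrated in plain Python (this is not Spark itself): an unbounded stream is divided into fixed-size batches, each of which a batch engine processes in turn. The event data below is a made-up stand-in for a Kafka topic's records.

```python
# Illustration of the micro-batch idea behind Spark Streaming:
# divide a (possibly unbounded) stream into small batches that
# a batch-processing engine consumes one at a time.
from itertools import islice
from typing import Iterable, Iterator, List

def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield fixed-size batches from a stream; the final batch may be short."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical stand-in for records arriving from a Kafka topic.
events = range(10)
batches = list(micro_batches(events, batch_size=4))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In the real system the batch interval is time-based rather than count-based, but the principle is the same: each batch is handed to the Spark engine as an ordinary batch job.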
eBay - United States - Technology, Information and Internet - 700+ employees
Big Data Engineer
Jun 2015 - Dec 2017
• Installed and configured a Kafka producer to ingest data from a REST API.
• Installed and configured a Spark consumer to stream data from the Kafka producer.
• Installed and configured Hive for data warehousing and HQL-based ETL.
• Used Spark to migrate data to Hive.
• Worked on AWS to create and manage EC2 instances and Hadoop clusters.
• Deployed the big data Hadoop application using Talend on AWS.
• Used AWS Redshift for storing data in the cloud.
• Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all Hadoop clusters.
• Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
• Moved data from relational tables into HDFS and HBase tables.
• Transformed log data into the data model and wrote UDFs to format the log data.
• Used HBase to store the majority of data that needed to be partitioned by column and region.
• Used Spark to process ingested data from various sources.
• Created HBase tables to store variable data formats coming from different portfolios.
• Used the Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
• Wrote shell scripts to move log files to the Hadoop cluster through automated processes.
• Loaded files from MySQL into HDFS using Spark.
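The "load semi-structured data into a tabular view" step above can be sketched with only the standard library (the real work used Spark SQL / DataFrames; record contents and field names here are hypothetical): each JSON record is projected onto a fixed schema, with absent optional fields becoming nulls.

```python
# Sketch of loading semi-structured JSON into fixed-schema rows,
# the same shape of work Spark SQL's DataFrame reader performs.
import json

# Hypothetical raw records; the second one omits an optional field.
raw_records = [
    '{"item_id": 1, "price": 9.99, "tags": ["new"]}',
    '{"item_id": 2, "price": 4.50}',
]

columns = ("item_id", "price", "tags")

def to_row(line: str) -> tuple:
    """Parse one JSON record into a fixed-schema row, nulling absent fields."""
    obj = json.loads(line)
    return tuple(obj.get(c) for c in columns)

rows = [to_row(r) for r in raw_records]
# rows == [(1, 9.99, ["new"]), (2, 4.5, None)]
```

Projecting onto a declared column list is what lets downstream SQL-style filters and joins treat irregular input as a uniform table.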
Ahold Delhaize - Netherlands - Retail - 700+ employees
Data Engineer
Oct 2013 - Jun 2015
• Installed and configured a Hadoop cluster including HDFS, YARN, and MapReduce.
• Used Spark to migrate data from HDFS to a MySQL database.
• Installed and configured Hive and wrote Hive UDFs.
• Worked with different file formats and compression techniques to meet standards.
• Loaded data from the UNIX file system into HDFS.
• Installed and configured a MySQL server to allow remote user access on Ubuntu.
• Loaded large RDBMS datasets into the big data platform using Sqoop.
• Accessed the Hadoop cluster (CDM) and reviewed log files of all daemons.
• Analyzed datasets using Hive, MapReduce, and Sqoop to recommend business improvements.
• Maintained and troubleshot network connectivity.
• Collected and aggregated large amounts of log data using Apache Flume and staged data in HDFS for further analysis.
• Installed and configured a Flume agent to ingest data from a REST API.
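The Sqoop-based RDBMS load above works, conceptually, by splitting a table on a numeric key range so multiple mappers can copy chunks in parallel. A minimal sketch, with sqlite3 standing in for MySQL and made-up table/column names:

```python
# Conceptual sketch of a Sqoop-style import: read an RDBMS table in
# contiguous key-range splits (like Sqoop's --split-by) so each split
# could become one mapper's slice, written to its own HDFS file.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 101)])

def key_range_splits(lo: int, hi: int, num_splits: int):
    """Divide the inclusive key range [lo, hi] into num_splits ranges."""
    step = (hi - lo + 1 + num_splits - 1) // num_splits  # ceiling division
    return [(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)]

exported = []
for lo, hi in key_range_splits(1, 100, num_splits=4):
    # In Sqoop, each range would be fetched by a separate mapper.
    exported.extend(conn.execute(
        "SELECT id, amount FROM orders WHERE id BETWEEN ? AND ?", (lo, hi)))
```

Splitting on the primary key keeps each mapper's query a cheap range scan while guaranteeing every row is exported exactly once.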
Education
University of Science and Technology Beijing
Doctor of Philosophy - PhD, Modeling and Simulation in Materials Science