Huazhi Fang

Big Data Engineer at Yahoo
Contact Information
us****@****om
(386) 825-5501
Location
Atlanta, Georgia, United States


Credentials

  • Data Analytics & Data Science
    Digi-Safari & Tredence Inc.
    Aug, 2019
    - Oct, 2024
  • Big Data 101
    IBM
  • Hadoop 101
    IBM
  • Simplifying data pipelines with Apache Kafka
    IBM
  • Spark Fundamentals I
    IBM
  • Spark Fundamentals II
    IBM
  • Using HBase for Real-time Access to your Big Data
    IBM

Experience

    • Australia
    • Online Media
    • 100 - 200 Employee
    • Big Data Engineer
      • Sep 2019 - Present

      • Implemented solutions for ingesting data from various sources and processing data-at-rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
      • Worked on AWS Kinesis for processing large amounts of real-time data.
      • Optimized storage in Hive using partitioning and bucketing on both managed and external tables (see the sketch after this list).
      • Used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters.
      • Built CI/CD pipelines for code deployment using Git, Jenkins, and CodePipeline, covering the process from developer check-in to production deployment.
      • Used Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.
      • Created, configured, and monitored shard sets; analyzed the data to be sharded and chose a shard key to distribute data evenly.
      • Enforced YARN resource pools to share cluster resources among YARN jobs submitted by users.
      • Explored Spark to improve performance and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
      • Used Spark to export transformed streaming datasets into Redshift on AWS.
      • Created AWS Lambda functions to feed data from S3 into Spark Structured Streaming, producing structured data according to a defined schema.
      • Installed, configured, and monitored a Kafka cluster; architected a lightweight Kafka broker; integrated Kafka with Spark for real-time data processing.
      • Extracted the required data from the server into the Hadoop file system (HDFS) and bulk-loaded the cleaned data into HBase using Spark.
      • Accessed HDFS using Spark and managed data in Hadoop data lakes.
      • Worked with the Spark SQL context to create DataFrames that filter input data for model execution.
      • Used the DataFrame and Dataset APIs from Spark SQL extensively for data processing.
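
      A minimal PySpark sketch of the Hive partitioning/bucketing and DataFrame-loading work described above. The S3 path, table name, and column names are hypothetical placeholders, not the actual production job.

      # Load semi-structured JSON into a DataFrame, then write it to a
      # Hive-managed table partitioned and bucketed for query pruning.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = (
          SparkSession.builder
          .appName("events-to-hive")          # hypothetical app name
          .enableHiveSupport()                # requires a Hive metastore
          .getOrCreate()
      )

      # Semi-structured event data; Spark infers the schema from the JSON.
      events = spark.read.json("s3a://example-bucket/raw/events/")  # hypothetical path

      # Light DataFrame-API cleanup before loading into Hive.
      cleaned = (
          events
          .withColumn("event_date", F.to_date("event_ts"))
          .filter(F.col("user_id").isNotNull())
      )

      # Partition by date and bucket by user_id so downstream Hive queries
      # can prune partitions and use bucketed joins.
      (
          cleaned.write
          .partitionBy("event_date")
          .bucketBy(16, "user_id")
          .sortBy("user_id")
          .format("parquet")
          .mode("overwrite")
          .saveAsTable("analytics.events")    # hypothetical database.table
      )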

    • Sweden
    • Musicians
    • 700 & Above Employee
    • Data Engineer
      • Dec 2017 - Sep 2019

      • Installed and configured a Kafka producer to ingest data from a REST API (see the sketch after this list).
      • Installed and configured a Spark consumer to stream data from the Kafka producer.
      • Wrote queries, stored procedures, functions, and triggers in SQL; supported development, testing, and operations teams during new system deployments.
      • Wrote custom user-defined functions (UDFs) for complex Hive queries (HQL).
      • Configured and deployed production-ready multi-node Hadoop services (Hive, Sqoop, Flume, Oozie) on the Hadoop cluster with the latest patches.
      • Developed scripts for collecting high-frequency log data from various sources and integrating it into HDFS using Flume; staged the data in HDFS for further analysis.
      • Used Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.
      • Configured a multi-node cluster of 10 nodes and 30 brokers for consuming high-volume, high-velocity data.
      • Used Spark SQL to perform transformations and actions on data residing in Hive.
      • Used ZooKeeper for various centralized configurations as well as Kafka offset management.
      • Created Hive tables, loaded data, and wrote Hive queries, which run internally as MapReduce jobs.
      • Imported and exported data into HDFS and Hive using Sqoop and Kafka.
      • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
      • Built a prototype for real-time analysis using Spark Streaming and Kafka.
      • Wrote Flume and HiveQL scripts to extract, transform, and load the data into the database.
      • Loaded ingested data into Hive managed and external tables.
      • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
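
      A minimal sketch of the first bullet above: a Kafka producer that polls a REST API and publishes each record to a topic, here using the kafka-python client. The endpoint URL, broker list, and topic name are hypothetical placeholders.

      import json
      import time

      import requests
      from kafka import KafkaProducer  # kafka-python client

      producer = KafkaProducer(
          bootstrap_servers=["broker1:9092", "broker2:9092"],   # hypothetical brokers
          value_serializer=lambda v: json.dumps(v).encode("utf-8"),
      )

      API_URL = "https://api.example.com/v1/events"             # hypothetical endpoint

      while True:
          resp = requests.get(API_URL, timeout=10)
          resp.raise_for_status()
          for record in resp.json():
              # Each API record becomes one Kafka message on the raw-events topic.
              producer.send("raw-events", value=record)
          producer.flush()
          time.sleep(5)   # poll interval; tune to the API's rate limits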

    • United States
    • Technology, Information and Internet
    • 700 & Above Employee
    • Big Data Engineer
      • Jun 2015 - Dec 2017

      • Installed and configured a Kafka producer to ingest data from a REST API.
      • Installed and configured a Spark consumer to stream data from the Kafka producer.
      • Installed and configured Hive for data warehousing and HQL-based ETL.
      • Used Spark to migrate the data to Hive.
      • Worked on AWS to create and manage EC2 instances and Hadoop clusters.
      • Deployed the big data Hadoop application using Talend on AWS.
      • Used AWS Redshift for storing data in the cloud.
      • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
      • Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
      • Involved in moving data from tables into HDFS and HBase tables.
      • Transformed log data into the data model; wrote UDFs to format the log data.
      • Used HBase to store the majority of data that needed to be partitioned across column regions.
      • Used Spark to process ingested data from various sources.
      • Created HBase tables to store variable data formats coming from different portfolios.
      • Used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters.
      • Wrote shell scripts to move log files to the Hadoop cluster through automated processes.
      • Loaded files to HDFS from MySQL using Spark (see the sketch after this list).
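
      A minimal PySpark sketch of the MySQL-to-HDFS/Hive loading described in the last bullet. Connection details, paths, and table names are hypothetical placeholders, and the MySQL JDBC driver is assumed to be on the classpath.

      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("mysql-to-hdfs")
          .enableHiveSupport()
          .getOrCreate()
      )

      # Read the source table over JDBC, split into partitions for parallelism.
      orders = (
          spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")   # hypothetical database
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "*****")                        # inject via a secrets store in practice
          .option("partitionColumn", "order_id")
          .option("lowerBound", 1)
          .option("upperBound", 10000000)
          .option("numPartitions", 8)
          .load()
      )

      # Land the data in HDFS as Parquet, then expose it as a Hive table.
      orders.write.mode("overwrite").parquet("hdfs:///data/landing/orders")
      orders.write.mode("overwrite").saveAsTable("warehouse.orders")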

    • Netherlands
    • Retail
    • 700 & Above Employee
    • Data Engineer
      • Oct 2013 - Jun 2015

      • Installed and configured a Hadoop cluster including HDFS, YARN, and MapReduce.
      • Used Spark to migrate data from HDFS to a MySQL database (see the sketch after this list).
      • Installed and configured Hive and wrote Hive UDFs.
      • Worked with different file formats and compression techniques according to standards.
      • Involved in loading data from the UNIX file system to HDFS.
      • Installed and configured a MySQL server to allow remote user access on Ubuntu.
      • Loaded large RDBMS datasets into the big data platform using Sqoop.
      • Accessed the Hadoop cluster (CDM) and reviewed log files of all daemons.
      • Analyzed datasets using Hive, MapReduce, and Sqoop to recommend business improvements.
      • Maintained and troubleshot network connectivity.
      • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
      • Installed and configured a Flume agent to ingest data from a REST API.
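
      A minimal PySpark sketch of the HDFS-to-MySQL migration mentioned in the second bullet. The HDFS path, connection URL, credentials, and table name are hypothetical placeholders; the MySQL JDBC driver must be on the classpath.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("hdfs-to-mysql").getOrCreate()

      # Read the curated dataset from HDFS (Parquet assumed here).
      df = spark.read.parquet("hdfs:///data/curated/daily_metrics")

      # Append the rows into the MySQL reporting table over JDBC.
      (
          df.write.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/reporting")  # hypothetical database
          .option("dbtable", "daily_metrics")
          .option("user", "etl_user")
          .option("password", "*****")                            # inject via a secrets store in practice
          .mode("append")
          .save()
      )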

Education

  • University of Science and Technology Beijing
    Doctor of Philosophy - PhD, Modeling and Simulation in Materials Science
    2004 - 2010
