Haobin Yuan
Machine Learning Engineer at 数说故事DataStory
Experience
数说故事DataStory
China | Information Technology & Services | 1-100 employees
Machine Learning Engineer
Dec 2020 - Present
Responsible for algorithm development in the consumer portrait (profiling) system, mainly natural language processing (NLP), including text classification, text representation, and named entity recognition (NER).

Software Engineer
Nov 2018 - Oct 2019
Optimization of a machine learning algorithm: improving clustering performance
1. Improved the performance of SEABED, a novel in-house clustering algorithm, on larger-scale datasets so that it could be applied in industrial settings.
2. Reduced the time complexity of the algorithm's core steps (SVD, graph construction, and graph cut) by applying the Nyström method and running simpler algorithms on sampled data.
3. Sped up the algorithm by up to 15 times when processing thousands of samples.
4. Developed three baseline algorithms for ensemble clustering, each based on a different type of graph Laplacian.
5. Built ensemble clustering models by developing ensemble committee selection models and consensus functions to improve clustering quality.
6. Used internal and external clustering evaluation indexes to develop three committee selection models.
7. Developed two consensus functions, based on the co-association matrix and on the Hybrid Bipartite Graph Formulation, respectively.
8. Proposed two ensemble clustering models that combine the committee selection models with the consensus functions.
9. Visualized the clustering results and evaluated the algorithms with the silhouette coefficient and RMSSTD; the ensemble models improved clustering quality by 8%.
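The co-association consensus function mentioned above can be illustrated with a short, generic sketch. It is not the SEABED ensemble code: it assumes each base clustering is a list of integer labels over the same samples, builds the co-association matrix (the fraction of base clusterings that place two samples in the same cluster), and derives a consensus partition by linking samples whose co-association exceeds an assumed threshold of 0.5 and taking connected components, which is the connected-components form of a single-linkage cut. The function names `co_association` and `consensus_labels` are hypothetical.

```python
from typing import List


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of base partitions in which samples i and j share a cluster label."""
    m = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same samples")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize counts by the number of base partitions.
    return [[c / m for c in row] for row in counts]


def consensus_labels(ca: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Link samples whose co-association exceeds `threshold` (assumed 0.5)
    and return a connected-component id for each sample."""
    n = len(ca)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the "co-association > threshold" graph.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three hypothetical base clusterings of five samples.
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
]
print(consensus_labels(co_association(partitions)))  # [0, 0, 1, 1, 2]
```

Label values themselves do not matter, only whether two samples share one, which is why the base clusterings can use arbitrary and inconsistent label ids.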

Department of Computer Science, University of Sheffield
United Kingdom | Higher Education | 1-100 employees
Software Engineer
Feb 2019 - Sep 2019
Machine learning algorithm development: clustering cancer cell lines based on drug responses
1. Analyzed the pros and cons of conventional clustering algorithms and proposed candidate algorithms.
2. Co-authored SEABED, a new clustering algorithm proposed to address the drawbacks of previous models in cancer drug development.
3. Handled data collection and preprocessing, using feature engineering techniques such as one-hot encoding and feature crosses.
4. Designed and developed SEABED in Python, based on graph theory and network analysis, using tools such as scikit-learn and NumPy.
5. Ran experiments comparing SEABED with K-Means and hierarchical clustering, evaluating performance through cluster visualizations and internal evaluation indexes.
6. Showed that SEABED performs well at handling outliers and keeping cluster sizes balanced, so it can support data analysis in precision medicine.
7. The corresponding paper was published in npj Systems Biology and Applications, a Nature Partner Journal.
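One-hot encoding and feature crosses, mentioned in the preprocessing work above, can be sketched with the standard library alone. This is a generic illustration, not the original pipeline: the columns `tissue` and `drug` and the helper names `one_hot` and `feature_cross` are hypothetical. A feature cross here concatenates two categorical values into one combined category, which is then one-hot encoded.

```python
from typing import List


def one_hot(values: List[str]) -> List[List[int]]:
    """One-hot encode a categorical column; categories are sorted for a stable column order."""
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


def feature_cross(a: List[str], b: List[str]) -> List[List[int]]:
    """Cross two categorical columns, then one-hot encode each observed (a, b) pair."""
    crossed = [f"{x}_x_{y}" for x, y in zip(a, b)]
    return one_hot(crossed)


# Hypothetical example columns, not the original drug-response data.
tissue = ["lung", "breast", "lung", "skin"]
drug = ["A", "A", "B", "B"]
print(one_hot(tissue))              # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(feature_cross(tissue, drug))  # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Only pairs observed in the data get a column. A production pipeline would typically fix the category vocabulary on training data, for example with scikit-learn's `OneHotEncoder(handle_unknown="ignore")`.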

Deloitte
Business Consulting and Services | 700+ employees
Software Engineer
May 2018 - Jul 2018
1. Learned how customers use Salesforce and studied Salesforce development with Apex.
2. Developed a web-based management system to learn commonly used technologies, including Spring Boot, Spring MVC, Spring, Hibernate, Bootstrap, RESTful APIs, and JSON.
Education
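University of Sheffield, United Kingdom
Master of Science (MS), Computer Science
Guangdong University of Finance and Economics
Bachelor's degree, Computer Software Engineering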