Praveen Kumar

Data Scientist/Machine Learning/Data Engineer at Frost National Bank
Contact Information
us****@****om
(386) 825-5501
Location
Dallas-Fort Worth Metroplex


Experience

    • United States
    • Banking
    • 1 - 100 Employee
    • Data Scientist/Machine Learning/Data Engineer
      • Mar 2019 - Present

      • Created and maintained reporting infrastructure to provide visual representations of manufacturing data for operations planning and execution.
      • Applied machine learning techniques (regression and classification) to predict outcomes.
      • Constructed product-usage SDK data and aggregations using PySpark, Scala, Spark SQL, and HiveContext, stored in partitioned Hive external tables on AWS S3 for reporting, data science dashboards, and ad hoc analyses (see the sketch after this list).
      • Processed data through an ETL pipeline orchestrated by AWS Data Pipeline using Hive.
      • Completed a highly immersive data science program covering data manipulation and visualization, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
      • Converted SQL code to Spark code using Scala, PySpark, and Spark SQL for faster testing and processing of data.
      • Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
      • Moved high- and low-volume data objects from Teradata and Hadoop to Snowflake.
      • Set up HBase and stored data in HBase for further analysis.
      • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift, using JSON schemas to define table and column mappings from S3 data to Redshift.
      • Imported data from NoSQL databases, including HBase.
      • Exported analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
      • Created HBase tables to load large sets of semi-structured data coming from various sources.
      • Implemented a script to transmit information from Oracle to HBase using Sqoop.
      • Created clustered and non-clustered indexes and analyzed queries to improve query performance.
      • Used Sqoop to store data in HBase and Hive.
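
      A minimal PySpark sketch of the partitioned Hive-on-S3 aggregation pattern described above; the bucket, database, table, and column names are hypothetical placeholders, not taken from the actual project:

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        # Hive-enabled session; the S3 path and table names below are illustrative.
        spark = (SparkSession.builder
                 .appName("sdk-usage-aggregation")
                 .enableHiveSupport()
                 .getOrCreate())

        # Aggregate raw SDK usage events into daily counts per product.
        events = spark.table("raw.sdk_events")
        daily = (events
                 .groupBy("product_id", "event_date")
                 .agg(F.count("*").alias("event_count")))

        # Write a partitioned external table backed by S3 for dashboards and ad hoc queries.
        (daily.write
              .mode("overwrite")
              .partitionBy("event_date")
              .option("path", "s3://example-bucket/warehouse/sdk_usage_daily")
              .saveAsTable("analytics.sdk_usage_daily"))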

    • United States
    • Outsourcing/Offshoring
    • 700 & Above Employee
    • Data Engineer
      • Oct 2017 - Feb 2019

      • Developed a Python/Django application for Google Analytics aggregation and reporting.
      • Used Django configuration to manage URLs and application parameters.
      • Worked with Python OpenStack APIs.
      • Used Python scripts to update and manipulate content in the database.
      • Hands-on experience with IAM: set up user roles with corresponding user and group policies using JSON, and added project users to the AWS account with multi-factor authentication enabled and least-privilege permissions.
      • Used the AWS CLI to automate backups of ephemeral data stores to S3 buckets and EBS, and created nightly AMIs of mission-critical production servers as backups (see the sketch after this list).
      • Experience with EC2, CloudWatch, Elastic Load Balancing, and managing security on AWS.
      • Used AWS Lambda to run code without provisioning servers; ran queries from Python using the Python MySQL connector and MySQL database package.
      • Designed a high-availability environment for application servers and database servers on EC2 using ELB and Auto Scaling.
      • Designed the front end and back end of the application using Python on the Django web framework.
      • Configured and networked Virtual Private Cloud (VPC) and CloudFront.
      • Extensive experience with services such as IAM and S3.
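
      A minimal Python sketch of the nightly-AMI backup idea above, using boto3 rather than the AWS CLI named in the bullet; the region, instance ID, and name prefix are hypothetical placeholders:

        import datetime
        import boto3

        ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

        stamp = datetime.date.today().isoformat()
        response = ec2.create_image(
            InstanceId="i-0123456789abcdef0",  # placeholder production instance
            Name=f"prod-backup-{stamp}",
            NoReboot=True,  # snapshot without rebooting the live server
        )
        print("Created AMI:", response["ImageId"])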

    • United States
    • Wellness and Fitness Services
    • 700 & Above Employee
    • Data Scientist/Data Engineer
      • Jan 2016 - Sep 2017

      • Gathered business requirements from business users.
      • Analyzed all jobs in the project and prepared an ADS document for the impacted jobs.
      • Changed existing jobs per the business requirements.
      • Coordinated the offshore team to keep work running smoothly and deliver quality results on time.
      • Designed, developed, and tested DataStage jobs using Designer and Director, based on business requirements and business rules, to load data from source to target tables.
      • Modified existing jobs with new functionality in the code.
      • Prepared test cases for system testing.
      • Used stages such as Sequential File, Hashed File, Aggregator, Funnel, Change Capture, Change Apply, Row Generator, Peek, Remove Duplicates, Copy, Lookup, Join, Merge, Filter, and Data Set during DataStage job development.
      • Deployed code to all other test environments and ensured QA passed all their test cases.
      • Resolved defects raised by QA.
      • Established best practices for DataStage jobs to ensure optimal performance, reusability, and restartability.
      • Extracted data from the DB2 database and loaded it into downstream mainframe files for report generation.
      • Loaded NDM files into HDFS and created Hive tables on top of them.
      • Created Hive views on top of Hive tables for business users.
      • Automated data ingestion using Sqoop and shell scripts.
      • Ingested data from Oracle DB to Hive using a Sqoop script (see the sketch after this list).
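
      A hedged sketch of driving the Oracle-to-Hive Sqoop ingestion above from Python; the JDBC string, credentials path, and table names are hypothetical placeholders:

        import subprocess

        # Invoke Sqoop to import an Oracle table straight into a Hive table.
        sqoop_cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@db.example.com:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.oracle_password",
            "--table", "CLAIMS",
            "--hive-import",
            "--hive-table", "staging.claims",
            "--num-mappers", "4",
        ]
        subprocess.run(sqoop_cmd, check=True)  # raise if the import fails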

    • United States
    • Telecommunications
    • 700 & Above Employee
    • Jr. Data Engineer
      • Jul 2014 - Dec 2015

      • Built distributed high-performance systems using Spark and Scala.
      • Developed Scala applications for loading and streaming data into NoSQL databases (MongoDB) and HDFS.
      • Performed T-SQL tuning and optimized queries and SSIS packages.
      • Designed distributed algorithms for identifying trends in data and processing them effectively.
      • Created SSIS packages to import data from SQL tables into different sheets in Excel.
      • Used Spark and Scala to develop machine learning algorithms that analyze clickstream data.
      • Used Spark SQL for data pre-processing, cleaning, and joining very large data sets (see the sketch after this list).
      • Performed data validation with Redshift and constructed pipelines designed to handle over 100 TB per day.
      • Co-developed the SQL Server database system to maximize performance benefits for clients.
      • Assisted senior-level data scientists in the design of ETL processes, including SSIS packages.
      • Migrated databases from traditional data warehouses to Spark clusters.
      • Ensured the data warehouse was populated only with quality entries by performing regular cleaning and integrity checks.
      • Used Oracle relational tables in process design.
      • Developed SQL queries to extract data from existing sources and check format accuracy.
      • Developed automated tools and dashboards to capture and display dynamic data.
      • Installed a Linux-based Cisco server, performed regular updates and backups, and used MS Excel functions for data validation.
      • Coordinated data security issues and instructed other departments on secure data transmission and encryption.
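
      An illustrative PySpark sketch of the Spark SQL cleaning-and-joining step above; the HDFS paths and column names are hypothetical:

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("clickstream-prep").getOrCreate()

        clicks = spark.read.parquet("hdfs:///data/clickstream/")
        users = spark.read.parquet("hdfs:///data/users/")

        # Drop duplicate events and rows missing a timestamp before joining.
        cleaned = (clicks
                   .dropDuplicates(["session_id", "event_ts"])
                   .filter(F.col("event_ts").isNotNull()))

        # Enrich click events with user attributes for downstream analysis.
        joined = cleaned.join(users, on="user_id", how="inner")
        joined.write.mode("overwrite").parquet("hdfs:///data/clickstream_enriched/")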

    • Australia
    • Telecommunications
    • 200 - 300 Employee
    • Developer & Business Support
      • Jul 2012 - Apr 2014

      • Identified and documented the business rules implemented across the project; this information was used by the business team to write up requirements.
      • Validated the UI specs and corrected them to ensure they were in line with the existing system.
      • Gathered updates from the onshore team, conducted stand-up meetings, and provided updates to the Scrum Master.
      • Analyzed and documented details of various external reports obtained from external systems within the company and from third-party vendors.
      • Tracked story updates in RTC, feeding the burndown chart that gives clients a graphical view of remaining work versus time.
      • Implemented and enhanced CRUD operations for the applications using the MVC (Model-View-Controller) architecture of the Django framework and Python, and conducted code reviews.
      • Wrote Python modules to extract and load asset data from the MySQL source database (see the sketch after this list).
      • Analyzed requirements and developed use cases, UML diagrams, class diagrams, and sequence and state machine diagrams.
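
      A minimal sketch of the MySQL asset extract described above, using the mysql-connector-python package; the host, schema, query, and credentials are placeholders, not the actual project configuration:

        import mysql.connector

        conn = mysql.connector.connect(
            host="db.example.com",  # placeholder host
            user="etl_user",
            password="***",
            database="assets",
        )
        cursor = conn.cursor(dictionary=True)  # rows come back as dicts
        cursor.execute(
            "SELECT asset_id, asset_name, updated_at FROM asset WHERE updated_at >= %s",
            ("2014-01-01",),
        )
        for row in cursor:
            print(row["asset_id"], row["asset_name"])
        cursor.close()
        conn.close()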
