Ayyappala Naidu Bandaru
Senior Data Engineer at PDX, Inc.
Experience

PDX, Inc. - United States - Software Development - 100-200 Employees
Senior Data Engineer
Apr 2020 - Present
Responsibilities:
- Performed data analysis and developed analytic solutions; investigated data to discover correlations and trends and to explain them.
- Worked with Data Engineers and Data Architects to define back-end requirements for data products (aggregations, materialized views, tables, visualization).
- Developed frameworks and processes to analyze unstructured information.
- Assisted in Azure Power BI architecture design.
- Conducted performance analysis, optimized data processes, and made recommendations for continuous improvement of the data processing environment.
- Collected data from AWS S3 buckets in near real time using Spark Streaming and performed the necessary processing.
- Designed and implemented multiple ETL solutions over various data sources using extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools.
- Performed data profiling and data wrangling of XML, web feeds, and files using Python, UNIX, and SQL.
- Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python.
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics).
- Performed data and statistical analysis and generated reports, listings, and graphs using SAS tools: SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
- Used Kafka and integrated it with Spark Streaming.
- Developed data analysis tools using SQL and Python.
- Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks (see the sketch below).
- Migrated data from on-premises storage to AWS S3 buckets.
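To illustrate the PySpark UDF work described above, here is a minimal sketch of a custom UDF applied in a row-level cleaning and conforming step; the S3 paths, column names, and cleaning rule are hypothetical, not taken from the actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("cleaning-udf-sketch").getOrCreate()

# Hypothetical input: raw customer records landed from S3 as Parquet.
raw = spark.read.parquet("s3a://example-bucket/raw/customers/")  # path is illustrative

# Custom UDF: normalize free-text state codes (trim, upper-case, map known aliases).
def normalize_state(value):
    if value is None:
        return None
    cleaned = value.strip().upper()
    aliases = {"TEX": "TX", "CALIF": "CA"}
    return aliases.get(cleaned, cleaned)

normalize_state_udf = F.udf(normalize_state, StringType())

conformed = (
    raw
    .withColumn("state", normalize_state_udf(F.col("state")))
    .withColumn("load_label", F.lit("cleaned"))       # simple data-labeling column
    .dropDuplicates(["customer_id"])                  # basic conforming step
)

conformed.write.mode("overwrite").parquet("s3a://example-bucket/conformed/customers/")
```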

Walmart - United States - Retail - 700 & Above Employees
Senior Data Engineer
Aug 2018 - Mar 2020
- Worked with Hive joins and used HQL for querying the databases, eventually building complex Hive UDFs.
- Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Worked on installing the cluster, commissioning and decommissioning Data Nodes, Name Node recovery, capacity planning, and slots configuration.
- Installed Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
- Leveraged Chef to manage and maintain builds in various environments; planned hardware and software installation on the production cluster and coordinated with multiple teams to get it done.
- Conducted exploratory data analysis using Python (Matplotlib and Seaborn) to identify underlying patterns and correlations between features.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Created, dropped, and altered HBase and Hive tables at run time without blocking updates and queries.
- Wrote Flume configuration files for importing streaming log data into HBase, and imported several transactional logs from web servers with Flume to ingest the data into HDFS.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Analyzed SQL scripts and designed solutions to implement them using Scala.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Developed Python scripts to collect Redshift CloudWatch metrics and automate loading the data points into a Redshift database (a sketch follows this section).
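As a rough sketch of the Redshift CloudWatch collection mentioned in the last bullet, the snippet below pulls one cluster metric with boto3 and loads the data points into a Redshift table via psycopg2; the cluster identifier, connection details, and table name are placeholders, not values from the actual project.

```python
from datetime import datetime, timedelta

import boto3
import psycopg2

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Pull hourly average CPU utilization for a (hypothetical) Redshift cluster.
end = datetime.utcnow()
start = end - timedelta(days=1)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Redshift",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterIdentifier", "Value": "example-cluster"}],
    StartTime=start,
    EndTime=end,
    Period=3600,
    Statistics=["Average"],
)

# Load the data points into a metrics table in Redshift (placeholder connection details).
conn = psycopg2.connect(
    host="example-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="loader",
    password="...",
)
with conn, conn.cursor() as cur:
    for point in response["Datapoints"]:
        cur.execute(
            "INSERT INTO cloudwatch_cpu_metrics (ts, avg_cpu) VALUES (%s, %s)",
            (point["Timestamp"], point["Average"]),
        )
conn.close()
```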

Chase - United States - Financial Services - 700 & Above Employees
Big Data Engineer
Oct 2016 - Jul 2018
Responsibilities:
- Gathered business requirements, developed the strategy for data cleansing and data migration, wrote functional and technical specifications, created source-to-target mappings, and designed data profiling, data validation, and ETL jobs in Informatica.
- Worked on a Hadoop cluster that ranged from 4-8 nodes during pre-production and was at times extended to 24 nodes during production.
- Built APIs that allow customer service representatives to access the data and answer queries.
- Designed changes to transform current Hadoop jobs to HBase.
- Handled defect fixes efficiently and worked with the QA and BA teams on clarifications.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning Data Nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Extended the functionality of Hive with custom UDFs and UDAFs.
- The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports, and established a self-service reporting model in Cognos for business users.
- Implemented bucketing and partitioning (including dynamic partitions) in Hive to assist users with data analysis; an illustrative sketch follows this section.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Wrote Hadoop jobs for analyzing data using HiveQL queries, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Performed statistical analysis using SQL, Python, R, and Excel.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
- Processed image data through the Hadoop distributed system using MapReduce and stored the results in HDFS.
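A minimal sketch of the partitioning and bucketing mentioned above: the project implemented this in Hive itself, while the version below writes an equivalent partitioned, bucketed table into the Hive metastore from PySpark. The database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical staging data; table and column names are illustrative.
staging = spark.table("bdw_staging.transactions_raw")

# Write a partitioned, bucketed table into the Hive metastore so downstream
# queries can prune partitions and take advantage of bucketed joins.
(
    staging
    .write
    .mode("overwrite")
    .partitionBy("txn_date")          # one partition per transaction date
    .bucketBy(32, "account_id")       # 32 buckets hashed on the join key
    .sortBy("account_id")
    .format("parquet")
    .saveAsTable("bdw.transactions")
)
```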

Bank of America - United States - Banking - 700 & Above Employees
Hadoop Developer
Jan 2016 - Sep 2016
- Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Responsible for importing data into HDFS using Sqoop from different RDBMS servers and exporting aggregated data back to the RDBMS servers with Sqoop for other ETL operations; a PySpark sketch of this style of ingestion follows this section.
- Designed and developed applications in PySpark to compare the performance of Spark with Hive.
- Headed negotiations to find optimal solutions with project teams and clients.
- Mapped client business requirements to internal requirements of trading platform products.
- Supported revenue management with statistical and quantitative analysis, developing several statistical approaches and optimization models.
- Led the business analysis team of four members in the absence of the Team Lead.
- Added value by providing innovative solutions and improved methods of data presentation, focusing on the business need and the business value of the solution.
- Worked on Internet marketing (paid search channels); created performance dashboards in Tableau, Excel, and PowerPoint for the key stakeholders.
- Incorporated predictive modeling (a rule engine) to evaluate the customer/seller health score using Python scripts, performed computations, and integrated the results with the Tableau visualization.
- Created and modified shell scripts for scheduling various data cleansing scripts and the ETL load process.
- Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
- Developed a fully functioning C# program whose GUI prompts the user to enter personal information, charity items to donate, and delivery options; it connects to SQL Server and integrates the information users enter with preexisting information in the database.
- Used Sqoop to transfer data between relational databases and Hadoop, and worked with HDFS to store and access huge datasets within Hadoop.
- Good hands-on experience with GitHub.
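The RDBMS-to-HDFS ingestion above was done with Sqoop, a command-line tool; the sketch below shows the same pattern expressed in PySpark, reading a table over JDBC in parallel and landing it in HDFS as Parquet. The JDBC URL, credentials, table, and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-ingestion-sketch").getOrCreate()

# Read a source table over JDBC (connection details are placeholders; the Oracle
# JDBC driver jar must be available on the Spark classpath).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//example-host:1521/ORCLPDB1")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "...")
    .option("numPartitions", 8)              # parallel reads, similar to Sqoop mappers
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", 1)
    .option("upperBound", 10_000_000)
    .load()
)

# Land the raw extract in HDFS as Parquet for downstream Hive/Pig processing.
orders.write.mode("overwrite").parquet("hdfs:///data/raw/sales/orders/")
```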

Caliber - United States - Financial Services - 700 & Above Employees
Data Analyst
May 2014 - Nov 2015
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Hive and Sqoop.
- Developed data pipelines using Sqoop and MapReduce to ingest current and historical data into the data staging area.
- Responsible for defining the data flow in the Hadoop ecosystem for different teams.
- Wrote Pig scripts for data cleansing and data transformation as the ETL tool before loading into HDFS.
- Imported normalized data from the staging area into HDFS using Sqoop and performed analysis using Hive Query Language (HQL).
- Created managed tables and external tables in Hive and loaded data from HDFS.
- Performed query optimization for HiveQL and denormalized Hive tables to increase the speed of data retrieval.
- Transferred analyzed data from HDFS to the BI team for visualization and to the data science team for predictive modeling.
- Scheduled workflows using Autosys and ran Hive queries on the Spark execution engine.
- Created SAS reports such as bar charts, tabular reports, and cross-tab reports using SAS Web Report Studio; created pages and portlets in the SAS Information Delivery Portal, published the reports there, and granted access to different groups of users.
- Improved project quality in terms of performance and the related documentation.
- Performed impact analysis of changes made to existing mappings and provided feedback.
- Used PySpark and Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse (see the sketch below).
- Designed and implemented complex applications and distributed systems on public cloud infrastructure (AWS, GCP, Azure, etc.).
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
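A minimal Pandas sketch of the moving-average and RSI calculation mentioned above, assuming a per-stock DataFrame of daily closing prices; the column names and the 14-day window are assumptions for illustration.

```python
import pandas as pd

def add_indicators(prices: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Add a simple moving average and RSI column to a per-stock price frame.

    Expects columns 'date' and 'close'; these names are assumptions for this sketch.
    """
    df = prices.sort_values("date").copy()

    # Simple moving average of the closing price.
    df["sma"] = df["close"].rolling(window).mean()

    # Relative Strength Index: average gain / average loss over the window.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    df["rsi"] = 100 - (100 / (1 + rs))

    return df

# Example usage with toy data.
prices = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=30, freq="D"),
    "close": [100 + i * 0.5 for i in range(30)],
})
print(add_indicators(prices).tail())
```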

Education

Alagappa University
Bachelor's degree, Electronics and Communications Engineering