
hirak06datascience

About Candidate

Education

Master of Science in Quantitative Management (STEM) May 2020
DUKE UNIVERSITY, The Fuqua School of Business
Master of Business Administration, Finance June 2012
MUMBAI UNIVERSITY
Bachelor of Engineering, Computer Science/IT June 2010
MUMBAI UNIVERSITY

Work & Experience

Senior Data Scientist Jan 2022 - Present
UPGRADE Inc

• Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
• Developed pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Teradata, Blob Storage, and Azure SQL Data Warehouse.
• Built Oozie pipelines for file movement, Sqoop data transfers from Teradata or SQL, and exports into Hive staging tables, with subsequent business-driven aggregations and main table loads.
• Established infrastructure for optimal ETL processes from diverse data sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage.
• Collaborated on ETL tasks to maintain data integrity and pipeline stability.
• Applied data modeling, ETL processes, and data warehousing concepts within Power BI and QlikView environments.
• Designed and implemented database solutions with Azure Blob Storage for data storage and retrieval.
• Deployed Data Factory pipelines to orchestrate data flows into SQL databases.
• Maintained data processing solutions using Azure HDInsight, Azure Databricks, and Azure Stream Analytics for real-time and batch data processing.
• Cleansed, manipulated, and analyzed large datasets, including semi-structured and unstructured data (XML, JSON, CSV, PDF), using Python.
• Developed Python scripts for data filtering, cleansing, mapping, and aggregation (a minimal sketch follows this list).
• Designed and developed ETL workflows in Informatica, leveraging Informatica Designer components to build complex mappings, sessions, and workflows for an enterprise data warehouse.
• Created informative dashboards and data visualizations in Power BI, enabling business users to make data-driven decisions.
• Employed BigQuery to manage and analyze large datasets, optimizing queries for efficient and scalable data processing.
• Translated business requirements into secure and maintainable code, aligning technical solutions with business objectives.
• Conducted extensive data analysis and root cause analysis to troubleshoot issues and improve data quality.
• Reviewed and streamlined business processes, translating them into effective BI reporting and analysis solutions to enhance operational efficiency.
• Adhered to SDLC processes and various project management methodologies to ensure timely and successful project delivery.
• Utilized data modeling techniques (star/snowflake schema design, data marts, slowly changing dimensions) to create physical and logical data models for data warehouses.
• Identified and resolved BI application performance bottlenecks through thorough analysis and tuning, enhancing overall system performance.
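The Python cleansing and aggregation work above is described only at a high level; the following is a minimal sketch of what such a script could look like, assuming pandas and a simple JSON-plus-CSV input. The file names, column names (customer_id, amount, event_date), and monthly grouping are hypothetical placeholders, not details from the résumé.

```python
# Minimal, hypothetical sketch of a cleansing-and-aggregation step in pandas.
# File names and column names are placeholders, not taken from the résumé.
import json

import pandas as pd


def load_semi_structured(json_path: str, csv_path: str) -> pd.DataFrame:
    """Combine a JSON extract and a CSV extract into one flat DataFrame."""
    with open(json_path) as fh:
        records = json.load(fh)              # list of dicts from an upstream source
    json_df = pd.json_normalize(records)     # flatten nested JSON fields
    csv_df = pd.read_csv(csv_path)
    return pd.concat([json_df, csv_df], ignore_index=True)


def cleanse_and_aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleansing (duplicates, types, nulls) followed by a monthly aggregation."""
    df = df.drop_duplicates()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce").fillna(0.0)
    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
    df = df.dropna(subset=["customer_id", "event_date"])
    return (
        df.groupby(["customer_id", df["event_date"].dt.to_period("M")])
          .agg(total_amount=("amount", "sum"), events=("amount", "size"))
          .reset_index()
    )


if __name__ == "__main__":
    frame = load_semi_structured("extract.json", "extract.csv")
    print(cleanse_and_aggregate(frame).head())
```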

Data Science and Marketing Analytics Jan 2021 - Dec 2021
AVANT INC

• Led various phases of the SDLC, from requirement gathering to testing, delivering scalable data solutions for healthcare analytics.
• Designed and implemented a streaming platform using AMQ Streams, Kafka, Camel, and Spring, enhancing real-time data processing capabilities.
• Utilized Sqoop for efficient data loading into Spark SQL for creating RDDs, Datasets, and DataFrames.
• Implemented data formats such as Avro, Parquet, ORC, and JSON, and developed UDFs in Hive and Pig for advanced data processing.
• Managed Change Data Capture (CDC) using Qlik Replicate and automated data loading into HDFS.
• Developed Airflow scheduling scripts in Python for orchestrating complex data workflows (see the DAG sketch after this list).
• Led end-to-end machine learning workflows, from data gathering and preprocessing to model evaluation and deployment, using Azure and Snowflake data sources.
• Implemented cloud computing solutions with HDInsight, Azure Data Lake, Azure Data Factory, and Azure Machine Learning, leveraging PowerShell scripting.
• Developed Spark scripts in PySpark for data processing and analytics.
• Built applications using the Django and Flask frameworks, integrating REST APIs and leveraging dependency injection with the Spring Framework.
• Designed and implemented data pipelines using Oozie and Airflow for efficient data processing and workflow automation.
• Utilized Kafka Streams for real-time data streaming and analytics, enhancing decision-making capabilities.
• Developed data-driven stories and advanced data analysis using QlikView and Power BI, creating intuitive visualizations for business users.
• Deployed and configured Power BI in cloud and on-premises environments, ensuring data security and automatic report refresh.
• Implemented Snowflake and Data Vault modeling approaches for managing large datasets and metadata.
• Optimized data processing using Spark's in-memory capabilities, efficient joins, and transformations.
• Implemented machine learning algorithms, including linear regression, logistic regression, decision trees, random forests, and XGBoost, for predictive analytics.
• Developed machine learning models using recurrent neural networks (LSTM) for time series analysis and predictive modeling.
• Automated processes using the Ansible Python API and managed collections in Python for efficient data manipulation.
• Integrated Workday HCM modules and managed data pipelines using Fivetran and HVR for data replication into Snowflake.
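The Airflow scheduling scripts mentioned above are not reproduced here; as a point of reference, the sketch below is a minimal Airflow 2.x DAG showing how such a Python-defined workflow is typically wired together. The DAG id, schedule, and task callables are hypothetical placeholders, not the actual pipeline.

```python
# Minimal, hypothetical Airflow 2.x DAG; dag_id, schedule, and callables are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull the day's records from the source system (placeholder)."""
    print("extracting source data")


def transform():
    """Cleanse and aggregate the extracted records (placeholder)."""
    print("transforming data")


def load():
    """Write the curated output to the warehouse (placeholder)."""
    print("loading to warehouse")


with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Simple linear dependency: extract -> transform -> load.
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```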

Intern, Data Scientist March 2020 - May 2020
Moog Music Inc - Technology

Delivered a 6% cost saving for a component manufacturing giant by designing an end-to-end solution covering cloud migration and predictive model building.

Data Science Consultant Jun 2018 - May 2019
AVAAMO TECHNOLOGIES

Hadoop Ecosystem and Data Integration:
• Utilized the full Hadoop ecosystem (HDFS, YARN, MapReduce, Hive, Flume, Oozie, ZooKeeper, Impala, HBase, Sqoop) through Cloudera Manager.
• Collected and aggregated large volumes of log data using Apache Flume, staging it in HDFS for further analysis.
• Created a core Python API used across multiple modules for seamless integration.
• Implemented data ingestion and managed clusters for real-time processing using Kafka.
• Set up multi-hop, fan-in, and fan-out workflows in Flume to streamline data processing.
• Imported transactional logs from web servers into HDFS using Flume.
• Developed custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from event payloads.
Data Analytics and Machine Learning:
• Designed and implemented data analytics solutions in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
• Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
• Developed parser and loader MapReduce applications to extract data from HDFS and store it in HBase and Hive.
• Wrote user-defined functions (UDFs) in PySpark for data transformations and loads.
• Built various machine learning models (Logistic Regression, KNN, Gradient Boosting) using Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn in Python (a minimal sketch follows this section).
• Experimented with ensemble methods to enhance model accuracy, deploying models on AWS.
Database Management and Development:
• Extracted data from Teradata into HDFS and dashboards using Spark Streaming.
• Worked on MongoDB database concepts, including locking, transactions, indexes, sharding, replication, and schema design.
• Automated RabbitMQ cluster installations and configurations using Python/Bash.
Workflow Automation and Batch Processing:
• Utilized Oozie for batch processing and dynamic workflow scheduling.
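As an illustration of the model building listed above (Logistic Regression, KNN, Gradient Boosting with scikit-learn), the sketch below compares those three classifier families with cross-validation. The synthetic dataset, hyperparameters, and ROC-AUC scoring are assumptions made for the example, not details of the actual project.

```python
# Minimal, hypothetical comparison of the three classifier families named above.
# The synthetic dataset, hyperparameters, and scoring metric are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the real feature set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# 5-fold cross-validated ROC-AUC for each model family.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean CV ROC-AUC = {scores.mean():.3f}")
```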

Associate Director – Equities and Private Banking 2014 - 2015
HDFC Bank
Head of 2013 - 2014
Meghna Fabrics
Credit Analytics 2012 - 2013
Larsen & Toubro Finance