Professional Skill Set

Statistics

Data Analysis

Python

R

MySQL

Data Wrangling

ETL Processes

Machine Learning

Data Modeling

Data Visualization

AWS

Git

Projects

Social Media Community Identification and Analysis

I analyzed social media communities using Python and the PRAW library for Reddit data extraction. Applying machine learning algorithms such as K-Means clustering and Random Forest, I classified communities and revealed their dynamics through network graph visualizations that highlighted topic clustering patterns.
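As a rough sketch of the clustering step (not the project code), posts can be vectorized with TF-IDF and grouped with K-Means; this assumes scikit-learn is available, and the post texts below are hypothetical stand-ins for scraped Reddit data.

```python
# Sketch: clustering toy "subreddit posts" by topic with TF-IDF + K-Means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = [
    "gpu drivers and gpu benchmarks",       # hardware topic
    "new gpu benchmarks for gaming",        # hardware topic
    "sourdough bread baking tips",          # baking topic
    "bread hydration and baking schedule",  # baking topic
]

# Turn raw text into TF-IDF feature vectors.
vectors = TfidfVectorizer().fit_transform(posts)

# Partition the posts into two topic clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
labels = kmeans.labels_
```

In the real project each point would be a user or post from PRAW, and the resulting labels would feed the network graph visualization.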

Evolution Of Wildfire

In a Virtual Reality (VR) modeling project, we applied data mining methods, using Python for data preprocessing and R for Exploratory Data Analysis (EDA). Working with a 45 GB dataset, we generated detailed visualizations tracing wildfire progression across diverse terrains, revealing environmental patterns and trends in a clear, thorough way.

Virtual Tissue Simulation

I developed a Retrieval-Augmented Generation (RAG) framework integrating pre-trained large language models (LLMs) for enhanced information retrieval. I built a text summarization pipeline with a dynamic context window to refine prompts for better outputs, and wrote a Python script to visualize LLM embeddings of academic texts with t-SNE, applying NLP techniques for semantic analysis and topic clustering.
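A minimal sketch of the embedding visualization step, assuming scikit-learn: random vectors stand in here for the real LLM embeddings of academic texts, and the 2-D coordinates would normally be plotted.

```python
# Sketch: projecting high-dimensional embeddings to 2-D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 64))  # 20 "documents", 64-dim embeddings

# Perplexity must be smaller than the number of samples.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(embeddings)
print(coords.shape)  # (20, 2)
```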

Job Recommendation System

I designed a predictive model that matches resume skills to the most fitting IT roles, applying ML algorithms such as SVM, Decision Trees, Random Forest, Neural Networks, and Naive Bayes to job data scraped from LinkedIn and Indeed. The Random Forest model achieved 92% accuracy, improving user satisfaction and engagement by 15%. Results were visualized through network graphs, revealing community dynamics and topic clustering.
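The Random Forest step can be sketched as follows, assuming scikit-learn; the skills, roles, and tiny training set below are illustrative, not the scraped data.

```python
# Sketch: predicting an IT role from binary skill indicators.
from sklearn.ensemble import RandomForestClassifier

skills = ["python", "sql", "aws", "react"]
# Each row flags which of the skills a resume lists.
X = [
    [1, 1, 0, 0],  # python + sql
    [1, 1, 1, 0],  # python + sql + aws
    [0, 0, 0, 1],  # react only
    [1, 0, 0, 1],  # python + react
]
y = ["data analyst", "data engineer", "frontend dev", "frontend dev"]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
predicted_role = model.predict([[1, 1, 0, 0]])[0]
```

The real system would train on many resume/role pairs and compare this model against SVM, Decision Trees, Neural Networks, and Naive Bayes.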

Disease Identification of a Leaf

In this deep learning project, I developed a TensorFlow model using Convolutional Neural Networks (CNNs) and data augmentation to classify leaf diseases with 92% accuracy. I automated data preprocessing with TensorFlow's dataset API for efficient data management and used Google Cloud Platform (GCP) for scalable model training and deployment, demonstrating the model's potential for real-world agricultural applications.
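One common augmentation used in such pipelines is a horizontal flip; here is a plain NumPy sketch of that step (the project itself used TensorFlow's augmentation utilities), with a toy array standing in for a leaf image.

```python
# Sketch: horizontal-flip augmentation on an H x W x C image array.
import numpy as np

def augment_flip(image):
    """Return the image mirrored left-to-right (axis 1 = width)."""
    return np.flip(image, axis=1)

leaf = np.arange(12).reshape(2, 3, 2)  # toy 2x3 "image" with 2 channels
flipped = augment_flip(leaf)
```

Augmentations like this enlarge the effective training set, which helps a CNN generalize from limited labeled leaf photos.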

Smart Trolley

Our team developed a smart cart with a companion mobile app, simplifying product location and streamlining checkout with Raspberry Pi-based barcode scanning. The system automatically displays product information in the app and calculates the total bill at checkout, making shopping more efficient.

Journey so far

System Engineer at Tata Consultancy Services

Oct 2020 – July 2022, Pune, India
  • Data Pipeline Development and Optimization: Led the creation of a high-throughput ETL data pipeline using Apache Spark and Airflow, with Python for efficient processing, achieving a 20% reduction in data processing time. This enhancement boosted real-time analytics, supported strategic decisions, and improved analytical decision-making speed by 15%.
  • Cloud Migration and Security Expertise: Directed the smooth transition of on-premises data systems to AWS, optimizing costs for scalable cloud infrastructure. Automating migration processes with Python scripts yielded a 10% saving in cloud operating costs. Strengthened data security with robust encryption and AWS security features, reducing security incidents by 25% and ensuring 100% compliance with industry standards.
  • Data Automation and Visualization Leadership: Established CI/CD pipelines for data applications using Python, reducing data inconsistencies by 30%. Created custom Tableau dashboards integrated with AWS, enhancing data-driven decisions and contributing to a 20% increase in stakeholder satisfaction.
  • IT Infrastructure Management and SCCM Expertise: Managed IT infrastructure at Maersk, leveraging SCCM for application deployment, improving system efficiency by 20% and reducing downtime by 30%. Achieved a 99.5% system reliability score, ensuring consistently high performance.
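The extract-transform-load pattern behind the pipeline work above can be sketched in plain Python (the production pipeline ran on Spark and Airflow; the records and validation rule here are made up).

```python
# Sketch: a minimal extract -> transform -> load flow.
def extract():
    # In production this would read from a source system, not a literal.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "bad"}]

def transform(rows):
    # Cast amounts to float, dropping rows that fail validation.
    clean = []
    for row in rows:
        try:
            clean.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError:
            continue
    return clean

def load(rows, sink):
    # In production this would write to a warehouse table.
    sink.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'id': 1, 'amount': 10.5}]
```

In Airflow, each of the three functions would typically become its own task, so failures can be retried stage by stage.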