Data analyst with an astrophysics background and a habit of asking why. Work experience across retail, finance, healthcare, EdTech and nonprofit. Currently exploring data analytics, data science, and AI opportunities where analysis drives real decisions.
View the Project on GitHub MonikaGundecha/portfolio
Data Analyst
Technical Skills: Python, SQL, R, Tableau, Power BI
Education
- Master's in Data Analytics, Northeastern University
- Master of Science in Physics, Pune University
- Bachelor of Science in Physics, Pune University
Work Experience
Data Science Intern @ Bristol Myers Squibb (Jul 2025 - Sep 2025)
- Analyzed biomarker and neuroinflammation datasets in R using EDA and statistical testing to support experimental decisions.
- Converted manual Excel models into reproducible R pipelines, reducing errors and saving 40 hours of manual work weekly.
- Collaborated with scientists to validate assumptions, interpret results, and translate analyses into actionable insights.
Data Analyst @ Madden Global Solutions (Jan 2025 - Jul 2025)
- Designed and built a Python ETL pipeline using pandas to automate Circana data processing into Google Looker Studio, creating a BI reporting system from scratch to track weekly KPIs. Enabled real-time reporting for 14 clients and reduced manual reporting by 80%.
- Built an integrated supply chain analytics platform using Excel Macros to achieve 98% in-stock rates and a 15% sales increase.
- Implemented a cost-benefit analysis framework using Excel to evaluate promotional campaigns, delivering a 235% ROI.
- Automated Excel-based financial budget watch reports by extracting and transforming SAP data to track $76 million in CPG shipments.
- Identified store-level supply chain and compliance gaps by analyzing POS data in Excel, recovering $3.9 million in lost sales for a client.
Graduate Teaching Assistant @ Northeastern University (Apr 2024 - Jul 2025)
- Provided academic support to 160 students in Machine Learning and Statistics, focusing on GLM, KNN models, regression, regularization, and hypothesis testing using Chi-Square, ANOVA, T-tests, etc., through personalized and group study sessions.
- Guided 60 students in understanding the Hadoop Distributed File System (HDFS) and big data analytics using tools like Cloudera, Apache Spark, Apache Hadoop, Hue, and Impala, contributing to 70% of students achieving an A grade.
- Guided students on AI governance and responsible AI, covering bias, fairness, and evaluation frameworks.
- Tutored 120 students in Probability and Statistics, statistical hypothesis testing and designing experiments for A/B testing.
Research Analyst @ Northeastern University (Oct 2024 - Apr 2025)
- Processed and cleaned a high-volume dataset of 5 million job postings in R by deduplicating, normalizing, and restructuring records while performing validation and compression, optimizing storage efficiency by 80% and enabling scalable analysis.
- Extracted credentials embedded in unstructured job descriptions using data mining and NLP in R, ensuring data integrity for accurate credential demand analysis.
- Transformed data using Alteryx and built Tableau dashboards to track credential vs. degree demand metrics and trends.
Product Analyst @ Infinity Learn (Oct 2022 - Apr 2023)
- Designed Generative AI workflows to test content variants and reduce production timelines by 40% with automated pipelines.
- Performed rigorous data quality validation in Excel across 6,000+ assessments and 500+ articles using standardized audit logs.
- Analyzed competitor platforms and engagement metrics, identifying content gaps that increased user engagement by 14%.
Product Manager @ LIDO Learning (Apr 2022 - Aug 2022)
- Led a cross-functional curriculum development team of 6 using JIRA to release digital learning products, earning a 3.9/4 rating.
- Analyzed LMS data using SQL and built Tableau dashboards informing UX decisions, increasing engagement by 20%.
- Conducted competitor and sentiment analysis on customer feedback, identifying product gaps that guided release of 3 features.
Tutor Trainer @ LIDO Learning (May 2021 - Mar 2022)
- Mentored 200+ teachers and 400+ Business Development Associates, elevating instruction quality and resulting in improved customer retention and 18% new customer growth.
- Analyzed tutor data in advanced Excel and built Power BI dashboards to inform an Online Teaching Practices program developed in Articulate Rise, resulting in a 30% increase in tutor delivery ratings and 15% revenue growth within a month.
Curriculum Designer @ LIDO Learning (Oct 2019 - Apr 2021)
- Designed and developed immersive EdTech products, collaborating with the product team on strategy and research, resulting in highly engaging educational content that achieved a 3.95/4.00 stakeholder rating.
- Collaborated with the Data Analytics team to gather student data using LMS and perform analysis to drive student growth by 12%.
- Mentored and onboarded 4 new curriculum designers and helped them achieve a rating of 3.80/4.00 in their first month.
Program Manager @ Science for All Foundation (Jun 2017 - Sep 2019)
- Managed stakeholder relationships and facilitated communication between the organization and 20 low-income public schools across the city, ensuring alignment of services throughout the academic year.
- Implemented a comprehensive training program for 35 instructors, enhancing Math and Science delivery in public schools, resulting in improved academic performance and a 150% increase in client base.
- Led yearlong boot camp interventions in 30 low-income public schools to increase the basic math operation skills of middle school students by 65%.
- Organized the "Pune Science Festival 2018," a two-day science exhibition, with the assistance of over 50 public school student volunteers, drawing a crowd of over 2,500 visitors.
Projects
Engaging Worlds, Gold Medal - MIT Education Hackathon '24 - Python, Convai, Anthropic
Link
- Developed an AI-powered immersive learning platform that integrates interactive experiences with automated assessment across multiple devices, including VR.
- Integrated NLP for real-time student interaction analysis, generating actionable insights for educators.
Link
- Built an end-to-end analytics pipeline in Databricks to process ER visit data and analyze patient flow from arrival to discharge.
- Designed a star schema with fact and dimension tables to support analysis of wait time, length of stay, and delay patterns.
- Performed data cleaning and transformations using PySpark and SQL.
- Developed Tableau dashboards to identify bottlenecks, peak hours, and patient-level trends.
Retail Sales Analytics Pipeline and Dashboard - Azure, Databricks, SQL, Power BI
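A minimal sketch of the star-schema idea described above, using Python's built-in `sqlite3` in place of Databricks. The table names, column names, and toy rows are all hypothetical illustrations, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one fact table (ER visits) keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, arrival_hour INTEGER);
CREATE TABLE fact_er_visit (
    visit_id INTEGER PRIMARY KEY,
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    wait_minutes REAL,
    length_of_stay_minutes REAL
);
""")

# Made-up toy rows, for illustration only.
conn.executemany("INSERT INTO dim_patient VALUES (?, ?)", [(1, "18-39"), (2, "65+")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 9), (2, 18)])
conn.executemany(
    "INSERT INTO fact_er_visit VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 20.0, 120.0), (2, 2, 1, 40.0, 300.0), (3, 1, 2, 90.0, 240.0)],
)


def avg_wait_by_hour(connection):
    """Join the fact table to the time dimension and average wait time per arrival hour."""
    return connection.execute("""
        SELECT t.arrival_hour, AVG(f.wait_minutes)
        FROM fact_er_visit AS f
        JOIN dim_time AS t ON f.time_key = t.time_key
        GROUP BY t.arrival_hour
        ORDER BY t.arrival_hour
    """).fetchall()


print(avg_wait_by_hour(conn))
```

Keeping measures such as wait time in the fact table and descriptive attributes in the dimensions is what lets a dashboard slice the same metric by hour, patient group, or any other dimension with a single join.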
Link
- Built an end-to-end retail analytics pipeline using Azure Data Factory concepts, Azure Data Lake, and Databricks to process data from multiple sources.
- Implemented medallion architecture with Bronze, Silver, and Gold layers to improve data quality and support scalable data processing.
- Performed data transformations and aggregations using SQL to generate business metrics such as total sales, product performance, and store-level insights.
- Developed an interactive Power BI dashboard to analyze sales trends, top products, and regional performance across stores and customers.
FIFA Players Analytics - R, Excel
Link
- Conducted extensive Exploratory Data Analysis (EDA) on 18,483 FIFA player records, identifying key correlations and patterns in player attributes, market values, and wages.
- Developed a logistic regression model in R to classify players as high or low wage with a 0.93 AUC score, and a regularized regression model to predict players' potential ratings with 98.92% accuracy.
- Performed hypothesis testing to compare overall ratings between European and South American players, revealing statistically significant differences (p < 0.05) and providing insights for regional scouting strategies.
Kidney Failure Prediction - Python, SQL, Excel
Link
- Led a team of 4 members to build a Kidney Failure Prediction model using Logistic Regression and GBT models with 95.6% accuracy.
- Combined 6 datasets covering patient demographics, medical history, lab tests, and diet, with over 143 variables, using SQL to create the train and test datasets for the model.
NYPD Crime Analysis - Excel, Tableau
Link
- Performed data cleaning and Exploratory Data Analysis (EDA) in Excel, finding that 20% of male perpetrators involved in molestation-related crimes are young adults.
- Developed interactive Tableau dashboards connecting various data sources and defining calculations to provide crime insights.
Drug Misuse Database Management System - SQL, R, Excel
Link
- Designed a SQL database to study drug consumption across age groups over time and deployed it on Azure.
- Utilized Excel for data transformation and R for data visualization of drug consumption patterns.
Analysis of Magnetohydrodynamic Waves and Energy Transport in the Solar Corona - IDL
Link
- Investigated the coronal heating problem using magnetohydrodynamics (MHD) to study energy transport and dissipation via waves in the solar atmosphere.
- Analyzed properties of propagating waves in the solar corona using data from the Solar Dynamics Observatory (SDO). Performed spatial and temporal image analysis on multi-wavelength observations using Interactive Data Language (IDL) and SolarSoft.
- Conducted Fourier analysis to study frequency distributions and power spectra at various coronal heights.
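The Fourier step above can be illustrated with a short sketch. The original analysis used IDL and SolarSoft; this version is plain Python with a direct discrete Fourier transform, and the synthetic signal (a 300-second oscillation sampled every 12 seconds) is an assumption chosen for illustration, not data from the project.

```python
import cmath
import math


def power_spectrum(signal, sample_interval):
    """Return (frequency_hz, power) pairs for the non-negative frequencies of a real signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset before transforming
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for frequency bin k (O(n^2) overall, fine for short series).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        spectrum.append((k / (n * sample_interval), abs(coeff) ** 2 / n))
    return spectrum


# Synthetic intensity time series: assumed 300 s period, assumed 12 s sampling.
sample_interval = 12.0
n_samples = 100
signal = [
    math.sin(2 * math.pi * t * sample_interval / 300.0) for t in range(n_samples)
]

spec = power_spectrum(signal, sample_interval)
peak_freq = max(spec, key=lambda pair: pair[1])[0]
print(f"Peak frequency: {peak_freq:.5f} Hz")
```

Applying the same function to the time series at each pixel or coronal height gives one power spectrum per location, which is how frequency distributions can be compared across heights.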