I am a Ph.D. candidate at NYU Data Science and founder at SimPPL, working with the Center for Social Media and Politics and collaborating with researchers at Oxford University and the Integrity Institute. My research deals with limiting disinformation on social networks using tools from simulation-based inference and causality. I use probabilistic programs to simulate user behavior and information propagation across online platforms. I also lead Fake News Research at the AI4ABM Foundation and serve on the Board of Studies at D.J. Sanghvi College of Engineering. I’ve previously interned at Twitter Civic Integrity and Adobe Research, and worked on machine learning for particle physics at CERN.
I founded SimPPL (read: ‘simulate people’) to commercialize my research as a simulation-as-a-service venture that advances network science. We create models of information diffusion from public digital trace data to aid organizational decision-making with data-driven insights. In an ongoing partnership, we are supporting journalists in Europe with their investigation of online propaganda. SimPPL has received support from the NYC Media Lab and the AI4ABM Foundation and was invited to be part of the NYU Tech Venture Workshop.
I also founded and lead a collaborative research community for undergrads and grad students, Unicode Research, with whom I recently launched a Google-backed independent ML Summer Course over 10 weeks. This team grew out of DJ Unicode, the parent organization I co-founded in 2017 to teach open-source development to undergrads.
- I’ve accepted a position on the Board of Studies at my alma mater, D.J. Sanghvi College of Engineering.
- I’ve accepted a position to lead Fake News Research at the AI4ABM Foundation.
- I’m giving an invited talk on studying “disinformation landscapes” on social networks at Truth and Trust Online, in Boston.
- I received an independent Google Cloud Research Grant to support SimPPL.
- JournalismAI team Parrot comprising Ippen Digital (DE) and The Sunday Times (UK) is partnering with SimPPL for studying the spread of potentially manipulated narratives on Twitter from two well-known Russian state-backed media outlets.
- I’ve been invited to attend the IPI World Congress with globally renowned journalists.
- I presented my research on ‘Tools to Limit Misinformation on Social Networks’ at the MIT Media Lab. Recording available here.
- I gave an invited talk at Misinformation Village @ DEFCON 2022. (Coolest conference I’ve been to, hands down!)
- SimPPL has been accepted to be part of the NYU Tech Venture Workshop, 2022!
- I will present our work on studying the causal effects of Twitter’s interventions on Donald Trump’s tweets at the first Stanford Trust and Safety Research Conference in September 2022
- My work with Oxford and CSMAP on studying the effects of influence operations on content ranking and recommendations was accepted for an Oral presentation (top 3/26 papers) at the AI for Agent-based Models Workshop at ICML 2022 – see y’all in Baltimore!
- I’m interning at Twitter’s Civic Integrity team with Curren Pangler and Mridu Atray, working on misinformation detection in tweets for Summer 2022
- After a wonderful 4 months working on SimPPL we showcased our product at Demo Day and launched a pilot with the VTDigger local news org. based in Vermont
- Our workshop “AI for Everyone: Learnings from the Local News Challenge” was accepted at Computation + Journalism 2022, at Columbia University! I will present SimPPL at the session.
- After remotely collaborating for nearly a year, I am spending May 2022 in the UK with Oxford’s Torr Vision Group visiting Prof. Philip Torr and Prof. Atilim Gunes Baydin to work on social networks, disinformation, and recommendation systems. One year into the original offer, we finally beat the pandemic to make it an in-person collaboration :)
- I’ve successfully passed my candidacy-equivalent exam!
- My team, SimPPL, is part of the NYC Media Lab’s AI and Local News Challenge!
- Early work on SimPPL: Simulating Social Network Interventions with Probabilistic Programs was invited for presentation to Twitter’s Cortex Org. and Facebook’s Probability Org.
- I received an offer to be a Data Scientist Intern with Microsoft’s Azure ML Team
- My (preliminary) work on COVID-modeling with probabilistic programs with an MS student, Noah Kasmanoff, has been accepted as a poster at PROBPROG 2021 at MIT
- I received a generous grant from Google Research India to teach an (independent) 10-week Unicode ML Summer Course in Summer 2021 to students from Tier II and Tier III universities in India. Much obliged to my TAs from Unicode Research for their hard work and commitment!
- We had a fantastic Demo Day showcasing student projects at the Unicode ML Summer Course!
- I’m co-lead organizer (with fellow CDS Ph.D. student Angelica Chen) for the 2022 NYU AI School, expanding to over 300 students this year!
- I’m teaching a weekend class (Jan - May 2021) at Unicode Research on Advanced Statistics based on notes from the amazing Cosma Shalizi (videos on YouTube)
- I delivered a talk on probabilistic programming to the Flatiron Computational Biology Group
- I’m co-organizing the NYU AI School 2021
- I’m attending the Nordic Probabilistic AI School, 2021
- I’ve been accepted to the ML Summer School, Taipei, 2021
- My internship research “Open Domain Trending Hashtag Recommendation for Videos” with Adobe Research was accepted at the International Symposium for Multimedia, 2021
2019 - 2020
- I’m a Data Science Research Intern at Adobe Research from May - Dec 2020 working on Trending Hashtag Recommendation for long-form Videos with Dr. Somdeb Sarkhel and Dr. Vishy Swaminathan
- I delivered a talk on Graph Neural Networks and another on Interpretable ML to the BFS Reading Group at the NYU Courant Institute; presenting both at Unicode Research as well
- I was a panelist at the NYU AI School 2020 on their ‘Careers Panel’
- I participated in my first podcast discussing CERN, higher-education, and more with some fantastic students from my alma mater
- I’m grateful to have received the IRIS-HEP Fellowship, 2019
- I launched Unicode Research, a collaborative research group that expands DJ Unicode (a student club I co-founded in 2017 to train student software engineers à la Google Summer of Code) towards pursuing academic research
- I’m attending the Deep Learning and Reinforcement Learning Summer School in Canada where I received a full travel scholarship, 2019
- I’ve been accepted to the DeepMind/Transylvanian ML Summer School with a travel scholarship, 2019
- I’ve been accepted to the Berlin Mathematical Summer School, 2019
- I’ve been invited to present our work on DeepJet at the Nvidia GPU Tech Conference, 2019
- I’ve been accepted to the UC Berkeley Deep Learning for Science School with a travel scholarship, 2019
2018: CERN Technical Student
- I won the Google Cloud Award at the Deep Learning Indaba for our work on ML x Particle Physics using DeepJet
- I was selected as one of the youngest attendees and presented DeepJet at the ML Summer School, Madrid
- I placed 2nd in the CodaLab Challenge at the ML for High-energy Physics School organised by Yandex at Oxford University
- I was one of the youngest finalists for the $150,000 Reliance Dhirubhai Scholarship for pursuing an MBA degree at Stanford University (declined)
- I presented our work on DeepJet at the CERN IML Workshop and other Working Group Meetings at CERN
SimPPL: Social Network Simulation
- Talk at Twitter Cortex, Dec 2021. (Twitter Employee-only Access)
- Talk at Facebook Research: Probability Org., Nov 2021.
- Slides Available on Request
I am building SimPPL, a social network simulator that demonstrates how to combine heterogeneous datasets in a principled manner so as to create an expressive model of online social networks that is conditioned on real-world data. It is part of my ongoing research on misinformation control and highlights the applications of such a tool towards understanding the diffusion of information and the evolution of beliefs on platforms like Facebook and Twitter. There is a rich set of downstream applications of such a simulator, including interventions to curb misinformation spread and the causal modeling of online user behavior.
I am collaborating with the Torr Vision Group at Oxford on the applications of such a simulation tool on estimating the effects of coordinated inauthentic behavior on content recommendations in social networks!
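As a rough illustration of the kind of question such a simulator is meant to answer, the sketch below runs an independent-cascade diffusion on a small random follower graph and compares average reach with and without an intervention that lowers the probability of sharing. This is not SimPPL's code: the graph generator, the `share_prob` values, and all function names are assumptions made for illustration, and it uses only the Python standard library rather than a probabilistic programming framework or real platform data.

```python
import random


def random_graph(n_users, n_followers, rng):
    """Hypothetical follower graph: each user is followed by `n_followers` random others."""
    graph = {}
    for user in range(n_users):
        others = [v for v in range(n_users) if v != user]
        graph[user] = rng.sample(others, n_followers)
    return graph


def cascade_size(graph, seeds, share_prob, rng):
    """Independent cascade: each newly exposed user gets one chance to reach each follower."""
    exposed = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for follower in graph[user]:
                if follower not in exposed and rng.random() < share_prob:
                    exposed.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return len(exposed)


def mean_reach(graph, share_prob, n_runs=200, seed=0):
    """Average number of exposed users over repeated cascades started from user 0."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        total += cascade_size(graph, [0], share_prob, rng)
    return total / n_runs


graph = random_graph(n_users=200, n_followers=5, rng=random.Random(42))
baseline = mean_reach(graph, share_prob=0.30)
labelled = mean_reach(graph, share_prob=0.15)  # assumed effect of a warning label
print(f"mean reach without intervention: {baseline:.1f}")
print(f"mean reach with intervention:    {labelled:.1f}")
```

A simulator conditioned on real data would replace the random graph and the fixed share probabilities with quantities inferred from observed traces; the comparison of outcomes under different interventions stays the same in spirit.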
Here is a set of tutorials I delivered at my weekly reading group sessions with some students I mentor at a research group I founded in 2019 called Unicode Research. This is a research effort as part of the larger organization I co-founded in 2017, called DJ Unicode.
- Probabilistic Programming - I: based on the Probabilistic AI School
- Probabilistic Programming - II: Social Network Simulation
- My YouTube ProbProg Playlist curating different talks/lectures/presentations/code walkthroughs
I hold a Bachelor’s degree in Computer Engineering and have worked at Adobe Research on trending hashtag recommendation (accepted at ISM 2021) and at the European Organisation for Nuclear Research (CERN) on particle physics. I’ve worked on graph neural networks, machine learning for high-energy physics, recommender systems, and machine learning in cybersecurity, and I have a working knowledge of language models and deep convolutional neural networks.
Causal Effects of Interventions on Misinformation
I am interested in examining the impact of interventions taken by social networks to limit the spread of misinformation. In recent work with Jim Bisbee at NYU CSMAP, I extend the analysis of Sanderson et al., 2021 to analyse the causal effects of warning labels and tweet removal on Twitter, Facebook, Instagram, and Reddit. The results will be presented at the Stanford Internet Observatory’s first Trust and Safety Research Conference, 2022.
Probabilistic Programming and COVID-19 Models
- Poster @ PROBPROG 2022 (S. Mehta, N. Kasmanoff)
- Early Draft on arXiv - Compartmental Models for COVID-19 and Control via Policy Interventions
This project offers an exposition of COVID-19 modeling techniques based on the ideas and problem setup highlighted in Wood et al. (2020). We define a generative model corresponding to our intuition about epidemiological modeling using the probabilistic programming framework Pyro and apply probabilistic inference to draw insights into controlling the COVID-19 pandemic through interventions. In particular, we estimate the confidence intervals for the outbreak parameters to ensure that a predetermined goal is achieved. We are not epidemiologists; the sole aim of this study is to serve as a guide to generative modeling, not to draw inferences about the real-world impact of policy-making for COVID-19.
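The idea of treating an intervention as an unknown parameter and asking which values achieve a goal can be sketched without Pyro. The example below is a simplified stand-in, not the model from the draft: it uses a deterministic discrete-time SIR model, a uniform prior over a hypothetical `reduction` parameter (the fraction by which an intervention cuts transmission), and plain rejection sampling in place of Pyro's inference machinery. The values of `beta`, `gamma`, and the 10% goal are assumptions chosen only for illustration.

```python
import random


def peak_infected(beta, gamma, days=300, i0=0.001):
    """Discrete-time SIR on population fractions; returns the peak infected fraction."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        peak = max(peak, i)
    return peak


def accepted_reductions(goal, n_samples=2000, beta=0.3, gamma=0.1, seed=0):
    """Rejection sampling: keep prior draws whose simulated peak meets the goal."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        reduction = rng.random()  # prior: Uniform(0, 1), an assumption
        if peak_infected(beta * (1.0 - reduction), gamma) <= goal:
            kept.append(reduction)
    return sorted(kept)


kept = accepted_reductions(goal=0.10)
low = kept[int(0.05 * (len(kept) - 1))]
high = kept[int(0.95 * (len(kept) - 1))]
print(f"accepted {len(kept)} of 2000 prior samples")
print(f"central 90% range of accepted reductions: [{low:.2f}, {high:.2f}]")
```

In a probabilistic programming framework, the condition on the peak would be expressed as an observation inside the model and the sampler would be replaced by a more efficient inference algorithm; the rejection loop here only makes the logic explicit.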
Recommender Systems, Graph Neural Networks
- Paper: Open-domain Trending Hashtag Recommendation for Videos, S. Mehta et al., IEEE International Symposium on Multimedia (2021)
- Patent: Under review, Adobe Research
Recommendations determine the type, ranking, and placement of most content appearing on our screens, from social networks to e-commerce sites and advertisements. A laser focus on personalization has led to a plethora of issues, from bias and lack of interpretability to filter bubbles and echo chambers. My work at Adobe dealt with a zero-shot prediction problem: building a production-ready system based on a graph attention network and a novel hashtag matching algorithm that, in combination, effectively matched trending hashtags with relevant videos for improving content discovery via all Adobe products. Part of our motivation was to develop a tool to address the content discovery problem exacerbated by poorly designed recommendations. Furthermore, I am using open-source recommendation systems in SimPPL: A Social Network Simulator with Probabilistic Programs in order to simulate content spread and shilling attacks (bad actors using fake reviews to boost virality) and stimulate research on detection and control.
ML x Particle Physics, Graph Neural Nets
- Zenodo: DeepJetCore, Kieseler et al., 2020
- OpenReview Submission - DeepJet: A Machine Learning Environment for High-energy Physics
I’ve also applied machine learning to problems in the natural sciences. I used to work on graph-based approaches to particle track reconstruction (similar to the TrackML Challenge on Kaggle): representing 3D point cloud data as a (lower-dimensional) graph and then training a graph neural network on it, possibly conditioned on additional physical information (metadata). Problems in high-energy physics, and in science in general, prove to be a rich testbed for statistical machine learning and Bayesian inference. It is exciting to see a growing focus on making this area more practical, especially as optimization toolkits and features are released within popular frameworks.
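The point-cloud-to-graph step can be illustrated with a small sketch. A k-nearest-neighbour rule is one common way to build such a graph and is an assumption here, not necessarily the construction used in the work described above; the random points stand in for detector hits, and `knn_graph` is a hypothetical helper name. The resulting edge list is the kind of input a graph neural network library would consume.

```python
import math
import random


def knn_graph(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges


rng = random.Random(0)
# Stand-in for 3D detector hits: 50 random points in a cube.
hits = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
edges = knn_graph(hits, k=3)
print(len(hits), "nodes,", len(edges), "edges")
```

This brute-force version compares every pair of points, which is fine for a toy example; real detector data would call for a spatial index or domain-specific edge selection.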
Natural Language Processing, 3D Modeling, Cybersecurity
As part of my Bachelor’s thesis, my teammates and I designed a framework to prototype chatbots with context-based question-answering models based on Jack the Reader (ACL, 2018).
I have also built an automated assessment tool for Blender (3D Modeling) Assignments, and a dynamic, automated workflow for an award-winning cybersecurity tool, IllusionBlack.
Knowledge Transfer: DJ Unicode
- Co-founder, 2017 - present, DJ Unicode
I am passionate about knowledge transfer and actively work with a student-run organisation that I co-founded. Unicode was born of the need for skill development at the grassroots level, in addition to the need for a rapport between college freshmen, sophomores, and juniors at universities that don’t offer such opportunities by means of the course structure. Our aim is to extend the ‘summer-of-code’ workflow to the rest of the year, helping our students to build a strong foundational understanding of software development. I’m leading the expansion of our mentorship into teaching math and statistics for machine learning through comprehensive reading groups on standard texts in the subject.
Unicode started in 2017 with 15-20 students separated into 5 teams based on their projects. Today, we are a thriving community of 200+ members, with teams winning hackathons, students receiving international internship offers, multiple selections for Google Summer of Code each year, and alumni at Ivy League universities and FAANG companies in the USA!
Founder, 2020 - Present, Unicode Research Group
- Advanced Statistics Reading Group, 2021: Shalizi Stats
- Google Research funded 10-week Unicode Machine Learning Summer Course, 2021: website
I founded a research arm within Unicode, focused on doing collaborative research in statistics and machine learning, with particular emphasis on AI for social good. This includes extensions of projects by students in our ML Summer Course, and ideas by Unicode students and collaborators. I was joined by Dr. Akash Srivastava from the MIT-IBM AI Lab to help teach the students about deep generative models in a palatable fashion, introducing them to probabilistic machine learning.
Ongoing research projects at Unicode Research include estimating the causal effect of mentorship on student career outcomes, social network analysis using probabilistic machine learning, and other topics in deep generative modeling.
Look up my tech articles published in the Open Source for You (OSFY) Magazine
I (like to think that I) am an artist.
- Apart from cooking, and biking, I spend time:
I enjoy participating in hackathons where you are likely to find me scrounging food around midnight. I’m partial to a steaming cup of sweet, milky tea (also termed ‘cutting chai’ by the Indian streetside tea stalls).
- I like to run the occasional marathon.