Research and Major Project Topics
Swapneel Mehta, Ph.D.
Cofounder, SimPPL
Postdoc, Boston University & MIT
March 2026
SimPPL: Rebuilding Digital Trust
SimPPL is a global community of 200+ engineers and researchers working to make reliable information accessible for the global majority. We are a U.S. 501(c)(3) nonprofit that builds responsible computing tools and publishes research at top venues.
$500K+
Raised with collaborators
(Google, Mozilla, Ford, Omidyar)
16+
Publications
(AAAI, NeurIPS, ICML, ICWSM)
8
Countries
with active partnerships
Accepted into the Fast Forward Tech Nonprofit Accelerator (previous cohorts include Amnesty International, Allen Institute for AI, ICFJ). Selected for the inaugural UNDP AI Trust and Safety cohort.
Sakhi: Health Literacy on Mobile
We asked: how do you deliver reliable information for critical healthcare needs in multilingual contexts? That research question led to Sakhi, a multilingual platform delivering verified women's health information to mobile phones.
What was built
- Mobile multimodal messaging for health education
- Monitoring dashboards for last-mile care delivery
- Gamified rewards for community health workers
- Multilingual Q&A for reproductive health (1000 Q&A dataset)
Where it went
- RCT with 100 families in Jalgaon, Maharashtra over 2 years
- Expanded to 250 families in Bangladesh for menstrual health
- Presented at Psychology of Technology Conference in DC
- Technical evaluations with Cohere (co-authoring a publication)
The Students Behind Sakhi
Mrunmayi Parkar
Former Program Manager and Research Engineer at SimPPL. Led the Sakhi team. Selected for MIT IDEAS Social Innovation Challenge.
Now: TPM Intern and soon full-time at Google, MS CS at UT Dallas.
Nahush Patil
Former Research Engineer at SimPPL. Part of the Sakhi team. Part of team that won the MIT PKG Center 10K Amazon Prize for Social Good.
Now: Interned at an industrial engineering firm, now full-time there. MS CS at UT Dallas.
Utkarsh Verma
Software Developer, now senior member of the ML Engineering team at SimPPL.
B.E. Computer Engineering from DJ Sanghvi College of Engineering (2021-2025). One of your own.
All three were undergraduates when they built this. A side project became a research project published at top venues, then a product serving 450 families across India and Bangladesh.
InfluenceCheck: Verifying Influencer Claims
From that same survey, we asked: which influencers are shaping what young people believe about health and finance, and are their claims actually true?
What was built
- A system that verifies the claims influencers make in their videos on Instagram
- Focused on health and finance sectors, where misleading claims cause real harm
- Automated claim extraction from video transcripts + fact-checking pipeline
Where it went
- Presented to the head of Jagran New Media, who was also head of the International Fact-Checking Network (IFCN)
- This person worked collaboratively with the team for a year
- Presented at the India AI Summit to Ashwini Vaishnaw (Minister of MeitY)
- The Jagran head left his organization to launch a startup around this product
The Students Behind InfluenceCheck
Dhvani Shah
Built the InfluenceCheck system. Worked directly with the head of the IFCN on influencer claim verification. Now working with an NYU Data Science professor and former hedge fund manager to study prediction markets.
Currently: Still an undergraduate, wrapping up her final year.
Atmik Shetty
ML Engineer at SimPPL. Co-built InfluenceCheck. Specializes in NLP, LLM inferencing, and optimization.
B.E. IT, St. Francis Institute of Technology (2021-2025). Recently graduated.
A product built by two undergraduates convinced the former president of one of the world's most important fact-checking organizations to leave his job and build a startup around it. It does not take preexisting knowledge to do good research. It takes determination and an open mind.
Real Talk: What Good Research Looks Like
The problem I see
- Students publishing at low-quality journals because it seems easier
- To everyone outside your college, more papers at bad venues = less credibility, not more. You are self-selecting into a community that cares not about research but about posturing
- The bar for NeurIPS, ICML, and AAAI workshop papers is genuinely achievable in 4-6 months with a good research problem
What matters more than papers
- Write a really good technical blog post and release a well-documented library. LLMs will use your library without users even knowing, and you can cite that as impact
- This is a day and age of builders. If you are not building, ask yourself why. It is clearly not technology stopping you
- Research is about learning new things about the world, not about writing papers. The point is exploration and genuine curiosity
When I was in your position, I did research into machine learning and nuclear physics. I had neither expertise nor any idea about the value of either. But doing that research taught me I enjoy working with large datasets, optimization, and productionizing code. The research you do does not have to dictate your career goals.
The Job Market Right Now
What the data says
- 40% of jobs globally exposed to AI, 60% in advanced economies (IMF, 2024). Exposure is not displacement, but it is task redesign
- AI increased productivity 14% in customer support (NBER) and sped up writing tasks by 37% (SSRN). Biggest gains went to less experienced workers
- Heavy AI use makes junior developers less capable of supervising AI effectively (Anthropic)
- Value capture goes to integrators and infrastructure, not model builders. Foundation models are commoditizing
What I learned interviewing
I interviewed at OpenAI, Anthropic, and DeepMind for data science and research engineer roles. Here is what I learned:
- Debugging is the new entry-level skill. Programming is assumed, not differentiating. The bar has shifted to debugging code and planning architecture
- LeetCode-style interviews are being replaced by debugging exercises on platforms
- What differentiates you: systems thinking, the ability to plan, and building things that work in production
Skills That Still Matter
| What companies test for | Why it matters | How to build it |
| Debugging | AI generates code, humans fix it. Entry-level is no longer writing code | Debug on platforms, read others' code, contribute to open source |
| Planning & architecture | Knowing what to build before building it. AI cannot decide what matters | Design systems before coding. Write design docs. Lead a project |
| Building products | A shipped product with real users > 10 papers at low-tier venues | Ship something. Deploy it. Get 5 real users. Iterate on feedback |
| Research taste | Knowing which problems are worth solving. Comes from reading good papers | Read 2 papers/week from top venues. Follow researchers. Attend talks |
| Communication | If you cannot explain what you built, it does not exist to anyone else | Write blog posts. Give talks. Document projects with care |
"How much of this do you do?"
How We Do Research at SimPPL
Project pitch (1 page, 4 questions)
- What is the idea? (2 sentences max)
- Why is it important? (2 sentences max)
- What have others done and how is this different? (review 4-6 papers)
- What experiment will highlight this difference?
Methodology from Rajesh Ranganath (NYU), Dan Fu and Jennifer Widom (Stanford)
Collaborative process
- First meeting: write a set of questions together
- Spend time reading and reviewing papers! Refine your research question before you start looking for answers.
- Meet 1-3 times/week, iterate for 12-36 weeks
- First authors handle the hardest subtasks + organizing
- Last author does mentoring, feedback, and guidance, including several 1-2 hour live editing walkthroughs
Tools we use: Overleaf, Zotero, cloud compute. Tools you should learn: Google Scholar, Semantic Scholar, Elicit, Perplexity for literature reviews. AlphaXiv for paper discussions.
Arbiter: Our Research Platform
Arbiter is an open investigative platform for cross-platform discourse analysis. We analyze posts across X, YouTube, Reddit, Bluesky, and TikTok. Our partners include Deutsche Welle Akademie (Kenya), NEST Center (Mongolia), Jagran New Media (India), and New York Public Radio (US).
Collection → Embedding → Retrieval → Clustering → Labeling → AI Agent
Each stage in this pipeline involves open research questions. I'll walk through some of them, using Arbiter as a motivating example, to build intuition for what I believe is worth studying in a major project.
What makes it different
Cross-platform harms tracing (not just one social network). Designed for journalists and researchers. Investigative social intelligence.
Real-world outcomes
Contributed to Meta's takedown of Bangladeshi networks. Twitter/X Site Integrity followed up on accounts we identified. $15K quarterly revenue.
Better Retrieval for Social Media
When a journalist asks "show me posts about election manipulation in the Philippines," how do you find the right posts from millions of candidates?
What exists and why it falls short
- Keyword matching misses relevant posts using different words for the same concept
- Semantic search returns too many vaguely related results
- Social media text is short and noisy, so standard query expansion adds more noise than signal
Research questions
- How do you evaluate retrieval quality on social media data where no gold-standard relevance set exists?
- Can you build better query expansion for non-English languages where training data is sparse?
- How does retrieval precision change across platforms with different post formats?
NLP Information Retrieval Elasticsearch
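The query-expansion gap above can be sketched in a few lines. This is a toy illustration, not Arbiter's retriever: the posts, the synonym table, and the overlap scoring are all invented for the example; real systems would use learned embeddings or an Elasticsearch analyzer with synonym filters.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def expand_query(tokens, synonyms):
    """Naive query expansion: add known synonyms for each query token."""
    expanded = set(tokens)
    for t in tokens:
        expanded.update(synonyms.get(t, []))
    return expanded

def score(query_tokens, doc):
    """Count overlapping tokens between query and document."""
    return len(query_tokens & set(tokenize(doc)))

posts = [
    "vote rigging allegations in the philippines",
    "ballot fraud claims spread on social media",
    "new restaurant opening in manila",
]
synonyms = {"manipulation": ["rigging", "fraud", "tampering"],
            "election": ["vote", "ballot", "poll"]}

query = tokenize("election manipulation philippines")
plain = [score(set(query), p) for p in posts]
expanded = [score(expand_query(query, synonyms), p) for p in posts]
print(plain)     # [1, 0, 0] -- bare keywords miss the second relevant post
print(expanded)  # [3, 2, 0] -- expansion surfaces both relevant posts
```

The second post uses different words for the same concept, which is exactly the failure mode keyword matching hits on social media; the open question is building expansion like this for languages where no curated synonym resources exist.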
Example: What Retrieval Looks Like
A journalist in Kenya types "Ibrahim Traore Burkina Faso" into Arbiter. Two words in, and the system needs to find dozens of related concepts across platforms in multiple languages.
| She typed | What the system also needs to find |
| "Ibrahim Traore" | Actors: "Captain Traoré," "Capitaine Traoré," "IB" |
| "Burkina Faso" | Organizations: MPSR, Alliance of Sahel States, CNSP |
| (nothing typed) | Events: Wagner Group departure, Sahel sovereignty movement |
| (nothing typed) | Phrases: "military junta," "pan-African sovereignty" |
We analyzed 974 YouTube posts and 1,117 Twitter posts about Traoré. YouTube showed templated promotional accounts using identical sentence structures with only the positive claim swapped out, consistent with AI-generated content. Twitter showed more organic political discourse.
Theme Discovery from Posts
Given thousands of social media posts, how do you automatically discover what people are talking about and label those topics in a way that is actually useful?
Current state of the art
- Traditional topic models (LDA) struggle with short text
- LLMs generate labels but repeat themselves: "IndiGo Flight Disruptions" appeared in 28 out of 53 labels
- We solved this using embedding geometry to guarantee 100% unique labels
Research questions
- Can you improve clustering for multilingual data where embeddings are weaker?
- What is the right way to evaluate whether a topic label is "good"?
- Can you do theme discovery in real-time as posts arrive?
Clustering UMAP HDBSCAN LLMs
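The "embedding geometry" idea for unique labels can be sketched as a greedy filter: accept a candidate label only if its embedding is far from every label already accepted. The 3-d vectors and threshold below are made up for illustration; a real pipeline would embed labels with a sentence encoder and may regenerate rejected labels rather than drop them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe_labels(candidates, threshold=0.9):
    """Keep a candidate label only if it is sufficiently far, in
    embedding space, from every label already kept."""
    kept = []
    for label, vec in candidates:
        if all(cosine(vec, v) < threshold for _, v in kept):
            kept.append((label, vec))
    return [label for label, _ in kept]

# Toy 3-d "embeddings"; real systems would use a sentence encoder.
candidates = [
    ("IndiGo Flight Disruptions", [0.90, 0.10, 0.00]),
    ("IndiGo Flight Delays",      [0.88, 0.12, 0.02]),  # near-duplicate
    ("Airport Security Rules",    [0.10, 0.90, 0.10]),
]
print(dedupe_labels(candidates))  # ['IndiGo Flight Disruptions', 'Airport Security Rules']
```

The near-duplicate label is geometrically close to an accepted one and gets filtered, which is the core of how repeated LLM labels like the 28 "IndiGo Flight Disruptions" variants can be guaranteed unique.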
AI Agents for Analysis
Can we build an AI assistant that helps journalists analyze social media data by calling the right tools at the right time? Arbiter's agent uses GPT-4o with 7 tools. We used GEPA (ICLR 2026) to optimize its system prompt for $5.77 total.
What is still hard
- Getting agents to pick the right tool for ambiguous queries
- Evaluating correctness (not just whether it ran without errors)
- Preventing hallucination when agents produce charts people trust
Research questions
- Can you build specialized agents for specific journalism tasks?
- How do you design evaluation benchmarks where policy compliance conflicts with task completion?
- What happens when you red-team an agent with prompt injection?
Agents Tool-Calling Evaluation
Example: AI Agent Investigation
A journalist asks: "Are there accounts coordinating to promote banned trading platforms across YouTube and Twitter?"
The agent chains 5 tools automatically:
- searchPosts retrieves mentions of Exness, Quotex, Pocket Option across all platforms
- getThemeActors identifies which accounts post most within promotional themes
- getActorTimeline checks posting frequency to surface coordinated scheduling
- getTopicStance separates promotional from educational content
- compareAcrossPlatforms checks if the same accounts appear on other platforms
Result: 6,345 posts analyzed. Promotion is concentrated almost entirely on YouTube. Exness (banned by SEBI in India): 102 YouTube posts. Quotex (banned in EU): 99. These accounts disguise promotion as financial education.
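The tool-chaining above reduces to a dispatch loop over a registry of named tools. The sketch below borrows two tool names from this slide (`searchPosts`, `getTopicStance`) but the stub implementations and data are invented; the real Arbiter agent uses an LLM to choose tools and queries live platform data.

```python
# Hypothetical stubs standing in for Arbiter's real tools.
def search_posts(platforms, terms):
    return [{"platform": "youtube", "text": "learn trading with exness"},
            {"platform": "twitter", "text": "market basics explained"}]

def get_topic_stance(posts):
    """Crudely split posts into promotional vs. educational."""
    promo = sum("exness" in p["text"] for p in posts)
    return {"promotional": promo, "educational": len(posts) - promo}

TOOLS = {"searchPosts": search_posts, "getTopicStance": get_topic_stance}

def run_plan(plan):
    """Execute a chain of tool calls, feeding each result forward.
    A step with kwargs is called with them; a step without kwargs
    receives the previous step's output."""
    result = None
    for name, kwargs in plan:
        result = TOOLS[name](**kwargs) if kwargs else TOOLS[name](result)
    return result

plan = [("searchPosts", {"platforms": ["youtube", "twitter"], "terms": ["exness"]}),
        ("getTopicStance", None)]
print(run_plan(plan))  # {'promotional': 1, 'educational': 1}
```

The hard research problems sit outside this loop: deciding *which* plan to run for an ambiguous query, and evaluating whether the chained output is actually correct rather than merely error-free.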
Coordinated Network Detection
How do you find groups of accounts working together to spread misleading information? Our Parrot tool analyzed 70M tweets from 14M accounts. Twitter/X's Site Integrity lead followed up. On Meta, our analysis of 600 public Facebook pages led to a takedown of Bangladeshi harassment networks.
Real impact from student work
- 600 pages, 95M views, 500K posts analyzed on Meta
- 4,500 Telegram channels spreading pro-Russian disinformation identified
- Accepted at Stanford T&S and Underground Economy Conference
Research questions
- Can you detect coordination across platforms, not just within one?
- How do you distinguish organic consensus from manufactured coordination?
- Can temporal GNNs scale to millions of nodes in near-real-time?
Graph Analysis Network Science GNNs
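One crude coordination signal behind tools like Parrot is accounts posting identical text within a short time window. The sketch below, with invented posts and a fixed window, captures only that copy-paste heuristic; real detectors build similarity graphs over millions of accounts and model temporal patterns far more carefully.

```python
from collections import defaultdict
from itertools import combinations

def coordinated_pairs(posts, window=60):
    """Flag account pairs that post identical text within `window`
    seconds of each other -- a toy proxy for copy-paste coordination."""
    by_text = defaultdict(list)
    for account, text, timestamp in posts:
        by_text[text].append((account, timestamp))
    pairs = set()
    for shares in by_text.values():
        for (a1, t1), (a2, t2) in combinations(shares, 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                pairs.add(tuple(sorted((a1, a2))))
    return pairs

posts = [
    ("bot_1", "support the junta now", 0),
    ("bot_2", "support the junta now", 30),     # same text, 30s apart
    ("user_9", "support the junta now", 9000),  # same text, hours later
    ("user_3", "what happened today?", 10),
]
print(coordinated_pairs(posts))  # {('bot_1', 'bot_2')}
```

The later organic repost falls outside the window and is not flagged, which illustrates the research question directly: distinguishing manufactured coordination from genuine consensus is a matter of choosing signals and thresholds that separate these two cases at scale.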
Multilingual Analysis
83% of NLP misinformation research has focused on monolingual, high-resource languages. Social media in India is full of code-mixing and transliteration. If you speak a non-English language, you have a genuine research advantage that most Western labs lack.
Why this matters for Arbiter
- Partners in Kenya, India, Mongolia, Bangladesh need analysis in their languages
- LLMs are overconfident in languages where they perform worst (Nature Sci. Reports, 2026)
- Even small annotated datasets are publishable contributions
Research questions
- Does translate-then-retrieve or retrieve-in-native-language work better for RAG?
- Can you build a code-mixing-aware sentiment analyzer for Hindi-English?
- How does Arbiter's clustering degrade on non-English posts?
Multilingual NLP RAG Code-Mixing
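A first step toward a code-mixing-aware analyzer is just measuring how mixed a post is. The sketch below uses a tiny, hand-picked romanized-Hindi wordlist purely for illustration; real systems do per-token language identification with trained models, and transliteration variants make the problem much harder than this.

```python
# Hypothetical wordlist; a real system would use a token-level language-ID model.
HINDI_HINTS = {"hai", "nahi", "kya", "bahut", "accha", "yeh"}

def code_mix_ratio(text):
    """Rough code-mixing signal: fraction of tokens that look like
    romanized Hindi according to the wordlist above."""
    tokens = text.lower().split()
    hindi = sum(t in HINDI_HINTS for t in tokens)
    return hindi / len(tokens)

print(code_mix_ratio("offer toh bahut accha hai bro"))  # mixed Hindi-English
print(code_mix_ratio("this offer is great bro"))        # 0.0, monolingual
```

Even a signal this crude is useful for slicing an evaluation: it lets you ask how retrieval or clustering quality degrades as the code-mixing ratio of posts increases.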
Algorithmic Auditing Tools
A CHI 2025 paper found 435 AI auditing tools but none that support the full audit lifecycle. India's DPDP Act is being implemented. The EU requires algorithmic audits. The infrastructure to do them does not exist.
Why this is an engineering problem
- Building usable audit tools requires good engineering, not novel ML
- Exactly the kind of work undergraduate engineers are good at
- FAccT 2026 lists "audits and assurance testing" as a focus area
Research questions
- Can you build an open-source tool that automates data schema documentation for auditors?
- How would you test a recommendation algorithm for differential treatment across user profiles?
- What does an "inspectability API" for Arbiter look like?
Engineering Policy Transparency
AI-Generated Content Detection
Deepfake videos grew from ~500K (2023) to ~8M (2025). The bigger problem is "cheap fakes": real images paired with misleading captions. Current detectors focus on pixel-level forgery and miss semantic mismatch entirely.
What is broken
- Detectors trained on one generation model fail on newer models
- False positives on edited authentic content create legal problems
- Text-based AI detectors cannot reliably distinguish human from AI text (Nature Communications, 2025)
What you could build
- Benchmark AI-text detectors on code-mixed content
- Test false positive rates on edited journalistic photos
- Build a C2PA content provenance tracker
Crowdsourced Fact-Checking Systems
Meta, YouTube, and TikTok have launched Community Notes-style systems. X open-sourced its ranking algorithm and data. Do bridging-based algorithms systematically fail on polarizing content?
What you could build
- Measure latency gap between note creation and viral peak
- Compare note quality on polarizing vs. non-polarizing topics
- Test LLM-augmented note-writing
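To make the bridging question concrete, here is a deliberately simplified scoring rule: a note's score is its *minimum* approval rate across viewpoint groups, so one-sided support cannot carry it. This is only the core intuition; X's actual Community Notes ranking uses matrix factorization over the rating matrix, and the groups and ratings below are invented.

```python
def bridging_score(ratings):
    """Score a note by its worst per-group approval rate.
    `ratings` is a list of (viewpoint_group, helpful_flag) pairs."""
    per_group = {}
    for group, helpful in ratings:
        yes, total = per_group.get(group, (0, 0))
        per_group[group] = (yes + helpful, total + 1)
    return min(yes / total for yes, total in per_group.values())

one_sided = [("left", 1), ("left", 1), ("right", 0), ("right", 0)]
bridging  = [("left", 1), ("left", 1), ("right", 1), ("right", 0)]
print(bridging_score(one_sided))  # 0.0 -- only one side found it helpful
print(bridging_score(bridging))   # 0.5 -- both sides show some approval
```

A rule like this makes the failure mode on polarizing content easy to see: if a topic is divisive enough that no note gets cross-group approval, accurate notes score zero and never surface, which is exactly what the research question asks you to measure.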
Simulating Social Systems with AI
MIT Media Lab's AgentTorch (AAMAS 2025) built a digital twin of NYC with 8.4M agents. What's interesting about this?
What you could build
- Simulate misinfo spread through WhatsApp groups during an election
- Validate simulation predictions against fact-checker databases
- Test proposed moderation policies in simulation before deployment
SimPPL connection
- Social network simulation at ICML '23
- Marketplace design at IC2S2 '24
- Transparency regulation at ACM DGO '24
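A minimal version of the misinformation-spread simulation above fits in a few lines: agents on a contact graph probabilistically reshare to their neighbors each step. The graph, sharing probability, and seed set are invented; AgentTorch-scale digital twins add demographics, behavior models, and millions of agents on top of this skeleton.

```python
import random

def simulate_spread(contacts, seeds, p_share=0.3, steps=5, rng=None):
    """Toy agent-based cascade: each step, every exposed agent shares
    with each unexposed contact with probability p_share."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    exposed = set(seeds)
    for _ in range(steps):
        new = set()
        for agent in exposed:
            for friend in contacts.get(agent, []):
                if friend not in exposed and rng.random() < p_share:
                    new.add(friend)
        exposed |= new
    return exposed

# A tiny hypothetical WhatsApp-style contact graph.
contacts = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
reached = simulate_spread(contacts, seeds={0})
print(len(reached))
```

The interesting work is what you layer on: compare `p_share` values as stand-ins for moderation interventions, and validate whether the simulated cascades match what fact-checker databases record for real events.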
Which Areas Fit Your Skills?
| Research Area | ML Depth | Data Access | Publish Where |
| Retrieval | Medium | High (Arbiter data) | SIGIR, ACL, EMNLP |
| Theme Discovery | Medium | High (Arbiter data) | ICWSM, ACL, EMNLP |
| AI Agents | Low-Medium | High (build your own) | NeurIPS, ICML, FAccT |
| Network Detection | Medium | Medium (API limits) | WebSci, ICWSM, WWW |
| Multilingual Analysis | Medium | Low (must annotate) | ACL, EACL SRW, EMNLP |
| Algorithmic Auditing | Low | High (public systems) | CHI, FAccT |
If you can code but lack deep ML expertise, AI agents, algorithmic auditing, and multilingual analysis are the most accessible starting points. The EACL 2026 Student Research Workshop explicitly welcomes undergraduate submissions.
What Makes a Good Researcher - I
1. Curiosity
If you are not curious, do not do research. You will waste your time and your collaborators' time. Do engineering instead. You will both benefit more and be happier.
2. Time
Most people seem to think research is a side pursuit. It is not. Research is a full-time job, and good research gets you both visibility and hired at top companies.
What Makes a Good Researcher - II
3. Determination
You will keep hitting wall after wall. Nothing works, compute is too expensive, ideas are a dime a dozen, Claude Code can solve a problem faster than you. If you do research just for the heck of it, you will quit after the third wall. Or you will compromise and write a meaningless paper that never gets noticed anywhere. That is a waste of your time and everyone else's.
4. Creativity
There is seldom a linear path to a solution in good research. Creativity generally grows out of having time on your hands and curiosity about learning new ideas. Without those, it is really hard to be creative.
Appendix
Additional slides for reference
All Products Built by Students
Parrot
Coordinated network detection. 10M+ accounts. Wikimedia award. Times/Sunday Times funding.
Arbiter
Cross-platform social listening. 1B+ posts. Semantic search, alerts, AI agent.
Sakhi
Multilingual health literacy. 450 families in India and Bangladesh.
Audience Analytics
GenAI for newsroom analytics. Pilot at NYPR, expanding to LION Network.
Audio Search
Multimodal search for podcasts. 50+ languages. Built with UN agencies.
InfluenceCheck
Verify influencer claims in health/finance videos. Presented at India AI Summit.
Impact and Partnerships
$500K+
Raised with collaborators
Partners: Deutsche Welle, Jagran New Media, NEST Center, Spreeha Foundation, Migrasia, VTDigger, New York Public Radio, The Times, United Nations, UN Global Pulse, Tattle, TechGlobal Institute, Yale News
Presented at: UNESCO, Stanford T&S, MIT Media Lab, Columbia, NYU, Swiss Embassy, Embassy of Finland, World Economic Forum
NextGenAI Fellowship Program
6-12 month programs to train 200+ undergraduate students from the global majority to build and launch responsible computing tools.
- Unicode: Programming Community
- Shalizi Stats Reading Group: Advanced Statistics
- Unicode ML Summer Course: Machine Learning
- NYU AI School: AI/ML education for non-STEM majors
- NYU AI, Misinformation, and Policy Seminar
Outcomes: 8+ top-tier publications, partnerships in 4 countries, USD 132,000 in competitive global awards.
NextGenAI Fellowship
What we look for
- Students who want to build real products that real people use
- Comfort with Python or JavaScript (we teach the rest)
- Curiosity about how information spreads online
- Willingness to read papers, run experiments, and iterate
What you get
- Co-authorship on publications at top venues
- Your code deployed and used by journalists in 8 countries
- Mentorship from researchers at MIT, NYU, Oxford, and BU
- Access to Overleaf, Zotero, compute, and real datasets
No cohort is currently running. Past program details at nextgenai.simppl.org
Four Research Pillars at SimPPL
Misleading Claims
Twitter, Meta, YouTube, Telegram, Truth Social, Bluesky, Wikipedia
User Behavior
Decentralized platforms, influencer strategies, political transcendence
Social Media Policy
Transparency regulation, platform interventions, shared language
Safety by Design
Recommendation algorithms, marketplace design, causal effects