Research and Major Project Topics
Swapneel Mehta, Ph.D.
Cofounder, SimPPL
Postdoc, Boston University & MIT
March 2026
SimPPL: Rebuilding Digital Trust
SimPPL is a global community of 200+ engineers and researchers working to make reliable information accessible for the global majority. We are a U.S. 501(c)(3) nonprofit that builds responsible computing tools and publishes research at top venues.
$500K+
Raised with collaborators
(Google, Mozilla, Ford, Omidyar)
16+
Publications
(AAAI, NeurIPS, ICML, ICWSM)
8
Countries
with active partnerships
Accepted into the Fast Forward Tech Nonprofit Accelerator (previous cohorts include Amnesty International, Allen Institute for AI, ICFJ). Selected for the inaugural UNDP AI Trust and Safety cohort.
Sakhi: Health Literacy on Mobile
We asked: how do you deliver reliable information for critical healthcare needs in multilingual contexts? That research question led to Sakhi, a multilingual platform delivering verified women's health information to mobile phones.
What was built
- Mobile multimodal messaging for health education
- Monitoring dashboards for last-mile care delivery
- Gamified rewards for community health workers
- Multilingual Q&A for reproductive health (1000 Q&A dataset)
Where it went
- RCT with 100 families in Jalgaon, Maharashtra over 2 years
- Expanded to 250 families in Bangladesh for menstrual health
- Presented at Psychology of Technology Conference in DC
- Technical evaluations with Cohere (co-authoring a publication)
The Students Behind Sakhi
Mrunmayi Parkar
Former Program Manager and Research Engineer at SimPPL. Led the Sakhi team. Selected for MIT IDEAS Social Innovation Challenge.
Now: TPM Intern and soon full-time at Google, MS CS at UT Dallas.
Nahush Patil
Former Research Engineer at SimPPL. Part of the Sakhi team. Part of team that won the MIT PKG Center 10K Amazon Prize for Social Good.
Now: Interned at an industrial engineering firm, now full-time there. MS CS at UT Dallas.
Utkarsh Verma
Software Developer, now senior member of the ML Engineering team at SimPPL.
B.E. Computer Engineering from DJ Sanghvi College of Engineering (2021-2025). One of your own.
All three were undergraduates when they built this. A side project became a research project published at top venues, then a product serving 450 families across India and Bangladesh.
InfluenceCheck: Verifying Influencer Claims
From that same survey, we asked: which influencers are shaping what young people believe about health and finance, and are their claims actually true?
What was built
- A system that verifies the claims influencers make in their videos on Instagram
- Focused on health and finance sectors, where misleading claims cause real harm
- Automated claim extraction from video transcripts + fact-checking pipeline
Where it went
- Presented to the head of Jagran New Media, who was also head of the International Fact-Checking Network (IFCN)
- This person worked collaboratively with the team for a year
- Presented at the India AI Summit to Ashwini Vaishnaw (Minister of MeitY)
- The Jagran head left his organization to launch a startup around this product
The Students Behind InfluenceCheck
Dhvani Shah
Built the InfluenceCheck system. Worked directly with the head of the IFCN on influencer claim verification. Now working with an NYU Data Science professor and former hedge fund manager to study prediction markets.
Currently: Still an undergraduate, wrapping up her final year.
Atmik Shetty
ML Engineer at SimPPL. Co-built InfluenceCheck. Specializes in NLP, LLM inferencing, and optimization.
B.E. IT, St. Francis Institute of Technology (2021-2025). Recently graduated.
A product built by two undergraduates convinced the former president of one of the world's most important fact-checking organizations to leave his job and build a startup around it. It does not take preexisting knowledge to do good research. It takes determination and an open mind.
Real Talk: What Good Research Looks Like
The problem I see
- Students publishing at low-quality journals because it seems easier
- To everyone outside your college, more papers at bad venues = less credibility, not more. You are self-selecting into a community that cares not about research but about posturing
- The bar for NeurIPS, ICML, and AAAI workshop papers is genuinely achievable in 4-6 months with a good research problem
What matters more than papers
- Write a really good technical blog post and release a well-documented library. LLMs will use your library without users even knowing, and you can cite that as impact
- This is a day and age of builders. If you are not building, ask yourself why. It is clearly not technology stopping you
- Research is about learning new things about the world, not about writing papers. The point is exploration and genuine curiosity
When I was in your position, I did research into machine learning and nuclear physics. I had neither expertise nor any idea about the value of either. But doing that research taught me I enjoy working with large datasets, optimization, and productionizing code. The research you do does not have to dictate your career goals.
The Job Market Right Now
What the data says
- 40% of jobs globally exposed to AI, 60% in advanced economies (IMF, 2024). Exposure is not displacement, but it is task redesign
- AI increased productivity 14% in customer support (NBER) and sped up writing tasks by 37% (SSRN). Biggest gains went to less experienced workers
- Heavy AI use makes junior developers less capable of supervising AI effectively (Anthropic)
- Value capture goes to integrators and infrastructure, not model builders. Foundation models are commoditizing
What I learned interviewing
I interviewed at OpenAI, Anthropic, and DeepMind for data science and research engineer roles. Here is what I learned:
- Debugging is the new entry-level skill. Programming is assumed, not differentiating. The bar has shifted to debugging code and planning architecture
- LeetCode-style interviews are being replaced by debugging exercises on platforms
- What differentiates you: systems thinking, the ability to plan, and building things that work in production
Skills That Still Matter
| What companies test for | Why it matters | How to build it |
| Debugging | AI generates code, humans fix it. Entry-level is no longer writing code | Debug on platforms, read others' code, contribute to open source |
| Planning & architecture | Knowing what to build before building it. AI cannot decide what matters | Design systems before coding. Write design docs. Lead a project |
| Building products | A shipped product with real users > 10 papers at low-tier venues | Ship something. Deploy it. Get 5 real users. Iterate on feedback |
| Research taste | Knowing which problems are worth solving. Comes from reading good papers | Read 2 papers/week from top venues. Follow researchers. Attend talks |
| Communication | If you cannot explain what you built, it does not exist to anyone else | Write blog posts. Give talks. Document projects with care |
"How much of this do you do?"
How We Do Research at SimPPL
Project pitch (1 page, 4 questions)
- What is the idea? (2 sentences max)
- Why is it important? (2 sentences max)
- What have others done and how is this different? (review 4-6 papers)
- What experiment will highlight this difference?
Methodology from Rajesh Ranganath (NYU), Dan Fu and Jennifer Widom (Stanford)
Collaborative process
- First meeting: write a set of questions together
- Spend time reading and reviewing papers! Refine your research question before you start looking for answers.
- Meet 1-3 times/week, iterate for 12-36 weeks
- First authors handle the hardest subtasks + organizing
- Last author does mentoring, feedback, and guidance, including several 1-2 hour live editing walkthroughs
Tools we use: Overleaf, Zotero, cloud compute. Tools you should learn: Google Scholar, Semantic Scholar, Elicit, Perplexity for literature reviews. AlphaXiv for paper discussions.
Arbiter: Our Research Platform
Arbiter is an open investigative platform for cross-platform discourse analysis. We analyze posts across X, YouTube, Reddit, Bluesky, and TikTok. Our partners include Deutsche Welle Akademie (Kenya), NEST Center (Mongolia), Jagran New Media (India), and New York Public Radio (US).
Collection → Embedding → Retrieval → Clustering → Labeling → AI Agent
Each stage in this pipeline involves open research questions. I'll walk through some of them, using Arbiter as a motivating example, to build intuition for what I believe is worth studying in a major project.
What makes it different
Cross-platform harms tracing (not just one social network). Designed for journalists and researchers. Investigative social intelligence.
Real-world outcomes
Contributed to Meta's takedown of Bangladeshi networks. Twitter/X Site Integrity followed up on accounts we identified. $15K quarterly revenue.
Better Retrieval for Social Media
When a journalist asks "show me posts about election manipulation in the Philippines," how do you find the right posts from millions of candidates?
What exists and why it falls short
- Keyword matching misses relevant posts using different words for the same concept
- Semantic search returns too many vaguely related results
- Social media text is short and noisy, so standard query expansion adds more noise than signal
Research questions
- How do you evaluate retrieval quality on social media data where no gold-standard relevance set exists?
- Can you build better query expansion for non-English languages where training data is sparse?
- How does retrieval precision change across platforms with different post formats?
NLP Information Retrieval Elasticsearch
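The query-expansion gap above can be sketched in a few lines. This is a toy illustration, not Arbiter's retriever: the posts, the synonym table, and the overlap scoring are all invented for the example; real systems would use learned embeddings or an Elasticsearch analyzer with synonym filters.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def expand_query(tokens, synonyms):
    """Naive query expansion: add known synonyms for each query token."""
    expanded = set(tokens)
    for t in tokens:
        expanded.update(synonyms.get(t, []))
    return expanded

def score(query_tokens, doc):
    """Count overlapping tokens between query and document."""
    return len(query_tokens & set(tokenize(doc)))

posts = [
    "vote rigging allegations in the philippines",
    "ballot fraud claims spread on social media",
    "new restaurant opening in manila",
]
synonyms = {"manipulation": ["rigging", "fraud", "tampering"],
            "election": ["vote", "ballot", "poll"]}

query = tokenize("election manipulation philippines")
plain = [score(set(query), p) for p in posts]
expanded = [score(expand_query(query, synonyms), p) for p in posts]
print(plain)     # [1, 0, 0] -- bare keywords miss the second relevant post
print(expanded)  # [3, 2, 0] -- expansion surfaces both relevant posts
```

The second post uses different words for the same concept, which is exactly the failure mode keyword matching hits on social media; the open question is building expansion like this for languages where no curated synonym resources exist.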
Example: What Retrieval Looks Like
A journalist in Kenya types "Ibrahim Traore Burkina Faso" into Arbiter. Two words in, and the system needs to find dozens of related concepts across platforms in multiple languages.
| She typed | What the system also needs to find |
| "Ibrahim Traore" | Actors: "Captain Traoré," "Capitaine Traoré," "IB" |
| "Burkina Faso" | Organizations: MPSR, Alliance of Sahel States, CNSP |
| (nothing typed) | Events: Wagner Group departure, Sahel sovereignty movement |
| (nothing typed) | Phrases: "military junta," "pan-African sovereignty" |
We analyzed 974 YouTube posts and 1,117 Twitter posts about Traoré. YouTube showed templated promotional accounts using identical sentence structures with only the positive claim swapped out, consistent with AI-generated content. Twitter showed more organic political discourse.
Theme Discovery from Posts
Given thousands of social media posts, how do you automatically discover what people are talking about and label those topics in a way that is actually useful?
Current state of the art
- Traditional topic models (LDA) struggle with short text
- LLMs generate labels but repeat themselves: "IndiGo Flight Disruptions" appeared in 28 out of 53 labels
- We solved this using embedding geometry to guarantee 100% unique labels
Research questions
- Can you improve clustering for multilingual data where embeddings are weaker?
- What is the right way to evaluate whether a topic label is "good"?
- Can you do theme discovery in real-time as posts arrive?
Clustering UMAP HDBSCAN LLMs
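The "embedding geometry" idea for unique labels can be sketched as a greedy filter: accept a candidate label only if its embedding is far from every label already accepted. The 3-d vectors and threshold below are made up for illustration; a real pipeline would embed labels with a sentence encoder and may regenerate rejected labels rather than drop them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe_labels(candidates, threshold=0.9):
    """Keep a candidate label only if it is sufficiently far, in
    embedding space, from every label already kept."""
    kept = []
    for label, vec in candidates:
        if all(cosine(vec, v) < threshold for _, v in kept):
            kept.append((label, vec))
    return [label for label, _ in kept]

# Toy 3-d "embeddings"; real systems would use a sentence encoder.
candidates = [
    ("IndiGo Flight Disruptions", [0.90, 0.10, 0.00]),
    ("IndiGo Flight Delays",      [0.88, 0.12, 0.02]),  # near-duplicate
    ("Airport Security Rules",    [0.10, 0.90, 0.10]),
]
print(dedupe_labels(candidates))  # ['IndiGo Flight Disruptions', 'Airport Security Rules']
```

The near-duplicate label is geometrically close to an accepted one and gets filtered, which is the core of how repeated LLM labels like the 28 "IndiGo Flight Disruptions" variants can be guaranteed unique.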
AI Agents for Analysis
Can we build an AI assistant that helps journalists analyze social media data by calling the right tools at the right time? Arbiter's agent uses GPT-4o with 7 tools. We used GEPA (ICLR 2026) to optimize its system prompt for $5.77 total.
What is still hard
- Getting agents to pick the right tool for ambiguous queries
- Evaluating correctness (not just whether it ran without errors)
- Preventing hallucination when agents produce charts people trust
Research questions
- Can you build specialized agents for specific journalism tasks?
- How do you design evaluation benchmarks where policy compliance conflicts with task completion?
- What happens when you red-team an agent with prompt injection?
Agents Tool-Calling Evaluation
Example: AI Agent Investigation
A journalist asks: "Are there accounts coordinating to promote banned trading platforms across YouTube and Twitter?"
The agent chains 5 tools automatically:
- searchPosts retrieves mentions of Exness, Quotex, Pocket Option across all platforms
- getThemeActors identifies which accounts post most within promotional themes
- getActorTimeline checks posting frequency to surface coordinated scheduling
- getTopicStance separates promotional from educational content
- compareAcrossPlatforms checks if the same accounts appear on other platforms
Result: 6,345 posts analyzed. Promotion is concentrated almost entirely on YouTube. Exness (banned by SEBI in India): 102 YouTube posts. Quotex (banned in EU): 99. These accounts disguise promotion as financial education.
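The tool-chaining above reduces to a dispatch loop over a registry of named tools. The sketch below borrows two tool names from this slide (`searchPosts`, `getTopicStance`) but the stub implementations and data are invented; the real Arbiter agent uses an LLM to choose tools and queries live platform data.

```python
# Hypothetical stubs standing in for Arbiter's real tools.
def search_posts(platforms, terms):
    return [{"platform": "youtube", "text": "learn trading with exness"},
            {"platform": "twitter", "text": "market basics explained"}]

def get_topic_stance(posts):
    """Crudely split posts into promotional vs. educational."""
    promo = sum("exness" in p["text"] for p in posts)
    return {"promotional": promo, "educational": len(posts) - promo}

TOOLS = {"searchPosts": search_posts, "getTopicStance": get_topic_stance}

def run_plan(plan):
    """Execute a chain of tool calls, feeding each result forward.
    A step with kwargs is called with them; a step without kwargs
    receives the previous step's output."""
    result = None
    for name, kwargs in plan:
        result = TOOLS[name](**kwargs) if kwargs else TOOLS[name](result)
    return result

plan = [("searchPosts", {"platforms": ["youtube", "twitter"], "terms": ["exness"]}),
        ("getTopicStance", None)]
print(run_plan(plan))  # {'promotional': 1, 'educational': 1}
```

The hard research problems sit outside this loop: deciding *which* plan to run for an ambiguous query, and evaluating whether the chained output is actually correct rather than merely error-free.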
Coordinated Network Detection
How do you find groups of accounts working together to spread misleading information? Our Parrot tool analyzed 70M tweets from 14M accounts. Twitter/X's Site Integrity lead followed up. On Meta, our analysis of 600 public Facebook pages led to a takedown of Bangladeshi harassment networks.
Real impact from student work
- 600 pages, 95M views, 500K posts analyzed on Meta
- 4,500 Telegram channels spreading pro-Russian disinformation identified
- Accepted at Stanford T&S and Underground Economy Conference
Research questions
- Can you detect coordination across platforms, not just within one?
- How do you distinguish organic consensus from manufactured coordination?
- Can temporal GNNs scale to millions of nodes in near-real-time?
Graph Analysis Network Science GNNs
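One crude coordination signal behind tools like Parrot is accounts posting identical text within a short time window. The sketch below, with invented posts and a fixed window, captures only that copy-paste heuristic; real detectors build similarity graphs over millions of accounts and model temporal patterns far more carefully.

```python
from collections import defaultdict
from itertools import combinations

def coordinated_pairs(posts, window=60):
    """Flag account pairs that post identical text within `window`
    seconds of each other -- a toy proxy for copy-paste coordination."""
    by_text = defaultdict(list)
    for account, text, timestamp in posts:
        by_text[text].append((account, timestamp))
    pairs = set()
    for shares in by_text.values():
        for (a1, t1), (a2, t2) in combinations(shares, 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                pairs.add(tuple(sorted((a1, a2))))
    return pairs

posts = [
    ("bot_1", "support the junta now", 0),
    ("bot_2", "support the junta now", 30),     # same text, 30s apart
    ("user_9", "support the junta now", 9000),  # same text, hours later
    ("user_3", "what happened today?", 10),
]
print(coordinated_pairs(posts))  # {('bot_1', 'bot_2')}
```

The later organic repost falls outside the window and is not flagged, which illustrates the research question directly: distinguishing manufactured coordination from genuine consensus is a matter of choosing signals and thresholds that separate these two cases at scale.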
Multilingual Analysis
83% of NLP misinformation research has focused on monolingual, high-resource languages. Social media in India is full of code-mixing and transliteration. If you speak a non-English language, you have a genuine research advantage that most Western labs lack.
Why this matters for Arbiter
- Partners in Kenya, India, Mongolia, Bangladesh need analysis in their languages
- LLMs are overconfident in languages where they perform worst (Nature Sci. Reports, 2026)
- Even small annotated datasets are publishable contributions
Research questions
- Does translate-then-retrieve or retrieve-in-native-language work better for RAG?
- Can you build a code-mixing-aware sentiment analyzer for Hindi-English?
- How does Arbiter's clustering degrade on non-English posts?
Multilingual NLP RAG Code-Mixing
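A first step toward a code-mixing-aware analyzer is just measuring how mixed a post is. The sketch below uses a tiny, hand-picked romanized-Hindi wordlist purely for illustration; real systems do per-token language identification with trained models, and transliteration variants make the problem much harder than this.

```python
# Hypothetical wordlist; a real system would use a token-level language-ID model.
HINDI_HINTS = {"hai", "nahi", "kya", "bahut", "accha", "yeh"}

def code_mix_ratio(text):
    """Rough code-mixing signal: fraction of tokens that look like
    romanized Hindi according to the wordlist above."""
    tokens = text.lower().split()
    hindi = sum(t in HINDI_HINTS for t in tokens)
    return hindi / len(tokens)

print(code_mix_ratio("offer toh bahut accha hai bro"))  # mixed Hindi-English
print(code_mix_ratio("this offer is great bro"))        # 0.0, monolingual
```

Even a signal this crude is useful for slicing an evaluation: it lets you ask how retrieval or clustering quality degrades as the code-mixing ratio of posts increases.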
Algorithmic Auditing Tools
A CHI 2025 paper found 435 AI auditing tools but none that support the full audit lifecycle. India's DPDP Act is being implemented. The EU requires algorithmic audits. The infrastructure to do them does not exist.
Why this is an engineering problem
- Building usable audit tools requires good engineering, not novel ML
- Exactly the kind of work undergraduate engineers are good at
- FAccT 2026 lists "audits and assurance testing" as a focus area
Research questions
- Can you build an open-source tool that automates data schema documentation for auditors?
- How would you test a recommendation algorithm for differential treatment across user profiles?
- What does an "inspectability API" for Arbiter look like?
Engineering Policy Transparency
AI-Generated Content Detection
Deepfake videos grew from ~500K (2023) to ~8M (2025). The bigger problem is "cheap fakes": real images paired with misleading captions. Current detectors focus on pixel-level forgery and miss semantic mismatch entirely.
What is broken
- Detectors trained on one generation model fail on newer models
- False positives on edited authentic content create legal problems
- Text-based AI detectors cannot reliably distinguish human from AI text (Nature Communications, 2025)
What you could build
- Benchmark AI-text detectors on code-mixed content
- Test false positive rates on edited journalistic photos
- Build a C2PA content provenance tracker
Crowdsourced Fact-Checking Systems
Meta, YouTube, and TikTok have launched Community Notes-style systems. X open-sourced its ranking algorithm and data. Do bridging-based algorithms systematically fail on polarizing content?
What you could build
- Measure latency gap between note creation and viral peak
- Compare note quality on polarizing vs. non-polarizing topics
- Test LLM-augmented note-writing
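To make the bridging question concrete, here is a deliberately simplified scoring rule: a note's score is its *minimum* approval rate across viewpoint groups, so one-sided support cannot carry it. This is only the core intuition; X's actual Community Notes ranking uses matrix factorization over the rating matrix, and the groups and ratings below are invented.

```python
def bridging_score(ratings):
    """Score a note by its worst per-group approval rate.
    `ratings` is a list of (viewpoint_group, helpful_flag) pairs."""
    per_group = {}
    for group, helpful in ratings:
        yes, total = per_group.get(group, (0, 0))
        per_group[group] = (yes + helpful, total + 1)
    return min(yes / total for yes, total in per_group.values())

one_sided = [("left", 1), ("left", 1), ("right", 0), ("right", 0)]
bridging  = [("left", 1), ("left", 1), ("right", 1), ("right", 0)]
print(bridging_score(one_sided))  # 0.0 -- only one side found it helpful
print(bridging_score(bridging))   # 0.5 -- both sides show some approval
```

A rule like this makes the failure mode on polarizing content easy to see: if a topic is divisive enough that no note gets cross-group approval, accurate notes score zero and never surface, which is exactly what the research question asks you to measure.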
Simulating Social Systems with AI
MIT Media Lab's AgentTorch (AAMAS 2025) built a digital twin of NYC with 8.4M agents. What's interesting about this?
What you could build
- Simulate misinfo spread through WhatsApp groups during an election
- Validate simulation predictions against fact-checker databases
- Test proposed moderation policies in simulation before deployment
SimPPL connection
- Social network simulation at ICML '23
- Marketplace design at IC2S2 '24
- Transparency regulation at ACM DGO '24
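A minimal version of the misinformation-spread simulation above fits in a few lines: agents on a contact graph probabilistically reshare to their neighbors each step. The graph, sharing probability, and seed set are invented; AgentTorch-scale digital twins add demographics, behavior models, and millions of agents on top of this skeleton.

```python
import random

def simulate_spread(contacts, seeds, p_share=0.3, steps=5, rng=None):
    """Toy agent-based cascade: each step, every exposed agent shares
    with each unexposed contact with probability p_share."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    exposed = set(seeds)
    for _ in range(steps):
        new = set()
        for agent in exposed:
            for friend in contacts.get(agent, []):
                if friend not in exposed and rng.random() < p_share:
                    new.add(friend)
        exposed |= new
    return exposed

# A tiny hypothetical WhatsApp-style contact graph.
contacts = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
reached = simulate_spread(contacts, seeds={0})
print(len(reached))
```

The interesting work is what you layer on: compare `p_share` values as stand-ins for moderation interventions, and validate whether the simulated cascades match what fact-checker databases record for real events.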
Which Areas Fit Your Skills?
| Research Area | ML Depth | Data Access | Publish Where |
| Retrieval | Medium | High (Arbiter data) | SIGIR, ACL, EMNLP |
| Theme Discovery | Medium | High (Arbiter data) | ICWSM, ACL, EMNLP |
| AI Agents | Low-Medium | High (build your own) | NeurIPS, ICML, FAccT |
| Network Detection | Medium | Medium (API limits) | WebSci, ICWSM, WWW |
| Multilingual Analysis | Medium | Low (must annotate) | ACL, EACL SRW, EMNLP |
| Algorithmic Auditing | Low | High (public systems) | CHI, FAccT |
If you can code but lack deep ML expertise, AI agents, algorithmic auditing, and multilingual analysis are the most accessible starting points. The EACL 2026 Student Research Workshop explicitly welcomes undergraduate submissions.
What Makes a Good Researcher - I
1. Curiosity
If you are not curious, do not do research. You will waste your time and your collaborators' time. Do engineering instead. You will both benefit more and be happier.
2. Time
Most people seem to think research is a side pursuit. It is not. Research is a full-time job, and good research gets you both visibility and hired at top companies.
What Makes a Good Researcher - II
3. Determination
You will keep hitting wall after wall. Nothing works, compute is too expensive, ideas are a dime a dozen, Claude Code can solve a problem faster than you. If you do research just for the heck of it, you will quit after the third wall. Or you will compromise and write a meaningless paper that never gets noticed anywhere. That is a waste of your time and everyone else's.
4. Creativity
There is seldom a linear path to a solution in good research. Creativity generally grows out of having time on your hands and curiosity about learning new ideas. Without those, it is really hard to be creative.
Appendix
Additional slides for reference
All Products Built by Students
Parrot
Coordinated network detection. 10M+ accounts. Wikimedia award. Times/Sunday Times funding.
Arbiter
Cross-platform social listening. 1B+ posts. Semantic search, alerts, AI agent.
Sakhi
Multilingual health literacy. 450 families in India and Bangladesh.
Audience Analytics
GenAI for newsroom analytics. Pilot at NYPR, expanding to LION Network.
Audio Search
Multimodal search for podcasts. 50+ languages. Built with UN agencies.
InfluenceCheck
Verify influencer claims in health/finance videos. Presented at India AI Summit.
Impact and Partnerships
$500K+
Raised with collaborators
Partners: Deutsche Welle, Jagran New Media, NEST Center, Spreeha Foundation, Migrasia, VTDigger, New York Public Radio, The Times, United Nations, UN Global Pulse, Tattle, TechGlobal Institute, Yale News
Presented at: UNESCO, Stanford T&S, MIT Media Lab, Columbia, NYU, Swiss Embassy, Embassy of Finland, World Economic Forum
NextGenAI Fellowship Program
6-12 month programs to train 200+ undergraduate students from the global majority to build and launch responsible computing tools.
- Unicode: Programming Community
- Shalizi Stats Reading Group: Advanced Statistics
- Unicode ML Summer Course: Machine Learning
- NYU AI School: AI/ML education for non-STEM majors
- NYU AI, Misinformation, and Policy Seminar
Outcomes: 8+ top-tier publications, partnerships in 4 countries, USD 132,000 in competitive global awards.
NextGenAI Fellowship
What we look for
- Students who want to build real products that real people use
- Comfort with Python or JavaScript (we teach the rest)
- Curiosity about how information spreads online
- Willingness to read papers, run experiments, and iterate
What you get
- Co-authorship on publications at top venues
- Your code deployed and used by journalists in 8 countries
- Mentorship from researchers at MIT, NYU, Oxford, and BU
- Access to Overleaf, Zotero, compute, and real datasets
No cohort is currently running. Past program details at nextgenai.simppl.org
Four Research Pillars at SimPPL
Misleading Claims
Twitter, Meta, YouTube, Telegram, Truth Social, Bluesky, Wikipedia
User Behavior
Decentralized platforms, influencer strategies, political transcendence
Social Media Policy
Transparency regulation, platform interventions, shared language
Safety by Design
Recommendation algorithms, marketplace design, causal effects