Research and Major Project Topics

Swapneel Mehta, Ph.D.
Cofounder, SimPPL
Postdoc, Boston University & MIT
March 2026

SimPPL: Rebuilding Digital Trust

SimPPL is a global community of 200+ engineers and researchers working to make reliable information accessible for the global majority. We are a U.S. 501(c)(3) nonprofit that builds responsible computing tools and publishes research at top venues.

$500K+
Raised with collaborators
(Google, Mozilla, Ford, Omidyar)
16+
Publications
(AAAI, NeurIPS, ICML, ICWSM)
8
Countries
with active partnerships

Accepted into the Fast Forward Tech Nonprofit Accelerator (previous: Amnesty International, Allen Institute for AI, ICFJ). Selected for UNDP AI Trust and Safety inaugural cohort.

Sakhi: Health Literacy on Mobile

We asked: how do you deliver reliable information for critical healthcare needs in multilingual contexts? That research question led to Sakhi: a multilingual platform delivering verified women's health information to mobile phones.

What was built

  • Mobile multimodal messaging for health education
  • Monitoring dashboards for last-mile care delivery
  • Gamified rewards for community health workers
  • Multilingual Q&A for reproductive health (1000 Q&A dataset)

Where it went

  • RCT with 100 families in Jalgaon, Maharashtra over 2 years
  • Expanded to 250 families in Bangladesh for menstrual health
  • Presented at Psychology of Technology Conference in DC
  • Technical evaluations with Cohere (co-authoring publication)

The Students Behind Sakhi

Mrunmayi Parkar

Former Program Manager and Research Engineer at SimPPL. Led the Sakhi team. Selected for MIT IDEAS Social Innovation Challenge.

Now: TPM Intern and soon full-time at Google, MS CS at UT Dallas.

Nahush Patil

Former Research Engineer at SimPPL. Part of the Sakhi team. Part of team that won the MIT PKG Center 10K Amazon Prize for Social Good.

Now: Full-time at an industrial engineering firm after interning there. MS CS at UT Dallas.

Utkarsh Verma

Software Developer, now senior member of the ML Engineering team at SimPPL.

B.E. Computer Engineering from DJ Sanghvi College of Engineering (2021-2025). One of your own.

All three were undergraduates when they built this. It started as a side project, became research published at top venues, and is now a product serving 450 families across India and Bangladesh.

InfluenceCheck: Verifying Influencer Claims

From that same survey, we asked: which influencers are shaping what young people believe about health and finance, and are their claims actually true?

What was built

  • A system that verifies claims influencers make in their videos on Instagram
  • Focused on health and finance sectors, where misleading claims cause real harm
  • Automated claim extraction from video transcripts + fact-checking pipeline

Where it went

  • Presented to the head of Jagran New Media, who was also head of the International Fact-Checking Network (IFCN)
  • They collaborated with the team for a year
  • Presented at the India AI Summit to Ashwini Vaishnaw (Minister of MeitY)
  • The Jagran head left his organization to launch a startup around this product

The Students Behind InfluenceCheck

Dhvani Shah

Built the InfluenceCheck system. Worked directly with the head of the IFCN on influencer claim verification. Now working with an NYU Data Science professor and former hedge fund manager to study prediction markets.

Currently: Still an undergraduate, wrapping up her final year.

Atmik Shetty

ML Engineer at SimPPL. Co-built InfluenceCheck. Specializes in NLP, LLM inferencing, and optimization.

B.E. IT, St. Francis Institute of Technology (2021-2025). Recently graduated.

A product built by two undergraduates convinced the former president of one of the world's most important fact-checking organizations to leave his job and build a startup around it. It does not take preexisting knowledge to do good research. It takes determination and an open mind.

Real Talk: What Good Research Looks Like

The problem I see

  • Students publishing at low-quality journals because it seems easier
  • To everyone outside your college, more papers at bad venues means less credibility, not more. You are self-selecting into a community that cares about posturing, not research
  • The bar for NeurIPS, ICML, and AAAI workshop papers is genuinely achievable in 4-6 months with a good research problem

What matters more than papers

  • Write a really good technical blog post and release a well-documented library. LLMs will use your library without users even knowing, and you can cite that as impact
  • This is a day and age of builders. If you are not building, ask yourself why. It is clearly not technology stopping you
  • Research is about learning new things about the world, not about writing papers. The point is exploration and genuine curiosity
When I was in your position, I did research into machine learning and nuclear physics. I had neither expertise nor any idea about the value of either. But doing that research taught me I enjoy working with large datasets, optimization, and productionizing code. The research you do does not have to dictate your career goals.

The Job Market Right Now

What the data says

  • 40% of jobs globally exposed to AI, 60% in advanced economies (IMF, 2024). Exposure is not displacement, but it is task redesign
  • AI increased productivity 14% in customer support (NBER) and made writing tasks 37% faster (SSRN). The biggest gains went to less experienced workers
  • Heavy AI use makes junior developers less capable of supervising AI effectively (Anthropic)
  • Value capture goes to integrators and infrastructure, not model builders. Foundation models are commoditizing

What I learned interviewing

I interviewed at OpenAI, Anthropic, and DeepMind for data science and research engineer roles. Here is what I learned:

  • Debugging is the new entry-level skill. Programming is assumed, not differentiating. The bar has shifted to debugging code and planning architecture
  • LeetCode-style interviews are being replaced by debugging exercises on platforms
  • What differentiates you: systems thinking, the ability to plan, and building things that work in production

Skills That Still Matter

What companies test for · Why it matters · How to build it

  • Debugging · AI generates code, humans fix it. Entry-level is no longer writing code · Debug on platforms, read others' code, contribute to open source
  • Planning & architecture · Knowing what to build before building it. AI cannot decide what matters · Design systems before coding. Write design docs. Lead a project
  • Building products · A shipped product with real users > 10 papers at low-tier venues · Ship something. Deploy it. Get 5 real users. Iterate on feedback
  • Research taste · Knowing which problems are worth solving. Comes from reading good papers · Read 2 papers/week from top venues. Follow researchers. Attend talks
  • Communication · If you cannot explain what you built, it does not exist to anyone else · Write blog posts. Give talks. Document projects with care
"How much of this do you do?"

How We Do Research at SimPPL

Project pitch (1 page, 4 questions)

  1. What is the idea? (2 sentences max)
  2. Why is it important? (2 sentences max)
  3. What have others done and how is this different? (review 4-6 papers)
  4. What experiment will highlight this difference?

Methodology from Rajesh Ranganath (NYU), Dan Fu and Jennifer Widom (Stanford)

Collaborative process

  1. First meeting: write a set of questions together
  2. Spend time reading and reviewing papers! Refine your research question before you start looking for answers.
  3. Meet 1-3 times/week, iterate for 12-36 weeks
  4. First authors handle the hardest subtasks + organizing
  5. Last author does mentoring, feedback, and guidance, including several 1-2 hour live editing walkthroughs

Tools we use: Overleaf, Zotero, cloud compute. Tools you should learn: Google Scholar, Semantic Scholar, Elicit, Perplexity for literature reviews. AlphaXiv for paper discussions.

Arbiter: Our Research Platform

Arbiter is an open investigative platform for cross-platform discourse analysis. We analyze posts across X, YouTube, Reddit, Bluesky, and TikTok. Our partners include Deutsche Welle Akademie (Kenya), NEST Center (Mongolia), Jagran New Media (India), and New York Public Radio (US).

Collection → Embedding → Retrieval → Clustering → Labeling → AI Agent

Each stage in this pipeline involves open research questions. I'll present research questions that illustrate what I believe is worth studying for a major project, using Arbiter as an example to motivate them.

What makes it different

Cross-platform harms tracing (not just one social network). Designed for journalists and researchers. Investigative social intelligence.

Real-world outcomes

Contributed to Meta's takedown of Bangladeshi networks. Twitter/X Site Integrity followed up on accounts we identified. $15K quarterly revenue.

Better Retrieval for Social Media

When a journalist asks "show me posts about election manipulation in the Philippines," how do you find the right posts from millions of candidates?

What exists and why it falls short

  • Keyword matching misses relevant posts using different words for the same concept
  • Semantic search returns too many vaguely related results
  • Social media text is short and noisy, so standard query expansion adds more noise than signal

Research questions

  • How do you evaluate retrieval quality on social media data where no gold-standard relevance set exists?
  • Can you build better query expansion for non-English languages where training data is sparse?
  • How does retrieval precision change across platforms with different post formats?
NLP · Information Retrieval · Elasticsearch
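To make the keyword-vs-expansion tradeoff concrete, here is a minimal sketch of retrieval with query expansion. The alias table, posts, and scoring are illustrative stand-ins, not Arbiter's actual pipeline or data.

```python
# Minimal sketch of keyword retrieval with query expansion.
# ALIASES is a hypothetical, hand-built table; real systems would mine these.
ALIASES = {
    "traore": {"traore", "traoré", "captain traoré", "ib"},
    "burkina faso": {"burkina faso", "mpsr", "alliance of sahel states"},
}

def expand(query: str) -> set[str]:
    """Expand each query term with its known aliases."""
    terms = {query.lower()}
    for key, names in ALIASES.items():
        if key in query.lower():
            terms |= {n.lower() for n in names}
    return terms

def search(posts: list[str], query: str) -> list[str]:
    """Rank posts by how many expanded terms they contain."""
    terms = expand(query)
    scored = [(sum(t in p.lower() for t in terms), p) for p in posts]
    return [p for score, p in sorted(scored, reverse=True) if score > 0]

posts = [
    "Captain Traoré addresses the MPSR council",
    "Recipe for jollof rice",
    "IB speaks on Sahel sovereignty",
]
print(search(posts, "Ibrahim Traore"))
```

Note that plain keyword matching would have returned nothing here: neither post contains the literal string "Ibrahim Traore". The open question is how to evaluate rankings like this when no gold-standard relevance set exists.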

Example: What Retrieval Looks Like

A journalist in Kenya types "Ibrahim Traore Burkina Faso" into Arbiter. Two words in, and the system needs to find dozens of related concepts across platforms in multiple languages.

She typed · What the system also needs to find

  • "Ibrahim Traore" · Actors: "Captain Traoré," "Capitaine Traoré," "IB"
  • "Burkina Faso" · Organizations: MPSR, Alliance of Sahel States, CNSP
  • (nothing typed) · Events: Wagner Group departure, Sahel sovereignty movement
  • (nothing typed) · Phrases: "military junta," "pan-African sovereignty"

We analyzed 974 YouTube posts and 1,117 Twitter posts about Traoré. YouTube showed templated promotional accounts using identical sentence structures with only the positive claim swapped out, consistent with AI-generated content. Twitter showed more organic political discourse.

Theme Discovery from Posts

Given thousands of social media posts, how do you automatically discover what people are talking about and label those topics in a way that is actually useful?

Current state of the art

  • Traditional topic models (LDA) struggle with short text
  • LLMs generate labels but repeat themselves: "IndiGo Flight Disruptions" appeared in 28 out of 53 labels
  • We solved this using embedding geometry to guarantee 100% unique labels

Research questions

  • Can you improve clustering for multilingual data where embeddings are weaker?
  • What is the right way to evaluate whether a topic label is "good"?
  • Can you do theme discovery in real-time as posts arrive?
Clustering · UMAP · HDBSCAN · LLMs
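The deck says unique labels were guaranteed using embedding geometry. One simple way that idea could look, sketched with hand-made toy vectors rather than real sentence embeddings: for each cluster, take the first candidate label that is not a near-duplicate (by cosine similarity) of a label already chosen.

```python
import math

# Toy sketch of deduplicating cluster labels by embedding similarity.
# Vectors are hand-made stand-ins for real sentence embeddings.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pick_unique_labels(candidates, threshold=0.9):
    """For each cluster, accept the first candidate label that is not a
    near-duplicate (cosine >= threshold) of an already-chosen label."""
    chosen = []  # (label, vector) pairs accepted so far
    for options in candidates:  # options: ranked (label, vector) pairs per cluster
        for label, vec in options:
            if all(cosine(vec, v) < threshold for _, v in chosen):
                chosen.append((label, vec))
                break
    return [label for label, _ in chosen]

candidates = [
    [("Flight Disruptions", [1.0, 0.0]), ("Airline Delays", [0.9, 0.1])],
    [("Flight Disruptions", [1.0, 0.0]), ("Refund Complaints", [0.1, 1.0])],
]
print(pick_unique_labels(candidates))  # the second cluster falls back to its alternative
```

The repeated "Flight Disruptions" candidate is rejected for the second cluster, which falls back to "Refund Complaints" — the same mechanism that prevented one label from appearing in 28 of 53 clusters.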

AI Agents for Analysis

Can we build an AI assistant that helps journalists analyze social media data by calling the right tools at the right time? Arbiter's agent uses GPT-4o with 7 tools. We used GEPA (ICLR 2026) to optimize its system prompt for $5.77 total.

What is still hard

  • Getting agents to pick the right tool for ambiguous queries
  • Evaluating correctness (not just whether it ran without errors)
  • Preventing hallucination when agents produce charts people trust

Research questions

  • Can you build specialized agents for specific journalism tasks?
  • How do you design evaluation benchmarks where policy compliance conflicts with task completion?
  • What happens when you red-team an agent with prompt injection?
Agents · Tool-Calling · Evaluation
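A stripped-down sketch of the tool-selection problem. The tool names echo the deck's examples, but the router here is a naive keyword-overlap heuristic, not the actual GPT-4o agent; it exists only to show why ambiguous queries make tool choice hard.

```python
# Minimal sketch of an agent tool registry and router.
# Tool bodies are stubs; trigger-word sets are hypothetical.

def search_posts(query):
    return f"posts matching '{query}'"

def get_actor_timeline(query):
    return f"timeline for accounts in '{query}'"

TOOLS = {
    "search_posts": (search_posts, {"posts", "mentions", "find"}),
    "get_actor_timeline": (get_actor_timeline, {"timeline", "frequency", "schedule"}),
}

def route(query: str) -> str:
    """Pick the tool whose trigger words overlap the query the most."""
    words = set(query.lower().split())
    name = max(TOOLS, key=lambda n: len(words & TOOLS[n][1]))
    return TOOLS[name][0](query)

print(route("find posts about banned trading platforms"))
```

Even in this toy version, a query that mentions no trigger word at all falls through to an arbitrary tool — exactly the ambiguity problem listed above, and why evaluating correctness rather than "it ran" matters.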

Example: AI Agent Investigation

A journalist asks: "Are there accounts coordinating to promote banned trading platforms across YouTube and Twitter?"

The agent chains 5 tools automatically:

  • searchPosts retrieves mentions of Exness, Quotex, Pocket Option across all platforms
  • getThemeActors identifies which accounts post most within promotional themes
  • getActorTimeline checks posting frequency to surface coordinated scheduling
  • getTopicStance separates promotional from educational content
  • compareAcrossPlatforms checks if the same accounts appear on other platforms
Result: 6,345 posts analyzed. Promotion is concentrated almost entirely on YouTube. Exness (banned by SEBI in India): 102 YouTube posts. Quotex (banned in EU): 99. These accounts disguise promotion as financial education.

Coordinated Network Detection

How do you find groups of accounts working together to spread misleading information? Our Parrot tool analyzed 70M tweets from 14M accounts. Twitter/X's Site Integrity lead followed up. On Meta, our analysis of 600 public Facebook pages led to a takedown of Bangladeshi harassment networks.

Real impact from student work

  • 600 pages, 95M views, 500K posts analyzed on Meta
  • 4,500 Telegram channels spreading pro-Russian disinformation identified
  • Accepted at Stanford T&S and Underground Economy Conference

Research questions

  • Can you detect coordination across platforms, not just within one?
  • How do you distinguish organic consensus from manufactured coordination?
  • Can temporal GNNs scale to millions of nodes in near-real-time?
Graph Analysis · Network Science · GNNs
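One of the simplest coordination signals is repeated co-posting: pairs of accounts that share the same link within a short time window, again and again. A hedged sketch with illustrative data (real detectors like Parrot combine many such signals at far larger scale):

```python
from collections import defaultdict
from itertools import combinations

def coordinated_pairs(posts, window=60, min_hits=2):
    """posts: (account, url, timestamp_seconds) triples. Returns account
    pairs that co-post the same URL within `window` seconds >= min_hits times."""
    by_url = defaultdict(list)
    for account, url, ts in posts:
        by_url[url].append((account, ts))
    hits = defaultdict(int)
    for shares in by_url.values():
        for (a1, t1), (a2, t2) in combinations(sorted(shares), 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                hits[tuple(sorted((a1, a2)))] += 1
    return {pair for pair, n in hits.items() if n >= min_hits}

posts = [
    ("bot_a", "http://x/1", 0), ("bot_b", "http://x/1", 30),
    ("bot_a", "http://x/2", 500), ("bot_b", "http://x/2", 510),
    ("user_c", "http://x/1", 90000),  # shares the link a day later: organic
]
print(coordinated_pairs(posts))
```

user_c shares the same link but a day later, so no pair involving it is flagged — a small illustration of the organic-vs-manufactured distinction the research questions above ask about.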

Multilingual Analysis

83% of NLP misinformation research has focused on monolingual high-resource languages. Social media in India is full of code-mixing and transliteration. If you speak a non-English language, you have a genuine research advantage that most Western labs lack.

Why this matters for Arbiter

  • Partners in Kenya, India, Mongolia, Bangladesh need analysis in their languages
  • LLMs are overconfident in languages where they perform worst (Nature Sci. Reports, 2026)
  • Even small annotated datasets are publishable contributions

Research questions

  • Does translate-then-retrieve or retrieve-in-native-language work better for RAG?
  • Can you build a code-mixing-aware sentiment analyzer for Hindi-English?
  • How does Arbiter's clustering degrade on non-English posts?
Multilingual NLP · RAG · Code-Mixing
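The translate-then-retrieve vs retrieve-in-native-language question can be set up as a tiny comparison harness. Everything here is a stub for illustration: `TRANSLATE` is a toy dictionary standing in for a real MT system, and the documents are invented code-mixed examples.

```python
# Hedged sketch comparing two RAG retrieval strategies on code-mixed text.
TRANSLATE = {"chunaav": "election", "khabar": "news"}  # toy Hindi-in-Latin-script map

def keyword_hits(query_terms, docs):
    return [d for d in docs if any(t in d.lower() for t in query_terms)]

def translate_then_retrieve(query, docs):
    """Translate query terms to English first, then match."""
    terms = [TRANSLATE.get(w, w) for w in query.lower().split()]
    return keyword_hits(terms, docs)

def retrieve_native(query, docs):
    """Match the query terms as typed, transliteration and all."""
    return keyword_hits(query.lower().split(), docs)

docs = ["Election manipulation claims spread", "Chunaav ke baare mein khabar"]
q = "chunaav news"
print(translate_then_retrieve(q, docs))  # finds only the English doc
print(retrieve_native(q, docs))          # finds only the code-mixed doc
```

Each strategy misses what the other finds, which is exactly why the comparison is worth running on real data: the answer likely depends on how much of the corpus is code-mixed.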

Algorithmic Auditing Tools

A CHI 2025 paper found 435 AI auditing tools but none that support the full audit lifecycle. India's DPDP Act is being implemented. The EU requires algorithmic audits. The infrastructure to do them does not exist.

Why this is an engineering problem

  • Building usable audit tools requires good engineering, not novel ML
  • Exactly the kind of work undergraduate engineers are good at
  • FAccT 2026 lists "audits and assurance testing" as a focus area

Research questions

  • Can you build an open-source tool that automates data schema documentation for auditors?
  • How would you test a recommendation algorithm for differential treatment across user profiles?
  • What does an "inspectability API" for Arbiter look like?
Engineering · Policy · Transparency
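As a taste of why auditing is an engineering problem, here is a paired-profile test for differential treatment: hold everything fixed, vary one attribute, and compare outputs. The `recommend` stub is a hypothetical black-box system with a deliberately planted bias, purely for the demo.

```python
# Sketch of a paired-profile audit for differential treatment.

def recommend(profile: dict) -> list[str]:
    """Stand-in for a black-box recommender, biased on purpose for the demo."""
    if profile["region"] == "IN":
        return ["loan_offer", "news"]
    return ["news", "sports"]

def audit_attribute(base: dict, attribute: str, values: list[str]):
    """Vary one attribute while holding the rest fixed; report differences."""
    outputs = {}
    for v in values:
        profile = {**base, attribute: v}
        outputs[v] = recommend(profile)
    consistent = len({tuple(o) for o in outputs.values()}) == 1
    return consistent, outputs

ok, results = audit_attribute({"age": 25, "region": "US"}, "region", ["US", "IN"])
print("consistent across region:", ok)
```

The harness is trivial; the hard engineering is everything around it — generating realistic paired profiles, driving a real system at scale, and documenting the data schema for auditors.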

AI-Generated Content Detection

Deepfake videos grew from ~500K (2023) to ~8M (2025). The bigger problem is "cheap fakes": real images paired with misleading captions. Current detectors focus on pixel-level forgery and miss semantic mismatch entirely.

What is broken

  • Detectors trained on one generation model fail on newer models
  • False positives on edited authentic content create legal problems
  • Text-based AI detectors cannot reliably distinguish human from AI text (Nature Communications, 2025)

What you could build

  • Benchmark AI-text detectors on code-mixed content
  • Test false positive rates on edited journalistic photos
  • Build a C2PA content provenance tracker

Crowdsourced Fact-Checking Systems

Meta, YouTube, and TikTok launched Community Notes-style systems. X open-sources its algorithm and data. Do bridging-based algorithms systematically fail on polarizing content?

What you could build

  • Measure latency gap between note creation and viral peak
  • Compare note quality on polarizing vs. non-polarizing topics
  • Test LLM-augmented note-writing
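The latency-gap measurement in the first bullet is straightforward to sketch: find the hour at which a post's views peaked, and compare it against when the community note arrived. The hourly view counts below are illustrative.

```python
# Sketch of measuring the gap between a note's arrival and the viral peak.

def viral_peak(views_by_hour):
    """Index (hour) at which hourly views peaked."""
    return max(range(len(views_by_hour)), key=views_by_hour.__getitem__)

def note_latency(views_by_hour, note_hour):
    """Positive result: the note arrived that many hours after the peak."""
    return note_hour - viral_peak(views_by_hour)

views = [10, 200, 900, 400, 100, 40]  # illustrative: peak at hour 2
print(note_latency(views, note_hour=14))
```

Aggregating this over many posts gives the distribution of how often notes land after most of the audience has already moved on.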

Simulating Social Systems with AI

MIT Media Lab's AgentTorch (AAMAS 2025) built a digital twin of NYC with 8.4M agents. What's interesting about this?

What you could build

  • Simulate misinfo spread through WhatsApp groups during an election
  • Validate simulation predictions against fact-checker databases
  • Test proposed moderation policies in simulation before deployment
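To ground the first bullet, here is a deliberately tiny agent-based sketch of a rumour spreading through group chats: in each step, any group containing a believer may have the rumour shared to the whole group. It is in the spirit of, but vastly simpler than, platform-scale simulators like AgentTorch; the groups and parameters are invented.

```python
import random

def simulate(groups, p_share=0.5, steps=3, seed=0):
    """groups: list of member lists. Each step, a group containing a believer
    shares the rumour group-wide with probability p_share. Returns believers."""
    rng = random.Random(seed)
    believers = {groups[0][0]}  # seed the rumour with one member
    for _ in range(steps):
        for members in groups:
            if any(m in believers for m in members) and rng.random() < p_share:
                believers.update(members)
    return believers

# Member "b" bridges the first two groups; the third group is isolated.
groups = [["a", "b"], ["b", "c", "d"], ["e", "f"]]
spread = simulate(groups)
print(sorted(spread))
```

Even this toy shows the structural point: the rumour can only reach groups connected through shared members, so the isolated group stays clean regardless of p_share. Validating such dynamics against fact-checker databases is the hard research step.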

SimPPL connection

  • Social network simulation at ICML '23
  • Marketplace design at IC2S2 '24
  • Transparency regulation at ACM DGO '24

Which Areas Fit Your Skills?

Research Area · ML Depth · Data Access · Publish Where

  • Retrieval · Medium · High (Arbiter data) · SIGIR, ACL, EMNLP
  • Theme Discovery · Medium · High (Arbiter data) · ICWSM, ACL, EMNLP
  • AI Agents · Low-Medium · High (build your own) · NeurIPS, ICML, FAccT
  • Network Detection · Medium · Medium (API limits) · WebSci, ICWSM, WWW
  • Multilingual Analysis · Medium · Low (must annotate) · ACL, EACL SRW, EMNLP
  • Algorithmic Auditing · Low · High (public systems) · CHI, FAccT

If you can code but lack deep ML expertise, AI agents, algorithmic auditing, and multilingual analysis are the most accessible starting points. The EACL 2026 Student Research Workshop explicitly welcomes undergraduate submissions.

What Makes a Good Researcher - I

1. Curiosity

If you are not curious, do not do research. You will waste your time and your collaborators' time. Do engineering instead. You will both benefit more and be happier.

2. Time

Most people seem to think research is a side pursuit. It is not. Research is a full-time job, and good research gets you both visibility and hired at top companies.

What Makes a Good Researcher - II

3. Determination

You will keep hitting wall after wall. Nothing works, compute is too expensive, ideas are a dime a dozen, Claude Code can solve a problem faster than you. If you do research just for the heck of it, you will quit after the third wall. Or you will compromise and write a meaningless paper that never gets noticed anywhere. That is a waste of your time and that of others.

4. Creativity

There is seldom a linear solution in good research. Creativity generally increases from having time on your hands and having curiosity about learning new ideas. Without those, it is really hard to be creative.

Thank You

Questions?

Arbiter · Our Student Fellowship Program · simppl.org

swapneel@simppl.org · All rights reserved. © SimPPL 2026.

Appendix

Additional slides for reference

APPENDIX

All Products Built by Students

Parrot

Coordinated network detection. 10M+ accounts. Wikimedia award. Times/Sunday Times funding.

Arbiter

Cross-platform social listening. 1B+ posts. Semantic search, alerts, AI agent.

Sakhi

Multilingual health literacy. 450 families in India and Bangladesh.

Audience Analytics

GenAI for newsroom analytics. Pilot at NYPR, expanding to LION Network.

Audio Search

Multimodal search for podcasts. 50+ languages. Built with UN agencies.

InfluenceCheck

Verify influencer claims in health/finance videos. Presented at India AI Summit.

APPENDIX

Impact and Partnerships

100M+
Global News Views
$500K+
Raised with collaborators
16+
Publications
40+
Talks globally

Partners: Deutsche Welle, Jagran New Media, NEST Center, Spreeha Foundation, Migrasia, VTDigger, New York Public Radio, The Times, United Nations, UN Global Pulse, Tattle, TechGlobal Institute, Yale News

Presented at: UNESCO, Stanford T&S, MIT Media Lab, Columbia, NYU, Swiss Embassy, Embassy of Finland, World Economic Forum

APPENDIX

NextGenAI Fellowship Program

6-12 month programs to train 200+ undergraduate students from the global majority to build and launch responsible computing tools.

  1. Unicode: Programming Community
  2. Shalizi Stats Reading Group: Advanced Statistics
  3. Unicode ML Summer Course: Machine Learning
  4. NYU AI School: AI/ML education for non-STEM majors
  5. NYU AI, Misinformation, and Policy Seminar
Outcomes: 8+ top-tier publications, partnerships in 4 countries, USD 132,000 in competitive global awards.

APPENDIX

NextGenAI Fellowship

What we look for

  • Students who want to build real products that real people use
  • Comfort with Python or JavaScript (we teach the rest)
  • Curiosity about how information spreads online
  • Willingness to read papers, run experiments, and iterate

What you get

  • Co-authorship on publications at top venues
  • Your code deployed and used by journalists in 8 countries
  • Mentorship from researchers at MIT, NYU, Oxford, and BU
  • Access to Overleaf, Zotero, compute, and real datasets

No cohort is currently running. Past program details at nextgenai.simppl.org

APPENDIX

Four Research Pillars at SimPPL

Misleading Claims

Twitter, Meta, YouTube, Telegram, Truth Social, Bluesky, Wikipedia

User Behavior

Decentralized platforms, influencer strategies, political transcendence

Social Media Policy

Transparency regulation, platform interventions, shared language

Safety by Design

Recommendation algorithms, marketplace design, causal effects