Team

Why this project

Cell-type annotation is one of the most time-consuming steps in scRNA-seq analysis. LLMs are now capable enough to either generate full pipelines end-to-end or do zero-shot annotation directly from marker genes. A clean head-to-head benchmark — same data, same prompts, expert-annotated ground truth — would be a useful contribution.

What a team could build in one day

  • Pick a small set of well-annotated public datasets (the 10X / Seurat PBMC tutorial data is a clean starting point).
  • Run several LLMs (Claude, GPT, open-weights models) under matched prompts — both the “generate a pipeline” framing and the “zero-shot annotation from marker genes” framing.
  • Score against expert annotations and produce a leaderboard.

Minimum viable demo: a comparison table for one dataset across 2–3 LLMs.

Stretch directions

  • Extend across tissues and conditions for a broader benchmark.
  • Multi-step agentic pipelines (LLM as orchestrator over sklearn / scanpy).
  • Cost vs. latency vs. accuracy comparisons.

Compute

Low — 16 GB RAM and a standard CPU should suffice. Most work is API calls.