Meet Darcy.
Your AI-native data analyst.
Drop in messy source files, tell Darcy what you need, and get clean, production-ready CSV and XLSX outputs back on your machine. For complex operations, Darcy writes and runs Python scripts automatically — saving them to scripts/ for later reuse. No pipelines. No dashboards. Just clean data, fast.
Data flows one way. You stay in control.
Source files are never touched. Work happens in a staging workspace. Outputs are only confirmed after your review.
Add files to data/
Put raw CSVs, exports, or code lists under data/{project}/. They stay untouched as the permanent source of truth.
Describe the prep goal
Say what you need — clean this, dedupe that, build a master list, map these codes. No forms, no config files. For complex operations, Darcy checks scripts/ for an existing script first — tweaking or forking it if relevant, writing from scratch only if nothing fits.
Review before confirming
All work lands in workspace/ first. A sanity check report is saved automatically so you know exactly what changed.
Elevate to outputs/
Once you're happy, outputs are saved with a run timestamp so anyone can trace exactly when and from what data they were produced.
Everything you need for data prep
The full capability list lives in capabilities/capabilities.csv. Here's what's covered.
New operations are added by editing capabilities/capabilities.csv: add a new row and Darcy will pick it up on the next job, no code required.
Every project can bring its own rules.
Generic capabilities apply to any dataset. But each project can define its own prep instructions — naming conventions, code standards, match rules, source notes — stored in context/instructions/.
Generic capabilities live in capabilities/capabilities.csv. They apply to every project and never change per project. Project instructions live in context/instructions/. They define domain-specific logic: which codes are canonical, how entities should be matched, and what standards apply. Darcy reads them before every job in that domain.
context/instructions/{project}.md: one file per project. Add as many projects as needed.
Files, not documents.
Every job produces a usable file saved to a typed output folder. A sanity check report is always saved alongside it.
| Request | Output | Saved to |
|---|---|---|
| Clean one source | Cleaned CSV or XLSX | outputs/cleaned/ |
| Build a master list | Canonical linked file | outputs/master-lists/ |
| Create code or alias mappings | Mapping table CSV | outputs/mappings/ |
| Review unresolved records | QA file or review sheet | outputs/qa/ |
| Any job | Dropped records file | outputs/discarded/ |
| Any job | Sanity check report | outputs/reports/ |
Always know which data produced which output.
When a source file changes, the old copy is archived automatically. Every output is stamped with the exact run timestamp so the lineage is traceable at a glance.
Change detected automatically
When a source file arrives, Darcy compares it against the existing copy using a file hash. Unchanged files are copied as-is — no version bump. Changed files trigger the archive step.
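The change check can be sketched in a few lines of Python. This is a minimal illustration, not Darcy's actual implementation; the function names and the choice of SHA-256 are assumptions:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # read in chunks so large source files don't load into memory at once
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def has_changed(incoming: Path, existing: Path) -> bool:
    """An incoming file triggers the archive step only when its hash differs."""
    if not existing.exists():
        return False  # first import: copied as-is, nothing to archive
    return file_hash(incoming) != file_hash(existing)
```

Hashing the bytes rather than comparing timestamps means a re-exported but identical file never causes a spurious version bump.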
Prior copy archived with timestamp
The existing copy is renamed with a version number and the import timestamp: customers_v1_2026-04-15_143022.csv. The new file takes the clean name: customers.csv — always the current version.
Version log updated
A _versions.md file in the same folder records each archived version with its timestamp and a change note.
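The archive-and-log steps above could look something like the following sketch. The helper name, the note format in _versions.md, and the rule for picking the next version number are all assumptions for illustration:

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

def archive_and_replace(incoming: Path, folder: Path, note: str = "") -> Path:
    """Archive the current copy with a version number and timestamp,
    then install the incoming file under the clean name."""
    current = folder / incoming.name
    if current.exists():
        stem, suffix = current.stem, current.suffix
        # next version is one more than the highest already archived
        pattern = re.compile(rf"{re.escape(stem)}_v(\d+)_")
        versions = [int(m.group(1)) for p in folder.iterdir()
                    if (m := pattern.match(p.name))]
        version = max(versions, default=0) + 1
        stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
        archived = folder / f"{stem}_v{version}_{stamp}{suffix}"
        current.rename(archived)
        # append one line per archived version to the folder's log
        with (folder / "_versions.md").open("a") as f:
            f.write(f"- {archived.name}: {note or 'source updated'}\n")
    shutil.copy2(incoming, current)
    return current
```

The clean name always points at the current version, so downstream jobs never need to know which version they are reading.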
Outputs stamped with run timestamp
Every output file carries the exact timestamp of the run that produced it: customers-master_2026-04-15_143022.csv. Each run is distinct and traceable.
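Stamping an output name is simple enough to show in full. A minimal sketch, assuming the timestamp format shown above and a hypothetical helper name:

```python
from datetime import datetime
from pathlib import Path

def stamped_output(name: str, run_time: datetime, out_dir: Path) -> Path:
    """Append the run timestamp so every output traces back to one run."""
    stamp = run_time.strftime("%Y-%m-%d_%H%M%S")
    stem, _, suffix = name.rpartition(".")
    return out_dir / f"{stem}_{stamp}.{suffix}"
```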
Reuse before you rebuild.
Before writing any new script, Darcy inspects scripts/ and picks the right action based on fit.
Same domain, small change
Different source column, new filter, adjusted output field — Darcy edits the existing script in place and states exactly what changed and why.
Same domain, different job
Different grain, matching logic, or output shape — Darcy copies the script under a new descriptive name and adapts it. The original stays untouched.
Nothing relevant exists
No existing script fits the job. Darcy writes one from scratch, saves it to scripts/, and it becomes available for reuse on future jobs.
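The middle case, forking under a new name, can be sketched as below. The function and file names are hypothetical; the point is that the original is copied, never edited:

```python
import shutil
from pathlib import Path

def fork_script(scripts: Path, existing: str, new_name: str) -> Path:
    """Copy an existing script under a new descriptive name,
    leaving the original untouched (same domain, different job)."""
    dst = scripts / new_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; pick a new name")
    shutil.copy2(scripts / existing, dst)
    return dst
```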
Simple and intentional.
Generic capabilities live at the root. Project-specific context and instructions are scoped inside context/.
```
your-repo/
├── capabilities/        ← generic, reusable across any project
│   └── capabilities.csv
├── context/             ← project-specific only
│   └── instructions/    ← prep rules for this project
├── data/                ← source files, never modified
│   └── {project}/
│       └── source_file.csv
├── workspace/           ← Darcy copies data/ files here automatically
│   ├── working/         ← intermediate outputs, inspectable before confirmation
│   └── reference/       ← source lookups; prior versions archived here with timestamp
│       ├── source_file.csv
│       ├── source_file_v1_2026-04-15_143022.csv
│       └── _versions.md
├── outputs/             ← confirmed, reviewed files only
│   ├── cleaned/         ← created when a cleaning job runs
│   ├── master-lists/    ← created when a master list job runs
│   ├── mappings/        ← created when a mapping job runs
│   ├── qa/              ← created when a QA job runs
│   ├── discarded/       ← dropped records, every job, never silent
│   └── reports/         ← sanity check report, every job
├── scripts/             ← scripts reused, tweaked, or forked before new ones are written
├── standards.md         ← naming and code standards
├── CLAUDE.md            ← agent instructions
└── preferences.json     ← behaviour toggles
```
| Folder | What goes in it |
|---|---|
| data/ | Original source files, organised by project subfolder. Never modified directly. |
| workspace/working/ | Intermediate outputs written during processing — inspectable before anything is elevated to outputs/. |
| workspace/reference/ | Source files copied from data/ and used as lookups to inform the job — not being cleaned themselves. |
| scripts/ | Processing scripts written by Darcy. Before writing a new one, Darcy inspects this folder and tweaks, forks, or writes from scratch depending on fit. Controlled by commitScripts in preferences. |
| context/instructions/ | One Markdown file per project defining prep rules, match logic, and standards for that domain. |
| capabilities/ | The master list of supported prep operations. Generic and reusable across all projects. |
| outputs/ | Confirmed, reviewed files only. Subfolders are created by the agent when a job runs. |
| outputs/discarded/ | Records dropped during every job — no identifier, failed validation, or unresolvable conflict. Never silently lost. |
| standards.md | Cross-project defaults for naming, codes, entity matching, and date formats. |
| preferences.json | Behavioural toggles — controls confirmations, commits, sanity checks, and language. |
All behaviour is configurable.
Every toggle lives in preferences.json. Darcy reads it at the start of every session. Change a setting and the behaviour changes — no code required.
| Setting | Default | What it does |
|---|---|---|
| commitOutputs | false | Allow files in outputs/ to be committed to git. |
| commitWorkspace | false | Allow files in workspace/ to be committed to git. |
| commitContext | false | Allow files in context/ to be committed to git. |
| commitScripts | false | Allow files in scripts/ to be committed to git. |
| pushAfterCommit | false | Push to remote automatically after every commit. |
| confirmBeforeSave | true | Ask before writing any output file. |
| confirmBeforeCommit | true | Ask before committing. |
| confirmBeforeGenerate | true | Ask for confirmation when the request is ambiguous. |
| runEvidenceCheck | true | Inspect data/, workspace/, and context files before starting any job. |
| includeSanityCheck | true | Write a sanity check report to outputs/reports/ after every job. |
| updateSourceRegistry | true | Offer to update standards.md when new source rules appear. |
| language | "en-US" | Writing language for outputs. Supports "en-US" and "en-GB". |
When any commit* setting is changed to true, Darcy updates .gitignore automatically to unignore that folder. data/ is always ignored regardless of any setting.
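The unignore behaviour might be sketched like this; the helper name is an assumption, and the hard rule that data/ never leaves .gitignore is enforced explicitly:

```python
from pathlib import Path

ALWAYS_IGNORED = "data/"

def unignore(gitignore: Path, folder: str) -> None:
    """Remove a folder's ignore line when its commit* toggle turns on.
    data/ stays ignored no matter what the settings say."""
    if folder == ALWAYS_IGNORED:
        return
    lines = gitignore.read_text().splitlines()
    kept = [line for line in lines if line.strip() != folder]
    gitignore.write_text("\n".join(kept) + "\n")
```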
Built for Claude Code.
The instruction file is CLAUDE.md, which Claude Code picks up automatically. To extend Darcy to Cursor or GitHub Copilot, copy its contents into the relevant instruction file for that tool.
| Tool | Instruction file |
|---|---|
| Claude Code | CLAUDE.md |
| Cursor | .cursor/rules/agent-da.mdc |
| GitHub Copilot | .github/copilot-instructions.md |