
Meet Darcy.
Your AI-native data analyst.

Drop in messy source files, tell Darcy what you need, and get clean, production-ready CSV and XLSX outputs back on your machine. For complex operations, Darcy writes and runs Python scripts automatically — saving them to scripts/ for later reuse. No pipelines. No dashboards. Just clean data, fast.

15+ categories
Capability coverage
100% local
Nothing pushed without asking
File in, file out
CSV & XLSX outputs
Claude Code
Extendable to Cursor & Copilot

Data flows one way. You stay in control.

Source files are never touched. Work happens in a staging workspace. Outputs are only confirmed after your review.

📁
Step 1
data/
Source files live here. Never modified. Organized by project subfolder.
⚙️
Step 2
workspace/
Files are copied here for active processing. Staging area for intermediate work.
👁️
Step 3
Review
You inspect the workspace output before anything is elevated.
Step 4
outputs/
Confirmed, reviewed files only. Every output tagged with source version and date.
Drop in sources

Add files to data/

Put raw CSVs, exports, or code lists under data/{project}/. They stay untouched as the permanent source of truth.

Give a plain request

Describe the prep goal

Say what you need — clean this, dedupe that, build a master list, map these codes. No forms, no config files. For complex operations, Darcy checks scripts/ for an existing script first — tweaking or forking it if relevant, writing from scratch only if nothing fits.

Inspect workspace

Review before confirming

All work lands in workspace/ first. A sanity check report is saved automatically so you know exactly what changed.

Approve outputs

Elevate to outputs/

Once you're happy, outputs are saved with a run timestamp so anyone can trace exactly when and from what data they were produced.

Everything you need for data prep

The full capability list lives in capabilities/capabilities.csv. Here's what's covered.

🧹
Source Cleaning
Fix encodings, normalize case, remove duplicates, clean diacritics, handle missing values, and standardize inconsistent labels across files.
cleaning · file_ops
🔗
Schema Alignment
Rename, reorder, split, and combine columns. Align multiple related sources to a common structure for clean downstream joins.
columns · combine
🏷️
Entity Resolution
Fuzzy matching, canonicalization, alias mapping, record linkage, deduplication across sources, and unique ID assignment with provenance tracking.
entity_resolution
📋
Master Lists
Build canonical customer, supplier, product, language, or location master lists from multiple related sources with all source IDs preserved.
combine · output
🗺️
Code & Alias Mapping
Build mapping tables, alias lists, and link tables for codes, identifiers, and entity names. State all match rules explicitly.
entity_resolution · text
🔍
Data Quality Review
Null checks, uniqueness counts, range validation, referential integrity, consistency checks, and outlier detection — saved as QA output files.
validation · filtering
📅
Date & Time Prep
Parse mixed date formats, standardize to ISO, align timezones, extract date parts, fill missing intervals, and sort by time.
datetime
📊
Viz-ready Output
Reshape wide-to-long, enforce one row per element, align granularity, standardize labels, pre-aggregate for charts, and compute percentages and rankings.
viz_prep · transform
🔤
Text Processing
Tokenization, fuzzy string matching, regex pattern extraction, special character removal, and encoding artifact repair.
text · cleaning
Need something not listed?
Darcy's capabilities are driven by capabilities/capabilities.csv. Add a new row and Darcy will pick it up on the next job — no code required.
See folder structure →
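For instance, a new row might look like this. The header and the example capability are illustrative assumptions; match the columns the real capabilities/capabilities.csv already uses:

```csv
capability,tags,description
code_enrichment,"entity_resolution,text","Join canonical language codes onto name columns"
```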

Every project can bring its own rules.

Generic capabilities apply to any dataset. But each project can define its own prep instructions — naming conventions, code standards, match rules, source notes — stored in context/instructions/.

⚙️
Generic capabilities
Cleaning, alignment, entity resolution, validation, and all other prep operations live in capabilities/capabilities.csv. These apply to every project and never change per-project.
capabilities/
📌
Project instructions
Each project adds its own prep rules as a Markdown file under context/instructions/. These define domain-specific logic — which codes are canonical, how entities should be matched, what standards apply.
context/instructions/
📁
Your project
Add a Markdown file under context/instructions/ for any project. Define which codes are canonical, how entities match, and what standards apply. The agent reads it before every job in that domain.
context/instructions/{project}.md
capabilities/capabilities.csv + standards.md
+
context/instructions/{project}.md (one file per project; add as many projects as needed)
=
The agent reads both layers before every job — generic capabilities tell it what it can do, project instructions tell it how to do it for this domain.
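As an illustration, a hypothetical context/instructions/customers.md might look like this (every rule below is invented for the example, not shipped with Darcy):

```markdown
# Customers: prep rules

- Canonical ID is `cust_id` from the CRM export; legacy `CUSTNO` values map via the alias table.
- Match entities on exact email first, then fuzzy name match; flag anything below threshold for review.
- Source dates arrive as DD/MM/YYYY; standardize to ISO 8601 on output.
```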

Files, not documents.

Every job produces a usable file saved to a typed output folder. A sanity check report is always saved alongside it.

Request | Output | Saved to
Clean one source | Cleaned CSV or XLSX | outputs/cleaned/
Build a master list | Canonical linked file | outputs/master-lists/
Create code or alias mappings | Mapping table CSV | outputs/mappings/
Review unresolved records | QA file or review sheet | outputs/qa/
Any job | Dropped records file | outputs/discarded/
Any job | Sanity check report | outputs/reports/

Always know which data produced which output.

When a source file changes, the old copy is archived automatically. Every output is stamped with the exact run timestamp so the lineage is traceable at a glance.

1

Change detected automatically

When a source file arrives, Darcy compares it against the existing copy using a file hash. Unchanged files are copied as-is — no version bump. Changed files trigger the archive step.
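Hash-based change detection can be sketched like this. This is a minimal illustration of the idea, not Darcy's actual implementation; the function names are made up for the example:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file in fixed-size chunks so large sources never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def has_changed(incoming: Path, existing: Path) -> bool:
    """True if the incoming file differs from the existing copy, or no copy exists yet."""
    if not existing.exists():
        return True
    return file_sha256(incoming) != file_sha256(existing)
```

An unchanged file compares equal and is copied as-is; any byte-level difference triggers the archive step.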

2

Prior copy archived with timestamp

The existing copy is renamed with a version number and the import timestamp: customers_v1_2026-04-15_143022.csv. The new file takes the clean name: customers.csv — always the current version.

3

Version log updated

A _versions.md file in the same folder records each archived version with its timestamp and a change note.

4

Outputs stamped with run timestamp

Every output file carries the exact timestamp of the run that produced it: customers-master_2026-04-15_143022.csv. Each run is distinct and traceable.
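The naming scheme for archived copies and stamped outputs can be sketched as two small helpers. The function names are illustrative; only the filename patterns come from the examples above:

```python
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%Y-%m-%d_%H%M%S"  # e.g. 2026-04-15_143022

def archive_name(path: Path, version: int, when: datetime) -> str:
    """Name for an archived prior copy: customers.csv -> customers_v1_2026-04-15_143022.csv."""
    return f"{path.stem}_v{version}_{when.strftime(STAMP_FORMAT)}{path.suffix}"

def stamped_output_name(base: str, when: datetime, ext: str = ".csv") -> str:
    """Output name carrying the run timestamp: customers-master_2026-04-15_143022.csv."""
    return f"{base}_{when.strftime(STAMP_FORMAT)}{ext}"
```

The current version always keeps the clean name; only archives and outputs carry timestamps, so lineage is readable at a glance.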

Reuse before you rebuild.

Before writing any new script, Darcy inspects scripts/ and picks the right action based on fit.

↻ Tweak

Same domain, small change

Different source column, new filter, adjusted output field — Darcy edits the existing script in place and states exactly what changed and why.

⎇ Fork

Same domain, different job

Different grain, matching logic, or output shape — Darcy copies the script under a new descriptive name and adapts it. The original stays untouched.

✦ New

Nothing relevant exists

No existing script fits the job. Darcy writes one from scratch, saves it to scripts/, and it becomes available for reuse on future jobs.

Simple and intentional.

Generic capabilities live at the root. Project-specific context and instructions are scoped inside context/.

your-repo/
├── capabilities/           ← generic, reusable across any project
│   └── capabilities.csv
├── context/                ← project-specific only
│   └── instructions/       ← prep rules for this project
├── data/                   ← source files, never modified
│   └── {project}/
│       └── source_file.csv
├── workspace/              ← Darcy copies data/ files here automatically
│   ├── working/            ← intermediate outputs, inspectable before confirmation
│   └── reference/          ← source lookups; prior versions archived here with timestamp
│       ├── source_file.csv
│       ├── source_file_v1_2026-04-15_143022.csv
│       └── _versions.md
├── outputs/                ← confirmed, reviewed files only
│   ├── cleaned/            ← created when a cleaning job runs
│   ├── master-lists/       ← created when a master list job runs
│   ├── mappings/           ← created when a mapping job runs
│   ├── qa/                 ← created when a QA job runs
│   ├── discarded/          ← dropped records, every job, never silent
│   └── reports/            ← sanity check report, every job
├── scripts/                ← scripts reused, tweaked, or forked before new ones are written
├── standards.md            ← naming and code standards
├── CLAUDE.md               ← agent instructions
└── preferences.json        ← behaviour toggles
        
Folder | What goes in it
data/ | Original source files, organised by project subfolder. Never modified directly.
workspace/working/ | Intermediate outputs written during processing — inspectable before anything is elevated to outputs/.
workspace/reference/ | Source files copied from data/ and used as lookups to inform the job — not being cleaned themselves.
scripts/ | Processing scripts written by Darcy. Before writing a new one, Darcy inspects this folder and tweaks, forks, or writes from scratch depending on fit. Controlled by commitScripts in preferences.
context/instructions/ | One Markdown file per project defining prep rules, match logic, and standards for that domain.
capabilities/ | The master list of supported prep operations. Generic and reusable across all projects.
outputs/ | Confirmed, reviewed files only. Subfolders are created by the agent when a job runs.
outputs/discarded/ | Records dropped during every job — no identifier, failed validation, or unresolvable conflict. Never silently lost.
standards.md | Cross-project defaults for naming, codes, entity matching, and date formats.
preferences.json | Behavioural toggles — controls confirmations, commits, sanity checks, and language.
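A one-off script to scaffold this layout could look like the sketch below. The folder list is taken from the tree above; output subfolders are deliberately left out, since the agent creates those when a job runs:

```python
from pathlib import Path

# Top-level layout from the folder tree; outputs/ subfolders are created per job.
FOLDERS = [
    "capabilities", "context/instructions", "data",
    "workspace/working", "workspace/reference",
    "outputs", "scripts",
]

def scaffold(root: str = ".") -> None:
    """Create the folder layout; existing folders are left untouched (idempotent)."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
```

Running scaffold() twice is safe: exist_ok=True means existing folders and their contents are never disturbed.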

All behaviour is configurable.

Every toggle lives in preferences.json. Darcy reads it at the start of every session. Change a setting and the behaviour changes — no code required.

Setting | Default | What it does
commitOutputs | false | Allow files in outputs/ to be committed to git.
commitWorkspace | false | Allow files in workspace/ to be committed to git.
commitContext | false | Allow files in context/ to be committed to git.
commitScripts | false | Allow files in scripts/ to be committed to git.
pushAfterCommit | false | Push to remote automatically after every commit.
confirmBeforeSave | true | Ask before writing any output file.
confirmBeforeCommit | true | Ask before committing.
confirmBeforeGenerate | true | Ask for confirmation when the request is ambiguous.
runEvidenceCheck | true | Inspect data/, workspace/, and context files before starting any job.
includeSanityCheck | true | Write a sanity check report to outputs/reports/ after every job.
updateSourceRegistry | true | Offer to update standards.md when new source rules appear.
language | "en-US" | Writing language for outputs. Supports "en-US" and "en-GB".
When a commit* setting is changed to true, Darcy updates .gitignore automatically to unignore that folder. data/ is always ignored regardless of any setting.
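Assuming flat keys named exactly as in the table, a preferences.json with every default would read:

```json
{
  "commitOutputs": false,
  "commitWorkspace": false,
  "commitContext": false,
  "commitScripts": false,
  "pushAfterCommit": false,
  "confirmBeforeSave": true,
  "confirmBeforeCommit": true,
  "confirmBeforeGenerate": true,
  "runEvidenceCheck": true,
  "includeSanityCheck": true,
  "updateSourceRegistry": true,
  "language": "en-US"
}
```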

Built for Claude Code.

The instruction file is CLAUDE.md — Claude Code picks it up automatically. It can be extended to Cursor or GitHub Copilot by copying the contents into that tool's instruction file.

Claude Code

CLAUDE.md

🖱️

Cursor

.cursor/rules/agent-da.mdc

🐙

GitHub Copilot

.github/copilot-instructions.md