
How AI Draft Matching Actually Works — and Why It Needs Your Sent History

By Hannah Liu · October 15, 2024 · 6 min read

[Image: abstract visualization of AI matching a person's writing voice, shown as two overlapping text pattern layers]

Every email assistant that's ever claimed to "write like you" has run into the same wall: generality. A general-purpose language model knows how professional email sounds in the abstract. It can produce grammatically correct, tonally appropriate sentences. What it can't do — without a specific signal — is reproduce the particular rhythm of how you write to this person, about this topic, at this level of formality. That gap is what voice-matching is trying to close.

This piece explains how it actually works, what data it needs to work well, and why the sent-mail folder is the key signal — not your writing style in general.

Why generic drafts feel wrong even when they're correct

Consider a simple scenario: you're a senior director at a professional services firm, and you've been emailing the same partner at a collaborating firm every few weeks for two years. Your emails to them have a specific shape. You skip formalities. You use shorthand phrases that only make sense in context. You tend to front-load the decision or question and follow with rationale, not the other way around. You sometimes end with an informal line that has nothing to do with the topic.

A generic draft won't know any of that. It'll produce something like: "Hi [Name], I hope this finds you well. I wanted to follow up on our previous discussion about..." — correct, professional, completely wrong for this relationship.

The problem isn't the quality of the AI. It's the absence of relationship-specific signal. Fixing it requires not just knowing how you write in general, but how you write to them specifically.

The two axes of voice matching

There are two dimensions a voice model needs to capture: compositional style and relational register.

Compositional style is the easier one. It includes sentence length distribution, use of active versus passive voice, paragraph density, punctuation habits (do you use em-dashes? serial commas?), and the ratio of context-setting to directive content. These patterns are relatively stable across contexts and can be extracted from a reasonably large sample of any person's writing.
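As a rough illustration of what "extracting" these patterns could look like, here is a minimal Python sketch. It is not a description of any product's implementation: the function name `compositional_features`, the chosen statistics, and the regex heuristics are all assumptions made for the example, and the sentence splitter is deliberately naive.

```python
import re
from statistics import mean, pstdev

def compositional_features(texts):
    """Compute a few coarse style statistics over a list of message bodies.

    Hypothetical helper for illustration only; the features and heuristics
    are assumptions, not a known production feature set.
    """
    sentences = []
    for text in texts:
        # Naive split: break after ., ! or ? when followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        sentences.extend(p for p in parts if p)

    lengths = [len(s.split()) for s in sentences]
    joined = " ".join(texts)
    n_words = max(len(joined.split()), 1)

    return {
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        "sentence_len_stdev": pstdev(lengths) if lengths else 0.0,
        # "\u2014" is the em-dash character.
        "em_dashes_per_100_words": 100 * joined.count("\u2014") / n_words,
        # Approximation: counts only simple one-word lists like "a, b, and c".
        "serial_comma_count": len(re.findall(r"\w+, \w+, (?:and|or) \w+", joined)),
    }
```

Statistics like these only become meaningful over a reasonably large sample, which is why this layer can be built from your writing as a whole rather than from one thread.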

Relational register is more granular. It describes the specific tone modulation you apply to a specific correspondent. You likely write differently to your direct reports than to your board. You write differently to a vendor you're frustrated with than to a vendor you trust. You write differently to someone you've known for a decade than to someone you met at a conference last month. Register is contextual — it changes per relationship, not just per person-type.

Most draft tools address only the first axis. Voice matching at the relational level requires analyzing your sent history with that specific correspondent, not just a global writing sample.

What the model actually reads

The core input is your sent-mail folder, filtered to the specific correspondent. When you receive an email from someone you've emailed before, a voice-matching system looks at the last N exchanges with that sender: not your entire mailbox, not a generic writing corpus, but those specific threads.
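The filtering step can be sketched in a few lines of Python. The `SentMessage` type, the `recent_messages_to` function, and the default of `n=20` are hypothetical names and values chosen for illustration; a real system would also need to handle aliases, CC lines, and group threads.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SentMessage:
    """Hypothetical minimal record of one sent email."""
    to: str
    sent_at: datetime
    body: str

def recent_messages_to(sent_folder, correspondent, n=20):
    """Return the n most recent sent messages addressed to one correspondent.

    Addresses are compared case-insensitively after trimming whitespace.
    The default n is an arbitrary assumption.
    """
    addr = correspondent.strip().lower()
    matching = [m for m in sent_folder if m.to.strip().lower() == addr]
    # Newest first, then keep only the last n exchanges.
    return sorted(matching, key=lambda m: m.sent_at, reverse=True)[:n]
```

Everything downstream works on this per-correspondent slice rather than on the whole folder.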

From that thread history, the model extracts several signals:

Greeting and closing patterns: Do you use their first name or a full name? Do you sign off with your name, initials, or nothing? Do you use "Thanks," "Best," "Cheers," or nothing at all?
Opening move: Do you typically acknowledge their previous email before pivoting? Or do you open directly with your response?
Formality markers: Contractions ("I'll" vs. "I will"), hedging language ("perhaps we could" vs. "let's"), and honorifics all vary by relationship.
Message length norms: Some correspondent pairs maintain consistently short emails. Others run long. The model can detect this and calibrate draft length accordingly.
Topic-specific vocabulary: Shared shorthand or project names that only appear in exchanges with this person.
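A few of the signals above can be approximated with simple heuristics. The sketch below is illustrative only: `register_profile`, the sign-off list, and the contraction regex are assumptions, and the regex will also count possessives such as "Sam's" as contractions.

```python
import re
from collections import Counter
from statistics import median

# Matches forms like I'll, we're, don't, let's (straight or curly apostrophe).
CONTRACTION = re.compile(r"\b\w+['\u2019](?:ll|re|ve|s|d|m|t)\b", re.IGNORECASE)
KNOWN_SIGNOFFS = {"thanks", "best", "cheers", "regards"}  # assumed, not exhaustive

def register_profile(bodies):
    """Summarize greeting, sign-off, length, and contraction habits
    across the messages sent to one correspondent (hypothetical helper)."""
    greetings, signoffs = Counter(), Counter()
    word_counts, contraction_hits = [], 0
    for body in bodies:
        lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
        if not lines:
            continue
        # Treat a short first line ending in a comma as the greeting.
        first = lines[0]
        is_greeting = first.endswith(",") and len(first.split()) <= 3
        greetings[first.rstrip(",").lower() if is_greeting else "(none)"] += 1
        # Look for a known sign-off in the last two non-empty lines.
        tail = [ln.rstrip(",.!").lower() for ln in lines[-2:]]
        signoffs[next((t for t in tail if t in KNOWN_SIGNOFFS), "(none)")] += 1
        word_counts.append(len(body.split()))
        contraction_hits += len(CONTRACTION.findall(body))
    total_words = max(sum(word_counts), 1)
    return {
        "top_greeting": greetings.most_common(1)[0][0] if greetings else None,
        "top_signoff": signoffs.most_common(1)[0][0] if signoffs else None,
        "median_words": median(word_counts) if word_counts else 0,
        "contractions_per_100_words": 100 * contraction_hits / total_words,
    }
```

Topic-specific vocabulary is left out here; it would require comparing word frequencies in this thread against the rest of your sent mail.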

The more exchange history exists with a given sender, the more accurate the voice calibration. This is why the quality of voice matching tends to improve over time — the model is building a richer per-correspondent profile as new exchanges occur.

The cold-start problem

New correspondents present a challenge. When someone emails you for the first time, there's no prior exchange history to draw on. In this case, the model falls back to the compositional style layer — your general writing patterns — and applies them with the relational register set to "neutral professional."
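The fallback logic described here is simple enough to sketch directly. The names `DraftStyle` and `choose_style` and the shape of the profile dictionary are hypothetical; the `min_messages` parameter is an assumed tuning knob, set to 1 so that the fallback triggers only when there is no history at all.

```python
from dataclasses import dataclass
from typing import Optional

NEUTRAL_REGISTER = "neutral professional"

@dataclass
class DraftStyle:
    compositional: dict  # global writing patterns, always applied
    register: str        # tone label applied for this correspondent
    calibrated: bool     # True only when per-correspondent history was used

def choose_style(global_style: dict, profile: Optional[dict],
                 min_messages: int = 1) -> DraftStyle:
    """Use the correspondent's register when history exists; otherwise
    fall back to the global style with a neutral professional register."""
    has_history = (
        profile is not None
        and profile.get("message_count", 0) >= min_messages
    )
    if has_history:
        return DraftStyle(global_style, profile["register"], True)
    return DraftStyle(global_style, NEUTRAL_REGISTER, False)
```

Keeping a `calibrated` flag on the result makes it easy for later steps to treat cold-start drafts differently from drafts backed by real history.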

This fallback reflects what's actually possible. A voice model can't perfectly calibrate to a new correspondent from a single email. What it can do is produce a draft that sounds like you wrote it, even if it doesn't yet know how you'd write to this particular person. The first few exchanges serve as a calibration period: each sent reply is added to the correspondent-specific profile, and the drafts get sharper.

There's a reason some tools claim instant perfect voice matching regardless of history: they're not doing relationship-level calibration. They're just running your general style. That's useful, but it's a different thing.

Why local processing matters for this to work

Voice matching requires access to your sent-mail history. That's a significant privacy surface — your sent folder contains sensitive communications going back years, possibly including legal, financial, and personal content. Any tool that uploads your sent history to a remote training pipeline for model improvement creates a risk that's hard to accept for most knowledge workers.

The architecture that makes voice matching privacy-safe is local processing: the voice model runs inference on your device or in an isolated per-user context, the sent-mail analysis is not used to train a shared model, and the extracted patterns remain scoped to your account. What the server handles is the draft request and the draft response, not the underlying emails that informed the style model.

This is a meaningful architectural choice, not just a marketing claim. When evaluating any email AI that does voice matching, the right question is: does the sent-mail analysis stay scoped to my account, or does it feed a shared model? The answer should be in the privacy documentation, not buried in terms of service.

Confidence scoring and the review layer

No voice model gets every draft right. There are situations where the system doesn't have enough correspondent history, where the incoming email asks something outside the normal interaction pattern, or where the compositional style extraction produces a draft that's technically plausible but tonally off in a way that's hard to quantify.

A draft confidence score addresses this honestly. A high-confidence draft — generated from rich correspondent history on a topic the system has seen before — can reasonably be reviewed quickly and sent with minimal editing. A low-confidence draft — perhaps the first message to a new correspondent, or a reply on an emotionally sensitive topic — is flagged for closer attention.
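One way to picture such a score is as a small function of the factors just mentioned. The sketch below is purely illustrative: the weights, the saturation point, the cap for sensitive topics, and the review threshold are all invented for the example and are not drawn from any real scoring model.

```python
def draft_confidence(history_count, topic_seen_before, sensitive_topic,
                     saturation=10):
    """Return a score in [0, 1] from three assumed inputs.

    history_count: number of prior sent messages to this correspondent
    topic_seen_before: whether similar topics appear in that history
    sensitive_topic: whether the message was flagged as emotionally sensitive
    All weights and limits below are hypothetical.
    """
    # History contributes up to 0.6 and stops growing after `saturation` messages.
    history_term = min(history_count, saturation) / saturation
    score = 0.6 * history_term + 0.4 * (1.0 if topic_seen_before else 0.0)
    # Sensitive topics are capped low regardless of how much history exists.
    if sensitive_topic:
        score = min(score, 0.3)
    return round(score, 2)

def review_label(score, threshold=0.7):
    """Map a score to a review hint; the threshold is an assumption."""
    return "quick review" if score >= threshold else "review closely"
```

With these assumed values, a reply to a long-standing correspondent on a familiar topic scores 1.0, while a first message to a new correspondent scores 0.0 and is flagged for close review.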

This doesn't replace human judgment. The value is in knowing where to apply it. A knowledge worker processing a hundred emails a day can't give every draft the same attention. A confidence signal helps them allocate review time where it actually matters, rather than either rubber-stamping everything or manually reading every draft with equal scrutiny.

The limit of the mechanism

Voice matching handles style and register well. What it doesn't handle is situational judgment — knowing that this particular vendor is going through a difficult quarter and that your normal direct tone will land badly right now, or that the person you're replying to just had a major win and a congratulatory opening is appropriate even if you wouldn't normally include one.

Those are things you know and the model doesn't. The draft is a starting point. The value isn't that it replaces your thinking — it's that you start the reply process 70% done rather than at a blank page. The cognitive load of opening-sentence construction and formatting decisions gets offloaded; the substantive judgment stays with you.

Understanding this boundary is important for using the tool well. The drafts that need the most editing are not failures of the voice model — they're flags that the situation has information the system doesn't have access to. Treat those edits as the natural, expected complement to what the mechanism does well.

Hannah Liu

CEO & Co-Founder, Inboxwright

Hannah built Inboxwright after spending too long on email that didn't need her. She writes about attention, communication, and how AI can make knowledge work feel more human — not less.