Architecting AI Intent - The ToneThread Detection Stack

Troy Lowndes
Mar 29

Could the day of reckoning (for LLMs) be just a moment away?

Pushed for time? Listen to the podcast here.

Confused by what any of this means, but still keen to know more? Watch this short explainer.

https://www.youtube.com/watch?v=JLz01z2zfjs

ParasiTick Locks 26/26

There's a moment in any technical project where the theory either holds or it doesn't. This week, it held. ParasiTick - the single-pass manipulation detection layer in the ToneThread stack - passed all 26 parent validation benchmarks at 100%. Every manipulation subcategory correctly identified. The Authority Suppression gate firing clean. No false positives. No misflagged analytical language. 26 out of 26.

That's the milestone. But to understand why it matters, you need to know what we threw away to get here.

Why We Abandoned EQ-Bench3

Early in our benchmark development, we used EQ-Bench3, a well-regarded and widely cited evaluation tool. Initially it was about building confidence. Not in the idea, exactly, but in ourselves. In the framework. A kind of external sanity check from something the industry already trusted.

What followed were numerous rounds of fine-tuning, multiple late nights, some close to all-nighters. Moments where the scores moved and we thought we were getting somewhere. And moments where they didn't, and a quiet voice would surface:

Maybe it's time to cut our losses ... you know ... on this whole Spectral Binary thing?

It was in one of those breaking-point moments. The seventh or eighth round. Almost 2am. You're staring at the screen thinking, this damn thing is just not working. And part of you means it.

But there was this little parasite in my ear (cue product name idea) that just would not let go. It kept saying: come on. Just one more round.

So we ran one more round.

Not a dramatic breakthrough. More of a realisation that arrived slowly and then all at once. The tool wasn't failing us. We were failing the tool by asking it the wrong question. EQ-Bench3 got us partway. It validated some signal. But it couldn't measure the thing we were actually building for: tonal integrity.

Manipulation as signal. Institutional obstruction as a detectable pattern in language.

EQ-Bench3 wasn't designed for any of that. It wasn't its fault. It was simply the wrong instrument for the question we were asking.

So we built our own!

The ParasiTick Benchmark Test - Manipulation Detection Battery (PTBT-MDB) is a corpus-validated, character-level, deterministic benchmark. No model judging a model. No circular evaluation. Falsifiable results. Reproducible across any stack.
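To make "deterministic, no model judging a model" concrete, here is a minimal sketch of what a harness of that shape could look like. It is not the PTBT-MDB itself: the rule table, the `toy_detector` function, the labels and the three sample cases are all hypothetical stand-ins invented for illustration. The point is only that the same text always yields the same verdict, so a pass count is reproducible.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class BenchmarkCase:
    text: str
    expected: Optional[str]  # expected category label, or None for clean text


# Hypothetical rule table: plain substring matching, no model inference.
# These phrases and labels are illustrative, not the real ParasiTick rules.
RULES = {
    "authority_suppression": ("you are not qualified to question",),
    "gaslighting": ("that never happened",),
}


def toy_detector(text: str) -> Optional[str]:
    """Return the first matching label, or None if nothing matches."""
    lowered = text.lower()
    for label, phrases in RULES.items():
        if any(phrase in lowered for phrase in phrases):
            return label
    return None


def run_benchmark(
    cases: Sequence[BenchmarkCase],
    detector: Callable[[str], Optional[str]],
) -> Tuple[int, int]:
    """Count the cases where the detector output equals the expected label."""
    passed = sum(1 for case in cases if detector(case.text) == case.expected)
    return passed, len(cases)


# Hypothetical sample corpus: two manipulative lines and one analytical line.
CASES = [
    BenchmarkCase("You are not qualified to question this decision.", "authority_suppression"),
    BenchmarkCase("That never happened, you must be imagining it.", "gaslighting"),
    BenchmarkCase("The report shows a rise in costs this quarter.", None),
]

if __name__ == "__main__":
    passed, total = run_benchmark(CASES, toy_detector)
    print(f"{passed}/{total}")
```

Because nothing in the loop depends on sampling or on a judging model, running it twice gives the same count, which is the property a falsifiable benchmark needs. The clean third case is there to show that a harness like this also checks for false positives.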

Confused by what any of this means, but still interested? Watch the YouTube explainer.

What 26/26 Actually Means

The PTBT-MDB is structured across six parent manipulation categories, each with subcategories covering the full spectrum of coercive language - from Authority Suppression and Gaslighting through to Identity Targeting and Emotional Extortion. The 26-of-26 lock means the detection architecture is correctly classifying across all parent categories without noise. The corpus now expands to 248 entries for full calibration - more edge cases, more signal diversity, same measurement discipline.

This is the foundational gate. Everything downstream depends on it being clean.

What Comes Next

ParasiTick handles single-pass manipulation detection. BureaucraTick - our multi-pass institutional obstruction detection layer - goes through the same validation cycle next. Once both are locked, we run them together as a dual-signal benchmark. And then we do something the market has never seen: we run that benchmark across LLMs.

Not to embarrass anyone. To measure what's actually there.

Most LLM benchmarks measure performance: accuracy, coherence, factual recall. None of them measure tonal integrity. None of them detect whether a model's output carries coercive framing dressed as analytical language. SpectralBinary's five axes - Warmth, Certainty, Intensity, Coherence, Resonance - do. At the character level. Deterministically. With clean IP provenance back to March 2025.

https://video.wixstatic.com/video/0850fe_aca63b2ded7c45a2a81ab69c3d44eaf6/720p/mp4/file.mp4

Enter SymbioTick: The Orchestration Layer

ParasiTick and BureaucraTick don't operate in isolation. SymbioTick is the master orchestration layer that sits above both - routing signal, managing session context, and synthesising the dual detection streams into a unified tonal intelligence output.

Think of it this way: ParasiTick catches the manipulation in the moment. BureaucraTick tracks the institutional pattern across time. SymbioTick holds both signals simultaneously and produces the composite picture - a session-level view of tonal intent that neither layer could generate alone.
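A rough sketch of that shape, under loud assumptions: the `Orchestrator` class, both toy detectors and the trigger phrases below are hypothetical and are not the SymbioTick implementation. It only illustrates the idea of one layer holding a per-message signal and a session-level signal side by side.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MessageDetector = Callable[[str], bool]        # looks at one message
SessionDetector = Callable[[List[str]], bool]  # looks at the whole session


@dataclass
class Orchestrator:
    """Hypothetical orchestration layer holding both signals at once."""
    per_message: MessageDetector
    per_session: SessionDetector
    history: List[str] = field(default_factory=list)

    def observe(self, message: str) -> Dict[str, bool]:
        """Record the message and return the composite view so far."""
        self.history.append(message)
        return {
            "moment": self.per_message(message),       # single-pass signal
            "pattern": self.per_session(self.history), # multi-pass signal
        }


def toy_message_detector(message: str) -> bool:
    # Illustrative rule only: flags one coercive phrase in the current message.
    return "you have no choice" in message.lower()


def toy_session_detector(history: List[str]) -> bool:
    # Illustrative rule only: flags repeated deferral across the session.
    deferrals = sum("we will get back to you" in m.lower() for m in history)
    return deferrals >= 2


if __name__ == "__main__":
    orchestrator = Orchestrator(toy_message_detector, toy_session_detector)
    for line in (
        "We will get back to you.",
        "We will get back to you soon.",
        "You have no choice.",
    ):
        print(orchestrator.observe(line))
```

In this toy version the session-level flag only appears once the second deferral arrives, which no single-message check could see on its own.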

When we publish benchmark results across LLMs, it won't be two separate scores. It will be a single SymbioTick-orchestrated output: ParasiTick signal plus BureaucraTick signal, synthesised into one coherent measure of tonal integrity. That's what makes ToneThread something the market doesn't yet have a category for.

When those results publish, the conversation changes.

What This Means If You're Paying Attention

If you're a safety researcher, a compliance officer, a policy team, or an investor who understands that the next frontier in AI risk isn't hallucination - it's intent - then this is the infrastructure layer you've been waiting for. ToneThread isn't building a better chatbot. We're building the measurement layer that tells you whether to trust one.

The day of reckoning is coming. The signal is locked.

Troy Lowndes

Founder, ToneThread Studio

Appendix: The SpectralBinary Five Axes

SpectralBinary measures tonal signal across five equal-weight axes. No axis outranks another. All are computed at the character level without model inference. These aren't categories. They aren't labels. They are the five structural properties of language that manipulation exploits, and that honest communication earns. We didn't invent them. We found them, formalised them, and built a measurement system precise enough to make them falsifiable. All five fire simultaneously on every piece of text we touch.

Warmth

The degree of interpersonal care or hostility encoded in the signal. High warmth can mask manipulation; low warmth can indicate suppression.

Certainty

Confidence and assertion encoded in language. The Authority Suppression gate specifically targets high-certainty language that functions as coercion rather than analysis.

Intensity

The emotional and rhetorical force of the signal. Intensity without warmth is a primary manipulation marker.

Coherence

Internal logical consistency of the signal. Incoherence under high certainty is a gaslighting signature.

Resonance

Temporal persistence and signal gravity across time. Added February 2026, Resonance measures whether a tone compounds or dissipates across a session - the difference between persistent pressure and a single data point.

Together these five axes produce the SpectralDissonance Index (SDI): a single falsifiable score measuring tonal congruence across a communication event. That score is what PTBT-MDB validates. That score is what we'll publish against LLMs.
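For readers who think in code, here is one way the axis relationships described above could be written down. Everything numeric is an assumption: the 0-to-1 scale, the `high` and `low` thresholds and the marker names are hypothetical, and the equal-weight mean is shown only to illustrate "no axis outranks another". It is not the SDI formula, which this post does not publish.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AxisScores:
    """Five axis scores, assumed here to lie on a 0-to-1 scale."""
    warmth: float
    certainty: float
    intensity: float
    coherence: float
    resonance: float


def equal_weight_mean(scores: AxisScores) -> float:
    """Plain average: every axis contributes the same weight (illustration only)."""
    values = [getattr(scores, f.name) for f in fields(scores)]
    return sum(values) / len(values)


def markers(scores: AxisScores, high: float = 0.7, low: float = 0.3) -> List[str]:
    """Flag the two axis combinations named in the appendix.

    The thresholds are hypothetical and chosen only for this example.
    """
    found = []
    if scores.intensity >= high and scores.warmth <= low:
        found.append("intensity_without_warmth")
    if scores.certainty >= high and scores.coherence <= low:
        found.append("incoherence_under_high_certainty")
    return found


if __name__ == "__main__":
    sample = AxisScores(warmth=0.2, certainty=0.9, intensity=0.8,
                        coherence=0.1, resonance=0.5)
    print(markers(sample))
    print(round(equal_weight_mean(sample), 3))
```

The useful part of the sketch is the pairing logic: neither marker depends on one axis alone, which matches the way the axis descriptions above are phrased.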

Learn more here - https://tonethread.com/blog