White Paper · March 2026

MM: Meaning Model
A New Architecture for Artificial Intelligence

Spencer Nash
Every large language model discovers the same knowledge: that betrayal hurts, kindness builds trust, and fire is dangerous. They discover it from trillions of tokens at a cost of hundreds of millions of dollars. They store it in billions of opaque weights that cannot be read, audited, or explained. MM is a fundamentally different architecture. It extracts explicit causal structures from text — input, behaviour, outcome — on six biological prediction error channels, each carrying a reliability score computed from frequency.

1. The Problem with Large Language Models

Large language models work. They generate coherent text, answer questions, translate languages, and write code. They have transformed every industry that depends on language. But they have four fundamental limitations that no amount of scaling will solve, because these limitations are architectural, not computational.

1.1 They Cannot Decide

An LLM generates the most probable next token given the context. It does not decide anything. It does not weigh competing priorities. When asked ‘Should I take this job?’, it generates plausible advice by predicting what words typically follow that question in its training data. It has no mechanism to compare the resource implications against the belonging implications against the status implications of the decision.

1.2 They Cannot Compute Fairness

Fairness requires comparison: what happened to me versus what happened to them. An LLM has no representation of ‘me’ or ‘them.’ It has no channel values to compare. Its training data contains biases that it reproduces without detecting them, because detection requires a fairness metric and no such metric exists in the architecture.

1.3 They Hallucinate

An LLM assigns every output the same implicit confidence. It generates ‘The capital of France is Paris’ with the same mechanism it uses to generate ‘The capital of France is Lyon.’ The model has no frequency count for either claim. Hallucination is not a bug. It is the inevitable consequence of a system that stores knowledge as statistical weights without provenance, frequency, or reliability.

1.4 They Cannot Learn After Training

An LLM is frozen after training. It cannot update its knowledge from a single new piece of evidence. If new information arrives, the entire model must be retrained at enormous cost. The model cannot learn the way humans learn: from individual events, immediately, permanently, with the frequency of confirmation preserved.

1.5 The Root Cause

All four limitations share a single root cause: the loss function. Every LLM optimises for one objective — predict the next token correctly. This single objective produces fluent language but cannot produce decision, fairness, reliability, or continuous learning, because none of these are expressible as token prediction accuracy. The solution is not a better LLM. It is a different loss function.

• • •

2. Prediction Error Comparator Framework

PCF is a theory of intelligence derived from neuroscience, evolutionary biology, psychopharmacology, and clinical evidence. It answers the question: what is intelligence FOR? The answer: intelligence exists to predict what will happen and adjust behaviour when predictions are wrong, across multiple biologically grounded dimensions.

2.1 Six Channels

All human emotional and motivational experience maps onto six prediction error channels. Each channel corresponds to a neurotransmitter system, a brain region, and a survival function:

Channel | Neurotransmitter | Brain Region | Question
R — Resource | Opioids | Reward system | Do I have enough?
S — Status | Serotonin | Social hierarchy circuits | Am I respected?
B — Belonging | Oxytocin | Bonding system | Am I connected?
V — Values | PFC (computed) | Prefrontal cortex | Is this fair?
C — Curiosity | Dopamine | SEEKING system | Do I understand?
F — Fear | Adrenaline | Amygdala | Am I safe?

Five of these channels (R, S, B, C, F) are independent subcortical systems with their own chemistry and brain regions. They process in parallel, not sequentially. The sixth channel, Values, is the computed output of the prefrontal cortex — the result of comparing the five inputs. However, this output itself generates prediction errors that feed back into the system.

2.2 Eleven Channel Resolution

Positive and negative states on each channel are neurochemically distinct experiences. Gaining resource (R+) is not simply the absence of losing resource (R−). This yields eleven channels: R+, R−, S+, S−, B+, B−, V+, V−, C+, C−, and F−. Fear-positive barely exists — safety is the absence of alarm, not a rich positive experience. This gives 5 × 2 + 1 = 11 distinct signal channels.

2.3 The Learning Equation

Learning at every synapse follows one equation:

Weight₂ = Weight₁ + (PE × Reliability × Plasticity)

Where PE is the prediction error (the difference between expected and actual outcome on a channel), Reliability is the frequency of prior confirmations, and Plasticity is a learning rate that decreases with age and certainty. This equation is local to each synapse. It requires no backward pass, no global loss function, and no gradient computation.
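
As a sketch, the rule can be written in a few lines of Python; the function name and the example values are illustrative, not part of the MM specification:

    def update_weight(weight, pe, reliability, plasticity):
        """One local synaptic update: Weight2 = Weight1 + (PE x Reliability x Plasticity)."""
        return weight + pe * reliability * plasticity

    # Example: expected outcome 0.2, actual 0.6 on one channel -> PE = 0.4.
    # A moderately confirmed connection (reliability 0.5) in a highly plastic
    # system (plasticity 0.8) moves its weight from 0.30 to 0.46.
    w2 = update_weight(weight=0.30, pe=0.4, reliability=0.5, plasticity=0.8)
    assert abs(w2 - 0.46) < 1e-9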

2.4 Two Numbers Per Connection

Every artificial neural network since Rosenblatt’s 1958 perceptron transmits one number per connection: the weight. That single number summarises the accumulated evidence as a ratio, but the count of that evidence — the frequency — is destroyed. A weight of 0.7 could result from 7 observations or 7 million. The system cannot distinguish them.

Biological neurons transmit spike trains that carry two signals simultaneously: the trend (the prediction error value) and the frequency (how many times the signal has been confirmed). Adding frequency to every connection eliminates hallucination, enables reliability computation, and makes backpropagation unnecessary.
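
A minimal sketch of a two-number connection follows; the class, the frequency-to-reliability mapping, and the method names are assumptions made for illustration:

    from dataclasses import dataclass

    @dataclass
    class Connection:
        weight: float = 0.0      # the trend: accumulated prediction error
        frequency: int = 0       # the count: how often the trend has been confirmed

        def reliability(self) -> float:
            # Illustrative mapping: reliability grows with frequency and saturates at 1.
            return self.frequency / (self.frequency + 1)

        def observe(self, pe: float, confirms: bool, plasticity: float = 0.5) -> None:
            # Confirmation excites (frequency up); contradiction inhibits (frequency down).
            self.frequency = self.frequency + 1 if confirms else max(0, self.frequency - 1)
            self.weight += pe * self.reliability() * plasticity

    # A weight of 0.7 backed by 7 observations and one backed by 7 million are now
    # distinguishable: the frequency travels with the value on every connection.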

2.5 Five Brains, Not Layers

The brain contains five parallel subcortical systems, each specialising in one channel, each with its own chemistry and memory, each processing the same event simultaneously from its own perspective. These five systems converge on the prefrontal cortex, which compares their outputs — weighted by personality and gated by reliability — and produces a decision.

This architecture is parallel, not sequential. Decision arises from comparison between five perspectives, not from deeper processing of one perspective. No layered neural network can decide, because decision requires multiple independent evaluations to compare.

• • •

3. MM Architecture

MM implements PCF as a language processing system. It reads text, extracts causal structures, accumulates them with frequency, and generates language — all explicitly, all auditably.

3.1 Three Collapses

Language is finite. MM enumerates it completely through three collapses:

Collapse | From | To | Ratio
Vocabulary | 10,439 words | 245 emotional tokens | 43:1
Behaviours | 2,941 verbs | 1,036 unique structures | 3:1
Grammar | All English sentences | 79 patterns | Finite

The vocabulary collapse maps every emotionally meaningful word to a position on one or more of the six channels. The behaviour collapse maps every verb to an input-behaviour-outcome structure with eleven-channel values. The grammar collapse identifies 79 sentence patterns that cover all English constructions, each with an explicit extraction rule.

3.2 The World Model

The world model is a collection of generative causal laws:

IF [input state on channels] THEN [behaviour] PRODUCES [outcome on channels]
CONFIRMED [frequency] TIMES

The world model starts with 1,036 pre-loaded structures covering 22 categories with 101 properties. It is generative: it predicts outcomes for situations it has never encountered by matching the current channel signature to the nearest stored structure. It grows from reading — every sentence either confirms an existing structure (increasing frequency) or discovers a new one.
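
A sketch of how such a law might be stored and used generatively appears below; the field names, the distance measure, and the matching rule are illustrative rather than the shipped structures:

    from dataclasses import dataclass

    @dataclass
    class CausalLaw:
        input_state: dict      # channel signature of the triggering situation
        behaviour: str
        outcome: dict          # expected channel values of the result
        frequency: int         # CONFIRMED n TIMES

    def predict(situation, behaviour, laws):
        """Generative use: match the nearest stored law; stay silent if nothing matches."""
        candidates = [law for law in laws if law.behaviour == behaviour]
        if not candidates:
            return None   # frequency zero is a readable state, not a guess
        def distance(law):
            channels = set(situation) | set(law.input_state)
            return sum(abs(situation.get(ch, 0.0) - law.input_state.get(ch, 0.0)) for ch in channels)
        return min(candidates, key=distance)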

3.3 The Processing Pipeline

  1. Grammar parser identifies the sentence pattern (1 of 79) and extracts subject, verb, object, and context.
  2. Synonym library converts each word to channel values on six dimensions.
  3. Context nouns blend with verb channels to shift emphasis (e.g., ‘helped with money’ shifts toward R+).
  4. World model generates the expected outcome channels for this verb-noun combination.
  5. Five brains predict independently: each sends expected value + reliability to PFC.
  6. Prediction error is computed: PE = actual minus expected, on each of eleven channels.
  7. Frequency is updated: confirmation increases it (excitation), contradiction decreases it (inhibition).
  8. Cortex records the event against the relevant entity profile.
  9. PFC compares channels, computes drive (largest expected value from acting), and updates the fairness assessment.
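
The sketch below works through steps 6 to 8 on plain dictionaries; the confirmation threshold and every name are placeholders rather than MM’s internal API:

    def record_event(expected, actual, frequencies, profiles, entity, confirm_at=0.1):
        """Steps 6-8: compute PE, update frequency, record against the entity profile."""
        # 6. Prediction error on each channel: actual minus expected.
        channels = set(expected) | set(actual)
        pe = {ch: actual.get(ch, 0.0) - expected.get(ch, 0.0) for ch in channels}

        # 7. A small error confirms the prediction (excitation); a large one contradicts it (inhibition).
        for ch, err in pe.items():
            step = 1 if abs(err) <= confirm_at else -1
            frequencies[ch] = max(0, frequencies.get(ch, 0) + step)

        # 8. Accumulate the event against the entity's profile in the cortex ledger.
        profile = profiles.setdefault(entity, {})
        for ch, err in pe.items():
            profile[ch] = profile.get(ch, 0.0) + err

        return pe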

3.4 Generation

MM generates text by selecting, not predicting. Generation requires three choices: what to say (which structure), how to say it (which grammar pattern from 79), and which words to use (selected from the synonym library by channel value). Style is controlled by stylesheets that specify pattern distribution, sentence length, modifier vocabulary, and rhythm.
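
A minimal sketch of selection-based generation, assuming a toy stylesheet and synonym library (the data and names are illustrative):

    import random

    def generate(target, pattern_weights, synonyms):
        """Selection, not prediction: choose a grammar pattern, then the word nearest the target."""
        pattern = random.choices(list(pattern_weights), weights=pattern_weights.values())[0]
        def distance(word):
            channels = synonyms[word]
            return sum(abs(channels.get(ch, 0.0) - value) for ch, value in target.items())
        word = min(synonyms, key=distance)
        return pattern, word

    # A stylesheet that favours simple declaratives, and a target of strong B+ (connection):
    pattern, word = generate(
        target={"B+": 0.8},
        pattern_weights={"simple_declarative": 0.7, "conditional": 0.3},
        synonyms={"embraced": {"B+": 0.9}, "acknowledged": {"B+": 0.3, "S+": 0.4}},
    )
    assert word == "embraced"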

• • •

4. What MM Can Do That LLMs Cannot

4.1 Decide

Every event produces five channel signals, each carrying a reliability score. The five brains send expected values to the PFC, weighted by personality and reliability. The channel with the largest expected value from acting determines the drive. This is a computed decision, not a generated response.
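
A sketch of that comparison, assuming each brain reports an (expected value, reliability) pair and personality supplies per-channel weights (all values are invented for illustration):

    def decide(brain_outputs, personality):
        """PFC comparison sketch: pick the channel with the largest weighted expected value."""
        def weighted(ch):
            expected, reliability = brain_outputs[ch]
            return expected * reliability * personality.get(ch, 1.0)
        drive = max(brain_outputs, key=weighted)
        return drive, weighted(drive)

    # Curiosity promises the most, but its reliability is low, so the better-confirmed
    # belonging signal wins the comparison.
    outputs = {"R": (0.1, 0.9), "S": (0.2, 0.8), "B": (0.4, 0.9), "C": (0.7, 0.2), "F": (0.1, 0.9)}
    drive, value = decide(outputs, personality={"B": 1.0, "C": 1.0})
    assert drive == "B"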

4.2 Compute Fairness

Fairness Prediction Error (FPE) = PE_self minus PE_other, per channel. This equation is symmetrical. It has no bias term. There is no place in the arithmetic for prejudice to hide. A judge can check the equation. A child can understand it.
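
The arithmetic is short enough to show in full; the channel values below are invented for illustration:

    def fairness_pe(pe_self, pe_other):
        """FPE per channel: what happened to me minus what happened to them."""
        channels = set(pe_self) | set(pe_other)
        return {ch: pe_self.get(ch, 0.0) - pe_other.get(ch, 0.0) for ch in channels}

    # Both parties contribute equally, but only one is paid and thanked:
    fpe = fairness_pe(pe_self={"R": 0.0, "S": 0.0}, pe_other={"R": 0.6, "S": 0.3})
    # {'R': -0.6, 'S': -0.3}: a resource and status deficit relative to the other party.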

4.3 Know Its Own Reliability

Every claim MM makes carries a frequency. Frequency 3,000 means certain. Frequency 2 means guessing. Frequency 0 means silent. Frequency provides four epistemic capabilities: know what you know, know what you do not know, know when to change your mind, and know who to trust.
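
A sketch of that mapping, using the example counts quoted above and an illustrative cut-off between guessing and confidence:

    def epistemic_state(frequency, certain_at=3000, guessing_below=10):
        """Map a frequency count to an epistemic stance; thresholds beyond the examples
        in the text (3,000 = certain, 2 = guessing, 0 = silent) are illustrative."""
        if frequency == 0:
            return "silent"        # know what you do not know: say nothing
        if frequency < guessing_below:
            return "guessing"      # thin evidence: flag the claim as tentative
        if frequency >= certain_at:
            return "certain"       # deep evidence: assert without qualification
        return "confident"

    assert epistemic_state(3000) == "certain"
    assert epistemic_state(2) == "guessing"
    assert epistemic_state(0) == "silent"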

4.4 Learn Continuously

Every sentence MM reads is a learning event. Confirmations increase frequency. Contradictions decrease it. There is no training run. There are no epochs. Reading is training. MM knows when to stop learning: when frequency reaches threshold, the prediction error approaches zero and no further updates are needed.

4.5 Never Hallucinate

When MM does not know something, it says so — because frequency zero is a readable state, not an absence of information. There is no interpolation, no statistical inference, no generation of plausible but ungrounded claims.

4.6 Explain Every Output

Every output traces to: this word at this channel value, extracted by this grammar pattern, expressing this structure, learned from these sentences, confirmed this many times. The entire chain is auditable.

• • •

5. The Missing Number

In 1958, Frank Rosenblatt built the perceptron. Each connection transmitted one number: a weight. Every neural network since — every backpropagation network, every convolutional network, every transformer, every large language model — inherited that single number. Trillions of dollars of computation have been built on connections that transmit value without frequency.

The brain does not work this way. Every synapse transmits a spike train that carries two signals: the prediction error (the message) and the reliability (the confidence). Karl Friston recognised that prediction errors carry precision, but expressed the idea in variational calculus. Geoffrey Hinton and his colleagues showed that backpropagation works, yet left open the question of why the brain does not need it. The answer: the forward signal already carries reliability, so every synapse can adjust itself locally.

One number was missing from every artificial connection. Adding it back eliminates the need for backpropagation, makes hallucination impossible, enables reliability computation, and reduces learning from trillions of examples to single digits. The cost is one additional number per connection.

• • •

6. Efficiency Comparison

Dimension | LLM | MM
Token vocabulary | 50,000 | 245 (emotional tokens)
Behaviour patterns | Implicit in weights | 1,036 (explicit)
Grammar patterns | Implicit in weights | 79 (explicit)
Loss function | 1 question (Was I right?) | 5 questions (right about what, how much, relative to expected, from whom, how reliable?)
Training data | 13 trillion tokens | 7 trials per structure
Learning mechanism | Backpropagation | Local frequency update
Hardware | Thousands of GPUs | One CPU
Training cost | ~$100 million | $0
Model size | ~3,600 GB | ~0.6 MB
Reliability | Not computed | On every output
Auditability | Opaque | Every output traceable
Continuous learning | No (frozen) | Yes (every sentence)

• • •

7. System Components

Component | Function | Biological Equivalent
Synonym Library | 10,439 words to channel values | Vocabulary (learned language)
Grammar Parser | 79 patterns, structure extraction | Broca’s/Wernicke’s areas
World Model | 1,036 generative causal laws | Cortex (stored experience)
Spike Train | PE + frequency on each connection | Synaptic transmission
Five Channel Nets | Parallel evaluation on R, S, B, C, F | Subcortical emotional systems
PFC Comparator | Compare channels, compute drive + fairness | Prefrontal cortex
Cortex Ledger | Entity profiles from accumulated evidence | Cortical memory
Mirror System | Simulate others’ channels, gate by belonging | Mirror neurons in PFC

All components use the same message format: a prediction error value and a frequency count. All learning follows the same equation. All connections carry two numbers instead of one.

• • •

8. Neuroscience Evidence

8.1 Five Independent Systems

Damage to each subcortical system produces a specific emotional deficit while leaving the others intact. Bilateral amygdala destruction (Patient SM) eliminates fear while preserving other emotions. Dopamine depletion in Parkinson’s eliminates curiosity. Severe early neglect disrupts oxytocin-mediated belonging. These independent damage profiles confirm five physically separate systems.

8.2 PFC as Comparator

Phineas Gage’s prefrontal cortex destruction preserved all emotions but eliminated social judgment. Damasio’s patient Elliot showed intact logic and intact emotion but could not make decisions after ventromedial PFC damage. The PFC does not generate emotion. It compares emotional signals and produces decisions.

8.3 The 80% Inhibition

Approximately 80% of neural activity is inhibitory. MM offers an explanation: 80% of incoming signals are unreliable. Inhibition is reliability filtering. The 20% that propagate are the ones the brain trusts. Thinking is what remains after unreliable signals have been filtered out.

8.4 Excitation and Inhibition Unified

A reliable signal (high frequency, consistent) excites — it propagates. An unreliable signal (low frequency, contradictory) inhibits — it is suppressed. Not two systems. One system where confirmation strengthens and contradiction weakens.
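
A one-rule sketch of this filtering; the threshold and names are illustrative, with the cut-off chosen only to echo the 80% figure:

    def propagate(signals, threshold=0.8):
        """One rule, two outcomes: reliable signals excite (pass), unreliable signals inhibit (drop).

        signals maps source -> (pe, reliability)."""
        return {src: pe for src, (pe, reliability) in signals.items() if reliability >= threshold}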

8.5 Wisdom and Plasticity

Children learn fast but believe everything (high plasticity, low reliability filtering). Adults learn slowly but filter carefully. The elderly speak from deep evidence but struggle with novelty. All of this follows from the single equation, with plasticity as a decreasing parameter.

• • •

9. Applications

9.1 Legal Document Translation

Legal documents use complex grammar patterns to express simple causal structures. MM extracts these structures and re-expresses them in simple patterns. Every clause traces to the source. No clause is invented. No clause is missed. This provides access to justice: legal language translated to plain English without hallucination, with verifiable accuracy.

9.2 Medical Decision Support

Patient records contain emotional signals across channels: resource concerns (R), dignity concerns (S), isolation (B), confusion about diagnosis (C), and fear (F). MM reads intake forms and identifies which channels are in deficit, tracks trends over time, and flags deterioration — from words alone, on any device.

9.3 Educational Assessment

A child’s writing reveals channel states. Fear-dominant writing with collapsed curiosity signals disengagement. MM reads the child’s words, computes the channel state, and informs the teacher which channel needs attention — without surveillance, from text alone.

9.4 Automated Fairness Auditing

Any document can be assessed for fairness by computing the FPE across all affected parties. MM reads the document, identifies who benefits and who suffers on which channels, and produces an auditable report. The fairness assessment is arithmetic, not opinion.

9.5 Content Analysis

MM reads any text and identifies which channels the author is targeting and with what reliability. Fear-based messaging with no factual basis is detectable: the F channel is activated without corresponding evidence frequency. This provides a manipulation detector grounded in channel analysis.

9.6 Code Translation

Code grammar patterns map directly to MM’s 79 patterns: if/else is the conditional pattern, for/while is the habitual pattern, try/except is the concessive pattern. MM translates between plain English, legal English, and executable code through the same underlying structures, with structural equivalence verified across all three outputs.

9.7 Foreign Language Translation

The six channels are biological, not linguistic. Translation is a language-to-structure-to-language operation. The structure in the middle is language-independent channel values. Untranslatable words like ‘komorebi’ and ‘Schadenfreude’ have no word equivalents but have precise channel values. The channels capture what no single word can.

9.8 Adaptive Tutoring

MM reads the student’s channel state from their words in real time. When a student writes ‘I don’t understand this stupid subject,’ an LLM sees a request for explanation. MM sees S-:0.50 (shame) dominating over C-:0.40 (confusion). The problem is not comprehension — it is shame. MM addresses S- first, then teaches. It builds per-student channel profiles, tracks which teaching methods produce C+ for each individual, and monitors all six channels to detect blocking states. Learning does not happen when shame, anxiety, or isolation are present. MM detects these from word choice alone.

9.9 Call Centre Intelligence

A customer who says ‘I’ve been waiting three weeks for my refund and nobody has called me back’ presents R-:0.40 (money owed), B-:0.50 (ignored), and V-:0.60 (promise broken). The dominant deficit is V-, not R-. Processing the refund does not resolve the broken promise. MM identifies this in real time, advises the agent to address V- before R-, monitors the conversation for channel shifts, and enables channel-based call routing — matching agent strengths to customer deficits. After resolution, MM generates a channel-state summary showing which deficits were resolved and which require follow-up.
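
A sketch of the deficit ranking, using the channel magnitudes from the example above (the names and representation are illustrative):

    def dominant_deficit(deficits):
        """Return the largest deficit: the channel to address first.

        deficits maps a negative channel (e.g. 'V-') to its magnitude."""
        return max(deficits, key=deficits.get)

    deficits = {"R-": 0.40, "B-": 0.50, "V-": 0.60}
    assert dominant_deficit(deficits) == "V-"   # the broken promise, not the refund, comes first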

9.10 Text Analysis and Manipulation Detection

Every text has a channel signature revealing intent. A legitimate argument has C+ (evidence). A manipulation has F- without C+ (fear without facts). MM compares sources covering the same event: one may show F-:0.60 C+:0.05 (fear-based, no evidence) while another shows C+:0.40 F-:0.10 (evidence-based). The channel signatures reveal editorial strategy as arithmetic, not opinion.
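
A sketch of that comparison, using the signatures quoted above (the indicator and its name are illustrative):

    def fear_without_facts(signature):
        """Manipulation indicator: fear activation minus supporting evidence (C+)."""
        return signature.get("F-", 0.0) - signature.get("C+", 0.0)

    source_a = {"F-": 0.60, "C+": 0.05}   # fear-based, no evidence
    source_b = {"C+": 0.40, "F-": 0.10}   # evidence-based
    assert fear_without_facts(source_a) > fear_without_facts(source_b)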

9.11 Verbal Mathematics

LLMs fail at word problems because they confuse who has what, mix up givers and receivers, and miss negation. These are comprehension problems. MM extracts entities, behaviours, quantities, and relationships using its 79 grammar patterns, then hands the structured equation to a simple arithmetic module. The comprehension is explicit. The arithmetic is trivial.

9.12 Knowledge Librarian

Every book reduces to a channel profile at approximately 500 bytes. A million books is 500 megabytes. Queries become channel queries: ‘a book about overcoming fear’ matches narratives where F- resolves. ‘Books where the hero sacrifices status for love’ matches entities where S- and B+ co-occur. This works across languages because channel signatures are biological.
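
A sketch of a channel query against such a profile; the profile values, thresholds, and matching rule are illustrative:

    def matches(profile, query):
        """A channel query: every required channel condition must hold in the book's profile.

        profile and query map channel -> value; query holds minimum magnitudes."""
        return all(profile.get(ch, 0.0) >= minimum for ch, minimum in query.items())

    # 'The hero sacrifices status for love': S- and B+ co-occur in the profile.
    profile = {"S-": 0.5, "B+": 0.7, "F-": 0.2}
    assert matches(profile, {"S-": 0.3, "B+": 0.5})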

• • •

10. MM vs LLM: A Different Category

MM is not a better LLM. It is a different kind of machine.

Property | LLM | MM
Unit of knowledge | Token association | Causal structure (I-B-O)
Dimensions | 1 (probability) | 11 (biological channels)
Meaning | None (statistical) | Biological (PE channels)
Reliability | Lost in weights | Preserved as frequency
Explanation | Cannot explain | Every output traceable
Decision | Cannot decide | Largest expected value
Fairness | Cannot compute | FPE = mine − theirs
Learning | Batch (frozen) | Continuous (every sentence)
Hallucination | Inherent | Impossible (frequency = 0 = silent)
Hardware | Thousands of GPUs | Any CPU
Connection | One number (value) | Two numbers (value + frequency)
Architecture | 96 sequential layers | 5 parallel nets + comparator

LLMs process language. MM understands meaning. Processing language is pattern matching on tokens. Understanding meaning is computing emotional value on channels with reliability. Both produce coherent text. One can tell you why.

• • •

11. Conclusion

For sixty-seven years, every artificial neural network has transmitted one number per connection. This architectural decision, made in 1958, has shaped the entire field of artificial intelligence. Trillions of dollars of computation have been spent compensating for its consequences.

MM corrects this by adding one number to every connection: the frequency. This single addition enables a machine that decides from channel comparison, computes fairness from arithmetic, knows its own reliability from evidence counts, learns continuously from individual events, never hallucinates because unconfirmed claims carry zero frequency, and explains every output through defined channels, grammar patterns, and causal structures.

Language is finite: 245 emotional tokens, 1,036 behaviour patterns, 79 grammar patterns. Knowledge is finite: a bounded set of causal laws with frequency. The world model fits in 0.6 megabytes. It runs on any device. It requires no GPU, no cloud, no subscription, no corporate gatekeeper.

MM is not artificial general intelligence. It reads text, understands meaning, learns causal structures, and generates language — explicitly, auditably, and reliably. It does the one thing that no large language model can do at any scale: it knows what it knows, knows what it does not know, and can prove the difference.

• • •

References

Nash, S. (2026). Internet of Value: The Signal. Unpublished manuscript.

Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.

Panksepp, J. (1998). Affective Neuroscience: The Foundations of Human and Animal Emotions. Oxford University Press.

Damasio, A. (1994). Descartes’ Error: Emotion, Reason, and the Human Brain. Putnam.

Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6), 386–408.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

LeDoux, J. (1996). The Emotional Brain. Simon & Schuster.

Vaswani, A. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems.

Miller, G. A. (1956). The magical number seven, plus or minus two. Psychological Review, 63(2), 81–97.