we've been thinking about verification for a *long* time at @gensynai. it's been very interesting to see attention increase and subsequently bifurcate recently:
EXECUTION VERIFICATION
i.e. do I trust the correctness of this computation - has it been executed as specified?
REWARD VERIFICATION
i.e. do I trust the correctness of this answer - does it deserve the reward signal?
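to make the split concrete, here is a minimal sketch in Python. everything in it is an illustrative assumption rather than any real protocol: the names `digest`, `verify_execution`, `verify_reward` and `square_all` are hypothetical, and execution is checked by naive re-execution of a deterministic function, which is the simplest possible scheme and not a claim about how any production system does it.

```python
import hashlib
import json


def digest(value):
    """Stable SHA-256 digest of a JSON-serialisable value."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


def verify_execution(program, inputs, claimed_digest):
    """Execution verification: was `program` run as specified on `inputs`?

    Naive approach (an assumption for illustration): re-execute the
    deterministic program and compare output digests.
    """
    return digest(program(inputs)) == claimed_digest


def verify_reward(answer, checker):
    """Reward verification: does `answer` deserve the reward signal?"""
    return bool(checker(answer))


def square_all(xs):
    """Hypothetical specified computation."""
    return [x * x for x in xs]


# An honest worker reports the digest of the real output;
# a dishonest worker reports the digest of a made-up output.
honest = digest(square_all([1, 2, 3]))
dishonest = digest([1, 4, 10])

execution_ok = verify_execution(square_all, [1, 2, 3], honest)
execution_bad = verify_execution(square_all, [1, 2, 3], dishonest)
reward_ok = verify_reward([1, 4, 9], lambda a: a == [1, 4, 9])
```

note that the reward check only means something if the checker itself was executed faithfully, so the second function quietly depends on the first.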
in many ways these two problems are interlinked: without execution verification it's very hard to achieve reward verification. Typically we solve execution verification by using human-world trust mechanisms (either a human directly, through supervised learning, or a function created by a human and executed on trusted hardware owned by a company with a contract to perform the correct work and a reputation to uphold)
neither of those two approaches scales well at all: they're incredibly expensive in terms of overall energy cost (human brainpower, bureaucracy, money, redundant hardware, electricity, etc.). For the next era of scaling, we need to be able to create trust mechanisms that can verify and arbitrate using only electricity, allowing machines to establish execution verification as a base primitive
once we have execution verification solved for arbitrary operations, then we can create reward structures through competitive market forces that are implemented by machines - those market structures can incentivise progress towards the creation of machines that do two things:
1. digital knowledge curation (i.e. generalised compression of all analog data into parameter space); and
2. digital reasoning (i.e. taking multi-step actions based on that knowledge within the full digital environment and, subsequently, the physical environment through embodiment by robots and reward-incentivised humans)
the last era of ML scaling (the OAI era) came from vertical scaling of imperative learning algos.
the next era of ML scaling will come from emergent intelligence over infinitely horizontally scalable primitives defined as protocol standards.
AI PROMPTING → AI VERIFYING
AI prompting scales, because prompting is just typing.
But AI verifying doesn’t scale, because verifying AI output involves much more than just typing.
Sometimes you can verify by eye, which is why AI is great for frontend, images, and video. But for anything subtle, you need to read the code or text deeply — and that means knowing the topic well enough to correct the AI.
Researchers are well aware of this, which is why there’s so much work on evals and hallucination.
However, the concept of verification as the bottleneck for AI users is under-discussed. Yes, you can try formal verification, or critic models where one AI checks another, or other techniques. But even being aware of the issue as a first-class problem is half the battle.
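As one example of the "other techniques", here is a minimal sketch of checking AI-written code against properties instead of reading it line by line. The names `check_sort` and `plausible_but_wrong` are hypothetical, and randomized property checking is a technique chosen for illustration; it is cheaper than formal verification and proves nothing beyond the cases it samples.

```python
import random
from collections import Counter


def check_sort(candidate, trials=200, seed=0):
    """Check a candidate sort function on random inputs.

    Returns (True, None) if every trial passes, otherwise
    (False, failing_input). Two properties are checked: the output
    is in non-decreasing order, and it holds exactly the same items
    as the input.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = candidate(list(data))
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        same_items = Counter(out) == Counter(data)
        if not (ordered and same_items):
            return False, data
    return True, None


def plausible_but_wrong(xs):
    """Hypothetical AI output: looks fine at a glance, but drops duplicates."""
    return sorted(set(xs))


trusted_ok, _ = check_sort(sorted)
suspect_ok, counterexample = check_sort(plausible_but_wrong)
```

The buggy version passes a quick read and any test without repeated values, which is the kind of subtle error that verifying by eye misses. The check only works because someone knew which properties to ask for, so the verifier still needs to understand the topic.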
For users: AI verifying is as important as AI prompting.