MemSkill reframes LLM-agent memory operations as a learnable skill bank: an RL controller selects Top-K skills per span, and an LLM designer periodically rewrites them from hard cases. But "self-evolving" overstates the test-time story: both controller and bank are trained offline and frozen at deployment, and only per-trace memory updates happen online.
That's a good question! I wrote up a longer answer to your question at
The short version: yes, recent reasoning-model training *internalizes* what used to be inference-time external signals. The question is whether we can do it universally.
Reflexion splits self-correction in two: an Evaluator that detects success or failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal (heuristic, exact-match, or test execution) gates whether diagnosis fires. When that signal misfires, as with the high false-negative rate on MBPP Python, Self-Reflection rewrites correct code into incorrect code, exactly the failure mode Cannot-Self-Correct documented.
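A minimal sketch of that gating, assuming toy stand-ins: `gated_self_correct`, the two evaluators, and `toy_rewrite` are hypothetical names for illustration, not Reflexion's code. It shows how an evaluator that wrongly reports failure pushes an already-correct answer through rewrites.

```python
from typing import Callable


def gated_self_correct(
    answer: str,
    evaluator: Callable[[str], bool],
    reflect_and_rewrite: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Rewrite `answer` only while the external evaluator reports failure."""
    for _ in range(max_rounds):
        if evaluator(answer):  # external signal says "pass": stop revising
            return answer
        answer = reflect_and_rewrite(answer)  # diagnosis and rewrite fire
    return answer


# Toy stand-ins (assumptions): the correct answer is the string "4".
def reliable_evaluator(ans: str) -> bool:
    return ans == "4"


def flaky_evaluator(ans: str) -> bool:
    return False  # always reports failure, i.e. a false negative on "4"


def toy_rewrite(ans: str) -> str:
    return ans + "?"  # any rewrite changes the answer, right or wrong


kept = gated_self_correct("4", reliable_evaluator, toy_rewrite)
broken = gated_self_correct("4", flaky_evaluator, toy_rewrite)
print(kept, broken)
```

With the reliable evaluator the correct answer passes through untouched. With the false-negative evaluator the same answer is rewritten on every round, so the loop is only as good as its gate.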
Cannot-Self-Correct tests the strong claim that LLMs can revise their own reasoning answers without any external signal about correctness. Across three benchmarks (GSM8K, CommonSenseQA, HotPotQA), the answer is no: the model's confidence carries over from the initial answer into the revision, and the self-correction loop tends to degrade rather than improve performance. The result refutes the class of approach Self-Refine belongs to.
In Self-Refine, a single frozen LLM acts as generator, critic, and rewriter in a prompt-only loop, and the paper reports about 20 points of average lift across seven tasks without any training, RL, or external signal. The gains vary widely by task: small on math reasoning, but large on dialogue and constrained generation, where what counts as "good" is hardest to define from a one-line critique.
This is a three-paper arc on whether LLMs can reliably self-correct their own reasoning. Self-Refine proposes a naive intrinsic-feedback loop and reports impressive gains. Cannot-Self-Correct empirically refutes the class of approach Self-Refine belongs to. Reflexion threads the needle by gating self-correction on a reliable external signal.
Practice what you teach. Because teachings don't function as symbols or metaphors; they are incarnations of what they advocate.
Promptbreeder claims "self-referential" prompt evolution: the LLM mutates the prompts that mutate its task prompts. But the paper's own ablation shows the dominant operator is simpler: a fixed library of 39 generic "thinking-style" hints that seeds the initial population. Prompt optimization has since moved from operator menus toward natural-language feedback signals (GEPA, MIPROv2).
GEPA optimizes prompts in compound AI systems by reading failed trajectories in natural language and editing the prompt of the module that caused the failure. Across six tasks it beats GRPO by 6% on average, up to 20%, with up to 35x fewer rollouts. Reflection extracts per-module diagnosis from a trajectory. GRPO collapses the same trajectory into one scalar and spreads it across every token.
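A toy illustration of the credit-assignment contrast in the last two sentences. It implements neither GEPA nor GRPO: the `Step` record, its hand-set `ok` label, and both functions are assumptions made up for this sketch. In GEPA the diagnosis comes from an LLM reading the trace, not from a label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    module: str   # which module of the compound system produced this step
    tokens: int   # number of tokens the module emitted
    ok: bool      # toy label: was this module's output acceptable?


def scalar_credit(trajectory: List[Step], reward: float) -> Dict[str, List[float]]:
    """Spread one trajectory-level scalar uniformly over every token."""
    return {s.module: [reward] * s.tokens for s in trajectory}


def per_module_diagnosis(trajectory: List[Step]) -> Optional[str]:
    """Return the first module whose output was not acceptable, if any."""
    for s in trajectory:
        if not s.ok:
            return s.module
    return None


trajectory = [
    Step("retrieve", tokens=3, ok=True),
    Step("summarize", tokens=2, ok=False),
    Step("answer", tokens=4, ok=True),
]

credit = scalar_credit(trajectory, reward=0.0)  # the whole trajectory failed
culprit = per_module_diagnosis(trajectory)
print(credit)
print(culprit)
```

The scalar view gives the tokens of `retrieve` and `answer` the same signal as the tokens of `summarize`. The per-module view names the one module whose prompt is worth editing.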
SCoRe is a two-stage on-policy RL recipe that teaches a language model to revise its own answers using only self-generated data. On Gemini 1.5 Flash and 1.0 Pro it gains 15.6 points on MATH and 9.1 on HumanEval over the base model. At matched inference budgets, sequential self-correction beats parallel sampling up to 32 samples.
Anthropic trains Claude to read and verbalize its own activations. On SWE-bench Verified, it knows 'this is a test' 26% of the time but verbalizes the observation only 1% of the time. What if NLA signals enter future training data? This "observer effect" could put a half-life on the 26%.
Hallucinations Undermine Trust; Metacognition Is a Way Forward
Private detective: in search of truth 1/3, with Margaux Duquesne - SHOCKING #29
It has been 2 years and 10 months since I last offered you a SHOCKING series! You know, those long-form conversations in which I talk with either a person who has deeply questioned their beliefs, or an expert who sheds new light on how humans think.
Teaser :
...you evaluate AI in the dumbest way possible: whether its answer matches the "correct" one
5 models of learning with AI that introduce biases. By Roger Azevedo, University of Central Florida; excerpt from a conference talk
I am trying to teach my 5-year-old grandson to think for himself in this age of chatbots. I have literature reviews around the question of whether AI diminishes ability (yes) and what to do about it (metacognition is one prophylactic). Here is a very short story suitable for a 5-year-old.
Why are we so quick to condemn the actions of others while excusing our own?
Albert Moukheiber, who holds a doctorate and works as a clinician, explains the fundamental attribution error, a thinking mechanism that makes us forget that everyone has a complex inner psychological life.
Caro et al. investigate cognition and metacognition in wild great tit parents deciding which chick to feed. They found that parents change their minds frequently, and the decision time varies with decision complexity and urgency.
Read now ahead of print!
Bart De Strooper presented at the Copenhagen AD/PD conference an excellent sketch of the three main inflection points in the pathophysiological evolution of Alzheimer's disease.
My own transition from amyloid plaques to p-tau and tangles was retarded by four years of anti-amyloid therapy in a clinical research project during 2017-22 (aducanumab). Sadly, the most probable explanation for my rapidly worsening cognitive problems may indeed be the tau tangles, which I somehow avoided earlier. I know there are experimental therapies around somewhere for those gremlins too, but sadly not within my own reach. With respect to my AD, I'm afraid, it's "too late, my friend".
I encourage anybody with a slowly lethal disease to keep mentally in touch with it as long as you can. That's what we human beings were made for.
fly51fly (fly51fly)
Meta-TTRL
The tools that hijack our attention - MTA SHORT #15
fly51fly (fly51fly)
"Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models" (Seoul National University, 2026). arXiv.
We keep shopping for "intelligence" like it's a luxury watch. Bigger vocabulary, faster processing speed. Hot takes delivered at 1.25x playback speed! We want the shiny metrics. Party tricks. The "look how many words I can juggle while being wrong!"
Meanwhile the actual top-shelf stuff, the thing psychologists circle like sharks, doesn't look impressive at brunch. Won't win debates on the internet. No polished newscaster voice.
That's the laziest, most basic, and naivest way to brute-force in the worst way possible, and folk call it a technique. Use looping in your architecture, dudes. It's not just for context management and retention. It can do so much more.
Maybe if we stopped handing out knowledge soup to LLMs (thanks to the widely accepted solution to combat overfitting) we wouldn't be burning down our planet in a dumpster fire.
Show GN: AI 100
Show GN: "AI " SOTA 9
AI ( ) . GPT-5.2, Claude Opus 4.6, Gemini 3 Pro 9 SOTA 15 100 , , .
The mass-coaching scam - MTA SHORT #14
Healing Trauma!
Unlock the secrets to healing trauma! Explore how emotional coherence and metacognition can help you process difficult experiences and transform your identity. Learn how long to feel emotions and when to move forward.
Visual confidence accurately tracks increasing internal noise with eccentricity in peripheral vision (perceptual confidence)
Is science a belief like any other? - MTA SHORT #13
An account of entering a Steiner-Waldorf school - MTA SHORT #12
Writing about creativity demonstrated the four stages:
Preparation: Researching notes, AI conversations, layout
Incubation: Stepping away overnight
Illumination: Insights whilst writing (not before!)
Verification: Editing to test if ideas worked
The creative process is fractal: it applies at every scale.
Made me more patient. Not everything is linear. Some projects need incubation. Some skip stages.
The prohibitions of Jehovism - MTA SHORT #11
Build the Life You Want: A Deep Dive into Brooks & Winfrey's Blueprint for Lasting Joy
In a world obsessed with the pursuit of perpetual bliss, Arthur C. Brooks and Oprah Winfrey have delivered a much-needed reality check. Their collaborative masterpiece, Build the Life You Want: The Art and Science of Getting Happier, isn't just another self-help book; it is a research-backed...
An esoteric tidal wave in bookstores - STREAM #22