"The system learns from feedback" is one of those phrases that sounds better than it is.
Feedback does not help unless the system has somewhere to put it.
If a human edits an answer and the edit disappears into the final document, the system did not learn. A person cleaned up after it. That may still be useful, but it is not a feedback loop.
Job
The feedback layer converts reviewed outcomes into system changes.
It should decide what kind of change the correction represents.
Inputs
The input is not just "thumbs up" or "thumbs down." It should include:
- original request
- retrieved context
- skill or route used
- draft output
- human edits
- reviewer notes
- final accepted output
- whether the failure repeated a known pattern
Without that trace, feedback becomes sentiment.
Processing
Reviewed corrections should land in one of four places:
- Memory update: a fact, decision, preference, or source trail changed.
- Skill update: the system repeatedly misunderstood the task or output shape.
- Eval case: the failure should become a small test before the workflow expands.
- Routing rule: this type of request should go somewhere else next time.
The system should not update all four by default. Most corrections have one primary cause. Treating every edit as a memory problem makes the memory layer noisy. Treating every edit as a prompt problem hides bad retrieval.
Output
The output is a proposed system change.
For personal systems, that might be a memory update waiting for approval. For team systems, it might be a review queue item, a changed skill instruction, or an eval case attached to a workflow.
Review Question
The review question is: what would need to change so this failure is less likely next time?
That is more useful than asking whether the output was good.
Failure Mode
The failure mode is performative feedback.
The interface collects ratings, but no one can say what changed because of them. The system still fails in the same way next week. The humans learn to stop correcting it carefully because correction does not compound.
That is when the agent remains a demo instead of becoming infrastructure.