Repo Quality Eval
The repo quality eval layer is not reinforcement learning and not a single reward score. It is a structured review layer that helps clients decide whether to ask more questions, fix a plan, block unsafe work, or proceed.
Fit In The Architecture
The MCP exposes pure tools and structured context:
- build_quality_requirements_profile
- evaluate_repo_plan_quality
- audit_generated_repo_quality
- suggest_quality_followup_questions
- run_repo_quality_eval_scenarios
The harness or client orchestrator owns the decision loop:
- Interview the user.
- Build a requirements profile.
- Score proposed stack/repo plans.
- Ask one focused follow-up question when confidence is low.
- Generate the repo only after hard gates are clear.
- Audit the generated repo.
- Fix issues or present the final result with proof.
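The loop above can be sketched as a small state-to-action function. This is only an illustration: `LoopState`, `Action`, `next_action`, and the `CONFIDENCE_THRESHOLD` value are hypothetical names and numbers, not part of the MCP, and the real tool outputs may be shaped differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ASK_FOLLOWUP = "ask_followup"
    FIX_PLAN = "fix_plan"
    GENERATE = "generate"
    FIX_REPO = "fix_repo"
    PRESENT = "present"


@dataclass
class LoopState:
    # All fields are hypothetical stand-ins for what the tools report.
    confidence: float = 0.0  # confidence in the requirements profile, 0..1
    plan_gate_failures: list[str] = field(default_factory=list)
    repo_generated: bool = False
    audit_findings: list[str] = field(default_factory=list)


CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff, not taken from the source


def next_action(state: LoopState) -> Action:
    """Pick the next step of the decision loop from the current state."""
    if not state.repo_generated:
        # Low confidence: ask one focused follow-up question first.
        if state.confidence < CONFIDENCE_THRESHOLD:
            return Action.ASK_FOLLOWUP
        # Hard gates must be clear before the repo is generated.
        if state.plan_gate_failures:
            return Action.FIX_PLAN
        return Action.GENERATE
    # After generation, audit findings must be fixed before presenting.
    if state.audit_findings:
        return Action.FIX_REPO
    return Action.PRESENT
```

Keeping the decision in the client, as a pure function of state, matches the split described above: the tools supply structured context and the orchestrator owns the loop.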
Hard Gates
Any of the following findings blocks the work or requires a fix before generation or completion:
- Committed or hardcoded secrets.
- Required environment variables without .env.example.
- CI that only echoes success or does not run meaningful checks.
- Trivial, fake, or placeholder tests.
- Destructive commands without explicit approval.
- Unsafe or overbroad permissions.
- Missing setup instructions.
- Missing or vague AGENTS.md.
- Jargon-heavy explanations for non-technical users.
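As a rough illustration of how two of these gates might be checked mechanically, the sketch below inspects an in-memory map of file paths to contents. The function name `check_hard_gates`, the regular expression, and the input shape are assumptions made for this example; the actual audit tool may detect these conditions differently, and a presence check cannot tell whether AGENTS.md is vague.

```python
import re

# Matches os.environ["NAME"], os.environ.get("NAME"), and os.getenv("NAME")
# in Python sources. Other languages would need their own patterns.
ENV_USE = re.compile(
    r"""os\.(?:environ(?:\.get\(|\[)|getenv\()\s*["'][A-Z0-9_]+["']"""
)


def check_hard_gates(files: dict[str, str]) -> list[str]:
    """Return the hard-gate failures found in a {path: content} mapping."""
    failures = []

    # Gate: required environment variables without .env.example.
    uses_env = any(
        ENV_USE.search(text)
        for path, text in files.items()
        if path.endswith(".py")
    )
    if uses_env and ".env.example" not in files:
        failures.append("required environment variables without .env.example")

    # Gate: missing AGENTS.md (vagueness needs a human or model review).
    if "AGENTS.md" not in files:
        failures.append("missing AGENTS.md")

    return failures
```

An empty result only means these two checks passed; the remaining gates in the list still need their own checks.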
Rubric Dimensions
The scored dimensions are separate so agents cannot hide weakness behind one high number:
- Requirements fit.
- Simplicity.
- Maintainability.
- Security.
- CI and test quality.
- Documentation quality.
- Agent readiness.
- Suitability for non-technical users.
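To show why separate dimensions matter, here is a minimal sketch that flags every dimension below a floor instead of averaging them. The key names, the 0..1 scale, and the `floor` default are assumptions for illustration; the source does not specify how scores are represented.

```python
# Hypothetical keys for the eight rubric dimensions listed above.
DIMENSIONS = (
    "requirements_fit",
    "simplicity",
    "maintainability",
    "security",
    "ci_and_test_quality",
    "documentation_quality",
    "agent_readiness",
    "non_technical_suitability",
)


def weak_dimensions(scores: dict[str, float], floor: float = 0.6) -> list[str]:
    """Return every dimension scoring below the floor, in rubric order."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        # Refuse to evaluate a partial rubric rather than silently skip one.
        raise ValueError(f"unscored dimensions: {missing}")
    return [d for d in DIMENSIONS if scores[d] < floor]


# A plan that looks fine on average but is weak on security.
scores = {d: 0.95 for d in DIMENSIONS}
scores["security"] = 0.2
average = sum(scores.values()) / len(scores)  # about 0.86, which hides the problem
flagged = weak_dimensions(scores)             # ["security"]
```

The average alone would pass most thresholds, while the per-dimension view surfaces the one weakness that should stop the plan.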
Reward-Hacking Guardrails
The evaluator explicitly warns about shallow success signals:
- CI passing is not enough when tests are weak or fake.
- A simple stack is not good if it cannot meet the user's goals.
- A complex stack must be justified by requirements.
- User preference cannot override security or maintainability gates.
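Two of these guardrails can be expressed as a small predicate. The function `may_proceed` and its parameters are hypothetical and only illustrate the rules; how the evaluator actually judges test quality is not shown here.

```python
def may_proceed(
    ci_passed: bool,
    tests_are_meaningful: bool,
    gate_failures: list[str],
    user_wants_to_skip_gates: bool = False,
) -> bool:
    """Decide whether work may proceed under the guardrails above."""
    # A green CI run only counts when the tests behind it are meaningful.
    checks_ok = ci_passed and tests_are_meaningful
    # user_wants_to_skip_gates is deliberately ignored: user preference
    # cannot override security or maintainability gates.
    return checks_ok and not gate_failures
```

The unused parameter is intentional: it makes explicit that a request to skip gates is heard but does not change the outcome.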
Use this layer alongside /grill-me, review_proposed_file_plan, review_repo_structure, score_agent_artifacts, and review_agent_final_response.