§8.2.9
LLM-as-Judge biases (position / length / verbosity) and their mitigation?
- §8.2 RAGAS metrics (Faithfulness / Answer Relevancy / Context Precision / Context Recall / Context Entities Recall / Noise Sensitivity)?
- §8.2 ARES / RAGBench / CRAG-Bench / LegalBench-RAG?
- §8.2 DeepEval / TruLens / Phoenix frameworks?
- §8.2 A/B testing + online human-annotation feedback loop?
- §8.2 GraphRAG-Bench evaluation?
- §8.1 Building a retrieval evaluation set (human-curated / LLM-synthesized / hard-negative mining)?