§8.2.11
How is evaluation done with GraphRAG-Bench?
- §8.2 RAGAS metrics (Faithfulness / Answer Relevancy / Context Precision / Context Recall / Context Entities Recall / Noise Sensitivity)?
- §8.2 ARES / RAGBench / CRAG-Bench / LegalBench-RAG?
- §8.2 The DeepEval / TruLens / Phoenix evaluation frameworks?
- §8.2 A/B testing combined with a closed loop of online human annotation?
- §8.2 Biases in LLM-as-Judge (position / length / verbosity) and how to mitigate them?
- §8.1 Building a retrieval evaluation set (manual annotation / LLM synthesis / hard-negative mining)?