§8.1.2
Constructing a Retrieval Evaluation Set (Human Annotation / LLM Synthesis / Hard Negative Mining)
- §8.1 BEIR / MTEB / MIRACL / MLDR benchmarks
- §8.1 Recall@K / Precision@K / MRR / NDCG / Hit Rate / MAP
- §8.1 BRIGHT reasoning-intensive retrieval benchmark
- §8.1 LoCoBench / LongEmbed long-document benchmarks
- §8.2 RAGAS metrics (Faithfulness / Answer Relevancy / Context Precision / Context Recall / Context Entities Recall / Noise Sensitivity)
- §8.2 ARES / RAGBench / CRAG-Bench / LegalBench-RAG