§8.1.5 LoCoBench / LongEmbed long-document benchmarks
- §8.1 Constructing retrieval evaluation sets (manual / LLM-synthesized / hard negative mining)
- §8.1 BEIR / MTEB / MIRACL / MLDR benchmarks
- §8.1 Recall@K / Precision@K / MRR / NDCG / Hit Rate / MAP
- §8.1 BRIGHT reasoning-intensive retrieval benchmark
- §8.2 RAGAS metrics (Faithfulness / Answer Relevancy / Context Precision / Context Recall / Context Entities Recall / Noise Sensitivity)
- §8.2 ARES / RAGBench / CRAG-Bench / LegalBench-RAG