Suggestions: Benchmark health dashboard

A view for benchmark authors that shows how their benchmark is used in practice: how many models report it, which splits and metric variants people actually use, how complete its metadata is compared with the corpus median, and where reported scores diverge. It tells eval developers what to standardize or document next.
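A minimal sketch of the four numbers such a dashboard could compute, assuming reported results are available as flat records. Everything here is hypothetical and for illustration only: the `Report` record, the `completeness` and `benchmark_health` helpers, the metadata field names, and the choice to measure divergence as the max-min spread of scores reported for the same model, split and metric, flagged when it exceeds a tolerance `tol` in the metric's own units.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median


# Hypothetical record: one reported score for one model on one benchmark.
@dataclass(frozen=True)
class Report:
    model: str
    benchmark: str
    split: str
    metric: str
    score: float


def completeness(meta: dict, fields: list) -> float:
    """Fraction of the expected metadata fields that are present and non-empty."""
    return sum(1 for f in fields if meta.get(f)) / len(fields)


def benchmark_health(name, reports, metadata, fields, tol=1.0):
    """Summarize one benchmark. `metadata` maps benchmark name -> metadata dict."""
    mine = [r for r in reports if r.benchmark == name]
    # Adoption: number of distinct models that report this benchmark.
    n_models = len({r.model for r in mine})
    # Usage: which (split, metric) variants appear, most common first.
    variants = Counter((r.split, r.metric) for r in mine).most_common()
    # Metadata: this benchmark's completeness vs the median over the whole corpus.
    own = completeness(metadata.get(name, {}), fields)
    corpus_median = median(completeness(m, fields) for m in metadata.values())
    # Divergence (assumed definition): same model/split/metric, score spread > tol.
    groups = defaultdict(list)
    for r in mine:
        groups[(r.model, r.split, r.metric)].append(r.score)
    divergent = {k: max(v) - min(v) for k, v in groups.items() if max(v) - min(v) > tol}
    return {
        "models": n_models,
        "variants": variants,
        "completeness": own,
        "corpus_median_completeness": corpus_median,
        "divergent": divergent,
    }


# Illustrative data only, not real results.
reports = [
    Report("m1", "bench", "test", "acc", 71.2),
    Report("m1", "bench", "test", "acc", 74.0),  # same setup, second source
    Report("m2", "bench", "test", "acc", 65.0),
    Report("m2", "bench", "val", "f1", 0.6),
    Report("m3", "other", "test", "acc", 50.0),
]
metadata = {
    "bench": {"license": "MIT", "paper": "x"},
    "other": {"license": "CC-BY", "paper": "y", "splits": "test"},
    "third": {},
}
health = benchmark_health("bench", reports, metadata, ["license", "paper", "splits"])
print(health)
```

Here "bench" is reported by two models, mostly on the `test`/`acc` variant, sits exactly at the corpus median for metadata completeness (2 of 3 fields), and has one divergent entry: m1's two `test`/`acc` scores differ by about 2.8 points.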

1 vote

Tagged as New feature

Created 10 June by AK
