Home Finder Claw

Search/Eval dashboard

Eval and tracing

30 labeled queries across 5 categories (filter accuracy, semantic match, geo, Fair Housing guard, researcher quality). The runner executes them sequentially through the production agents and scores with deterministic rules. Stats below come from traces-v1 in OpenSearch.

Total traces

n/a

Total spans

n/a

Agents seen

Eval cases