Show HN: RULER – Easily apply RL to any agent (openpipe.ai)
sadiq 17 hours ago [-]
Excellent, look forward to giving this a go.

I was looking at: https://arxiv.org/abs/2506.18254 but your approach is even more general.

kcorbitt 15 hours ago [-]
I really like RLPR for when you have a known-good answer to compare to as well!
spmurrayzzz 17 hours ago [-]
Might end up being some confusion with the RULER benchmark from NVIDIA given the (somewhat shared) domain: https://github.com/NVIDIA/RULER

EDIT: by shared I only mean the adjacency to LLMs/AI/ML, RL is a pretty big differentiator though and project looks great

kcorbitt 15 hours ago [-]
Dang, hadn't seen that. Namespace collision strikes again.
swyx 10 hours ago [-]
yeah unfortunately for you this is one of the well known long context benchmarks. too late tho, soldier on.
maxrmk 15 hours ago [-]
Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?
kcorbitt 15 hours ago [-]
No, we don't do anything. Theoretically we could judge several times with different ordering.

We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
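
A minimal sketch of that check, assuming each judge call returns one score per rollout in the order the rollouts were shown. The function name `mean_score_by_position` and the sample scores are made up for illustration and are not part of RULER:

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_position(runs):
    """Average judge score per rollout position.

    runs[i][j] is the score the judge gave, in run i, to the rollout
    shown in position j. Hypothetical helper, not a RULER API.
    """
    by_position = defaultdict(list)
    for scores in runs:
        for position, score in enumerate(scores):
            by_position[position].append(score)
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}


# Made-up scores from three judge calls over groups of three rollouts.
runs = [
    [0.8, 0.6, 0.4],
    [0.6, 0.6, 0.6],
    [1.0, 0.6, 0.2],
]
position_means = mean_score_by_position(runs)
for position, avg in position_means.items():
    print(f"position {position}: mean score {avg:.2f}")
```

If rollouts are assigned to positions at random, a consistent slope in these means across many runs would point to order bias rather than to real quality differences.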

swyx 10 hours ago [-]
how does o3 on the customer support agent task so dreadfully underperform qwen?
someoneontenet 18 hours ago [-]
Love these write ups!
kcorbitt 17 hours ago [-]
Thanks! If there are any topics that you'd find particularly interesting, let me know and I can try to find time. :)
ndgold 16 hours ago [-]
Dope