Why most AI coding benchmarks are misleading (COMPASS paper) (arxiv.org)
jmeaden 230 days ago [-]
Hi, I’m one of the authors. Happy to answer questions about the dataset (LLM coding performance compared to 390k+ human submissions), the scoring approach, or the methodology behind COMPASS. Feedback and critique are welcome.
sieep 230 days ago [-]
Hello, I'm someone who does not have a background in CS, so my apologies for not being able to read the paper in full. Is there any clear-cut strategy you would recommend to model developers so they can improve not just in correctness, but also in quality & efficiency? I'm sure it's in the paper & I wish I could understand it in depth.

If you don't mind me asking a more personal question, I would love to go back to uni for a master's in computer science & hopefully assist with papers like this one day. Do you have any advice for someone with industry (SWE) rather than academic experience who wants to make the leap to the academic side? I genuinely love this kind of stuff and already make a decent living, so it's not for the money.
