Show HN: I created a PoC for live descriptions of the surroundings for the blind (github.com)
biosboiii 4 days ago [-]
Check out my reverse-engineering/cracking of Microsoft's app that does just this, SeeingAI.

https://altayakkus.substack.com/p/you-wouldnt-download-an-ai

tr33house 4 days ago [-]
This was a great read. At this point, any org should assume on-device models are public
biosboiii 4 days ago [-]
Thanks! Yeah, they should :) Would love to do this with CoreML on Apple devices, but my newest iPhone is a 7.

But if you subscribe, you may see me doing the same with a surveillance camera soon(ish) :)

miki123211 4 days ago [-]
Blind person here.

I don't see the point of this over just using a cell phone app for it; those are slowly starting to appear.

o40 4 days ago [-]
Yes, apps are for sure the best solution for this. Hopefully something like "Be My AI" in combination with consumer products such as the Ray-Ban Meta glasses, where you can get descriptions without telling the world that you are requesting them.

I have not done any app development, and for this project I wanted to keep some things simple, to focus on what can be expected from a low-quality camera combined with AI for descriptions.

oulipo 4 days ago [-]
Hi! Could you tell me what your favorite devices/apps are for getting descriptions of scenery? Are you a coder? Would you point me to the best setup for coding as a blind person? Thanks
Someone 4 days ago [-]
> It would be nice to have an cheap and open source alternative to the currently available products, where the user gets fed information rather than continuously requesting it

I think you need to triple-check whether users actually find that nice.

Assuming that keeping the text limited to what interests the user will stay an unsolved problem for the foreseeable future, I guesspect that they prefer a middle ground where they aren’t continuously bombarded with text, but it’s easy to get that flow going. For example, having that text feed on only while a button is being held down.

I guesspect that because I think users would soon be fed up with an assistant that says there's a painting on the wall or a church tower in the distance every time they turn their head.

Both can be useful pieces of information, but not when you hear them for the thousandth time while in your own house/garden.
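The button-gated middle ground described above could be sketched roughly like this; all class and method names are hypothetical, not from the project:

```python
class PushToDescribe:
    """Forward scene descriptions to speech only while a button is held down."""

    def __init__(self):
        self.button_held = False
        self.spoken = []  # stand-in for a text-to-speech queue

    def press(self):
        self.button_held = True

    def release(self):
        self.button_held = False

    def on_description(self, text):
        # Descriptions keep arriving from the camera pipeline, but are
        # dropped unless the user is actively asking for them.
        if self.button_held:
            self.spoken.append(text)
```

This keeps the recognition pipeline running continuously while making the audio feed opt-in from the user's side.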

o40 4 days ago [-]
Yes, repeated information is not great in many cases. A more advanced system could possibly keep track of which information is new and which information is already known to the user.

I wanted to create something that is the opposite of needing to say "Hey Google, describe what is in front of me" or similar. Another point was to see how cheap/simple you can go and still get valuable information.
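The "keep track of which information is new and which is already known" idea above amounts to deduplicating announcements with a cooldown. A minimal sketch, with hypothetical names and a made-up 5-minute cooldown, not anything from the project:

```python
import time


class DescriptionDeduper:
    """Suppress re-announcing objects that were described recently."""

    def __init__(self, cooldown_s=300.0):
        self.cooldown_s = cooldown_s  # seconds before an object may be repeated
        self.last_spoken = {}         # object label -> timestamp last announced

    def filter(self, labels, now=None):
        """Return only the labels worth announcing, and record them."""
        now = time.monotonic() if now is None else now
        fresh = []
        for label in labels:
            last = self.last_spoken.get(label)
            if last is None or now - last >= self.cooldown_s:
                fresh.append(label)
                self.last_spoken[label] = now
        return fresh
```

A real system would need something smarter than string labels (the same chair seen from two angles should count as one object), but even this much would silence the "painting on the wall" announcement after the first pass through a room.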

nels 4 days ago [-]
Nice work! You may be interested in a paper called WorldScribe that explored a similar concept and included some interesting ways of dealing with latency: https://worldscribe.org.
o40 4 days ago [-]
Very interesting. This, in combination with something that "tracks" described objects so they don't need to be described again, would be a game changer.
leshokunin 4 days ago [-]
Wonderful effort. Congrats, and I hope this keeps moving forward.
oniony 3 days ago [-]
I love how the descriptions after the prompt was fixed now read like the descriptions of the scenes in the 1982 video game The Hobbit.
rkagerer 4 days ago [-]
Interesting, I had no idea there were "Sight as a Service" offerings.
lionkor 4 days ago [-]
I love abbreviations! Is it a point of care? A piece of crap? A proof of concept? All of them would work :)
rad_gruchalski 4 days ago [-]
Proof of concept. Don’t pay at a POS.
MrVandemar 4 days ago [-]
Did you consult with your target audience, i.e. blind or low-vision people, before or during development?
o40 4 days ago [-]
Yes. My partner is visually impaired, so that is one of the reasons I think this is interesting to investigate. The current solution is way too "janky" to actually use, but it gives insight into the problem to solve.

My hope is that there will be "cheap" camera glasses with which you can use different services for image descriptions. There is a company called "Be My Eyes" that is developing an AI tool for image descriptions, which is probably miles better than anything I can come up with. https://www.bemyeyes.com/blog/introducing-be-my-ai

Be My Eyes seems to support the Ray-Ban Meta glasses, so hopefully "Be My AI" will too.

I understand the "not consulting the target audience" problem all too well; for instance, braille signs that are placed at eye level and hard to find. Some workplaces are very keen to make accessibility adjustments, but mostly visible ones, so that they can show others that adjustments have been made, regardless of whether they actually help or not.

MrVandemar 4 days ago [-]
I commend you, then. Accessibility is unfortunately too often done without consultation with the community it is supposed to benefit.
xnx 4 days ago [-]
Neat. So this is like the free Google Lookout app, but with more emphasis on the scene than on objects.
rusty_venture 4 days ago [-]
Does it say "I am a lamp. I am a lamp."?
smitty1e 4 days ago [-]
PoC means "point of care" in this context?
pockybum522 4 days ago [-]
Proof of Concept
smitty1e 4 days ago [-]
Thanks.
sajb 4 days ago [-]
"You are likely to be eaten by a grue."