#vision-language

from PyImageSearch
5 days ago

Grounding DINO: Open Vocabulary Object Detection on Videos

Imagine asking a friend to find any object in a picture simply by describing it. This is the promise of open-set object detection: the ability to spot and localize arbitrary objects (even ones never seen in training) by name or description. Unlike a closed-set detector trained on a fixed list of classes (say, "cat", "dog", "car"), an open-set detector can handle new categories on the fly, from language cues alone.
Python
Apple
from InfoQ
2 weeks ago

AnyLanguageModel: Unified API for Local and Cloud LLMs on Apple Platforms

AnyLanguageModel provides a unified Swift API enabling interchangeable use of local Core ML/MLX and remote cloud language models, supporting vision-language prompts and minimizing dependencies.
Artificial intelligence
from HackerNoon
6 months ago

Chameleon Sets New Benchmarks in AI Image-Text Tasks

Chameleon sets a new standard for multimodal machine learning with a unified token-based architecture, improving reasoning across image and text.