Molmo goes beyond today’s most advanced multimodal models: a family of open models that can now point and act in the visual world
SEATTLE: Today, the Allen Institute for AI (Ai2) announced the launch of Molmo, a family of state-of-the-art multimodal models. The family includes our best Molmo model, which closes the gap between closed and open systems; the most open and powerful multimodal model available today; and our most efficient model. Today’s most advanced multimodal models can perceive the world and communicate with us; Molmo goes beyond that to enable acting in the visual world, unlocking a whole new generation of capabilities, from sophisticated web agents to robotics.
Key capabilities of Molmo include:
Molmo is accessible to everyone:
Closing the gap between open and closed AI models
The accuracy and capability of the Molmo models show that the gap between open and proprietary models is closing. The best-in-class 72B model in the Molmo family not only outperforms others in the class of open-weight, open-data models, but also compares favorably against proprietary systems such as GPT-4V, Claude 3.5, and Gemini 1.5.
Molmo was designed and built in the open, and Ai2 will be releasing all model weights, captioning and fine-tuning data, and source code in the near future. Select model weights, inference code, and a demo are available starting today. By sharing all data and code, Ai2 continues to set the open standard for AI, providing open access that enables continued research and innovation in the AI community.
Smaller models are becoming as powerful as big ones
The Molmo family demonstrates that even smaller models (7B parameters) can perform as well as proprietary, more expensive alternatives. This approach lowers barriers to development and provides a robust foundation for the AI community to build innovative applications around Molmo’s unique capabilities. The Molmo family also includes our most efficient model, built on OLMoE with only 1 billion active parameters, making it well suited for on-device deployment.
Molmo’s efficient and open multimodal data
Molmo leapfrogs model performance through efficient and creative use of data. Unlike recent multimodal LLMs that rely on massive web-scale language-vision data, Molmo is trained on a meticulously curated set of slightly under 1 million images, demonstrating that a focused, efficient approach can yield superior results without extensive computational resources.
The key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions. To enable a wide array of user interactions, we also introduce a diverse dataset mixture that includes innovative 2D pointing data, which enhances tasks like counting and lays a foundation for future work in which VLMs enable agents to act by pointing in their environments. The success of our approach relies on careful model architecture choices, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets, all of which will be fully released.
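As an illustration of how pointing output can ground a task like counting, the sketch below parses point annotations out of a model response and counts them. The `point(x=…, y=…)` text format, the `parse_points` helper, and the sample response are invented for illustration only and do not reflect Molmo's actual output format.

```python
import re

def parse_points(response: str) -> list[tuple[float, float]]:
    """Extract normalized (x, y) coordinates from a hypothetical
    point-annotation string such as 'point(x=0.21, y=0.40)'."""
    pattern = r"point\(x=([0-9.]+),\s*y=([0-9.]+)\)"
    return [(float(x), float(y)) for x, y in re.findall(pattern, response)]

# A counting question ("how many mugs?") can then be answered by the
# number of points the model emits, each of which is also a location
# an agent could act on (click, grasp, etc.).
response = (
    "There are three mugs: point(x=0.21, y=0.40) "
    "point(x=0.55, y=0.38) point(x=0.83, y=0.42)"
)
points = parse_points(response)
print(len(points))  # → 3
```

The design point is that a pointed answer is both verifiable (the count equals the number of points) and actionable (each point is a coordinate in the image).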
“Molmo is an incredible AI model with exceptional visual understanding, which pushes the frontier of AI development by introducing a paradigm for AI to interact with the world through pointing. The model's performance is driven by a remarkably high quality curated dataset to teach AI to understand images through text. The training is so much faster, cheaper, and simpler than what's done today, such that the open release of how it is built will empower the entire AI community, from startups to academic labs, to work at the frontier of AI development,” said Matt Deitke, Researcher at the Allen Institute for AI.
“Multimodal AI models are typically trained on billions of images. We have instead focused on using extremely high quality data, but at a scale that is 1,000 times smaller. This has produced models that are as powerful as the best proprietary systems, but with fewer hallucinations and much faster training, making our model far more accessible to the community,” said Ani Kembhavi, Senior Director of Research at the Allen Institute for AI.
Building Molmo for a Better AI Future
Molmo represents a critical step forward for the AI community. By combining capabilities that are actionable in the real world with state-of-the-art performance, in a model that is free, openly available, and efficient to deploy, Molmo gives all researchers, developers, and consumers the ability to use, build on, and advance safe, openly available AI in our visual world.
Learn more: https://molmo.allenai.org/blog
Try now: https://molmo.allenai.org/
Source: Business Wire