Introducing Molmo: A Family of State-of-the-Art Open Multimodal Models

Business Wire

Molmo goes beyond today’s most advanced multi-modal models by creating open models that can now point and act in the visual world

SEATTLE: Today, the Allen Institute for AI (Ai2) announced the launch of Molmo, a family of state-of-the-art multimodal models. The family includes our best Molmo model, which closes the gap between closed and open models; the most open and powerful multimodal model today; and our most efficient model. While most advanced multimodal models today can perceive the world and communicate with us, Molmo goes further by enabling systems to act in their worlds, unlocking a new generation of capabilities, from sophisticated web agents to robotics.

Key capabilities of Molmo include:

  • Exceptional Image Understanding: Molmo can accurately understand a wide range of visual data, from everyday objects and signs to complex charts, messy whiteboards, clocks, and menus.
  • Actionable Insights: To bridge the gap between perception and action, Molmo models can point to what they perceive, enabling a wide range of capabilities that require spatial knowledge. Molmo can point to UI elements on the screen, allowing developers to build web agents or robots that navigate complex interactions both on screen and in the real world.
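
For developers, acting on a pointed-at location usually means turning it into pixel coordinates that a click or a robot target can use. The sketch below illustrates that step only. It assumes, hypothetically, that a point arrives as percentages of image width and height; this announcement does not specify Molmo's output format, and the names `Point` and `to_pixels` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A pointed-at location in percent of image width and height (assumed format)."""
    x_pct: float
    y_pct: float


def to_pixels(point: Point, width: int, height: int) -> tuple[int, int]:
    """Map a percentage-based point to integer pixel coordinates, clamped to the image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x = round(point.x_pct / 100 * width)
    y = round(point.y_pct / 100 * height)
    # Clamp so that a point at 100% still lands on a valid pixel index.
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)
```

A web agent could pass the resulting pair to whatever click or tap call its automation layer provides.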

Molmo is accessible to everyone:

  • Open: Molmo’s language and vision training data, fine-tuning data, model weights, and source code will all be open and available to the community.
  • Efficient: The Molmo training recipe and models are highly data efficient and require far less compute than earlier approaches, making them accessible to the entire community.
  • Runs on device: The Molmo-1B model is tiny, fast and performant, small enough to fit on most devices.

Closing the gap between open and closed AI models

The accuracy and capability of Molmo models show that the gap between open and proprietary models is closing. The best-in-class 72B model in the Molmo family not only outperforms other models with open weights and data, but also compares favorably with proprietary systems such as GPT-4V, Claude 3.5 and Gemini 1.5.

Molmo was designed and built in the open, and Ai2 will release all model weights, captioning and fine-tuning data, and source code in the near future. Select model weights, inference code, and a demo are available starting today. By sharing all data and code, Ai2 continues to set the open standards for AI, providing open access to enable continued research and innovation in the AI community.

Smaller models are becoming as powerful as big ones

The Molmo family demonstrates that even smaller models (7B parameters) can perform as well as more expensive proprietary alternatives. This approach lowers barriers to development and provides a robust foundation for the AI community to build innovative applications around Molmo’s unique capabilities. The Molmo family also includes our most efficient model, built with OLMo-E, which has only 1 billion active parameters and is therefore suitable for deployment on devices.

Molmo’s efficient and open multimodal data

Molmo advances model performance through efficient and creative use of data. Unlike recent multimodal LLMs that rely on massive web-scale language-vision data, Molmo is trained on a meticulously curated set of slightly under 1 million images, demonstrating that a focused, efficient approach can yield superior results without extensive computational resources.

The key innovation is a novel, highly-detailed image caption dataset collected entirely from human annotators using speech-based descriptions. To enable a wide array of user interactions, we also introduce a diverse dataset mixture including innovative 2D pointing data that enhances tasks like counting and creates a foundation for future directions in which VLMs enable agents to act by pointing in their environments. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and most critically the quality of our newly collected datasets, all of which will be fully released.

“Molmo is an incredible AI model with exceptional visual understanding, which pushes the frontier of AI development by introducing a paradigm for AI to interact with the world through pointing. The model's performance is driven by a remarkably high quality curated dataset to teach AI to understand images through text. The training is so much faster, cheaper, and simpler than what's done today, such that the open release of how it is built will empower the entire AI community, from startups to academic labs, to work at the frontier of AI development,” said Matt Deitke, Researcher at the Allen Institute for AI.

“Multimodal AI models are typically trained on billions of images. We have instead focussed on using extremely high quality data but at a scale that is 1000 times smaller. This has produced models that are as powerful as the best proprietary systems, but with fewer hallucinations and much faster to train, making our model far more accessible to the community,” said Ani Kembhavi, Senior Director of Research at the Allen Institute for AI.

Building Molmo for a Better AI Future

Molmo represents a critical step forward for the AI community. By combining capabilities that are actionable in the real world with state-of-the-art performance in a model that is free, openly available, and efficient to deploy, it gives researchers, developers, and consumers the opportunity to use, build, and advance safe and openly available AI in our visual world.

Learn more: https://molmo.allenai.org/blog

Try now: https://molmo.allenai.org/

Source: Business Wire
