Tonic.ai Launches World’s First Secure Unstructured Data Lakehouse for LLMs

Business Wire

Tonic Textual eliminates data integration and data privacy challenges hindering enterprise adoption of generative AI.

SAN FRANCISCO: Tonic.ai, the San Francisco-based company pioneering data synthesis solutions for software and AI developers, today announced the launch of the world’s first secure data lakehouse for LLMs, Tonic Textual, to enable AI developers to seamlessly and securely leverage unstructured data for retrieval-augmented generation (RAG) systems and large language model (LLM) fine-tuning. Tonic Textual is an all-in-one data platform designed to eliminate integration and privacy challenges ahead of RAG ingestion or LLM training, two of the biggest bottlenecks hindering enterprise AI adoption. Leveraging its expertise in data management and realistic synthesis, Tonic.ai has developed a solution that tames and protects siloed, messy, and complex unstructured data and converts it into AI-ready formats ahead of embedding, fine-tuning, or vector database ingestion.

The Untapped Value of Unstructured Data

Enterprises are rapidly expanding investments in generative AI initiatives across their businesses, motivated by its transformational potential. Optimal deployments of the technology must leverage enterprises’ proprietary data, often stored in messy unstructured formats across various file types and containing sensitive information about customers, employees, and business secrets. IDC estimates that approximately 90% of data generated by enterprises is unstructured, and, in 2023 alone, organizations were expected to generate upwards of 73,000 exabytes of unstructured data. To be used for AI initiatives, unstructured data must first be extracted from siloed locations and standardized, a time-consuming process that monopolizes developer time. According to a 2023 IDC survey, 50% of companies have mostly or completely siloed unstructured data, and 40% of companies are still manually extracting information from the data.

“We’ve heard time and again from our enterprise customers that building scalable, secure unstructured data pipelines is a major blocker to releasing generative AI applications into production,” said Adam Kamor, Co-Founder and Head of Engineering, Tonic.ai. “Textual is specifically architected to meet the complexity, scale, and privacy demands of enterprise unstructured data and allows developers to spend more time on data science and less on data preparation, securely.”

The Importance of Privacy in AI

Data privacy is paramount among enterprise decision makers, particularly when they use third-party model services: the same IDC survey reported that 46% of companies cite data privacy compliance as a top challenge in leveraging proprietary unstructured data in AI systems. Organizations must protect sensitive information in the data from model memorization and accidental exfiltration, or risk costly compliance violations.

“AI data privacy is a challenge the Tonic.ai team is uniquely positioned to solve due to their deep experience building privacy-preserving synthetic data solutions,” said George Mathew, Managing Director at Insight Partners. “As enterprises make inroads implementing AI systems as the backbone of their operations, Tonic.ai has built an innovative product in Textual to supply secured data that protects customer information and enables organizations to leverage AI responsibly.”

Introducing the Secure Data Lakehouse for LLMs

Tonic Textual is a first-of-its-kind data lakehouse for generative AI that can be used to seamlessly extract, govern, enrich, and deploy unstructured data for AI development. With Tonic Textual, you can:

Build, schedule, and automate unstructured data pipelines that extract and transform data into a standardized format convenient for embedding, ingesting into a vector database, or pre-training and fine-tuning LLMs. Textual supports the leading formats for unstructured free-text data out-of-the-box, including TXT, PDF, CSV, TIFF, JPG, PNG, JSON, DOCX and XLSX.

Automatically detect, classify, and redact sensitive information in unstructured data, and optionally re-seed redactions with synthetic data to maintain the semantic meaning of your data. Textual leverages proprietary named entity recognition (NER) models trained on a diverse data set spanning domains, formats, and contexts to ensure that sensitive data is identified and protected in any form it may take.

Enrich your vector database with document metadata and contextual entity tags to improve retrieval speed and context relevance in RAG systems.
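
As a rough illustration of the detect, redact, and enrich steps described above, the following minimal Python sketch may help. It is not Tonic Textual's API: it swaps in simple regular expressions where Textual uses proprietary NER models, and every name in it (`PATTERNS`, `SYNTHETIC`, `Chunk`, `protect`) is hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumption: two toy regex detectors stand in for trained NER models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Assumption: fixed placeholder values stand in for realistic synthetic data.
SYNTHETIC = {
    "EMAIL": "jane.doe@example.com",
    "PHONE": "555-010-0000",
}


@dataclass
class Chunk:
    """A piece of protected text plus metadata for a vector database."""
    text: str
    metadata: dict = field(default_factory=dict)


def protect(text: str, source: str, synthesize: bool = False) -> Chunk:
    """Redact detected entities, or replace them with synthetic values."""
    found = set()
    for label, pattern in PATTERNS.items():
        replacement = SYNTHETIC[label] if synthesize else f"[{label}]"
        text, count = pattern.subn(replacement, text)
        if count:
            found.add(label)
    # Entity tags and the source file name travel with the chunk as metadata.
    return Chunk(text=text, metadata={"source": source, "entities": sorted(found)})


chunk = protect("Contact bob@corp.com or 415-555-1234 today", "memo.txt")
print(chunk.text)
print(chunk.metadata)
```

Calling `protect(..., synthesize=True)` replaces each detected value with its synthetic stand-in instead of a bracketed tag, so the sentence keeps its shape and can still be embedded or used for fine-tuning.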

Looking ahead, our roadmap includes plans to add capabilities that further simplify building generative AI systems on proprietary data without compromising privacy for utility, including:

Native SDK integrations with popular embedding models, vector databases, and AI developer platforms to create fully automated, end-to-end data pipelines that fuel AI systems with high-quality, secure data.
Additional capabilities for data cataloging, data classification, data quality management, data privacy and compliance reporting, and identity and access management to ensure organizations can utilize generative AI responsibly.
An expanded library of data connectors, including native integrations with cloud data lakes, object stores, cloud storage and file-sharing platforms, and enterprise SaaS applications, enabling AI systems to connect to data across the entire organization.

“Companies have amassed a staggering amount of unstructured data in the cloud over the last two decades; unfortunately, its complexity and the nascency of analytical methods have prevented its use,” said Oren Yunger, Managing Partner at Notable Capital. “Generative AI has finally unlocked the use case for that data, and Tonic.ai has stepped in to solve the complexity problem in a way that reflects its core mission to transform how businesses handle and leverage sensitive data while still enabling developers to do their best work.”

About Tonic.ai

Tonic.ai empowers developers while protecting customer privacy by enabling companies to create safe, synthetic versions of their data for use in software development, testing, and MLOps. Founded in 2018, with offices in San Francisco, Atlanta, New York, and London, the company is pioneering enterprise solutions for data de-identification, subsetting, and synthesis. Thousands of developers use data generated with Tonic.ai’s products on a daily basis to build their products faster in industries as wide ranging as healthcare, financial services, logistics, edtech, and e-commerce. Working with customers like eBay, Walgreens, Texas Capital Bank, and the NHL, Tonic.ai innovates to advance its goal of advocating for the privacy of individuals while enabling companies to do their best work. For more information, visit https://www.tonic.ai or follow @tonicfakedata on Twitter.

Notes to Editors

Source: IDC White Paper, Sponsored by Box Inc., “Untapped Value: What Every Executive Needs to Know About Unstructured Data,” Doc. US51128223, August 2023

Source: Business Wire
