New Off-the-Shelf (OTS) Datasets from Appen Accelerate AI Deployment

High-quality datasets include scripted speech, images with text, body movement and human audio

SYDNEY & SAN FRANCISCO–(BUSINESS WIRE)–Appen Limited (ASX:APX), the leading provider of high-quality training data for organizations that build effective AI systems at scale, today announced new off-the-shelf (OTS) datasets. These datasets are designed to make it easier and faster for businesses to acquire the high-quality training data needed to accelerate their artificial intelligence (AI) and machine learning (ML) projects. The new OTS datasets include human body movement and innovative baby crying audio, as well as scripted speech and images with text suitable for optical character recognition (OCR) in high-demand but hard-to-acquire languages such as Arabic, Croatian, Greek, Hungarian, Thai and more. With the expanded datasets, Appen’s total OTS offering includes over 250 datasets, comprising over 11,000 hours of audio, over 25,000 images and over 8.7 million words across 80 languages and multiple dialects.

Appen’s OTS datasets are a fast, cost-effective way to jumpstart an AI or ML project with consistent, high-quality training data. Teams expanding their AI capabilities can also use OTS datasets to improve accuracy, develop new model skills and make other improvements to their AI models. For example, an OTS dataset is often delivered in one week, compared with eight to twelve weeks for a new data collection and annotation project, or even longer depending on complexity. All Appen datasets are developed using a fully transparent, opt-in methodology, so AI specialists can be assured their data is clean and compliant, eliminating the potential risk of backlash and reputational damage.

“AI teams around the world working on projects with tight deadlines and flexible data requirements can benefit from using off-the-shelf datasets,” said Wilson Pang, CTO of Appen. “OTS datasets shorten time to value and provide access to high-quality data at a lower total cost than using traditional methods. We at Appen take the necessary steps to ensure that all our datasets are ethically sourced and demographically balanced, enabling companies to maintain responsible AI practices by minimizing bias in their models and ensuring fair treatment of data annotators. You always know the precise quality of an OTS dataset, which helps build better AI that works in the real world.”

MediaInterface has delivered language technology solutions to healthcare-related institutions in Germany and other parts of Europe for over 20 years. When the company was expanding into France, it had fully localized software but lacked French lexicon data, especially French names and places, which are often referenced in patient health information. Using Appen OTS datasets, MediaInterface acquired approximately 21,000 French names and 14,000 place names. “The critical data from Appen has been incorporated into our background lexicon to successfully launch in a new market, and this helps us build out new vocabularies for our clients and strengthen our approach for future market launches as well,” said Ines Wendler, product manager at MediaInterface.

The most experienced AI experts combine OTS datasets with on-demand data collection and annotation projects to meet their complex AI model training data needs. Appen leads in offering continued support through a range of specialized data services, such as ongoing data annotation and smart labeling, using AI-powered tools and automated workflows to maximize efficiency.

“We interact with AI from the moment we wake up to the moment we go to bed – through virtual assistants, chatbots, search engines, social networks, medical devices, smart cars and other applications,” said Judith Bishop, Appen’s senior director of AI specialists, who leads a team of 100 AI linguists and language experts. “Language is often the primary interface for many of these compelling AI use cases, so to guarantee a great experience, the model needs to be trained to work for everyone. Appen’s commitment to high-quality data and responsible, ethical AI development allows companies purchasing our off-the-shelf datasets to accelerate their AI projects with complete confidence in their data.”

Joining the hundreds of datasets already available on appen.com, the new Appen OTS datasets include:

Scripted speech for Arabic (Egypt), Arabic (Saudi Arabia), Arabic (United Arab Emirates), Central Khmer (Cambodia), Croatian, Greek, Hungarian, Polish, Spanish (Spain), and Turkish
Image OCR for Simplified Chinese printed text, Thai printed text, and Finnish printed text
- Includes images of billboards, outer packaging, signs, magazines, and menus to train and update computer vision OCR models
Human body movement (China)
- Includes annotated videos of people moving, tracked at pixel level, suitable for game development, fitness apps and more
Baby crying audio (China)
- Includes pre-recorded and annotated baby sounds that can be used to train AI models to recognize different crying sounds and alert parents

Availability

For more information and to request an Appen OTS dataset sample, visit https://appen.com/off-the-shelf-datasets/

About Appen Limited

Appen collects and labels images, text, speech, audio, video, and other data used to build and continuously improve the world’s most innovative artificial intelligence systems. Our expertise includes a global crowd of over 1 million skilled contractors who speak over 235 languages in over 70,000 locations and 170 countries, as well as the industry’s most advanced AI-assisted data annotation platform. Our reliable training data gives leaders in technology, automotive, financial services, retail, healthcare, and governments the confidence to deploy world-class AI products. Founded in 1996, Appen has customers and offices globally.

Contacts

Titus Capilnean

Director, Corporate Marketing

tcapilnean@appen.com
