AWS Announces General Availability of Amazon Textract
Amazon Textract uses machine learning to automatically extract text
and data, including from tables and forms, in virtually any document –
with no machine learning experience required
The Globe and Mail, Met Office, PwC, Healthfirst, UiPath, TeraDact,
Ripcord, Kablamo, Vidado, Blue Prism, and Alfresco among customers and
partners using Amazon Textract
SEATTLE–(BUSINESS WIRE)–Today, Amazon Web Services, Inc. (AWS), an Amazon.com company (NASDAQ:
AMZN), announced the general availability of Amazon Textract, a fully
managed service that uses machine learning to automatically extract text
and data, including from tables and forms, in virtually any document
without the need for manual review, custom code, or machine learning
experience. Amazon Textract goes beyond simple optical character
recognition (OCR) to identify the contents of fields in forms,
information stored in tables, and the context in which the information
is presented, such as a name or social security number from a tax form
or the product SKU or quantity in a warehouse from an inventory report.
The extracted text and data can be easily used to build smart searches
on large archives of documents, or can be loaded into a database for use
by applications, such as accounting, auditing, and compliance software.
Amazon Textract’s API accepts multiple input types, such as scans, PDFs,
and photos, and customers can use it with database and analytics
services like Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon
Athena and other machine learning services like Amazon Comprehend,
Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker to
derive deeper meaning from the extracted text and data. To get started
with Amazon Textract, visit https://aws.amazon.com/textract.
Many companies extract text and data from files such as contracts,
expense reports, mortgage guarantees, fund prospectuses, tax documents,
hospital claims, and patient forms through manual data entry or simple
OCR software. This is a time-consuming and often inaccurate process that
produces an output requiring extensive post-processing before it can be
put in a format that is usable by other applications. That’s because
existing OCR technologies are unable to recognize common layouts like
forms and tables, and only generate a lengthy and often inaccurate text
dump. What organizations want instead is the ability to accurately
identify and extract text and data from forms and tables in documents of
any format and from a variety of file types and templates. Amazon
Textract analyzes virtually any type of document, automatically
generating highly accurate text, form, and table data. Amazon Textract
identifies text and data from tables and forms in documents – such as
line items and totals from a photographed receipt, tax information from
a W2, or values from a table in a scanned inventory report – and
recognizes a range of document formats, including those specific to
financial services, insurance, and healthcare, without requiring any
customization or human intervention. Amazon Textract makes it easy for
customers to accurately process millions of document pages in just a few
hours, significantly lowering document processing costs, and allowing
customers to focus on deriving business value from their text and data
instead of wasting time and effort on post-processing. Results are
delivered via an API that can be easily accessed and used without
requiring any machine learning experience.
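As a rough sketch of what calling that API can look like from Python, the snippet below builds a request for the synchronous AnalyzeDocument operation and sends it through the boto3 SDK. The helper names `build_analyze_request` and `analyze_s3_document`, the bucket and object names, and the default region are placeholders invented for illustration; actually running the second helper assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List, Optional


def build_analyze_request(bucket: str, key: str,
                          features: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for Textract's AnalyzeDocument operation.

    `bucket` and `key` identify a document already stored in Amazon S3.
    By default both table and form extraction are requested.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": list(features) if features else ["TABLES", "FORMS"],
    }


def analyze_s3_document(bucket: str, key: str,
                        region: str = "us-east-1") -> Dict[str, Any]:
    """Send the request to Textract and return the parsed JSON response."""
    # Imported lazily so the request builder above works without the AWS SDK.
    import boto3

    client = boto3.client("textract", region_name=region)
    return client.analyze_document(**build_analyze_request(bucket, key))


if __name__ == "__main__":
    # Placeholder names; replace with a real bucket and object before
    # calling analyze_s3_document.
    print(build_analyze_request("example-bucket", "scans/receipt.png"))
```

Keeping the request construction separate from the network call is only a convenience for this sketch: it lets the request shape be inspected without touching AWS.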
“The power of Amazon Textract is that it accurately extracts text and
structured data from virtually any document with no machine learning
experience required. Subsequently, developers can analyze and query the
extracted text and data using our database and analytics services like
Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena and
integrate with other machine learning services like Amazon Comprehend,
Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker to
help customers derive deeper meaning from the extracted text and data,”
said Swami Sivasubramanian, Vice President, Amazon Machine Learning. “In
addition to the integration with other AWS services, the rich partner
community developing around Amazon Textract makes it possible for
customers to gain real meaning from their file collections, operate more
efficiently, improve security compliance, automate data entry, and
facilitate faster business decisions.”
Amazon Textract takes scanned files stored in an Amazon S3 bucket, reads
them, and returns data in the form of JSON text annotated with the page
number, section, form labels, and data types. This data can then be used
for a range of applications (e.g. generating smart search indexes,
redacting text in a massive collection of forms, creating automated loan
approval workflows, using the data for regulatory compliance, and
flagging fraud risk for insurance claims). Customers can load the data
into business software, such as spreadsheets, databases, and payroll
systems, or they can analyze and query the data using Amazon
Elasticsearch Service, Amazon DynamoDB, Amazon Redshift, or Amazon Athena.
Amazon Textract is available today in US East (Ohio), US East (N.
Virginia), US West (Oregon), and EU (Ireland), and will expand to additional
regions in the coming year.
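To make the JSON output described above more concrete, here is a minimal sketch of pulling form labels and their values out of a response. The `SAMPLE_RESPONSE` dictionary is a small hand-written stand-in that follows the general shape of a Textract response (a flat list of blocks linked by relationships), not real service output, and the function names are illustrative only.

```python
from typing import Any, Dict, List

# Hand-written stand-in for a Textract response: one form field, "Name: Jane Doe".
SAMPLE_RESPONSE: Dict[str, Any] = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"], "Page": 1,
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"], "Page": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:", "Page": 1},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane", "Page": 1},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe", "Page": 1},
    ]
}


def related_ids(block: Dict[str, Any], rel_type: str) -> List[str]:
    """Return the ids a block points to through relationships of one type."""
    return [block_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type
            for block_id in rel["Ids"]]


def block_text(block: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the text of a block's child WORD blocks."""
    children = (by_id[i] for i in related_ids(block, "CHILD"))
    return " ".join(c["Text"] for c in children if c["BlockType"] == "WORD")


def form_fields(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair every KEY block with the text of its linked VALUE block(s)."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = []
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value = " ".join(block_text(by_id[v], by_id)
                         for v in related_ids(block, "VALUE"))
        fields.append({"page": block.get("Page"),
                       "label": block_text(block, by_id),
                       "value": value})
    return fields


if __name__ == "__main__":
    print(form_fields(SAMPLE_RESPONSE))
```

Rows in this shape (page, label, value) can be loaded directly into a spreadsheet or database table, which is the kind of downstream use the paragraph above describes.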
The Globe and Mail is a national icon and Canada’s most recognized media
brand. “As a news media company, we rely on many PDF or scanned-source
documents such as FOIs (freedom of information requests) that have
important information contained in tables that we previously couldn’t
access,” said Michael O’Neill, Managing Director of Digital and Data
Science at The Globe and Mail. “These documents have been under-utilized
because journalists were not able to access them easily or didn’t know
they existed. Using Amazon Textract, we are able to extract information
from tables in PDFs and easily output that data to CSV and offer easy
access to these documents by making them available for search queries by
our journalists. This increases efficient access to information for our
journalists tenfold.”
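The table-to-CSV step mentioned in this quote can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not The Globe and Mail's actual pipeline: `SAMPLE_TABLE_RESPONSE` is a hand-written stand-in shaped like a Textract response containing one table, and `table_to_csv` is a hypothetical helper name.

```python
import csv
import io
from typing import Any, Dict

# Hand-written stand-in for a response holding one 2x2 table.
SAMPLE_TABLE_RESPONSE: Dict[str, Any] = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Amount"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Grants"},
        {"Id": "w4", "BlockType": "WORD", "Text": "1200"},
    ]
}


def table_to_csv(response: Dict[str, Any], table_index: int = 0) -> str:
    """Render one TABLE block from the response as CSV text."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    tables = [b for b in response["Blocks"] if b["BlockType"] == "TABLE"]
    table = tables[table_index]

    # Map (row, column) positions to the joined text of each cell's words.
    cells = {}
    for rel in table.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cell_id in rel["Ids"]:
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [by_id[w]["Text"]
                     for r in cell.get("Relationships", [])
                     if r["Type"] == "CHILD"
                     for w in r["Ids"]
                     if by_id[w]["BlockType"] == "WORD"]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)

    if not cells:
        return ""

    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in range(1, n_rows + 1):
        # Row and column indexes are 1-based; missing cells become empty strings.
        writer.writerow([cells.get((row, col), "") for col in range(1, n_cols + 1)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(table_to_csv(SAMPLE_TABLE_RESPONSE), end="")
```

Using the `csv` module rather than joining strings by hand means cell text containing commas or quotes is escaped correctly.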
Met Office is the UK’s national weather service, and is a world leader
in providing weather and climate services. “We hope to use
Amazon Textract to digitize millions of historical weather observations
from document archives,” said Philip Brohan, Climate Scientist at Met
Office. “Making these observations available to science will improve our
understanding of climate variability and change.”
PwC helps organizations and individuals create value by delivering
quality in assurance, tax, and advisory services. “At PwC, we work to
provide our customers with intelligent automation tools that help
transform previously manual processes. We’ve integrated Amazon Textract
into our solution for the pharmaceutical industry to automate document
processing for various FDA forms like MedWatch and CIOMS,” said
Siddhartha Bhattacharya of PwC. “Previously, people would manually
review, edit, and process these forms, each one taking hours. Amazon
Textract has proven to be the most efficient and accurate OCR solution
available for these forms, extracting all of the relevant information
for review and processing, and reducing time spent from hours down to
minutes.”
Healthfirst is a not-for-profit managed care organization and one of the
fastest-growing health plans in New York with over 1.4M diverse members
and a network of more than 35,000 providers and 4,500 employees. “At
Healthfirst, we are building data pipelines to turn scanned medical
charts into useful clinical information to improve care coordination,
drive quality outcomes, and ensure appropriate reimbursement for members
under our coverage,” said Steve Prewitt, Chief Analytics Officer at
Healthfirst. “We use Amazon Textract and Amazon Comprehend Medical to
glean real value from unstructured data sources in an efficient way,
resulting in revenue savings 10-20 times more than our usual downstream
operation. By scaling up to analyze over 50,000 charts, we can find
undocumented diagnoses and refer around 5,000 members for the care
management they need.”
Informed, Inc. automates how financial institutions originate loans and
open bank accounts. “We have already used Amazon Textract to analyze
tens of thousands of loan documents on behalf of financial institutions,
and our own software-as-a-service offering has been enhanced by the
service, enabling us to identify 95% of the defects in loan application
packages and help banks reduce their manual data entry,” said Justin
Wickett, Founder and CEO, Informed Inc. “Using Amazon Textract, our
software gives financial institutions real-time visibility into an
applicant’s income based off of their pay stubs, bank statements, tax
returns, and other financial documents. We plan to expand the types of
documents we analyze using Amazon Textract in order to enable financial
institutions to take advantage of our machine learning models and bring
real-time decision-making efficiency to today’s slow and manual process.”
Candor’s mission is to transform the archaic, time-consuming process
that burdens the mortgage industry. “We use OCR to extract data from a
wide variety of lender-required documents to verify income, assets,
property value, and more. Until now, the best OCR solution read one page
at the rate of 38.4 seconds, but Amazon Textract achieves this in a
fraction of that time,” said Tom Showalter, Founder & CEO of Candor.
“We’ve been able to use Textract to accurately read complex, diverse
documents such as bank statements, pay stubs, and tax documents without
additional training or machine learning expertise, allowing our clients
to underwrite and close a loan in days, as opposed to weeks.”
UiPath is a leading Robotic Process Automation vendor providing a
complete software platform to help organizations efficiently automate
business processes. “Amazon Textract will further differentiate UiPath’s
robotic process automation platform by enhancing UiPath’s document
understanding capabilities, enabling our customers to unlock critical
business data from documents, transform that data into actionable
business insights, and deliver those insights into line-of-business and
operational systems,” said Param Kahlon, Chief Product Officer of UiPath.
TeraDact allows customers to transform stored images and paper documents
into privacy-compliant, usable digital formats at scale. “Amazon
Textract’s smart docs platform feeds TeraDact’s patented redaction
services to automatically remove and secure sensitive data. TeraDact
customers can permanently remove this data so that it can never be
recovered or opt to replace sensitive data with patented tokens which
can be recovered by individuals with the appropriate permissions. This
is particularly useful in complying with government mandates surrounding
individual data privacy such as GDPR,” said Tom Trobridge, COO, TeraDact.
Ripcord’s mission is to digitize and extract knowledge from paper
documents using vision-guided robotics, machine learning, and advanced
AI. This knowledge automates business processes and workflows. “We’ve
had tremendous success utilizing Amazon Textract to augment our advanced
entity extraction to benefit many industries and uncover $4 billion in
new pay. We look forward to expanding our use of Amazon Textract across
financial and government services, healthcare and legal,” said Alex
Fielding, CEO of Ripcord.
Blue Prism develops Robotic Process Automation software to provide
businesses and organizations with a more agile virtual workforce. “Blue
Prism’s connected-RPA can automate and perform mission-critical
processes, allowing customers the freedom to focus on more creative,
meaningful work. By using Amazon Textract, we’ve given our digital
workforce another powerful tool for automation. Amazon Textract
accurately analyzes data from various document types using machine
learning, which enhances the digital transformation journey for our
customers. Using additional AWS AI services like Amazon Comprehend and
Amazon Rekognition, we can tackle challenges from added secure customer
authentication processes to fraud detection capabilities. The
intelligence and flexibility of Amazon Textract’s form data extraction
can elevate OCR to new levels in industries like financial services,
retail, manufacturing and transportation to name a few,” said Dave Moss,
CTO and Co-Founder of Blue Prism.
About Amazon Web Services
For 13 years, Amazon Web Services has been the world’s most
comprehensive and broadly adopted cloud platform. AWS offers over 165
fully featured services for compute, storage, databases, networking,
analytics, robotics, machine learning and artificial intelligence (AI),
Internet of Things (IoT), mobile, security, hybrid, virtual and
augmented reality (VR and AR), media, and application development,
deployment, and management from 66 Availability Zones (AZs) within 21
geographic regions, spanning the U.S., Australia, Brazil, Canada, China,
France, Germany, Hong Kong Special Administrative Region, India,
Ireland, Japan, Korea, Singapore, Sweden, and the UK. Millions of
customers, including the fastest-growing startups, largest enterprises,
and leading government agencies, trust AWS to power their infrastructure,
become more agile, and lower costs. To learn more about AWS, visit aws.amazon.com.
About Amazon
Amazon is guided by four principles: customer obsession rather than
competitor focus, passion for invention, commitment to operational
excellence, and long-term thinking. Customer reviews, 1-Click shopping,
personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle
Direct Publishing, Kindle, Fire tablets, Fire TV, Amazon Echo, and Alexa
are some of the products and services pioneered by Amazon. For more
information, visit amazon.com/about and follow @AmazonNews.
Contacts
Amazon.com, Inc.
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr