AMA has recently released a number of datasets to support the
document analysis community. These datasets are available under
license for research use. The tables below give information about
each dataset, including a description of its content, cost,
availability, and how to purchase a license.
We offer three types of licenses:
Academic: An academic license is available to students, researchers,
and faculty of academic institutions and entitles the holder
to use the data at a single facility or campus.
Government: A government license will be granted to any member
of the US Federal government actively engaged in research related
to the dataset contents. Government personnel who wish to use
the data in their funded research programs should contact
AMA for terms.
Standard: A standard license applies to all other individuals and
organizations using the data for research purposes.
Once we receive your payment, we will mail you physical media
containing the dataset. You will also be entitled to all dataset
updates and to tools for viewing and manipulating the metadata;
these are distributed electronically.
Available datasets:
If you are interested in additional datasets or a custom dataset,
please contact us.
Dataset Name: Arabic-Handwritten-1.0
Version: 1.0
Release Date: September 1, 2007
Number of Images/Pages: 5,000 images
Ground Truth: Part-of-Word (PAW) level
Description: 200 unique Arabic handwritten documents transcribed by
25 writers using various writing instruments. Document types include
memos, diagrams, poems, forms, and number lists in English and
Indic.
Description: 10,000 pages of complex documents, labeled at the zone level.
Zones include machine-printed and handwritten text, logos, signatures,
stamps, forms, tables, figures, graphics, images, rule lines, and markup.