Datasets  
What's New
 

Applied Media Analysis, Inc Releases Arabic Dataset 1.0 (September 2007)

Applied Media Analysis, Inc Releases V-Code Trial Software & Demos (April 2007)

Applied Media Analysis, Inc Releases Next Generation Barcode Scanner (February 2007)

Applied Media Analysis, Inc Releases Mobile Barcode SDK (June 2006)


AMA Datasets:

AMA has recently released several datasets to support the document analysis community. These datasets are available for licensed research purposes. The tables below describe each dataset, including its content, cost, availability, and how to purchase a license.

We offer three types of licenses:
Academic: An academic license is available to students, researchers, and faculty of academic institutions and entitles the holder to use the data at a single facility or campus.
Government: A government license is granted to any member of the US federal government actively engaged in research related to the dataset contents. Government personnel who wish to use the data in funded research programs should contact AMA for terms.
Standard: A standard license covers all other individuals and organizations using the data for research purposes.

Once we receive your payment, we will mail you physical media containing the dataset. You will also be entitled to all dataset updates and to tools for viewing and manipulating the metadata, which are distributed electronically.

If you are interested in additional datasets or a custom dataset, please contact us.

Available datasets:

Dataset Name: Arabic-Handwritten-1.0
Version: 1.0
Release Date: September 1, 2007
Number of Images/Pages: 5,000 Images
Ground Truth: Part-of-Word (PAW) level
Description: 200 unique Arabic handwritten documents transcribed by 25 writers using various writing utensils. Document types include memos, diagrams, poems, forms, and number lists in English and Indic.
Samples: Arabic Dataset Samples
Additional Information: Arabic Dataset Spec Sheet
Cost:
    Academic: $500
    Standard: $1500

Dataset Name: ComplexDocuments-1.0
Version: 1.0
Release Date: October 1, 2007
Number of Images/Pages: 10,000 Images
Ground Truth: Zone Level Labels
Description: 10,000 pages of complex documents labeled at the zone level. Includes Machine/Handwritten, Logos, Signatures, Stamps, Forms, Tables, Figures, Graphics, Images, Rule Lines, and Markup.
Cost: TBD

Dataset Name: MultilingualDocuments-1.0
Version: 1.0
Release Date: November 1, 2007
Number of Images/Pages: 100,000
Ground Truth: Line, Word, and Character Zones for Machine-Printed Data
Description: Handwritten and machine-printed source documents with a variety of live and synthetic degradations.
Cost: TBD
Copyright © 2006-2007 Applied Media Analysis, Inc - All Rights Reserved.