BlogMatrix

Aperture - a Java framework for getting data and metadata

David P. Janes, 2006-09-20 10:38 UTC

This is probably of interest to Enterprise 2.0:

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.

features:

  • Crawl information systems such as file systems, websites, mail boxes and mail servers
  • Extract full-text and metadata from many common file formats
  • View files in their native applications
  • Ease of use: easy to learn, easy to code, easy to deploy in industrial projects
  • Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms
  • Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)

file formats:

  • Plain text
  • HTML, XHTML
  • XML
  • PDF (Portable Document Format)
  • RTF (Rich Text Format)
  • Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher
  • Microsoft Works
  • OpenOffice 1.x: Writer, Calc, Impress, Draw
  • StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
  • OpenDocument (OpenOffice 2.x, StarOffice 8.x)
  • Corel WordPerfect, Quattro, Presentations
  • Emails (.eml files)
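The announcement doesn't show Aperture's own API, so here is a plain-JDK sketch of the crawl-then-extract pattern the feature list describes. It is not Aperture code. It walks a directory, guesses each file's format from its extension, and pulls full text only from plain-text files. The class `CrawlSketch`, the `Item` record and the extension table are hypothetical illustrations. A real extractor would need a separate parser for each format (PDF, Office, etc.).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical sketch of a crawler + extractor; NOT Aperture's API. */
public class CrawlSketch {
    // Hypothetical per-file result: relative path, guessed format, size in bytes, extracted text.
    record Item(Path path, String format, long size, String text) {}

    // Hypothetical extension -> format table covering a small subset of the formats listed above.
    static final Map<String, String> FORMATS = Map.of(
            "txt", "Plain text",
            "html", "HTML",
            "xml", "XML",
            "pdf", "PDF",
            "rtf", "RTF",
            "eml", "Email");

    // Guess the format from the file extension only (no content sniffing).
    static String formatOf(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return FORMATS.getOrDefault(ext, "unknown");
    }

    // Walk every regular file under root in sorted order and record its metadata.
    static List<Item> crawl(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            List<Path> regular = files.filter(Files::isRegularFile).sorted().toList();
            List<Item> items = new ArrayList<>();
            for (Path p : regular) {
                String format = formatOf(p);
                // Only plain text is "extracted"; other formats would need real parsers.
                String text = format.equals("Plain text")
                        ? Files.readString(p, StandardCharsets.UTF_8)
                        : "";
                items.add(new Item(root.relativize(p), format, Files.size(p), text));
            }
            return items;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("crawl");
        Files.writeString(root.resolve("notes.txt"), "hello aperture");
        Files.writeString(root.resolve("page.html"), "<p>hi</p>");
        for (Item item : crawl(root)) {
            System.out.println(item.path() + " | " + item.format() + " | "
                    + item.size() + " | " + item.text());
        }
    }
}
```

Running `main` prints one line per file: `notes.txt | Plain text | 14 | hello aperture`, then the HTML file with its size and an empty text field. A framework like Aperture presumably earns its keep by replacing the extension table with real format detection and filling in a parser for each format.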