This is probably of interest to Enterprise 2.0:
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, websites, mailboxes) and from the file formats (e.g. documents, images) that occur in these systems.
Features:
- Crawl information systems such as file systems, websites, mailboxes and mail servers
- Extract full-text and metadata from many common file formats
- View files in their native applications
- Ease of use: easy to learn, easy to code, easy to deploy in industrial projects
- Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms
- Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)

File formats:
- Plain text
- HTML, XHTML
- XML
- PDF (Portable Document Format)
- RTF (Rich Text Format)
- Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher
- Microsoft Works
- OpenOffice 1.x: Writer, Calc, Impress, Draw
- StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
- OpenDocument (OpenOffice 2.x, StarOffice 8.x)
- Corel WordPerfect, Quattro, Presentations
- Emails (.eml files)
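
To picture what a "flexible architecture" that can be "extended with custom file formats" might look like, here is a minimal Java sketch of a registry that maps MIME types to text extractors. Everything in it (`TextExtractor`, `Registry`, `defaultRegistry`) is hypothetical and written for illustration only; it is not Aperture's actual API, and the HTML handling is a naive tag strip rather than a real parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: these types are hypothetical and are NOT Aperture's API.
public class ExtractorRegistrySketch {

    /** Hypothetical plug-in interface: turns raw bytes into full text. */
    interface TextExtractor {
        String extract(byte[] content);
    }

    /** Maps MIME types to extractors so that new formats can be plugged in. */
    static class Registry {
        private final Map<String, TextExtractor> byMimeType = new HashMap<>();

        void register(String mimeType, TextExtractor extractor) {
            byMimeType.put(mimeType, extractor);
        }

        /** Returns the extracted text, or empty if no extractor is registered. */
        Optional<String> extract(String mimeType, byte[] content) {
            TextExtractor extractor = byMimeType.get(mimeType);
            return extractor == null
                    ? Optional.empty()
                    : Optional.of(extractor.extract(content));
        }
    }

    /** Builds a registry with two toy extractors (plain text and HTML). */
    static Registry defaultRegistry() {
        Registry registry = new Registry();
        registry.register("text/plain",
                c -> new String(c, StandardCharsets.UTF_8));
        // Naive tag stripping, for illustration only; not a real HTML parser.
        registry.register("text/html",
                c -> new String(c, StandardCharsets.UTF_8)
                        .replaceAll("<[^>]*>", " ")
                        .replaceAll("\\s+", " ")
                        .trim());
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = defaultRegistry();
        byte[] html = "<p>Hello <b>Aperture</b></p>".getBytes(StandardCharsets.UTF_8);
        // A registered format yields text; an unregistered one falls back.
        System.out.println(registry.extract("text/html", html).orElse("(no extractor)"));
        System.out.println(registry.extract("application/pdf", html).orElse("(no extractor)"));
    }
}
```

Supporting a further format in this sketch means one more `register` call with a new MIME type, which is the kind of extension point the feature list describes.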

