From OSF Wiki


For installation instructions and basic tagging concepts, see Installing GATE.

GATE Intro


Guides / Tutorials


Web Services

Projects Using GATE

Digital Pebble

Open Sahara

  • Open Sahara is a framework and infrastructure to grab information from the web, classify its content, search it semantically, and distribute the results. Its components are driven by the best technology available and are completely open source. The open data structure enables users to adapt Open Sahara to their own needs and standards.
  • Open Sahara provides several interfaces that let application builders use the annotated content or add new content streams or annotation sets to the backbone. As a result of Open Sahara's approach, all relevant information is linked on the fly and presented as a fully standardized information stream, ready to use as a feed for new information products.
  • Open Sahara started by harvesting all relevant content about Amsterdam, the capital of the Netherlands. All content available from the city council, citizen service desk, public transportation, the police, news, and user-generated content has been gathered, indexed, annotated, and related to Open Street View, DBPedia, and other relevant sources in the linked open data cloud.
  • In March we will present the first results of our work. The first release of our 'fun app' is scheduled for the first week of April.
  • [Figure: flow.jpg]

American National Corpus

Java Access

Other Related

  • Tika: Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Supported document formats include:
   * HyperText Markup Language
   * XML and derived formats
   * Microsoft Office document formats
   * OpenDocument Format
   * Portable Document Format
   * Electronic Publication Format
   * Rich Text Format
   * Compression and packaging formats
   * Text formats
   * Audio formats
   * Image formats
   * Video formats
   * Java class files and archives
   * The mbox format
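
To make the "detecting" step more concrete, here is a minimal sketch of content-type detection from a file's leading "magic" bytes, covering a few of the formats listed above. It uses only the Java standard library and is not Tika's API or implementation; the class name MagicSniffer and its methods are hypothetical names chosen for illustration, and real detectors consult far more signatures plus file names and container contents.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: a tiny magic-byte sniffer, not Apache Tika.
final class MagicSniffer {

    // Returns a MIME type guess based on the first bytes of a document.
    static String detect(byte[] head) {
        if (startsWith(head, ascii("%PDF-"))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            // ZIP is also the container for OOXML, OpenDocument, EPUB and JAR files;
            // telling those apart requires looking inside the archive.
            return "application/zip";
        }
        if (startsWith(head, ascii("{\\rtf"))) {
            return "application/rtf";
        }
        if (startsWith(head, new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE})) {
            return "application/java-vm"; // compiled Java class file
        }
        if (startsWith(head, ascii("From "))) {
            return "application/mbox";
        }
        // Unknown signature: fall back to the generic binary type.
        return "application/octet-stream";
    }

    // True when data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(detect(ascii("%PDF-1.7 sample")));
        System.out.println(detect(ascii("{\\rtf1\\ansi sample")));
    }
}
```

A toolkit such as Tika layers format-specific parsers on top of a detection step like this, so that text and metadata extraction can be dispatched to the right parser library.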