Our search-mad scientists have toiled away to bring...
Hypertable is a high-performance distributed data storage system designed to support applications requiring maximum performance, scalability, and reliability.
The Hypertable open-source project covers the design and implementation of a high-performance, scalable, distributed storage and processing system for structured and unstructured data. The goal is to bring new levels of performance and scale to data-driven organizations that are currently limited by previous-generation platforms.
Modeled after Google's well-known Bigtable project, Hypertable is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Hypertable seeks to set the open-source standard for highly available, petabyte-scale database systems.
Hypertable is based on published best practices and our own experience in solving large-scale, data-intensive tasks. Zvents has sponsored the Hypertable project, together with its initial authors and contributors, for use in its own local search and local ad targeting solutions.
Visit Hypertable.org for more information.
Heritrix Hadoop DFS Writer Processor
Heritrix Hadoop DFS Writer Processor is an extension to the Internet Archive's "Heritrix" crawler that enables it to store crawled content directly in the Hadoop Distributed FileSystem (HDFS) in SequenceFile format. The SequenceFile format is directly supported by the Map-Reduce framework and supports compression. The source and binary packages come with Java classes to parse the crawled document format and include several working Map-Reduce programs, one of which performs link counting.
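To give a feel for what a link-counting job does, here is a minimal sketch of the map and reduce steps in plain Python. It is not the Java code shipped with the package and does not use Hadoop or SequenceFiles. The `(page_url, html)` record layout, the function names, and the sample pages are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumption: each crawled record is a (page_url, html_text) pair.
# The package's actual crawled-document format is not described here.
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def map_links(record):
    """Map step: emit (target_url, 1) for every link found in one page."""
    _page_url, html = record
    for target in HREF.findall(html):
        yield target, 1

def reduce_counts(pairs):
    """Group-and-reduce step: sum the counts emitted for each target URL."""
    totals = defaultdict(int)
    for target, count in pairs:
        totals[target] += count
    return dict(totals)

def count_links(records):
    """Run the map step over every record, then reduce the emitted pairs."""
    return reduce_counts(chain.from_iterable(map_links(r) for r in records))

# Hypothetical sample input standing in for crawled content.
pages = [
    ("http://a.example/",
     '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>'),
    ("http://c.example/",
     '<a HREF=\'http://b.example/\'>b again</a>'),
]
link_counts = count_links(pages)
```

In a real Map-Reduce job the framework would perform the grouping between the two steps and distribute the work across the cluster; the sketch collapses that into a single in-memory dictionary.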
Heritrix Hadoop DFS Writer Processor: