This week Amazon released HBase on Amazon Elastic MapReduce (EMR). As Jeff Barr explains:
AWS has already given you a lot of storage and processing options to choose from, and today we are adding a really important one. You can now use Apache HBase to store and process extremely large amounts of data (think billions of rows and millions of columns per row) on AWS.
Amazon’s decision to include HBase support in EMR is based on the following list of important features, cited by Barr:
1. Strictly consistent reads and writes.
2. High write throughput.
3. Automatic sharding of tables.
4. Efficient storage of sparse data.
5. Low-latency data access via in-memory operations.
6. Direct input and output to Hadoop jobs.
7. Integration with Apache Hive for SQL-like queries over HBase tables, joins, and JDBC support.
The version of HBase available on EMR is 0.92. According to Barr, the use cases driving adoption of HBase on EMR include:
1. Support for Reference Data for Hadoop Analytics - Because HBase provides rapid access to stored data, it is a great way to store reference data that can be used by Hadoop jobs on a single Hadoop cluster or across multiple clusters.
2. Alternative Data Storage Option for Data Ingestion and Batch Analytics - Due to its high write throughput and efficient storage of sparse data, HBase can handle real-time ingestion of large data volumes. Combined with support for sequential reads and highly optimized scans, HBase provides a powerful tool for "close to real time" analytics.
3. Implementation of High Frequency Counters and Summary Data - Built-in support for strictly consistent reads and writes makes HBase an ideal platform for storing counters and summary data. MapReduce jobs can be used to calculate complex aggregations such as max-min, sum, average, and group-by, and the results of these jobs can be piped back into HBase.
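The counter and summary-data use case can be illustrated with a small sketch. The code below is a toy in-memory model in Python, not the HBase client API: the `ToyTable` class, the `summarize` function, and the page and site names are all hypothetical, and a lock stands in for the strictly consistent read-modify-write that the store would provide. It shows counters being incremented per row, then a map-and-reduce style pass that groups them and writes sum, max, min, and average back into a second table.

```python
import threading
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a sparse, row-keyed table (NOT the HBase API)."""

    def __init__(self):
        # row key -> {column: value}; cells that were never written take no space
        self._rows = defaultdict(dict)
        self._lock = threading.Lock()

    def increment(self, row, column, amount=1):
        # Read-modify-write under a lock so concurrent callers never lose an update.
        with self._lock:
            cells = self._rows[row]
            cells[column] = cells.get(column, 0) + amount
            return cells[column]

    def put(self, row, column, value):
        with self._lock:
            self._rows[row][column] = value

    def get(self, row, column, default=None):
        with self._lock:
            return self._rows.get(row, {}).get(column, default)

    def scan(self):
        # Return rows sorted by key, mirroring a sequential scan.
        with self._lock:
            return [(row, dict(cells)) for row, cells in sorted(self._rows.items())]


def summarize(counters, summary):
    """Group per-page hit counters by site and write aggregates into `summary`."""
    groups = defaultdict(list)
    for row, cells in counters.scan():
        site = row.split("/", 1)[0]            # map step: emit (site, hits)
        groups[site].append(cells.get("hits", 0))
    for site, hits in groups.items():          # reduce step: aggregate per site
        summary.put(site, "sum", sum(hits))
        summary.put(site, "max", max(hits))
        summary.put(site, "min", min(hits))
        summary.put(site, "avg", sum(hits) / len(hits))


# Hypothetical page-hit data used only for illustration.
counters, summary = ToyTable(), ToyTable()
for page, n in [("a.com/home", 3), ("a.com/about", 1), ("b.org/index", 5)]:
    for _ in range(n):
        counters.increment(page, "hits")
summarize(counters, summary)
```

In a real deployment the aggregation step would run as a MapReduce job over the cluster and the increments would go through the store's own atomic counter operation; the sketch only mirrors the shape of that data flow.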
From "ITPUB Blog", link: http://blog.itpub.net/301743/viewspace-733069/. If you repost this, please credit the source; otherwise legal liability will be pursued.