ITPub Blog

New Features in Cassandra-0.7.0-beta1

Original | Linux OS | Author: gpcuster | Posted: 2011-06-16 10:58:35

Cassandra-0.7.0-beta1 was released a little while ago. Today I pulled down the code and skimmed through it, and found these main changes:


1 Keyspaces and ColumnFamilies in the data model can now be modified dynamically:

In earlier versions, changing a Keyspace or ColumnFamily meant stopping Cassandra, editing the configuration file, and restarting Cassandra before the change took effect.

In the new version, we only need to define the new Keyspace or ColumnFamily and then call the Thrift interface to send the definition to Cassandra.

The relevant structs and interfaces are defined in the cassandra.thrift file:

/* Relevant struct definitions. */

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* describes a column family. */
struct CfDef {
    1: required string keyspace,
    2: required string name,
    3: optional string column_type="Standard",
    4: optional string clock_type="Timestamp",
    5: optional string comparator_type="BytesType",
    6: optional string subcomparator_type="",
    7: optional string reconciler="",
    8: optional string comment="",
    9: optional double row_cache_size=0,
    10: optional bool preload_row_cache=0,
    11: optional double key_cache_size=200000,
    12: optional double read_repair_chance=1.0,
    13: optional list<ColumnDef> column_metadata,
    14: optional i32 gc_grace_seconds
}

/* describes a keyspace. */
struct KsDef {
    1: required string name,
    2: required string strategy_class,
    3: optional map<string,string> strategy_options,
    4: required i32 replication_factor,
    5: required list<CfDef> cf_defs,
}

/* Relevant interface definitions. */

/** adds a column family. returns the new schema id. */
string system_add_column_family(1:required CfDef cf_def)
throws (1:InvalidRequestException ire),

/** drops a column family. returns the new schema id. */
string system_drop_column_family(1:required string column_family)
throws (1:InvalidRequestException ire),

/** renames a column family. returns the new schema id. */
string system_rename_column_family(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),

/** adds a keyspace and any column families that are part of it. returns the new schema id. */
string system_add_keyspace(1:required KsDef ks_def)
throws (1:InvalidRequestException ire),

/** drops a keyspace and any column families that are part of it. returns the new schema id. */
string system_drop_keyspace(1:required string keyspace)
throws (1:InvalidRequestException ire),

/** renames a keyspace. returns the new schema id. */
string system_rename_keyspace(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),
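To make this workflow concrete without a running cluster or generated Thrift bindings, here is a minimal in-memory Python sketch of what these calls do: each schema change is validated, applied, and answered with a new schema id. The struct names mirror cassandra.thrift, but ToySchema, the trimmed-down dataclasses and the Python InvalidRequestException are hypothetical stand-ins, not the generated client API; using a time-based UUID as the schema id is also an assumption of the sketch.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class InvalidRequestException(Exception):
    """Hypothetical stand-in for the Thrift InvalidRequestException."""


@dataclass
class CfDef:
    # Only a few CfDef fields are mirrored; defaults follow cassandra.thrift.
    keyspace: str
    name: str
    column_type: str = "Standard"
    comparator_type: str = "BytesType"
    gc_grace_seconds: Optional[int] = None


@dataclass
class KsDef:
    name: str
    strategy_class: str  # illustrative only; this toy never validates it
    replication_factor: int
    cf_defs: List[CfDef] = field(default_factory=list)


class ToySchema:
    """In-memory model of the system_* calls: every change yields a new schema id."""

    def __init__(self) -> None:
        self.keyspaces: Dict[str, KsDef] = {}
        self.version = str(uuid.uuid1())

    def _bump(self) -> str:
        # Assumption: a fresh time-based UUID identifies the new schema version.
        self.version = str(uuid.uuid1())
        return self.version

    def system_add_keyspace(self, ks_def: KsDef) -> str:
        if ks_def.name in self.keyspaces:
            raise InvalidRequestException(f"keyspace {ks_def.name} already exists")
        for cf in ks_def.cf_defs:
            if cf.keyspace != ks_def.name:
                raise InvalidRequestException("CfDef.keyspace must match KsDef.name")
        self.keyspaces[ks_def.name] = ks_def
        return self._bump()

    def system_add_column_family(self, cf_def: CfDef) -> str:
        ks = self.keyspaces.get(cf_def.keyspace)
        if ks is None:
            raise InvalidRequestException(f"keyspace {cf_def.keyspace} does not exist")
        if any(cf.name == cf_def.name for cf in ks.cf_defs):
            raise InvalidRequestException(f"column family {cf_def.name} already exists")
        ks.cf_defs.append(cf_def)
        return self._bump()

    def system_drop_keyspace(self, keyspace: str) -> str:
        # Dropping a keyspace also drops every column family it contains.
        if self.keyspaces.pop(keyspace, None) is None:
            raise InvalidRequestException(f"keyspace {keyspace} does not exist")
        return self._bump()
```

Usage: `schema.system_add_keyspace(KsDef("Blog", "SimpleStrategy", 1))` followed by `schema.system_add_column_family(CfDef("Blog", "Users"))` changes the schema without any restart, and each call returns a different id.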

2 Secondary indexes, which make it possible to query by Column value:

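Like almost every K/V system, Cassandra could only be queried by key. To find the rows whose column value equals a particular value, you either had to fetch all the data and iterate over it, or use some other scheme to speed up the query and avoid a full table scan, for example the one in my earlier article "反转Cassandra索引" (Inverting the Cassandra Index), or a project called Lucandra.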

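To use secondary indexes in the new version, you must specify in the ColumnFamily definition which Column to index, together with the index type (currently only IndexType.KEYS is supported).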

When a ColumnFamily containing indexes is created, Cassandra additionally creates a separate IndexedColumnFamily for every indexed Column in it.

When data is written, it is stored not only in the ColumnFamily that holds the data itself; each IndexedColumnFamily also stores the entries relevant to its index.

When querying by index, Cassandra reads the matching data directly from the IndexedColumnFamily instead of scanning the whole ColumnFamily.
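The write and read paths described above can be modeled with a small in-memory Python sketch. ToyColumnFamily and its fields are hypothetical; the per-column dict plays the role of an IndexedColumnFamily, mapping each column value to the set of row keys that hold it. Removing the stale entry when a value is overwritten is what any such index must do to stay consistent; the sketch does not describe how Cassandra's own code performs it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyColumnFamily:
    """Toy model: a column family plus one index table per indexed column."""

    def __init__(self, indexed_columns: Iterable[str]) -> None:
        # rows: row key -> {column name -> value}
        self.rows: Dict[str, Dict[str, str]] = {}
        # indexes: column name -> {column value -> set of row keys}
        self.indexes: Dict[str, Dict[str, Set[str]]] = {
            name: defaultdict(set) for name in indexed_columns
        }

    def insert(self, key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(key, {})
        index = self.indexes.get(column)
        if index is not None:
            old = row.get(column)
            if old is not None:
                # Drop the stale index entry before recording the new value.
                index[old].discard(key)
            index[value].add(key)
        # The data row itself is always updated as well.
        row[column] = value

    def get_by_index(self, column: str, value: str) -> List[str]:
        if column not in self.indexes:
            raise ValueError(f"column {column!r} is not indexed")
        # Look the value up in the index instead of scanning every row.
        return sorted(self.indexes[column].get(value, set()))
```

For example, after inserting `("alice", "state", "CA")` and `("carol", "state", "CA")`, `get_by_index("state", "CA")` returns both keys without touching any other row.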

The relevant structs and interfaces are defined in the cassandra.thrift file:

/* Relevant struct definitions. */

enum IndexType {
    KEYS,
}

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* Relevant interface definitions. */

/** Returns the subset of columns specified in SlicePredicate for the rows matching the IndexClause */
list<KeySlice> get_indexed_slices(1:required ColumnParent column_parent,
				2:required IndexClause index_clause,
				3:required SlicePredicate column_predicate,
				4:required ConsistencyLevel consistency_level=ONE)
throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
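The IndexClause argument is not shown above. In the following Python sketch it is modeled as a list of IndexExpression (column name, operator, value) plus a start_key and a count; that layout is my assumption of the struct and should be checked against cassandra.thrift. The sketch imitates the query on in-memory data: one EQ expression on an indexed column drives the index lookup, the remaining expressions are checked against each candidate row, and the SlicePredicate is reduced to a plain list of column names. Comparing values as Python strings and ordering keys lexically are simplifications; the real server uses the column's validator and the partitioner's key order.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Operator names assumed to mirror the Thrift IndexOperator enum.
OPS = {"EQ": operator.eq, "GTE": operator.ge, "GT": operator.gt,
       "LTE": operator.le, "LT": operator.lt}


@dataclass
class IndexExpression:
    column_name: str
    op: str
    value: str


@dataclass
class IndexClause:
    expressions: List[IndexExpression]
    start_key: str = ""
    count: int = 100


def get_indexed_slices(rows: Dict[str, Dict[str, str]],
                       indexes: Dict[str, Dict[str, Set[str]]],
                       clause: IndexClause,
                       column_names: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Toy version: rows is key -> {col: value}, indexes is col -> {value: keys}."""
    # An EQ expression on an indexed column selects the candidate rows.
    primary = next((e for e in clause.expressions
                    if e.op == "EQ" and e.column_name in indexes), None)
    if primary is None:
        raise ValueError("need at least one EQ expression on an indexed column")
    candidates = sorted(indexes[primary.column_name].get(primary.value, set()))
    result = []
    for key in candidates:
        if key < clause.start_key:
            continue
        row = rows[key]
        # Every expression, including the primary one, must hold for the row.
        if all(e.column_name in row and OPS[e.op](row[e.column_name], e.value)
               for e in clause.expressions):
            # Return only the columns named by the (simplified) predicate.
            result.append((key, {c: row[c] for c in column_names if c in row}))
            if len(result) == clause.count:
                break
    return result
```

The design point the sketch illustrates: only the EQ expression uses the index, so any extra conditions cost a per-row check on the candidates it returns, not a full scan.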

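3 Configuration file format change

The new version of Cassandra uses YAML for its configuration, which has the advantage of being more readable.

Let's compare how the two formats express one option, the cluster name:

Old version (storage-conf.xml):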

<!--
  ~ The name of this cluster. This is mainly used to prevent machines in
  ~ one logical cluster from joining another.
-->
<ClusterName>Test Cluster</ClusterName>

New version (cassandra.yaml):

# name of the cluster
cluster_name: 'Test Cluster'
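The mapping between the two formats is mechanical for simple options like this one. Here is a minimal Python sketch, using only the standard library, that reads ClusterName from an old storage-conf.xml string and produces the equivalent cassandra.yaml line. The function name is hypothetical, the sketch assumes ClusterName is a direct child of the <Storage> root element, and it converts only this single option.

```python
import xml.etree.ElementTree as ET


def cluster_name_to_yaml(storage_conf_xml: str) -> str:
    """Hypothetical helper: map <ClusterName> from storage-conf.xml to cassandra.yaml."""
    # XML comments are dropped by the default parser, so only elements remain.
    root = ET.fromstring(storage_conf_xml)
    # Assumption: <ClusterName> sits directly under the <Storage> root.
    name = root.findtext("ClusterName")
    if name is None:
        raise ValueError("no <ClusterName> element found")
    # In a single-quoted YAML scalar, a literal ' is written as ''.
    return "cluster_name: '" + name.strip().replace("'", "''") + "'"
```

Passing the old snippet wrapped in `<Storage>...</Storage>` yields exactly the new-style line shown above.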

Beyond these, there are many other changes:

0.7-beta1
 * sstable versioning (CASSANDRA-389)
 * switched to slf4j logging (CASSANDRA-625)
 * add (optional) expiration time for column (CASSANDRA-699)
 * access levels for authentication/authorization (CASSANDRA-900)
 * add ReadRepairChance to CF definition (CASSANDRA-930)
 * fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
 * convert to byte[] keys internally and all public APIs (CASSANDRA-767)
 * ability to alter schema definitions on a live cluster (CASSANDRA-44)
 * renamed configuration file to cassandra.xml, and log4j.properties to log4j-server.properties, which must now be loaded from the classpath (which is how our scripts in bin/ have always done it) (CASSANDRA-971)
 * change get_count to require a SlicePredicate. create multi_get_count (CASSANDRA-744)
 * re-organized endpointsnitch implementations and added SimpleSnitch (CASSANDRA-994)
 * Added preload_row_cache option (CASSANDRA-946)
 * add CRC to commitlog header (CASSANDRA-999)
 * removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
 * add truncate thrift method (CASSANDRA-531)
 * http mini-interface using mx4j (CASSANDRA-1068)
 * optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
 * replace constant-size 2GB mmaped segments and special casing for index entries spanning segment boundaries, with SegmentedFile that computes segments that always contain entire entries/rows (CASSANDRA-1117)
 * avoid reading large rows into memory during compaction (CASSANDRA-16)
 * added hadoop OutputFormat (CASSANDRA-1101)
 * efficient Streaming (no more anticompaction) (CASSANDRA-579)
 * split commitlog header into separate file and add size checksum to mutations (CASSANDRA-1179)
 * avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
 * revise HH schema to be per-endpoint (CASSANDRA-1142)
 * add joining/leaving status to nodetool ring (CASSANDRA-1115)
 * allow multiple repair sessions per node (CASSANDRA-1190)
 * optimize away MessagingService for local range queries (CASSANDRA-1261)
 * make framed transport the default so malformed requests can't OOM the server (CASSANDRA-475)
 * significantly faster reads from row cache (CASSANDRA-1267)
 * take advantage of row cache during range queries (CASSANDRA-1302)
 * make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
 * keep persistent row size and column count statistics (CASSANDRA-1155)
 * add IntegerType (CASSANDRA-1282)
 * page within a single row during hinted handoff (CASSANDRA-1327)
 * push DatacenterShardStrategy configuration into keyspace definition, eliminating datacenter.properties. (CASSANDRA-1066)
 * optimize forward slices starting with '' and single-index-block name queries by skipping the column index (CASSANDRA-1338)
 * streaming refactor (CASSANDRA-1189)
 * faster comparison for UUID types (CASSANDRA-1043)
 * secondary index support (CASSANDRA-749 and subtasks)

From "ITPUB Blog", link: http://blog.itpub.net/18773750/viewspace-699998/. Please credit the source when reposting; otherwise legal action may be taken.
