ITPub博客

首页 > 数据库 > 数据库开发技术 > Spanning Hash Tree

Spanning Hash Tree

原创 数据库开发技术 作者:eggcom 时间:2006-12-09 19:48:36 0 删除 编辑

As I know myself, I am not good at coding; however, there are lots of funs in coding.

Here, I created a MFC c++ class, it's called CSpanHashTree, spanning hash tree, it's developed for my new app webpicture over MFC, not C# any more, in which things are really easy but really ugly at the same time, especially when you try to distribute your software, its seems that only few computers today are pre-install .net platform.

[@more@]

As I know myself, I am not good at coding; however, there are lots of funs in coding.

Here, I created a MFC c++ class, it's called CSpanHashTree, spanning hash tree, it's developed for my new app webpicture over MFC, not C# any more, in which things are really easy but really ugly at the same time, especially when you try to distribute your software, its seems that only few computers today are pre-install .net platform.

This biggest problem not solved in the C# webpicture, is that, the comparison of new coming URL list is not implement

smartly but just direct comparing the new one with all existing one, this is the most easy way, just make a loop, and works ok, even the urlist goes to almost 20K,...:), thanksto my good cpu.

Commonly to solve this problem is using hash table, which works like index, or TOC. There should be very nice implementations in stl, but, I抎 like to write hash table class, for my special reason, -- I'm going to build a great application, want it to be perfect on all sides, so there is no comprise, from keystroke to keystroke, I made this code.

Easily to say, I want the memory usage of this class is variable, when the items count are small, and it can grow to very large, and occasionally, it could be shrink to small size, or even cleaned--- none, at this moment, I want this class takes almost zero number of memory.

Under this requirement, I began to think of distributing Hash keys, on a tree, put the data in a link list of the leave the tree. When the data item count is small, this tree is small, and even the required hash key length could be small, thus the hash table could be put into a continuous memory, no need to put these entries into a linked list. When more item are added into, if no more hash key is used, then the linked lists will grow quickly, and make thing bad when trying to add, delete, and find a data item.

OK, it's time to make a second level hash table, it is put under a level 1 tree node. all the data items of level 1 with partially same hash key are redistributed into a second level table. To locate a data item, you get its hash code, on the level hash table; you locate it with the first part of the hash key, on the second level with the second part of hash key.

Great! Lets do it.

For a string, I build two hash keys for it, 2 UINT data, Level 1 hash table contains 256 items which means 8 bit hash key is use to identify the path to located a data inside the SpanHashTree, level 2~8 each using 8 bit. Wonderful, if the hash keys are well calculated, this hash tree can grow very slow, adding, remove, find a data item is so quick.

There is test I made, using CList, and CSpanHashTree to Load 55K url into memory. CSpanHashTree is a little slow in loading the data, amostly since it tries to find and reject duplicate url, and ate release data stage, the compress of tree takes time, I hope it could make it better.

Another issue is, if I want access data in this tree, it's will be wonderful, but is it possible? quick and safe? Hope I could deal with this problem in next few days. Check the code below, you comment are welcomed!

__________________________________________________________

#define CSpanHashTree_BITS_PERLEVEL 8

#define CSpanHashTree_NODES_PER_LEVEL 256

#define CSpanHashTree_MAX_LIST_NUMBER 256

#define CSpanHashTree_MIN_HASH_NUMBER 128

#define CSpanHashTree_DEFAULT_HASH_TYPE 0x4F9D123A

#define CSpanHashTree_EXTRA_HASH_TYPE 0xA321D9F4

#define SYSTEM_ADDRESS_WIDTH 4 //byte

#define CSpanHashTree_NODE_HASH_PTRS_SIZE SYSTEM_ADDRESS_WIDTH * CSpanHashTree_NODES_PER_LEVEL

#define CSpanHashTree_MAX_LEVEL 7

_________________________________________________________

#pragma once

#include "afxtempl.h"

#include "afx.h"

#include

#include "ConfigurationData.h"

class CAppUrl;

typedef CArray CAppUrlArray;

typedef CAppUrl* CAppUrlPtr;

class CAppUrl :

public CUrl

{

..................

bool operator == (const CAppUrl& url) const;

.............

public:

unsigned int HashKey, ExtraHashKey;

BYTE GetKeyByLevel(BYTE level) const

{

ASSERT(level<=CSpanHashTree_MAX_LEVEL);

switch (level)

{

case 0: return LOBYTE(LOWORD(HashKey));

case 1: return HIBYTE(LOWORD(HashKey));

case 2: return LOBYTE(HIWORD(HashKey));

case 3: return HIBYTE(HIWORD(HashKey));

case 7: return LOBYTE(LOWORD(ExtraHashKey));

case 6: return HIBYTE(LOWORD(ExtraHashKey));

case 5: return LOBYTE(HIWORD(ExtraHashKey));

default: return HIBYTE(HIWORD(ExtraHashKey));

}

}

};

________________________________________________________________

#pragma once

#include "afx.h"

#include "Url.h"

#include "ConfigurationData.h"

typedef CList CAppUrlList;

class CSpanHashTree;

typedef CSpanHashTree* CSpanHashTreePtr;

class CSpanHashTree :

public CObject

{

public:

CSpanHashTree(BYTE level = 0);

~CSpanHashTree(void);

bool Add(const CAppUrl& url, int expand_list_maxength = CSpanHashTree_MAX_LIST_NUMBER);

bool Delete(const CAppUrl& url, int compress_list_minlength = CSpanHashTree_MIN_HASH_NUMBER);

bool Clean(bool destruct = false);

bool Compress(int compress_list_minlength = CSpanHashTree_MIN_HASH_NUMBER);

bool Expand(int expand_list_maxength = CSpanHashTree_MAX_LIST_NUMBER);

bool GetSubUrls(CAppUrlList& urls, bool remove = false);

#ifdef DEBUG_CONSOLE

void Report(bool detail);

#endif

private:

bool ListToHash(void);

bool HashToList(void);

int GetSubCount(void);

bool GetIsList() { return IsList;};

bool IsList;

BYTE Level;

void * pData;

#ifdef DEBUG_CONSOLE

void Print(CString& cstr);

#endif

};

_________________________________________________________

#include "StdAfx.h"

#include "HashTree.h"

CSpanHashTree::CSpanHashTree(BYTE level)

:Level(level), IsList(true), pData(NULL)

{

pData = (void*) new CAppUrlList();

}

CSpanHashTree::~CSpanHashTree(void)

{

Clean(true);

}

bool CSpanHashTree::Add(const CAppUrl& url, int maxlength)

{

if(IsList)

{

CAppUrlList * pList = (CAppUrlList *) pData;

if(NULL == pList->Find(url)) // consume time to compare.

{

pList->AddTail(url);

if(Level

{

Expand(maxlength);

}

return true;

}

return false;

}

else

{

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

pptree += url.GetKeyByLevel(Level);

if(NULL!= *pptree)

{

return (*pptree)->Add(url);

}

else

{

*pptree = new CSpanHashTree(Level+1);

return (*pptree)->Add(url);

}

}

}

bool CSpanHashTree::Delete(const CAppUrl& url, int minlength)

{

if(IsList)

{

CAppUrlList * pList = (CAppUrlList *) pData;

POSITION ps = pList->Find(url);

if(NULL != ps)

{

pList->RemoveAt(ps);

return true;

}

return false;

}

else

{

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

pptree += url.GetKeyByLevel(Level);

if(NULL!= *pptree)

{

bool rs = (*pptree)->Delete(url);

if(rs)

{

if((*pptree)->Compress(minlength))

{

this->Compress(minlength); // so unsmart to do so, level is not deep,

it's not bad.

}

}

return rs;

}

return false;

}

}

bool CSpanHashTree::Clean(bool destruct)

{

if(NULL == pData)

{

return false;

}

if(IsList)

{

CAppUrlList* pList = (CAppUrlList*) pData;

if(!destruct)

{

pList->RemoveAll();

}else

{

delete pList;

pData = NULL;

}

}

else

{

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

for(int i=0; i

{

if(NULL!=(*pptree))

{

delete (*pptree);

}

pptree ++;

}

free(pData);

if(!destruct)

{

pData = (void *) new CAppUrlList();

IsList = true;

}else

{

pData = NULL;

}

}

return true;

}

bool CSpanHashTree::Compress(int minlength)

{

if((GetSubCount() < minlength)

&& !IsList)

{

return HashToList();

}

return false;

}

bool CSpanHashTree::Expand(int maxlength)

{

if((GetSubCount() > maxlength)

&& IsList)

{

return ListToHash();

}

return false;

}

bool CSpanHashTree::ListToHash(void)

{

if(!IsList)

{

return false;

}

CAppUrlList* pList = (CAppUrlList*) pData;

pData = malloc(CSpanHashTree_NODE_HASH_PTRS_SIZE); //

if(pData == NULL)

{

return false;

}

memset(pData, 0, CSpanHashTree_NODE_HASH_PTRS_SIZE);

IsList =false;

POSITION ps = pList->GetHeadPosition();

while(ps!=NULL)

{

CAppUrl url = pList->GetNext(ps);

if(!Add(url))

{

Clean(true); // clean up for restore...

IsList =true;

pData = pList;

return false;

}

}

delete pList;

return true;

}

bool CSpanHashTree::HashToList(void)

{

if(IsList)

{

return false;

}

CAppUrlList * pList = new CAppUrlList();

GetSubUrls(*pList, true);

// Clean(true);

IsList = true;

pData = (void*) pList;

return true;

}

int CSpanHashTree::GetSubCount(void)

{

if(NULL == pData)

{

return 0;

}

if(IsList)

{

return ((CAppUrlList*)pData)->GetCount();

}else

{

int rs = 0;

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

for(int i=0; i

{

if(NULL!=(*pptree))

{

rs += (*pptree)->GetSubCount();

}

}

return rs;

}

}

bool CSpanHashTree::GetSubUrls(CAppUrlList& urls, bool remove)

{

if(NULL == pData)

{

return false;

}

if(IsList)

{

CAppUrlList* pList = (CAppUrlList*) pData;

POSITION ps = pList->GetHeadPosition();

while(ps!=NULL)

{

urls.AddTail(pList->GetNext(ps)); // what if, it fails, and restor erros too.

}

if(remove)

{

delete pList;

pData = NULL;

}

}

else

{

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

for(int i=0; i

{

if(NULL!=(*pptree))

{

(*pptree)->GetSubUrls(urls,remove);

if(remove)

{

delete (*pptree);

}

}

pptree ++;

}

if(remove)

{

free(pData);

pData = NULL;

}

}

return true;

}

#ifdef DEBUG_CONSOLE

void CSpanHashTree::Report(bool detail)

{

if(NULL == pData)

{

return ;

}

CString strx;

if(IsList)

{

CAppUrlList* p = (CAppUrlList*)pData;

strx.Format(CString(_T("|LIST ---+ ( %d )")),p->GetCount());

Print(strx);

if(detail)

{

POSITION ps = p->GetHeadPosition();

while(NULL != ps)

{

CAppUrl url = p->GetAt(ps);

strx.Format(_T("t |->%08X : %08X %s"),url.HashKey, url.ExtraHashKey,

url.GetStandardUrl());

Print(strx);

p->GetNext(ps);

}

}

}

else

{

Print(CString(_T("|HASH --+")));

CSpanHashTreePtr* pptree = (CSpanHashTreePtr*)pData;

for(int i=0; i

{

if(NULL!=(*pptree))

{

(*pptree)->Report(detail);

}

pptree ++;

}

}

}

#endif

#ifdef DEBUG_CONSOLE

void CSpanHashTree::Print(CString& cstr)

{

for(int i=0; i

{

std::cout<<"t";

}

std::cout<

}

#endif

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24834/viewspace-883132/,如需转载,请注明出处,否则将追究法律责任。

上一篇: what's life?
请登录后发表评论 登录
全部评论

注册时间:2008-09-28

  • 博文量
    68
  • 访问量
    627674