TY - GEN
T1 - A multi-level hierarchical index structure for supporting efficient similarity search on tag sets
AU - Koh, Jia Ling
AU - Shongwe, Nonhlanhla
AU - Cho, Chung Wen
PY - 2012
Y1 - 2012
AB - Social communication websites have emerged as a type of Web service that helps users share their resources. To provide efficient similarity search over tag sets in a social tagging system, we propose a multi-level hierarchical index structure that groups similar tag sets. We provide algorithms not only for similarity search of tag sets but also for deletion and updating of tag sets using the constructed index structure. Furthermore, we define a modified Hamming distance function on tag sets, which considers semantic relatedness when comparing members to evaluate the similarity of two tag sets. This function is better suited to evaluating the similarity of two tag sets. A systematic performance study is performed to verify the effectiveness and efficiency of the proposed strategies. The experimental results show that the proposed MHIB approach further improves the pruning effect of the previous work, which constructs a two-level index structure. In particular, the MHIB approach scales well with respect to the three parameters when using either the Hamming distance or the modified Hamming distance as the similarity measure. Although the insertion operation of the MHIB approach costs more than the naïve method, with the assistance of the constructed inverted list of clusters it performs faster than the previous work. In addition, the cost of deletion using the MHIB approach is much lower than that of the other two approaches, and the same holds for the update operation.
KW - Social tagging
KW - index structure
KW - similarity search
UR - http://www.scopus.com/inward/record.url?scp=84865043589&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865043589&partnerID=8YFLogxK
U2 - 10.1109/RCIS.2012.6240436
DO - 10.1109/RCIS.2012.6240436
M3 - Conference contribution
AN - SCOPUS:84865043589
SN - 9781457719387
T3 - Proceedings - International Conference on Research Challenges in Information Science
BT - 6th International Conference on Research Challenges in Information Science, RCIS 2012 - Conference Proceedings
T2 - 6th International Conference on Research Challenges in Information Science, RCIS 2012
Y2 - 16 May 2012 through 18 May 2012
ER -