Mining browsing behaviors for objectionable content filtering

Lung Hao Lee, Yen Cheng Juan, Wei Lin Tseng, Hsin Hsi Chen, Yuen Hsien Tseng

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

This article explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to filter objectionable content such as pornography, gambling, violence, and drugs. Users' access trails, represented as category sequences in click-through data, are used to mine their web browsing behaviors. Contextual relationships among URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs themselves, together with their corresponding categories, are captured by a TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. Beyond the current context, the aggregation model is further improved by applying a majority rule to the predictions made for the same URL when it was previously accessed by various users in different contexts. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates, so that objectionable web pages can be filtered without analyzing their content. In practice, this approach complements existing content analysis by adding a perspective based on users' behavior.
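The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of its shape, under loudly stated assumptions: the contextual model is simplified to a first-order Markov chain over observed URL categories (a stand-in for the paper's hidden Markov model, not a reproduction of it); the aggregation is a linear interpolation with a hypothetical `weight` parameter (the abstract says only that TLD and context are combined empirically); and the category labels and all function names (`train_transitions`, `train_tld`, `aggregate`, `majority_vote`, `build_blacklist`) are hypothetical illustrations.

```python
from collections import Counter, defaultdict

# Hypothetical labels standing in for the paper's objectionable categories.
OBJECTIONABLE = {"porn", "gambling", "violence", "drugs"}


def train_transitions(trails):
    """Count category-to-category transitions in users' access trails.

    Simplification (assumption): a first-order Markov chain over observed
    categories, not the hidden Markov model used in the paper.
    """
    counts = defaultdict(Counter)
    for trail in trails:
        for prev, nxt in zip(trail, trail[1:]):
            counts[prev][nxt] += 1
    return counts


def train_tld(labelled):
    """Count how often each category occurs under each TLD; input is (tld, category) pairs."""
    counts = defaultdict(Counter)
    for tld, category in labelled:
        counts[tld][category] += 1
    return counts


def normalize(counter):
    """Turn raw counts into a probability distribution (empty dict if there are no counts)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def aggregate(prev_category, tld, transitions, tld_model, weight=0.5):
    """Predict the next category by interpolating the context and TLD distributions.

    `weight` is a hypothetical interpolation parameter; ties break alphabetically
    because candidates are sorted and max() keeps the first maximal element.
    """
    context = normalize(transitions.get(prev_category, Counter()))
    by_tld = normalize(tld_model.get(tld, Counter()))
    candidates = sorted(set(context) | set(by_tld))
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: weight * context.get(c, 0.0) + (1 - weight) * by_tld.get(c, 0.0),
    )


def majority_vote(predictions):
    """Majority rule over the predictions made for one URL in different contexts."""
    if not predictions:
        return None
    return Counter(predictions).most_common(1)[0][0]


def build_blacklist(url_predictions):
    """Blacklist URLs whose majority-voted category is objectionable (no page content is read)."""
    return {
        url
        for url, preds in url_predictions.items()
        if majority_vote(preds) in OBJECTIONABLE
    }


if __name__ == "__main__":
    trails = [["news", "gambling", "gambling"], ["news", "sports"], ["gambling", "gambling", "porn"]]
    transitions = train_transitions(trails)
    tld_model = train_tld([("com", "news"), ("com", "gambling"), ("bet", "gambling"), ("bet", "gambling")])
    print(aggregate("news", "bet", transitions, tld_model))  # gambling
    print(build_blacklist({"a.example": ["gambling", "news", "gambling"],
                           "b.example": ["news", "news", "porn"]}))  # {'a.example'}
```

In this toy run, a `news` page is followed by `gambling` half the time and the `bet` TLD has only been seen with gambling sites, so the interpolated score favors `gambling` (0.75 versus 0.25 for `sports`). The blacklist step illustrates the abstract's point that filtering decisions can come from behavioral predictions alone, without inspecting page content.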

Original language: English
Pages (from-to): 930-942
Number of pages: 13
Journal: Journal of the Association for Information Science and Technology
Volume: 66
Issue number: 5
Publication status: Published - May 1 2015

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences
