TY - GEN
T1 - Objectionable content filtering by click-through data
AU - Lee, Lung Hao
AU - Juan, Yen Cheng
AU - Chen, Hsin Hsi
AU - Tseng, Yuen Hsien
PY - 2013
Y1 - 2013
AB - This paper explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals contextual information about web browsing behavior. We extract behavioral features of each clicked URL, i.e., hostname, bag-of-words, gTLD, IP, and port, to develop a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 for objectionable access identification without requesting the corresponding page content. Error analysis indicates that the proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, the proposed model achieves a macro-averaged blocking rate of 0.9271 while maintaining a low macro-averaged over-blocking rate of 0.0575, collaboratively filtering objectionable content as it changes over time on the dynamic web.
KW - Click-through mining
KW - Collaborative filtering
KW - Internet censorship
UR - http://www.scopus.com/inward/record.url?scp=84889607153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889607153&partnerID=8YFLogxK
U2 - 10.1145/2505515.2507849
DO - 10.1145/2505515.2507849
M3 - Conference contribution
AN - SCOPUS:84889607153
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1581
EP - 1584
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -