Publication date: 15 June 2015

Bundit Manaskasemsak and Arnon Rungsawang

Abstract

Purpose

This paper presents a machine learning approach to Web spam detection. Building on ant colony optimization (ACO), it proposes three algorithms that construct rule-based classifiers to distinguish non-spam hosts from spam hosts. It also proposes an adaptive learning technique to improve spam detection performance.

Design/methodology/approach

The Trust-ACO algorithm lets an ant start from a non-spam seed and then decide which paths to follow through the host graph. The trails (i.e. trust paths) discovered by the ants are interpreted and compiled into non-spam classification rules. Similarly, the Distrust-ACO algorithm generates spam classification rules. The third algorithm, Combine-ACO, accumulates the rules produced by the first two. In addition, an adaptive learning technique lets ants walk with longer (or shorter) steps by rewarding them when they find desirable paths and penalizing them otherwise.
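The walk and the step adaptation described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`trust_walk`, `adapt_steps`), the adjacency-dict graph, the pheromone table, the step bounds, and in particular the rule that a path is "desirable" when it ends on a host labeled non-spam are all assumptions made for the example.

```python
import random

def trust_walk(graph, seed, max_steps, pheromone, rng):
    """Walk one ant from a non-spam seed host and return the visited path.

    graph:     dict mapping a host to the list of hosts it links to (assumed layout)
    pheromone: dict mapping (src, dst) edges to a positive weight (assumed layout)
    """
    path = [seed]
    current = seed
    for _ in range(max_steps):
        # Only consider out-links to hosts the ant has not visited yet.
        candidates = [h for h in graph.get(current, []) if h not in path]
        if not candidates:
            break
        # Edges with more pheromone are proportionally more likely to be chosen.
        weights = [pheromone.get((current, h), 1.0) for h in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path

def adapt_steps(max_steps, path, labels, lo=1, hi=10):
    """Lengthen the walk after a desirable path, shorten it otherwise.

    Assumption: a path counts as desirable when its last host is labeled
    "non-spam"; the source does not spell out the exact criterion.
    """
    desirable = labels.get(path[-1]) == "non-spam"
    if desirable:
        return min(hi, max_steps + 1)   # reward: allow a longer walk
    return max(lo, max_steps - 1)       # penalty: force a shorter walk

# Toy host graph, made up purely for illustration.
toy_graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
toy_path = trust_walk(toy_graph, "a", 2, {}, random.Random(0))
```

A Distrust-ACO counterpart would, under the same assumptions, start from a spam seed and treat paths ending on spam-labeled hosts as desirable.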

Findings

Experiments are conducted on two publicly available datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007. The results show that the proposed algorithms outperform well-known rule-based classification baselines. In particular, the proposed adaptive learning technique helps improve the AUC scores to as high as 0.899 and 0.784 on the former and the latter datasets, respectively.
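For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen spam host receives a higher spam score than a randomly chosen non-spam host. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy scores are assumptions for illustration and are unrelated to the paper's experiments.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores, higher meaning "more likely spam"
    labels: 1 for spam, 0 for non-spam
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # spam host correctly ranked above non-spam host
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up scores for four hosts, for illustration only.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The quadratic loop keeps the definition visible; a rank-based formulation gives the same value more efficiently on large datasets.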

Originality/value

To the best of our knowledge, this is the first comprehensive study that adopts the ACO learning approach to solve the problem of Web spam detection. In addition, we have improved the traditional ACO by using the adaptive learning technique.

Details

International Journal of Web Information Systems, vol. 11 no. 2
Type: Research Article
ISSN: 1744-0084
