Search results

1 – 2 of 2
Per page
102050
Citations:
Loading...
Access Restricted. View access options
Article
Publication date: 17 November 2020

Deepti Sisodia and Dilip Singh Sisodia

Analysis of the publisher's behavior plays a vital role in identifying fraudulent publishers in the pay-per-click model of online advertising. However, the vast amount of raw user…

248

Abstract

Purpose

Analysis of the publisher's behavior plays a vital role in identifying fraudulent publishers in the pay-per-click model of online advertising. However, the vast amount of raw user click data with missing values pose a challenge in analyzing the conduct of publishers. The presence of high cardinality in categorical attributes with multiple possible values has further aggrieved the issue.

Design/methodology/approach

In this paper, gradient tree boosting (GTB) learning is used to address the challenges encountered in learning the publishers' behavior from raw user click data and effectively classifying fraudulent publishers.

Findings

The results demonstrate that the GTB effectively classified fraudulent publishers and exhibited significantly improved performance as compared to other learning methods in terms of average precision (60.5 %), recall (57.8 %) and f-measure (59.1%).

Originality/value

The experiments were conducted using publicly available multiclass raw user click dataset and eight other imbalanced datasets to test the GTB's generalizing behavior, while training and testing were done using 10-fold cross-validation. The performance of GTB was evaluated using average precision, recall and f-measure. The performance of GTB learning was also compared with eleven other state-of-the-art individual and ensemble classification models.

Details

Data Technologies and Applications, vol. 55 no. 2
Type: Research Article
ISSN: 2514-9288

Keywords

Access Restricted. View access options
Article
Publication date: 6 January 2022

Deepti Sisodia and Dilip Singh Sisodia

The problem of choosing the utmost useful features from hundreds of features from time-series user click data arises in online advertising toward fraudulent publisher's…

135

Abstract

Purpose

The problem of choosing the utmost useful features from hundreds of features from time-series user click data arises in online advertising toward fraudulent publisher's classification. Selecting feature subsets is a key issue in such classification tasks. Practically, the use of filter approaches is common; however, they neglect the correlations amid features. Conversely, wrapper approaches could not be applied due to their complexities. Moreover, in particular, existing feature selection methods could not handle such data, which is one of the major causes of instability of feature selection.

Design/methodology/approach

To overcome such issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where we enumerated an accumulated evaluation of relevant feature subset to search for an optimal feature subset using effective machine learning (ML) models.

Findings

Empirical results prove enhanced classification performance with proposed features in average precision, recall, f1-score and AUC in publisher identification and classification.

Originality/value

The FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics, first, considering original features, second, with relevant feature subsets selected by feature selection (FS) methods, third, with optimal feature subset obtained by the proposed approach. ANOVA significance test is conducted to demonstrate significant differences between independent features.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

1 – 2 of 2
Per page
102050