Visualization and machine learning techniques to support web traffic analysis
Web Analytics (WA) services are one of the main tools that marketing experts use to measure the success of an online business, so tools that support WA analysis are essential. Nevertheless, we observed that the way these services display traffic reports has changed little. The trustworthiness of the information is also at stake: Web Analytics Services (WAS) face the problem that more than half of Internet traffic is Non-Human Traffic (NHT), which can make online reports misleading and waste marketing budgets. Some research has addressed this problem; however, most of it relies on intrusive methods and does not take advantage of the information that current WAS already provide. In the present work, we provide tools that help marketing experts obtain better reports, use informative visualizations, and verify the trustworthiness of their traffic. First, we propose a new Visualization Tool that shows website performance in terms of a chosen metric and helps identify potential online strategies based on it. Second, we use Machine Learning Binary Classification (BC) and One-Class Classification (OCC) to obtain more reliable information by identifying NHT and abnormal traffic, so that marketing analysts can contrast NHT against their current reports. Third, we show how Pattern Extraction algorithms (such as PBC4cip's miner) can support traffic analysis once visitor segmentation is done, and can suggest new strategies that may improve the online business; the extracted patterns can then be used in the Visualization Tool to analyze the traffic in detail. We confirmed the usefulness of the Visualization Tool by using it to analyze bot traffic that we generated: NHT followed very similar, linear navigation paths, in contrast with the more complex paths of human visitors. Furthermore, BC and OCC (BaggingTPMiner) successfully detected well-known bots and abnormal traffic.
We achieved ROC AUC scores of 0.844 and 0.982 for BC and OCC, respectively.
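To make the evaluation metric concrete, here is a minimal sketch of one-class scoring followed by ROC AUC computation. It is not BaggingTPMiner and does not reproduce the thesis experiments. The nearest-neighbour novelty scorer is a swapped-in, deliberately simple one-class technique. The two session features (hypothetical pages viewed and mean seconds between requests) and all data points are invented for illustration. The sketch also assumes the one-class model is trained on human sessions only.

```python
"""Sketch: one-class novelty scoring of web sessions, evaluated with ROC AUC.

Hypothetical assumptions: each session is a small numeric feature vector
(pages viewed, mean seconds between requests), and the one-class model is a
k-nearest-neighbour distance scorer, NOT BaggingTPMiner.
"""
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_novelty_score(train: List[Vector], x: Vector, k: int = 1) -> float:
    """Mean distance from x to its k nearest training sessions.

    The training set contains only 'normal' (here: human) sessions, so a
    larger score means x looks less like normal traffic.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training sessions")
    dists = sorted(euclidean(t, x) for t in train)
    return sum(dists[:k]) / k


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random positive (label 1) gets a
    higher score than a random negative (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented): (pages viewed, mean seconds between requests)
human_train = [(5, 30.0), (8, 45.0), (3, 60.0), (6, 25.0)]
test_sessions = [(7, 40.0), (4, 50.0), (40, 0.5), (25, 1.0)]
test_labels = [0, 0, 1, 1]  # 1 = non-human (bot) session

scores = [knn_novelty_score(human_train, s) for s in test_sessions]
print(roc_auc(test_labels, scores))  # 1.0 on this toy data
```

The pairwise comparison is the Mann-Whitney formulation of ROC AUC. It costs O(positives × negatives), which is fine for a small example. Libraries such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently for large evaluation sets.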