CactoScale is a component of the CACTOS toolkit that provides cloud operators with the tools needed to collect and analyse data traces from different sources. Its framework supports scalable, parallel data analysis.

Management of cloud applications and data centre resources has become increasingly complex, due in large part to a substantial increase in heterogeneity and scale. The topology optimisation tools and algorithms currently used to place applications across such heterogeneous resources are typically based on trial and error. Predictions make it possible to evaluate such optimisation algorithms and their complex interactions, supporting better reasoning at decision time, e.g. when optimising for a specific trade-off.

CactoScale integrates multiple sources of performance and error monitoring data into a consolidated architecture with a unified interface and unified storage. Specifically, data agents co-located with the data centre servers collect log traces from user-defined data sources and either process this data in place or forward it for processing on dedicated CactoScale servers. The CactoScale interface enables both in situ and off-loaded data processing: most simple filtering operations are feasible in situ, and only more advanced data correlation algorithms are off-loaded to high-throughput processing servers. Initial mining algorithms have been developed to trace the I/O behaviour of applications.

CactoScale uses an agent-based monitoring scheme capable of gathering and extracting information from application and error logs. An agent installed in each virtual machine collects log files and transmits them to the collectors for processing. The agents can also be paired with in situ analytics modules, both to cover cases where high sampling rates of numerical indicators (e.g. utilisation) are needed and to filter the data that flows to the database for post-processing. An in situ analytics module is a process designed to run locally on a node, which provides the advantage of data locality: the data are pre-processed by the local node before being collected by a remote distributed service for further processing. In this way, the hierarchical design of the data analysis adds a level of real-time processing much closer to the data source.
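The in situ step described above can be illustrated with a minimal Python sketch. It is not CactoScale code: the names `filter_logs`, `summarise` and `Summary`, the severity keywords and the window size are all assumptions chosen for illustration. It shows the two roles the text gives to an in situ module, namely filtering log lines locally and reducing a high-rate numerical indicator to a few summary records before anything is forwarded.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class Summary:
    """One aggregated record that would be forwarded to a collector (hypothetical format)."""
    metric: str
    count: int
    mean: float
    peak: float


def filter_logs(lines: Sequence[str], levels=("ERROR", "WARN")) -> List[str]:
    """Keep only log lines that contain one of the given severity tokens."""
    return [line for line in lines if any(level in line.split() for level in levels)]


def summarise(metric: str, samples: Sequence[float], window: int) -> List[Summary]:
    """Collapse each window of raw samples into a single Summary record."""
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append(Summary(metric, len(chunk), mean(chunk), max(chunk)))
    return summaries


if __name__ == "__main__":
    logs = ["12:00 INFO started", "12:01 ERROR disk failure", "12:02 WARN high load"]
    print(filter_logs(logs))
    print(summarise("cpu", [10.0, 20.0, 30.0, 40.0, 90.0], window=2))
```

Under these assumptions, only the filtered lines and the per-window summaries would leave the node, which is the data-locality benefit the paragraph describes.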

CactoScale builds on parallel data processing frameworks such as Hadoop MapReduce and Spark. The tool is designed to perform parallel filtering, correlation and analysis of traces, with parallelism addressed along several dimensions. In this video we describe the monitoring capabilities of CactoScale and show how parallel correlation analysis is used to detect possible system anomalies in the cloud platform.
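As a rough illustration of correlation analysis over traces, the sketch below computes pairwise Pearson correlations between metric traces and flags pairs whose correlation has shifted relative to a baseline. It is an assumption-laden sketch, not the CactoScale method: the source does not state which correlation measure or anomaly criterion is used, the function names and the `shift` threshold are hypothetical, and a standard-library thread pool stands in for Hadoop MapReduce or Spark only to show that each pair is an independent task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either trace is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlate_all(traces: Dict[str, Sequence[float]], workers: int = 4) -> Dict[Pair, float]:
    """Correlate every pair of traces; each pair is an independent task for the pool."""
    pairs = list(combinations(sorted(traces), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(lambda p: pearson(traces[p[0]], traces[p[1]]), pairs)
        return dict(zip(pairs, values))


def shifted_pairs(baseline: Dict[Pair, float], current: Dict[Pair, float],
                  shift: float = 0.5) -> List[Pair]:
    """Flag pairs whose correlation moved by more than `shift` (assumed criterion)."""
    return [pair for pair, r in baseline.items()
            if abs(r - current.get(pair, 0.0)) > shift]


if __name__ == "__main__":
    normal = {"cpu": [1, 2, 3, 4], "io": [2, 4, 6, 8], "mem": [5, 5, 5, 5]}
    observed = {"cpu": [1, 2, 3, 4], "io": [8, 6, 4, 2], "mem": [5, 5, 5, 5]}
    print(shifted_pairs(correlate_all(normal), correlate_all(observed)))
```

Because each pair is computed independently, the same structure maps naturally onto a distributed framework, where the pairs would be partitioned across workers rather than threads.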

Watch CactoScale in action

Get CactoScale

Use the following link to download the CactoScale Intermediate Prototype, which is made available under the Eclipse Public License (EPL), Version 1.0.

Read the Guide Document

The CactoScale Guide, available at the following link, explains how to acquire and set up CactoScale.