Efficient operation of large-scale distributed systems such as
enterprise networks, grid systems, and sensor networks is a grand
challenge. A key reason is the lack of current and
accurate knowledge of the global state of the system's individual
components, including both network and machine
attributes. Most commercial network management systems monitor only a
subset of system metrics, and only at relatively coarse-grained
timescales, so that the measurements can be collected and processed at a
central location. This impedes decision making at very fine
timescales, which is important for several emerging applications such as
interactive multimedia services and early anomaly detection
systems. We provide a Scalable Sensing Service (S3) that the network
management subsystem as well as individual applications/services can
subscribe to and securely get customized information at very fine
timescales suitable for their purposes. For scalable operation, S3
provides the sensing service in a decentralized manner, eliminates
unnecessary duplicate measurements by consolidating sensing
requirements of different applications, and provides inference engines
to estimate network metrics with high accuracy while avoiding
the quadratic load of all-pairs measurements.
S3 can be used for a number of management tasks, such as
detection of failures or anomalous behavior,
resource location and placement, and setup of network routes for optimal performance.
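
The idea of consolidating the sensing requirements of different applications can be illustrated with a minimal sketch. This is not S3's actual interface: the SensingRequest type, its fields, and the consolidate function are hypothetical names chosen for illustration, and the rule of keeping the shortest requested period per measurement is an assumption about one reasonable way to merge overlapping requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensingRequest:
    """Hypothetical subscription: one application asks for one metric on one path."""
    app: str
    src: str
    dst: str
    metric: str
    period_s: float  # how often the application wants a fresh value


def consolidate(requests):
    """Merge requests for the same (src, dst, metric) into a single measurement.

    Assumption: measuring at the shortest requested period satisfies every
    subscriber of that measurement, so one probe stream replaces several.
    Returns {(src, dst, metric): (period_s, set_of_subscribing_apps)}.
    """
    plan = {}
    for r in requests:
        key = (r.src, r.dst, r.metric)
        period, apps = plan.get(key, (r.period_s, set()))
        plan[key] = (min(period, r.period_s), apps | {r.app})
    return plan


requests = [
    SensingRequest("netmgmt", "a", "b", "latency", 5.0),
    SensingRequest("video", "a", "b", "latency", 1.0),
    SensingRequest("video", "a", "c", "bandwidth", 10.0),
]
plan = consolidate(requests)
for key, (period, apps) in sorted(plan.items()):
    print(key, period, sorted(apps))
```

In this sketch, the two latency requests for the path from a to b collapse into one measurement taken at the shorter period, and its results would be shared by both subscribers.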
This work is supported in part by DARPA Contract N66001-05-9-8904.