In an Infrastructure-as-a-Service (IaaS) environment, intelligent allocation of shared resources is paramount. Placement is the problem of choosing which virtual machine (VM) should run on which physical machine (PM), whereas scheduling is the problem of sharing resources among multiple co-located VMs. An efficient placement and scheduling solution is one that, in addition to satisfying all constraints, increases the overall utilization of physical resources such as CPU, storage, or network. Determining such a solution is very challenging, especially in the face of conflicting goals and only partial information about workloads.
To reason about placement, we first tackle the problem of performance interference, which may affect co-located VMs when their combined demand for a resource exceeds what is available at a given instant. To this end, we characterize the performance of Hadoop in a shared and virtualized setting.