Over the last decade, cloud computing has revolutionized the way the world computes. Many companies and organizations have moved from dedicated servers that they own and manage on their own premises to flexible solutions that can scale up or down with the computing power and storage they need at any given moment. It’s changed the way applications are written, built, and deployed, greatly increasing automation and coordination between programs.
Even so, estimates suggest that 50% to 60% of workloads still run on on-premises servers. While more of that share is expected to shift to the cloud in the coming years, there are reasons why organizations might choose to keep their data and computing on-premises, or to use a hybrid of cloud services and their own managed servers. Highly sensitive data raises security concerns, and some types of data, like health care information, are often subject to regulations on how they can be handled.
Using cloud-based resources introduces new types of security threats. With on-premises servers, attacks typically come from outside the infrastructure, but with cloud deployments, threats can also originate from inside it, for example from other tenants sharing the same hardware. Confidential computing has recently emerged as a way to address these added security risks. In the strictest sense, it means ensuring the confidentiality of a workload. We prefer to view it as a broader term, however, that encompasses three main aspects:
Confidentiality
Each customer’s data needs to be properly isolated, so that the customer is the only party able to access it. Data protection is not a new concept, and there are widely accepted mechanisms to protect data. Current mechanisms focus on two states. Data at rest, which is not currently being used, can be protected by encrypting the data and/or disk images with a key known only to the tenant. Data in motion, which is being transferred over the network, can be protected by encrypting it before it leaves the application; in this case, the key can be generated randomly at runtime, when the connection between sender and receiver is set up.
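For data in motion, the idea that the key is generated only when the connection is set up is easiest to see with an ephemeral key exchange. The sketch below is a toy Diffie-Hellman exchange using only Python’s standard library. It assumes deliberately weak, illustrative parameters (the Mersenne prime 2**127 - 1 and base 3), and the function names are hypothetical. Real protocols such as TLS 1.3 use vetted groups and authenticate the exchange.

```python
import hashlib
import secrets

# Toy parameters for illustration only: a Mersenne prime modulus and base 3
# (not checked to generate a large subgroup). Never use these in practice.
P = 2**127 - 1
G = 3


def make_ephemeral_keypair() -> tuple[int, int]:
    """Pick a fresh random private exponent and compute the public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Compute the shared secret and hash it into a 256-bit session key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Both sides create key pairs only when the connection is set up ...
sender_priv, sender_pub = make_ephemeral_keypair()
receiver_priv, receiver_pub = make_ephemeral_keypair()

# ... exchange only the public values, and arrive at the same key.
sender_key = derive_session_key(sender_priv, receiver_pub)
receiver_key = derive_session_key(receiver_priv, sender_pub)
assert sender_key == receiver_key
```

The session key would then feed a symmetric cipher for the connection. If the private exponents are discarded after the handshake, a later compromise does not expose past sessions.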
With confidential computing, a third type of data needs to be protected: data in use. This means offering mechanisms that protect the physical memory (such as RAM) used by a customer, so that no other tenant on that cloud has any way to access it. This is generally done by hardware mechanisms that protect virtual machines (VMs). One approach is partitioning, where the CPU places hardware checks on the memory allocated to each VM and ensures these boundaries are not crossed. The other is memory encryption, where the CPU automatically encrypts each VM’s memory with a different key. Some technologies, like IBM Z Secure Execution, offer both.
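To make the two hardware approaches concrete, here is a toy Python model of a memory controller that does both. It rejects any access outside a VM’s assigned range (partitioning), and it stores each VM’s data encrypted under a per-VM key (memory encryption). Everything here is hypothetical and simplified. The SHA-256-based XOR keystream stands in for the AES-based modes real CPUs use, and the class and method names do not correspond to any vendor interface.

```python
import hashlib
import secrets


def keystream(key: bytes, address: int, length: int) -> bytes:
    """Toy keystream bound to the VM key and the physical address (not real AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + address.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


class ToyMemoryController:
    """Hypothetical CPU memory controller doing partitioning plus per-VM encryption."""

    def __init__(self) -> None:
        self.ram: dict[int, bytes] = {}        # address -> bytes as stored in DRAM
        self._keys: dict[str, bytes] = {}      # VM id -> key that never leaves the "CPU"
        self._regions: dict[str, range] = {}   # VM id -> addresses assigned to the VM

    def launch_vm(self, vm_id: str, region: range) -> None:
        self._keys[vm_id] = secrets.token_bytes(32)  # a different key for every VM
        self._regions[vm_id] = region

    def _check_bounds(self, vm_id: str, address: int) -> None:
        # Partitioning: the hardware refuses accesses outside the VM's region.
        if address not in self._regions[vm_id]:
            raise PermissionError(f"{vm_id} may not access {address:#x}")

    def _xor(self, vm_id: str, address: int, data: bytes) -> bytes:
        ks = keystream(self._keys[vm_id], address, len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    def write(self, vm_id: str, address: int, data: bytes) -> None:
        self._check_bounds(vm_id, address)
        self.ram[address] = self._xor(vm_id, address, data)  # encrypt on the way out

    def read(self, vm_id: str, address: int) -> bytes:
        self._check_bounds(vm_id, address)
        return self._xor(vm_id, address, self.ram[address])  # decrypt on the way in


mc = ToyMemoryController()
mc.launch_vm("vm-a", range(0x1000, 0x2000))
mc.launch_vm("vm-b", range(0x2000, 0x3000))
mc.write("vm-a", 0x1000, b"patient record")
print(mc.read("vm-a", 0x1000))   # b'patient record'
print(mc.ram[0x1000])            # ciphertext: what the host or a memory probe would see
```

A read of 0x1000 by "vm-b" raises PermissionError. Even if the bounds check were bypassed, the data is decrypted with "vm-b"’s own key, which does not recover the plaintext.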
Integrity
Customer data must not be modifiable or tampered with by anyone other than the tenant. Some early versions of the mechanisms used to protect data in use did not protect against tampering. This left them open to a class of attacks called replay attacks, which rely on feeding an application modified or stale information to trick it into revealing secrets. Newer implementations of these technologies therefore aim to prevent data tampering.
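One common way to stop both tampering and replay is to attach a message authentication code (MAC) to each block of memory. The MAC is bound to the block’s address and to a version counter kept in trusted storage. The Python sketch below assumes such a design. It omits encryption to keep the focus on integrity, and the class name is hypothetical. Real designs typically protect the counters themselves with an integrity tree rather than keeping them all on-chip.

```python
import hashlib
import hmac
import secrets


class IntegrityProtectedMemory:
    """Toy model: each block is stored with a MAC over (address, version, data)."""

    def __init__(self) -> None:
        self._mac_key = secrets.token_bytes(32)  # held by the CPU, never by the host
        self._versions: dict[int, int] = {}     # trusted per-address write counters
        # Untrusted DRAM that the host can read and overwrite at will.
        self.untrusted_ram: dict[int, tuple[bytes, bytes]] = {}

    def _tag(self, address: int, version: int, data: bytes) -> bytes:
        msg = address.to_bytes(8, "big") + version.to_bytes(8, "big") + data
        return hmac.new(self._mac_key, msg, hashlib.sha256).digest()

    def write(self, address: int, data: bytes) -> None:
        version = self._versions.get(address, 0) + 1  # every write bumps the counter
        self._versions[address] = version
        self.untrusted_ram[address] = (data, self._tag(address, version, data))

    def read(self, address: int) -> bytes:
        data, tag = self.untrusted_ram[address]
        expected = self._tag(address, self._versions[address], data)
        if not hmac.compare_digest(tag, expected):
            raise ValueError(f"integrity check failed at {address:#x}")
        return data


mem = IntegrityProtectedMemory()
mem.write(0x40, b"balance=100")
snapshot = mem.untrusted_ram[0x40]   # attacker records an old (data, tag) pair
mem.write(0x40, b"balance=5")
mem.untrusted_ram[0x40] = snapshot   # ... and replays it later
try:
    mem.read(0x40)
except ValueError as err:
    print(err)                       # integrity check failed at 0x40
```

The replayed pair carries a valid MAC, but it was computed for version 1. The trusted counter now says 2, so the check fails. Without the counter, the same replay would pass.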
Attestation
Even with confidential computing, the system needs to be trustworthy. The customer needs proof that their application is running in an environment built around confidentiality and integrity. To do this in a traditional environment, we need to start with a safe root of trust, a foundational component that is cryptographically secure. This normally takes the form of a secure hardware module such as a trusted platform module (TPM). A TPM is the global standard for secure, dedicated cryptographic processing: a dedicated microcontroller that secures systems through a built-in set of cryptographic keys. However, we are also studying different approaches to attestation.
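A TPM acts as a root of trust largely through its platform configuration registers (PCRs). Each boot stage is hashed, and the hash is "extended" into a PCR, so the final value depends on every component and on their order. The sketch below mimics the SHA-256 extend operation and a verifier that replays the event log. It omits the TPM’s signed quote, and the function names are illustrative rather than part of any TPM software stack API.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: PCR_new = SHA-256(PCR_old || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def measured_boot(stages: list[bytes]) -> tuple[bytes, list[bytes]]:
    """Hash each boot stage before it runs, extend the PCR, and keep an event log."""
    pcr = bytes(PCR_SIZE)  # PCRs used for static measured boot reset to all zeros
    event_log = []
    for image in stages:
        digest = hashlib.sha256(image).digest()
        pcr = extend(pcr, digest)
        event_log.append(digest)
    return pcr, event_log


def verify(reported_pcr: bytes, event_log: list[bytes], known_good: set[bytes]) -> bool:
    """Replay the log, check it reproduces the PCR, and check every entry is approved."""
    pcr = bytes(PCR_SIZE)
    for digest in event_log:
        pcr = extend(pcr, digest)
    return pcr == reported_pcr and all(d in known_good for d in event_log)


stages = [b"firmware v1", b"bootloader v2", b"kernel v3"]
known_good = {hashlib.sha256(s).digest() for s in stages}

pcr, log = measured_boot(stages)
print(verify(pcr, log, known_good))       # True

pcr, log = measured_boot([b"firmware v1", b"bootloader v2", b"rootkit"])
print(verify(pcr, log, known_good))       # False: unknown kernel digest
print(verify(pcr, log[:2], known_good))   # False: hiding an entry breaks the PCR
```

Because extend is a one-way chain, a compromised stage cannot "un-measure" itself. It can only lie in the log, and the replayed log then fails to reproduce the PCR value.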
In most confidential computing implementations, the CPU itself becomes a trusted entity, so it (or a security processor attached to it) attests that the contents of the VM and its encryption are set up correctly. In this case, there’s usually no need to attest the hypervisor (or host operating system), which can remain untrusted. However, a fully attested environment may still be preferred in some cases, especially to prevent replay attacks and to guard against possible vulnerabilities in CPUs. In these cases, we want to attest the entire hardware and software infrastructure running the customer’s application. Attesting the underlying hardware, however, requires rethinking some of the main building blocks of a processing system, with a root of trust that is more complex than a TPM and can better attest the entire platform.
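When the CPU is the trusted entity, the verifier typically sends a fresh random challenge (a nonce). The CPU returns a signed report containing the VM’s launch measurement and that nonce. The nonce is what stops an old report from being replayed. This Python sketch assumes that flow. As a hypothetical simplification, it uses an HMAC with a shared key in place of the asymmetric, vendor-certified signing key that real hardware uses, and the report layout is invented for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical stand-in for the key inside the CPU or its security processor.
# Real hardware signs reports with an asymmetric key whose certificate chains
# to the vendor; a shared HMAC key keeps this sketch standard-library only.
_CPU_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class AttestationReport:
    launch_measurement: bytes  # hash of the initial VM image, computed by the CPU
    nonce: bytes               # fresh challenge chosen by the verifier
    signature: bytes


def _sign(measurement: bytes, nonce: bytes) -> bytes:
    return hmac.new(_CPU_KEY, measurement + nonce, hashlib.sha256).digest()


def cpu_attest(vm_image: bytes, nonce: bytes) -> AttestationReport:
    """Trusted side: measure the VM as it is launched and sign (measurement, nonce)."""
    measurement = hashlib.sha256(vm_image).digest()
    return AttestationReport(measurement, nonce, _sign(measurement, nonce))


def verify_report(report: AttestationReport, expected: bytes, nonce: bytes) -> bool:
    """Customer side: no input from the (untrusted) hypervisor is needed."""
    signed_ok = hmac.compare_digest(
        report.signature, _sign(report.launch_measurement, report.nonce)
    )
    return signed_ok and report.nonce == nonce and report.launch_measurement == expected


image = b"guest firmware + kernel + initrd"
expected = hashlib.sha256(image).digest()

old_nonce = secrets.token_bytes(16)
old_report = cpu_attest(image, old_nonce)
print(verify_report(old_report, expected, old_nonce))   # True

new_nonce = secrets.token_bytes(16)
print(verify_report(old_report, expected, new_nonce))   # False: replayed report
print(verify_report(cpu_attest(b"tampered", new_nonce), expected, new_nonce))  # False
```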
Where does confidential computing stand now?
It’s our belief that confidential computing will become a ubiquitously adopted mechanism to strengthen security boundaries and enable increasingly sensitive workloads to be effectively deployed on public clouds. There are, however, considerable technology gaps that need to be addressed to get there.
Among the main open questions is how to attest the trustworthiness of components inside secure enclaves, as well as of the components that manage them. We’re also working on a secure mechanism for exchanging decryption keys and other secrets, and more generally on automation that makes the latest hardware capabilities simpler to use.
An example use case for confidential computing: a client in the healthcare industry wants to use a proprietary AI model that analyzes confidential patient data. Their workload is already designed as a set of containers, so it can leverage the Confidential Containers project to run securely. The entire software stack on the physical machine is measured and verified to guarantee the integrity of the infrastructure. The workload itself is measured at deployment and continuously at runtime, and data is kept secure by hardware-provided trusted execution environments (TEEs).
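The continuous runtime measurement in this scenario can be pictured as comparing a list of file hashes, recorded as files are loaded, against a policy of known-good values. This is roughly what Keylime does with measurements from the Linux Integrity Measurement Architecture (IMA). The Python sketch below assumes a simplified (path, SHA-256) list and a dictionary-based allowlist. It does not reproduce Keylime’s actual policy format, and the file paths and helper names are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hex SHA-256 of a file's content, as a runtime measurement agent might record it."""
    return hashlib.sha256(content).hexdigest()


def check_runtime_policy(
    measurements: list[tuple[str, str]], allowlist: dict[str, set[str]]
) -> list[str]:
    """Return one message per measurement that the policy does not allow."""
    violations = []
    for path, file_hash in measurements:
        allowed = allowlist.get(path)
        if allowed is None:
            violations.append(f"unexpected file: {path}")
        elif file_hash not in allowed:
            violations.append(f"hash mismatch: {path}")
    return violations


# Hypothetical known-good values recorded when the workload was built and deployed.
allowlist = {
    "/app/server": {digest(b"model server v1.4")},
    "/app/model.bin": {digest(b"approved model weights")},
}

# Hypothetical measurements reported while the container is running.
runtime_log = [
    ("/app/server", digest(b"model server v1.4")),
    ("/app/model.bin", digest(b"swapped model weights")),
    ("/tmp/helper.sh", digest(b"unexpected script")),
]
print(check_runtime_policy(runtime_log, allowlist))
# ['hash mismatch: /app/model.bin', 'unexpected file: /tmp/helper.sh']
```

In a real deployment, the measurement list itself would be anchored in a TPM PCR, so the verifier can trust that entries were not silently removed.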
From a software point of view, we’re working across the entire cloud infrastructure stack to address these gaps. The projects we contribute to include Keylime, for TPM-based attestation, and the Confidential Containers project, as well as the Linux kernel, the OVMF firmware, and the GRUB bootloader further down the stack.
For hardware, we’re actively pursuing projects to extend trust to important but often neglected components of modern servers, such as the baseboard management controller (BMC), which manages the entire server. We’re experimenting with OpenBMC and actively working with the community to enhance the existing ecosystem. This work includes extending the concept of secure and measured boot to the BMC firmware and leveraging the same frameworks used for operating system attestation (such as Keylime).
We’re also defining an architecture for a “platform root of trust” to attest entire servers, including peripherals and accelerators. As part of the Open Compute Project, we’re also exploring a pluggable management card (called a data center secure control module, or DC-SCM), along with other techniques. We’re working to improve security and isolation between client-facing resources and internal infrastructure, and to limit the potential blast radius of possible attacks.
To learn more about the work our team is doing and how it could help shore up your enterprise’s security, be sure to visit the cloud security team page.