Updated on Dec 20, 2024

Top 30 Cloud GPU Providers & the GPUs They Offer

GPU procurement has become more complex as more providers add GPU clouds to their offerings. AIMultiple analyzed GPU cloud providers across the most relevant dimensions to facilitate cloud GPU procurement.

| Cloud | Brands* | Models** | Combinations*** | Comments |
|---|---|---|---|---|
| AWS | AWS chips like Trainium | 7 | 19 | Cloud market leader |
| Azure | Working on own chips | 6 | 14 | #2 cloud player |
| GCP | Google Cloud tensor processing units (TPUs) | 8 | 30 | #3 cloud player |
| OCI | | 6 | 17 | Bare-metal GPUs |
| Alibaba Cloud | Alibaba chips like Hanguang 800 | 5 | 6 | Cloud market leader in China |
| Nvidia DGX | | 23 | 23 | Sole focus: High-scale enterprise workloads |
| Vast.ai | | 25 | 50 | GPU marketplace valued at $9bn |
| CoreWeave | | 13 | 13 | Focus: AI workloads. Valued at $23bn |
| AceCloud | | 9 | 17 | |
| TensorDock | | 10 | 20 | |
| Lambda Labs | | 10 | 20 | Sole focus: Cloud GPUs |
| Datacrunch.io | | 8 | 32 | Sole focus: Cloud GPUs |
| RunPod | | 8 | 8 | |
| Cirrascale | Cerebras, Graphcore, SambaNova | 7 | 9 | Focus: Research workloads |
| Paperspace CORE | Graphcore | 7 | 19 | Sole focus: Cloud GPUs |
| LeaderGPU | | 7 | 20 | |
| Jarvis Labs | | 6 | 6 | Sole focus: Cloud GPUs |
| Crusoe Cloud | | 7 | 20 | |
| FluidStack | | 5 | 15 | |
| IBM Cloud | | 5 | 9 | |
| Seeweb | | 5 | 18 | Focus: Serving EU customers from EU data center |
| Latitude.sh | | 4 | 6 | Bare metal available in US & EU data centers |
| Scaleway | | 4 | 11 | |
| Nebius AI | | 4 | 12 | |
| Linode | | 2 | 8 | |
| OVHcloud | | 2 | 4 | |
| Vultr | | 2 | 5 | |

Ranking: Sponsors have links and are highlighted at the top. After that, hyperscalers are listed by US market share. Then, providers are sorted by the number of models that they offer.

* All providers offer Nvidia GPUs. In addition, some cloud providers also offer hardware from other AI chip makers, as indicated in this column.

** Distinct Nvidia GPU models offered. For example, “A100 40 GB” and “A100 80 GB” are counted as separate models.

*** Distinct multi-GPU combinations offered. For example, “1 x A100 40 GB” and “2 x A100 40 GB” are counted as separate multi-GPU combinations.

GPUs can be delivered in a serverless manner, as virtual GPUs or as bare metal. While serverless offers the easiest way to manage workloads, bare metal offers the highest level of control over the hardware. If you are specifically looking for these, see the relevant sections below.

While listing pros and cons for each provider, we relied on our GPU benchmark and online reviews.

What are Virtual GPU providers?

Virtual GPUs (vGPUs) are GPU-backed virtual machines that allow multiple users to share physical GPUs over the cloud. They are the most commonly offered form of cloud GPUs. Leading providers include:

Hyperscalers (AWS, Azure, GCP)

Hyperscalers have some common aspects:

Pros

Pre-loaded drivers & apps: Configuring an instance with the right drivers is time-consuming due to the dependencies between the GPU chip, its drivers, the operating system and the applications. For example, if a recent Ubuntu release does not support the drivers for the NVIDIA Tesla K80, you will need to choose an older Ubuntu version to work with the K80.

All top 3 hyperscalers allow users to manage machine images, which facilitates this process (see the launch sketch after the list below). Service names are:

  • Amazon Machine Images (AMI)
  • Azure Extensions
  • GCP Custom Images
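
For illustration, here is a minimal, hedged sketch of launching a GPU instance from a pre-built machine image with boto3, so that drivers and frameworks come pre-installed. The AMI ID, key pair name and instance type are placeholders that you would look up for your own account and region:

```python
# Hedged sketch: launch a GPU instance from a pre-built machine image so that
# CUDA drivers and ML frameworks come pre-installed. All IDs below are
# placeholders; look up a Deep Learning AMI and key pair for your region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder Deep Learning AMI ID
    InstanceType="g5.xlarge",         # single NVIDIA A10G GPU instance type
    MinCount=1,
    MaxCount=1,
    KeyName="my-ssh-key",             # placeholder key pair for SSH access
)
print("Launched:", response["Instances"][0]["InstanceId"])
```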

Cons

  • Quota approval is necessary for most GPUs. Don’t expect to open a cloud account and start using GPUs immediately.
  • Latest cards like the H100 are frequently unavailable on demand.
  • It is hard to determine GPU availability. During our benchmark, we could see which GPU cards can be launched per region (for example, the AWS pricing calculator provides this information), but we could not find data on capacity. Therefore, we needed to retry launching instances many times. A sketch of such a per-region check follows this list.
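
The per-region check mentioned above can be scripted. Below is a minimal sketch, assuming boto3 and configured AWS credentials, that lists the availability zones offering a given GPU instance type; note that "offered" does not guarantee capacity at launch time:

```python
# Minimal sketch: list the availability zones in a region that *offer* a given
# GPU instance type. "Offered" is not the same as "available right now";
# capacity can still be exhausted when you try to launch.
import boto3

def zones_offering(instance_type: str, region: str) -> list[str]:
    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instance_type_offerings")
    zones = []
    for page in paginator.paginate(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [instance_type]}],
    ):
        zones += [o["Location"] for o in page["InstanceTypeOfferings"]]
    return zones

print(zones_offering("p4d.24xlarge", "us-east-1"))
```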

Amazon Web Services (AWS)

AWS is the largest cloud platform provider and a leading cloud GPU provider.1 Amazon EC2 (Elastic Compute Cloud) offers GPU-powered virtual machine instances facilitating accelerated computations for deep learning tasks. 

Pros

  • Straightforward quota process: We received the quota for all types of GPUs on AWS in about a day after our application.
  • Seamless integration with other popular AWS services, such as:
    • SageMaker, used for creating, training, deploying, and scaling ML models
    • Amazon S3 (Simple Storage Service), Amazon RDS (Relational Database Service) and other AWS storage services, which can serve as storage solutions for training data

Cons

  • Shutting down GPUs took hours during our benchmark. Other providers complete this within minutes.
  • Fewer GPU options than some GPU-focused providers like CoreWeave.
  • Steep learning curve: As the first and largest cloud, it has comprehensive capabilities which can make the UI seem cluttered.

Pricing

  • Spot Instances can offer significant discounts, sometimes up to 90% off the on-demand prices.
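
As a hedged illustration of how such a discounted request might look with boto3 (the AMI ID and instance type are placeholders, and spot capacity is not guaranteed):

```python
# Sketch: request the same GPU instance type as a Spot Instance, which is
# billed at the discounted spot price but can be interrupted by AWS.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder Deep Learning AMI ID
    InstanceType="g5.xlarge",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
print(response["Instances"][0]["InstanceId"])
```

Since spot capacity can be reclaimed at any time, it suits fault-tolerant jobs with frequent checkpoints rather than long uninterrupted training runs.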

Microsoft Azure

Microsoft Azure, the second largest cloud provider, offers a cloud-based GPU service known as Azure N-Series Virtual Machines, which, like most providers, leverages NVIDIA GPUs to deliver high-performance computing capabilities. This service is particularly suited for demanding applications such as deep learning, simulations, rendering and the training of AI models.

Microsoft is also rumored to have started producing its own chips.2

Pros

  • Straightforward quota process: We received the quota for all types of GPUs on Azure in about a day after our application.
  • Less steep UI learning curve compared to providers like AWS.

Cons

  • Some users find that certain advanced features within Azure require a high level of technical expertise to configure and manage effectively.3

Pricing

See all Azure GPU prices & compare with other providers.

Google Cloud Platform (GCP)

Google Cloud Platform (GCP) is the third biggest cloud platform.4 GCP offers GPU instances that can be attached to existing virtual machines (VMs) or can be part of a new VM setup.

Pros

  • Provides the most flexibility (among the top 3 hyperscalers) in CPU, GPU and storage combinations: We could select a CPU and storage and then attach a GPU to this instance, which provides more flexibility compared to buying specific packages (see the sketch after this list).
  • Easier-to-use UI compared to AWS
  • Offers some free GPU options for Kaggle and Colab users
  • Customers can use 20+ products for free, up to monthly usage limits
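
To illustrate the flexibility noted in the first pro above, here is a minimal sketch of attaching a single NVIDIA T4 to a general-purpose VM using the google-cloud-compute Python client. The project, zone, machine type and image are placeholder choices, and the field names follow the Compute Engine instance resource:

```python
# Hedged sketch: create a VM with an attached GPU on GCP. All names below are
# placeholders; GPU VMs must disable live migration, hence
# on_host_maintenance="TERMINATE".
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"  # placeholders

instance = compute_v1.Instance(name="gpu-demo")
instance.machine_type = f"zones/{zone}/machineTypes/n1-standard-8"
instance.guest_accelerators = [
    compute_v1.AcceleratorConfig(
        accelerator_type=f"zones/{zone}/acceleratorTypes/nvidia-tesla-t4",
        accelerator_count=1,
    )
]
instance.scheduling = compute_v1.Scheduling(on_host_maintenance="TERMINATE")
instance.disks = [
    compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
            disk_size_gb=100,
        ),
    )
]
instance.network_interfaces = [compute_v1.NetworkInterface(network="global/networks/default")]

compute_v1.InstancesClient().insert(project=project, zone=zone, instance_resource=instance)
```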

Cons

  • Configuring the right CPU, GPU and storage combination is more complex since almost any combination is possible. Users also need to add together the pricing of different components (e.g. GPU, storage) to calculate the total price for the instance.
  • Quota process requires filling out complex forms and took us days.

Pricing

See all GCP GPU prices in all regions

NVIDIA DGX Cloud

NVIDIA is the leader in the GPU hardware market. NVIDIA launched its GPU cloud offering, DGX Cloud, by leasing space in leading cloud providers’ (e.g. OCI, Azure and GCP) data centers.

DGX Cloud offers NVIDIA Base Command™, NVIDIA AI Enterprise and NVIDIA networking platforms. DGX Cloud instances featured 8 NVIDIA H100 or A100 80GB Tensor Core GPUs at launch.

Amgen, an early customer, claims that its research team achieved 3x faster training of protein LLMs with BioNeMo and up to 100x faster post-training analysis with NVIDIA RAPIDS.5

The offering is enterprise-focused, with the list price of DGX Cloud instances starting at $36,999 per instance per month at launch.

Pros

  • Support from NVIDIA engineers
  • Multi-node scaling that can support training across up to 256 GPUs, enabling faster large-scale model training
  • Pre-configured with NVIDIA AI software for quick deployment, reducing setup time

Cons

  • Offering is not suitable for firms with limited GPU needs
  • The service is provided on top of cloud providers’ physical infrastructure. Therefore, buyers need to pay for the margins of both the cloud provider and NVIDIA.

IBM Cloud

IBM Cloud’s GPU offering allows for a flexible process of selecting servers and integrates seamlessly with IBM Cloud’s architecture, applications, and APIs. This is accomplished via a globally distributed network of interconnected data centers.

Pros

  • Powerful integration with IBM Cloud architecture and applications
  • A worldwide network of distributed data centers increases data protection

Cons

  • Limited adoption compared to the top 3 providers.6

Oracle Cloud Infrastructure (OCI)

Oracle ramped up its GPU offering after formalizing its partnership with NVIDIA.7

Oracle provides GPU instances in both bare-metal and virtual machine formats for quick, cost-effective, and high-efficiency computing. Oracle’s Bare-Metal instances offer customers the capability to execute tasks in non-virtualized settings. These instances are accessible in regions such as the United States, Germany, and the United Kingdom, with availability under both on-demand and interruptible pricing models.

Customers

Oracle serves some of the leading LLM providers like Cohere, a company that Oracle also invested in.8

Pros

  • Wide range of cloud products and services. Among the tech giants’ cloud services, only OCI offers bare-metal GPUs.9 It is also the only one among them to offer RoCE v2 as its GPU cluster networking technology.10
  • Cost-effective compared to other major cloud providers
  • Offers provision for free trial period and some free-forever products

Cons

  • User interface perceived as clunky and slow by users.11
  • Some users find the documentation difficult to understand.12
  • The process of starting to use Oracle Cloud compute services was viewed as bureaucratic, complicated, and time-consuming by some users

CoreWeave

CoreWeave is a specialized GPU cloud provider. NVIDIA is one of CoreWeave’s investors. CoreWeave claims to have 45,000 GPUs and to be the first Elite-level cloud services provider selected by NVIDIA.13

Jarvis Labs

Jarvis Labs, established in 2019 and based in India, specializes in facilitating swift and straightforward training of deep learning models on GPU compute instances. With its data centers located in India, Jarvis Labs is recognized for its user-friendly setup that enables users to start operations promptly.

Jarvis Labs claims to serve 10,000+ AI practitioners.14

Pros

  • No credit card required to register
  • A simple interface for beginners

Cons

  • Although Jarvis Labs is gaining momentum, its suitability for your business’ enterprise-level tasks would need to be validated. It seems to cater to small workloads since it does not offer multi-GPU instances.

Lambda Labs

Originally, Lambda Labs was a hardware company offering GPU desktop assembly and server hardware solutions. Since 2018, Lambda Labs has offered Lambda Cloud as a GPU platform. The virtual machines it offers come pre-equipped with popular deep learning frameworks, CUDA drivers, and a dedicated Jupyter notebook. Users can connect to these instances through the web terminal in the cloud dashboard or directly via SSH using the provided keys.

Lambda Labs claims to be used by 10,000+ research teams and has a purely GPU-focused offering.15

Paperspace CORE

Paperspace is a cloud computing platform that offers GPU-accelerated virtual machines, among other services. The company is well-regarded for its focus on GPU-intensive workloads and provides a cloud platform for developing, training, and deploying machine learning models.

Paperspace claims to have served 650,000 users.16

Pros

  • Offers a wide range of GPUs compared to other providers
  • Users find the prices fair for the computing power provided
  • Users find the customer service to be friendly and responsive

Cons

  • Some users complain about machine availability, both in terms of the free virtual machines and specific machine types not being available in all regions.17
  • The integrated Jupyter interface is criticized and lacks some keyboard shortcuts, although a native Jupyter Notebook interface is offered
  • Longer loading or creation times for machines
  • Monthly subscription fee on top of machine costs can be a downside, and multi-GPU training can be expensive

What are serverless GPU providers?

Serverless is a newer cloud computing approach that simplifies cloud management. Many cloud providers are starting to offer serverless GPUs. We will be sharing a list here soon.

Explore more on Serverless GPUs.

What are bare-metal GPU providers?

Bare metal is not as commonly offered as GPU VMs. Providers include:

  • Latitude.sh offers bare-metal A100 and H100 GPUs.
  • Oracle Cloud Infrastructure

For more, see AIMultiple’s bare-metal GPU provider list.

What are cloud GPU providers based in Europe?

European businesses may prefer to keep their data in Europe for

  • GDPR compliance and data security
  • Offering faster AI inference services to European users

This is possible with some of the global cloud providers, but there are also Europe-based cloud GPU providers.

Seeweb

Seeweb is a public cloud provider headquartered in Italy that runs 100% on renewable energy. Seeweb supports IaC via Terraform and offers 5 different GPU models.

Datacrunch.io

Datacrunch provides Nvidia’s A100, H100, RTX6000 and V100 models in groups of 1, 2, 4 or 8. The company is based in Helsinki, Finland and relies on 100% renewable energy.

OVHcloud

OVHcloud is a public cloud provider headquartered in France. It started offering Nvidia GPUs in 2023 and plans to expand its offering.18

Scaleway

Scaleway offers H100 instances, provides 3 European regions (Paris, Amsterdam, Warsaw) and relies 100% on renewable energy. For high-scale users, the Nabu 2023 supercomputer, with its 1,016 Nvidia H100 Tensor Core GPUs, is available.

What are upcoming GPU cloud providers?

These providers have limited reach or scope, or they recently launched their offerings. Therefore, they were not included in the top 10:

Alibaba Cloud

Alibaba’s offering may be attractive for businesses operating in China. It is also available across 20 regions including those in Australia, Dubai, Germany, India, Japan, Singapore, the USA and the UK.19

However, a US or EU organization with access to top secret data in domains such as state, defense or telecom may not prefer to work with a cloud service provider headquartered in China.

Cirrascale

Cirrascale specializes in providing different AI hardware to research teams. Though it is one of the smallest teams in this domain, with roughly 20 employees, it offers AI hardware from 4 different AI hardware producers.20

Voltage Park

Voltage Park is a non-profit that spent roughly $500 million with NVIDIA to set up 24,000 cloud H100 GPUs.21 22 It offers low-priced GPU rentals to AI-focused companies like Character AI.

The most cost-effective cloud GPUs


We benchmarked all cloud GPUs on AWS with common text and image-related tasks. Performance of the same GPU on all clouds was assumed to be the same.

How to start the correct instance for your cloud GPU needs

Making the right decisions when setting up a cloud GPU instance is essential for streamlining the initial setup phase. Without careful attention to compatibility between the model, OS and GPU, this process can take hours, significantly increasing costs since GPU providers charge by the hour. By following these steps, you can avoid unnecessary delays and ensure cost efficiency for your project:

  1. Select the Model: Start by selecting the model you plan to use (e.g., YOLOv9). 
  2. Identify its dependencies: The model choice directly influences the framework and libraries (e.g., PyTorch, TensorFlow) you’ll need to build and deploy your solution.
  3. Identify the appropriate CUDA version: CUDA is necessary to run NVIDIA GPUs in an optimized manner. For example, the PyTorch version that you need will dictate a certain CUDA version.
  4. Use our benchmark to choose the most cost-effective GPU: Leverage benchmark data to select the GPU that provides the best balance of price and performance for your specific workload.
  5. Check if the GPU is offered in the region that you prefer: Cloud providers often have varying hardware inventories across regions, and certain GPUs might not be offered in some areas. Checking whether the GPU is offered helps avoid deployment delays. However, even if a GPU is offered, it may not be available when you request it since it may be overbooked. You can check GPUs offered per region on:
    1. AWS: Price calculator
    2. Azure: Pricing calculator
    3. GCP: GPU Availability docs
  6. Choose the right operating system: While you select your setup on the cloud provider, you will need to choose the operating system (OS) and its version. The OS needs to support the CUDA version that you require and drivers for the GPU.
  7. Deploy the drivers and dependencies or choose a system where they are preloaded: You can either manually install the necessary drivers and dependencies or use pre-configured environments provided by cloud providers, such as Azure’s extensions or AWS’s AMIs, to simplify the setup process. A minimal post-launch verification sketch follows these steps.
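
Once the instance is running, a quick sanity check helps confirm that the driver, CUDA build and GPU all line up before you start paying for idle hours. A minimal sketch, assuming PyTorch is installed on the instance:

```python
# Post-launch sanity check (assumes PyTorch is installed on the instance):
# confirms the NVIDIA driver is usable, which CUDA build PyTorch ships with,
# and which GPU was actually allocated.
import torch

print("CUDA available:", torch.cuda.is_available())
print("PyTorch CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
```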

FAQ

What is a cloud GPU platform?

A cloud GPU platform is a service offered by cloud GPU providers that allows users to access and utilize GPU technology remotely. Instead of having physical GPUs installed in local machines, users can use the power of cloud GPUs hosted on efficient cloud GPU platforms. These platforms, like Google Cloud GPUs and NVIDIA GPU instances, harness the high-performance capabilities of GPUs such as the NVIDIA Tesla series, making them accessible to users through the cloud.

Why do you need cloud GPU services?

Cloud GPU services are essential for individuals and businesses that require immense computational power without the capital expense of buying and maintaining physical GPUs. As the demand for high-performance computing increases in areas like artificial intelligence, deep learning, and graphics rendering, an efficient cloud GPU platform can offer scalable and cost-effective solutions. 

Moreover, with the emergence of the best cloud GPU platforms, users can now rent GPU power on demand, suitable for short-term intensive tasks or projects. This way, users can leverage the cutting-edge capabilities of services like Google Cloud GPUs or NVIDIA GPU instances without committing to a significant hardware investment.

How secure are cloud GPU services?

Security is a top priority for any cloud GPU provider. The best cloud GPU platforms implement stringent security measures, ensuring that users’ data and applications remain protected. This includes data encryption during transit and at rest, secure access controls, regular security audits, and more. Providers of services like NVIDIA GPU instances and Google Cloud GPUs invest heavily in maintaining the integrity and confidentiality of user data. 

As with any cloud service, while the provider takes measures to secure the infrastructure, users should also follow best practices in data management and access control to ensure optimal security.

What is GPU quota?

Cloud providers allocate specific quotas for GPU instances, which can vary by type and region. To request a quota increase, developers must specify the instance type (e.g., p3.2xlarge) and the region (e.g., Oregon). Providers often evaluate the developer’s intended usage and current consumption patterns before approving a quota adjustment, ensuring resources are allocated efficiently. The procedure and processing time for quota increases vary by provider.
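
As a hedged illustration, on AWS this flow can be scripted with the Service Quotas API via boto3; the quota code below is a placeholder that you would look up for the GPU instance family you need:

```python
# Illustrative sketch: find and raise a GPU-related EC2 quota. On AWS,
# on-demand GPU capacity is governed by vCPU-based quotas per instance family.
import boto3

sq = boto3.client("service-quotas", region_name="us-west-2")

# Find the quota for the instance family, e.g. "Running On-Demand P instances".
for page in sq.get_paginator("list_service_quotas").paginate(ServiceCode="ec2"):
    for quota in page["Quotas"]:
        if "On-Demand P instances" in quota["QuotaName"]:
            print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])

# Request an increase (placeholder quota code and desired vCPU count).
sq.request_service_quota_increase(
    ServiceCode="ec2",
    QuotaCode="L-XXXXXXXX",  # placeholder; use the code printed above
    DesiredValue=96.0,
)
```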

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.


Comments


4 Comments
Alisdair
Oct 22, 2024 at 05:36

Nice article, Cem! Could you add Koyeb and a few other serverless GPU providers?

Cem Dilmegani
Nov 10, 2024 at 07:13

Sure, thank you for the suggestion, we will consider it in the next edit.

Jesper
Oct 06, 2024 at 03:58

Hi Cem, please also check out Dataoorts at https://dataoorts.com. We’d greatly appreciate being listed here.

Cem Dilmegani
Oct 22, 2024 at 03:18

Sure, we’ll review to see if we can include Dataoorts in the next edit.

Jerry
Jul 24, 2024 at 09:56

Hi Cem, we just launched Atlascloud.ai with the lowest H100 pricing on internet 2.48 on demand. Would love to get on your list.

Cem Dilmegani
Jul 28, 2024 at 10:24

Sure, we’ll be reaching out to understand what Atlascloud.ai is offering.

Evgenii Pavlov
Jun 14, 2024 at 15:23

Where is Nebius.ai ???

Cem Dilmegani
Jul 14, 2024 at 08:45

Thank you! It is added now.
