How to Train a Neural Network on a GPU in the Cloud with coiled functions
We recently pushed out two new and experimental features, coiled jobs and coiled functions, the latter a variation of coiled jobs. We are excited about both of them because they:
- Allow users to scale up any given program on any hardware in the cloud.
- Make GPUs easily accessible without going through the pains of setting up environments in the cloud.
This post provides an example of how to use coiled functions to seamlessly train a neural network on a GPU that is hosted in the cloud.
Getting started
We have to start by creating a model on our local machine before we can worry about training it. This blog post is not dedicated to figuring out a fancy model, so we will use the Net model that is given in the PyTorch tutorials. We can simply add the model definition to our Python script; there is no need to do anything differently. Similarly, we will use the transform that is given there as well.
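For reference, here is roughly what the model and transform from the PyTorch CIFAR-10 tutorial look like, together with the imports we will need below (copied from the tutorial; adjust as you see fit):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# normalize CIFAR-10 images from [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)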
The next step is creating a function that we can use to train the model:
def train(transform):
    device = torch.device("cpu")
    net = Net()
    net = net.to(device)

    # download the CIFAR-10 training data onto this machine
    trainset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform,
    )
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=4, shuffle=True, num_workers=2,
    )

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # one pass over the training data
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    return net
We can now train our model:
if __name__ == "__main__":
    train(transform)
This will train our model on the CPU of our local machine. The training is reasonably quick for such a small model, but training time grows rapidly as the model gets larger or the dataset gets significantly bigger. At that point, training on the CPU is no longer sufficient. Additionally, a lot of machines out there don't have GPUs built into them. For example, I'm using a MacBook Pro with an M2 chip, which means my machine doesn't support CUDA. Consequently, we need a different solution to make these steps accessible for folks who don't have access to a local GPU.
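If you are unsure, torch.cuda.is_available() reports whether PyTorch can see a usable CUDA device:

import torch

# prints False on machines without an NVIDIA GPU, e.g. Apple Silicon
print(torch.cuda.is_available())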
Using coiled functions to train the model on a cloud-hosted GPU
Coiled functions come into play when you need access to resources that aren't available locally. Coiled can connect to AWS or GCP and thus use all the resources that are available there. We will go through the steps necessary to train our model on a GPU hosted on AWS instead of our local CPU.
The first step is defining a Python environment to run our computations in. We simply include PyTorch, CUDA, and Coiled; that's it. Generally, you should use the same Python version that is installed locally.
import coiled

coiled.create_software_environment(
    name="pytorch",
    conda={
        "channels": ["pytorch", "nvidia", "conda-forge", "defaults"],
        "dependencies": [
            "python=3.11",
            "coiled",
            "pytorch",
            "torchvision",
            "torchaudio",
            "cudatoolkit",
            "pynvml",
        ],
    },
    gpu_enabled=True,
)
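If you don't want to hardcode the Python version, you could derive the spec string from the interpreter you are running locally. This is just a small convenience sketch (the python_spec name is ours, not a Coiled API):

import sys

# e.g. "python=3.11" when run from a local Python 3.11 interpreter
python_spec = f"python={sys.version_info.major}.{sys.version_info.minor}"

and pass python_spec as the first entry of the dependencies list above.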
Coiled will create this Python environment for you. This step is only necessary when running your script for the first time; the resulting environment is cached, which makes subsequent runs more efficient.
The next step is adding the @coiled.run decorator to our training function, which tells our program to execute this function on a machine in the cloud.
@coiled.run(
    worker_vm_type="g5.xlarge",  # GPU instance type
    region="us-west-2",
    software="pytorch",
)
Additionally, we have to tell PyTorch that we want to train the model on the GPU, and to move the model back to the CPU before returning it:
def train(transform):
    import torch
    # tell PyTorch to use the GPU
    device = torch.device("cuda:0")
    ...
    # move the model back to the CPU before it is returned to our local machine
    return net.to(torch.device("cpu"))
Putting this all together:
@coiled.run(
    worker_vm_type="g5.xlarge",
    region="us-west-2",
    software="pytorch",
)
def train(transform):
    import torch
    # tell PyTorch to use the GPU
    device = torch.device("cuda:0")

    net = Net()
    net = net.to(device)

    # the training data is downloaded on the VM, not on our local machine
    trainset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform,
    )
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=4, shuffle=True, num_workers=2,
    )

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # move the model back to the CPU before it is returned to our local machine
    return net.to(torch.device("cpu"))

if __name__ == "__main__":
    net = train(transform)
Let's take a brief look at the arguments to coiled.run():
- worker_vm_type: This specifies the type of EC2 instance. We are looking for an instance that has a GPU attached to it. The G5 family has NVIDIA GPUs attached. The smallest version is sufficient for our example, but you can choose instances with up to 8 GPUs.
- region: The AWS region that our VM is started in. We observed that "us-west-2" is a region where GPUs are easier to get.
- software: The Coiled software environment that is installed. This corresponds to the environment that we previously created.
coiled.run() will now start a VM in AWS with the specified EC2 instance type. The VM is normally up and running in 1-2 minutes, and the previously specified Python environment is installed automatically. Coiled then executes the function on that VM. The inputs of your function are serialized and sent to the VM as well; it makes sense to download the training data on the VM itself to reduce the time spent sending data to AWS. The function returns our model to our local machine so that we can use it locally without depending on AWS.
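Once the trained model is back on our local machine, we can use it like any other PyTorch model, for example to run inference on the CIFAR-10 test set (a minimal sketch reusing the transform from above):

net = train(transform)  # executes remotely, returns the model locally

# run local inference on a batch of test images
testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform,
)
testloader = torch.utils.data.DataLoader(testset, batch_size=4)

images, labels = next(iter(testloader))
outputs = net(images)
_, predicted = torch.max(outputs, 1)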
Coiled will shut down the VM immediately after the Python interpreter finishes, mostly to reduce costs. You can specify an amount of time that the VM is kept alive through keepalive="5 minutes". This ensures that new local runs can connect to the same VM, avoiding the boot time of up to 2 minutes; we call this a warm start.
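For example, a warm-start-friendly version of the decorator looks like this:

@coiled.run(
    worker_vm_type="g5.xlarge",
    region="us-west-2",
    software="pytorch",
    keepalive="5 minutes",  # keep the VM alive between runs for warm starts
)
def train(transform):
    ...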
Conclusion
coiled functions enable you to seamlessly port the training of a neural network from your local machine to AWS or GCP. This gives everyone access to multiple GPUs or huge machines, independent of the hardware that is available locally. Training a neural network on a GPU becomes as easy as adding a decorator to the training function.