federated: Code doesn't run on GPU

Describe the bug

I am building my own project with the TensorFlow Federated learning API. When running my code, the GPU is visible (as shown below), but the federated learning computation is not executed on the GPU.

System output showing that the GPU is indeed added:

2020-04-20 21:36:49.347491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-20 21:36:49.347853: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-20 21:36:49.355351: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499685000 Hz
2020-04-20 21:36:49.356094: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55957a6a65e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-20 21:36:49.356124: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-20 21:36:49.421125: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55957a6c9c90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-20 21:36:49.421170: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX TITAN X, Compute Capability 5.2
2020-04-20 21:36:49.422796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX TITAN X computeCapability: 5.2
coreClock: 1.076GHz coreCount: 24 deviceMemorySize: 11.93GiB deviceMemoryBandwidth: 313.37GiB/s
2020-04-20 21:36:49.422898: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-20 21:36:49.422959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-20 21:36:49.423015: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-20 21:36:49.423068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-20 21:36:49.423182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-20 21:36:49.423287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-20 21:36:49.423351: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-20 21:36:49.427345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-20 21:36:49.427454: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-20 21:36:49.430516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-20 21:36:49.430573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-04-20 21:36:49.430624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-04-20 21:36:49.434605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11498 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)

Environment (please complete the following information):

  • Linux Ubuntu 16.04
  • Python package versions

When running pip freeze | grep tensorflow, I get:

tensorflow==2.1.0
tensorflow-addons==0.8.3
tensorflow-estimator==2.1.0
tensorflow-federated==0.13.1
tensorflow-model-optimization==0.2.1
tensorflow-privacy==0.2.2

When running pip3 freeze | grep tensorflow, I get:

tensorflow==2.1.0
tensorflow-estimator==2.1.0
tensorflow-gpu==2.1.0

  • Python version: Python 3.7

  • CUDA/cuDNN version: CUDA Version 10.2.89
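Note that the two listings above differ: tensorflow-gpu appears only under pip3, while tensorflow-federated appears only under pip, so the two commands may be targeting different interpreters. As a minimal sanity check (assuming TF 2.x), one can verify from inside the interpreter that actually runs the TFF code whether it sees the GPU at all:

```python
# Sketch: check GPU visibility from the same Python interpreter that runs TFF.
# With mixed pip/pip3 environments, tensorflow-gpu may be installed under one
# interpreter while tensorflow-federated is installed under the other.
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print('GPUs visible to this interpreter:', gpus)
# An empty list here means this interpreter has a CPU-only TensorFlow build
# (or no usable CUDA setup), regardless of what the other interpreter sees.
```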

Expected behavior

Here is my code to perform federated training:

    import tensorflow_federated as tff
    from tensorflow import keras

    federated_train_data = [dataset, dataset]
    trainer = tff.learning.build_federated_averaging_process(
        model_fn,
        client_optimizer_fn=lambda: keras.optimizers.Adam(lr=args.lr, clipnorm=0.001),
        server_optimizer_fn=lambda: keras.optimizers.Adam(lr=args.lr, clipnorm=0.001))

    state = trainer.initialize()
    for i in range(1000):
        state, metrics = trainer.next(state, federated_train_data)
        print('round {}, metrics={}'.format(i, metrics))

The training completes successfully, but not on the GPU, and training speed is very slow.

Other information

I installed the environment following the instructions, by typing ‘pip install --upgrade tensorflow_federated’.

Interestingly, when I train the same model with ‘fit()’ instead of federated learning, the computation does run on the GPU.
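One way to confirm this observation (assuming TF 2.x) is to enable device placement logging, which makes TensorFlow print the device each op executes on; the log lines from a plain Keras fit() call can then be compared against the log lines produced during TFF rounds:

```python
# Sketch: log the device each TensorFlow op is placed on.
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# A trivial computation; when a GPU is actually used, the placement log
# line for MatMul should show a .../device:GPU:0 device.
a = tf.random.normal([256, 256])
b = tf.random.normal([256, 256])
c = tf.matmul(a, b)
print(c.shape)
```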

Many thanks for your help!

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 34

Most upvoted comments

@aqibsaeed The default behavior of TFF is similar to TF: if a GPU is available, TFF/TF will try to use it. However, TFF on GPUs is still under development and has not been broadly tested. We have been working on performance and have made some progress, but we are not yet at a point where we can confidently advertise GPU usage.