Redis for Scaling Distributed Deep Learning

Abstract:
------------
At Eydle, we are reimagining distributed deep learning technology to optimize training speed and cost. Our approach handles fault tolerance, variable network latency, and device heterogeneity, leading to a 70-90% reduction in cost. By using Redis as an eventually consistent key-value store for model parameters, we have achieved 1.5x faster transaction times.

Background:
-----------------
Distributed deep learning has attracted a lot of interest in the past few years because it is a cost-effective way to scale large deep neural net training jobs. A typical distributed deep learning setup has a client-server architecture. Several clients work independently, in parallel, each on a smaller portion of the training job, and create local copies of the neural net’s weights (parameters). The parameter server combines these individual client weights into a central copy. The parameter server may in turn have several sub-processes (threads) for vertical scaling.
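As a rough illustration of what "combining client weights into a central copy" can look like, here is a minimal sketch that averages each parameter element-wise across clients. The abstract does not say which combination rule is used, so plain averaging, the function name `average_weights`, and the dictionary layout of the weights are all assumptions made for this example.

```python
from typing import Dict, List

# Assumed layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def average_weights(client_weights: List[Weights]) -> Weights:
    """Return the element-wise mean of each named parameter across clients."""
    if not client_weights:
        raise ValueError("need at least one client's weights")
    combined: Weights = {}
    for name in client_weights[0]:
        # zip(*...) groups the i-th element of this parameter from every client.
        columns = zip(*(w[name] for w in client_weights))
        combined[name] = [sum(col) / len(client_weights) for col in columns]
    return combined


# Two clients that trained on different portions of the job.
clients = [
    {"layer1": [1.0, 2.0], "bias": [0.0]},
    {"layer1": [3.0, 4.0], "bias": [1.0]},
]
central = average_weights(clients)
print(central)
```

In a real job the weights would be tensors rather than Python lists, and the combination rule might weight clients differently; the sketch only shows the shape of the operation.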

Shared-memory lock-free data structures and file locking are the commonly used ways of storing the weights centrally on a parameter server, but they prevent horizontal scaling of parameter servers to shared-nothing compute nodes. A natural way around this problem is to store the central weights in a database and let multiple parameter servers access them simultaneously. Conventional databases provide ACID properties, but their strong consistency in transaction processing can slow down updates. Prior published work on lock-free data structures for distributed deep learning has shown that training can tolerate some loss of updates without a significant impact on accuracy. This makes eventually consistent main-memory databases a good choice for horizontally scaling distributed deep learning.
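The "loss of updates" mentioned above can be illustrated with a small deterministic sketch. Two hypothetical parameter services perform an unsynchronized read-modify-write on the same value; when their steps interleave, the later write silently overwrites the earlier one. The plain dictionary standing in for the store and the names `read` and `write` are assumptions for illustration only.

```python
from typing import Dict

store: Dict[str, float] = {"w": 10.0}  # stand-in for the central weight store


def read(s: Dict[str, float]) -> float:
    return s["w"]


def write(s: Dict[str, float], value: float) -> None:
    s["w"] = value


# Interleaved: both services read before either one writes.
seen_by_a = read(store)       # service A sees 10.0
seen_by_b = read(store)       # service B also sees 10.0
write(store, seen_by_a + 1.0)  # A writes 11.0
write(store, seen_by_b + 2.0)  # B writes 12.0, overwriting A's update
interleaved = store["w"]

# Serialized: each service reads the other's result before writing.
store["w"] = 10.0
write(store, read(store) + 1.0)
write(store, read(store) + 2.0)
serialized = store["w"]

print(interleaved, serialized)  # 12.0 13.0
```

The interleaved run ends one update short of the serialized run. The argument in the text is that training tolerates this kind of occasional loss, which is what makes skipping locks and strong consistency acceptable.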

Using Redis in distributed deep learning:
-----------------------------------------------------
In our work, we use Redis to store the central parameter server weights, which are accessed through multiple parameter services. Each service processes neural network weights received from clients and updates the copy of the weights in Redis. With Redis, the individual parameter services do not have to connect to shared memory or lock a file to make concurrent updates. Redis also relieves the parameter services of handling crashes themselves, which becomes necessary with shared-memory or file-locking solutions, and so avoids the complexity of managing shared memory in the event of a crash. Using Redis’ “save” frequency configuration parameter, we can tune the frequency of disk writes, which is not possible in traditional databases.
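The update path of a single parameter service can be sketched as follows. So that the example runs without a server, it uses a hypothetical `InMemoryStore` that exposes only `get` and `set`; the assumption is that a real Redis client object with the same two methods could be passed in its place. The key name, the use of `pickle` for serialization, and the 50/50 mixing rule are also assumptions, since the abstract does not describe them.

```python
import pickle
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]
WEIGHTS_KEY = "model:weights"  # hypothetical key name


class InMemoryStore:
    """Hypothetical stand-in exposing only get/set, like a key-value client."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, name: str, value: bytes) -> bool:
        self._data[name] = value
        return True

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)


def pull_weights(store) -> Optional[Weights]:
    raw = store.get(WEIGHTS_KEY)
    # pickle is used for brevity; only unpickle data from trusted writers.
    return None if raw is None else pickle.loads(raw)


def push_weights(store, weights: Weights) -> None:
    # The whole weight object is written in one "set" operation.
    store.set(WEIGHTS_KEY, pickle.dumps(weights))


def apply_client_update(store, client: Weights, mix: float = 0.5) -> Weights:
    """Blend a client's weights into the central copy (assumed rule)."""
    current = pull_weights(store)
    if current is None:
        merged = client
    else:
        merged = {
            name: [(1 - mix) * c + mix * n
                   for c, n in zip(current[name], client[name])]
            for name in current
        }
    # No lock is taken: a concurrent service may overwrite this write.
    push_weights(store, merged)
    return merged


store = InMemoryStore()
apply_client_update(store, {"layer1": [2.0, 4.0]})
apply_client_update(store, {"layer1": [4.0, 8.0]})
print(pull_weights(store))
```

The disk-write frequency mentioned in the text corresponds to the `save <seconds> <changes>` directive in `redis.conf`, for example `save 900 1`; the values actually used in this work are not stated in the abstract.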

Experimental Results:
-----------------------------
We used a Redis 6.0.8 server with up to 6 simultaneous parameter services to update TensorFlow weight objects of 21 MB in individual “set” operations. Over a long distributed deep learning job, we typically perform upwards of 2000 such “set” operations. As we increase the number of parameter services from 1 to 3, the average time of one Redis “set” operation rises from 0.74 seconds to 0.87 seconds. With 5 parameter services, a Redis “set” operation takes 1 second on average. In contrast, when we used MySQL to store and update weights with 3 simultaneous parameter services, each SQL UPDATE transaction took 1.29 seconds, which is slower than Redis even with 5 simultaneous parameter services.
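The figures above can be put side by side with a little arithmetic. The sketch below uses only the numbers reported in this section; the variable names are illustrative, and the 2000-operation total is a lower bound because the text says "upwards of 2000".

```python
# Average seconds per Redis "set", keyed by number of parameter services.
redis_set_seconds = {1: 0.74, 3: 0.87, 5: 1.00}
mysql_update_seconds_3_services = 1.29
weight_object_mb = 21
set_operations = 2000  # reported as "upwards of 2000"

# MySQL vs. Redis at the same concurrency (3 services).
speedup_at_3 = mysql_update_seconds_3_services / redis_set_seconds[3]

# Relative slowdown of a Redis "set" when going from 1 to 5 services.
slowdown_1_to_5 = redis_set_seconds[5] / redis_set_seconds[1]

# Effective write rate for one service, and time spent in "set" calls
# over a job when 3 services are running.
mb_per_second_1_service = weight_object_mb / redis_set_seconds[1]
set_minutes_at_3 = set_operations * redis_set_seconds[3] / 60

print(f"MySQL/Redis at 3 services: {speedup_at_3:.2f}x")
print(f"Redis 1 -> 5 services:     {slowdown_1_to_5:.2f}x slower per set")
print(f"1 service write rate:      {mb_per_second_1_service:.1f} MB/s")
print(f"Time in sets (3 services): {set_minutes_at_3:.1f} min")
```

The ratio at 3 services comes out at about 1.48, which is consistent with the roughly 1.5x figure quoted in the abstract, although the abstract does not say how that figure was derived.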

Additionally, as we increase the number of simultaneous parameter services from 1 to 5, we do not observe any deterioration in training accuracy from lost updates. At the same time, the total training time needed to reach the same accuracy drops significantly with more parameter services and clients. For example, going from 1 parameter service with 3 clients to 5 parameter services with 5 clients reduces the total training time by more than 8 hours.

Medha Atre

Scientific Consultant, Eydle Inc.
