Multi-node Nematus progress

Multi-node, multi-GPU training for NEMATUS

See the running examples in ./test/train/.


  1. A new interface that uses .platoonrc to manage the training nodes and GPU cards (see ./test/train/.platoonrc).
  2. Compatibility with the former single-node, multi-GPU scenario.
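The post does not spell out the .platoonrc format. As a rough illustration of how a host-to-GPU mapping might be read, here is a Python sketch that parses a hypothetical INI-style layout: a [platoon] section listing hosts and a [devices] section mapping each host to its GPUs. These section and key names are assumptions, not the confirmed Platoon format, so check ./test/train/.platoonrc for the real one.

```python
import configparser

# Hypothetical .platoonrc layout, for illustration only; the section and key
# names are assumptions, not the confirmed Platoon format.
EXAMPLE_RC = """
[platoon]
hosts = node1,node2

[devices]
node1 = cuda0,cuda1
node2 = cuda0,cuda1,cuda2,cuda3
"""


def hosts_and_devices(text):
    """Return {host: [device, ...]} parsed from an INI-style config string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Comma-separated host list from the (assumed) [platoon] section.
    hosts = [h.strip() for h in parser.get("platoon", "hosts").split(",") if h.strip()]
    layout = {}
    for host in hosts:
        # Hosts without an entry in [devices] get an empty device list.
        devices = parser.get("devices", host, fallback="")
        layout[host] = [d.strip() for d in devices.split(",") if d.strip()]
    return layout


if __name__ == "__main__":
    for host, devices in hosts_and_devices(EXAMPLE_RC).items():
        print(host, devices)
```

Under this assumed layout, the total number of training workers is simply the sum of device counts across hosts (six in the example).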

Single-node scenario

  1. Set up .platoonrc.
  2. Run the training script; this step is compatible with the former version.
  3. Using to kill jobs, for example, ./ single-exp/log/pids.txt
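A minimal Python sketch of killing jobs from a PID file, assuming pids.txt holds one integer PID per line (an assumption; the post does not show the file format). It sends SIGTERM to each PID and reports the ones that no longer exist.

```python
import os
import signal


def parse_pids(lines):
    """Turn an iterable of text lines into integer PIDs, skipping blank lines."""
    return [int(line.strip()) for line in lines if line.strip()]


def read_pids(path):
    """Read PIDs from a file; assumes one PID per line (hypothetical format)."""
    with open(path) as f:
        return parse_pids(f)


def kill_pids(pids, sig=signal.SIGTERM):
    """Send `sig` to each PID and return the PIDs that no longer exist."""
    missing = []
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process already exited; nothing left to signal.
            missing.append(pid)
    return missing
```

For example, kill_pids(read_pids("single-exp/log/pids.txt")). Note that os.kill only reaches processes on the local machine, so in a multi-node run the remote workers would have to be signalled on their own hosts.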

Multi-node scenario

  1. Set up .platoonrc.
  2. The training script differs slightly: we SHOULD NOT put the platoon-launcher command into the background, because that can cause an MPI spawning error. Instead, we can run it in the background with screen, like screen sh &
  3. Using to kill jobs, for example, ./ multi-exp/log/pids.txt
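The workaround above keeps platoon-launcher out of a plain shell background job and runs it under screen instead. One hedged variant, sketched in Python, uses screen's detached mode (screen -dmS <session> <command>), which creates a named session without attaching to it. This is a swapped-in form of the screen sh & example, not the post's exact command, and the session name "nmt-train" and script name "train.sh" are placeholders.

```python
import shlex
import subprocess


def screen_command(session, script):
    """Build a command that runs `sh <script>` in a detached, named screen session.

    -d -m starts screen detached; -S names the session so it can be
    reattached later with `screen -r <session>`.
    """
    return ["screen", "-dmS", session, "sh", script]


def launch_detached(session, script):
    """Start the training script under screen and return screen's exit code."""
    return subprocess.run(screen_command(session, script), check=False).returncode


if __name__ == "__main__":
    # Placeholder names; substitute your own session and training script.
    print(shlex.join(screen_command("nmt-train", "train.sh")))
```

To follow the training output afterwards, reattach with screen -r nmt-train.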


  1. Training stability and speed tests
  2. Fixing the MPI-related spawning exception raised when the platoon-launcher is put into the background in

Published: April 01 2017
