Parallels: The Effect of Width on the Performance of Large-batch Training
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training. Training with large batches can reduce these overheads; however, large batches can affect the convergence properties and generalization performance of SGD. In this work, we take a first step towards analyzing how the structure (width and depth) of a neural network affects the performance of large-batch training. We present new theoretical results which suggest that, for a fixed number of parameters, wider networks are more amenable to fast large-batch training than deeper ones. We provide extensive experiments on residual and fully-connected neural networks which suggest that wider networks can be trained using larger batches without incurring a convergence slow-down, unlike their deeper variants.
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is from several times to orders of magnitude faster than median-based approaches.
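As a rough illustration of how redundant gradients can cancel adversarial updates, the sketch below uses the simplest redundancy scheme, a repetition code decoded by majority vote. The abstract does not specify DRACO's encoding, so the function names, the grouping of nodes, and the assumption that honest nodes return bit-identical gradients for the same data partition are all illustrative assumptions.

```python
from collections import Counter


def majority_vote(replicas):
    """Return the gradient reported by a strict majority of replicas.

    Assumption: honest nodes that process the same data partition
    return exactly the same gradient, so equality comparison is valid.
    """
    counts = Counter(tuple(g) for g in replicas)
    value, count = counts.most_common(1)[0]
    if count <= len(replicas) // 2:
        raise ValueError("no strict majority among replicas")
    return list(value)


def robust_mean_gradient(groups):
    """Decode each group by majority vote, then average the decoded gradients.

    groups: list of groups; each group holds the gradients reported by the
    nodes that were assigned the same data partition (hypothetical layout).
    """
    decoded = [majority_vote(group) for group in groups]
    dim = len(decoded[0])
    return [sum(g[i] for g in decoded) / len(decoded) for i in range(dim)]


# Two partitions, three replicas each, one adversarial node per group.
honest_a = [1.0, 2.0]
honest_b = [3.0, 4.0]
groups = [
    [honest_a, honest_a, [100.0, -100.0]],  # third node is adversarial
    [honest_b, [0.0, 0.0], honest_b],       # second node is adversarial
]
result = robust_mean_gradient(groups)
```

With three replicas per partition, one adversarial node per group is outvoted, so `result` equals the plain average of the honest gradients. This mirrors the abstract's claim that the trained model matches the adversary-free one, at the cost of each gradient being computed three times.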
Morpheus: Linear Algebra over Normalized Data
Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all ML tools require the input data to be a single table, many datasets are multi-table, forcing data scientists to join those tables first, which leads to data redundancy and wasted runtime. Recent works on "factorized" ML mitigate this issue for a few specific ML algorithms. But their approaches require a manual rewrite of ML implementations, which creates a massive development overhead when extending such ideas to other ML algorithms. In this project, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. We introduce a new logical data type to represent normalized data and devise a framework to convert a large set of linear algebra operations over denormalized data into operations over normalized data. We show how this enables us to automatically "factorize" several popular ML algorithms, thus unifying and generalizing several prior works. We prototype our framework in the popular ML environment R and an industrial R-over-RDBMS tool. Extensive experiments show that our framework also yields significant speed-ups, up to 36x on real data.
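To make the idea of rewriting a linear algebra operation over normalized data concrete, here is a minimal sketch for one operation, a matrix-vector product over a two-table join. It is written in plain Python rather than R, and the table layout (a fact table `S`, a foreign-key list `fk`, a dimension table `R`) and all function names are assumptions chosen for illustration, not the framework's actual interface.

```python
def matvec(M, v):
    """Dense matrix-vector product on lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def materialized_join(S, fk, R):
    """Denormalized table T: each row of S followed by its matching row of R."""
    return [s_row + R[k] for s_row, k in zip(S, fk)]


def factorized_matvec(S, fk, R, w):
    """Compute T @ w without building T.

    The weight vector is split into the part for S's columns and the part
    for R's columns. R @ w_r is computed once per distinct R row and then
    looked up through the foreign key, instead of once per joined row.
    """
    d_s = len(S[0])
    w_s, w_r = w[:d_s], w[d_s:]
    partial_r = matvec(R, w_r)
    return [
        sum(a * b for a, b in zip(s_row, w_s)) + partial_r[k]
        for s_row, k in zip(S, fk)
    ]


S = [[1.0], [2.0], [3.0]]           # fact table: 3 rows, 1 feature
fk = [0, 1, 0]                      # row i of S joins with row fk[i] of R
R = [[10.0, 20.0], [30.0, 40.0]]    # dimension table: 2 rows, 2 features
w = [1.0, 0.5, 0.25]

direct = matvec(materialized_join(S, fk, R), w)
factorized = factorized_matvec(S, fk, R, w)
```

Both paths give the same vector, but the factorized one touches each row of `R` only once. The saving grows with the number of rows of `S` that share a row of `R`.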
Scorpio: Scheduling for Visible Light Communication Networks
Visible light communication (VLC) networks, consisting of multiple light-emitting diodes (LEDs) acting as optical access points (APs), can provide low-cost high-rate data transmission to multiple users simultaneously in indoor environments. However, the performance of VLC networks is severely limited by the interference between different users. In this paper, we establish a distributed user-centric scheduling framework based on stable marriage theory, and propose a novel decentralized scheduling method to manage interference by forming flexible amorphous cells for all users. The proposed scheduling method has provably low computational complexity and requires only the exchange of a few 1-bit messages between the APs and the users, rather than feedback of the channel state information of the entire network. We further show that the proposed method can achieve both user-wise and system-wise optimality as well as a certain level of fairness. Simulation results indicate that our decentralized user-centric scheduling method outperforms existing centralized approaches in terms of throughput, fairness, and computational complexity.
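For readers unfamiliar with stable marriage theory, the sketch below shows the textbook deferred-acceptance (Gale-Shapley) procedure applied to a one-to-one matching of users to APs. It is not the paper's scheduling method: the preference lists, the one-user-per-AP restriction, and all names are assumptions made for illustration. The accept/reject reply to each proposal is the kind of decision that fits in a single bit.

```python
def deferred_acceptance(user_prefs, ap_prefs):
    """Textbook user-proposing deferred acceptance.

    user_prefs: user -> list of APs, most preferred first.
    ap_prefs:   AP -> list of users, most preferred first.
    Assumption: every AP ranks every user that may propose to it.
    Returns a dict mapping each matched user to its AP.
    """
    rank = {
        ap: {user: i for i, user in enumerate(prefs)}
        for ap, prefs in ap_prefs.items()
    }
    next_choice = {user: 0 for user in user_prefs}
    holder = {}                      # AP -> user it currently holds
    free = list(user_prefs)

    while free:
        user = free.pop()
        if next_choice[user] >= len(user_prefs[user]):
            continue                 # list exhausted; user stays unmatched
        ap = user_prefs[user][next_choice[user]]
        next_choice[user] += 1
        current = holder.get(ap)
        if current is None:
            holder[ap] = user        # accept: AP was idle
        elif rank[ap][user] < rank[ap][current]:
            holder[ap] = user        # accept: AP prefers the newcomer
            free.append(current)     # displaced user proposes again later
        else:
            free.append(user)        # reject: user tries its next choice

    return {user: ap for ap, user in holder.items()}


user_prefs = {"u1": ["ap1", "ap2"], "u2": ["ap1", "ap2"]}
ap_prefs = {"ap1": ["u2", "u1"], "ap2": ["u1", "u2"]}
assignment = deferred_acceptance(user_prefs, ap_prefs)
```

Both users prefer `ap1`, but `ap1` prefers `u2`, so `u1` is rejected and settles on `ap2`. In the resulting matching, no user and AP would both rather be paired with each other than with their assigned partners, which is the stability property the framework builds on.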