## Poster Presentations

### Wednesday, July 25, 2018 – 4:00PM – 5:00PM

Presenter's Name
Affiliated Organization
Poster Title
Revised Abstract
David M Groppe
Krembil Research Institute
Time series artifact correction via recurrent neural networks
Elaheh Hosseini Iraj
Concordia University
An Experimental Study of Machine Learning and Inverse Optimization for Predicting Behaviour of Optimizers
Given a data set of observations consisting of prices and the corresponding levels of demand, we consider the problem of imputing the parameters of a utility function that the customer aims to optimize. This study positions inverse optimization as a learning method along with machine learning algorithms. We experimentally compare the performance of machine learning and inverse optimization methods for imputing utility functions. In our experiments, we vary the size of the training data set and the level of prior information. Analysis of these results gives insight into the respective strengths and weaknesses of machine learning algorithms and inverse optimization approaches in the context of data that is generated by an optimization process. We believe this study builds a bridge between machine learning and inverse optimization literatures.
Toronto Rehabilitation Institute
Machine learning for personalized fall risk prediction in dementia
Dynamic changes in human gait are associated with increased risk of falls. The objective of this study is to demonstrate the feasibility of using machine learning and deep learning methods to build a predictive model of future falls for older adults with dementia. Our approach incorporates the individual’s gait pattern and clinical records captured longitudinally during their stay at a hospital or a residential facility for people with dementia. We propose a low dimensional representation of patients’ sequence of time-ordered walking bouts captured during their stay. Patients’ clinical records, including demographics, medications, and clinical assessments, were also captured and embedded. The sequences of embeddings from different categories were concatenated and averaged over a time window of hours (with the window length treated as a hyperparameter) to form each patient’s personalized rehabilitation record (PRR). A variety of machine learning and deep learning algorithms, including recurrent neural network models (RNN), boosted trees (BS) and feed forward (FF) models, were applied to the PRR to predict the probability of occurrences of falls at different time points (days to weeks prior to actual fall). To optimize model accuracy, our final model was an ensemble of predictions from all the predictive models.
Gavin Weiguang Ding
Borealis AI
On the Input Data Distributions’ Influences on Adversarial Robustness
Neural networks are vulnerable to imperceptible adversarial perturbations. However, the varying degrees of vulnerability across datasets are rarely examined in the existing literature. Here, we analyze how input data distributions influence adversarial robustness of neural network classifiers. Specifically, we find that comparable test accuracies on numerous MNIST and CIFAR10 variants are achieved by different neural nets, but their adversarial robustness differs drastically. This phenomenon indicates that input data distribution alone can affect the adversarial robustness of trained neural networks, not necessarily the tasks themselves. We then examine and discuss potential hypotheses to explain these phenomena.
Jared Keown
University of Victoria
Classifying Star-Forming and AGN galaxies in the absence of infrared observations: A machine learning approach
The Active Galactic Nuclei (AGN) found at the centres of many galaxies are among the most powerful sources of energy production in the universe. Fuelled by the accretion of material onto a supermassive black hole, AGN are ~10^13 times more luminous than the Sun. Understanding the dynamics and evolution of AGN requires accurately separating AGN-host galaxies, which have older stars and lack large gas reservoirs, from the other dominant galaxy population: Star-Forming (SF) galaxies that are young, gas-rich, and actively forming new stars. Traditionally, AGN & SF galaxies have been classified based on features obtained from both their optical & infrared light. We present a new method for classifying AGN & SF galaxies using a neural network trained with only optical spectral features collected by the Sloan Digital Sky Survey. Our model shows that accurate classifications can be made without the need to collect infrared observations, making it easier to identify/compare AGN & SF galaxies.
Magdalini Paschali
Technical University of Munich
Generalizability vs. Robustness: Investigating medical imaging networks using adversarial examples
Deep learning frameworks are being used for a variety of medical imaging tasks in order to aid physicians and decrease the time required to analyze medical data. However, the robustness of deep neural networks for medical applications is not being properly investigated in cases of extreme ambiguity and outliers, since these cases are hard to model, the amount of data is limited, and annotations are costly. We propose to utilize adversarial examples as a benchmark to investigate and analyze the performance of state-of-the-art medical networks and compare how different architectures react to adversarial attacks. Benchmarking networks on the worst-case scenario can help determine which one has a better understanding of the underlying manifold of the training data and should be preferred for clinical practice. Experiments on full brain segmentation and fine-grained skin lesion classification show the variance in robustness of networks that perform equally well on clean data.
Mahmoud Afifi
Lassonde School of Engineering, York University
Semantic White Balance: Semantic Color Constancy Using Convolutional Neural Network
The goal of computational color constancy is to preserve the perceptive colors of objects under different lighting conditions by removing the effect of color casts caused by the scene’s illumination. With the rapid development of deep learning based techniques, significant progress has been made in image semantic segmentation. In this work, we exploit the semantic information together with the color and spatial information of the input image in order to remove color casts. We train a convolutional neural network (CNN) model that learns to estimate the illuminant color and gamma correction parameters based on the semantic information of the given image. Experimental results show that feeding the CNN with the semantic information leads to a significant improvement in the results by reducing the error by more than 40%.
Martin Magill
University of Ontario Institute of Technology
Neural Networks Trained to Solve Differential Equations Learn General Representations
In this work, we introduce a technique based on the singular vector canonical correlation analysis (SVCCA) for measuring the generality of neural network layers across a continuously-parametrized set of tasks. We illustrate this method by studying generality in neural networks trained to solve parametrized boundary value problems based on the Poisson partial differential equation. Specifically, each neural network in our experiment is trained to solve for the electric potential produced by a localized charge distribution on a square domain with grounded edges. We find that the first hidden layer of these networks is general, in that the same representation is consistently learned across different random initializations and across different problem parameters. Conversely, deeper layers are successively more specific. We validate our method against an existing technique that measures layer generality using transfer learning experiments. We find excellent agreement between the two methods, and note that our method is much faster, particularly for continuously-parametrized problems. Finally, we visualize the general representations that we discovered in the first layers of these networks, and interpret them as generalized coordinates over the input domain.
Nabiha Asghar
University of Waterloo
Transfer Learning for Neural Conversational Agents using Non-Parametric Memory
For generative neural dialogue systems, domain adaptation is a challenging task. While generic dialogue data is available in abundance, usually there is little conversational training data available for narrow and specific domains, necessitating the need for domain adaptation techniques. Most of the existing neural dialogue domain adaptation methods either train on the source and target datasets sequentially (known as fine-tuning), or train on the combined data. However, these methods suffer from catastrophic forgetting and over-fitting of one of the two domains. To overcome these issues in neural dialogue domain adaptation, we propose a transfer learning approach based on non-parametric memory. Concretely, a non-parametric memory bank for each domain is added to the encoder-decoder recurrent neural architecture and trained end-to-end, enabling the model to learn to store domain-independent features in the main network and domain-specific features in the respective memory banks. Experimental evaluation suggests that our method can adapt to target domains while mitigating the effects of catastrophic forgetting and over-fitting, and has little overhead in terms of training time.
Nouha Dziri
Amii, University of Alberta
Response Generation For An Open-Ended Conversational Agent
Conversation plays a key role in maintaining human well-being. It constitutes the most natural way of interacting verbally with each other. Over the past decade, dialogue systems have become omnipresent in our daily lives, assisting our daily schedule and routine. Recently, the emergence of neural network models has shown promising results in solving problems such as scalability and language-independence that conventional dialogue systems fail to cope with. In particular, Sequence-to-Sequence (Seq2Seq) models have achieved notable success in generating natural conversational exchanges by sampling words sequentially conditioned on previous words. However, these models still lag far behind human capabilities in terms of the conversations that they can perform. Although the responses generated by Seq2Seq models are syntactically well-formed, they are prone to be generic, dull and off-context, such as "i don't know" or "i'm not sure what you're talking about". In this work, we introduce a Topical Hierarchical Recurrent Encoder Decoder (THRED), a novel, fully data-driven, multi-turn response generation system intended to produce contextual and topic-aware responses. Our model is built upon the basic Seq2Seq model by augmenting it with a hierarchical joint attention mechanism that incorporates topical concepts and previous interactions into the response generation. We demonstrate that incorporating conversation history and topic information with our novel method improves generated conversational responses. To train our model, we provide a clean and high-quality conversational dataset mined from Reddit comments. Additionally, we propose two novel quantitative metrics for measuring the quality of the generated responses, dubbed Semantic Coherence and Response Echo Index.
Our experiments on these quantitative metrics along with human evaluation demonstrate that the proposed model is able to generate more diverse and contextually relevant responses compared to the strong baselines. Furthermore, we show that both quantitative metrics agree reasonably with human judgment, making a step towards a good automatic evaluation procedure.
University of British Columbia
A nonlinear optimization method with focus
This presentation introduces an optimization approach that can move its focus across the variables' domain. In this approach, the regularizer can be imposed on the problem with a dual optimization method. Furthermore, explicit constraints can be imposed on the variables. By changing the focus on the domain of the variables, this approach has a better chance of reaching a better solution and escaping local minima. It will also be shown that by imposing constraints on the variables, the weight factors can be penalized to converge to specific predefined values. The theory behind this approach will be presented and its behavior in optimizing neural nets will be demonstrated.
Sanjay Kumar Thakur
McGill University
Uncertainty in Multi-Task Learning from Demonstrations
Deep Learning based learned controllers have achieved impressive results on many complex non-linear tasks. However, these controllers break easily on deployment due to the numerous unavoidable discrepancies between the training and deployment task distributions. One possible solution would be to infuse such controllers with a notion of uncertainty that reflects their confidence in doing well in a given situation, and to leverage it to learn only when necessary. In this work, we show how a Bayesian Neural Network technique called Bayes-by-Backprop can be used to learn a policy in the Learning from Demonstrations framework in a way that generates uncertainty for a given situation of current state and environment dynamics. A higher value of the uncertainty would correspond to a lower familiarity of the learned policy with the given situation and hence a subpar performance. We then leverage this uncertainty to solicit demonstrations for learning multiple tasks only when the current task is non-generalizable. Our experiment on a real robotic pendulum swing-up task shows that such uncertainty for a given situation can indeed be generated. Furthermore, additional experiments on complex MuJoCo tasks like HalfCheetah and Swimmer show that our mechanism can leverage this uncertainty to learn multiple tasks using a low training budget.
SiQi Zhou
University of Toronto Institute for Aerospace Studies
Deep Neural Networks as Add-on Blocks for Improving Impromptu Trajectory Tracking
High-accuracy trajectory tracking is a typical control problem involved in a wide range of applications (e.g., advanced manufacturing and autonomous driving). In the control literature, various techniques have been developed to design optimal controllers; however, these techniques often require a sufficiently accurate system dynamic model, which is not always available. In previous work, a deep neural network (DNN)-based approach is proposed to enhance the tracking performance of conventional control systems on arbitrary trajectories. In this approach, the DNN acts as a reference generator that adapts the reference signal to a stabilized, closed-loop control system based on experience. For tracking 30 hand-drawn trajectories with a quadrotor, the DNN-based approach reduces the tracking error of a baseline controller by an average of 43%. In this work, we provide a theoretical formulation of the DNN-based approach, which includes deriving the underlying function learned by the DNN module, identifying necessary conditions for the approach to be effective, and providing insights on the efficient selection of the inputs to the DNN module. Based on the formulation, we show that the input dimension of the DNN module can be reduced by 2/3 as compared to previous work, while achieving comparable performance on the 30 hand-drawn trajectories.
Sleiman Bassim
University Health Network
Detection of high risk lymphoma patients using machine learning
With an incidence of over 4,000 new cases per year in Canada, diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma. Relapse occurs in 40% of DLBCL patients and is often fatal. We have a particular interest in relapse that occurs in the brain (5% of patients). While potentially preventative treatments exist, prophylaxis of brain relapse is toxic and is best reserved for those patients who are at highest risk of relapse. Unfortunately, no accurate prediction model of brain relapse exists. We leveraged gene expression data that were generated from the diagnostic tissue biopsies from 240 patients. These patients were selected to fall into one of three groups: those that presented with brain relapse, those that presented with relapse outside of the brain, and those that were cured. From these available data we created gene-gene networks to identify patterns in gene interactions. Subsequently, we used over 20 different machine learning models and several deep networks to test prediction accuracy and identify those genes that are most representative of each group. Presently, we are able to predict clinical outcomes with a cross-validated accuracy of 0.81 (AUROC, 0.77-0.83 at 95% CI). Support vector machines, gradient boosting, and neural nets ranked as top classifiers. Based on the performance measurements of these models, around 100 unique genes were representative of brain relapse. They range from immunoglobulin isoforms and proto-oncogenes like MYB to lymphocyte associated genes like BTLA. In summary, our approach allows us to identify patients who are at high risk of brain relapse. As a next step, we are going to validate our model in external datasets, prior to potential implementation studies in the routine clinical setting. Our study also opens the possibility for functional exploration of the biological underpinnings of relapse in the brain.
Yagmur Gizem Cinar
Univ. Grenoble Alpes
Period-aware content attention RNNs for time series forecasting
Recurrent neural networks (RNNs) have recently received considerable attention for sequence modeling and time series analysis. Many time series contain periods, e.g. seasonal changes in weather time series or electricity usage at day and night time. Here, we first analyze the behavior of RNNs with an attention mechanism with respect to periods in time series and illustrate that they fail to model periods. Then, we propose an extended attention model for sequence-to-sequence RNNs designed to capture periods in time series. This extended attention model can be deployed on top of any RNN, and is shown to yield state-of-the-art performance for time series forecasting on several univariate and multivariate time series.

### Saturday, July 28, 2018 – 4:00PM – 5:00PM

Presenter's Name
Affiliated Organization
Poster Title
Revised Abstract
Blanca Miller
Inverse Reinforcement Learning for Model Predictive Control of a Self-Driving Car
Advances in autonomous systems’ navigation have the potential to improve the safety and efficiency of ground and aerial transportation. However, reliable and accurate control remains a challenge for dynamic and unpredictable environments due to a wide range of operating conditions. Model Predictive Control (MPC) is a fast online control method with a capacity for explicit constraint compliance that solves a quadratic programming problem to produce optimal trajectories in real time. We use inverse reinforcement learning (IRL) to learn the controller’s cost function and capture latent features of human driving. IRL uses a Markov Decision Process to represent the environment. Further, it operates under the assumptions that training data represent expert behavior and that there exists an underlying optimal process that governs decision making. We implemented MPC for a Lincoln MKZ autonomous driving system using IRL to learn the cost function from driving data collected in the Reno-Tahoe area. This poster will discuss findings, compare previous works, and discuss next steps for optimal trajectory generation.
Callie CO Federer
University of Colorado Anschutz Medical Campus
A self-organizing short-term dynamical memory network
Working memory relies on us being able to retain information about stimuli even after they go away. Stimulus information is encoded in the activities of neurons, which change over timescales of milliseconds. Information in working memory can be retained for tens of seconds, leaving the question of how time-varying neural activities keep representing the same information. Prior work shows that, if the neural dynamics are in the “null space” of the representation – so that changes to neural activity do not affect the downstream read-out of stimulus information – then information can be retained for periods much longer than the time-scale of individual-neuronal activities. The prior work, however, has a fine-tuning problem. To identify mechanisms through which biological networks can self-organize to support memory function, we derived biologically plausible synaptic plasticity rules that dynamically organize the connectivity matrix to enable retention of stimulus information.
Davit Buniatyan
Princeton University
Weakly Supervised Deep Metric Learning for Template Matching
Template matching by normalized cross correlation (NCC) is widely used for finding image correspondences. We improve the robustness of this algorithm by transforming image features with "siamese" convolutional networks trained to maximize the contrast between NCC values of true and false matches. Our main technical contribution is a weakly supervised learning algorithm for training siamese networks. Unlike fully supervised approaches to metric learning [10], our method can improve upon vanilla NCC without being given locations of true matches during training. The improvement is quantified using patches of brain images from serial section electron microscopy. Relative to a parameter-tuned bandpass filter, siamese convolutional networks significantly reduce false matches. The improved accuracy of our method could be essential for connectomics, because emerging petascale datasets may require billions of template matches during assembly. Our method is also expected to generalize to other computer vision applications that use template matching to find image correspondences.
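For readers unfamiliar with the baseline, the following is a minimal sketch of plain ("vanilla") NCC template matching only. It does not implement the siamese networks or the weakly supervised training described in the abstract. The function names `ncc` and `best_match` and the toy image are illustrative assumptions, not taken from the poster.

```python
import math

def ncc(patch, template):
    """Normalized cross correlation between two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    # A constant patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0

def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best_score, best_loc = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_loc = score, (r, c)
    return best_score, best_loc

# Toy example (hypothetical data): the template appears at (1, 2),
# brightened and contrast-scaled, which NCC is invariant to.
template = [[1, 2], [3, 5]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 12, 14, 0],
    [0, 0, 16, 20, 0],
    [0, 0, 0, 0, 0],
]
score, loc = best_match(image, template)
```

Because NCC subtracts the mean and divides by the standard deviation of each patch, the scaled copy of the template still scores close to 1.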
Hesam Salehipour
University of Toronto, Autodesk Research
Deep learning of ocean mixing
Current global climate models are essentially blind to the turbulent processes in the oceans that mix deep cold waters with shallower warm waters. Therefore, these models must rely on somewhat ad-hoc “parameterizations” of such processes. Recently, a significantly improved understanding of the turbulence induced by two hydrodynamic instabilities that are known to be omnipresent in the oceans (i.e. two “atoms” of ocean turbulence) has been achieved thanks to the availability of an unprecedented volume of highly-resolved simulation data. In this poster, we will discuss our predictive model of turbulent ocean mixing based on a simple CNN. We will demonstrate that a network trained on one “atom” of turbulence is capable of revealing some of the significant characteristics of the turbulence generated by a dramatically different mechanism, suggesting that through the application of appropriate networks, significant universal abstractions of density stratified turbulence have been recognized.
James Lucas
University of Toronto; Vector Institute
Aggregated Momentum: Stability Through Passive Damping
Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed in low curvature directions. Its performance depends crucially on a damping coefficient. Large coefficients can potentially deliver much larger speedups but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different damping coefficients. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive damping coefficients such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and provide theoretical convergence bounds for online convex optimization. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence.
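The idea of combining several velocity vectors can be sketched as follows. This is one plausible reading of the abstract, not the authors' reference implementation: each velocity has its own damping coefficient, and the parameter step is assumed to be the average of the velocities. The function name `aggmo_minimize`, the damping coefficients, the learning rate, and the toy quadratic are all illustrative assumptions.

```python
def aggmo_minimize(grad, theta, betas=(0.0, 0.5, 0.9), lr=0.1, steps=300):
    """Gradient descent with several momentum buffers, one per damping coefficient."""
    velocities = [[0.0] * len(theta) for _ in betas]
    for _ in range(steps):
        g = grad(theta)
        # Each velocity decays at its own rate and accumulates the same gradient.
        for v, beta in zip(velocities, betas):
            for i in range(len(theta)):
                v[i] = beta * v[i] - g[i]
        # Assumption: the step is the average of all velocity vectors.
        theta = [
            t + lr * sum(v[i] for v in velocities) / len(betas)
            for i, t in enumerate(theta)
        ]
    return theta

# Toy problem (hypothetical): f(x) = 0.5 * (x0^2 + 10 * x1^2), minimum at the origin.
def quadratic_grad(x):
    return [x[0], 10.0 * x[1]]

solution = aggmo_minimize(quadratic_grad, [1.0, 1.0])
```

In this sketch the heavily damped velocities (small coefficients) counteract oscillations of the lightly damped one, which is the intuition the abstract calls passive damping.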
Jens Behrmann
University of Bremen, Germany
On the Invertibility of ReLU networks
Studying the invertibility of deep networks provides a principled approach to better understand the behavior of these powerful models. In particular, understanding which perturbations do not affect the model, or affect it only slightly, can be of similar importance to studying the model's sensitivity to adversarial examples. A natural way to address these properties is by studying the invertibility: If the network is locally invariant to some perturbations, then both input and perturbed input lie in the pre-image of the output of the original input. Hence, the network is not uniquely invertible. Furthermore, robustness towards large perturbations induces an unstable inverse mapping, as small changes in the output can be due to large changes in the input. In this poster, we will summarize a theoretically motivated approach to explore the pre-images of ReLU-layers and the mechanisms affecting the stability of the inverse of ReLU-networks.
Jiawei He
Simon Fraser University
Probabilistic Video Generation Using Holistic Attribute Control
Videos express highly structured spatio-temporal patterns of visual data. We propose a generative framework for probabilistic video generation and future prediction. The proposed framework generates a video by decoding samples sequentially drawn from a latent space distribution into full video frames. Variational Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from the latent space and RNN as a way to model the dynamics in the latent space. We improve the video generation consistency through temporally-conditional sampling and quality by structuring the latent space with attribute controls; ensuring that attributes can be both inferred and conditioned on during learning/generation. As a result, given attributes and/or the first frame, our model is able to generate diverse but highly consistent sets of video sequences, accounting for the inherent uncertainty in the prediction task.
Julie J Lee
University College London
Flexibility to contingency changes distinguishes habitual and goal-directed strategies in humans
Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. We developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these changes, allowing their relative influence to be inferred. Human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning.
Luca Celotti
Université de Sherbrooke
Embedding Prior Grammatical Knowledge for Reinforcement-learning based Dialogue Systems
Goal-oriented dialogue games are typically difficult for learning agents, especially when the language domain is large (e.g. formulating complex sentences for conversational systems) or when they involve grounding language in a multimodal context (e.g. asking discriminative questions about the content of a visual scene). Recently, artificial agents learning with self-play have achieved superhuman performance on games such as chess and Shogi. However, one common problem in learning language from self-play or interaction between artificial agents is language drifting. It occurs when the agents drift from using a comprehensible natural human language to their own artificial language for communication. This can be partially mitigated by data-driven approaches, with the caveat that any small change to the task often requires gathering a whole new dataset. To avoid relying on human data, we can incorporate semantic and grammatical knowledge in one of the interacting artificial agents, acting as an oracle, to mitigate the language drifting problem. We can also design it to control the language bias, a common problem in perceptually-grounded language games that occurs when the agent completely ignores the contextual input (e.g. image of the scene) and only exploits prior language statistics (e.g. most frequent words of sentences) during dialogue. Curriculum learning can ease the learning of both grammar and semantics by providing easy experiences first and increasing the difficulty of the dialogue game as the proficiency of the agent increases.
Margot Yann
ICES / University of Toronto
NLP Framework to Analyze Primary Care Clinical Notes – a Case Study on Congestive Heart Failure
A number of challenges exist in analyzing unstructured free text data in electronic medical records (EMRs). EMR text data generate gigabytes of free text information each year, and are difficult to represent and model due to their high dimensionality, heterogeneity, sparsity, incompleteness and random errors. Moreover, standard NLP tools make errors when applied to clinical notes due to physician use of unconventional written language in medical notes, including polysemy, abbreviations, ambiguity, misspelling, variations, temporality and negation. This research presents a novel framework, Clinical Learning On Natural Expression, that automatically learns from a large primary care EMR database by analyzing free text clinical notes from primary care practices. To demonstrate its performance, we evaluate our model in a case study to identify patients with cardiovascular disease.
Matthew Ng
Sunnybrook Research Institute and Dept of Medical Biophysics, University of Toronto
Estimating Uncertainty in Neural Networks for Cardiac MRI Segmentation
Segmentation of cardiac structures from magnetic resonance images is required in order to extract clinical indices and diagnose diseases. Neural networks achieve great success in this task but often produce overconfident and incorrect predictions especially when tested on out-of-distribution images. Accurate estimation of uncertainty in the segmentation is important in a large scale automated analysis pipeline to detect images which need manual corrections. In this project, we train and compare an ensemble of U-Nets and a Bayesian U-Net for segmentation of the left ventricle (LV), myocardium (Myo), and right ventricle (RV) using the UK Biobank dataset. The maximum softmax probability for each pixel was used to create confidence maps. We obtained a Dice coefficient of 93.3%, 87.2%, and 88.9% for the LV, Myo, and RV, respectively, using an ensemble of 10 U-Nets. The Bayesian U-Net showed similar results in terms of Dice coefficients and confidence maps. Low confidence was observed around the edges of the ventricles and near the base and apex of the heart, where segmentation is poor. Future work includes using these results to predict segmentation quality metrics.
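The two quantities named in the abstract, the Dice coefficient and the maximum softmax probability used for confidence maps, can be computed as in the sketch below. It works on flat lists of per-pixel labels and logits. The function names and the toy labels are illustrative assumptions, and nothing here reproduces the U-Net models or the reported scores.

```python
import math

def dice(pred, truth, label):
    """Dice coefficient for one class: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    total = sum(1 for p in pred if p == label) + sum(1 for t in truth if t == label)
    # Convention (assumption): if the class is absent from both, count it as a perfect match.
    return 2.0 * inter / total if total else 1.0

def max_softmax(logits):
    """Confidence of one pixel: the largest softmax probability over classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

# Toy per-pixel labels (hypothetical): 0 = background, 1 = LV, 2 = RV.
pred = [1, 1, 0, 2]
truth = [1, 0, 0, 2]
dice_lv = dice(pred, truth, 1)
dice_rv = dice(pred, truth, 2)
confidence_map = [max_softmax(z) for z in ([0.0, 0.0], [10.0, 0.0, 0.0])]
```

A pixel whose logits are nearly tied gets a confidence close to 1 divided by the number of classes, which is the kind of low-confidence region the abstract reports near ventricle edges.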
Mengye Ren
University of Toronto / Uber ATG
Learning to Reweight Examples for Robust Deep Learning
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
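A rough sketch of weighting by gradient direction, under stated assumptions: if the model takes a single SGD lookahead step with per-example weights starting at zero, the derivative of the validation loss with respect to each weight is proportional to the negative dot product between that example's gradient and the validation gradient. The sketch uses a one-parameter model y = w·x with squared loss so the gradients are analytic. Clipping negative weights to zero and normalizing them to sum to one are assumptions made for illustration, as are the function name `example_weights` and the toy data.

```python
def example_weights(w, train, val):
    """Weight each training example by how well its gradient aligns with the validation gradient."""
    def grad(x, y):
        # d/dw of 0.5 * (w * x - y)^2
        return (w * x - y) * x

    g_val = sum(grad(x, y) for x, y in val) / len(val)
    # One meta gradient step from zero gives a weight proportional to g_val * g_i;
    # negative alignments are clipped to zero (assumption).
    raw = [max(0.0, g_val * grad(x, y)) for x, y in train]
    total = sum(raw)
    # Normalize so the weights sum to one when any are positive (assumption).
    return [r / total for r in raw] if total > 0 else raw

# Toy data (hypothetical): the true slope is 2; the second training label is corrupted.
train = [(1.0, 2.0), (1.0, -2.0)]
val = [(1.0, 2.0)]
weights = example_weights(0.0, train, val)
```

Here the corrupted example pulls the parameter in the opposite direction to the clean validation example, so it receives zero weight.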
Michael Skinnider
University of British Columbia
Revealing the unknown human metabolome with deep learning
Metabolomics is the discipline that allows scientists to quantify the abundance of small molecules at a large scale, typically using mass spectrometry (MS). Metabolomic studies have shed light on the pathophysiology of many human diseases. However, a central challenge in metabolomics is matching MS signals to chemical structures: fewer than 2% of signals identified in a typical metabolomics experiment can be linked to known molecules. One reason for this gap is that unknown metabolites are overlooked entirely by existing approaches. Generative modelling of chemical structures by deep learning could enable the automatic detection of unknown molecules in human tissues or biofluids. Multiple advances in generative modelling of chemical structures using either graphical or textual models in recent years present opportunities to build generative models of the human metabolome to reveal the structures of unknown human metabolites.
Terry Taewoong Um
University of Waterloo
Deep kinematics autoencoder
Although wearable band data are observed in 6D space, i.e., 3D for the position and 3D for the orientation of the wristband-worn hand, the driving force that generates the data comes not from the 6D space but from the N-D space, whose dimension equals the degrees of freedom (DOF) of the human body or the human arm. In robotics, the 6D and N-D spaces are called task space and joint space, respectively. The hierarchical structure of the arm maps joint movements to hand movements; this mapping and its inverse are well known in robotics as forward and inverse kinematics. In this research, I would like to present how we can consider the hierarchical structure of the arm (human body) and its forward/inverse kinematics in the learning process. In particular, I take the hierarchical structure of the arm into account when designing a variational autoencoder for wearable sensor data (task-space data). As a result, the proposed method will learn a meaningful latent space, which is the joint space.
Wonmin Byeon
NVIDIA
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, separation of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.

### Monday, July 30, 2018 – 4:00PM – 5:00PM

Presenter's Name
Affiliated Organization
Poster Title
Revised Abstract
York University
USSPicker: Unbiased single-shot particle picking in cryo-EM
Particle picking in cryo-EM is a form of object detection for noisy, low contrast, and out-of-focus microscopy images, taken of different (unknown) structures. This work provides a fully automated approach which, for the first time, explicitly considers training on multiple structures, while simultaneously learning both specialized models for each structure used for training and a generic model that can be applied to unseen structures. The CNN architecture used is fully convolutional and divided into two parts: (i) a portion which shares its weights across all structures and (ii) N+1 parallel sets of sub-architectures, N of which are specialized to the structures used for training and a generic model whose weights are tied to the layers for the specialized models. Finally, parameters are learned using both synthetic and scarcely available real data. Experiments reveal the characteristics of the new approach and improved results compared to previous work.
Amir H. Ashouri
University of Toronto
Retraining-free Sparsification of CNNs
In this poster, we explore sparsification of CNNs with no retraining. We create a framework, built around TensorFlow, that sparsifies a CNN and we propose three model-independent methods for introducing sparsity in a CNN's layers. We evaluate the effectiveness of the methods using pretrained Inception-v3 and MobileNet-v1 models, both 32-bit floating point and 8-bit quantized. Our evaluation shows that the models' weights can be reduced by up to 55% while preserving at least 90% of the inference accuracy of the unsparsified model. However, no single method is best across the models. Our evaluation also shows that the choice of the layers to sparsify and the extent to which each layer is sparsified may be fine-tuned to increase sparsity while maintaining the inference accuracy.
Ankesh Anand
MILA
Meta-learning hierarchical policies with FiLM
In this paper, we tackle the problem of learning hierarchical deep neural network policies for reinforcement learning in a multi-task setup. We propose a novel architecture to autonomously learn hierarchical policies via a generic conditioning method called FiLM (Feature-wise Linear Modulation). Our approach involves two levels of hierarchy operating at different temporal resolutions, namely a manager module and a worker module. We demonstrate the efficacy of our approach on several environments.
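For readers unfamiliar with FiLM, the operation itself is a per-feature scale and shift. The sketch below shows only that operation; in practice `gamma` and `beta` are produced by a conditioning network, whereas here they are fixed, hypothetical values, and how the poster wires the operation into its manager and worker modules is not specified.

```python
import numpy as np

def film(features, gamma, beta):
    # Feature-wise Linear Modulation: scale and shift each feature channel.
    return gamma * features + beta

# Two feature vectors with three channels each (placeholder values).
features = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
gamma = np.array([2.0, 0.0, -1.0])  # per-channel scale, assumed given
beta = np.array([0.5, 1.0, 0.0])    # per-channel shift, assumed given
modulated = film(features, gamma, beta)
# modulated is [[2.5, 1.0, -3.0], [8.5, 1.0, -6.0]]
```

A scale of zero, as in the second channel, switches the feature off and leaves only the shift, which is one way conditioning can select which features a downstream layer sees.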
Benjamin Franck Christophe Scellier
Mila
Equivalence of Equilibrium Propagation and Recurrent Backpropagation
Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase. In the first phase, both algorithms converge to a fixed point which corresponds to the configuration where the prediction is made. In the second phase, Equilibrium Propagation relaxes to another nearby fixed point corresponding to smaller prediction error, whereas Recurrent Backpropagation uses a side network to compute error derivatives iteratively. In this work we establish a close connection between these two algorithms. We show that, at every moment in the second phase, the temporal derivatives of the neural activities in Equilibrium Propagation are equal to the error derivatives computed iteratively by Recurrent Backpropagation in the side network. This work shows that it is not required to have a side network for the computation of error derivatives, and supports the hypothesis that, in biological neural networks, temporal derivatives of neural activities may code for error signals.
Christopher Ing
University of Toronto
Biomolecular Design of Simplified Protein Models using Deep Reinforcement Learning
Proteins are large, complex biomolecules with crucial roles in the functioning of living organisms. They are composed of a linear chain of amino acids that spontaneously fold into three-dimensional structures that define their function. Predicting the structure of a protein from sequence is referred to as the protein folding problem. For short sequences of fewer than 20 amino acids, in conjunction with simplified models of the protein chain, all folds can be enumerated and the folding problem can be solved deterministically. In this work, we employ reinforcement learning to solve the related problem of "inverse protein folding" or protein design: the determination of sequences that maximize structural design objectives. Specifically, we aim to obtain sequences which optimize for either fold geometry or chemical complementarity to a surface, driven by sequence modification actions made by a reinforcement learning agent. This agent surpasses baseline methods in the number of iterations required to reach the optimal solution, demonstrating that a structure-to-sequence mapping has been learned. Our work provides the foundation for policy-driven molecular design in higher complexity protein models with applications in drug discovery.
Elahe Arani
Reverse Engineering Neural Networks From Many Partial Recordings
Much of neuroscience aims at reconstructing brain function, but we only record a small number of neurons at a time. We do not currently know if simultaneous recording of most neurons is required for successful reconstruction, or if multiple recordings from smaller subsets suffice. This is made even more important as novel techniques allow recording from selected subsets of neurons. To get at this question, we analyze a neural network, trained on the MNIST dataset, using only partial recordings and characterize the dependency of the quality of our reverse engineering on the number of simultaneously recorded "neurons". We find that prediction in the nonlinear neural network is meaningfully possible if a sufficiently large number of neurons is simultaneously recorded but that this number can be considerably smaller than the number of neurons. Moreover, recording many times from small random subsets of neurons yields surprisingly good performance. The type of analysis we perform here can be used to calibrate approaches that can dramatically scale up the size of recorded data sets in neuroscience.
Elliot Creager
Vector Institute, University of Toronto
Gradient-based Optimization of Neural Network Architecture
Neural networks can learn relevant features from data, but their predictive accuracy and propensity to overfit are sensitive to the values of the discrete hyperparameters that specify the network architecture (number of hidden layers, number of units per layer, etc.). Previous work optimized these hyperparameters via grid search, random search, and black box optimization techniques such as Bayesian optimization. Bolstered by recent advances in gradient-based optimization of discrete stochastic objectives, we instead propose to directly model a distribution over possible architectures and use variational optimization to jointly optimize the network architecture and weights in one training pass. We discuss an implementation of this approach that estimates gradients via the Concrete relaxation, and show that it finds compact and accurate architectures for convolutional neural networks applied to the CIFAR10 and CIFAR100 datasets.
Estefany Kelly Buchanan
Columbia University
Quantifying the behavioral dynamics of C. elegans with autoregressive hidden Markov models
We quantify the behavioral dynamics of Caenorhabditis elegans with autoregressive hidden Markov models (AR-HMMs), a class of models that has recently yielded some insight into mouse behavior (Wiltschko et al. 2015). These models explicitly encode three hypotheses: (i) while the instantaneous worm posture is represented as a high-dimensional vector of points along the body, the first four principal components, or eigenworms, capture a significant fraction of the postural variance; (ii) within this space, the postural dynamics are well-approximated with linear AR models; and (iii) the linear AR model switches over time as the worm transitions between different discrete behaviors, like forward/reverse crawling, pausing, and turning. We show how AR-HMMs segment recordings of freely crawling C. elegans into meaningful discrete behaviors, providing a quantitative description of postural dynamics and a rigorous framework for assessing, comparing, and simulating worm behavior.
Ethan Fetaya
University of Toronto
Neural Relational Inference for Interacting Systems
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
Farzaneh Mahdisoltani
Vector Institute, TwentyBN
The more fine grained, the better for transfer learning
We describe a DNN for fine-grained action classification and video captioning. It gives state-of-the-art performance on the challenging Something-Something dataset, with over 220,000 videos and 174 fine-grained actions. Classification and captioning on this dataset are challenging because of the subtle differences between actions, the use of thousands of different objects, and the diversity of captions penned by crowd actors. The model architecture shares features for classification and captioning, and is trained end-to-end. It performs much better than the existing classification benchmark for Something-Something, with impressive fine-grained results, and it yields a strong baseline on the new Something-Something captioning task. Our results reveal that there is a strong correlation between the degree of detail in the task and the ability of the learned features to transfer to other tasks.
Franck Tchuente
University of Ottawa
Classification of Aggressive Movements with Unilateral or Bilateral Smartwatches
Recognizing aggressive behaviour is a human activity recognition task that could be implemented using wearable technology, such as smartwatches. Wrist-worn wearable sensors could be on the dominant, non-dominant, or both wrists. This research compared unilateral and bilateral smartwatches for classifying aggressive behaviour. Participants donned two Microsoft Band 2 smartwatches and performed an activity circuit of similar aggressive and non-aggressive movements. Smartwatch accelerometer and gyroscope sensors captured data that were used to extract features. Three situations were evaluated: two smartwatches (one per wrist), dominant wrist smartwatch, and non-dominant wrist smartwatch. A Random Forest machine learning classifier coupled with three machine learning feature selectors (ReliefF, InfoGain, Correlation) was used to evaluate performance metrics from each situation. Bilateral smartwatches performed the best with 99.2% accuracy, 96% sensitivity, and 99.7% sensibility.
Hoa Thien Le
Laboratory LORIA
Variational Sequence-to-sequence Learning
Variational latent variables are increasingly considered a natural way to incorporate rich nuances of context when generating fluent, humanlike text. In this work, we explore various ways to incorporate VAEs into the sequence-to-sequence (seq2seq) framework. This integration makes seq2seq more stochastic. Specifically, the model can place variational variables on the encoder, on the decoder, on the attention mechanism, and on the middle hidden state. Latent variables can also be fed per time step and linked together recurrently so that the model can learn very complex data patterns. An attention model can be built on top of these stochastic variables instead of deterministic hidden states. We also explore the effect of prior distributions on model capacity. Our goal is to incorporate as much variability as possible into the seq2seq model while keeping it relatively simple.
Jeff Wintersinger
Vector Institute
Subpoplar: Reconstructing the evolutionary history of cancer
Tumours contain multiple subpopulations of cells that have evolved over many years. Each subpopulation has genetic mutations relative to a patient's normal cells that render it cancerous. These subpopulations have a common ancestral cell and thus share its mutations, but each also has its own mutations that may grant fitness advantages, allowing that population to expand over time. By computationally reconstructing the ancestry of these subpopulations, we hope to identify key steps in a cancer's progression, such as drug resistance development and ability to metastasize. Subpoplar is a novel probabilistic algorithm for propagating pairwise constraints between mutations to define the set of mutation trees that describe how a patient's cancer developed over time. This algorithm can approximate the posterior distribution over mutation trees without resorting to computationally costly MCMC, and so can scale to many more mutations and tissue samples than existing algorithms.
Jinliang Wei
Carnegie Mellon University
Towards An Efficient Simulation System for Reinforcement Learning
It's often the simulations that consume most of the computational power when training an RL agent. While a typical practice is to run massive simulations in parallel independently, sharing information between simulations may lead to a faster discovery of better actions with less computation. For example, by using an offline MCTS to search for good actions and training a model to predict MCTS's outcome, Guo et al. were able to outperform the previous approach that trains a DQN with random, independent simulations. However, parallelizing MCTS faces a bottleneck due to sharing tree states among simulation threads/processes. We found that an MCTS implemented on the recent RL framework Ray spends 75% of its time on sharing tree states. We intend to use this poster to ask what constitutes a good simulation system for RL. Is sharing information between different simulations really important? How long does each simulation take? What else is important for efficiency and usability?
Judy Borowski
University of Tuebingen
Reproducing Decision-Making with Constrained Networks to Understand Deep Neural Networks
Deep neural networks (DNNs) have surpassed humans on tasks such as object recognition in static images. However, their inner workings remain unclear, limiting their explanatory power. Here, we explore new directions to increase the understanding of DNNs using suitably constrained network architectures that are trained to match intermediate representations of DNNs. As a first constraint, we used scale-restricted networks to show that DNNs trained on object classification primarily act as bag-of-features classifiers as opposed to recognizing global shapes to which humans are sensitive. Secondly, we approximate DNNs with shallow networks to facilitate a direct understanding of the decision-making process. In contrast to widespread visualization methods, the faithfulness of this approach can be directly quantified by suitable similarity measures between the constrained and the original DNN.
Julia Kreutzer
Heidelberg University
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of a reward estimator, and the effect of the quality of reward estimates on the overall RL task. Our analysis of cardinal (5-point ratings) and ordinal (pairwise preferences) feedback shows that their intra- and inter-annotator α-agreement is comparable. Best reliability is obtained for standardized cardinal feedback, and cardinal feedback is also easiest to learn and generalize from. Finally, improvements of over 1 BLEU can be obtained by integrating a regression-based reward estimator trained on cardinal feedback for 800 translations into RL for NMT. This shows that RL is possible even from small amounts of fairly reliable human feedback, pointing to a great potential for applications at larger scale.
Justin Jean Beland
University of Toronto
High-Dimensional Bayesian Optimization with Supervised Manifold Learning
We propose a Bayesian optimization strategy that scales efficiently to high-dimensional black-box minimization problems. The central idea is to exploit manifold learning techniques to minimize high-dimensional functions on low-dimensional subspaces. We introduce a novel supervised manifold learning approach to identify linear and nonlinear low-dimensional representations. In addition, we adopt a trust region approach to adaptively restrict the search to a local region where the minimum is likely to be found. We demonstrate the efficacy of our approach on a variety of test functions. We also optimize 15 hyperparameters of a two-layer convolutional neural network on the MNIST dataset and show that our method outperforms conventional Bayesian optimization.
Matthew Kyle Schlegel
Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta
General Value Function Networks
We show that restricting the representation layer of a Recurrent Neural Network (RNN) improves accuracy and reduces the depth of recursive training procedures in partially observable domains. Artificial Neural Networks have been shown to learn useful state representations for high-dimensional visual and continuous control domains. If the task at hand exhibits long dependencies back in time, these instantaneous feed-forward approaches are augmented with recurrent connections and trained with Back-prop Through Time (BPTT). This unrolled training can become computationally prohibitive if the dependency structure is long, and while recent work on LSTMs and GRUs has improved upon naive training strategies, there is still room for improvements in computational efficiency and parameter sensitivity. Here we explore a simple modification to the classic RNN structure: restricting the state to be comprised of multi-step General Value Function predictions. We formulate an architecture called General Value Function Networks (GVFNs), and a corresponding objective that generalizes beyond previous approaches. We show that our GVFNs are significantly more robust to train, and facilitate accurate prediction with no gradients needed back-in-time in domains with substantial long-term dependencies.
Meng Tang
University of Waterloo
Regularized Losses for Weakly-supervised CNN Segmentation
Minimization of regularized losses is a principled approach to semi-supervised deep learning, in general. However, it is largely overlooked in semantic segmentation, which is currently dominated by methods mimicking full supervision via "fake" fully-labeled training masks (proposals) generated from available partial input. To obtain such full masks, the typical methods explicitly use standard regularization techniques for "shallow" segmentation, e.g. graph cuts or dense CRFs. In contrast, we integrate such standard regularizers directly into the loss functions over partial input. This approach simplifies weakly-supervised training by avoiding extra MRF/CRF inference steps or layers explicitly generating full masks, while improving both the quality and efficiency of training. We propose and experimentally compare different losses integrating MRF/CRF regularization terms. We juxtapose our regularized losses with earlier proposal-generation methods using explicit regularization steps or layers. Our approach achieves state-of-the-art accuracy in semantic segmentation with near full-supervision quality.
Mina Nouredanesh
University of Waterloo
Going deeper into human mobility analysis in the wild, using egocentric cameras and wearable IMUs
Despite advances in gait analysis methods, including optical motion capture and wireless electrophysiology, our understanding of human locomotion and falls is still limited. In particular, the vast majority of gait analysis tools are confined to controlled conditions in a clinic or laboratory. To advance our knowledge of human mobility and locomotion under naturalistic conditions, or the ’wild’, and to understand what factors put a person at higher risk of falling, our research group investigates free-living data (e.g., egocentric video, IMUs) along with deep learning and computer vision methods. Specifically, our objectives are to: 1) develop novel markerless models to detect abnormalities (e.g., compensatory balance reactions, or near-falls) and extract spatiotemporal gait parameters (e.g., step width) to complement existing IMU-based methods, and 2) automatically identify environmental hazards, including slope changes (e.g., stairs, curbs, ramps) and surfaces (e.g., gravel, grass, concrete) that may lead to falls. The findings of our research inform the selection and timing of fall prevention interventions (e.g., strengthening program, balance re-training).
Soroosh Shahtalebi
Concordia University
TFP-DBRNN: A Real-time Tremor Filtering and Predicting Framework based on Deep Bidirectional Recurrent Neural Networks
Pathological tremor is among the most common movement symptoms of several neurological disorders. Parkinson's Disease (PD) and Essential Tremor (ET) are the most frequent neurological conditions that cause tremor. Action tremor is defined as the combination of the pathological tremor and the subject's voluntary movement. Extracting pathological tremor is of paramount importance in various engineering and clinical applications such as assistive devices and movement rehabilitation technologies. Numerous works in the literature have attempted to estimate and extract tremor from movement recordings. In this work, we first argue that the ground truth signal that is used in previous works to optimize the performance of tremor extraction techniques is not accurate enough and thus the performance measures for the prior techniques are not perfectly reliable. In addition, most of the existing techniques require prior assumptions which are hard to identify and which must represent both inter- and intra-subject variability. To address the issues above, we propose, for the first time, a technique that incorporates Deep Bidirectional Recurrent Neural Networks (DBRNN) as a processing tool for extraction of tremor. Moreover, we propose a training strategy that enables the network to perform not only online estimation but also online prediction of the voluntary movement in a myopic fashion, which is currently a significant unmet need for rehabilitative and assistive technologies designed for patients with pathological tremor.
Stefan Schneider
University of Guelph
Deep Learning Object Detectors and Similarity Comparison for Animal Re-Identification from Ecological Camera Trap Data
The ability of a researcher to re-identify (re-ID) an individual animal upon reencounter is fundamental for addressing a broad range of questions in the study of ecosystem function, community and population dynamics, and behavioural ecology. Tagging animals during mark and recapture studies is the most common method for reliable animal re-ID; however, camera traps are a desirable alternative, requiring less labour and much less intrusion while allowing prolonged and continuous monitoring of an environment. Despite these advantages, the analyses of camera traps and video for re-ID by humans are criticized for their biases related to human judgment and inconsistencies between analyses. For decades ecologists with expertise in computer vision have successfully utilized feature engineering to extract meaningful features from camera trap images to improve the statistical rigor of individual comparisons and remove human bias from their camera trap analyses. Recent years have witnessed the emergence of deep learning systems which have demonstrated the re-ID of humans based on image and video data with near perfect accuracy. Despite this success, ecologists have yet to utilize these approaches for animal re-ID. By utilizing novel deep learning methods for object detection and similarity comparisons, ecologists can extract animals from image/video data and train deep learning classifiers to re-ID animal individuals beyond the capabilities of a human observer. This methodology will allow ecologists with camera/video trap data to re-identify individuals that exit and re-enter the camera frame. Our expectation is that this is just the beginning of a major trend that could stand to revolutionize the analysis of camera trap data and, ultimately, our approach to animal ecology.
Steven Cheng-Xian Li
UMass Amherst
Learning from incomplete data with generative adversarial networks
Generative adversarial networks (GANs) have been shown to provide an effective way to model complex distributions and have obtained impressive results on various challenging tasks. However, typical GANs require fully-observed data during training. In this poster, we present a modular approach to learning GANs from incomplete observations that can be combined with different generator and discriminator networks and is amenable for use with complex, high-dimensional inputs. The proposed framework learns a complete data generator along with a mask generator. We further demonstrate how to impute missing data by equipping our framework with an adversarially trained imputer. We evaluate the proposed framework on several types of missing-data processes in which data are missing completely at random.
Thiago Pereira Bueno
University of São Paulo, Brazil
Deep Probabilistic Planning
Deep Probabilistic Planning combines optimization methods in deep neural nets with ideas from probabilistic planning to solve mixed continuous-discrete sequential decision-making problems. It is based on the observation that Deep Learning is important not only for its learning contributions, but also for the opportunities it offers for non-convex optimization in model-based settings. The basic approach is to formulate a planning problem as an optimization task defined over a stochastic computation graph and to solve it by gradient-based parameter search methods, such as RMSProp. In this work, we propose: (i) to extend previous works on planning through back-propagation to the more general case of stochastic transitions; (ii) to develop model-embedded extensions of stochastic Recurrent Neural Nets with Deep Reactive Policies; and (iii) to investigate the applicability of techniques of model-based planning to accelerate convergence.
Tristan Deleu
MILA
On the reproducibility of gradient-based Meta-Reinforcement Learning baselines
Meta-learning provides an appealing solution to the data-efficiency issue inherent in both deep supervised learning and (model-free) deep reinforcement learning. The diversity of tasks available in supervised meta-learning and meta-reinforcement learning enabled the fast progress we are recently observing in this field, since one can easily compare a new meta-learning method to existing algorithms. In this paper, we revisit one of these baselines on two basic meta-reinforcement learning problems: the multi-armed bandits and tabular MDPs. We provide updated results for MAML applied to these two problems, and show that MAML compares favorably to more recent meta-learning approaches, contrary to what was previously reported. Along with this baseline, we also include some new results on the same tasks for Reptile, a first-order meta-learning approach.
Tsung-Yu Lin
University of Massachusetts Amherst
Second-order Democratic Aggregation
Aggregated second-order features extracted from deep convolutional networks have been shown to be effective for texture generation, fine-grained recognition, material classification, and scene understanding. In this paper we study a class of orderless aggregation functions designed to minimize interference or equalize contributions in the context of second-order features and show that they can be computed just as efficiently as their first-order counterparts and have favorable properties over aggregation by summation. Another line of work has shown that matrix power normalization after aggregation can significantly improve the generalization of second-order representations. We show that matrix power normalization implicitly equalizes contributions during aggregation thus establishing a connection between matrix normalization techniques and prior work on minimizing interference. Based on the analysis we present gamma-democratic aggregators that interpolate between sum (gamma=1) and democratic pooling (gamma=0) outperforming both on several classification tasks. Moreover unlike power normalization the gamma-democratic aggregations can be computed in a low dimensional space using sketching allowing the use of very high-dimensional second-order features. This results in a state-of-the-art performance on several datasets.
William A Falcon
Columbia University
Neural Networks for Efficient Bayesian Decoding of Natural Images from Retinal Neurons
Decoding sensory stimuli from neural signals can be used to reveal how we sense our physical environment, and is valuable for the design of brain-machine interfaces. However, existing linear techniques for neural decoding may not fully reveal or exploit the fidelity of the neural signal. Here we develop a new approximate Bayesian method for decoding natural images from the spiking activity of populations of retinal ganglion cells (RGCs). We sidestep known computational challenges with Bayesian inference by exploiting artificial neural networks developed for computer vision, enabling fast nonlinear decoding that incorporates natural scene statistics implicitly. We use a decoder architecture that first linearly reconstructs an image from RGC spikes, then applies a convolutional autoencoder to enhance the image. The resulting decoder, trained on natural images and simulated neural responses, significantly outperforms linear decoding, as well as simple point-wise nonlinear decoding. Additionally, the decoder trained on natural images performs nearly as accurately on a subset of natural stimuli (faces) as a decoder trained specifically for the subset, a feature not observed with a linear decoder. These results provide a tool for the assessment and optimization of retinal prosthesis technologies, and reveal that the neural output of the retina may provide a more accurate representation of the visual scene than previously appreciated.
Xiaoxia Wu
The University of Texas at Austin
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune parameters such as the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online and convex optimization. We bridge this gap by providing strong theoretical guarantees, in both the batch and stochastic settings, for the convergence of AdaGrad over smooth, nonconvex landscapes, from any initialization of the stepsize, without knowledge of the Lipschitz constant of the gradient. We show in the stochastic setting that AdaGrad converges to a stationary point at the optimal $O(1/\sqrt{N})$ rate (up to a $\log(N)$ factor), and in the batch setting, at the optimal $O(1/N)$ rate. Moreover, in both settings, the constant in the rate matches the constant obtained as if the variance of the gradient noise and the Lipschitz constant of the gradient were known in advance and used to tune the stepsize, up to a logarithmic factor of the mismatch between the optimal stepsize and the stepsize used to initialize AdaGrad. In particular, our results imply that AdaGrad is robust to the unknown Lipschitz constant and level of stochastic noise on the gradient, in a near-optimal sense. When there is noise, AdaGrad converges at the rate of $O(1/\sqrt{N})$ with a well-tuned stepsize, and when there is no noise, the same algorithm converges at the rate of $O(1/N)$ like well-tuned batch gradient descent.
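To make the idea of a stepsize that adapts "on the fly according to the gradients received" concrete, here is a minimal sketch of an AdaGrad-style update with a single scalar stepsize driven by accumulated squared gradient norms. Whether the poster analyzes exactly this scalar variant is an assumption; the function names, the test objective, and the constants `b0` and `eta` are hypothetical choices for illustration.

```python
import math


def adagrad_norm(grad_fn, x0, b0=0.1, eta=1.0, steps=500):
    """Gradient descent with stepsize eta / b, where b**2 accumulates squared gradient norms."""
    x = list(x0)
    b_sq = b0 ** 2
    for _ in range(steps):
        g = grad_fn(x)
        b_sq += sum(gi * gi for gi in g)        # accumulate ||g||^2 before stepping
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, math.sqrt(b_sq)


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [x[0], 10.0 * x[1]]


# The same code is run from two very different initial values of b,
# without supplying the Lipschitz constant of the gradient (10 here).
for b0 in (0.01, 10.0):
    x_final, b_final = adagrad_norm(quadratic_grad, [3.0, -2.0], b0=b0)
    print(b0, [round(v, 6) for v in x_final], round(b_final, 3))
```

On this noiseless toy problem the accumulated quantity `b` stops growing once the gradients become small, so the stepsize stabilizes on its own.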
Zafarali Ahmed
Mila, McGill University and Google Brain
What makes a good policy optimization algorithm?
A good policy optimization algorithm converges quickly to optimal behaviour as measured by the expected discounted return (ER). It is unclear how recent policy-based methods improve upon each other: is it variance reduction during gradient estimation or better conditioning of the objective function? In this work, we study the loss surfaces of ER using visualization techniques from deep learning. In the sequential decision-making framework, we find that local maxima and plateaus can separate optimal and suboptimal policies even in simple Markov decision processes (MDP). We then investigate the negative log-likelihood as a surrogate objective to ER in the bandit setting. We plan to show that our observations generalize to domains like CartPole. These preliminary studies into the characterizations of the ER can help us understand the significance of surrogate objectives that penalize significant changes in policies or encourage exploration using entropy regularized objectives.

### Thursday, August 2, 2018 – 4:30PM – 6:00PM

Presenter's Name
Affiliated Organization
Poster Title
Revised Abstract
Alaa Eddin Alchalabi
University of Ottawa
A multi-agent Deep Reinforcement Learning with semi-decentralized networked agents
We propose a multi-agent deep reinforcement learning approach with semi-decentralized networked agents that exploits the idle time of smart devices (workers) to perform computational tasks such as hosting gaming sessions. Workers advertise their availability, and agents collect data such as RTT delay, processing capability, bandwidth, geolocation, network traffic, and so on. Each agent makes individual decisions based on both local observations and the past experiences passed on by its neighbors over the network. This decentralization would help agents, especially newly joined ones, to adjust their learning parameters, which can lead to faster convergence. The head agent, which manages the distribution of the learning experience, would remain static for a very short period, after which another agent would take over as head. The leadership assignment would thus rotate to maintain decentralization. Future work will assess the feasibility of an implementation over a blockchain.
Aliakbar Gorji Daronkolaei
Independent Researcher
Iterative Policy Gradient for Observer Trajectory Planning with Application in multiple-target Tracking
Tracking multiple moving targets with bearings-only measurements is a challenging task because of the inherent difficulty of determining an observer trajectory that meets observability conditions. The Observer Trajectory Planning (OTP) problem has been addressed in the literature as a model-based optimal stochastic control task, and simulation-based sampling techniques have been used to solve the optimization problem. The work presented here formulates OTP as a continuous control problem and proposes reinforcement learning as a solution. It introduces a framework for single-target tracking and builds upon it a novel, decentralized architecture for multiple-target tracking scenarios. The proposed architecture constitutes a model-free framework that allows the states of targets to be estimated and multiple targets to be tracked in a realistic scenario, where the agent has no prior information about the initial locations and velocities of the targets. The simulation results verify the superiority and robustness of the architecture in dealing with OTP in real multiple-target scenarios with false alarms.
Amy Zhang
Decoupling Dynamics and Reward for Transfer Learning
Current reinforcement learning (RL) methods can successfully learn single tasks but often generalize poorly to modest perturbations in task domain or training procedure. In this work, we present a decoupled learning strategy for RL that creates a shared representation space where knowledge can be robustly transferred. We separate learning the task representation, the forward dynamics, the inverse dynamics and the reward function of the domain, and show that this decoupling improves performance within the task, transfers well to changes in dynamics and reward, and can be effectively used for online planning. Empirical results show good performance in both continuous and discrete RL domains.
Arash Tavakoli
Imperial College London
Prioritized Starting States
Reinforcement learning from diverse starting states is known to promote more robust policies with better generalization, faster training, and higher performance. Nevertheless, the environment is often unknown a priori, and thus designating such starting states may not be straightforward. In this paper, we propose a method to diversify the starting states by storing the agent's previously encountered states in a buffer. As the agent explores the environment, it increases the diversity of the stored states. Given access to a generative model of the environment, we are then able to replace the start state distribution with one that is based on the buffer. By prioritizing the buffered initial states, we can bias the agent's experiences towards more relevant regions, effectively leading to the emergence of an adaptive curriculum of initial states. We demonstrate empirically that our approach improves the performance of the Proximal Policy Optimization (PPO) algorithm on numerous benchmarks. Notably, we show that by prioritizing the starting states from past good trajectories, our approach can significantly improve the performance of PPO in sparse reward problems.
Arushi Jain
Reasoning and Learning Lab (RLLab), McGill University
Safe Option-Critic: Learning Safety in the Option-Critic Architecture
Designing hierarchical reinforcement learning algorithms that induce a notion of safety is not only vital for safety-critical applications but also brings a better understanding of an artificially intelligent agent's decisions. While learning end-to-end options automatically has been fully realized recently, we propose a solution to learning safe options. We introduce the idea of controllability of the states based on the temporal difference errors in the option-critic framework. We then derive the policy-gradient theorem with the new objective function and propose a novel framework called safe option-critic. We demonstrate the effectiveness of our approach in the four-rooms grid-world, cartpole, and three games in the Arcade Learning Environment (ALE). Learning end-to-end options with the proposed notion of safety achieves a reduction in the variance of return and boosts performance in environments with intrinsic variability in the reward. More importantly, the proposed algorithm outperforms vanilla options in all the environments and primitive actions in two out of three ALE games.
Ben Lansdell
University of Pennsylvania
Spiking allows neurons to estimate their causal effect
Learning in biological neural networks is more challenging than in artificial networks for at least two reasons: first, neurons spike, and spiking is not differentiable; and second, a neuron's causal effect on a reward signal can be confounded by noise correlations shared among other neurons. However, a popular technique in economics, regression discontinuity design (RDD), estimates causal effects using thresholded discontinuities analogous to the integration and thresholded response of a neuron. Here we propose a learning rule that implements RDD within a simple network of leaky integrate-and-fire neurons. In this way, a neuron's spiking threshold is able to reveal the influence of the neuron's activity on performance and enable unconfounded learning. The method combines RDD with a policy gradient method to maximize the expected reward of the network. These results indicate a link between simple learning rules and economics-style causal inference.
Di Wu
McGill Reasoning and Learning Lab
Transfer Deep Reinforcement Learning for Home Energy Management
Smart grids are advancing the management efficiency and security of power grids with the integration of energy storage, distributed controllers, and advanced meters. In particular, with the increasing prevalence of automation devices and distributed renewable energy generation, residential energy management is now drawing more attention. Meanwhile, the increasing adoption of electric vehicles (EVs) brings further challenges and opportunities for smart residential energy management. This paper formalizes residential energy management with EV charging as a Markov Decision Process and proposes reinforcement learning (RL) based control algorithms to address it. The objective of the proposed algorithms is to minimize the long-term operating cost. Another practical issue for this application is that there may be very limited data available to train a reliable control policy for a new house, while at the same time we may have a large amount of data from other houses. We propose a transfer learning based algorithm which can utilize knowledge learned from multiple source houses to assist in the learning of control policies for new houses. Experimental results on real-world data show that the proposed algorithms can significantly reduce the operating cost and peak power consumption compared to baseline control algorithms.
Dylan Robert Ashley
Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta
Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return
Temporal-difference (TD) learning methods are widely used in reinforcement learning to estimate the expected return for each state, without a model, because of their significant advantages in computational and data efficiency. For many applications involving risk mitigation, it would also be useful to estimate the variance of the return by TD methods. In this paper, we describe a way of doing this that is substantially simpler than those proposed by Tamar, Di Castro, and Mannor in 2012, or those proposed by White and White in 2016. We show that two TD learners operating in series can learn expectation and variance estimates. The trick is to use the square of the TD error of the expectation learner as the reward of the variance learner, and the square of the expectation learner’s discount rate as the discount rate of the variance learner. With these two modifications, the variance learning problem becomes a conventional TD learning problem to which standard theoretical results can be applied. Our formal results are limited to the table lookup case, for which our method is still novel, but the extension to function approximation is immediate, and we provide some empirical results for the linear function approximation case. Our experimental results show that our direct method behaves just as well as a comparable indirect method, but is generally more robust.
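The two-learner construction described above can be sketched in a few lines of tabular code. The environment (a two-state chain ending in a reward of +1 or -1), the sample-average step size, and the name `direct_variance_td` are assumptions chosen so that the true answers are easy to check by hand; they are not taken from the poster.

```python
import random


def direct_variance_td(episodes=20000, gamma=0.9, seed=0):
    """Tabular TD(0) for the expected return plus a second TD learner for its variance."""
    rng = random.Random(seed)
    value = [0.0, 0.0]       # expectation learner, one entry per non-terminal state
    variance = [0.0, 0.0]    # variance learner
    visits = [0, 0]
    for _ in range(episodes):
        for state in (0, 1):
            # Toy chain: state 0 -> state 1 with reward 0; state 1 -> terminal with reward +1 or -1.
            reward = 0.0 if state == 0 else rng.choice((-1.0, 1.0))
            next_value = value[1] if state == 0 else 0.0
            next_variance = variance[1] if state == 0 else 0.0
            visits[state] += 1
            alpha = 1.0 / visits[state]              # sample-average step size (an assumption)

            # Expectation learner: ordinary TD(0) update.
            delta = reward + gamma * next_value - value[state]
            value[state] += alpha * delta

            # Variance learner: reward is delta**2 and discount is gamma**2.
            var_delta = delta ** 2 + gamma ** 2 * next_variance - variance[state]
            variance[state] += alpha * var_delta
    return value, variance


value_est, variance_est = direct_variance_td()
print([round(v, 3) for v in value_est], [round(v, 3) for v in variance_est])
```

In this chain the return from state 1 has variance 1 and the return from state 0 has variance 0.9 squared, that is 0.81, so the variance learner's estimates can be compared against those values.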
Glen Berseth
University of British Columbia
Model-Based Action Exploration for Learning Dynamic Motion Skills
We consider learning a forward dynamics model to predict the result, $s_{t+1}$, of taking a particular action, $a$, given a specific observation of the state, $s_{t}$. With this model, we perform internal look-ahead predictions of outcomes and seek actions we believe have a reasonable chance of success. With the learned forward dynamics model, we can compute gradients in the action space similar to DPG. We use these gradients to modify the policy distribution and generate new exploratory actions, pushing the distribution towards actions that have a higher future discounted reward. This method alters the exploratory action space, thereby increasing learning speed and enabling higher quality solutions to difficult problems, such as robotic locomotion and juggling.
Mahsa Kiani
University of New Brunswick
Generating Code from Design Mockups using Deep Learning
The purpose of this poster is to demonstrate the application of deep learning in automating front-end development. The inputs of the neural network are User Interface screenshots and previous markup, and the output is the next tag. The encoder creates image features and markup features, then the decoder takes the combined design and markup feature and generates a next tag feature. Image features are extracted using a convolutional neural network, which is pre-trained on ImageNet. Markup features are formed by running the word embeddings through the LSTM layer. The image and markup features are concatenated and are used by the decoder to predict the next tag. A traditional feedforward neural network maps the next tag feature to the final predictions.
Matteo Papini
Politecnico di Milano
We present a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving continuous Markov Decision Processes. Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. We show convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. We also suggest practical variants of SVRPG, and evaluate them on benchmark tasks.
Ulster University
Quantifying real-time decision uncertainty for exploration in deep reinforcement learning
Deep neural networks (DNNs) for reinforcement learning (RL) have recently gained significant traction, for example in approximating Q-value functions (DQNs) and agent policies (Policy Networks). More recently, stochastic forward passes in DNNs with dropout (dDNNs) have been used to create decision uncertainty and stochastically sample actions. In RL, learning requires random action selection (e.g. ε-greedy, softmax function, or dDNNs). Thus, quantifying and utilizing model uncertainty can be crucial for learning. In this work, we quantify decision uncertainty in dDNN actions to control exploration in Policy Networks. We found faster convergence and higher total reward with uncertainty-modulated softmax temperature in the classic cart-pole problem. Ongoing investigation includes application to other state-of-the-art algorithms, e.g. DQNs and A3C.
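One way to picture an uncertainty-modulated softmax temperature is sketched below: several stochastic forward passes produce slightly different action logits, their spread is turned into a temperature, and that temperature flattens or sharpens the action distribution. The linear mapping in `uncertainty_temperature`, its `base` and `gain` constants, and the hard-coded logits standing in for dropout passes are all assumptions for illustration; the poster does not state the rule it uses.

```python
import math
import statistics


def softmax(logits, temperature):
    """Numerically stable softmax over logits divided by a positive temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def mean_logits(mc_logits):
    """Average the logits of each action across stochastic forward passes."""
    n_actions = len(mc_logits[0])
    return [statistics.fmean([sample[a] for sample in mc_logits]) for a in range(n_actions)]


def uncertainty_temperature(mc_logits, base=0.5, gain=2.0):
    """Hypothetical rule: temperature grows linearly with the spread across passes."""
    n_actions = len(mc_logits[0])
    spreads = [statistics.pstdev([sample[a] for sample in mc_logits]) for a in range(n_actions)]
    return base + gain * (sum(spreads) / n_actions)


# Hand-written logits standing in for three dropout passes over two actions.
confident = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
uncertain = [[3.0, -1.0], [1.0, 1.0], [2.0, 0.0]]

for passes in (confident, uncertain):
    temperature = uncertainty_temperature(passes)
    policy = softmax(mean_logits(passes), temperature)
    print(round(temperature, 3), [round(p, 3) for p in policy])
```

Under this assumed rule, disagreement between passes raises the temperature, so the agent samples actions more evenly when its network is less certain.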
Piotr Wojciech Semberecki
Wrocław University of Science and Technology, Tooploox
Multi-agent Reinforcement Learning in Real-time Strategy Games
In this work I present and compare state-of-the-art solutions for multi-agent reinforcement learning (MARL). The key issue in this problem is training agents to cooperate with each other to accomplish specific tasks, such as coordinated moves without collision. My research focuses on the micromanagement aspect of StarCraft, which requires the algorithm to control low-level actions of units, such as movement and attack orders, to perform tasks such as hit-and-run tactics and coordinated cover attacks. I compare approaches used for coordination and communication between units, such as Master-Slave MARL, GMEZO, BicNet, and CommNet. I focus mostly on model architectures, reward definitions, and centralization, which influence the winning rate. I show the advantages and disadvantages of the described approaches and potential directions in which these methods can be extended.
Rafid Mahmood
University of Toronto
Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves first predicting a desirable treatment plan, before correcting it to a deliverable one. We propose a generative adversarial network (GAN) approach to predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dimensional representations of the plan. Experiments on a dataset of oropharyngeal cancer patients show that our approach significantly outperforms previous methods on several clinical satisfaction criteria and similarity metrics.
Reazul Hasan Russel
University of New Hampshire
Optimizing Ambiguity Sets for Robust MDPs Using Bayesian Bounds
An important problem in reinforcement learning is computing policies with sound worst-case guarantees even with limited available data. Robust Markov decision processes are a promising tool for achieving this goal, but distribution-free bounds that are currently used to construct their ambiguity sets from data lead to overly conservative decisions. In this paper, we propose a new approach to constructing less conservative ambiguity sets while preserving reliable performance guarantees. We use a Bayesian approach to leverage prior knowledge and optimize the shape of the ambiguity set rather than simply taking a confidence interval. Our theoretical results show soundness of the approach. Our empirical results demonstrate that the optimized ambiguity sets offer less conservative solutions than the plain Bayesian or distribution-free ones with the same worst-case guarantees.
Rodrigo Andrés Toro Icarte
University of Toronto and Vector Institute
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
In this work we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods, which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.
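A reward machine in the sense described above can be pictured as a small lookup table from (machine state, observed event) to (next machine state, reward). The sketch below is a minimal reading of that idea; the class layout, the `counterfactual_experiences` helper, and the coffee-delivery task are hypothetical and are not the authors' implementation of QRM.

```python
class RewardMachine:
    """Finite state machine whose transitions emit rewards."""

    def __init__(self, initial_state, transitions, terminal_states=()):
        # transitions maps (rm_state, event) -> (next_rm_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.terminal_states = set(terminal_states)

    def step(self, rm_state, event):
        # Pairs that are not listed leave the machine where it is, with zero reward.
        return self.transitions.get((rm_state, event), (rm_state, 0.0))

    def states(self):
        found = {self.initial_state}
        for (src, _event), (dst, _reward) in self.transitions.items():
            found.update((src, dst))
        return found


def counterfactual_experiences(rm, event):
    """For one observed event, list what each non-terminal machine state would have received."""
    return {u: rm.step(u, event) for u in rm.states() if u not in rm.terminal_states}


# Hypothetical task: pick up coffee, then deliver it to the office.
rm = RewardMachine(
    initial_state="no_coffee",
    transitions={
        ("no_coffee", "coffee"): ("has_coffee", 0.0),
        ("has_coffee", "office"): ("done", 1.0),
    },
    terminal_states=("done",),
)

print(rm.step("no_coffee", "coffee"))
print(counterfactual_experiences(rm, "office"))
```

Because the machine is explicit, a single environment step can be relabeled for every machine state, which is one plausible way off-policy Q-learning could update several subpolicies from the same experience.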
Samineh Bagheri and Markus Thill
TH Köln -- University of Applied Sciences
Reinforcement Learning for strategic Board Games with n-Tuple Systems
Learning complex game functions is a challenging task. We propose a technique that benefits from a combination of temporal difference learning (TDL), a variant of the reinforcement learning approach, and n-tuple networks. A perfect-playing agent is trained just by self-play for the game Connect-4. Our agent is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible). The ability of our agent to learn and select the important features from a large feature space results in a powerful framework that can learn the game Connect-4 after a few hundred thousand games. We believe that the n-tuple network is an important ingredient for the overall success, and we identify several aspects that are relevant for achieving high-quality results. The architecture is sufficiently general to be applied to similar reinforcement learning tasks as well. A similar approach is also applied to other games such as Dots and Boxes and Hex.
Sebastian Oliver Blaes
Max Planck Institute for Intelligent Systems, Tuebingen, Germany
Intrinsically motivated exploration in RL
Exploration is essential in reinforcement learning (RL) in order to gain information about the unknown environment and reward. Typically, exploration is done via random actions, e.g. by a Gaussian distributed policy for each state in continuous action domains. In this way the randomness is independent of the agent-environment coupling. We believe that improving the exploration strategy to be more adapted to the system under control can make current (deep) RL methods learn efficiently. Works on intrinsically motivated exploration using information theory suggest some promising routes. It was shown that predictive information maximization can be implemented as a policy gradient method. This generates an embodiment-specific exploration which goes through many different behavioral modes. We use a controller designed based on this principle as the exploration strategy in off-policy RL methods, e.g. DDPG-HER.
Shagun Sodhani
MILA
Memory Augmented Asymmetric Self-Play
Asymmetric self-play is an unsupervised approach to train RL agents to explore the environment without any external rewards. Two variants of the same agent, Alice and Bob, play a game where Alice tries to find the simplest task that Bob cannot finish and Bob tries to finish the given task as soon as possible. One limitation of this formulation is that Alice has no explicit means of remembering what tasks she has already assigned to Bob and could assign similar tasks multiple times, leading to wasteful self-play episodes. This can be overcome by augmenting Alice with a memory module that enables her to build her experience by remembering history across episodes. Empirical evidence suggests that an agent pre-trained with memory augmented self-play outperforms an agent pre-trained with self-play alone in both discrete and continuous environments. We also show that memory augmented self-play enables Alice to explore more parts of the environment and generate more diverse trajectories than self-play alone.
Stephen Kelly
Dalhousie University
Emergent Tangled Program Graphs in Multi-Task Reinforcement Learning
We propose a genetic programming framework to address high-dimensional, multi-task reinforcement learning problems through emergent modularity. A bottom-up process is assumed in which multiple programs self-organize into collective decision-making entities, or teams, which then further develop into multi-team policy graphs, or Tangled Program Graphs. As such, both the nature of the decision-making policy and its complexity are adapted properties. In the Atari video game environment, the framework generally matched the quality of results from a variety of deep learning approaches. More importantly, it did so at a fraction of the sample and model complexity. In making a direct comparison between genetic programming and deep learning, this work serves as a starting point from which to consider potential hybrid methods, drawing out the strengths of both approaches to biologically-inspired machine learning.
Tony Zhang
Caltech
Complex Maze Foraging Behavior in Mice
While there have been considerable efforts in neuroscience to study rule-based and stimulus-response learning in mice, relatively little work has been done to study sequential decision-making in navigating highly complex environments. With behavioral tracking, we study the hierarchical structure of learning in mice foraging for reward in complex maze environments with many branching points across multiple levels in the hierarchy. We quantify higher-level behavioral features, such as reversals, distances to rewards, and position in the search tree, for groups of mice with different cortex structures. Preliminary analysis shows that mice engage in fairly efficient tree search when navigating the maze and can learn to recall the learned cognitive map upon familiarity without necessarily engaging in reward-based learning.
Wouter Kool
University of Amsterdam