I’m a PhD candidate in the Department of Computing Science at University of Alberta. My supervisors are Csaba Szepesvári and András György.

I work on the theory and applications of statistical machine learning, in particular online learning, stochastic optimization, and computational learning theory. I am also interested in reinforcement learning and AI in general. My PhD research has been focused on theory and algorithms for learning and optimization in parallel and distributed environments.


In my PhD research, I have developed a general, unified theory for online learning and stochastic optimization in parallel, distributed, and perturbed environments. Here are some of my completed and on-going research projects.

Online Learning and Optimization with Delayed, Asynchronous, and Perturbed Feedback

Online, incremental machine learning algorithms can handle large data sets since they do not need to keep the whole data set in the main memory at all times. However, these algorithms are usually developed under an immediate-feedback assumption: the feedback from every prediction is assumed to be received before the next prediction is made. As such, using online learning algorithms in parallel and distributed environments where the feedback can be delayed (e.g., due to communication cost or latency) proposes a challenge for fast, large-scale machine learning.

In this project, I have studied the effect of delays, perturbations, and asynchronous feedback on the performance of online learning and stochastic optimization algorithms. My Master’s thesis focused on multi-armed bandit (MAB) problems and quantified the effect of delay on MAB algorithms. Generalizing the results in the thesis, in our ICML-2013 paper we proposed a unified formulation of online learning under delayed feedback, which enabled us to recover and extend, using two simple meta-algorithms, the previous sporadic results in the literature.

The meta-algorithms in our ICML-2013 paper required to create multiple instances of a non-delayed algorithm. Instead, in our AAAI-16 paper, we studied the performance of a single instance of a non-delayed optimization algorithm in a delayed-feedback setting. We quantified the performance of a large family of online optimization algorithms in terms of the algorithms’ /stability/. In particular, we instantiated the results on a form of AdaGrad to create a delay-adaptive algorithm, solving some open problems in delayed-feedback adaptive learning.


  • Online Learning under Delayed Feedback (Video of my talk at ICML)
    Pooria Joulani, András György, Csaba Szepesvári.
    International Conference on Machine Learning (ICML-2013), Atlanta, Georgia, June 2013.

    Abstract: Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.

  • Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms
    Pooria Joulani, András György, Csaba Szepesvári.
    Thirtieth AAAI Conference on Artificial Intelligence (AAAI-2013), Phoenix, Arizona, February 12–17, 2016.

    Abstract: We present a unified, black-box-style method for developing and analyzing online convex optimization (OCO) algorithms for full-information online learning in delayed-feedback environments. Our new, simplified analysis enables us to substantially improve upon previous work and to solve a number of open problems from the literature. Specifically, we develop and analyze asynchronous AdaGrad-style algorithms from the Follow-the-Regularized-Leader (FTRL) and Mirror- Descent family that, unlike previous works, can handle projections and adapt both to the gradients and the delays, without relying on either strong convexity or smoothness of the objective function, or data sparsity. Our unified framework builds on a natural reduction from delayed-feedback to standard (non-delayed) online learning. This reduction, together with recent unification results for OCO algorithms, allows us to analyze the regret of generic FTRL and Mirror-Descent algorithms in the delayed-feedback setting in a unified manner using standard proof techniques. In addition, the reduction is exact and can be used to obtain both upper and lower bounds on the regret in the delayed-feedback setting.


  • Multi-Armed Bandit Problems under Delayed Feedback

    Pooria Joulani. (Supervisor: Csaba Szepesvári)
    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, September 2012.

    Abstract: In this thesis, the multi-armed bandit (MAB) problem in online learning is studied, when the feedback information is not observed immediately but rather after arbitrary, unknown, random delays. In the ‘‘stochastic’’ setting when the rewards come from a fixed distribution, an algorithm is given that uses a non-delayed MAB algorithm as a black-box. We also give a method to generalize the theoretical guarantees of non-delayed UCB-type algorithms to the delayed stochastic setting. Assuming the delays are independent of the rewards, we upper bound the penalty in the performance of these algorithms (measured by ‘‘regret’’) by an additive term depending on the delays. When the rewards are chosen in an adversarial manner, we give a black-box style algorithm using multiple instances of a non-delayed adversarial MAB algorithm. Assuming the delays depend only on time, we upper bound the performance penalty of the algorithm by a multiplicative factor depending on the delays.

Fast Cross-Validation for Sequential, Parallel, and Distributed Learning

A standard method for estimating the generalization performance of models learned by machine learning algorithms is -fold cross-validation (-CV). In many applications (e.g., in tuning hyper-parameters of the algorithm), -CV is performed multiple times during the learning process. However, do to its high computational cost (learning models), running -CV multiple times for large values of (e.g., leave-one-out CV) has not been practical on large data sets. Various existing methods are usually specialized to specific learning settings.

In this project, we design a simple, general method that greatly speeds up cross-validation for online, incremental learning algorithms. This method is also naturally applicable to parallel and distributed computing settings, where it efficiently computes the -CV score but avoids moving the data set over the network.


  • Fast Cross-Validation for Incremental Learning
    Pooria Joulani, András György, Csaba Szepesvári.
    International Joint Conference on Artificial Intelligence (IJCAI-2015), Buenos Aires, Argentina, July 2015.

    Abstract: Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.


I enjoy teaching and have been a Teaching Assistant at UofA. I have received UofA’s Graduate Teaching Award for excellence in teaching. Here is a list of the courses I have TAed:

  • Reinforcement Learning for Artificial Intelligence
  • Operating System Concepts
  • Introduction to Computing
  • Computer Ethics
  • Compiler Design

Professional Activities

I have been a reviewer at COLT, ICML, NIPS, ALT, IJCAI, IEEE TSP, and JMLR.

  • Sub-reviewer: Conference on Learning Theory (COLT), 2016.
  • Reviewer: Algorithmic Learning Theory (ALT), 2016.
  • Reviewer: International Conference on Machine Learning (ICML), 2016.
  • Reviewer: Neural Information Processing Systems (NIPS), 2015.
  • Reviewer: International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Reviewer: IEEE Transactions on Signal Processing, 2013 and 2014.
  • Sub-reviewer: Journal of Machine Learning Research (JMLR), 2012.
  • Sub-reviewer: Conference on Learning Theory (COLT), 2012.


Here is some code I have written for my research. Send me an email if you are interested in any of them (or stay tuned; they should be available on GitHub sometime soon!)

  • Simulator for Multi-Armed Bandits with Delayed Feedback (for my M.Sc. Thesis).
  • A fast QP method for SVM-style outlier detection.
  • Code for the TreeCV algorithm (for our IJCAI-2015 paper).