I’m a PhD student in the Department of Computing Science at University of Alberta. My supervisors are Csaba Szepesvari and Andras Gyorgy.

I work on the theory and applications of statistical machine learning, in particular online learning and computational learning theory. I am also interested in reinforcement learning and AI in general.


The goal of my current research is to develop general, unified schemes for making machine learning algorithms more scalable. Here is a list of my completed and on-going research projects.

Online Learning and Optimization with Delayed and Asynchronous Feedback

Online, incremental machine learning algorithms can handle large data sets since they do not need to keep the whole data set in the main memory at all times. However, these algorithms are usually developed under an immediate-feedback assumption: the feedback from every prediction is assumed to be received before the next prediction is made. As such, using online learning algorithms in parallel and distributed environments where the feedbacks can be delayed (e.g., due to communication cost or latency), proposes a challenge for fast, large-scale machine learning.

In this project, I study the effect of delays on the performance of online learning and optimization algorithms. My Master’s thesis focuses on multi-armed bandit (MAB) problems and quantifies the effect of delay on MAB algorithms. Generalizing the results in the thesis, in our ICML-2013 paper we propose a unified formulation of online learning under delayed feedback, which enables us to recover and extend, using two simple meta-algorithms, the previous sporadic results in the literature.


  • Online Learning under Delayed Feedback
    Pooria Joulani, András György, Csaba Szepesvári.
    International Conference on Machine Learning (ICML-2013), Atlanta, Georgia, June 2013.

    Abstract: Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.


  • Multi-Armed Bandit Problems under Delayed Feedback

    Pooria Joulani. (Supervisor: Csaba Szepesvári)
    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, September 2012.

    Abstract: In this thesis, the multi-armed bandit (MAB) problem in online learning is studied, when the feedback information is not observed immediately but rather after arbitrary, unknown, random delays. In the ‘‘stochastic’’ setting when the rewards come from a fixed distribution, an algorithm is given that uses a non-delayed MAB algorithm as a black-box. We also give a method to generalize the theoretical guarantees of non-delayed UCB-type algorithms to the delayed stochastic setting. Assuming the delays are independent of the rewards, we upper bound the penalty in the performance of these algorithms (measured by ‘‘regret’’) by an additive term depending on the delays. When the rewards are chosen in an adversarial manner, we give a black-box style algorithm using multiple instances of a non-delayed adversarial MAB algorithm. Assuming the delays depend only on time, we upper bound the performance penalty of the algorithm by a multiplicative factor depending on the delays.

Fast Cross-Validation for Sequential, Parallel, and Distributed Learning

A standard method for estimating the generalization performance of models learned by machine learning algorithms is -fold cross-validation (-CV). In many applications (e.g., in tuning hyper-parameters of the algorithm), -CV is performed multiple times during the learning process. However, do to its high computational cost (learning models), running -CV multiple times for large values of (e.g., leave-one-out CV) has not been practical on large data sets. Various existing methods are usually specialized to specific learning settings.

In this project, we design a simple, general method that greatly speeds up cross-validation for online, incremental learning algorithms. This method is also naturally applicable to parallel and distributed computing settings, where it efficiently computes the -CV score but avoids moving the data set over the network.


  • Fast Cross-Validation for Incremental Learning
    Pooria Joulani, András György, Csaba Szepesvári.
    International Joint Conference on Artificial Intelligence (IJCAI-2015), Buenos Aires, Argentina, July 2015.

    Abstract: Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are uniquely fitted to big data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.


I enjoy teaching and have been a Teaching Assistant at UofA. I have received UofA’s Graduate Teaching Award for excellence in teaching. Here is a list of the courses I have TAed:

  • Reinforcement Learning for Artificial Intelligence
  • Operating System Concepts
  • Introduction to Computing
  • Computer Ethics
  • Compiler Design

Professional Activities

I have been a reviewer at NIPS, COLT, IJCAI, IEEE TSP, and JMLR.

  • Invited reviewer: Neural Information Processing Systems (NIPS), 2015.
  • Invited reviewer: International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Invited reviewer: IEEE Transactions on Signal Processing, 2013 and 2014.
  • Sub-reviewer: Journal of Machine Learning Research (JMLR), 2012.
  • Sub-reviewer: Conference on Learning Theory (COLT), 2012.


Here are some software packages I have written for my research.

  • Simulator for Multi-Armed Bandits with Delayed Feedback (for my MSc Thesis).
  • A fast QP method for SVM-style outlier detection.
  • Code for the TreeCV algorithm (for our IJCAI-2015 paper).