Khashayar Khosravi
For the most up-to-date list of my publications, please see my Google Scholar page.
Working Papers
Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms (Joint work with M. Bayati, N. Hamidi, and R. Johari) We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms \(k\) is \(T^{\alpha}\), where \(T\) is the time horizon and \(\alpha\) is in \((0,1)\). We consider a Bayesian setting where the reward distribution of each arm is drawn independently from a common prior, and provide a complete analysis of expected regret with respect to this prior. Our results exhibit a sharp distinction around \(\alpha = 1/2\). When \(\alpha < 1/2\), the fundamental lower bound on regret is \(\Omega(k)\), and it is achieved by a standard UCB algorithm. When \(\alpha > 1/2\), the fundamental lower bound on regret is \(\Omega(\sqrt{T})\), and it is achieved by an algorithm that first subsamples \(\sqrt{T}\) arms uniformly at random and then runs UCB on just this subset. Interestingly, we also find that a sufficiently large number of arms allows the decision-maker to benefit from “free” exploration if she simply uses a greedy algorithm. In particular, the greedy algorithm exhibits a regret of \(\tilde{O}(\max(k,T/\sqrt{k}))\), which translates to a sublinear (though not optimal) regret in the time horizon. We show empirically that this is because the greedy algorithm rapidly discards underperforming arms, a beneficial trait in the many-armed regime. Technically, our analysis of the greedy algorithm involves a novel application of the Lundberg inequality, an upper bound on the ruin probability of a random walk.
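The subsampling strategy above can be sketched in a few lines — an illustrative toy with Bernoulli arms and UCB1-style indices; the constants and confidence bounds in the paper may differ:

```python
import math
import random

def subsampled_ucb(arm_means, T, rng):
    """Subsample ~sqrt(T) arms uniformly at random, then run UCB1 on the subset.

    A minimal sketch of the subsampling idea; `arm_means` are the (unknown to
    the algorithm) Bernoulli means used only to simulate rewards.
    """
    m = min(len(arm_means), max(1, int(math.sqrt(T))))
    subset = rng.sample(range(len(arm_means)), m)
    counts = [0] * m
    sums = [0.0] * m
    total = 0.0
    for t in range(1, T + 1):
        if t <= m:
            i = t - 1  # pull each subsampled arm once first
        else:
            # UCB1 index: empirical mean plus a confidence bonus
            i = max(range(m),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        r = 1.0 if rng.random() < arm_means[subset[i]] else 0.0
        counts[i] += 1
        sums[i] += r
        total += r
    return total, m
```

With \(k = 25\) arms and \(T = 100\), the algorithm keeps only \(\sqrt{T} = 10\) of them, which is exactly the regime the regret analysis above concerns.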
Non-Parametric Inference Adaptive to Intrinsic Dimension (Joint work with G. Lewis and V. Syrgkanis) We consider non-parametric estimation and inference for conditional moment models in high dimensions. We show that even when the dimension \(D\) of the conditioning variable is larger than the sample size \(n\), estimation and inference are feasible as long as the distribution of the conditioning variable has small intrinsic dimension \(d\), as measured by the doubling dimension. Our estimator is based on a subsampled ensemble of the \(k\)-nearest neighbors \(Z\)-estimator. We show that if the intrinsic dimension of the covariate distribution is equal to \(d\), then the finite sample estimation error of our estimator is of order \(n^{-1/(d+2)}\) and our estimate is \(n^{1/(d+2)}\)-asymptotically normal, irrespective of \(D\). We discuss extensions and applications to heterogeneous treatment effect estimation.
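The subsampled \(k\)-NN ensemble can be sketched for the special case of conditional mean estimation — the paper handles general \(Z\)-estimation problems, and all names and defaults here are illustrative:

```python
import random

def subsampled_knn_mean(X, y, x0, k=5, n_sub=30, sub_size=None, rng=None):
    """Average k-NN estimates of E[y | x = x0] over random subsamples.

    A toy sketch of the subsampled k-NN ensemble idea: each ensemble member
    finds the k nearest neighbors of x0 within its own random subsample.
    """
    rng = rng or random.Random(0)
    n = len(X)
    sub_size = sub_size or max(k, n // 2)
    preds = []
    for _ in range(n_sub):
        idx = rng.sample(range(n), sub_size)
        # sort the subsample by squared Euclidean distance to x0
        idx.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x0)))
        nearest = idx[:k]
        preds.append(sum(y[i] for i in nearest) / k)
    return sum(preds) / n_sub
```

Note that only distances to \(x_0\) matter, which is why the estimator can adapt to a low-dimensional support even when the ambient dimension \(D\) is large.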
Matrix Completion Methods for Causal Panel Data Models, Github Repository (Joint work with S. Athey, M. Bayati, N. Doudchenko, and G. Imbens) A central tool in causal inference is treatment effect estimation from observational data, an analysis made challenging by potential confounders. We consider panel data models and, motivated by recent advances in the matrix-completion literature, introduce and study a class of estimators that minimize the distance between the estimated matrix and the observed matrix while favoring less complex models. We prove consistency of this estimator by generalizing existing results in the matrix-completion literature to allow the patterns of missing data to include a dependency structure. We also present new insights concerning the connections between the interactive fixed effects models and the literatures on program evaluation under unconfoundedness as well as on synthetic control methods.
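The "fit the observed entries while penalizing complexity" recipe can be illustrated with nuclear-norm-regularized matrix completion via iterative singular-value soft-thresholding (the generic SOFT-IMPUTE scheme of Mazumder et al.) — a stand-in sketch of the estimator class discussed above, not the paper's exact procedure or its dependent-missingness analysis:

```python
import numpy as np

def soft_impute(Y, mask, lam=1.0, n_iters=100):
    """Complete a partially observed matrix by iterative SVD soft-thresholding.

    Y:    observed matrix (entries outside `mask` are ignored)
    mask: boolean array, True where Y is observed
    lam:  nuclear-norm penalty; larger values favor lower-rank estimates
    """
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iters):
        # Keep observed entries, fill missing ones with the current estimate.
        Z = np.where(mask, Y, L)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        L = (U * s) @ Vt
    return L
```

With `lam = 0` and a fully observed matrix the procedure returns the matrix itself; as `lam` grows, small singular values are zeroed out, trading fidelity for a simpler (lower-rank) model.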
Published Papers
Mostly Exploration-Free Algorithms for Contextual Bandits (Joint work with H. Bastani and M. Bayati) Management Science (forthcoming). The contextual bandits literature usually focuses on algorithms that trade off exploration against exploitation. Exploration-free greedy algorithms are highly desirable in settings where exploration is costly or unethical (e.g., clinical trials), but they may be suboptimal in general. We prove that greedy algorithms are rate-optimal for the two-armed contextual bandit problem whenever there is sufficient randomness in the covariates. Furthermore, even absent this condition, we prove that greedy algorithms are optimal with some positive probability. Motivated by this result, we introduce and study the Greedy-First algorithm, which uses observed covariates and outcomes to determine whether to follow the greedy algorithm. We prove that Greedy-First is rate-optimal without any additional assumptions on the context distribution or the number of arms.
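The Greedy-First idea — act greedily, but monitor whether the contexts are generating enough exploration on their own — can be sketched as a toy simulation with ridge estimates and an eigenvalue-based switching check; the paper's actual test statistic and fallback algorithm differ:

```python
import numpy as np

def greedy_first(thetas, T, c=0.1, eps=0.1, seed=0):
    """Toy Greedy-First for a linear contextual bandit.

    thetas: (K, d) true arm parameters, used only to simulate noisy rewards.
    Acts greedily on ridge estimates; switches permanently to epsilon-greedy
    exploration if the minimum eigenvalue of some arm's covariance matrix
    grows too slowly. Returns whether a switch occurred.
    """
    rng = np.random.default_rng(seed)
    K, d = thetas.shape
    A = np.stack([np.eye(d) for _ in range(K)])  # ridge-regularized Gram matrices
    b = np.zeros((K, d))
    switched = False
    for t in range(1, T + 1):
        x = rng.normal(size=d)
        est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(K)])
        if switched and rng.random() < eps:
            k = int(rng.integers(K))   # forced exploration after switching
        else:
            k = int(np.argmax(est))    # greedy action
        r = thetas[k] @ x + rng.normal()
        A[k] += np.outer(x, x)
        b[k] += r * x
        # Greedy-First check: has every arm accumulated enough covariance?
        if not switched and t > 4 * K:
            lam_min = min(np.linalg.eigvalsh(A[k])[0] for k in range(K))
            switched = bool(lam_min < c * t / K)
    return switched
```

When the covariates are diverse enough, the minimum-eigenvalue condition keeps holding and the algorithm never explores — the "mostly exploration-free" behavior described above.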
Multiclass classification, information, divergence and surrogate risk (Joint work with J. C. Duchi and F. Ruan) Annals of Statistics 46(6B):3246-3275, 2018. In multiclass classification, the decision-maker aims to minimize the misclassification rate. This translates to the empirical zero-one loss minimization problem, which is NP-hard. Therefore, instead of minimizing the zero-one loss directly, researchers usually focus on minimizing a surrogate loss and study when the resulting estimator is consistent. In many applications, making decisions based on raw data is undesirable, as the data might be very high-dimensional and carry useless information. In this setting, instead of only learning an estimator (or discriminant function), the decision-maker also wishes to learn a quantizer. We characterize the equivalence between loss functions, meaning that optimizing either of two equivalent losses yields the same optimal discriminant and quantizer, complementing and generalizing the previous result of Nguyen et al. in the binary setting. We also provide a unifying view of statistical information measures, multiway Bayesian hypothesis testing, loss functions for multiclass classification problems, and multi-distribution \(f\)-divergences.
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Joint work with H. Inan and R. Socher) ICLR 2017. Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling. However, all such models are based on the conventional classification framework, where the model is trained against one-hot targets and each word is represented both as an input and as an output in isolation. This causes inefficiencies in learning, both in terms of utilizing all of the available information and in terms of the number of parameters needed for training. We introduce a novel theoretical framework that facilitates better learning in language modeling, and show that our framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables. Our framework achieves state-of-the-art performance on the Penn Treebank with a variety of network models.
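The weight-tying consequence can be shown schematically in a few lines — this illustrates only the parameter sharing, not the paper's full framework, which also derives an augmented loss motivating the tying:

```python
import numpy as np

vocab, hidden = 10000, 256
# One shared matrix E serves as both the input embedding and the output projection.
E = np.random.default_rng(0).normal(size=(vocab, hidden))

def embed(word_id):
    # Input side: look up the word's vector in E.
    return E[word_id]

def logits(h):
    # Output side: projecting the hidden state onto the vocabulary reuses E.
    return E @ h

# Tying halves the vocabulary-dependent parameter count.
untied_params = 2 * vocab * hidden   # separate input and output matrices
tied_params = vocab * hidden         # one shared matrix
```

Since the vocabulary is typically far larger than the hidden dimension, these matrices dominate the model's parameter count, which is why the reduction is substantial in practice.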
Other Publications
Finding the Winner of a Hypothesis Test via Sample Allocation Optimization (Joint work with K. Modarresi) Procedia Computer Science, Elsevier, Vol. 108, 2017.
Steganographic Schemes with Multiple \(q\)-ary Changes per Block of Pixels (Joint work with I. Gholampour) Signal Processing (Elsevier), Vol. 108, March 2015.
Interpolation of Steganographic Schemes (Joint work with I. Gholampour) Signal Processing (Elsevier), Vol. 98, May 2014.
Dissertations
Learning, Decision-Making, and Inference with Limited Experimentation, Doctoral Dissertation
Stanford University, September 2019.
