On Universal Simulation of Information Sources Using Training Data
Neri Merhav¹, Marcelo J. Weinberger
Information Theory Research Group
HP Laboratories Palo Alto
HPL-2002-263
September 20th, 2002*
E-mail: merhav@ee.technion.ac.il, marcelo@hpl.hp.com
Keywords: random number generators, random process simulation, mutual information, typical sequences
We consider the problem of universal simulation of an unknown random process, or information source, from a certain parametric family, given a training sequence emitted by that source and a limited budget of purely random bits. The goal is to generate another random sequence (of the same length or shorter) whose probability law is identical to that of the training sequence, but with minimum statistical dependency (minimum mutual information) between the input training sequence and the output sequence. We derive lower bounds on this mutual information and show that they are achievable by conceptually simple algorithms proposed herein. The minimum achievable mutual information is shown to depend critically on the relative amount of random bits and on the lengths of the input and output sequences.
While in the ordinary (non-universal) simulation problem, the number of random bits per symbol must exceed the entropy rate H of the source in order to simulate it faithfully, in the universal simulation problem considered here, faithful preservation of the probability law is not a problem, yet the same minimum rate of H random bits per symbol is still needed to essentially eliminate the statistical dependency between the input sequence and the output sequence. The results are extended to more general information measures.
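The rate-H claim for the ordinary (non-universal) case can be illustrated with a small sketch; this is not the authors' universal scheme, but a standard interval-refinement generator in the spirit of the Han-Hoshi interval algorithm. It converts fair coin flips into Bernoulli(p) symbols and, when the leftover randomness in the interval is carried across symbols, consumes on average about h(p) bits per symbol, where h is the binary entropy function (the function name and parameters below are illustrative, not from the paper):

```python
import random
import math

def simulate_bernoulli(p, n, rng):
    """Draw n Bernoulli(p) symbols from fair coin flips, counting the
    flips consumed (interval-refinement / arithmetic-decoding style)."""
    lo, hi = 0.0, 1.0          # current uncertainty interval within [0, 1)
    out, bits = [], 0
    while len(out) < n:
        if hi <= p:            # interval entirely inside [0, p): emit 1
            out.append(1)
            lo, hi = lo / p, hi / p                      # rescale [0, p) -> [0, 1)
        elif lo >= p:          # interval entirely inside [p, 1): emit 0
            out.append(0)
            lo, hi = (lo - p) / (1 - p), (hi - p) / (1 - p)
        else:                  # undecided: spend one fair bit to halve it
            bits += 1
            mid = (lo + hi) / 2
            lo, hi = (lo, mid) if rng.getrandbits(1) == 0 else (mid, hi)
    return out, bits

p, n = 0.11, 200_000
out, bits = simulate_bernoulli(p, n, random.Random(7))
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy, ~0.50 here
print(f"empirical P(1) = {sum(out)/n:.3f}, bits/symbol = {bits/n:.3f}, h(p) = {h:.3f}")
```

For long runs, the empirical fraction of ones approaches p and the bit consumption per symbol approaches h(p), in line with the abstract's statement that faithful simulation requires at least H random bits per symbol.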
* Internal Accession Date Only. Approved for External Publication.
¹ With the Electrical Engineering Department, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel. This work was done while the author was visiting Hewlett-Packard Laboratories, Palo Alto, USA.
© Copyright Hewlett-Packard Company 2002