Boltzmann sampler

Summary

A Boltzmann sampler is an algorithm intended for random sampling of combinatorial structures. If the object size is viewed as its energy, and the argument of the corresponding generating function is interpreted in terms of the temperature of the physical system, then a Boltzmann sampler returns an object from a classical Boltzmann distribution.

The concept of Boltzmann sampler was proposed by Philippe Duchon, Philippe Flajolet, Guy Louchard and Gilles Schaeffer in 2004.[1]

Description edit

The concept of Boltzmann sampling is closely related to the symbolic method in combinatorics. Let   be a combinatorial class with an ordinary generating function   which has a nonzero radius of convergence  , i.e. is complex analytic. Formally speaking, if each object   is equipped with a non-negative integer size  , then the generating function   is defined as

 

where   denotes the number of objects   of size  . The size function is typically used to denote the number of vertices in a tree or in a graph, the number of letters in a word, etc.

A Boltzmann sampler for the class   with a parameter   such that  , denoted as   returns an object   with probability

 

Construction edit

Finite sets edit

If   is finite, then an element   is drawn with probability proportional to  .

Disjoint union edit

If the target class is a disjoint union of two other classes,  , and the generating functions   and   of   and   are known, then the Boltzmann sampler for   can be obtained as

 

where   stands for "if the random variable   is 1, then execute  , else execute  ". More generally, if the disjoint union is taken over a finite set, the resulting Boltzmann sampler can be represented using a random choice with probabilities proportional to the values of the generating functions.

Cartesian product edit

If   is a class constructed of ordered pairs   where   and  , then the corresponding Boltzmann sampler   can be obtained as

 

i.e. by forming a pair   with   and   drawn independently from   and  .

Sequence edit

If   is composed of all the finite sequences of elements of   with size of a sequence additively inherited from sizes of components, then the generating function of   is expressed as  , where   is the generating function of  . Alternatively, the class   admits a recursive representation   This gives two possibilities for  .

  1.  
  2.  

where   stands for "draw a random variable  ; if the value   is returned, then execute   independently   times and return the sequence obtained". Here,   stands for the geometric distribution  .

Recursive classes edit

As the first construction of the sequence operator suggests, Boltzmann samplers can be used recursively. If the target class   is a part of the system

 

where each of the expressions   involves only disjoint union, cartesian product and sequence operator, then the corresponding Boltzmann sampler is well defined. Given the argument value  , the numerical values of the generating functions can be obtained by Newton iteration.[2]

Labelled structures edit

Boltzmann sampling can be applied to labelled structures. For a labelled combinatorial class  , exponential generating function is used instead:

 

where   denotes the number of labelled objects   of size  . The operation of cartesian product and sequence need to be adjusted to take labelling into account, and the principle of construction remains the same.

In the labelled case, the Boltzmann sampler for a labelled class   is required to output an object   with probability

 

Labelled sets edit

In the labelled universe, a class   can be composed of all the finite sets of elements of a class   with order-consistent relabellings. In this case, the exponential generating function of the class   is written as

 

where   is the exponential generating function of the class  . The Boltzmann sampler for   can be described as

 

where   stands for the standard Poisson distribution  .

Labelled cycles edit

In the cycle construction, a class   is composed of all the finite sequences of elements of a class  , where two sequences are considered equivalent if they can be obtained by a cyclic shift. The exponential generating function of the class   is written as

 

where   is the exponential generating function of the class  . The Boltzmann sampler for   can be described as

 

where   describes the log-law distribution  .

Properties edit

Let   denote the random size of the generated object from  . Then, the size has the first and the second moment satisfying

  1.  
  2.  
  3.  .

Examples edit

Binary trees edit

The class   of binary trees can be defined by the recursive specification

 

and its generating function   satisfies an equation   and can be evaluated as a solution of the quadratic equation

 

The resulting Boltzmann sampler can be described recursively by

 

Set partitions edit

Consider various partitions of the set   into several non-empty classes, being disordered between themselves. Using symbolic method, the class   of set partitions can be expressed as

 

The corresponding generating function is equal to  . Therefore, Boltzmann sampler can be described as

 

where the positive Poisson distribution   is a Poisson distribution with a parameter   conditioned to take only positive values.

Further generalisations edit

The original Boltzmann samplers described by Philippe Duchon, Philippe Flajolet, Guy Louchard and Gilles Schaeffer[1] only support basic unlabelled operations of disjoint union, cartesian product and sequence, and two additional operations for labelled classes, namely the set and the cycle construction. Since then, the scope of combinatorial classes for which a Boltzmann sampler can be constructed, has expanded.

Unlabelled structures edit

The admissible operations for unlabelled classes include such additional operations as Multiset, Cycle and Powerset. Boltzmann samplers for these operations have been described by Philippe Flajolet, Éric Fusy and Carine Pivoteau.[3]

Differential specifications edit

Let   be a labelled combinatorial class. The derivative operation is defined as follows: take a labelled object   and replace an atom with the largest label with a distinguished atom without a label, therefore reducing a size of the resulting object by 1. If   is the exponential generating function of the class  , then the exponential generating function of the derivative class  is given by

 
A differential specification is a recursive specification of type
 

where the expression   involves only standard operations of union, product, sequence, cycle and set, and does not involve differentiation.

Boltzmann samplers for differential specifications have been constructed by Olivier Bodini, Olivier Roussel and Michèle Soria.[4]

Multi-parametric Boltzmann samplers edit

A multi-parametric Boltzmann distribution for multiparametric combinatorial classes is defined similarly to the classical case. Assume that each object   is equipped with the composition size   which is a vector of non-negative integer numbers. Each of the size functions   can reflect one of the parameters of a data structure, such as the number of leaves of certain colour in a tree, the height of the tree, etc. The corresponding multivariate generating function   is then associated with a multi-parametric class, and is defined as

 
A Boltzmann sampler for the multiparametric class   with a vector parameter   inside the domain of analyticity of  , denoted as

  returns an object   with probability

 

Multiparametric Boltzmann samplers have been constructed by Olivier Bodini and Yann Ponty.[5] A polynomial-time algorithm for finding the numerical values of the parameters   given the target parameter expectations, can be obtained by formulating an auxiliary convex optimisation problem[6]

Applications edit

Boltzmann sampling can be used to generate algebraic data types for the sake of property-based testing.[7]

Software edit

  • Random Discrete Objects Suite (RDOS): http://lipn.fr/rdos/
  • Combstruct package in Maple: https://www.maplesoft.com/support/help/Maple/view.aspx?path=combstruct
  • Haskell package Boltzmann Brain: https://github.com/maciej-bendkowski/boltzmann-brain

References edit

  1. ^ a b Duchon, Philippe; Flajolet, Philippe; Louchard, Guy; Schaeffer, Gilles (July 2004). "Boltzmann Samplers for the Random Generation of Combinatorial Structures". Combinatorics, Probability and Computing. 13 (4–5): 577–625. doi:10.1017/S0963548304006315. ISSN 0963-5483. S2CID 1634696.
  2. ^ Pivoteau, Carine; Salvy, Bruno; Soria, Michèle (November 2012). "Algorithms for combinatorial structures: Well-founded systems and Newton iterations". Journal of Combinatorial Theory, Series A. 119 (8): 1711–1773. arXiv:1109.2688. doi:10.1016/j.jcta.2012.05.007. ISSN 0097-3165.
  3. ^ Flajolet, Philippe; Fusy, Éric; Pivoteau, Carine (2007-01-06). "Boltzmann Sampling of Unlabelled Structures". 2007 Proceedings of the Fourth Workshop on Analytic Algorithmics and Combinatorics (ANALCO). Philadelphia, PA: Society for Industrial and Applied Mathematics: 201–211. doi:10.1137/1.9781611972979.5. ISBN 978-1-61197-297-9.
  4. ^ Bodini, Olivier; Roussel, Olivier; Soria, Michèle (December 2012). "Boltzmann samplers for first-order differential specifications". Discrete Applied Mathematics. 160 (18): 2563–2572. doi:10.1016/j.dam.2012.05.022. ISSN 0166-218X.
  5. ^ Bodini, Olivier Ponty, Yann. Multi-dimensional Boltzmann Sampling of Languages. OCLC 695180521.{{cite book}}: CS1 maint: multiple names: authors list (link)
  6. ^ Bendkowski, Maciej; Bodini, Olivier; Dovgal, Sergey (January 2018), "Polynomial tuning of multiparametric combinatorial samplers", 2018 Proceedings of the Fifteenth Workshop on Analytic Algorithmics and Combinatorics (ANALCO), Society for Industrial and Applied Mathematics, pp. 92–106, arXiv:1708.01212, doi:10.1137/1.9781611975062.9, ISBN 978-1-61197-506-2
  7. ^ Lampropoulos, Leonidas; Gallois-Wong, Diane; Hriţcu, Cătălin; Hughes, John; Pierce, Benjamin C.; Xia, Li-yao (2017-01-01). "Beginner's luck: A language for property-based generators". Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. POPL '17. New York, NY, USA: Association for Computing Machinery. pp. 114–129. arXiv:1607.05443. doi:10.1145/3009837.3009868. ISBN 978-1-4503-4660-3. S2CID 14378582.