In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probability, as a special case where the probability of the hypothesis given the uninformative observation is equal to the probability without it. If $A$ is the hypothesis, and $B$ and $C$ are observations, conditional independence can be stated as an equality:
$$P(A \mid B, C) = P(A \mid C)$$
where $P(A \mid B, C)$ is the probability of $A$ given both $B$ and $C$. Since the probability of $A$ given $C$ is the same as the probability of $A$ given both $B$ and $C$, this equality expresses that $B$ contributes nothing to the certainty of $A$. In this case, $A$ and $B$ are said to be conditionally independent given $C$, written symbolically as: $(A \perp\!\!\!\perp B \mid C)$.
The concept of conditional independence is essential to graph-based theories of statistical inference, as it establishes a mathematical relation between a collection of conditional statements and a graphoid.
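Let $A$, $B$, and $C$ be events. $A$ and $B$ are said to be conditionally independent given $C$ if and only if $P(C) > 0$ and:
$$P(A \mid B, C) = P(A \mid C)$$
This property is often written: $(A \perp\!\!\!\perp B \mid C)$.
Equivalently, conditional independence may be stated as:
$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$$
where $P(A, B \mid C)$ is the joint probability of $A$ and $B$ given $C$. This alternate formulation states that $A$ and $B$ are independent events, given $C$.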
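The two formulations can be compared on a small finite probability space. The following is a minimal sketch, not taken from the source: the sample space (three bits with made-up weights) and the helper names `weight`, `prob` and `cond_prob` are assumptions chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: outcomes are bit triples (a, b, c).
# c is a fair bit; given c, the bits a and b are drawn independently,
# each equal to 1 with probability 1/4 (if c == 0) or 3/4 (if c == 1).
def weight(a, b, c):
    p = Fraction(3, 4) if c else Fraction(1, 4)
    pa = p if a else 1 - p
    pb = p if b else 1 - p
    return Fraction(1, 2) * pa * pb

space = {o: weight(*o) for o in product((0, 1), repeat=3)}

def prob(event):
    """P(event), summing the weights of the outcomes in the event."""
    return sum(w for o, w in space.items() if event(o))

def cond_prob(event, given):
    """P(event | given); assumes P(given) > 0."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o[0] == 1
B = lambda o: o[1] == 1
C = lambda o: o[2] == 1

# First formulation: P(A | B, C) == P(A | C)
lhs1 = cond_prob(A, lambda o: B(o) and C(o))
rhs1 = cond_prob(A, C)

# Second formulation: P(A, B | C) == P(A | C) P(B | C)
lhs2 = cond_prob(lambda o: A(o) and B(o), C)
rhs2 = cond_prob(A, C) * cond_prob(B, C)

print(lhs1, rhs1)  # 3/4 3/4
print(lhs2, rhs2)  # 9/16 9/16
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested with `==` rather than with a floating-point tolerance.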
The discussion on StackExchange provides a couple of useful examples. See below.^{[1]}
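Each cell represents a possible outcome. The events $R$, $B$ and $Y$ are represented by the areas shaded red, blue and yellow respectively. The overlap between the events $R$ and $B$ is shaded purple.
The probabilities of these events are shaded areas with respect to the total area. In both examples $R$ and $B$ are conditionally independent given $Y$ because:
$$\Pr(R \cap B \mid Y) = \Pr(R \mid Y)\Pr(B \mid Y)$$
but not conditionally independent given $[\text{not } Y]$ because:
$$\Pr(R \cap B \mid \text{not } Y) \neq \Pr(R \mid \text{not } Y)\Pr(B \mid \text{not } Y)$$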
Let the two events be that persons A and B each get home in time for dinner, and let the third event be that a snow storm hit the city. While both A and B have a lower probability of getting home in time for dinner, the two events will still be independent of each other given the storm. That is, the knowledge that A is late does not tell you whether B will be late. (They may be living in different neighborhoods, traveling different distances, and using different modes of transportation.) However, if you have information that they live in the same neighborhood, use the same transportation, and work at the same place, then the two events are NOT conditionally independent.
Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other. Looking at the result of one die will not tell you about the result of the second die. (That is, the two dice are independent.) If, however, the first die's result is a 3, and someone tells you about a third event (that the sum of the two results is even), then this extra unit of information restricts the options for the second result to an odd number. In other words, two events can be independent, but NOT conditionally independent.
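The dice example can be verified by enumerating all 36 outcomes. This is an illustrative sketch only; the helper `prob` and the event names are assumptions introduced here, and fair six-sided dice are assumed.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (d1, d2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), by counting equally likely outcomes."""
    pool = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))

first_is_3 = lambda o: o[0] == 3
second_is_odd = lambda o: o[1] % 2 == 1
sum_is_even = lambda o: (o[0] + o[1]) % 2 == 0

# Without the third event, the first die says nothing about the second.
p_odd = prob(second_is_odd)
p_odd_given_3 = prob(second_is_odd, first_is_3)
print(p_odd, p_odd_given_3)  # 1/2 1/2

# Given that the sum is even, the first die determines the parity of the second.
p_odd_given_even = prob(second_is_odd, sum_is_even)
p_odd_given_both = prob(second_is_odd, lambda o: first_is_3(o) and sum_is_even(o))
print(p_odd_given_even, p_odd_given_both)  # 1/2 1
```

The first pair of values agrees (independence), while the second pair differs (no conditional independence given an even sum).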
Height and vocabulary are dependent, since very small people tend to be children, who are known for their more basic vocabularies. But if we know that two people are both 19 years old (i.e., conditional on age), there is no reason to think that one person's vocabulary is larger if we are told that they are taller.
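Two random variables $X$ and $Y$ are conditionally independent given a third discrete random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$. That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, given any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$. Formally:
$$(X \perp\!\!\!\perp Y) \mid Z \quad \iff \quad F_{X,Y \mid Z=z}(x,y) = F_{X \mid Z=z}(x) \cdot F_{Y \mid Z=z}(y) \quad \text{for all } x, y, z \qquad \text{(Eq.2)}$$
where $F_{X,Y \mid Z=z}(x,y) = \Pr(X \leq x, Y \leq y \mid Z = z)$ is the conditional cumulative distribution function of $X$ and $Y$ given $Z$.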
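For discrete random variables, the factorisation of the conditional distribution functions is equivalent to the factorisation of the conditional probability mass functions, $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$. The sketch below tests that condition on a joint table; the function name `conditionally_independent` and both example tables are hypothetical, chosen only to illustrate the definition.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def conditionally_independent(joint):
    """joint maps (x, y, z) -> probability.

    Checks p(x, y, z) * p(z) == p(x, z) * p(y, z) for all x, y, z, which is
    p(x, y | z) = p(x | z) p(y | z) with p(z) multiplied through, so that
    no division is needed.
    """
    pz, pxz, pyz = defaultdict(Fraction), defaultdict(Fraction), defaultdict(Fraction)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[x, z] += p
        pyz[y, z] += p
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    return all(
        joint.get((x, y, z), Fraction(0)) * pz[z] == pxz[x, z] * pyz[y, z]
        for x, y, z in product(xs, ys, pz)
    )

def bern(p):
    """pmf of a 0/1 variable that equals 1 with probability p."""
    return {1: p, 0: 1 - p}

# Made-up table in which X and Y are drawn independently once Z is fixed.
px_given_z = {0: bern(Fraction(1, 3)), 1: bern(Fraction(1, 2))}
py_given_z = {0: bern(Fraction(1, 5)), 1: bern(Fraction(4, 5))}
ci_joint = {
    (x, y, z): Fraction(1, 2) * px_given_z[z][x] * py_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}

# Counter-example: X and Y are fair independent bits and Z = X XOR Y.
xor_joint = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

print(conditionally_independent(ci_joint))   # True
print(conditionally_independent(xor_joint))  # False
```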
Two events $A$ and $B$ are conditionally independent given a σ-algebra $\Sigma$ if
$$\Pr(A \cap B \mid \Sigma) = \Pr(A \mid \Sigma)\Pr(B \mid \Sigma) \quad \text{a.s.}$$
where $\Pr(E \mid \Sigma)$ denotes the conditional expectation of the indicator function of the event $E$, $\chi_E$, given the σ-algebra $\Sigma$. That is,
$$\Pr(E \mid \Sigma) := \operatorname{E}[\chi_E \mid \Sigma].$$
Two random variables $X$ and $Y$ are conditionally independent given a σ-algebra $\Sigma$ if the above equation holds for all $A$ in $\sigma(X)$ and $B$ in $\sigma(Y)$.
Two random variables $X$ and $Y$ are conditionally independent given a random variable $W$ if they are independent given $\sigma(W)$: the σ-algebra generated by $W$. This is commonly written:
$$X \perp\!\!\!\perp Y \mid W \quad \text{or} \quad X \perp Y \mid W$$
This is read "$X$ is independent of $Y$, given $W$"; the conditioning applies to the whole statement: "($X$ is independent of $Y$) given $W$".
If $W$ assumes a countable set of values, this is equivalent to the conditional independence of $X$ and $Y$ for the events of the form $[W = w]$. Conditional independence of more than two events, or of more than two random variables, is defined analogously.
The following two examples show that $X \perp\!\!\!\perp Y$ neither implies nor is implied by $(X \perp\!\!\!\perp Y) \mid W$. First, suppose $W$ is 0 with probability 0.5 and 1 otherwise. When $W = 0$ take $X$ and $Y$ to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When $W = 1$, $X$ and $Y$ are again independent, but this time they take the value 1 with probability 0.99. Then $(X \perp\!\!\!\perp Y) \mid W$. But $X$ and $Y$ are dependent, because $\Pr(X = 0) < \Pr(X = 0 \mid Y = 0)$. This is because $\Pr(X = 0) = 0.5$, but if $Y = 0$ then it is very likely that $W = 0$ and thus that $X = 0$ as well, so $\Pr(X = 0 \mid Y = 0) > 0.5$. For the second example, suppose $X \perp\!\!\!\perp Y$, each taking the values 0 and 1 with probability 0.5. Let $W$ be the product $X \cdot Y$. Then when $W = 0$, $\Pr(X = 0 \mid W = 0) = 2/3$, but $\Pr(X = 0 \mid Y = 0, W = 0) = 1/2$, so $(X \perp\!\!\!\perp Y) \mid W$ is false. This is also an example of explaining away. See Kevin Murphy's tutorial^{[3]} where $X$ and $Y$ take the values "brainy" and "sporty".
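The numbers in both examples can be reproduced exactly by building the joint distributions explicitly. The sketch below is illustrative; the helper `cond` and the variable names are assumptions introduced here, and outcomes are encoded as triples (x, y, w).

```python
from fractions import Fraction
from itertools import product

def cond(joint, event, given=lambda o: True):
    """P(event | given) for a joint pmf mapping outcomes (x, y, w) to probabilities."""
    den = sum(p for o, p in joint.items() if given(o))
    num = sum(p for o, p in joint.items() if given(o) and event(o))
    return num / den

x0 = lambda o: o[0] == 0
y0 = lambda o: o[1] == 0
w0 = lambda o: o[2] == 0

# Example 1: W is a fair bit; given W = w, X and Y independently
# equal w with probability 0.99.
hi, lo = Fraction(99, 100), Fraction(1, 100)
ex1 = {
    (x, y, w): Fraction(1, 2) * (hi if x == w else lo) * (hi if y == w else lo)
    for x, y, w in product((0, 1), repeat=3)
}
print(cond(ex1, x0), cond(ex1, x0, y0))  # 1/2 4901/5000
# Conditionally on W = 0, learning Y = 0 changes nothing:
print(cond(ex1, x0, w0), cond(ex1, x0, lambda o: y0(o) and w0(o)))  # 99/100 99/100

# Example 2: X and Y are fair independent bits and W = X * Y.
ex2 = {(x, y, x * y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}
print(cond(ex2, x0), cond(ex2, x0, y0))  # 1/2 1/2
print(cond(ex2, x0, w0), cond(ex2, x0, lambda o: y0(o) and w0(o)))  # 2/3 1/2
```

In the first example the unconditional probabilities differ while the conditional ones agree; in the second the pattern is reversed.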
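Two random vectors $\mathbf{X} = (X_1, \ldots, X_l)^{\mathrm{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_m)^{\mathrm{T}}$ are conditionally independent given a third random vector $\mathbf{Z} = (Z_1, \ldots, Z_n)^{\mathrm{T}}$ if and only if they are independent in their conditional cumulative distribution given $\mathbf{Z}$. Formally:
$$(\mathbf{X} \perp\!\!\!\perp \mathbf{Y}) \mid \mathbf{Z} \quad \iff \quad F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x},\mathbf{y}) = F_{\mathbf{X} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}) \cdot F_{\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y}, \mathbf{z} \qquad \text{(Eq.3)}$$
where $\mathbf{x} = (x_1, \ldots, x_l)^{\mathrm{T}}$, $\mathbf{y} = (y_1, \ldots, y_m)^{\mathrm{T}}$ and $\mathbf{z} = (z_1, \ldots, z_n)^{\mathrm{T}}$ and the conditional cumulative distributions are defined as follows.
$$F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x},\mathbf{y}) = \Pr(X_1 \leq x_1, \ldots, X_l \leq x_l, Y_1 \leq y_1, \ldots, Y_m \leq y_m \mid Z_1 = z_1, \ldots, Z_n = z_n)$$
$$F_{\mathbf{X} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}) = \Pr(X_1 \leq x_1, \ldots, X_l \leq x_l \mid Z_1 = z_1, \ldots, Z_n = z_n)$$
$$F_{\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{y}) = \Pr(Y_1 \leq y_1, \ldots, Y_m \leq y_m \mid Z_1 = z_1, \ldots, Z_n = z_n)$$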
Let $p$ be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses $n$ voters randomly from the population. For $i = 1, \ldots, n$, let $X_i = 1$ if the $i$th chosen voter will vote "yes" and $X_i = 0$ otherwise.
In a frequentist approach to statistical inference one would not attribute any probability distribution to $p$ (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that $X_1, \ldots, X_n$ are independent random variables.
By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to $p$ regardless of the nonexistence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that $p$ is in any interval to which a probability is assigned. In that model, the random variables $X_1, \ldots, X_n$ are not independent, but they are conditionally independent given the value of $p$. In particular, if a large number of the $X$s are observed to be equal to 1, that would imply a high conditional probability, given that observation, that $p$ is near 1, and thus a high conditional probability, given that observation, that the next $X$ to be observed will be equal to 1.
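The Bayesian reading can be made concrete with a small exact computation. The sketch below is only an illustration under a loudly stated assumption: the prior on $p$ (uniform on the grid 0, 1/10, ..., 1) is made up for this example and does not come from the text, as is the helper name `prob_all_yes`.

```python
from fractions import Fraction

# ASSUMED prior (not from the text): p is uniform on the grid 0, 1/10, ..., 1.
grid = [Fraction(k, 10) for k in range(11)]
prior = {p: Fraction(1, len(grid)) for p in grid}

def prob_all_yes(n, belief):
    """P(X_1 = ... = X_n = 1) under the given belief about p.

    Given p, the X_i are conditionally independent with P(X_i = 1 | p) = p,
    so the probability of n "yes" answers given p is p**n; the law of total
    probability then averages over the belief.
    """
    return sum(w * p**n for p, w in belief.items())

p_first_yes = prob_all_yes(1, prior)
p_both_yes = prob_all_yes(2, prior)
p_second_given_first = p_both_yes / p_first_yes

print(p_first_yes)           # 1/2
print(p_both_yes)            # 7/20, not 1/4: X_1 and X_2 are dependent
print(p_second_given_first)  # 7/10

# After eight "yes" answers in a row, the next "yes" becomes very likely.
p_next_after_8 = prob_all_yes(9, prior) / prob_all_yes(8, prior)
print(float(p_next_after_8) > 0.9)  # True
```

Marginally, observing one "yes" raises the probability of the next from 1/2 to 7/10, even though the answers are independent once $p$ is fixed.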
A set of rules governing statements of conditional independence has been derived from the basic definition.^{[4]}^{[5]}
These rules were termed "Graphoid Axioms" by Pearl and Paz,^{[6]} because they hold in graphs, where $X \perp\!\!\!\perp A \mid B$ is interpreted to mean: "All paths from X to A are intercepted by the set B".^{[7]}
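Symmetry
$$X \perp\!\!\!\perp Y \quad \Rightarrow \quad Y \perp\!\!\!\perp X$$
Decomposition
$$X \perp\!\!\!\perp A, B \quad \Rightarrow \quad X \perp\!\!\!\perp A \ \text{ and } \ X \perp\!\!\!\perp B$$
Proof
$$p_{X,A,B}(x,a,b) = p_X(x)\,p_{A,B}(a,b) \qquad \text{(meaning of } X \perp\!\!\!\perp A, B\text{)}$$
$$\int_B p_{X,A,B}(x,a,b)\,db = \int_B p_X(x)\,p_{A,B}(a,b)\,db \qquad \text{(integrate out the variable } B\text{)}$$
$$p_{X,A}(x,a) = p_X(x)\,p_A(a)$$
A similar proof shows the independence of X and B.
Weak union
$$X \perp\!\!\!\perp A, B \quad \Rightarrow \quad X \perp\!\!\!\perp A \mid B \ \text{ and } \ X \perp\!\!\!\perp B \mid A$$
Proof
By assumption, $\Pr(X) = \Pr(X \mid A, B)$. Due to the property of decomposition, $X \perp\!\!\!\perp B$, so $\Pr(X) = \Pr(X \mid B)$. Combining these two equalities gives $\Pr(X \mid B) = \Pr(X \mid A, B)$, which establishes $X \perp\!\!\!\perp A \mid B$.
The second condition can be proved similarly.
Contraction
$$X \perp\!\!\!\perp A \mid B \ \text{ and } \ X \perp\!\!\!\perp B \quad \Rightarrow \quad X \perp\!\!\!\perp A, B$$
Proof
This property can be proved by noticing $\Pr(X \mid A, B) = \Pr(X \mid B) = \Pr(X)$, each equality of which is asserted by $X \perp\!\!\!\perp A \mid B$ and $X \perp\!\!\!\perp B$, respectively.
Intersection
For strictly positive probability distributions,^{[5]} the following also holds:
$$X \perp\!\!\!\perp Y \mid Z, W \ \text{ and } \ X \perp\!\!\!\perp W \mid Z, Y \quad \Rightarrow \quad X \perp\!\!\!\perp W, Y \mid Z$$
Proof
By assumption:
$$P(X \mid Z, W, Y) = P(X \mid Z, W) \ \land \ P(X \mid Z, W, Y) = P(X \mid Z, Y) \quad \Rightarrow \quad P(X \mid Z, Y) = P(X \mid Z, W)$$
Using this equality, together with the law of total probability applied to $P(X \mid Z)$:
$$P(X \mid Z) = \sum_{w} P(X \mid Z, W = w)\,P(W = w \mid Z) = \sum_{w} P(X \mid Z, Y)\,P(W = w \mid Z) = P(X \mid Z, Y) \sum_{w} P(W = w \mid Z) = P(X \mid Z, Y)$$
Since $P(X \mid Z, W, Y) = P(X \mid Z, Y)$ and $P(X \mid Z, Y) = P(X \mid Z)$, it follows that $P(X \mid Z, W, Y) = P(X \mid Z) \iff X \perp\!\!\!\perp Y, W \mid Z$.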
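Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say $K$. For example, $X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X$ would also mean that $X \perp\!\!\!\perp Y \mid K \Rightarrow Y \perp\!\!\!\perp X \mid K$.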
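Rules of this kind can be spot-checked numerically on small discrete distributions. The following sketch is not a proof and is not taken from the source: the helpers `marginal` and `indep`, the coordinate ordering (X, A, B), and both example tables are assumptions made for illustration. It checks the decomposition and weak union rules on one table, and shows on a second table that the converse of decomposition can fail.

```python
from fractions import Fraction
from itertools import product

def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in the tuple idx."""
    out = {}
    for o, p in joint.items():
        key = tuple(o[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def indep(joint, xs, ys, zs=()):
    """True if coordinates xs are independent of coordinates ys given zs,
    tested as p(x, y, z) * p(z) == p(x, z) * p(y, z) for all value combinations."""
    pxyz = marginal(joint, xs + ys + zs)
    pxz = marginal(joint, xs + zs)
    pyz = marginal(joint, ys + zs)
    pz = marginal(joint, zs)
    zero = Fraction(0)
    for xv, yv, zv in product(marginal(joint, xs), marginal(joint, ys), pz):
        if pxyz.get(xv + yv + zv, zero) * pz[zv] != pxz.get(xv + zv, zero) * pyz.get(yv + zv, zero):
            return False
    return True

X, A, B = (0,), (1,), (2,)  # coordinate positions in each outcome (x, a, b)

# Made-up table: X is a fair bit independent of the correlated pair (A, B).
pab = {(a, b): Fraction(2, 5) if a == b else Fraction(1, 10)
       for a, b in product((0, 1), repeat=2)}
joint = {(x, a, b): Fraction(1, 2) * pab[a, b] for x, a, b in product((0, 1), repeat=3)}

print(indep(joint, X, A + B))                    # True  (premise)
print(indep(joint, X, A), indep(joint, X, B))    # True True  (decomposition)
print(indep(joint, X, A, B), indep(joint, X, B, A))  # True True  (weak union)

# Converse of decomposition fails: X = A XOR B with A, B fair independent bits.
xor_joint = {(a ^ b, a, b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}
print(indep(xor_joint, X, A), indep(xor_joint, X, B))  # True True
print(indep(xor_joint, X, A + B))                      # False
```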