This section is aimed at students in upper secondary education in the Danish
school system, so some objects will be simplified and some details will be omitted.
Probability Space
A probability space is a mathematical construct associated with a random
process, and it consists of two objects:
1. A sample space Ω of the possible outcomes of the process.
2. A probability function
$$P:Ω\to[0,1]$$
that associates to each outcome
a probability.
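For example, one roll of a fair six-sided die is described by the sample space
$$Ω=\{1,2,3,4,5,6\}$$
with \(P(ω)=\frac{1}{6}\) for every outcome \(ω\); this die will serve as a running example below.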
Events
Events are sets of outcomes, so an event can be denoted as
$$A=\{ω_1,ω_2,\dots\}⊂Ω$$
and this allows us to extend the probability function from outcomes to
sets of outcomes in the following way
$$P(A)=\sum_{ω\in A}P(ω)=P(ω_1)+P(ω_2)+\cdots$$
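For the die, the event "the roll is even" is \(A=\{2,4,6\}\), and the formula gives
$$P(A)=P(2)+P(4)+P(6)=\frac{1}{6}+\frac{1}{6}+\frac{1}{6}=\frac{1}{2}$$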
This is actually why probability spaces usually include a third object,
the set of all the events, on which the probability function is defined.
In this setting, however, that is an unnecessary abstraction.
This formula allows us to consider three interesting events: the empty
event \(Ø\) (the symbol is inspired by the Danish letter Ø), which has the
property that
$$P(Ø)=0$$
the whole sample space, with
$$P(Ω)=1$$
and the complementary event,
$$A^c=Ω\backslash A=\{ω\in Ω:ω\notin A\}$$
which consists of all the outcomes that are not in \(A\).
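For the die, the complement of the even event \(A=\{2,4,6\}\) is the odd event \(A^c=\{1,3,5\}\).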
Set Theory
Events have several interesting properties that follow from set
theory. For starters, for two events \(A\) and \(B\) we can define
their union
$$A\cup B=\{ω\in Ω:ω\in A\vee ω\in B\}$$
i.e. all the outcomes that are in either \(A\) or \(B\), or both for that matter. We can also define
their intersection
$$A\cap B=\{ω\in Ω:ω\in A \wedge ω\in B\}$$
which are all the outcomes that are in both. We say that two events
are mutually exclusive if their intersection is empty, i.e.
\(A\cap B=Ø\) and \(P(A\cap B)=0\).
There are two sets that are always mutually exclusive, namely \(A\)
and its complementary event, \(A^c\). Let's consider two events that
are not necessarily mutually exclusive; the probability of their union
can then be expressed in the following terms
$$P(A\cup B)=P(A)+P(B)-P(A\cap B)$$
The explanation is that when we calculate the probabilities of \(A\) and
\(B\) separately, we count their intersection twice, once for
each event. If the events are mutually exclusive, the last term is
just 0, so the formula is still valid and the restriction is unnecessary.
As noted above, \(A\) and its complementary event \(A^c\) are mutually
exclusive by construction, and since \(A\cup A^c=Ω\) this means that
$$1=P(Ω)=P(A\cup A^c)=P(A)+P(A^c)⇒\boxed{P(A^c)=1-P(A)}$$
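As a quick check with the die: let \(A=\{2,4,6\}\) and \(B=\{1,2,3\}\), so \(A\cap B=\{2\}\) and
$$P(A\cup B)=\frac{1}{2}+\frac{1}{2}-\frac{1}{6}=\frac{5}{6}$$
which matches summing the probabilities of the five outcomes in \(A\cup B=\{1,2,3,4,6\}\) directly.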
Uniform Probability Spaces
In a probability space where each outcome is equally likely, i.e.
they all have the same probability, each outcome satisfies
$$P(ω)=\frac{1}{|Ω|}$$
and
$$P(A)=\frac{|A|}{|Ω|}$$
where \(|A|\) denotes the "size", sometimes called the
"cardinality", of the event.
Proof
Assume that all the outcomes have the same probability \(p\).
The fundamental property of the probability function,
namely that something has to happen, gives
$$1=P(Ω)=\sum_{ω\in Ω}P(ω)=|Ω|p$$
and isolating \(p\) yields the first formula. For the second formula
we do something similar:
$$P(A)=\sum_{ω\in A}P(ω)=|A|\frac{1}{|Ω|}=\boxed{\frac{|A|}{|Ω|}}$$
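To see the counting formula in action, here is a minimal Python sketch; the choice of two dice and the event "the sum is 7" are just illustrative assumptions, not part of the text above:

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered pairs of faces from two fair dice.
omega = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
A = [w for w in omega if sum(w) == 7]

# Uniform probability: P(A) = |A| / |Omega|.
p_A = Fraction(len(A), len(omega))
print(p_A)  # 1/6
```

Enumerating like this only works because every outcome is equally likely; in a non-uniform space the probabilities of the individual outcomes would have to be summed instead.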
Conditional Probability
The last result can actually be extended to arbitrary probability
spaces. Imagine a geometric shape with area 1 that represents the
whole sample space; colour in the different outcomes with areas
corresponding to their probabilities. The probability of an event
is then the total area of the outcomes it consists of. In this
picture we can take the size \(|A|\) to be the area of the event in
the diagram, and when we divide by \(|Ω|\) we're just dividing by 1.
This is the setting for conditional probabilities.
The probability of an event on the condition of another event is
denoted as
$$P(A|B)=\frac{P(A\cap B)}{P(B)}$$
where what we've done is replace the whole sample space with the
conditioning event \(B\). So the conditional probability is the
fraction of \(B\) that lies "within" \(A\), i.e. the area of
\(A\cap B\) relative to the area of \(B\).
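With the die: let \(B=\{2,4,6\}\) (an even roll) and \(A=\{4,5,6\}\) (a roll of at least 4). Then \(A\cap B=\{4,6\}\) and
$$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{2/6}{3/6}=\frac{2}{3}$$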
Bayes' Theorem
We can isolate the intersection in the last equation to yield
$$P(A\cap B)=P(A|B)P(B)=P(B|A)P(A)$$
which is the basis for the relatively simple, but powerful, Bayes'
theorem, which states that
$$P(A|B)=P(B|A)\frac{P(A)}{P(B)}$$
A classic situation where we can apply Bayes' theorem is testing
for infectious disease. Within this context there is a concept
called sensitivity, the ability of a test to correctly identify
infected people; this is \(P(+|I)\), the conditional probability
of testing positive given that you are infected. This number is
usually obtained experimentally, so Bayes' theorem can be used to
calculate the probability of being infected, given that you test
positive:
$$P(I|+)=P(+|I)\frac{P(I)}{P(+)}$$
Here the probability of being infected can be estimated from several
parameters, including the spread in the community and some health
parameters, and the probability of testing positive can potentially
be estimated from the number of people who actually test positive
with similar parameters to yours. There is another concept in the
field of testing for infectious disease, specificity, which is how
likely the test is to correctly identify a non-infected person, i.e.
\(P(-|I^c)\). This can then be used to estimate the probability of
not having the disease, given a negative test:
$$P(I^c|-)=P(-|I^c)\frac{P(I^c)}{P(-)}=P(-|I^c)
\frac{1-P(I)}{1-P(+)}$$
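As a worked example, here is a short Python sketch that computes \(P(I|+)\); the numbers are purely illustrative assumptions, not data from any real test. It uses the fact that a positive test comes either from an infected or from a non-infected person, two mutually exclusive events, so \(P(+)=P(+|I)P(I)+P(+|I^c)P(I^c)\), which follows from the formulas above:

```python
# Illustrative numbers only; real tests and prevalences differ.
sensitivity = 0.99   # P(+|I): positive given infected
specificity = 0.95   # P(-|I^c): negative given not infected
prevalence = 0.01    # P(I): probability of being infected

# Split the positives into infected and non-infected people:
# P(+) = P(+|I)P(I) + P(+|I^c)P(I^c), with P(+|I^c) = 1 - specificity.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: P(I|+) = P(+|I) * P(I) / P(+).
p_infected_given_positive = sensitivity * prevalence / p_positive

print(f"P(+)   = {p_positive:.4f}")    # 0.0594
print(f"P(I|+) = {p_infected_given_positive:.4f}")  # about 0.17
```

Note how a positive test in a low-prevalence setting still leaves the probability of actually being infected well below 1, which is exactly the kind of insight Bayes' theorem provides.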