
# How To Be Much Cleverer Than All Your Friends (so they really hate you)

### Part II: Being a Superbeing. Study Bayes, says Mike Alder. Cont. from Issue 51.

The promise of making you smarter is close to being kept. No short changing going on, no deception practised. Master the next part and you will be effectively smarter than Leibniz. For he knew there were rules for plausible reasoning, and even followed them, but he wasn’t able to say what they were. I’m shortly going to tell you.

We have, for any proposition B some value P(B) which is going to mean the extent to which we believe B. You can call P(B) the plausibility of the proposition B. And you can think of it as being a measure of the truth of B. It is true that you might assign one number to P(B) and I might assign a different one, but that might have been true even if we were only allowed to assign zero or one for false or true. Life is full of conflict. P(B) is required to be some number between 0 and 1 and to have the usual meaning of falsity and truth at the end points.

We have certain constraints. One is that in the cases where the values are in fact 0 or 1, we want to recover the usual rules of logic. We are trying to generalise Aristotelian logic, not make something totally different.

Another is a continuity assumption. Sometimes it makes sense to say that we can change a proposition continuously; for example ‘the length of this bar is x centimetres’ is a family of propositions depending on x, and changing continuously as x does. It is desirable that if for any x in some range, Bx is the above proposition, then P(Bx) changes continuously with x too. There should be no sudden jumps in the value of P(Bx) with x. When you cross the road, you don’t suddenly vanish from this side and reappear on the other, and degrees of belief shouldn’t change that way either. Of course, when some new information comes in that is substantial, then your value of P may change dramatically and quickly. But if the new information is almost identical to what you already knew, sudden jumps in P are unreasonable.

There are other reasonable properties we want our new logic to have; for example, if we take a coin and toss it a thousand times and get five hundred Heads and an equal number of Tails, then we would expect that any sane assessment of P(B), where B is the statement “next time I toss this coin it will come down Heads,” ought to be, in the light of the data, in the vicinity of one half. Happily, all the reasonable properties anyone could want are forced by a very small number of assumptions indeed. We don’t need to write down every requirement: given a modest few, the rest are guaranteed and follow by ordinary logic.
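One way of making ‘in the vicinity of one half’ precise is Laplace’s rule of succession, which is itself a little Bayesian calculation. Here is a sketch, assuming a uniform prior over the coin’s bias (the function name is mine):

```python
from fractions import Fraction

def laplace_estimate(heads, tosses):
    """Posterior mean of P(Heads) under a uniform prior over the
    coin's bias -- Laplace's rule of succession: (h + 1) / (n + 2)."""
    return Fraction(heads + 1, tosses + 2)

# Five hundred Heads in a thousand tosses:
print(laplace_estimate(500, 1000))   # 1/2
```

Five hundred Heads in a thousand tosses gives 501/1002, which is exactly one half; any sane rule ought to land close by.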

One thing we can do is to argue that if we have a proposition B, and we have a value P(B), then if A is some proposition that is logically equivalent to B, and we have P(A) then we ought to have:

RULE 1: If B and A are logically equivalent using classical logic, then P(B) = P(A).

Another thing we can argue is that if we have P(B) then we ought to be able to say what P(∼B) is. This should not depend on what B actually says: if B and A have the same plausibility value, that is if P(B) = P(A), then P(∼B) = P(∼A). So I shall give

RULE 2: P(∼B) is a continuous function of P(B).

You should feel free to brood over this and decide if some sort of confidence trick is being pulled here. I hope you will conclude that RULE 2 is reasonable. You might feel that we ought to come right out and put P(∼B) = 1-P(B), and this is certainly one possible continuous function, but let’s go with the weaker assumption.

Can we do the same thing with AND and OR? If I know the value of P(B) and the value of P(C), can I say what the value of P(B ∧ C) is? The answer to this is no, not if we want to have the interpretation we want. The example I give is from Ed Jaynes’ recent (and vastly entertaining) book Probability Theory: The Logic of Science, which has caused a certain stir in some quarters. Following Jaynes, we suppose there are fifty people in a room who have blue eyes and fifty who have brown eyes, and someone sends us one of them, picked by what rule we do not know. Then we might reasonably say that the proposition B, ‘The person’s right eye is blue’, has credibility value around one half. And the proposition C, ‘The person’s left eye is blue’, also has value about one half. Now the proposition B ∧ C says that both eyes are blue, which also has credibility value about one half. On the other hand if D is the proposition ‘The person’s left eye is brown’, then this also has value about one half, while B ∧ D, the statement that the person’s right eye is blue and his left eye is brown, has credibility close to zero. So if we are to use our generalised logic to have the meaning of credibility, we conclude that the value of P(B ∧ C) depends on what B and C actually are, not just on their credibility values.

Incidentally, Fuzzy Logic tries to actually force a value on B ∧ C which depends only on the value of B and the value of C, which tells us immediately that whatever Fuzzy Logic is about, credibility isn’t it.
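A computational sketch of Jaynes’ room makes the point concrete. This is my own toy model, assuming everyone has two eyes of the same colour and each of the hundred people is equally likely to be sent:

```python
# Jaynes' room: 50 people with two blue eyes, 50 with two brown eyes.
people = [("blue", "blue")] * 50 + [("brown", "brown")] * 50

def P(proposition):
    """Credibility: the fraction of possible people satisfying it."""
    return sum(1 for person in people if proposition(person)) / len(people)

B = lambda p: p[0] == "blue"    # "right eye is blue"
C = lambda p: p[1] == "blue"    # "left eye is blue"
D = lambda p: p[1] == "brown"   # "left eye is brown"

print(P(B), P(C), P(D))                  # 0.5 0.5 0.5
print(P(lambda p: B(p) and C(p)))        # 0.5
print(P(lambda p: B(p) and D(p)))        # 0.0
```

Same marginal values, P(B) = P(C) = P(D) = 0.5, but wildly different conjunctions: no function of the marginal values alone can produce both answers.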

We are similarly stopped if we try to work out P(B ∨ C) as a function of P(B) and P(C) only. We persevere however. We go to Modus Ponens, which is the law of inference

B
B → C
–––––
C

which says that if B is true and B implies C then we can safely deduce C. This can be turned into formal logic:

(B ∧ (B → C) ) → C

It can be made more symmetric by writing it in the equivalent form:

B ∧ (B → C) = B ∧ C

This says that to assert that B is true and that B implies C, is equivalent to asserting that both B and C are true. It is a tautology, a theorem of classical Aristotelian logic. The equals sign doesn’t mean that both sides are identical, it means that whenever the left side is true the right side is true, and vice versa.

Now by RULE 1, since the two sides are logically equivalent, the P value of the left hand side must equal the P value of the right hand side. It is not too far-fetched to believe that P(B ∧ C) is some continuous function of P(B) and P(B → C). So I postulate

RULE 3: P(B ∧ C) is a continuous function of P(B) and P(B → C).

And it might occur to you that just multiplying the values would work nicely, giving the right answer in the extreme cases where the P values are either 0 or 1.

The above three rules are called the Cox axioms, after the physicist Richard Cox, who wrote The Algebra of Probable Inference in 1961. It can be shown by somewhat messy algebra that if you accept these rules, then there is only one possibility for each of the functions; we must have P(∼C) = 1-P(C) and we must also have P(C ∧ B) = P(B → C) • P(B), where the dot on the right hand side means ordinary multiplication of the numbers.
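It takes only a moment to check that these two formulas really do collapse to classical logic at the extremes. A sketch, adopting the usual convention that B → C counts as true when B is false:

```python
from itertools import product

# At the extreme values 0 and 1, Cox's unique functions reproduce
# the classical truth tables for NOT and AND.
for b, c in product([0, 1], repeat=2):
    p_not_b = 1 - b                      # P(~B) = 1 - P(B)
    p_b_implies_c = c if b else 1        # classical B -> C (true when B false)
    p_b_and_c = p_b_implies_c * b        # P(C ∧ B) = P(B → C) · P(B)
    assert p_not_b == int(not b)         # matches classical NOT
    assert p_b_and_c == b * c            # matches classical AND
print("classical logic recovered at the extremes")
```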

One of the consequences of this is that we can define P for an implication:

P(B → C) = P(B ∧ C) / P(B)

Now this technically breaks one of our rules, the one that says that in the limiting case of probabilities being 0 or 1 we should reduce to classical logic. We run into the vexing problem of what happens to the truth of B → C when B is false.

This worries a lot of people when they first meet it. Philosophy students at university who embark on a course on Logic and are told that when B is false, B → C is true, are frequently baffled. One such, when assured by Bertrand Russell that this was so, challenged Russell to show that if 0 was equal to 1 then Bertrand Russell was the Pope. Russell proved it on the spot. I give a variant of his argument:

• If 0 = 1 then, by adding 1 to both sides we get 1 = 2, and by properties of = we deduce that 2 = 1.
• The set of people consisting of me and the Pope has two elements.
• Since 2 = 1 the set of people consisting of me and the Pope has one element.
• This can only happen if ‘me’ and ‘the Pope’ are different names for the same thing.
• So I am the Pope.

This is a valid argument, but many people feel unhappy about it. They feel even more unhappy about the claim that if you, the reader, are a tree then Bill Gates is a pauper and Microsoft is bankrupt. Nevertheless, this is a true statement according to the rules of Logic, although it is doubtful if even Bertrand Russell could have provided a convincing proof. There is a strong feeling, shared by many, that A → B ought to mean that the truth of A has something to do with the truth of B, and in order to placate the unhappy, Russell chose to call this → material implication, with the suggestion that the complainer was thinking of some subtly different kind of implication. Well, maybe he was thinking of the new sort of implication given by

P(B → C) = P(B ∧ C) / P(B)

This is simply not defined when B is false. And it behaves in a rather reasonable manner when B is not false, ranging from values near zero when C is rather unlikely given B, up to one when it is absolutely certain. Try putting in some statements for B and C, choose reasonable looking values for P(B) and P(B ∧ C), and verify that your belief in P(B → C), defined as P(B ∧ C) / P(B), behaves sensibly. Many pleasant hours can be passed doing this for a variety of Bs and Cs. You will find that P(B → C) when defined this way does indeed behave in a reasonable manner, reflecting your faith in B → C in simple cases. Since it would be misleading to pretend that the two sorts of implication are the same when they aren’t, I shall use the modern notation and write P(C|B), read as ‘P of C given B’, in place of P(B → C). My position on implication is that this was the implication we ought to have had, because we shouldn’t have used Aristotelian Logic in the first place. Had Aristotle been a bit smarter, we could have saved a few thousand years of muddle by doing logic the proper way, with a continuum of values, from the beginning. All classical logic was any good for was the simplest kinds of arguments, anyway. And building computers.
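Here is the recommended exercise mechanised. A sketch; the statements and numbers are invented for illustration:

```python
from fractions import Fraction

def P_given(p_b_and_c, p_b):
    """P(C|B) = P(B ∧ C) / P(B); not defined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("undefined when B is certainly false")
    return p_b_and_c / p_b

# B = "it is raining", C = "the pavement is wet" (made-up values):
# rain is fairly unlikely, but rain almost always comes with wet pavement.
print(P_given(Fraction(27, 100), Fraction(3, 10)))   # 9/10: strong faith in B → C
print(P_given(Fraction(3, 100), Fraction(3, 10)))    # 1/10: weak faith in B → C
```

Notice that a false B, P(B) = 0, raises an error rather than returning a value, exactly the behaviour the new implication is supposed to have.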

We might as well go the whole hog and use the probability theory formalism for everything else: we get for the Modus Ponens law:

P(B ∧ C) = P(C|B) • P(B)

which reads:

“P of both B and C is equal to P of C given B times P of B.”

In frequentist theory this is a definition of what is called ‘conditional probability’ but to us logicians it is just good old Modus Ponens in a generalised logic.
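Incidentally, since B ∧ C is logically equivalent to C ∧ B, RULE 1 says the generalised Modus Ponens can be written both ways round:

P(C|B) • P(B) = P(B ∧ C) = P(B|C) • P(C)

and dividing through by P(B), when it isn’t zero, gives the celebrated Bayes’ Theorem:

P(C|B) = P(B|C) • P(C) / P(B)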

With the new improved implication, the three axioms given are sufficient to deduce all of Bayesian probability theory; you have to throw in Logic as well of course to nail down the extreme case. The version of probability theory you get if you follow this line of thought is called Bayesian Probability. It has to be said that there is a sort of religious war going on in universities between Bayesians and Frequentists, and the religion of Bayesianity has been steadily making more adherents. My own view is that it is perfectly respectable to choose whichever interpretation seems convenient, depending on the problem, and that one ought not to get dogmatic about these choices. I suspect that any problem that can be solved in one interpretation can be solved in the other, but for any problem, one is usually easier than the other (and sometimes a lot easier) to work in. The drawback of my approach is that it is considered vile heresy by both religions, but I’d rather be an apostate than a nong.

Some people find the Bayesian perspective more natural and easier to defend. Jaynes’ book mentioned above is a lovely, polemical defence of Bayesian thinking and one of the more interesting books of the millennium. Get your local library to order a copy. No, I don’t get a commission, I just think it’s a great book and an exciting read, and bashing through it is going to give you some power-thinking skills that will beat the hell out of anything that master Yoda ever came out with.

So I have come to the crux of the case. If you want to, you can learn Bayesian probability theory. Start with Jaynes, it shouldn’t take more than about ten years to finish the book, assuming you don’t waste time on anything else, making it excellent value for money. If you do, you will be acquiring the skill of thinking in a non-Aristotelian Logic, just as advertised. This will make it possible for you to solve problems that are currently beyond your powers to even state let alone solve. People who can reason in such a way about the world are readily employable and useful members of society: we call them statisticians.

I claimed that mastering a non-Aristotelian logic makes you smarter and able to see things lesser mortals cannot. An example would help at this point; you can see a small problem though: if you are still a lesser mortal, how will you see it? Still, I shall give one anyway; it deals with the expected lifetime of the human species. Papers have been written explaining that it is very likely that the human race will be extinct within a few thousand years. The argument is one which the simple minded non-Bayesian might find convincing, but which the Bayesian super-mind can penetrate easily and dispose of as a pile of dingo-droppings. Naturally, since you are not, as yet, a Bayesian super-mind, you won’t follow this – but you may get the flavour of it.

Imagine that you are given a box which is fixed on a desk top and has a button on top.

You are told that the box may contain either ten balls or a thousand balls. All the balls are the same except that one and only one has your name printed on it. You are asked to decide which box you have here, the thousand ball box or the ten ball box. All you can do is to press the button, and you are told that when you do, a ball will fall out of the box.

You reason that you have to press the button eleven times. If the eleventh button press produces a ball, then it must have been the thousand ball box, since the ten ball box wouldn’t have anything to produce. So far we have conventional Aristotelian type reasoning.

You press the button once and a ball comes out. You press it again and another ball comes out. You press it again and a third ball comes out – and this one has your name on it.

You can now make a pretty good guess as to which box you have. It is one hundred times as likely to be the ten ball box as the thousand ball box. This result should agree with your intuitions if you have any. The Bayesian can provide a justification for this very quickly – but this is easy and understandable only for superbeings and you aren’t one yet. You should, however, be able to see that getting your name up in the first three goes is not too improbable if there are only ten balls in the box but is awfully unlikely if there are a thousand. And if it is a hundred times as unlikely, then the ten ball explanation ought to be about a hundred times as believable. This is the intuitive, common sense approach. To a Bayesian, it is not just plausible, it is blindingly obvious – although it requires some additional assumptions, which he or she can state precisely and you can’t. This is because as a result of using a powerful non-Aristotelian Logic, they are smarter than you. Annoying, isn’t it?
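The arithmetic behind ‘one hundred times as likely’ is short enough to write down. A sketch, assuming the balls come out in a uniformly random order and the two boxes start off equally credible, so that the posterior odds equal the likelihood ratio:

```python
from fractions import Fraction

def p_name_in_first_k(n_balls, k=3):
    """Chance the single named ball is among the first k to fall out,
    assuming every ordering of the n balls is equally likely."""
    return Fraction(k, n_balls)

# How much more believable is the ten-ball box than the thousand-ball box?
odds = p_name_in_first_k(10) / p_name_in_first_k(1000)
print(odds)   # 100
```

Your name in three draws has probability 3/10 from the small box and 3/1000 from the big one; the ratio is exactly the promised hundred.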

Now we come to the life of species. We accept the opinion of anthropologists that the human race has been in existence for less than a million years, and more than one hundred thousand. Just how long depends on how exactly you define human, so there is some unavoidable fluffiness about this time, but it does not affect the argument materially. Now consider the two possibilities: first that the human race will last another million years, and the second that humanity will be extinct within five thousand years.

In the first case, the total number of human beings who will ever have lived, a number which grows exponentially with time, becomes something colossal. The number in the first hundred thousand (or million) years is approximately twice the number currently on the planet, around six billion. If the present population level continues for a million years, a very modest assumption indeed, the total number who will ever have lived at the end of that time will be about ten to the power fourteen.

In the second case, with a lifetime of the species of five thousand years, the total number of human beings who will ever have existed is much smaller, only about fifty times as many as at present.

Now given these two possibilities, and given that you are alive at present, your name is on the ball, the lifetime for the species of five thousand years is much more likely than the lifetime of a million years. The probability that you would be here, right at the beginning, in the first fraction of a percent of all people, is obviously very small. The analogy with the boxes and balls is obvious and the same kind of reasoning gets you to the result. From which we deduce that the human race is likely to become extinct quite soon.
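For flavour, here is the naive Doomsday arithmetic using the figures above. This is a sketch of the argument as stated, not an endorsement of it: the hidden assumption, which the Bayesian makes explicit, is that you may treat yourself as a uniformly random draw from everyone who will ever live, with the two hypotheses equally credible beforehand.

```python
present = 6e9                   # current population
lived_so_far = 2 * present      # roughly twice the current population
total_short = 50 * present      # species extinct within ~5000 years
total_long = 1e14               # species lasts another million years

# Chance of finding yourself among those already born, under each hypothesis:
p_short = lived_so_far / total_short
p_long = lived_so_far / total_long

print(round(p_short / p_long))  # 333: short lifetime wins by hundreds to one
```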

You may find this conclusion utterly convincing or totally unconvincing. Some people may be found who take it very earnestly indeed, others think the argument sucks. It has provoked a lot of debate, and many pages of sometimes heated writings can be found. The point I wish to make is that amateurish argument of the ‘It seems to me …’ sort is a waste of time. A Bayesian can dispose of it quite quickly and make the underlying assumptions explicit in both the case of the two boxes and the two lifetimes. It is a simple example of a problem that can lead the ordinary muddled human being into endless hours of debate with no clear end in sight, but which the properly trained Bayesian thinker can cut through immediately. If you imagine an evil galactic overlord wishing to cause alarm and despondency by throwing the expected lifetime of the species at us, (“Har har, terran scum, you will be extinct soon anyway!!!”) and the beautiful girl falling into the arms of the man who can solve the problem in short order, you know what to do to collect the beautiful girl. If you don’t want a beautiful girl, preferring perhaps a handsome man or a few bottles of plonk, make the appropriate changes.

No, I won’t tell you the answer, unless you are a beautiful girl or a few bottles of plonk. If you want to dispose of the matter in a clean and compelling way, learn Bayesian probability theory and apply both your brains. You will also be able to solve a good many other more important problems.

Two final matters which might trouble the sceptic. First, is it possible that nobody can learn to be more intelligent, more competent, by these means, and that you simply have to be much cleverer than average to master the damned stuff in the first place? In other words, I am cheating you: the causality is the other way around. It is not that a training in probabilistic logic makes you smarter than average, it is that you have to be smarter than average in order to survive the training.

I am able to assure the sceptical reader that a close investigation of some of my colleagues who are professional statisticians has revealed no signs of innate intellectual superiority whatever. One cannot rule out the possibility that they are merely concealing superior minds, possibly in the hope of making more friends, but if so they are doing a very fine job of it.

The second worry is altogether a graver matter. Are there any side effects? I understand that being stretched ten centimetres on the rack does indeed make you taller, or at least longer, but gives a certain languor to the personality. Ex-rackees are said to spend a lot of the time lying down and are slow off the mark when pursued by bears or vampires. Are there similar undesirable side effects of being put on the mental rack, being made to learn Bayesian probability theory?

It is hard to say. A comparison between those who have learnt orthodox probability theory and statistics and those who have done the Bayesian theory would seem to indicate that the former does indeed have much the same effect as being stretched on the rack. The victims are frequently pallid and harassed looking, and low on humour and vivacity. They appear to have been trained beyond their natural intelligence. Bayesians on the other hand seem to be of a sunnier disposition, wittier and altogether better company. But this was a small sample and neither orthodox nor Bayesian statisticians would be inclined to build much on the data. You will just have to take your chances.