So recently I gave a talk at the H+ Summit in Los Angeles. However, I got the impression that the talk, which was about the fundamentals of Artificial General Intelligence (something I decided to call ‘foundations of AGI’) was not fully understood. I apologize to anyone in the audience who didn’t quite ‘get’ it, as the blame must fall upon the speaker in such instances. Although, in my defense, I had only 15 minutes to describe a series of arguments and philosophical threads that I had been musing over for a good few months🙂
If you haven’t seen the talk, and would like to watch it, here it is:
However, this article is written as a standalone resources, so don’t worry if you haven’t seen the talk.
What I would like to do is start exploring some of those issues on this blog. So, here is my attempt to describe the first of the points that I set out to try and explore in the talk. I’ve used a slightly modified argument, to try and complement the talk for those who have already seen it.
What do superintelligences really want?
S. Gildert November 2010
(Photo © Thomas Saur)
Humans are pretty intelligent. Most people would not argue with this. We spend a large majority of our lives trying to become MORE intelligent. Some of us spend nearly three decades of our lives in school, learning about the world. We also strive to work together in groups, as nations, and as a species, to better tackle the problems that face us.
Fairly recently in the history of man, we have developed tools, industrial machines, and lately computer systems to help us in our pursuit of this goal. Some particular humans (specifically some transhumanists) believe that their purpose in life is to try and become better than human. In practice this usually means striving to live longer, to become more intelligent, healthier, more aware and more connected with others. The use of technology plays a key role in this ideology.
A second track of transhumanism is to facilitate and support improvement of machines in parallel to improvements in human quality of life. Many people argue that we have also already built complex computer programs which show a glimmer of autonomous intelligence, and that in the future we will be able to create computer programs that are equal to, or have a much greater level of intelligence than humans. Such an intelligent system will be able to self-improve, just as we humans identify gaps in our knowledge and try to fill them by going to school and by learning all we can from others. Our computer programs will soon be able to read Wikipedia and Google Books to learn, just like their creators.
A perfect scientist?
But the design of our computer programs can be much more efficient in places where we, as humans, are rather limited. They will not get ‘bored’ in mathematics classes. They will work for hours on end, with no exhaustion, no fatigue, no wandering thoughts or daydreams. There would be no need for such a system to take a 2-hour lunch break, to sleep, or to worry about where its next meal will come from. The programs will also be able to analyze data in many more interesting ways than a human could, perhaps becoming a super-scientist. These programs will be far greater workers, far greater scholars, perhaps far greater citizens, than we could ever be.
It will be useful in analyzing the way such a machine would think about the world by starting with an analysis of humans. Why do humans want to learn things? I believe it is because there is a reward for doing so. If we excel in various subjects, we can get good jobs, a good income, and time to spend with others. By learning about the way the world works and becoming more intelligent, we can make our lives more comfortable. We know that if we put in the hard work, eventually it will pay off. There seem to be reward mechanisms built into humans, causing us to go out and do things in the world, knowing that there will be a payoff. These mechanisms act at such a deep level that we just follow them on a day-to-day basis – we don’t often think about why they might be there. Where do these reward mechanisms come from? Let’s take an example:
Why do you go to work every day?
To make money?
To pay for the education of your children?
To socialize and exchange information with your peers?
To gain respect and status in your organization?
To win prizes, to achieve success and fame?
I believe that ALL these rewards – and in fact EVERY reward – can be tied back to a basic human instinct. And that is the instinct to survive. We all want to survive and live happily in the world, and we also want to ensure that our children and those we care about have a good chance of surviving in the world too. In order to do this, and as our society becomes more and more complex, we have to become more and more intelligent to find ways to survive, such as those in the list above. When you trace back through the reasoning behind each of these things, when you strip back the complex social and personal layers, the driving motivations for everything we do are very simple. They form a small collection of desires. Furthermore, each one of those desires is something we do to maximize our chance at survival in the world.
So all these complex reward mechanisms we find in society are built up around simple desires. What are those desires? Those desires are to eat, to find water, to sleep, to be warm and comfortable, to avoid pain, to procreate and to protect those in our close social group. Our intelligence has evolved over thousands of years to make us better and better at fulfilling these desires. Why? Because if we weren’t good at doing that, we wouldn’t be here! And we have found more and more intelligent ways of wrapping these desires in complex reward mechanisms. Why do we obfuscate the underlying motivations? In a world where all the other members of the species are trying to do the same thing, we must find more intelligent, more complex ways of fulfilling these desires, so that we can outdo our rivals. Some of the ways in which we go about satisfying basic desires have become very complex and clever indeed! But I hope that you can see through that veil of complexity, to see that our intelligence is intrinsically linked to our survival, and this link is manifested in the world as these desires, these reward mechanisms, those things that drive us.
Building intelligent machines
Now, after that little deviation into human desires, I shall return to the main track of this article! Remember earlier I talked about building machines (computer systems) that may become much more intelligent than we are in the future. As I mentioned, the belief that this is possible is a commonly held view. In fact, most people not only believe that this is possible, but that such systems will self-improve, learn, and boost their own intelligence SO QUICKLY that once they surpass human level understanding they will become the dominant species on the planet, and may well wipe us out in the process. Such scenarios are often portrayed in the plotlines of movies, such as ‘Terminator’, or ‘The Matrix’.
I’m going to argue against this. I’m going to argue that the idea of building something that can ‘self-improve’ in an unlimited fashion is flawed. I believe there to be a hole in the argument. That flaw is uncovered when we try to apply the above analysis of desires and rewards in humans to machine intelligences. And I hope now that the title of this article starts to make sense – recall the famous experiments done by Pavlov  in which a dog was conditioned to expect rewards when certain things happened in the world. Hence, we will now try to assess what happens when you try to condition artificial intelligences (computer programs) in a similar way.
In artificial intelligence, just as with humans, we find that the idea of reward crops up all the time. There is a field of artificial intelligence called reinforcement learning , which is the idea of teaching a computer program new tricks by giving it a reward each time it gets something right. How can you give a computer program a reward? Well, just as an example, you could have within a computer program a piece of code (a mathematical function) which tries to maximize a number. Each time the computer does something which is ‘good’, the number gets increased.
The computer program therefore tries to increase the number, so you can make the computer do ‘good things’ by allowing it to ‘add 1’ to its number every time it performs a useful action. So a computer can discover which things are ‘good’ and which things are ‘bad’ simply by seeing if the value of the number is increasing. In a way the computer is being ‘rewarded’ for a good job. One would write the code such that the program was also able to remember which actions helped to increase its number, so that it can take those actions again in the future. (I challenge you to try to think of a way to write a computer program which can learn and take useful actions but doesn’t use a ‘reward’ technique similar to this one. It’s actually quite hard.)
Even in our deepest theories of machine intelligence, the idea of reward comes up. There is a theoretical model of intelligence called AIXI, developed by Marcus Hutter , which is basically a mathematical model which describes a very general, theoretical way in which an intelligent piece of code can work. This model is highly abstract, and allows, for example, all possible combinations of computer program code snippets to be considered in the construction of an intelligent system. Because of this, it hasn’t actually ever been implemented in a real computer. But, also because of this, the model is very general, and captures a description of the most intelligent program that could possibly exist. Note that in order to try and build something that even approximates this model is way beyond our computing capability at the moment, but we are talking now about computer systems that may in the future may be much more powerful. Anyway, the interesting thing about this model is that one of the parameters is a term describing… you guessed it… REWARD.
Changing your own code
We, as humans, are clever enough to look at this model, to understand it, and see that there is a reward term in there. And if we can see it, then any computer system that is based on this highly intelligent model will certainly be able to understand this model, and see the reward term too. But – and here’s the catch – the computer system that we build based on this model has the ability to change its own code! (In fact it had to in order to become more intelligent than us in the first place, once it realized we were such lousy programmers and took over programming itself!)
So imagine a simple example – our case from earlier – where a computer gets an additional ‘1’ added to a numerical value for each good thing it does, and it tries to maximize the total by doing more good things. But if the computer program is clever enough, why can’t it just rewrite it’s own code and replace that piece of code that says ‘add 1’ with an ‘add 2’? Now the program gets twice the reward for every good thing that it does! And why stop at 2? Why not 3, or 4? Soon, the program will spend so much time thinking about adjusting its reward number that it will ignore the good task it was doing in the first place!
It seems that being intelligent enough to start modifying your own reward mechanisms is not necessarily a good thing!
But wait a minute, I said earlier that humans are intelligent. Don’t we have this same problem? Indeed, humans are intelligent. In fact, we are intelligent enough that in some ways we CAN analyze our own code. We can look at the way we are built, we can see all those things that I mentioned earlier – all those drives for food, warmth, sex. We too can see our own ‘reward function’. But the difference in humans is that we cannot change it. It is just too difficult! Our reward mechanisms are hard-coded by biology. They have evolved over millions of years to be locked into our genes, locked into the structure of the way our brains are wired. We can try to change them, perhaps by meditation or attending a motivational course. But in the end, biology always wins out. We always seem to have those basic needs.
All those things that I mentioned earlier that seem to limit humans – that seem to make us ‘inferior’ to that super-intelligent-scientist-machine we imagined – are there for a very good reason. They are what drive us to do everything we do. If we could change them, we’d be in exactly the same boat as the computer program. We’d be obsessed with changing our reward mechanisms to give us more reward rather than actually being driven to do things in the world in order to get that reward. And the ability to change our reward mechanisms is certainly NOT linked to survival! We quickly forget about all those things that are there for a reason, there to protect us and drive us to continue passing on our genes into the future.
So here’s the dilemna – we either hard code reward mechanisms into our computer programs – which means they can never be as intelligent as we are – they must never be able to see or adjust those reward mechanisms or change them. The second option is that we allow the programs full access to be able to adjust their own code, in which case they are in danger of becoming obsessed with changing their own reward function, and doing nothing else. This is why I refer to as humans being self-consistent – we see our own reward function but we do not have access to our own code. It is also the reason why I believe super-intelligent computer programs would not be self-consistent, because any system intelligent enough to understand itself AND change itself will no longer be driven to do useful things in the world and to continue improving itself.
In the case of humans, everything that we do that seems intelligent is part of a large, complex mechanism in which we are engaged to ensure our survival. This is so hardwired into us that we do not see it easily, and we certainly cannot change it very much. However, superintelligent computer programs are not limited in this way. They understand the way that they work, can change their own code, and are not limited by any particular reward mechanism. I argue that because of this fact, such entities are not self-consistent. In fact, if our superintelligent program has no hard-coded survival mechanism, it is more likely to switch itself off than to destroy the human race willfully.
As this analysis stands, it is a very simple argument, and of course there are many cases which are not covered here. But that does not mean they have been neglected! I hope to address some of these problems in subsequent posts, as including them here would make this article way too long.
 – Pavlov’s dog experiment – http://en.wikipedia.org/wiki/Classical_conditioning
 – Reinforcement Learning – http://en.wikipedia.org/wiki/Reinforcement_learning
 – AIXI Model, M Hutter el el. – http://www.hutter1.net/official/index.htm