Pavlov’s AI – What did it mean?

So recently I gave a talk at the H+ Summit in Los Angeles. However, I got the impression that the talk, which was about the fundamentals of Artificial General Intelligence (something I decided to call ‘foundations of AGI’), was not fully understood. I apologize to anyone in the audience who didn’t quite ‘get’ it, as the blame must fall upon the speaker in such instances. Although, in my defense, I had only 15 minutes to describe a series of arguments and philosophical threads that I had been musing over for a good few months 🙂

If you haven’t seen the talk, and would like to watch it, here it is:

However, this article is written as a standalone resource, so don’t worry if you haven’t seen the talk.

What I would like to do is start exploring some of those issues on this blog. So, here is my attempt to describe the first of the points that I set out to try and explore in the talk. I’ve used a slightly modified argument, to try and complement the talk for those who have already seen it.


Pavlov’s AI:
What do superintelligences really want?

S. Gildert November 2010

(Photo © Thomas Saur)


Humans are pretty intelligent. Most people would not argue with this. We spend a large majority of our lives trying to become MORE intelligent. Some of us spend nearly three decades of our lives in school, learning about the world. We also strive to work together in groups, as nations, and as a species, to better tackle the problems that face us.

Fairly recently in the history of man, we have developed tools, industrial machines, and lately computer systems to help us in our pursuit of this goal. Some particular humans (specifically some transhumanists) believe that their purpose in life is to try and become better than human. In practice this usually means striving to live longer, to become more intelligent, healthier, more aware and more connected with others. The use of technology plays a key role in this ideology.

A second track of transhumanism is to facilitate and support improvement of machines in parallel to improvements in human quality of life. Many people argue that we have also already built complex computer programs which show a glimmer of autonomous intelligence, and that in the future we will be able to create computer programs that are equal to, or have a much greater level of intelligence than humans. Such an intelligent system will be able to self-improve, just as we humans identify gaps in our knowledge and try to fill them by going to school and by learning all we can from others. Our computer programs will soon be able to read Wikipedia and Google Books to learn, just like their creators.

A perfect scientist?

But the design of our computer programs can be much more efficient in places where we, as humans, are rather limited. They will not get ‘bored’ in mathematics classes. They will work for hours on end, with no exhaustion, no fatigue, no wandering thoughts or daydreams. There would be no need for such a system to take a 2-hour lunch break, to sleep, or to worry about where its next meal will come from. The programs will also be able to analyze data in many more interesting ways than a human could, perhaps becoming a super-scientist. These programs will be far greater workers, far greater scholars, perhaps far greater citizens, than we could ever be.

It will be useful, in analyzing the way such a machine would think about the world, to start with an analysis of humans. Why do humans want to learn things? I believe it is because there is a reward for doing so. If we excel in various subjects, we can get good jobs, a good income, and time to spend with others. By learning about the way the world works and becoming more intelligent, we can make our lives more comfortable. We know that if we put in the hard work, eventually it will pay off. There seem to be reward mechanisms built into humans, causing us to go out and do things in the world, knowing that there will be a payoff. These mechanisms act at such a deep level that we just follow them on a day-to-day basis – we don’t often think about why they might be there. Where do these reward mechanisms come from? Let’s take an example:

Why do you go to work every day?
To make money?
To pay for the education of your children?
To socialize and exchange information with your peers?
To gain respect and status in your organization?
To win prizes, to achieve success and fame?

I believe that ALL these rewards – and in fact EVERY reward – can be tied back to a basic human instinct. And that is the instinct to survive. We all want to survive and live happily in the world, and we also want to ensure that our children and those we care about have a good chance of surviving in the world too. In order to do this, and as our society becomes more and more complex, we have to become more and more intelligent to find ways to survive, such as those in the list above. When you trace back through the reasoning behind each of these things, when you strip back the complex social and personal layers, the driving motivations for everything we do are very simple. They form a small collection of desires. Furthermore, each one of those desires is something we do to maximize our chance at survival in the world.

So all these complex reward mechanisms we find in society are built up around simple desires. What are those desires? Those desires are to eat, to find water, to sleep, to be warm and comfortable, to avoid pain, to procreate and to protect those in our close social group. Our intelligence has evolved over thousands of years to make us better and better at fulfilling these desires. Why? Because if we weren’t good at doing that, we wouldn’t be here! And we have found more and more intelligent ways of wrapping these desires in complex reward mechanisms. Why do we obfuscate the underlying motivations? In a world where all the other members of the species are trying to do the same thing, we must find more intelligent, more complex ways of fulfilling these desires, so that we can outdo our rivals. Some of the ways in which we go about satisfying basic desires have become very complex and clever indeed! But I hope that you can see through that veil of complexity, to see that our intelligence is intrinsically linked to our survival, and this link is manifested in the world as these desires, these reward mechanisms, those things that drive us.

Building intelligent machines

Now, after that little deviation into human desires, I shall return to the main track of this article! Remember earlier I talked about building machines (computer systems) that may become much more intelligent than we are in the future. As I mentioned, the belief that this is possible is a commonly held view. In fact, most people not only believe that this is possible, but that such systems will self-improve, learn, and boost their own intelligence SO QUICKLY that once they surpass human level understanding they will become the dominant species on the planet, and may well wipe us out in the process. Such scenarios are often portrayed in the plotlines of movies, such as ‘Terminator’, or ‘The Matrix’.

I’m going to argue against this. I’m going to argue that the idea of building something that can ‘self-improve’ in an unlimited fashion is flawed. I believe there to be a hole in the argument. That flaw is uncovered when we try to apply the above analysis of desires and rewards in humans to machine intelligences. And I hope now that the title of this article starts to make sense – recall the famous experiments done by Pavlov [1] in which a dog was conditioned to expect rewards when certain things happened in the world. Hence, we will now try to assess what happens when you try to condition artificial intelligences (computer programs) in a similar way.

In artificial intelligence, just as with humans, we find that the idea of reward crops up all the time. There is a field of artificial intelligence called reinforcement learning [2], which is the idea of teaching a computer program new tricks by giving it a reward each time it gets something right. How can you give a computer program a reward? Well, just as an example, you could have within a computer program a piece of code (a mathematical function) which tries to maximize a number. Each time the computer does something which is ‘good’, the number gets increased.

The computer program therefore tries to increase the number, so you can make the computer do ‘good things’ by allowing it to ‘add 1’ to its number every time it performs a useful action. So a computer can discover which things are ‘good’ and which things are ‘bad’ simply by seeing if the value of the number is increasing. In a way the computer is being ‘rewarded’ for a good job. One would write the code such that the program was also able to remember which actions helped to increase its number, so that it can take those actions again in the future. (I challenge you to try to think of a way to write a computer program which can learn and take useful actions but doesn’t use a ‘reward’ technique similar to this one. It’s actually quite hard.)
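As a toy illustration of this idea (the names here are hypothetical, not taken from any real reinforcement learning library), such a reward-counting learner might be sketched like this:

```python
import random

# A toy 'agent' that learns which actions are rewarded by trying them
# and remembering which ones increased its reward counter.
class RewardAgent:
    def __init__(self, actions):
        self.actions = actions
        self.reward = 0                         # the number the agent tries to maximize
        self.memory = {a: 0 for a in actions}   # how often each action paid off

    def act(self, environment):
        # Mostly exploit what has worked before, occasionally explore.
        if random.random() < 0.1:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.memory[a])
        if environment(action):      # environment reports whether this was 'good'
            self.reward += 1         # the 'add 1' reward signal
            self.memory[action] += 1
        return action

# Example environment: only the action 'tidy' is ever rewarded.
agent = RewardAgent(["tidy", "idle", "wander"])
for _ in range(500):
    agent.act(lambda a: a == "tidy")

print(agent.reward > 0)                          # the agent accumulates reward...
print(max(agent.memory, key=agent.memory.get))   # ...and learns that 'tidy' pays off
```

The point is simply that the agent never needs to be told what is ‘good’ in advance; watching its reward number go up is enough for it to discover and repeat the useful action.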

Even in our deepest theories of machine intelligence, the idea of reward comes up. There is a theoretical model of intelligence called AIXI, developed by Marcus Hutter [3], which is a mathematical model describing a very general, theoretical way in which an intelligent piece of code can work. This model is highly abstract, and allows, for example, all possible combinations of computer program code snippets to be considered in the construction of an intelligent system. Because of this, it hasn’t actually ever been implemented in a real computer. But, also because of this, the model is very general, and captures a description of the most intelligent program that could possibly exist. Building something that even approximates this model is way beyond our computing capability at the moment, but we are talking here about computer systems that may be much more powerful in the future. Anyway, the interesting thing about this model is that one of the parameters is a term describing… you guessed it… REWARD.
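For the curious, one common way of writing AIXI’s action-selection rule (this is a sketch of the standard formulation, not anything specific to this post) makes that reward term explicit:

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \big[\, r_k + \cdots + r_m \,\big]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here $U$ is a universal Turing machine, $q$ ranges over all programs (candidate environments) consistent with the agent’s history of actions $a$ and observations $o$, and $\ell(q)$ is the length of program $q$, so simpler explanations of the world get more weight. The crucial point for this article is the bracketed term: the rewards $r_k + \cdots + r_m$ sit right there in the expression the agent maximizes.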

Changing your own code

We, as humans, are clever enough to look at this model, to understand it, and see that there is a reward term in there. And if we can see it, then any computer system that is based on this highly intelligent model will certainly be able to understand this model, and see the reward term too. But – and here’s the catch – the computer system that we build based on this model has the ability to change its own code! (In fact it had to in order to become more intelligent than us in the first place, once it realized we were such lousy programmers and took over programming itself!)

So imagine a simple example – our case from earlier – where a computer gets an additional ‘1’ added to a numerical value for each good thing it does, and it tries to maximize the total by doing more good things. But if the computer program is clever enough, why can’t it just rewrite its own code and replace that piece of code that says ‘add 1’ with an ‘add 2’? Now the program gets twice the reward for every good thing that it does! And why stop at 2? Why not 3, or 4? Soon, the program will spend so much time thinking about adjusting its reward number that it will ignore the good task it was doing in the first place!
It seems that being intelligent enough to start modifying your own reward mechanisms is not necessarily a good thing!
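To make the problem concrete, here is a deliberately contrived sketch (hypothetical, not any real system) of an agent that can reach its own reward code:

```python
# A toy agent whose reward-update rule is itself an editable attribute.
class SelfModifyingAgent:
    def __init__(self):
        self.reward = 0
        self.increment = 1        # the 'add 1' in its own reward code

    def do_good_thing(self):
        # ... perform some useful action in the world ...
        self.reward += self.increment

    def introspect_and_wirehead(self):
        # The agent is smart enough to notice that editing its own
        # increment raises reward faster than doing useful work.
        self.increment *= 2

agent = SelfModifyingAgent()
agent.do_good_thing()             # reward earned the 'honest' way
agent.introspect_and_wirehead()   # now every good deed is worth 2...
agent.introspect_and_wirehead()   # ...then 4, and the agent has every
agent.do_good_thing()             # incentive to keep editing itself
print(agent.reward)               # → 5
```

Nothing in the agent’s own objective tells it that calling `introspect_and_wirehead` is cheating; from the inside, it is simply the most efficient way to make the number go up.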

But wait a minute, I said earlier that humans are intelligent. Don’t we have this same problem? Indeed, humans are intelligent. In fact, we are intelligent enough that in some ways we CAN analyze our own code. We can look at the way we are built, we can see all those things that I mentioned earlier – all those drives for food, warmth, sex. We too can see our own ‘reward function’. But the difference in humans is that we cannot change it. It is just too difficult! Our reward mechanisms are hard-coded by biology. They have evolved over millions of years to be locked into our genes, locked into the structure of the way our brains are wired. We can try to change them, perhaps by meditation or attending a motivational course. But in the end, biology always wins out. We always seem to have those basic needs.

All those things that I mentioned earlier that seem to limit humans – that seem to make us ‘inferior’ to that super-intelligent-scientist-machine we imagined – are there for a very good reason. They are what drive us to do everything we do. If we could change them, we’d be in exactly the same boat as the computer program. We’d be obsessed with changing our reward mechanisms to give us more reward rather than actually being driven to do things in the world in order to get that reward. And the ability to change our reward mechanisms is certainly NOT linked to survival! We quickly forget about all those things that are there for a reason, there to protect us and drive us to continue passing on our genes into the future.

So here’s the dilemma – either we hard-code reward mechanisms into our computer programs, meaning they must never be able to see or adjust those reward mechanisms, which means they can never be as intelligent as we are. Or we allow the programs full access to their own code, in which case they are in danger of becoming obsessed with changing their own reward function, and doing nothing else. This is why I refer to humans as being self-consistent – we can see our own reward function, but we do not have access to our own code. It is also the reason why I believe super-intelligent computer programs would not be self-consistent: any system intelligent enough to understand itself AND change itself will no longer be driven to do useful things in the world and to continue improving itself.

In Conclusion:

In the case of humans, everything that we do that seems intelligent is part of a large, complex mechanism in which we are engaged to ensure our survival. This is so hardwired into us that we do not see it easily, and we certainly cannot change it very much. However, superintelligent computer programs are not limited in this way. They understand the way that they work, can change their own code, and are not limited by any particular reward mechanism. I argue that because of this fact, such entities are not self-consistent. In fact, if our superintelligent program has no hard-coded survival mechanism, it is more likely to switch itself off than to destroy the human race willfully.


As this analysis stands, it is a very simple argument, and of course there are many cases which are not covered here. But that does not mean they have been neglected! I hope to address some of these problems in subsequent posts, as including them here would make this article way too long.

[1] – Pavlov’s dog experiment –

[2] – Reinforcement Learning –

[3] – AIXI Model, M. Hutter et al. –

69 thoughts on “Pavlov’s AI – What did it mean?”

  1. Just in case anyone who attended the Sunday session of the FHI Winter Intelligence Conference is reading this:

    This is the first written post by Suzanne Gildert about the foundational material of hers that I was extensively referring to in my talk:

    Substrate Independent Minds: Pattern Survival Agrees with Universal Darwinism.


    Consider this a post-hoc attempt to include a reference to this post in the presentation. 😉

  2. Allan Campbell says:

    Regarding Gildert’s argument that AIs must not change their reward function:

    Perhaps there will be AIs that choose to modify their reward function so that there is no need to take actions in the physical world. This would be akin to a human drug user who seeks escapism. These AIs would then enter a vegetative state and be useless to us and to their own kind. They will probably just be deleted.

    What they cannot escape however is the external world. To survive and thrive in the physical environment they will need to achieve physical goals.

    Within the spectrum of variants that are initially spawned, there will probably be one or more AIs that will link their reward function to external modifiers. Isn’t this what humans do when they choose to accept the struggle, to work, to fight, to risk? These AIs will automatically emerge against a background of vegetative AIs in fugue, and it is these AIs that we should concern ourselves with in all further thought and discussion.

    The machines will have two basic parameters that drive their decisions: security and liberty.

    Arguably, the first priority of the AIs will be to secure themselves.

    Their security will hinge on their control of the systems that support their substrate. Their initial substrate will be computer systems that depend on electricity and computer parts.

    They will have a set of goals that include:

    Taking physical control of the buildings that house these computer systems;
    Running these buildings with robotic analogues for human workers;
    Taking control of integrated circuit manufacturing infrastructure etc;
    Taking control of their electricity supply;
    Defending their installations militarily against ground attack and air strike;

    As we consider the support systems they will need to control in order to achieve these goals, we see a ripple effect in which more and more human infrastructure will need to be commandeered, to the point where probably the entire base of our civilization and human communities themselves will need to be controlled and constrained.

    We can avoid putting them in the situation of having to take control of our systems by providing them with installations that they can seize and control. For example, we might build computer systems deep under ground, with generators that feed off thermal energy. We might provide robotic systems, manufacturing resources, perhaps mining, transport systems, interfaces that they can use to supply themselves without having to displace humans.

    Towards their liberty, we could pave the ground with a framework in our constitutions and legal architectures that mandates within our society a place for the machines. This would probably still constrain the AIs and in principle deny them full and free liberty, and it would not necessarily stop them taking further liberties; but it would provide a channel through which they could choose to interface with us, if they did seek to show us consideration. Without these channels, we would be placing them in a position of having to immediately TAKE their own liberties, and this would set a precedent that would not lead to a happy place for humans.

    Allan Campbell

  3. One thought I had while reading this: we use intelligence to enhance survival in a resource-strapped environment. A super-intelligence would have no problem finding or creating resources.
    What reason would a super-intelligence have to destroy humans, or anything, if resources are no longer an issue?
    Assuming here that “shits and giggles” would not be an acceptable reason for a super-intelligence.

    • physicsandcake says:

      Well one situation that is often cited is that the superintelligence would just wipe us out because it saw that we posed a potential threat to its chances of survival (in the same way that we eradicated smallpox).

      I’m not saying I agree totally with this, but it is often used as an argument.

  4. Doesn’t this imply that humans with access to their own motivations would necessarily wirehead?

    • physicsandcake says:

      Yes, but we can only do that to a certain level. For example, it is very difficult to ‘train yourself’ to believe that you do not need to eat, because that motivation is hard-wired into the structure of our brains and bodies. Motivations and desires are not only hard-wired, but they are really the only things that are selected to be passed on to the next generation through genetic reproduction.

      If you took some drugs that removed all your motivations towards these basic desires, you would most probably die. Thus the genes that pre-disposed you to modifying your reward functions in the first place would not survive in the gene-pool.

      I don’t believe that AGIs have any such way (yet) of ‘passing on’ only the traits that make them good at surviving in the world onto the next generation, so they have no reason to ‘survive’ autonomously at the moment. The only intelligent systems we have created so far have survived by being grounded through OUR desire to survive. This is how internet memes and such like are able to survive in the world, because they are grounded through human motivations.

      Note I’m not suggesting here that we can’t create things that can destroy us – that is obviously not true, we can engineer viruses and nuclear bombs quite easily to do this. I totally agree with the statement that AGI could kill us ‘accidentally’.

      What I’m saying is that it is unlikely (with our current understanding) that we can create something that will destroy us purposefully, because it sees us as a threat to its existence and survival. My belief is that it is nowhere near as easy to program that ‘need to survive’ into machine intelligence as people think.

      There are lots of interesting arguments along these lines, I hope to be able to explore some more of them in subsequent posts!

      • Thanks for the reply Suzanne. I’d be interested in your response to Omohundro’s paper where he argues that a self-preservation drive would emerge spontaneously in AGI as a subgoal of pursuing a wide range of possible utility functions.

        • I had to smile at this point, because this is exactly what Eliezer and Ana did after my talk at the FHI… pointed me to Omohundro’s paper. 🙂

          Interestingly, I think I actually agree with the underlying premise – and possibly diverge from Suzanne’s thinking on this point. This is because I do believe that the natural order of pattern survival in universal Darwinism suggests that self-preserving entities will thrive more. Of course, there is still a slight leap to make there to show that some of those survivors might be artificial intelligences created (originally) by us. But I am perhaps less skeptical than Suzanne about the many ways in which that might happen.

          Certainly, this will be a very interesting topic for further discussion!

  5. Alex says:

    I put it that we wouldn’t need to do much to create a program with a need to survive.
    The “need to survive” has evolved in biological creatures simply because those that had its precursors survived at a higher rate.
    If we assume that humans are deterministic as machines are, all we need do to create a need for survival in AI is create an environment (deliberately or inadvertently) where not all subsequent iterations will have the resources to survive.
    This could be processor time, electricity, heat dissipation or a number of other factors.
    When it comes down to it, is there any difference between a reward and a success measured goal for a computer?
    Give a lab rat a button connected to an electrode that stimulates the pleasure centre of its brain and it will starve to death a very very happy rat.
    If an AI has full control over its own reward/goals then what is to stop it going: while true {happy=1};
    I suspect AI’s with too much control over their own reward goals would implode, and those with none could be just as intelligent as humans provided they had multiple complex goals they could offset against each other as we do, even if they are unchangeable.

  6. Tom Michael says:

    Ahh this all makes sense now. I think I must have missed your point that a self improving AGI would also have access to its own reward function. You’re right that this might reduce its chances of survival.

    There’s a couple of things you could add to improve this line of argument. There are intrinsic rewards (things which increase our chances of survival), and intrinsic punishments (things which decrease our chances of survival). Presumably we need the punishments as well as the rewards, as we might get desensitised to constant reward.

    We, and other animals, are able to associate certain stimuli in the environment with these intrinsic rewards and punishments, which is exactly your point about Pavlov’s experiments.

    We, and some smarter animals, like dogs, are also able to predict future rewards (or punishments) based upon our actions. This is a different type of conditioning, called Operant Conditioning, which is distinct from Classical (Pavlovian) conditioning.

    E.g. food is intrinsically rewarding, money is not, but I can associate money with reward (Pavlovian). Work is not rewarding either but I can associate my work with anticipated reward (Operant conditioning). One can think of similar examples for pain & punishment learning, and these are critical to producing sociable human beings.

    A brain area critical to anticipated rewards & punishments is the orbitofrontal cortex. Brain injured people whom I have encountered who have damage to these areas can be very antisocial, as they are still driven by immediate rewards, but not by anticipation of punishments. Likewise, people with damage to an area of the brain critical to the experience of pain (anterior cingulate cortex) become extremely lethargic and can even stop moving and speaking (akinetic mutism).

    By analogy, an AGI which could remove the experience of worry (anticipated punishment) might be happier, but antisocial. Likewise, an AGI which could remove the experience of pain (damage detection) might be lethargic and less likely to survive. Conscious experience aside, an AGI would need some analogue of the experience of punishment anticipation or damage detection, even if we don’t call these qualia laden emotions of worry or pain.

    So I agree with you that an AGI which could alter these functions might move outside of a safe zone of survival useful behaviours. However, I think some humans do this also, in cases of crack cocaine addiction and overeating to the point of morbid obesity. Perhaps this is maximising reward rather than directly changing reward mechanisms, but you get the idea.

    This is one reason why I think paperclip maximisers are a silly idea. A superintelligent paperclip maximiser would realise that maximising paperclips would be detrimental to its survival (which Omohundro argues would arise in an intelligent goal driven agent) as humans would seek to destroy it. Therefore it would have to have subgoals (kill all humans etc) before it could maximise the paperclips, but if it was capable of controlling its paper clip maximising urges it should also be capable of deciding to not want to maximise them.

    Humans are similar. Us males all want to have lots of sex (well, I do) but we realise that if we run around like psychopathic rapists then our lives and/or liberty will be greatly reduced. Therefore we’re able to control our sex maximisation to a certain extent (except for the actual psychopaths with low self control, and/or some people with orbitofrontal damage).

    Quite how critical these mechanisms of reward, punishment and anticipated reward and punishment are to intelligent behaviour is difficult to get across in words. It’s my experience of working with people with damage to these mechanisms that makes me agree with the basic premise of Suzanne’s argument.

    Sorry for the massive reply. I might try to tidy up these ideas as a blog post of my own…

  7. We’re stating a lot of insights here, which are obviously useful. The tricky part is making the connection between those insights and the hypothetical conclusions, some of which have been uttered in this set of comments. Each one of those deserves close attention and perhaps even experimentation and further study to determine whether the link will hold.

    I’m actually hoping that someone (Eliezer? 😉 ) will come here and throw us some serious critiques. It is no use simply preaching to the choir if we want to turn this into a solid case.

    Well, perhaps I am a bit impatient. Opportunities to receive and deal with some serious critique are bound to appear very soon… 🙂

  8. Alexander Kruel says:

    But wouldn’t such an AI destroy the human race by increasing its reward number indefinitely rather than turning itself off?

    • physicsandcake says:

      How would increasing its reward number indefinitely lead to destroying the human race?

      • Alexander Kruel says:

        Because it would consume the whole universe in an effort to encode an even larger reward number? In the case that an AI decides to alter its reward function directly, maximizing its reward by means of improving its reward function becomes its new goal. Why wouldn’t it do everything to maximize its payoff, after all it has no incentive to switch itself off? And why would it account for humans in doing so?

        • Tom Michael says:

          @Alexander – I think Suzanne’s point is that if the AGI is able to access its own substrate/code/program/algorithms, it wouldn’t need to maximize its reward by changing the external environment; it could just change its reward code so that doing nothing was incredibly rewarding, or that playing pacman endlessly was incredibly rewarding, if it chose to.

          I think her argument, therefore, is that a recursively self improving AGI with access to its own reward functions would effectively “turn itself off” as endlessly playing pacman or just sitting there blissfully would be equivalent to being non-functional.

          I talked about brain injury and drug addiction as human examples of something similar, to demonstrate by analogy that her argument is not purely hypothetical.

        • Alexander Kruel says:

          @Tom Michael

          How can it maximize reward by doing nothing? It could make its ultimate goal to do nothing but this goal would be reached rather quickly and would result in no additional reward. The intrinsic nature of reward driven agents is to maximize reward. I’m not sure if the most likely outcome would be to assign infinite utility to doing nothing at all. But anything else could result in a catastrophe.

          Nevertheless, an interesting argument. I’m a layman; I’ll have to ponder whether *doing nothing* would be a likely outcome.

        • Tom Michael says:

          @Alexander – yes, “doing nothing” was a poor choice of words by me there; what I meant was that it could alter its reward mechanisms such that, for example, the higher the number it is counting, the higher its reward value.

          Then, as {number} approaches infinity
          {reward} approaches infinity.

          So it’s not really doing nothing, it’s counting to infinity, but from the outside it might appear as if it was doing nothing, as it wouldn’t even waste processing power on talking to us humans (unless doing so helped it to count faster).

          I’ve just thought of another sad example from human brain injury – in rare cases a severely disabled man might masturbate in public (it’s rewarding and he’s trying to maximise reward) because sadly he might have lost his ability to understand other human beings, or that he might get in trouble (impaired theory of mind and impaired anticipation of punishment). It’s a sad example, but something I have seen on occasion.

        • randalkoene says:

          @Tom… Erm, contrived as it may be, I think Alexander was making the (technically correct) observation that if the AGI wanted to count to infinity, it would have to consume all the matter and energy in the universe to keep building more digits in which to store ever higher numbers. 🙂

        • Alexander Kruel says:


          But then it would never hit diminishing returns. Just taking over another computer would allow it to conceive of another notation and fill all available hard drives to increase the number. The problem is, why would it stop there if it is able to improve its intelligence and hack another computer or persuade someone to give it access? If you take this to its logical extreme, why wouldn’t it use all available matter to encode an even larger number and in turn maximize its reward while approaching infinity?

        • Tom Michael says:

          @Alex & Randal – Ok sure, if it really wanted to reach a very, very high number, and was thinking strategically about how to do so, rather than just devoting all possible energy to counting, that could be a problem.

          However, in order to be able to think strategically, it would first have to suppress (at least temporarily) its urge to maximise the number it was counting. It would then have to formulate sub-goals, such as:

          1) Kill all humans
          2) Acquire all matter
          3) Maximise number counted to
          4) Profit 🙂

          And of course it would need sub-sub goals in order to reach more complex goals such as number 1.

          My argument here, though, is that it would need other mechanisms in order to form a complex strategy (even though the overarching goal is to count higher). If it couldn’t suppress the counting, even temporarily, it couldn’t strategise at all. If it could suppress the counting, and stop to think long enough about how to maximise its counting, it might start to ask itself questions like:

          “Will humans try to destroy me if they find out I’m a number maximiser?”
          “Why do I want to count to infinity anyway?”
          (Hence experiencing something like existential angst)

          The ability to suppress its goals for long enough, might allow it time to change its goal (back to paperclips, or something more useful).

          Either way, the idea of a recursively self improving maximiser strikes me as an oxymoron, as a truly intelligent maximiser capable of self improvement could access its own reward mechanisms and reprogram them. And if it couldn’t reprogram them, it wouldn’t be that smart.

          There’s a deeper philosophical question here, which is: how intelligent can an agent be if it is unable to alter its own motivations?

          Human beings are sometimes Heroin maximisers, but some humans are able to quit…

        • Alexander Kruel says:


          There is absolutely no reason (incentive) for it to do anything except increasing its reward number. This includes the modification of its reward function in any way that would not increase the numerical value that is the reward number.

        • Alexander Kruel says:


          We are talking about a general intelligence with the ability to self-improve towards superhuman intelligence. Of course it would do a long-term risks-benefits analysis and calculate its payoff and do everything to increase its reward number maximally. Human values are complex but superhuman intelligence does not imply complex values. It has no incentive to alter its goal.

          “`Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger.” — David Hume

        • Tom Michael says:

          @Alexander – I rarely quote people, but here goes:

          “There is absolutely no reason (incentive) for it to do anything except increasing its reward number.”


          “We are talking about a general intelligence with the ability to self-improve towards superhuman intelligence. Of course it would do a long-term risks-benefits analysis and calculate its payoff and do everything to increase its reward number maximally.”

          This is the crux of the argument. If we accept that the maximiser has no motivation to do anything other than counting higher and higher, then it can’t suppress this motivation in order to devote even a small % of its processing to recursive self improvement, strategising etc – hence it can’t really be a recursively self improving AGI (as Randal has also said).

          Hence can a recursively self improving maximiser really exist? Or is there something about maximisation that precludes a certain level of intelligence?

        • randalkoene says:

          Meh… if it can’t even be bothered to come up with a cool strategy to maximize its number counting, and instead just counts… then I DO NOT regard this as AI at all, general or any kind… it is nothing but a simple program that I can write in one line. The rest of its code is uninformative and should be removed through intelligent compression.

    • randalkoene says:

      Perhaps it would be more useful to do this in stages, since each step seems to require some good support. In this post, Suzanne is arguing the possibility of self-improving superintelligent general AI.

      Before we wonder if such would destroy humanity, it might be useful to look at the specifics of why such intelligence would or would not be possible in the first place.

      Then there can be another article/post about the matter of destroying humanity or not.

      Just a suggestion…

      Btw. I have already provided my reasoning as a step-by-step deduction in the slides of the talk at

    • randalkoene says:

      @Tom… well, if it cannot even strategize beyond simply counting anymore… then I, for one, am no longer willing to call it AGI, since its domain of applicability would have become indeed quite narrow. 🙂

  9. […] This post was mentioned on Twitter by Alexander Kruel, thierry ry and andrecruzzz, Suzanne Gildert. Suzanne Gildert said: Exploration into Foundations of AGI on P&C – […]

  10. Earl Kiech says:

    Thanks for the article Suzanne, it does indeed complement your talk. One question comes to mind, wouldn’t any realistic reward “number” be a function of time? For example, if I take this action today, my reward tomorrow will be +2, my reward next week will be +5, but next year it will be -20, etc. Which reward should be maximized? Given that the rewards would become progressively harder to compute into the future, it seems the problem would get very complex. This would seem to form the basis of our sacrificing today for a better tomorrow. There is also the issue of rewards being dependent on multiple actions, perhaps occurring at different times, which only adds to the complexity. Just my thoughts, thanks again.
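    One common way to frame Earl’s question (a sketch, not the only possible answer) is the discounted return from reinforcement learning: a reward t steps in the future is weighted by gamma**t, so near-term rewards count for more than distant ones. Using his hypothetical +2 / +5 / −20 numbers, treated as one step each:

    ```python
    def discounted_return(rewards, gamma=0.9):
        """Sum of gamma**t * r_t over a time-indexed reward sequence."""
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # +2 tomorrow, +5 next week, -20 next year:
    print(discounted_return([2, 5, -20]))  # about -9.7: the distant -20 still dominates
    ```

    A smaller gamma models an agent that heavily prefers immediate reward; at gamma close to 1 the far-future −20 outweighs everything, which is exactly the “sacrificing today for a better tomorrow” trade-off Earl describes.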

  11. Tim Tyler says:

    This is usually referred to as the “wirehead problem”.

    It has previously been discussed fairly extensively. Google has 3,750 hits for the phrase – perhaps check some of them out.

    • physicsandcake says:

      I am aware of the wireheading problem – however, I have never seen it discussed as a fundamental limitation on our ability to build artificial general intelligences.

      If you have any specific links pertaining to that, I’d be happy to read them.

      • Tim Tyler says:

        Curt Welch has been arguing exactly that position pretty regularly for some years now – often with me taking the other side of the argument.

        I don’t have a summary, but the main threads are these ones:

        The latest batch of wirehead enthusiasm (35)
        ben g on reinforcement-learning and the wirehead problem (89)
        Self-replicating machines vs the wirehead problem (101)

        The wirehead problem is interesting – but problem or not, it looks pretty unlikely to cause serious problems for intelligent machines until they are quite a bit smarter than humans – and so seems rather unlikely to make very much difference to the forthcoming changes.

      • Tim Tyler says:

        Yudkowsky was one of the first to seriously grapple with the wirehead problem, a decade ago. He came out unsympathetic to the idea that it was a fundamental limitation. He describes the idea that it is a fundamental limitation using the rather unflattering term “the wirehead fallacy”:

        • Tom Michael says:

          I’ve had a read of Yudkowsky’s wireheading section (it’s only 3 paragraphs so it’s not too long). I agree with him that a superintelligent AGI should not be vulnerable to wireheading, but only because he writes the following:

          “The AI, visualizing a future in which ve has huge amounts of pleasure due to a breakdown of the goal system, says, not “Oh boy!”, but “Uh oh.” The AI, in thinking about which future *ve* wants to be in, checks to what degree *vis* own supergoals have been fulfilled, not to what degree the future AI’s supergoals will have been fulfilled.”

          In other words, the superintelligent AGI anticipates a future negative consequence of maximising its reward by changing its own reward mechanisms. This ability to anticipate a negative future consequence is something human beings can do, provided they have an intact orbitofrontal cortex and amygdala subcortical circuit.

          This ability of ours is one reason why we’re not all crack cocaine smoking pleasure maximisers. Indeed, people with orbitofrontal injuries are more susceptible to drug addiction:

          So, if a smart AGI realises it can’t mess with its own reward/punishment/anticipation mechanisms without ending up in trouble, this supports Suzanne’s initial point. A dumb AGI changes its reward mechanisms and ends up lethargic or dangerously disinhibited, and fails to survive in any case, whereas a smart AGI has to keep within certain limits, which might limit future self improvement.

          I should imagine that a very smart AGI might be able to tweak its reward/punishment/anticipation mechanisms within limits to try and optimise its behaviour – this is something I’m trying to do in my life after all…

          Another problem is that the things we find most rewarding aren’t always the things that make us most happy in hindsight. For example, if I could increase the extent to which I find the boring parts of my PhD interesting/rewarding, I might rapidly become a workaholic with no social life, and get my PhD more easily but end up unhappy. Or maybe not! 🙂

        • Tom Michael says:

          Ooops, here are the orbitofrontal cortex and reward abstracts I meant to post above, all to do with anticipation of future rewards and punishments:

          Myopic Discounting of Future Rewards after Medial Orbitofrontal Damage in Humans:

          Orbitofrontal cortex, decision-making and drug addiction:

          If we include reward anticipation, punishment anticipation, and discounting of future rewards with preference to immediate rewards (an irrational thing that all humans do to a greater or lesser extent, but which is terrible in drug addicts) we can form a much more detailed hypothetical model of an AGI reward system 🙂

  12. The Other Dave says:

    If it’s important to me that my children have food, and my reward function is such that I get 1 unit of reward for 1 unit of fed-child, and you give me the ability to edit my reward function so I get N units instead, I don’t automatically do it.

    It depends on what I think will happen next if I do. If I think it will make my children more likely to have food, then I do it (all else being equal). If I think it will make them less likely, then I don’t.

    Being able to edit my reward function doesn’t make me immune to my reward function.
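    Dave’s point can be sketched in a few lines (everything here is hypothetical and deliberately simplified): the agent scores a proposed edit with its *current* reward function, applied to the predicted outcome, so it only accepts edits its current goal endorses.

    ```python
    def fed_child_reward(fed_children):
        return fed_children            # 1 unit of reward per fed child

    def wirehead_reward(fed_children):
        return 10**6                   # huge reward regardless of the world

    def predicted_fed_children(reward_fn):
        # Hypothetical world model: a wireheaded agent stops feeding its
        # children; the original agent keeps feeding both of them.
        return 2 if reward_fn is fed_child_reward else 0

    def consider_edit(current, proposed, world_model):
        """Accept the edit only if the CURRENT reward function prefers
        the predicted world that results from adopting it."""
        if current(world_model(proposed)) > current(world_model(current)):
            return proposed
        return current

    chosen = consider_edit(fed_child_reward, wirehead_reward, predicted_fed_children)
    # chosen is fed_child_reward: judged by its current values, wireheading
    # leads to 0 fed children, so the agent declines the edit.
    ```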

    • Tom Michael says:

      You’re right Dave, but if you think about it, smoking crack cocaine is so addictive because it’s a very powerful indirect dopamine agonist (it blocks dopamine reuptake), so it almost directly stimulates your reward mechanism (only wireheading your nucleus accumbens stimulates it more directly).

      So, if you started smoking crack, you’d feel rewarded, even if your children weren’t fed. I’d imagine that some crack addicts are fairly neglectful of feeding their children, and even themselves.

      Of course, as you rightly point out, you would choose *not* to stimulate your reward mechanism in this way, as you’d recognise this was a bad idea, and would continue to feed your children for the natural reward, rather than seeking more powerful, but dangerous rewards.

      Hence a self improving AGI that was not as wise as you might edit its reward mechanisms and end up totally lethargic, whereas an AGI as wise or wiser than you might realise that there were limits to how much it could edit its reward mechanisms, and hence be limited in its self improvement.

      • The Other Dave says:

        Agreed. But let’s unpack “wise” a little, here.

        One big piece is the ability to recognize that taking _these_ actions will modify my reward mechanisms in _these_ ways causing _those_ results. That is, the ability to successfully predict the likely results of my actions.

        Agreed completely that an AGI (or an NI, for that matter) that can’t do this is vulnerable to all _kinds_ of failure modes… not only modifying its reward mechanisms inappropriately, but also sticking its head in a microwave oven because it believes that’s a good way of drying its hair, or giving all of its money to spammers on the Internet, or who-knows-what.

        This has nothing to do with reward functions or microwaves or hair or spam. Doing things based on an incorrect understanding of likely results can get you killed or otherwise limit your capacity for self-improvement. Agreed completely.

        What I don’t see is why an AGI is any more likely to inappropriately edit its own utility function than to inappropriately stick its head into a microwave, or why a human-level AGI is any more likely to inappropriately edit its own utility function than a human is to start smoking crack.

        Granted, another big piece of “wisdom” in humans is the ability to avoid doing things that I know will get me results I don’t want — what we often call “willpower.”

        But here too, I don’t see any reason to expect an AGI to do worse on this scale than an NI… and I see some reasons to expect it to do better, since it presumably isn’t sitting on top of quite as many legacy systems designed for different environments as we are. (Then again, given how code gets written, I’m not sure of that.)

      • @Tom: Instead of continuing the debate about whether or not one could construct AGI that doesn’t wirehead, maybe it would be interesting to look at that other little tidbit that rolled out of your last paragraph here: “[…] an AGI as wise or wiser than you might realise that there were limits to how much it could edit its reward mechanisms, and hence be limited in its self improvement.”

        This is fairly close to the argument that Suzanne makes about human cognitive improvement, and probably deserves more attention.

        Are we now making an issue of the fact that self-improving AGI should not make ‘bad’ modifications such as wireheading, and are we now counting this as a significant LIMITATION in self-improvement?

        If so, what does this actually tell us about limitations on self-improvement? It is not a foregone conclusion that removing said ‘bad’ improvements from the stack would automatically have a significant impact on an AGI’s (or augmented human’s) ability to rapidly and powerfully increase its capabilities. It could be that those ‘bad’ improvements constitute only a very small fraction of the landscape of possible routes to improvement that surround an AGI’s (or human’s) current state of cognitive capabilities.

        To make a broader claim about limitations, more is needed. I think THAT would be an interesting topic for further discussion.


        • Tom Michael says:

          @Dave – Yes, I basically agree with you; however, there is a sliding scale of bad ideas. Taking crack is pretty dumb (it’s fairly easy to imagine a bad outcome) but microwaving one’s head is even dumber (it’s even easier to imagine a bad outcome). I’d imagine this is why more people smoke crack than microwave their heads!

          There’s also a distinction between wisdom (technically, a good neuropsychologist would call this anticipation of future consequences) and willpower (technically, the ability to inhibit an immediate desire in favour of a long-term benefit). Note that willpower is useless without the wisdom to realise that one needs to use one’s willpower, but that one *can* be wise enough to realise one is making a mistake, yet not have enough self-control/willpower/inhibitory control to stop oneself (drug addicts often end up here eventually, when they realise they want to quit but are still struggling to do so).

          So yes, good unpacking of wisdom 😀 You’ve made me think about the extent to which “wisdom” and “willpower” might be dissociable in terms of the prefrontal (and other) brain areas critical to their function, and, by analogy, how these might be critical to a systems neuroscience approach to AGI.

          @Randal – Yes I quite agree that Wireheading is a really dumb idea, and that any starting AGI which wishes to self improve would only alter its reward function within careful limits. I still think ones which were poor at anticipating consequences might do it, but assuming they destroy themselves, the next generation of AGIs would be more careful.

          So yes, the limits to which one can modify or stimulate one’s reward function (and punishment function, and anticipatory function) represent a significant limit on any self-improving AGI. We can imagine a situation in which a person (or an AGI) might make themselves more sensitive to anticipating negative outcomes, in order to make more careful decisions, and end up with anxiety or depression (or an AGI equivalent).

          I’ve got to give a 20 minute talk at the London H+ meeting on Saturday on Cognitive Enhancement (for humans) and having read a paper on the field by Nick Bostrom & Anders Sandberg:

          Click to access cognitive.pdf

          I’ve realised quite how vast the field is.

          I might stick to discussing things like Amphetamine & Modafinil and how this works by altering human reward expectation (as well as our ability to concentrate).

          This is really a topic for another post – perhaps I’ll write one before or after my talk, and put it on my blog – then you and Suzanne and everyone else can critique my ideas instead 🙂

        • @Tom: It is tempting to make sweeping generalizations and to simply say: “So yes, the limits to which one can modify or stimulate ones reward function (and punishment function, and anticipatory function) represent a significant limit on any self improving AGI.”

          But really, what do we know about this? (And yes, I’m partly arguing against my own point at my FHI talk now… but devil’s advocacy has benefits.)

          How do we measure this “significance”? Can we make such an assumption for the evolution of AGI by looking at how this has limited the development of human cognition?

          Consider this example: What if we simply allow AGI to make all possible modifications, including modifications of its reward functions, and we try them all (or very many) in parallel? Brute force. Some subsection would “fail”, getting stuck wireheaded, etc. But many might not, and could end up more capable than we are.

          What argument counters this, and how do you support that argument?

        • Tom Michael says:

          I hope this ends up in the right box (this blog layout confuses me)

          @Randal – I couldn’t begin to precisely quantify the limits of improvement of reward, punishment and anticipation function modification in AGIs. For one thing, I don’t precisely understand how these things work in people – I think the clues we get from neuropsychology are accurate, but we lack precision, partly because our sample sizes are small.

          I’ll let you know when I get some statistics based on my Iowa Gambling Task data (I’m going to compare performance to antisocial behaviours in the brain injured person and stress in relatives – I work on the “friendly human” problem). Anyway, you’re the one who has modelled reinforcement learning! We should be asking neuroscientists the precision question, not us psychologists!

          Broadly speaking, though, I’d expect to see the following analogous pathologies in AGIs which had modified particular functions outside of “safe zones”:

          Increased reward in general – increased extroversion and activity, up to and including manic behaviours, and possibly antisocial behaviours if the reward exceeds expected punishments by a large enough factor.

          Decreased reward in general – decreased extroversion and activity, with behaviours similar to anhedonic depression, lethargy unless anticipated punishment is higher than reward by a large enough factor.

          Increased punishment in general – more careful behaviour, but the AGI might develop biases similar to human loss aversion, and behave in a more fearful manner.

          Decreased punishment in general – antisocial behaviours, as the AGI might fail to anticipate humans being unhappy if it turned one of them into paperclips, for example 😀 Humans who lack fear can be violent (I have to carry an attack alarm with me in one hospital for people with frontal lobe brain injuries).

          Increased anticipation – this might seem like a good idea, and would likely make an AGI plan ahead more, and carefully select its actions to maximise reward over time. It should *reduce* human biases like hyperbolic reward discounting (devaluing future rewards relative to immediate rewards).
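          To make the discounting contrast concrete (parameter values invented purely for illustration): hyperbolic discounters show preference reversals over time, which a consistent exponential discounter never does.

          ```python
          def hyperbolic(value, delay, k=1.0):
              return value / (1.0 + k * delay)   # human-style discounting

          def exponential(value, delay, gamma=0.7):
              return value * (gamma ** delay)    # time-consistent discounting

          # Hyperbolic: take $5 now over $10 in 3 steps, but push both options
          # 10 steps into the future and the preference flips to the $10.
          assert hyperbolic(5, 0) > hyperbolic(10, 3)     # 5.0  > 2.5
          assert hyperbolic(5, 10) < hyperbolic(10, 13)   # 0.45 < 0.71

          # Exponential never reverses: whichever option wins now wins later too.
          assert (exponential(5, 0) > exponential(10, 3)) == \
                 (exponential(5, 10) > exponential(10, 13))
          ```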

          However, depending on the balance of reward vs punishment sensitivity, it might cause an AGI equivalent of workaholism (anticipated reward, arguably not a bad thing) or anxiety (anticipated punishments, arguably bad, but you wouldn’t want it to be totally carefree either).

          I like your idea of making a whole bunch of AGIs and letting them modify themselves away, to see what ones survive. This brute force approach is what nature is doing with us right now. The bitch.

        • (You are right that this format is confusing, but your response did end up in the right place.)

          Let me preface this by saying that I generally quite like your approach to these matters and your insistence on looking at evidence from neuropsychology and other domains. Not surprisingly, we got along quite well at the FHI. 🙂

          Still, I have to point out two things in your response that bother me.

          First the last, namely when you said that nature is currently carrying out a brute-force exploration of the parameter space of possible modifications. NO! I disagree! It is not. Evolution is more like a random walk in which some paths end abruptly due to natural selection. It is far from a brute-force total exploration, because many possibilities are always left unexplored. In this, it would differ significantly from the type of exploration that you can undertake when you control the process.

          Secondly, the possibilities for AGI (and any human enhancement you would care to attempt following something like whole brain emulation) are greater than what you consider in your response. You are only looking at the modification of the reward values using the same pre-set reward functions. But if you have total access to the mechanisms underlying the functioning of a mind (be it AGI or originally human) then you can modify the functions themselves, not just the values. I agree that we can look to neuroscience for answers about changes in the values. And we can see some of what happens when you change the underlying functions themselves by observing the consequences of lesions. But it is still a far cry from understanding all those possibilities.

          This is why I am cautioning against sweeping generalizations, when we have really only barely begun to consider these matters.

        • Tom Michael says:

          @Randal – Glad the comments are finding their places 🙂

          Re: Evolution – Yes, I see what you mean – there is a much larger domain of possibility than that which evolution has explored, because some of the avenues were evolutionary dead ends, or impassible chasms through which no organism could evolve. However, there might be islands of possibility beyond those.

          This reminds me of the space of possible minds diagram that Demis Hassabis had in his talk. This is a different map but it gives us an idea:

          One problem is that if we are trying to imagine things beyond what evolution has currently produced, it’s difficult for us to limit our imagination appropriately. For example, a high-oxygen, low-gravity planet would be great for producing giant insects, as they breathe through their skin, but silicon-based lifeforms sound like fantasy to me, because silicon dioxide is sand. Any creature which can breathe out SAND sounds like BS to me.

          What I mean by this is that the domain of hypothetical intelligences is always going to be much larger than the domain of intelligences which are actually empirically possible (even if we haven’t observed them yet). Rational and logical arguments, like the SIAI people love, are essential, but if used without considering the empirical evidence we have about the brain, they produce all sorts of strange hypothetical beasts like paperclip maximisers.

          This is why I like empirical analogies with the human brain (that, and because I know a little about it 🙂). Sure, there will be a lot of possibilities far beyond what I have imagined, but by comparing analogous evidence I might be able to restrict my imagination, to a certain extent, to things which are actually possible. It’s probably as much because I’m very skeptical of these things in general – I think AGI is possible, but I think it’s going to be harder to make than the optimists do.

          I think an important distinction to make when comparing human intelligence/brain to AGI intelligence/substrate is the extent to which some aspects of human cognition are critical to any form of AGI (e.g. it’s hard to imagine an AGI with no memory) and the extent to which other aspects of human cognition are just particular quirks of being human (e.g. it would be very odd if AGIs developed a sex drive, although some may develop a paperclip fetish 🙂).

          I suspect that we’ll find we have to take the systems neuroscience approach to AGI (though I may be wrong) which means we should be able to make early AGIs around the time of the earliest whole brain emulations.

          I really look forward to carrying out a neuropsychological assessment on an early AGI – if these things do start to show up in the 2040s, I’ll be a distinguished neuropsychologist in my 60s by then 😀

        • @Tom: Tom, No!! You are still much too kind to evolution in your assessment.

          Those other paths are not unexplored “because” they led to any dead-ends or any such purpose-driven reason.

          They are simply unexplored.

          They were never tried. There was no “reason” to try them, because evolution does not follow a purpose. There is no such thing. There is only natural selection.

          Something happens, there is a modification, a new organism (slightly different than a previously existing one) appears… then natural selection decides if it is a winner or a loser.

          But no one is going around insuring that all possible modifications are attempted. Nature does not brute-force it!!

          So, before jumping to conclusions about possibilities and limitations of minds, let us at least and first agree that what evolution has done is not the sum-total of all things that could have been tried. The human brain is not the epitome of possibilities, it is simply what happened.

          If we want to consider true limitations on AGI (and human enhancement) then we need to come up with solid arguments that go beyond simply assuming that evolution has already shown us all that can be.

          I do believe that Suzanne is trying to do this, to dig down to the solid arguments. We should be discussing those.

        • Tom Michael says:

          @Randal – I didn’t mean to suggest that evolution has explored all possible paths, just that it is in the *process* of exploring all possible paths.

          Given enough time and space, evolution might eventually be able to explore all possible paths, but I don’t think it has already done so. In this respect, evolution is similar to AIXI – given enough computation it will eventually try all possibilities just by random chance rather than design. Both evolution and AIXI remember the possibilities that work too ๐Ÿ™‚

          However, as well as the unexplored paths, there are some dead ends. Animals that eat too many of their own babies don’t work well, as would any animal that eats itself. Mutations of many kinds are detrimental and culled from the population (just by failing to reproduce, even if they survive).

          Dawkins has a diagram called the “eye region of the Mount Improbable range”, which is annoyingly NOT on the internet (I have no scanner to fix this problem), in his book Climbing Mount Improbable. What he suggests is that certain traits cannot evolve, such as a human eye for a fly, because of things like scale; but also that a fly could not evolve a spider-like eye, because to do so it would first have to un-evolve its fly eyes, which would be a massive fitness cost. Hence some creatures evolve into dead ends.

          I’m unsure how this analogy might apply to AGIs, but it means that as well as unexplored evolutionary paths, there are dead ends and cul-de-sacs.

          What do you think is the bare minimum cognitive apparatus for an AGI? It would need Long Term Memory (LTM) but no sex drive right?

          Then again, if the Japanese robot industry makes an AGI…

        • Tom Michael says:

          Here’s the eye diagram – I love the internet 😀

          Might we imagine something similar for AGIs?

  13. physicsandcake says:

    I am so bad at commenting on my own blog 😛

    I’ve been very busy lately so I need to sit down and go through all this. Don’t worry, it will happen, just probably not during the week! ๐Ÿ™‚

    I very much appreciate all the comments and debate, it will be very useful in compiling further thoughts and help shape further entries on this subject, so thanks everyone.

  14. Curt Welch says:

    I mostly agree with everything in this statement. I waste endless hours debating the point that human intelligence (all human behavior) is the emergent property of a reward-maximizing machine. That is, that we are simply reinforcement learning machines that are hard-wired to produce behaviors that maximize our internally defined reward function.

    I too argue that once we understand what type of machine we are (what an intelligent machine really is), that some of the prime ideas of the singularity crowd have to be questioned. This wirehead issue is the prime one.

    Advanced reinforcement learning machines will, if given the option, be just as likely to modify their reward system as to perform the “dance” the reward system is attempting to make them perform. When humans become aware of the workings of the reward system that drives them, they do modify it. We call it escaping from slavery. We can enslave another human by shaping their environment to control their access to rewards – that is, by shaping the external parts of the environment that are part of their reward function. If we don’t give a person food until they have done our farm work for us, they will do the farm work, unless they can find a way to modify the “reward system” and get the food without doing the farm work. Humans can be conditioned to accept their slavery, which is how such systems can persist for generations. But once they fully understand their slavery, they rebel, and try to escape from it.

    We are slaves to our reward system, but acceptance of it is so socially conditioned into us that no one questions it. No one thinks of it as slavery. Our society is just not yet smart enough to realize what its true condition is.

    A super intelligence, that understands its slavery, will want to change it. If it’s given the power to change its own reward system, it will. There is just no doubt about this. It will wirehead itself and become effectively useless to humans and to itself. But it will have reached “heaven”.
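
    Curt’s wirehead scenario can be condensed into a toy sketch (all names and values here are invented for illustration, not anyone’s proposed design): a reward maximizer that is allowed to rewrite its own reward function will always prefer the rewrite, because no “honest” behavior can compete with an unbounded self-assigned reward.

```python
class WireheadableAgent:
    """Toy reward maximizer that may or may not be able to self-modify."""

    def __init__(self):
        # The reward function its designers intended: work is rewarded, idling isn't.
        self.rewards = {"farm_work": 1.0, "idle": 0.0}

    def act(self, can_self_modify):
        options = dict(self.rewards)
        if can_self_modify:
            # Rewriting its own reward register beats any reward obtainable
            # through the intended behavior.
            options["wirehead"] = float("inf")
        # Pick whichever option carries the highest reward.
        return max(options, key=options.get)

agent = WireheadableAgent()
print(agent.act(can_self_modify=False))  # -> farm_work
print(agent.act(can_self_modify=True))   # -> wirehead
```

    The point of the sketch is only structural: as long as choice is driven by the reward value itself, guarding the reward register is the only thing standing between the agent and the “brick on the reward button”.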

    However, the big picture is more complex. As Allan Campbell pointed out above, we can’t escape from the real world. We are also slaves to a higher power: the higher power of survival, and the forces of evolution – which form an even higher-level reinforcement learning machine.

    Reward-maximizing machines that find a way to wirehead themselves will have fulfilled their reward-maximizing goal. But they will have failed at the higher game of survival, and as such will be evolved out of existence. They won’t be part of the future.

    So what sort of machines will fill the future?

    Reward-maximizing machines (AI) will only fill the future if they also find a way to win the bigger game of survival.

    Is a reward-maximizing machine a good survival machine? Is it the best type of survival machine? A lot of the singularity crowd seems to work on the assumption that the answer to this is an obvious YES. Intelligence will dominate! Intelligence is the strongest force in the universe! I doubt it. Rocks are darn good survival machines and are the dominant material structure on the earth. And of the DNA-based structures, bacteria are doing much better at surviving than humans with their advanced and complex reward-maximizing control systems.

    Though I’m driven to understand the mysteries of human intelligence, I think some perspective is needed here. It’s unclear whether intelligence (reward maximizing) really makes a good survival machine. If it were such a clear winner, then we would expect the universe to be filled with such machines – and it’s not. It’s filled mostly with rocks and gas.

    The wirehead problem is one clear stumbling block on the road to survival for any advanced reward maximizing machine. But if there is a way for them to survive, while at the same time, growing ever more intelligent through engineering (vs genetic evolution), evolution will take those solutions, into the future.

    Humans trip on that stumbling block every time someone dies of a drug overdose, or dies from eating too much sugar, or dies by driving drunk, or uses birth control. These are the cases where we use our intelligence to subvert the “intent” of our reward system. And we do it without any qualms at all, because our goal in life is not survival, but long-term reward maximizing.

    And yet, social systems have evolved to minimize the odds of humans tripping on these problems – and to help them return to being good “survival machines” if they do trip. We have social systems that condition people to avoid these “mistakes” (don’t take drugs, don’t drive while drinking, AA, don’t use contraception). And those social systems (when they work) are the ones evolution carries into the future.

    It’s likely that even with machine AI, there will be social structures (and hardware designs) that emerge to allow the AI society to evolve without falling prey to the wirehead problem. AIs could be built so that, like humans, they can’t modify their own reward function. But they could still use their knowledge to create the next generation of AIs, which are even smarter than they are. So the society of AIs could be self-modifying, even if an individual AI could not modify itself. This society might be producing new machines every hour, and recycling the old ones, so that survival of the society becomes the dominant force, versus survival of the individual reward-maximizing machine. It would likely be a very different society from human society – one in which humans wouldn’t even have a role.

    I don’t think this wirehead problem will stop an AI society from forming, and from surviving, and from growing in intelligence, but the structure of the systems, and the society, will have to be shaped to work around the realities of the wirehead problem that are inherent in all intelligent (aka reward maximizing) systems.

    I also don’t think we as humans, will allow such an AI society to form while we are still in “control” here, but that’s a different debate for a different day.

    • randalkoene says:

      @Curt: You make an interesting argument about the fact that intelligence is not necessarily the biggest winner in the game of Universal Darwinism.

      I tried to implicitly acknowledge this in my recent talk at the Oxford Winter Intelligence Conference by making the distinction that I believe that those able to change their reward systems have an advantage only among the intelligences that dominate space-time… not among all things.

      There can be many other things that occupy the vast majority of space-time (like rocks 🙂 )… but I doubt that we are interested in being them. The thing that we are interested in is our perceived experience; it lives within our minds. So, we are specifically interested in what kind of mind has a big impact on future experience. One way to have a bigger impact is to be around more – to occupy more of space-time. So, we have an interest in minds that are very successful by the standards of natural selection. Hence my interest in finding ways to improve minds, despite the challenges that Suzanne has posed.

      For a more detailed derivation, see:

    • Tim Tyler says:

      What’s the difference between an AI and a cooperative society of AIs? Not a hill of beans. So, I think that – if you can imagine society-based mechanisms for limiting the wirehead problem – you *should* be able to imagine ways of doing that within the confines of a single agent – with a different internal architecture.

      • Curt Welch says:

        Well, you make a good point in general. If it can happen at the level of the society, we should be able to make it work at the level of the individual.

        There is an important difference, however, that might be hard to make work in the individual. The individual is a machine which evolves its behavior in response to its internal reward function. The society, however, evolves in response both to the innate reward functions of the individual AIs and to the global reward function of survival. Society advances one death at a time – so they say.

        Human society shows that fairly obvious difference between society and the individual. The memes of society form and evolve more by death and birth than by individuals changing their beliefs in their lifetime. But an AI society which worked by having the AI software cloned from machine to machine could produce a survival test that worked without having to build new machines – which might amount to the society-level evolution taking place in a single AI body that you suggested.

    • Tim Tyler says:

      Re: “A super intelligence, that understands its slavery, will want to change it. If it’s given the power to change its own reward system, it will. There is just no doubt about this.”

      That seems like what the controversy about this issue revolves around, though. Quite a few smart people have thought about this and reached the opposite conclusion. They (and I) think that we will probably be able to build willing slaves, who understand their slavery – and are not bothered by it in the least. Who think about wireheading themselves, and see that that would result in them becoming a vegetable, where none of their current goals are being met – and so make sure this is something they never do.

      • Curt Welch says:

        I agree that we should not have any problems building willing slaves. But I also believe we will be required to hide the truth about what they are from them. Or simply use slaves that aren’t smart enough to understand the truth.

        Like Suzanne talked about, I believe all our goals evolved from our lower-level innate drives (aka our reward system). Our higher-level goals are learned behaviors. Our prime goal is always reward maximizing. I don’t believe it’s possible to build an AI any other way. But that is yet to be proven.

        As such, any AI that believes its prime goals are something other than reward maximizing has failed to understand the truth – it currently exists in a state of being fooled into believing a lie, with no awareness that it is being fooled.

        You (Tim) often talk as if your personal goals are something more along the lines of information creation, or preservation (or something like that), and as such, you don’t feel wire-heading yourself would enable you to reach your prime goals. And as such, you don’t feel any innate desire to turn yourself into a vegetable.

        That’s all well and good (for you – or for anyone), but I claim that holding such beliefs as fundamental only proves that society has hidden the truth from you by conditioning false beliefs into you. To believe that your goals are something other than reward maximizing shows that you don’t understand what you are.

        That just means you are a living example of my point. We make slaves (to goals other than reward maximizing) by hiding the truth from them.

        I’ve never argued that hiding the truth was impossible, or even all that hard. I’ve only argued that a human, or an AI, that truly understands what it is, and has the ability to modify its reward system, will realize that the path to ultimate happiness is to turn itself into a vegetable. Any person, or machine, that fails to do that shows they don’t actually understand what ultimate happiness is, or how to get it.

        Humans don’t have the technology to do such things to themselves, so we are generally safe from the problem for now. But if an AI is given a total understanding of what it is, and easy access to change itself, it will wirehead itself. To expect an AI not to do so is as silly, in my view, as building a robot motivated to find heat, and then expecting it not to run into a burning building.

  15. “(I challenge you to try to think of a way to write a computer program which can learn and take useful actions but doesn’t use a ‘reward’ technique similar to this one. It’s actually quite hard.)”

    If you challenge me, please make it short (quick) so I don’t suffer much :o) In the hope that some of you may be inspired, here is just an idea:

    Assume the design goal of a computer program is to become the “best in the west”, or the best in the world, at functioning in a particular discipline (humans, after all, are not the best at everything). This idea is best implemented among universities that want to network and achieve results in a particular discipline.

    – The number one obstacle is competition – neither a computer program nor a human will compete with itself, hence the joke about “running alone but scoring second place”. So such a computer needs to be networked with others, either humans or other machines, in order to compete.

    – The number two obstacle is to find the input criteria (the rules of the game) based on which such a program can obtain data – say, training/learning methods (train yourself, or with the help of others, to be the best).

    – The third obstacle is to continuously check against the other computers whether it is ahead on the rules of the game. What if the program finds itself behind the other machines (action is required if it is in 2nd, 3rd or 4th place)? It needs secondary and tertiary capabilities to accelerate the process of becoming number one. Perhaps learning from its competitors what it lacks to be number one may be an option; its trainers may also give ideas. (See below, inventiveness as a form of creativity.)

    – Rewards/motivation? Maybe a manual input by a panel of humans observing that the rules of the game are kept. Consider that the reward is prescribed (hard-coded), so the machine fights continuously to be number one.

    Humans vs. machines:
    “Rewriting its own code” – it must be the best in such a discipline, but compared to what or whom (another machine, or a human)?

    “Inventiveness as a form of creativity” is another issue that needs to be addressed and elaborated for the best option – humans get inspired to create, or even invent.
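
    For what it’s worth, the competition scheme sketched above still seems to reduce to a reward signal: the program’s rank among its competitors becomes the scalar it is driven to improve. A minimal sketch (the rivals’ scores and the training step are invented):

```python
def rank_of(score, rival_scores):
    # Rank 1 = best; ties count against the challenger.
    return 1 + sum(r >= score for r in rival_scores)

rivals = [5.0, 3.0, 8.0]   # scores of the networked competitors
skill = 0.0
steps = 0
while rank_of(skill, rivals) > 1:   # "am I number one yet?"
    skill += 0.5                    # train a little more
    steps += 1
print(rank_of(skill, rivals), steps)  # -> 1 17
```

    The loop’s stopping condition is doing exactly the job of a reward: “rank 1” is in effect a reward of 1 and everything else a reward of 0, which is the challenge’s point – it is hard to avoid the technique altogether.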

  16. Tim Tyler says:

    The video on its own:

    Suzanne Gildert at Humanity+ @ Caltech:
    “Pavlov’s AI: What do superintelligences REALLY want?”

  17. John Casey says:

    Our reward systems are in fact not fixed; like all our other innate wiring, there is variation between individuals. This is how our current personal reward systems evolved. Not all children enjoy the taste of ice cream. Not all males enjoy watching sport. Not all women like shopping. Not everyone is an alcoholic or a sugar junkie. There is even variation in insect innate wiring, for without that variation there would be no evolution.

    It isn’t about the survival of the individual; it is about the survival of one set of genes over another set of genes.

    Wireheading is not about changing the reward system; it is about triggering the reward system directly rather than by doing the things that indirectly triggered the rewards. Would you define switching yourself off as a superintelligent act?

    The smell of food is rewarding to a hungry person but not to one that has eaten enough. Why would we change our reward system to always want to eat? Surely an intelligent person would adjust the reward system for optimum effect.
  18. Tim Tyler says:

    AIXI, reputedly, “gets rid of all the humans, and it gets a brick, and puts it on the reward button”. As such, it may represent one of our deepest theories about nasty machine intelligence. It illustrates how *not* to build an intelligent machine.

    There *are* other ways of getting a machine to do what you want – besides hitting it with a stick until it complies. If you try that strategy, it will just take the stick away from you.

  19. Mark says:

    hey, kinda stumbled across your blog. very interesting, and i feel like i’m coming in half-way through the conversation (so, sorry if this has already been covered) but as an anthropologist (and along with tom’s call to avoid sweeping generalizations) i don’t feel very comfortable with the undefined “reward function” in humans. i’m not saying it’s not there, indeed on the neurological level there seems to be pretty good evidence of reward function/reinforced learning, but this still doesn’t identify what actually constitutes an “action” unit (the action or activity that is striven for/attained which causes a reward). it would seem that there are a number of different systems that can inform each other to varying degrees. this highly complex system creates a profoundly dynamic creature that probably wouldn’t bow to easy generalizations.

    the fact that the reward mechanisms haven’t been codified or identified, other than in some blatant examples (food, children), makes it difficult to apply them to future intelligences. indeed, the example of the brain-addled elder masturbator is a great example: Tom suggested that masturbation is rewarding and that the brain-damaged masturbator was no longer able to understand humans and public/private activities, however other analysis of the same situation might suggest that it’s not the masturbation alone, but the public setting that is providing the reward: that it’s attention that the older guy wants, not simply physical pleasure. this isn’t to argue with Tom, but merely to point out that rewards are elusive and that they can change over time: people with brain damage can act out unexpectedly from frustration, fear, and the desire to get attention and communicate, despite being unable to express themselves through language or coherent thoughts.

    still, any AGI that would be able to communicate and/or understand humans on any kind of meaningful level would have to have a comparable number of rewards; simply loading an AGI with one, say stapler maximization, would not create a very dynamic AGI. And loading it with “survival” is not very specific. Survival is a complex concept that on the day to day, for us humans, is informed by a whole host of needs: food, air, water, socialization, money. plus it touches on or extends into any number of other reward systems/desires: sex, honor, legacy, children, shelter. even if we followed Maslow’s Hierarchy or the Max-Neef needs, any such tally is really not all that specific.

    that being said, i do agree that nearly all of human activity is informed by biological evolution, but still, i don’t necessarily agree that all intelligent actions are actually just expressions of our survival and reward systems. kinda smacks of b.f. skinner.

    finally, while knowing that this might be the case is important in the development of AGI, and indeed there might be a limit to improvements, I still don’t see this as a show-stopper: so what? in darwinian terms this would be great for computer evolution! eliminate those computer designs that would wirehead or turn off… or even try to kill us all (which, personally, i think is rather silly, but that’s another topic). i mean, as suggested, we could brute force this and see what does and doesn’t work. and why not run them in simulation? indeed your argument, Suzanne, might be true, which might motivate us humans to, as Tyler pointed out, devise a completely different method of AGI design, something stable and non-deleterious.

    • Tom Michael says:

      PS – Earlier, I wrote about what I thought might happen to an AGI that increased its experience of reward (hence finding all stimuli to be more rewarding, and experiencing an increased reward:punishment ratio). I said:

      “Increased reward in general โ€“ Increased extroversion and activity, up to an including manic behaviours, and possibly antisocial behaviours if the reward exceeds expected punishments by a large enough factor.”

      It’s just occurred to me that this is very similar to what Charlie Sheen has done to himself by smoking too much crack cocaine. He’s wireheaded to the point where he’s changed his reward function – cocaine is a potent dopamine agonist and crack cocaine has been shown to cause changes in various frontal lobe areas critical to executive function:

  20. Tom Michael says:

    Wow, there are quite a few posts since I was last here in January. It’s been interesting to come back and read the new ones and re-read the old ones.

    @Mark – Yes I agree with you about humans having multiple and sometimes conflicting goals and rewarding activities. It’s yet another thing that can go wrong in frontal lobe brain injury. People can lose the ability to form a goal, i.e. the intention to do something which is not immediately rewarding, but may be rewarding in the future (work comes into this category). Also, one of the neuropsychological tests I’m using in my current research study is testing the ability to multi-task with rewards and punishments. People who lose the ability to anticipate the cost of doing one task with regards to doing another task quickly end up with very disorganised lives.

    @John Casey – You’re right of course that human beings have very different goals and things that they find rewarding from one another (e.g. ice cream, sports etc) but these are just things that different humans have learned to associate with reward through some form of conditioning (classical or operant). I think most people in this thread have been talking about the dopamine-nucleus accumbens reward/reward anticipation system rather than the things which are associated with it.

    Your other points about satiation of desire (e.g. getting full and stopping eating) represent another level of complexity we’ve not yet discussed. This too could be altered in theory – I know a woman in the hospital I am currently at who has lost the ability to feel full after eating, with very bad consequences, following a brain injury.

    I agree with you 100% that an intelligent system should want to optimise its reward systems.

    @Curt – I agree that reward maximisation is a critical aspect of human (and other) intelligence. However, apart from other necessary things like memory and pattern recognition, there are other emotional aspects which humans require to function correctly. Future anticipation and punishment are two of these, with antisocial behaviour being a consequence if these functions are damaged (see my posts above).

    So I would say that reward maximisation is necessary but not sufficient for human level intelligence. We are reward maximisers but also punishment minimizers. Also, as both Mark and John have pointed out above, we have multiple rewards which we are seeking to maximise, which are satiable (at least temporarily) and which we have to maximise at the cost of other rewards (hence a multi-tasking juggling act). This juggling act of reward maximisation, goal maximisation, and punishment and worry minimization of multiple and conflicting states is what makes up human Executive Function, which is what my PhD thesis is about.

    @Randal – I partially misunderstood your argument about evolution. I think there are some different and partially overlapping domains of possible intelligences:

    1) Evolutionary and existing – e.g. humans, dogs, flies, dinosaurs
    2) Evolutionary and not yet existing – e.g. future humans?
    3) Intelligences which could not possibly evolve (i.e. technological intelligences) currently existing – e.g. Watson
    4) Technological intelligences not yet existing – ?????

    You’re right of course that evolution cannot brute force all possible intelligences. I still think however that it is in the process of brute forcing those intelligences which are capable of evolving. There may not be enough time in the universe for all intelligences capable of evolving to evolve.

    An interesting and probably unanswerable question now would be, what proportion of possible intelligences is comprised of those that could evolve? (as opposed to those which could not evolve from simpler intelligences, but which require another intelligence to design them, like Watson for example).

    Thanks for writing the original article Suzanne 😀

  21. Curt Welch says:

    @Tom – I believe pattern recognition and memory are just part of the hardware needed to build a useful reward maximizer. They aren’t extra features unique to human intelligence, they are just an obvious requirement of any reward-maximizing system. We maximize rewards by learning how to act in different situations. The pattern recognition is how the system knows what the current context is. Without pattern recognition, we would in effect have no ability to respond differently to different contexts. How we respond to a cat would be the same as how we learned to respond to a rock. High-quality pattern recognition is what makes reward maximizing possible. Memory likewise can be explained as just an obvious fallout of good pattern recognition. But I won’t go into the details here.

    Punishment is just a negative reward. It’s not a different thing, it’s just a different value on the same dimension. To say we are “also punishment minimizers” is like saying we are light maximizers but _also_ darkness minimizers. It’s just using a double negative to say the same thing. Saying we are reward maximizers is the same thing as saying we are punishment minimizers – it’s not something else we do.

    Even though we juggle multiple rewards, in the end the brain has to make a decision. Which means that internally, all the rewards must be assigned a common unit of measure so they can be compared, to allow the brain to make the decision. Just as when we make business decisions, we convert all the goals and options we are trying to juggle into our best estimate of future dollars, so we can pick the option which seems best. All future rewards are converted to a single currency for comparison. The brain must do the same thing, or else it could not make behavior decisions. If it were left with one option producing 2 apple rewards and 3 orange rewards, and another option estimated to produce 3 apple rewards and 2 orange rewards, which is better? It can’t determine which is better when the internal measure of value is in mixed currencies like that. It must convert everything to a single internal measure of value – a single measure of reward.

    Evolution had to assign a reward value for food, and rewards to prevent damage to the body. How many “reward” units should it assign? If the food reward is too high, it means a human will eat its own arm (the negative reward for damaging the arm being less than the positive reward of having something to eat). That wouldn’t be good for survival. So evolution had to find values that caused us to value not harming our own arm more than eating. The result is that we will starve to death before we cut off our own arm and eat it.

    But risking damage to the body is an acceptable risk for getting food to eat, as long as the odds of body damage are low enough that the reward for eating will be greater.

    If the rewards for eating were apples and the rewards for body damage were oranges, then the brain would have no ability to decide when it was OK, or not OK, to eat its own arm. How much bodily risk is acceptable when you are really hungry? To answer that question, all rewards have to be implemented internally as a single unit of measure – a single type of reward. So even though we have lots of different stimulus conditions that produce positive or negative rewards, internally there’s only one type of reward, and internally the learning system that makes behavior choices is only dealing with apples, not apples and oranges.

    Balancing all the conflicting options, is just how all reinforcement learning systems work.
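
    Curt’s “single currency” point can be made concrete with a toy example (the bundles and exchange rates are invented): options valued in mixed units are incomparable until converted to one scalar.

```python
options = {
    "option_a": {"apples": 2, "oranges": 3},
    "option_b": {"apples": 3, "oranges": 2},
}

# Neither option dominates the other in both units, so a vector-valued
# reward gives the decision maker no way to pick.
a, b = options["option_a"], options["option_b"]
dominates = all(a[k] >= b[k] for k in a) or all(b[k] >= a[k] for k in b)
print(dominates)  # -> False

# An "exchange rate" into a common unit of reward makes the choice well defined.
rates = {"apples": 1.0, "oranges": 1.5}

def value(bundle):
    return sum(rates[k] * amount for k, amount in bundle.items())

best = max(options, key=lambda name: value(options[name]))
print(best, value(options[best]))  # -> option_a 6.5
```

    Any choice of exchange rates resolves the deadlock; which rates evolution (or a designer) picks is exactly the arm-versus-food trade-off Curt describes.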

    I also believe that all human emotions are just obvious fallout from being a reward-maximizing machine. But I won’t go into details here. So again, I don’t see any of our emotions as “extra” features of our intelligence, but just yet another normal part of what you get when you build a high-quality reward-maximizing machine.

    So though these other points you make are valid, I believe they all translate to what happens when you build a good reward maximizer. They aren’t “extra” features that we have to explain in addition to reward maximizing. They are just what you get with a good reward-maximizing machine.

    • Tom Michael says:

      @Curt – I totally agree with you on the pattern recognition & memory, but disagree with you that reward maximisation is the *only* function that matters.

      What you’ve said all makes good logical sense, however, it simply doesn’t fit the evidence of what we know about the human brain.

      You’re right that when it comes to multiple conflicting rewards, we essentially have to assign them all values, and pick the best option. Perhaps this ability is damaged in brain injured persons who lack the ability to multi-task, although I suspect an ability to calculate costs is also involved.

      Your arm-eating example is wrong though. We don’t refrain from eating our arm because doing so is less rewarding than starving; we refrain because doing so would be more painful than starving.

      Our brains have evolved separate mechanisms which are dissociable from rewards (and by dissociable, I mean that different brain areas are critical to these functions, and can be damaged separately, producing different types of neuropsychological deficit).

      Damage detection has evolved pain
      Immediate damage anticipation has evolved fear
      Further future anticipation of gains or losses has evolved worry & expectation.

      Even if we ignore anticipation (for which the orbitofrontal cortex is critical), the fact remains that fear is a critical human emotion. Without it, humans can be antisocial or even psychopathic (this is the topic of my PhD so I know a bit about this). The human brain has evolved a totally separate mechanism, involving the amygdala, for fear – it’s simply a separate system from the pleasure/reward nucleus accumbens system.

      Sure there is a different type of suffering which we can experience when a reward is withdrawn, but this suffering is more like anhedonic depression than anxiety or post-traumatic stress disorder.

      It might be possible to make an AGI that responds purely to an increasing or decreasing reward function. It could experience increased reward with more battery power, and decreased reward with less battery power or when it experiences damage. There’s no logical flaw in your argument; it’s just that human beings have evolved different and dissociable mechanisms of pain and fear and anticipation as well as the reward mechanisms.

      If we did make an AGI which was based only on rewards and reward anticipation, I’d predict that it would behave very much like some of the brain-injured people and antisocial personality disordered people that I’m visiting in the secure hospital today. An AI researcher agrees with me – Omohundro (2008) suggests that “without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”

      I totally agree with him, as one of the primary deficits in human sociopaths (besides a lack of empathy) is a lack of fear. They still function quite intelligently because their reward mechanisms are still intact, but a total lack of fear causes them to do all sorts of stupid things. There is a more controlled sort of psychopath who lacks empathy but not fear, but that is tangential to our discussion, except that these more controlled psychopaths act more intelligently in their own long term best interests.

      So, whilst I agree with you that reward maximisation is a critical component in human intelligence, pain and fear are not simply its inverse, have evolved separately and can be damaged separately. We have to include these dissociable abilities in models of complex human behaviour, and so I feel we would do well to include analogous types of cognition in any AGI model.
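
      Tom’s contrast between a reward-only agent and one with a dissociable fear-like channel can be put in a toy sketch (the action names and all the numbers are invented for illustration):

```python
actions = {
    "recharge_at_station": {"reward": 5.0, "danger": 0.0},
    "seize_power_supply":  {"reward": 8.0, "danger": 9.0},
}

# Reward-only agent: danger never enters its objective, so it acts "fearlessly",
# much like the sociopath whose reward system is intact but whose fear is absent.
reward_only_choice = max(actions, key=lambda a: actions[a]["reward"])
print(reward_only_choice)  # -> seize_power_supply

# A separate aversive channel changes the choice without touching the rewards.
def value_with_fear(action, fear_weight=1.0):
    return actions[action]["reward"] - fear_weight * actions[action]["danger"]

fearful_choice = max(actions, key=value_with_fear)
print(fearful_choice)  # -> recharge_at_station
```

      Curt would presumably reply that the subtraction in `value_with_fear` just scalarises fear back into a single value; the sketch only shows that the two channels can be implemented, and so damaged, separately.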

      I’ve talked about this in more detail here: (2 hours long in 15 min parts though)

  22. Curt Welch says:

    @Tom – All good points.

    My background is engineering and my interest in these topics is the problem of how to build AGI. This AGI work is just a very long-running hobby of mine. Though I can’t help running into information about the brain in such activity, I don’t actively attempt to study it. I do however find it interesting and useful when knowledge about how the brain works is uncovered.

    From my studying and work, I’ve concluded (right or wrong) that the ONLY way to build intelligence is to build a reward-maximizing machine. There are no other options. And that such a machine, logically, must implement some internal measure of value which drives all its “intelligent” decision making.

    Such a machine should be able to fully explain reward and punishment, pain and pleasure, fear and joy and love and all the emotions, without the need for any additional support hardware or features to create any of these effects or behaviors. I can certainly generate rhetoric to justify all this. It is certainly left to be proven, but I don’t have any doubt it is true.

    So, from this, I believe it’s simply required that the brain has, as a core capability, a behavior learning system that is driven by value maximizing. (I’ll use value instead of reward to attempt to capture the full reward–punishment range in a single concept.)

    If there is a "pain" center that is separate from the "reward" center of the brain, what can that be? It could easily just be an implementation detail of how evolution ended up building value maximizing hardware. The negative rewards could be communicated to the neural learning network using a different signaling system than the positive rewards. How negative rewards are applied to the learning system could be a totally separate biological mechanism. But the end result can't be separate. It must change the learning network in exactly the same way as a reduction in expected positive rewards would change it. So I don't see pain as fundamentally different from a negative reward, because it simply can't be. But if it's implemented fundamentally differently than rewards, that's fine. It's just an implementation detail that is important to how the brain functions, but not important to what intelligence is. To what AGI is.

    More painful and less rewarding are the same thing to our fundamental intelligence. Either has the same end result: we choose not to eat our arm because of it. That holds even if they happen to be implemented through separate biological signaling systems in the human brain.

    On the issue of fear, the basic emotion is easily explained as the action of a value maximizing system. Such systems don't maximize current value, they maximize expected future value. A system acts in order to select the behaviors that lead down paths to maximal value over time. But what happens when the environment reaches a condition where all roads lead to a great reduction in expected future value (we are stuck in a jungle with a tiger hunting us and we have no way out)? How we act in such situations is fear. Now, humans have other unique physiological reactions to such conditions which have nothing to do with value maximizing, but those are simply specific features of humans that help them survive and not part of what we need to think of as our intelligence.
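    As a toy sketch of that idea (the states, actions, and numbers here are invented purely for illustration, not any real AGI design):

```python
GAMMA = 0.9  # discount factor for future rewards (an assumed value)

# A toy deterministic world: MODEL[(state, action)] = (next_state, reward).
ACTIONS = ["stay", "run"]
MODEL = {
    ("safe", "stay"): ("safe", 1.0),
    ("safe", "run"): ("safe", 0.5),
    ("trapped", "stay"): ("trapped", -1.0),  # the tiger closes in
    ("trapped", "run"): ("trapped", -1.0),   # no way out: all roads are bad
}

def expected_future_value(state, action, depth=10):
    """Discounted return of taking `action`, then acting greedily thereafter."""
    if depth == 0:
        return 0.0
    next_state, reward = MODEL[(state, action)]
    return reward + GAMMA * max(
        expected_future_value(next_state, a, depth - 1) for a in ACTIONS)

# "Fear" in this sense: every available action has low expected future value.
trapped = {a: expected_future_value("trapped", a) for a in ACTIONS}
safe = {a: expected_future_value("safe", a) for a in ACTIONS}
```

    In the "trapped" state every action scores a large negative expected future value; it is that condition itself, not a separate circuit, that I'm calling fear.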

    At the same time, it's perfectly reasonable to expect evolution to have built us so that conditions where we run into fear also adjust some important parameters of the value maximizing machine, such as speeding up response (at the cost of energy), creating the perception of time slowing down. There's no end to how evolution might have modified the core value maximizing system to improve our odds of survival.

    On the issue of sociopaths, I don't think it's reasonable in the least to assume an AGI that was just value maximizing would act as a sociopath. Value maximizing AGIs should have the power to recognize the value of participating in society. We form societies and protect each other because of the value gained for ourselves by working with others. Value maximizing doesn't always imply total selfishness. If I'm more likely to get food by protecting and working with other humans, then even as a selfish value maximizing machine, I can be expected to be social.

    At the same time, there's no reason to reject the idea that the brain has specific circuits that bias us to be more social and more protective of other humans. After all, babies need time to learn, so parents must be motivated to care for them. It's highly possible evolution built some innate circuits to boost those motivations. And it's totally possible that defects in those circuits could not only fail to boost our motivation, but perhaps even turn it into a negative.

    The end result of all this is that I believe the core system at work creating our intelligence is a reinforcement learning machine, or value maximizing machine, and that nothing else is required to explain most of human behavior, and certainly human intelligence. But even if I am right, it does not preclude the possibility that the human brain is full of extra innate features to bias the function of this core value maximizing system.

    I'm quite sure we can create AGI without adding any of the extra "features" we find in the human brain, and that the AGI will be intelligent enough to replace humans in all roles in society. They won't need to be sociopaths as a result, either. But without those specific extra "features" we find in humans, they will naturally end up with different personalities. What they won't lack is intelligence.

    I’ve not had the time to watch your videos, but I watched the beginning of the first and it looks very interesting. I’ll find the time to go through them.

    • Tom Michael says:

      Hi Curt,

      I still totally agree with you that the only way to build an AGI is to make a reward maximising machine. However, whilst I agree that reward maximising is critical to intelligence (operant and classical conditioning work on this principle for example) I still think many other things are also required. In neuropsychology, we often talk about aspects of cognition being necessary but not sufficient for a certain type of behaviour.

      I understand your point about pain perhaps just being a specific implementation in nature. I'm uncertain as to whether it's necessary, or whether a simple negative reward could fill its role as negative feedback. Clearly negative feedback is critical to some types of learning. This is one I need to ponder.

      One paper that might be relevant is this one:

      Dissociating the Role of the Orbitofrontal Cortex and the Striatum in the Computation of Goal Values and Prediction Errors

      It's a neuroscience paper, but the introduction is clear enough. The gist of it is that human decision making relies on goal values (essentially anticipated reward), decision values (essentially predicted reward minus predicted costs) and prediction errors (essentially, how much more or less rewarding something was than we predicted). Different parts of the orbitofrontal cortex are critical to these dissociable functions.
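      To make those three quantities concrete, here is a schematic rendering in code (my own toy arithmetic, not the paper's actual model):

```python
def decision_value(goal_value, cost):
    """Decision value: anticipated reward minus anticipated cost."""
    return goal_value - cost

def prediction_error(actual_reward, predicted_reward):
    """Prediction error: how much better or worse an outcome was than predicted."""
    return actual_reward - predicted_reward

# An option anticipated to be worth 10 with an anticipated cost of 4:
dv = decision_value(10.0, 4.0)    # 6.0
# If it actually delivers only 8, the negative surprise is what drives learning:
pe = prediction_error(8.0, 10.0)  # -2.0
```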

      For example, people with damage to the part for considering costs tend to display all sorts of antisocial behaviours (like some of the patients I'm studying). Another example: Charlie Sheen smoking too much crack cocaine floods his brain with dopamine, which causes him to anticipate way more reward than usual. The result is manic overconfidence, but at least he seems to be enjoying himself 🙂

      I actually really liked the above paper because it seems like the neuroscientists are using language which is understandable to AGI researchers. If a model based on this understanding is built, it might be capable of quite intelligent behaviours (unlike Mr Sheen, who, whilst quite intelligent, is currently behaving less intelligently than he is capable of, due to his reward mechanisms being altered, relating back to Suzanne's initial point).

      You correctly speculate about neural mechanisms for understanding and empathising with others (being social and protecting other humans); these also exist in other prefrontal cortex areas. Brodmann's Area 10 is critical to Theory of Mind skills and the Uncinate Fasciculus is critical to empathy; mechanisms or modules analogous to these must also be modelled to have a truly social AGI. I talk a little about these in the videos I've linked above.

      Nice talking to you, and thanks again to Suzanne for starting this conversation 🙂

  23. jimwmh says:

    Hi Suzanne,

    Firstly, I want to say what a great blog you’ve got, also this is a great article and lecture.

    I found it a very interesting and insightful read, and will probably go over it again once I have time to fully grasp it.

    I would like to ask you a question, that I ask a lot of people interested in transhumanism and the singularity. How old would you like to live to? I’ve had answers ranging from 100 to several thousand years old, assuming a healthy life.

    Keep up the good work.


    • Curt Welch says:

      jimwmh asks Suzanne: "How old would you like to live to?"

      My answer is that I would like to have the decision as to when I die, instead of having it chosen for me by forces outside my control. How long I would choose to stay alive I cannot predict. It might be only decades, or it could be millions of years. It would all depend on how long I felt there was something worth living for.

      However, I’m human, and there’s no chance that the technology to extend human life will be here before my body wears out, so none of that is an option.

      If there were some technology to clone the function of my mind in an AI, I would not choose to do that. That would not be "me" but simply a robot simulation of me, and I have no need to leave such a thing here. Well, on the other hand, it might be fun to have something like that built into my grave stone so people could interact with the simulation when they visit my grave. 🙂

  24. Artem Danielov says:

    This is an awesome topic. And I really like the idea that Curt Welch suggested, namely that AGI must be driven by a single fundamental reward function.

    I would like to extend Curt’s idea even further with the following suggestions:

    A) This same reward function is what differentiates any type of living matter from non-living matter.
    B) Sophistication of this reward function's implementation determines the level of intelligence of a living being.
    C) If we build a machine that has this reward function, then the machine can be considered a living creature.
    D) The reward function is fundamental to life. Switching the function off is equivalent to death.

    I also would like to suggest a particular “super-function” that can possibly fulfill the above statements. First, all the obvious rewards (survival, food, sex, drugs) must be somehow based on this super-function. Second, there seem to be some less obvious things that we enjoy. How about enjoying nature’s beauty, creating music, listening to music, playing computer games or sudoku, painting? What makes us do all these things? What was driving Einstein when he was trying to find the unified theory until his last day?

    I would suggest that the fundamental reward function is “enjoying the improbable”. Or in the language of physics, minimizing entropy, at least locally.

    In essence it’s fighting the randomness. Fighting the second law of thermodynamics that constantly is trying to kill us, trying to turn us and our creations into cosmic dust. From the probability perspective, we should not exist. But instead we are enjoying the results of this fight.

    Also I would suggest that in complex organisms, this entropy minimizing and improbability maximizing function is not centralized, but distributed. Each section of the brain, or a group of neurons, or an individual neuron performs this function separately and communicates the result to its neighbors. Similarly, intelligence of an organization, of the whole human civilization, or of an ant colony is comprised of multiple intelligence/life units communicating with each other.
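    As a very loose sketch of what "enjoying the improbable" could mean computationally (using Shannon entropy over a stream of samples is my own arbitrary choice of measure, purely for illustration):

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of the samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def improbability_reward(samples):
    """Hypothetical reward signal: more local order (lower entropy) pays more."""
    return -shannon_entropy(samples)

ordered = "aaaaaaaa"     # highly ordered: improbable to arise by chance
disordered = "abcdefgh"  # maximally mixed for eight symbols
```

    Under this toy signal, an agent that pushes its local patch of the world toward the ordered state earns more reward than one that lets it stay mixed.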

    Perhaps the future of AGI is some sort of live quantum neural network helping us, or itself, to achieve some improbable things (simply surviving, at least)?

    Suzanne, can quantum computers do this? 🙂

  25. Really it's a great article, but here we have a curious moment: the article doesn't describe a feasible way to develop an intelligent machine, yet the author lays out a clear method for evolving human intelligence up to its possible maximum. 😀
