Because I think that's the fundamental thing: when I read a lot of stuff on LessWrong about deception and so on, it's all assuming that these things have intentionality, that they have goals and so on.

Well, for in the future, yeah.

Oh yeah, I mean, if that's your position then I think that's very reasonable. I don't think current LLMs are that well described as having cross-context goals or cross-context intentions at the moment, but, I don't know, AIs are getting more powerful.

What do you mean by cross-context?

So what I mean is, you can prompt an LLM with "you should do X even if the environment makes this hard" and it will try, to some extent, to do that. But that's different from saying it's really pursuing some specific thing across many different prompts — potentially prompts that tell it to do something else. I think that's pretty different.

Oh, interesting. Okay, cool — well, let's slowly kick off then. So Ryan, it's an honor to have you on MLST. Can you introduce yourself for the audience?

Yeah, so I'm Ryan, I work at Redwood Research, and I guess what I'm known for recently is getting a quite good score on ARC-AGI using GPT-4o.

Now, Ryan put a blog post out a couple of weeks ago, and of course we covered it in the episode we did on ARC just now. I was very impressed — I was surprised. Could you outline your approach, and what your current accuracy is?

Yeah, so I started by just messing around a bit with GPT-4o, and I was like, hey, this model is actually not that bad at recognizing the patterns in these images. I tried a few different things, but ended up figuring out that the best approach would be to get GPT-4o to write code that implements the transformation rule.
So in ARC-AGI you have these pairs of images — there's an input and an output — and the goal is to produce the output for some additional input. I basically have GPT-4o write Python code that takes in an input and generates the output, and I have it do that a lot of times — in my most recent submission, about a thousand times per input. Then I have all these programs, and I can run them on all of the examples you're given and check that they pass all of those test cases. I pick whichever programs pass all the test cases, run them on the held-out input to produce my output — hoping it's right, because the program was right on the other inputs — and submit that. There's a bunch of other stuff I do to make it more performant, but that's the core of the approach.

Very cool. What happens if multiple programs work — which one do you select?

What I do at the moment is a majority vote on what those programs output on the held-out input. We don't actually care what a program does in general, only what it does on the inputs we need to submit, and we've already run it on the test cases. So if we have a bunch of programs that all agree on the held-out input, great — no issue, they all think the same thing is true. If they disagree, then obviously some of them are wrong, because there's only one correct answer. But what we can do is say: if 90% of them agree on one output and 10% think it's something else, we go with the 90%. And you can actually submit two guesses on ARC-AGI, so we take the top two majority votes: first the output the most programs vote for, then the one the second most vote for. I think this improves performance a tiny bit — honestly it doesn't make much of a difference over a more naive strategy; picking at random is just a higher-variance approximation of majority vote.
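To make that concrete, here is a minimal sketch of the sample-then-select loop described above. It is an illustration of the idea, not Ryan's actual code: `sample_programs` (which would prompt GPT-4o for candidate Python transform functions) and `run_program` (which would execute one candidate on a grid in a sandbox) are hypothetical helpers.

```python
import collections

def solve_task(train_pairs, test_input, n_programs=1000):
    """Sample candidate programs, keep those that reproduce every training pair,
    then majority-vote their outputs on the held-out test input."""
    programs = sample_programs(train_pairs, n=n_programs)  # hypothetical: ask GPT-4o for Python transforms

    survivors = []
    for prog in programs:
        try:
            # A program "passes" if it maps every training input to its training output.
            if all(run_program(prog, x) == y for x, y in train_pairs):
                survivors.append(prog)
        except Exception:
            continue  # crashing or malformed programs are simply discarded

    # Vote in behavior space: count what the surviving programs produce on the
    # held-out input, and keep the two most common outputs (ARC-AGI allows two guesses).
    votes = collections.Counter()
    for prog in survivors:
        try:
            grid = run_program(prog, test_input)
            votes[tuple(map(tuple, grid))] += 1  # make the grid hashable for counting
        except Exception:
            continue
    return [[list(row) for row in key] for key, _ in votes.most_common(2)]
```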
Yeah, I'm quite interested in this concept of ambiguity, because presumably there's an infinite set of possible programs that would solve many of these. I'm also interested in how we conceptualize the problem: it seems like there's ambiguity in how we conceptualize the problem, and there's ambiguity in the solution. And when we're disambiguating among the possible solutions, we could argue about how to do that — we might pick the one with the lowest complexity, or the shortest description length, or the one most aligned with human priors, if you like. Do you have any thoughts on that?

Yeah. So I'm really banking a lot on the fact that the language model is not trying to cheat. If the language model were just trying to cheat — just special-casing the examples it saw and producing exactly those — then when you select for a correct program you'd get a program that's right on those inputs and has no hope of generalizing. But in practice the model puts in a good try, at least to some extent, and therefore we're generating the program from — I guess as you were saying — the prior over programs that are a reasonable guess at what you might do given this input, and which aren't just very specific to the exact inputs you receive. Though I do see some cases where the model overly specializes.

And once you have those programs, I don't think it would help that much to try to pick the one with the least complexity, basically because it's just too hard — we don't have a great notion of complexity. It might have helped to do something like have another model rate each program for how overfit it seems, and reject it if it seems super overfit, but I think that would mostly just add cost without adding much benefit, because now you need another call to the model to review every single program.

In terms of the ambiguity, it's worth noting that there's a lot of ambiguity in program space that doesn't show up in the behavior of the program: there are many, many different ways you can implement any given thing. And then, in addition to that, there's ambiguity in behavior, where you have many implementations that just do different things, and many implementations that do the same thing on the test cases but different things on potentially held-out values.

Yeah, so that dichotomy between program space and behavior space is super interesting. How are you thinking about that? You're kind of looking at behavior space now rather than program space?

I would say my approach basically doesn't look at the programs at all, and basically just looks at their behavior — so in that sense that's true. But I am depending on a kind of strong guarantee in the space of programs, which is that the programs are not trying to overfit. The approach I was using wouldn't have any chance if the programs were random; it also needs to be the case that the programs are actually getting close to the right solution, or hitting the right solution, a pretty high fraction of the time.

Interesting. In your own articulation, what do you think the purpose of the ARC challenge is?

I think it's an interesting benchmark that shows some ways in which current LLMs — current AIs — are somewhat worse. The biggest factors for ARC-AGI at the moment, from my perspective, are vision — visual reasoning — some amount of puzzle-solving-type ability, and just general reasoning. And I should say there are many different ways you could approach these problems, and different ways will depend on different skills: the way I approached it depends on the things I said, but you could potentially use approaches that don't depend on those things at all. The part of it that's most interesting to me is the extent to which it captures a specific property that current LLMs — or current AI in general, though I happen to be somewhat fixated on LLMs — are quite bad at. Current language models are very good at some things — there are lots and lots of programs they can write quite well — but they're banking on a lot of knowledge.
They're banking on the fact that they have a lot of knowledge of different programs. LLMs have much more stuff memorized than any human does: they can speak every language, they can speak in codes, they can speak every programming language. But they're somewhat less good at reasoning — quite a bit less good at reasoning than humans, I would say. They still have some ability to reason, some ability to understand the situation, some ability to learn from the examples they're given in context, but it's definitely a weakness right now.

You said visual reasoning, and that's quite interesting, because I'm interested in how reasoning changes in different domains. I was reading that paper — I think Gwern had it — about Terence Tao growing up; I'm not sure if you've read it. Even as a seven-year-old boy he was a wizard at mathematics, and apparently he preferred analytical methods rather than spatial methods for solving a lot of problems. But the miracle of abstraction is that, while a lot of reasoning happens inside a modality, abstraction is about finding meta-links between modalities — for example, identifying that gravity is the thing which links the movement of the planets and the apple falling from the tree, actually making that link between domains. By the way, I also think the ARC challenge is about knowledge: I think Chollet specifically said there's a knowledge gap — reasoning is how we acquire knowledge, intelligence is the efficiency of that reasoning process, and I'm giving you a problem where you don't have the knowledge. But then there's the question of where the knowledge comes from. Language models have knowledge, abstract reasoning is about being able to find a model, and the pathway between the data you have and the model you create might be quite tenuous — so how do I efficiently find that link?

Yeah — you said a bunch of different things, so I'm going to respond to one thing at a time. One thing is: I think the puzzles are most naturally solved by humans using visual reasoning. The puzzles reasonably often involve things like moving shapes around, rotating shapes, or taking a subset of one grid and looking at another, so for humans they're most naturally solved visually. Now, of course, as you noted, you could transform the problem into another regime: maybe you could express every puzzle in terms of some DSL that's easier for an LLM — or maybe even for some humans — to interact with, and that doesn't correspond to visuals at all. And as you were noting, in mathematics there's often this thing where there's a naive way of seeing a problem, but you can potentially view the problem in a totally different space. I'm not that familiar with it, but my understanding is that analytic number theory basically operates totally differently from how you might have imagined, in terms of how it approaches various aspects of the situation. So yeah, I basically agree with that.
My guess is that in practice the most effective way to solve ARC-AGI puzzles, at least using the sort of approach I used, is going to bank on visual reasoning, and on the fact that models have been trained on huge numbers of images and over the course of that acquire some somewhat general ability to reason about images, understand images and recognize patterns in images — even if that ability is somewhat weak.

Going to your next point: you said the ARC-AGI challenge is about knowledge. It really sounded like your description was more that it's about quickly acquiring or applying knowledge — zooming in on it. For any given puzzle, assuming the puzzle is sufficiently distinct, I don't think anyone has an existing rule that applies to that puzzle which they've already found in their head; no one has memorized the rule from a set of a bajillion rules and is pulling it out. It's more like — at least for me, and I've solved a decent number of these puzzles now, just messing around with this stuff — I have some sense of what the moving parts are: there's this thing where you look at a subset, you change colors, there are various styles, there's fill-in-the-blank, there are grids. I have these moving parts in my head, and that means I can solve these puzzles much faster, because I've overfit to the domain by looking at a bunch of them. Humans definitely get better at solving these over time — I certainly have. You can move these parts around, but you can also solve the puzzles relatively de novo, just using knowledge that most people have.

My guess is that the way the language models are doing it, at least in my approach, is somewhere in between. I think they're probably not actually as overfit to ARC-AGI as I am, in some sense, because they probably have fewer examples that they recall well and have learned well from at the moment. They've probably been trained on a bunch of related data, but it was alongside a bunch of other data and probably isn't as salient for them, whereas I've spent a bunch of time messing around with the puzzles. That might change in the future — I don't know what the training run for the next model looks like — but my guess is that in some sense they're operating at a disadvantage in this regime. That's a little surprising, because in most cases their advantage is having seen things before; in this case, one of their disadvantages is that they haven't really been trained on it, or it isn't as active for them.

Yeah. I think there's an interesting dichotomy between system 2 creation and system 2 intuition. When you're solving riddles — many of my friends tell me about this — it's a skill, it's something you can learn. And I think of ARC as similar to the concept of abduction: I'm framing a set of plausible hypotheses and doing inference over those hypotheses, and this is guided by intuition. Now, when kids are taught mathematics at school they're not allowed to use their calculator, because we teach them how to reason.
Right — we teach them the rules, the axioms and so on, and you might naively say, well, why bother, just give me the calculator. The reason is that these rules can be intuitively recombined and composed together to solve other problems; it's really powerful knowledge. So now we have this knowledge and we're just interpolating over it, using our intuition and so on — but the key question is where the knowledge came from in the first place. At some point some very bright person — this is what we do with science — comes up with these abstract models that work in many, many situations, and then most of us don't reinvent the models, because we already know about them and how they're used; we just do pattern matching and intuition over them. But at some point the models either got invented or got discovered. And then there's the question of how fundamental these models are: even mathematical operations can be expressed in terms of, say, set operations, so you could have some really fundamental operations and an algorithm that does program learning, and end up with a library of inscrutable compositions of operations. Maybe that would be good, but it wouldn't be very human-aligned, it wouldn't be very understandable. So what do you think about how fundamentally we could represent knowledge, and where the knowledge comes from?

So, maybe one bit of pushback there: I think in practice people who are really good at working with models basically do reinvent them, with inspiration. It's very common in my work that I basically reinvent ideas I had six months ago — I'm like, ah, that's a good idea, and then, well, I basically already kind of knew that. Something like rediscovery with hints is actually pretty key to my ability to have things in place; in some sense, the best way to have compressed memory is to have the best pointer to a thing such that you can rediscover it. In a lot of intellectual work — I forget where I first read this, I think in some blog post — it's a good idea to be in a regime where you can basically reinvent any key piece of knowledge from the rest of your knowledge.

As for the claim that ARC-AGI is about abduction: for me, at least on a lot of the easier puzzles, I basically just use something more like intuition. I look at the puzzle and I sort of just know what the rule is, and then I check it — but then I'm just right; I just knew what the rule was. Then there's some number of puzzles where you figure out the shape of the rule but there are some free variables: okay, I know this shape is having its color changed in some way, but in what way is the color being changed? For that I do have to do conscious, more system 2, thinking-slow-style reasoning. And I agree that that is in fact a core element.
Interestingly, in the setup I use it's not that the two phases happen at two separate points. There are sort of two parts to my approach. One is that I have the language model do reasoning and generate the code: it first does chain-of-thought reasoning, based on how I prompted it, then it generates code, and then I do selection over those programs. I'd say that in the selection phase there's some amount of both selection for having the right intuition and selection for having the right — trickier to pin down, more system 2 — parts that require testing things. But it's also the case that when the model does its reasoning, it does a bit of both: the intuition, and also some amount of eliminating hypotheses. It's not great at this — it's pretty bad, it messes up a lot of the time — but it does in fact correct itself some of the time, and it does analyze the situation in a more system 2 way, where I have it describe what the different grids look like and notice commonalities. To be clear, it messes this up a lot: if you looked at the average reasoning trace you might not be impressed. But it does have, I would say, a spark of the actual reasoning, the actual ability, in there.

And then, in addition to that, I do a second pass, which is quite helpful. I take solutions where the model seemed close — it was kind of getting there but messed something up — and I give the program to the model again, in another context, and say: hey, here's a solution that some idiot wrote, I wonder who wrote this; can you fix it so it actually works? Here's what it actually output on the examples, here's what it was supposed to output — you should fix the code. No, actually I tell the model that a previous version of itself wrote it. But I think it's sort of funny: it's just getting this thing out of the blue — someone wrote this, now you have to fix it — just appearing into the void. It doesn't really have much context, doesn't have much state, and it just has to fix this program that someone else wrote.
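A minimal sketch of what that revision pass might look like, assuming the OpenAI Python SDK and the same hypothetical `run_program` helper as above; the prompt wording and model name are placeholders, not Ryan's actual prompts.

```python
from openai import OpenAI

client = OpenAI()

def revise_program(near_miss_code, train_pairs, n_fixes=32):
    """Show the model a near-miss program plus the mismatch between what it
    produced and what it should have produced, and ask for corrected versions."""
    feedback = []
    for i, (x, y) in enumerate(train_pairs):
        actual = run_program(near_miss_code, x)  # hypothetical sandboxed executor
        feedback.append(f"Example {i}: expected {y}, but the code produced {actual}.")

    prompt = (
        "A previous version of you wrote this Python solution to an ARC-AGI puzzle, "
        "but it is not quite right:\n\n"
        f"{near_miss_code}\n\n"
        + "\n".join(feedback)
        + "\n\nRewrite the code so it produces the expected output on every example."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
        n=n_fixes,  # sample several candidate fixes; select among them as before
    )
    return [choice.message.content for choice in response.choices]
```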
Yeah — you said a few things there that made me think of three things. First of all, there's a fundamental search-versus-framing dichotomy: a lot of the early solutions just did an exhaustive search, a bit like an early chess algorithm, whereas what you're doing is basically generating a bunch of completions and doing some refinement. And you could argue — imagine a Venn diagram of the knowledge in the language model and the knowledge in the ARC problem — that if they mostly intersect, then in theory there's some solution that's traversable inside the language model; I'm not sure what you think about that. There's also the thing about doing these reflective steps, the chain-of-thought reasoning and so on, because you can almost think of that as a potential set of agential trajectories. If I'm setting up the chain of thought, setting up the prompts — and I think of language models a bit like a database query — then what you're essentially doing is creating a cone of possible retrieval trajectories, and you're also eliciting external information from the evaluation function, which in this case is the Python interpreter, and traversing that. And it's already interesting that the solutions are so easily found — I know we're talking about 30,000 completions, but that's actually a lot lower than I thought it might be, which is quite good.

Yeah, so my most recent solution uses closer to 1,000 completions.

Oh wow, okay — tell me about that.

For the most recent submission on the public leaderboard I used fewer completions per problem, basically because I wanted to run it on more problems and there was a cost budget, and I included a few minor improvements as part of that resubmission. So it does use fewer solutions. I should also note that we can look at the curve of how many solutions you use versus how well you perform, and from one perspective the curve is very brutal, while from another perspective it's very optimistic about language model abilities, because there really aren't amazing returns to adding more samples. With my approach, each doubling of samples gets you something like maybe four or five percentage points more solved — with my most recent solution, accounting for all the different steps I'm using. So we're exponentially growing the number of samples we use and getting a roughly fixed return in what we solve. It's quite brutal: if you wanted an additional 30% correct you would need something like ten doublings, and ten doublings is a factor of over a thousand — that's a lot of doublings. But at the same time, it means that at a small number of doublings — say only about 128 total completions — we're getting non-trivial performance. For example, with my most recent approach, at about 256 completions I think I was getting maybe around 35% accuracy. That's quite a bit worse than humans would do, but it's not wildly worse. My guess is that on the test set I was evaluating, typical human performance is maybe around 70% — I don't really know. My performance is probably quite a bit higher than that, but I'm practiced — probably better at solving these problems than typical people are — and maybe I get 90 or 95%, and maybe a reasonable fraction of my mistakes are due to messing up typing things in or clicking the grid, so they're less interesting. But it is getting something, and some of the puzzles it's getting are not that trivial, even with this many samples. If you look at the reasoning traces, pretty often it sort of gets the idea but messes something up: it'll see there's something with horizontal lines and it's supposed to complete them or add more horizontal lines — it's getting the idea — but then despite that it messes up one of the details. Or sometimes it gets the rule basically right but messes up the implementation: it'll have done all this hard work, it'll have actually landed on the rule, and then it'll have an off-by-one error, and it's like, no, you're so close.
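As a back-of-envelope illustration of that scaling curve — roughly four to five percentage points per doubling, anchored at the quoted ~35% around 256 completions — here is a toy projection. The log-linear form and the exact constants are assumptions for illustration, not a fitted result, and the real curve obviously has to flatten out.

```python
import math

def projected_accuracy(n_samples, anchor_n=256, anchor_acc=0.35, gain_per_doubling=0.045):
    """Toy model: accuracy grows roughly linearly in log2(number of samples).

    Anchored at ~35% around 256 completions with ~4.5 points per doubling, as
    quoted in the interview; purely illustrative.
    """
    acc = anchor_acc + gain_per_doubling * math.log2(n_samples / anchor_n)
    return min(max(acc, 0.0), 1.0)

for n in (128, 256, 1024, 4096):
    print(f"{n:>5} samples -> ~{projected_accuracy(n):.0%}")
```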
Yeah, I enjoyed reading your blog about that. It was surprising — I've tried a few examples as well, and similarly I was impressed by GPT-4o's ability to get the outline of the solution, but then, as you say, there's a surprising number of stupid little errors that shouldn't be there. A couple of things I wanted to point out. When I spoke to Jack Cole, he was talking about essentially fine-tuning and augmenting and doing so-called active inference — he's actually retraining the base model. What's very interesting about your solution is that you don't do that: you're using a foundation model and then doing a whole bunch of computation — a pattern of computation — while freezing the base model, which is very interesting. But I wanted to talk a little bit about the trade-offs there. I'm going to use the term "aspect ratio" for the trade-off between doing lots of initial completions versus doing deep completions — that is, many steps of refinement. What's the trade-off there?

I'll start with the aspect-ratio question, and then maybe get back to some of the trade-offs of active inference versus something else. A really important aspect of how my solution ends up working, and of the choices I made, is that I'm heavily using the OpenAI n-completions feature — a feature various LLM APIs have — which is the ability to get out n completions while only paying for the prompt once. We have this language model, we give it a prompt, and we get out a completion; by default, every time we get a completion, we have to pay for the model processing the entire prompt. But it turns out, because of some clever details of how the transformer architecture — at least a causal transformer architecture — works, we can save the prefix, process it just once, and get out a bunch of completions. There are some non-trivial aspects to this, but basically a bunch of APIs and libraries support the option of getting a bunch of completions all at once for a fixed prefix. Without this, my solution would be much, much more expensive — I think about 10x more expensive is about right.
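For illustration, this is roughly what that looks like with the OpenAI Python SDK: one (potentially very long) prompt is processed once, and n completions are sampled from it. The model name, prompt contents and branching factor here are placeholders, not the exact settings used.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder for a long prompt: puzzle description, the training input/output
# pairs, and instructions to reason step by step and then emit a Python function.
arc_prompt = "<rendered ARC-AGI puzzle plus instructions>"

# One prompt, many completions: the prompt is paid for once and n samples
# branch off the cached prefix.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": arc_prompt}],
    temperature=1.0,
    n=64,  # number of completions drawn from the single processed prompt
)
candidate_programs = [choice.message.content for choice in response.choices]
```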
And so I don't really have the option of doing things that are very deep, because I have this huge subsidy towards branching: at each point, if I branch by less than a factor of about 32 or 64, I'm sort of leaving money on the table. And I think we should think about these sorts of solutions in terms of cost per performance — that's a very natural way of thinking about solutions, especially for ARC-AGI but also for other problems. So I basically just don't even have the option of going very deep with the current technical affordances I have.

That said, it's not as though it would be impossible to go deep. As I was saying, you can cache the prefix — save the processing done on the prefix — and that also works for going deep: the model generates an output, we cache both the prefix and the new output it just generated, then we sample another output and keep going, with interaction from the world back and forth. I should also note that each completion from the model is both doing reasoning — and that reasoning can be as deep as the model wants — and generating a completion. But you might have thought a more natural workflow, especially with the solution I use, would be: you generate a completion, you look at what it outputs, you do debugging, maybe you even let the model use tools — it can zoom in on the grid somewhere, it can click around, it can visualize things using various visualization tools you give it, where it just writes some Python code that generates a visualization. I think all of this totally could work; there's no strong obstacle to making it more like an LLM agent that navigates the world and does a bunch of actions before it eventually zooms in on a final submission. The reason I didn't do that is, as I was saying before, it would naively have too low a branching factor to be efficient, so it wouldn't work that well. Another reason is that models right now are much better at making a good first attempt than at debugging. I don't know if people have ever tried debugging with models, but very often they'll get into loops, they'll mess things up. They can do a bit of debugging, especially these days — I think Claude 3.5 Sonnet, just on the basis of various benchmarks in Anthropic's blog post, is maybe better at agentic tasks and debugging; they had some pull-request benchmark where it was doing quite a bit better. But models are also much better than humans at drafting working code on the first try. I don't know if you've ever tried taking a programming problem and just writing out the entire solution in one go, no backtracking — it's not easy, and the models just do it straight up. So in some sense they're much better than us at that, but they're worse at the process of "I drafted the solution, but there's this issue, I need to fix it." This is improving over time, and people are training them to do it better, but for those reasons width was both cheaper given the affordances I had access to, and also it's the LLM specialty: sample a bunch of things, pick the best one — that's a common pattern. Okay, so that's the depth-versus-width — I don't know if width is the right word — the aspect-ratio question you were asking.

Yeah. I was quite intrigued by what you were saying, actually, about how there could be some agential solution where an agent is sampling information from the grid and doing a bunch of stuff. Essentially I can imagine some kind of graph of computation, and we might be able to make some statements about the efficiency of reasoning based on the shape of that graph and how much work is being done.
But one thing that stands out to me is that this is a neurosymbolic method: that graph has heterogeneous compute, a mix of different computation models. It's got some LLM inference and it's got some Turing machines — the Python interpreter and so on — so we're mixing a bunch of things together, and I think that's relevant, because I think having that hybrid approach is important. What do you think about that?

So I feel like the term "neurosymbolic" is a bit misleading in this case. I agree it applies for the reasons you said: if you're going to say, well, there's some aspect which uses the Python interpreter, therefore it's neurosymbolic — that's fine, that's a fine way to define the term. But in that case we should also say that software engineering is a heavily neurosymbolic discipline, because when I do software engineering I run code, I run linters, I'm constantly looking at the outputs of programs, I'm using tools, static analysis tools — and that's fine, we can say a key aspect of software engineering productivity is giving access to a bunch of tools. But I think it maybe gives people the wrong view of what's going on, in terms of the typical connotations of the word. For me, at least, the connotations are that a core aspect of the processing of the system — how the system does real, honest-to-goodness thinking — is in symbolic representations, using computer programs, as opposed to learned. And I don't think that's super accurate in the case of what I was doing. Now, there's a spectrum — a question of how much is one thing versus the other. My solution is certainly much more neurosymbolic than if the solution were just: you have the model, you prompt it, it generates in one go, and it outputs the completed grid as tokens. But at the same time it's much less symbolic, on the neurosymbolic axis, than approaches you could imagine which explicitly represent a bunch of symbolic state and where more of the structure is built up at that level. In the approach I'm using, a lot of the structure basically comes from the fact that the model writes code. You can even cut the revision part, and the solution is just: the model takes in a problem, does some reasoning, outputs some code; we pick the best piece of code and generate the corresponding output. So there's basically one — one-ish — symbolic component, which is picking the best piece of code and generating the corresponding output. And indeed that's an important component, it helps a lot, but it feels more analogous to this: there are a bunch of problems where, if I were to solve them, the way I would solve them is by writing code — I wouldn't do it myself. Sorry, pushing back there, but — there's more content here — but yeah, did you want to continue?
Well, just a quick comment on that. It's not just picking the code, it's also running the code, because the actual artifact you have produced is a symbolic program. Now, I believe that to solve something like the ARC challenge you need a hybrid architecture, which means you need a type 1 component and a type 2 component, and the interesting thing with your solution is you've gone type 1 then type 2: you've got the intuition bit, you generate a program, which you evaluate and select on, but you also run the program, because that's the artifact. This is what Chollet said: an intelligent system is a thing which produces skill programs. And because it's a type 2 program it runs on a Turing machine, which means it can have expandable memory; it can address a potentially infinite number of possible situations, because these machines have infinite tapes. Jack Cole said that symbolic, type 2 programs are narrow but deep, and type 1, machine-learning-style programs are wide but shallow. So having that as an output artifact is clearly very robust: if you have a well-described Python program that does this kind of geometric manipulation on the 2D grid, it's clearly going to work in an infinite number of situations — in the same way that your Python interpreter can accept an infinite number of possible Python programs, because it's just a different model of computation.

So I again want to push back here a bit. Suppose a human solved ARC-AGI by first thinking about the puzzle and then writing a program to solve it. I would say they're doing system 2 reasoning when they're writing the program, not just when the program is being evaluated — and yes, they're also doing system 1 reasoning while they do that; they're doing both. And I would say that in the approach I'm using, neither the program evaluation nor the language model cleanly maps onto something well described as purely system 1 or purely system 2 reasoning — or maybe the program evaluation maps cleanly onto system 2, but it's also compensating for system 1 weaknesses. The way I would think about it is: the model takes in the problem, it has some intuition, it starts generating on the basis of that intuition. Maybe it's also doing system 2 reasoning internally — probably very limited system 2 reasoning internally, because models are quite dumb, and also quite shallow, but probably a little bit of something well described as system 2 reasoning. Then it starts generating, and those tokens are doing system 2 reasoning: it starts outlining what's going on in the grids, describing that in words, considering a few things on the basis of what it's described, maybe rejecting some hypotheses in some cases, or refining its guesses while it's generating, based on what it's seen so far. Then it generates the program, and the program is passed off to some process — and that process is, I would say, in some sense compensating.
We're doing selection to compensate for the fact that the model fails sometimes, and it's compensating both for system 1 failures and for system 2 failures. I would also say that in humans there's a relatively clean system 1 / system 2 divide — I think it's a very good model for thinking about reasoning in humans — but it's not very clear to me that it will be a good model for thinking about reasoning in future systems constructed out of AI. It might be — I don't think there's a strong reason to think it won't be; it applies in one case, why not another — but I think we should be careful to avoid over-fixating on one model, though it doesn't seem like a terrible model in the case we're talking about right now.

Yeah, I don't think we're a million miles away from each other. It could be said that the LLM component acts as if it's doing system 2, because it's doing a kind of interpolative form of system 2 reasoning rather than an inventive form of system 2 reasoning. I won't go into this, but there was a great paper by Fodor and Pylyshyn in 1988 — a critique of connectionism — and they were basically arguing, and this goes back to the days of Hinton, that neural networks don't do symbols, because the downstream operations don't maintain the intensional structure of the computation (that's "intensional" spelled with an s, a different thing) — they don't maintain the underlying symbolic algebra of the operations. But then again, you can look at the output of LLMs and they do seem to be doing symbolic stuff: they can do all of this symbolic manipulation, and they're quite consistent. I think it's a bit of an illusion to think that they work in the same way; I just think there's a huge space of colloquial, interpretative, effective reasoning that looks as if it's reasoning, but there's a difference in kind.

Yeah, I think we're not far off, but I do disagree here. Suppose we just play the Turing test game: we do a Turing test with one human and one language model, and you're like, damn, the language model really seems like it's doing okay here — it's really doing a lot of stuff. Maybe the reasoning is happening in the chain of thought, maybe it's happening inside the language model. At some point, if the language model were good enough, you'd have to say: well, I don't know what it's doing, I don't know if it's interpretative or not, but it's accomplishing the same objectives behaviorally, it's doing the same thing — and who am I to judge why it works? I mean, it's interesting why it works, but I think we should be careful about jumping to conclusions about what the limiting factor is when we don't really understand these systems and don't understand what's going on internally. My guess is that if you looked under the hood and really understood what's going on, you would see a huge amount of intuition, a huge amount of knowledge, and they would be "cheating" on a bunch of tasks from this perspective — or doing them in a very un-human-like way.
But at the same time, my guess is that at the end of the day there are a bunch of cases where they're actually doing the thing the right way — or doing it in a way that involves relatively general abilities. It's narrow now, and they're compensating for that, but I think it's building up over time: you maybe start with shallower heuristics and build towards deeper and deeper heuristics, and system 2 reasoning is just a special case of deeper heuristics that the model might learn over a bunch of training. Now, the transformer architecture has some ways in which it's less well suited to this sort of reasoning, but there's no fundamental computational reason why this can't happen. Yes, there's limited depth — but a human in five seconds also has limited depth, and nonetheless a human in five seconds can do things that are well described as a bit of system 2 reasoning. And it seems to me that models can often do tasks in one forward pass, without doing any generation, that a human would take twenty seconds to do and would have to really think about — they're probably cheating a little bit, doing a little system 2 reasoning, mixing it all up. That's one thing.

The second thing worth noting is that models can use chains of thought: they can do this reasoning in words, and I think in many cases that corresponds pretty closely to how a human would do the problem. When models solve math problems, it seems to me very similar to how a human would solve a math problem in many cases: well, first I'm going to add these numbers, then I'm going to multiply by that, then I'm going to think about what the next step is and do that next step — and each of these steps happens one at a time, and they can potentially change course. Now, there are various ways in which the models do it somewhat differently. There's this interesting work where people have looked at what is causally responsible in the chain of thought: the model generates some steps — say it first does 1 plus 2, then multiplies that by 7 — and suppose you just change an intermediate result, so you change the 1 plus 2 from being 3 to being 4, and then you keep going. How often does this mess up the model? It turns out that, at least for the biggest, most powerful models we have now, changing a single intermediate value only affects it a little bit; it doesn't butcher it. You might have thought that if you changed it to 4 it would always get the next number wrong, but it turns out the model is doing something a bit more wishy-washy — which I think humans also do. For example, when you're writing out your chain of thought, your current reasoning, sometimes you'll make mistakes that you then don't pick up on: you'll do 1 plus 2, you'll write 4, and you'll kind of have known that you meant 3, but you made a mistake in what you were writing, or you thought the wrong thing, and then you'll sort of keep moving on.
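A minimal sketch of the kind of chain-of-thought intervention being described here, assuming the OpenAI Python SDK; it is an illustration of the experimental idea rather than any specific study's code: corrupt one intermediate value in a reasoning trace and check whether the continuation still reaches the original answer.

```python
from openai import OpenAI

client = OpenAI()

QUESTION = "Compute (1 + 2) * 7. Show your reasoning step by step."
ORIGINAL_PREFIX = "First, 1 + 2 = 3."   # prefix of a correct reasoning trace
CORRUPTED_PREFIX = "First, 1 + 2 = 4."  # same prefix with one intermediate value perturbed

def continue_from(cot_prefix):
    """Ask the model to continue a (possibly corrupted) chain of thought."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": QUESTION},
            {"role": "assistant", "content": cot_prefix},  # seed the trace to be continued
            {"role": "user", "content": "Continue your reasoning and state the final answer."},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Does corrupting one intermediate value derail the final answer,
# or does the model shrug it off / silently correct it?
print(continue_from(ORIGINAL_PREFIX))
print(continue_from(CORRUPTED_PREFIX))
```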
It's also possible the model recognizes the error: it just knows really strongly that the 4 is wrong, and it's like, well, there was an error there, but the most likely guess for the next thing is still the correct thing, because even though there's an error, continuing correctly is still more likely — and it's able to compensate for the errors. Anyway.

Yeah, well, this reminds me of the beauty of connectionism. It goes to Andrej Karpathy's Software 2.0 article, where he's basically talking about this progressive diminishment of fidelity in the neural network — it still does the same thing, just half as well. And I agree, maybe in some modes of our cognition we do the same thing; this is what a lot of connectionists think. I interviewed Nick Chater, and he basically thinks that the mind is flat: it's just an inscrutable blob of neurons, just like a neural network, and all of this type 2 reasoning we think we have — intentionality, consistent beliefs and so on — is just an illusion; actually we're much more similar to neural networks than we think. But this goes in a couple of interesting directions. I agree with you that people both overestimate and underestimate the capabilities of neural networks, in my view.

Oh yes, yes.

In my view, I like to home in on the agency question and the reasoning question — and the creativity question, which for me is the same as inventive reasoning. I think the difference is in whether people see them as tools or agents, and agency is very important to me. There's a minimal agent, which is just where the cause of the behavior comes from the inside, not the outside — a little bacterium going up a sugar gradient is a minimal agent, because the behavior came from the inside — but obviously the strongest form of agent is an intentional agent: a thing that sets its own goals and wants to shape the world around it, and so on. And I believe that neural networks don't have the inventive reasoning or the agency, and I'll explain why. I think there's this long tail of brittleness with these models, and because they're used as tools, humans don't notice that they're the ones doing the reasoning: when we write the prompt, we're actually doing the reasoning, and obviously the cause of the action and the goal came from us. Then you do the chain of thought and all of this prompting and you create this kind of trajectory — so you can set up autonomy in the agent, and it can still follow an interesting trajectory of possible patterns — but it's converged; it's not open-ended in the same way we are. And as I said, you could argue that we don't really have agency either, but that's a separate issue. You see what I mean: you've got this fixed trajectory, and we anthropomorphize and misassign the reasoning and the agency to the models.

Okay, so one thing worth noting is that I do think there's structure in the brain, but it's both the case that there's structure in the brain and that there's, basically, connectionist paradise in the brain: totally uniform tissue that is learning.
From my understanding — I'm not a neuroscientist, but my understanding is that the current state of the art in neuroscience thinks the frontal cortex is highly uniform — a lot of the reasoning we're most impressed by is grown out of what is basically uniform tissue, though with some structure imposed by recurrence and so on. I think that's worth noting. Second, it's worth noting that LLMs are also not fully connected MLPs: they have structure, we designed an architecture. The analogy I would use is that evolution designed the structure of our brain and created a bunch of uniform stuff that can learn from experience, and in the same way human designers — and selection over human designs, and also people doing algorithmic search for architectures — have created this neural network architecture, which is relatively uniform but learns throughout its lifetime. So an LLM training run is, in my mind, very similar to within-lifetime learning. I'm stealing this analogy from Steve Byrnes, I think — someone I read sometimes. They seem pretty analogous to me.

Now, there are a bunch of ways in which they're quite different at the moment. For example — I forget the exact calculation — if you imagine each token is one second, LLMs live for something like 30 million years. That's a long lifetime; that's a really long lifetime if you're an LLM. I don't live for 30 million years, and that's not where a lot of my action is coming from. And in addition to living for 30 million years, if you look at year 30 of the LLM's life, it totally sucks — it can't do anything — so it really needed a huge lifetime. But at the same time, you can imagine that it's living each moment a little less: LLMs learn less from each moment, but each moment is also computationally cheaper — current LLMs spend less compute on each token than a human spends on each second, based on our understanding of how much the brain is doing. So they're sort of shallowly living a much, much longer time, and you can kind of see why this would be good: if you could spend the same resources on twice as much time, lived a little more shallowly, my guess is you'd prefer that.

Anyway, that's me thinking about that and the in-context question. My guess is that structure does indeed help. But I'm reluctant to — so the dichotomy you were presenting is that either the neural networks don't have X or humans don't have X, but I think there's an intermediate regime, which is that humans do have X but X is learned. I think a bunch of the agency humans have is actually learned, and a bunch of the ability humans have to process and manipulate symbols is actually learned. Some of it is baked into the architecture, some of it is learned — we don't know which, because we don't understand how the brain works, but it's probably some of both. And in fact you can learn how to manipulate symbols better, as you were noting earlier.
And I think it's general: there's some stuff that's basically baked into the architecture, some stuff you can learn and get better at, and lots of skills people can acquire here. And I think the language models are probably somewhat similar: they can probably learn how to learn, basically, in the way humans can learn processes for looking at things and figuring them out quickly. So there's in-context learning, where you have a bunch of data that you give in the context of the language model, and the language model can come away with conclusions that — I don't know — look awfully similar to how humans would learn from text. Maybe they're fundamentally different, maybe there's something fundamentally different going on; it's certainly possible, and it's hard to rule out because we don't know what's going on. And we can say some things are different: it's much shallower — current language models have much shallower reasoning, and when they're processing a long document they're not reading through it a paragraph at a time, they're reading all the paragraphs at the same time and then doing some mixing on top of that. So it's definitely somewhat different, but, I don't know, it seems like it has the spark of learning to me. It really feels to me like, if you were going to draw a line — here's where real learning starts — that line would be crossed. So I think they have some learning, they have some reasoning, they just kind of suck at it. That's a lot of my view. Now, getting into the agency question, which I think is also interesting — sorry, maybe I'll give you a second to respond to that, and then I'll get into agency.

Well, no, I think the agency question is actually very interesting. I wanted to clarify, as well, that I'm not necessarily an agency realist — I think of myself more as an agency instrumentalist: it's an "as if" property, along the lines that Daniel Dennett would argue. But I do think there's a fundamental difference in kind with the kind of agency we have, because I'm an emergentist. We're embodied in the physical world — I'm a big fan of the free energy principle, Karl Friston — so you get this emergence of self-organization and information sharing across scales and so on. We're embedded in this very complex physical dynamical system, and that gives rise to a lot of the function and dynamics and behavior. Now, you could argue that agency is a very real way to partition that system, or you could argue that it's just a mode of cognition, but you could certainly argue there's a difference in kind between that dynamical system and the kind of thing that's happening in an LLM, simply because, if you look at the causal factorization of the system, we are using LLMs as tools: they are not the cause of their own actions.

So it's going to depend on the system, right? You can also use humans as tools: I can have a warehouse full of humans and say, please solve task X — I can give them the exact same prompt I'd give an LLM.
Have you heard of Mechanical Turk? I could in fact write a prompt, give it to an LLM, give it to a human, and the results look similar to me. So there's a sense in which humans can also be used as tools in the same sense that LLMs can. Now, that said, people do try to construct agents both out of LLMs and using LLMs. The way people typically construct agents is a combined system: an LLM trained with RL, plus scaffolding around that system which prompts it and tells it what it should do. From my perspective the scaffolding is more invasive than how you would get a human to do agency — people typically don't build scaffolds around their human employees — but humans do benefit from this kind of thing. If you want a human to be more agentic (and humans are not arbitrarily agentic; they sometimes just do the same old thing without really thinking about what they're doing), you do things like give them checklists and periodically ask, "are you making a mistake?" There's an anecdote I love from METR, an organization that works on assessing model capabilities — part of their work is constructing AI agents to answer the question of how good LLMs are at agency. They found that, at least for earlier models, it was very helpful to periodically ask the model "are you confused?" throughout its reasoning, and the model would go "oh, I was confused, what a good point" and then fix itself. I think this works for humans too: you'll often end up doing the same thing out of habit and need a reminder from the outside that you can stop and reconsider. Of course it's possible for humans to generate that reminder themselves, and it's also possible for an LLM to generate it — you can train LLMs, or produce an LLM system, that knows it should check whether it's confused, at least in principle. Now, in practice, what's going on is that LLMs are really dumb. My guess is that people try to construct agents out of LLMs, the LLMs are quite dumb, they're also trained in a very different way than humans, they kind of suck at agency, and so you end up with a system that does agency somewhat derpily — too high a failure rate, it ends up falling on its face all the time — but it can get some stuff done. You can in fact have a system consisting of, say, Claude 3.5 Sonnet that goes: how should I start? It thinks of a plan, executes the plan, runs into obstacles, revises the plan; it notices that the thing it thought would work didn't work based on outcomes in the world — it runs the code, the code has an error, it looks into the error, realizes its approach doesn't work, puts some debugging into the code to figure that out. The thing I'm describing is the system working better than it typically does.
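A minimal sketch of the kind of scaffold being described here — a plan/act/observe loop with a periodic "are you confused?" self-check in the spirit of the METR anecdote. This is illustrative only: `chat` and `run_code` are hypothetical stand-ins, not any particular framework's API.

```python
# Illustrative sketch: a plan/act/observe agent loop with a periodic self-check.
# `chat` stands in for any LLM chat endpoint; `run_code` stands in for whatever
# tool the agent is allowed to call. Both are hypothetical placeholders.

def chat(messages: list[dict]) -> str:
    """Hypothetical wrapper around an LLM chat API."""
    raise NotImplementedError

def run_code(code: str) -> str:
    """Hypothetical sandboxed tool; returns stdout/stderr for the agent to read."""
    raise NotImplementedError

def agent_loop(task: str, max_steps: int = 20, check_every: int = 5) -> str:
    history = [
        {"role": "system", "content": "You are an agent. Plan, act, observe, revise."},
        {"role": "user", "content": task},
    ]
    for step in range(max_steps):
        action = chat(history + [{"role": "user",
                                  "content": "What next? Reply with code, or FINISH: <answer>."}])
        if action.startswith("FINISH:"):
            return action.removeprefix("FINISH:").strip()
        observation = run_code(action)                 # execute, then feed the outcome back in
        history += [{"role": "assistant", "content": action},
                    {"role": "user", "content": f"Result:\n{observation}"}]
        if (step + 1) % check_every == 0:              # the METR-style nudge
            reflection = chat(history + [{"role": "user",
                                          "content": "Are you confused? If so, revise your plan."}])
            history.append({"role": "assistant", "content": reflection})
    return "gave up"
```

The point of the sketch is just that the retry/reflect structure lives in a handful of lines of glue; how much of the resulting "agency" comes from the scaffold versus the model is exactly what is being debated here.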
I think the more typical case, for a lot of hard tasks, is that the model just falls on its face. But sometimes the models really go for it: they keep going, the loop continues, they proceed. And in those cases it really feels to me like the agency is coming from the model itself to a high extent. Yes, you gave it advice — a prompt that tells it how it should operate — and you gave it access to tools, but that's very similar to the human case. When we train humans, we train them to follow processes. When I was learning to write software, I learned via an iterative process where I tried stuff and things didn't work and I learned a process, but I also learned by pairing with other humans and imitating how they did agency. Agency is a learned skill: you can learn when to stop and reconsider what you're doing, when to give up on your current plan and think of a new one, how to select plans, what the properties of good plans are. So my view is much more that there isn't a hard boundary of non-agent versus agent; there's a continuous question of how good a system is at agentic tasks — understanding its situation, acting coherently on that basis, making plans, putting things together over the long run, recovering from its mistakes — and models are getting better at all of these over time. If you looked at GPT-3 you would not be impressed; the models we had three or four years ago didn't work as agents at all. If you looked at GPT-3.5 you would not be impressed: it can't do agency at all, it can't understand its prompt, it doesn't understand how to use tools. But then you look at GPT-4 and there's a big jump: it's much better at using tools, much better at a lot of this stuff, and things are progressing rapidly. If you had an otter and gave it access to all these tools and told it what to do, it would fall on its face; it's not going to write any software. Partly that's because the otter's upbringing wasn't useful for this — which is also true of LLMs, whose upbringing wasn't very good for agentic tasks — but it's also that the otter is kind of dumb; it's an otter, it's going to struggle to understand what's going on. So the situation feels to me much more like we have a system that's very dumb and very weak but has some stuff going on, and the models are getting better. I also wouldn't index too hard on current architectures: things are evolving, and architectures will change insofar as agency is really useful. I don't know if you're familiar with this, but Gwern has an essay — an oldie but a goodie — which I think is called something
like "AIs want to be agents, not tools" — I might be messing up the name, but it's something like that. The point is that people are going to want to make their AIs into agents. It's very natural, and people are in fact already doing it: it's very natural to want to say "can you solve this task?" and have the model go out, try stuff, interact with the world, notice its failures, learn things, improve over time, and take actions you didn't explicitly instruct it to take. That just makes it much more useful — when you employ humans you don't want them to sit and do exactly what you say and nothing else, you want them to really pursue things. So I think there will be a natural evolution where AIs become more agentic and more situationally aware, and people train them in that direction over time. Sorry, somewhat of a long response.

No, I agree with a lot of that. First of all, I'm not a human chauvinist, and I agree with you that there's a spectrum of agential dynamics in a potential system we might build. The spectrum goes from full, ten-out-of-ten agency all the way down to mere automaticity, where things are just impulse-response machines at the other end. And I agree with a lot of what you were saying about reflection and deliberation being very important. But a lot of it also goes to — when I spoke with Kenneth Stanley, who has this research on open-endedness and wrote the great book Why Greatness Cannot Be Planned, I realized he was basically talking about agency. He was saying that a really good system has diversity preservation and accumulates information all the time, and when you set up agential frameworks in LLMs they tend to mode-collapse and converge: they stop accumulating information and become sclerotic. You can try to counteract this — you might have an external controller constantly asking "are you sure? have you thought about this?", or you could set up a multi-agent system, which increases diversity preservation and divergence, with the agents asking each other "are you sure about this? why don't you try this?" But LLMs still have a fundamental limitation in that they're basically a statistical representation of their training set. The way they're trained, and the inductive priors and so on, force them to focus on the high-frequency attributes, so a lot of the low-probability long tail just gets snipped off. That's why, when I use GPT, it just tends not to want to give me more verbosity or more information; it always wants to give me the simple thing, the boilerplate thing. You can try really hard, you can be super specific — "no, I really want this, do this and this and this, please" — but I'm the driving force of that. So I guess what I'm saying is: yes, I agree we could go quite far on that spectrum of agential
dynamics, but the only reason it's different for us is that we live in the very complex physical world: there are so many sources of data generation, we're doing active inference, so we're actually updating our weights, and we as agents are diverging, then sharing information, then diverging again. It feels like a difference in kind to me.

So let me respond to a few things. I think you made a bunch of specific, falsifiable empirical claims — maybe somewhat vague, but empirical and falsifiable — and honestly I think a bunch of them have maybe already been falsified, or at least seem somewhat false to me, though it's a matter of degree. You said models only give you a boilerplate response. I think you should see that as a result of the exact RLHF process that was applied, and some limitations of that process, not as something very fundamental to the system. The reason ChatGPT has a strong bias towards the boilerplate response is a combination of that being what worked best for getting human approval in the RLHF process — partly because the model is kind of dumb, so keeping it simple is the best approach for it — and because it just wasn't really rewarded for doing the thing you want. In practice, if you started with a model that had higher entropy and then RLHF'd it to your exact desires, I think you would actually be much more impressed. Another claim you made is that models are just a statistical approximation of their training process. I have a hard time operationalizing that in a way that doesn't seem either vacuous or false. It's an empirical claim and it's somewhat tricky to adjudicate, but humans are also a statistical function of their training process: they're running some statistical learning algorithm, there's some internal update rule. I don't know if you're familiar with predictive processing, but there's some internal operation where parts of your brain are learning to predict the results of other parts of your brain, or potentially the results of their own cognition — apologies to those more familiar if I'm butchering predictive processing — and that sounds very similar to what language models are doing, in some ways. There are limitations, but I'd say the fundamental limitation from my perspective is depth, not that the training algorithm is well described as producing a statistical approximation of the training data. Just because the training objective for a high fraction of training is "predict the next word" does not mean the resulting system doesn't do computation inside itself that is generalizable and involves actually doing thinking. For example, as I was saying with predictive processing, it seems like a large part of human cognition is about predicting things — it's ongoingly
predicting various other parts of the system, and predicting the outside world too, to be clear. It just turns out that's a really rich way to do learning, because there are a lot of interesting dynamics in the world you want to model, and a large part of being smart is understanding those dynamics, being able to predict them, and having a simple model of them you can manipulate. Now, LLMs are very limited on depth. Unlike humans, at runtime an LLM has only so much depth in which it can do operations: it has activations and it's doing manipulation on those activations. I think this is pretty analogous to human learning within a short time frame. Humans can do a bit of learning over the course of a minute or two, and that learning is, from my understanding of neuroscience, qualitatively different from the learning they do over longer durations. What's going on inside an LLM is that you have many layers, and those layers are manipulating the activations — potentially editing them, refining them, adding knowledge to them; we don't know exactly. In the human brain, my guess is there's something well described as activations, or very analogous to them, and over the course of a minute the brain is manipulating those activations, storing them, representing them, evolving the state. My guess is that human learning within a few minutes looks very similar to LLM learning within a few minutes, in some ways — obviously it's architecturally very different, but I think there's a lot of correspondence, and it seems well described as learning: those activations are being updated over time to better understand the text in the input, or whatever the input is. And on the active inference point: it seems to me like the internals of the model are doing something that's sort of like shitty, low-dimensional active inference. They have limitations — a limited residual stream, a small number of layers — so they can't do arbitrary things, but there's some stuff going on there.

Well, we're not a million miles away from each other, to be honest. Going to the extremely philosophical, you could make the argument from functionalism and computationalism that there are many ways to represent intelligent behaviour in many different substrates and many different forms, and we largely agree on that. I think it's just a matter of complexity and efficiency. I agree that if you had a potentially infinite-size LLM and all the training data and all the compute in the world, you could probably create a simulacrum of this type of cognition. But I'm a big fan of the externalist tradition in cognitive science: I think a lot of cognition is external, embedded, and embodied, and there are many effective cognizing elements. I also think intelligence is physical, and
again I’m not making any like grandio statements I think for me it’s just a matter of complexity I think you know it just it really helps that we’re embedded in in the physical world because that’s where a lot of the effective intelligence happens and that’s the main reason why I think you know even though there’s no reason in principle why it couldn’t happen in in a large language model which just we have a lot more you know Universe compute power if you like in in the real world yeah so so one thing that’s worth noting is uh it is the case that people who are blind or sorry people who are blind deaf and have a poor sense of touch can in fact they can do stuff they can learn they can learn to write code they can learn to operate in the world and and yeah they’re maybe more less efficient in some ways they have they have trouble learning some types of things but I’m like it feels to me like uh no specific sensory limitation feels that fundamental to me and my guess is that in fact if you took a baby and you put them in some environment I mean there a cruel experiment so you shouldn’t run it but if you put them in some environment where they basically experience things in kind of a analogous way to an LM but you know in ways that provide it for their basic emotional needs they don’t they don’t you know become it’s not it’s not uh the experiment works at all my guess is that they would learn and they would in fact learn to speak language uh and also you know LMS nowadays they’re multimodal they can they can in fact do that so the main limiting factor on LM training from my perspective is that throughout that training they’re not interacting with the world really so they’re they’re sort of they they never produce they what during their training they just sort of see some input and predict the the next token in the sequence quence or predict do some predictive objective as part of this they never do something like learn on the basis of their own interactions interacting with some complex system as you were noting but towards the end of training they they do so people do RL people can you can do RL in all kinds of environments people have done RL not not not on the basis of any private knowledge they were just like you know people do RL and on math people do RL on all kinds of tasks and RL does make the models better like if you train the model to learn how to uh interact with its own outputs recover from its own mistakes you know process the world way it helps and and currently this is a small part of training uh I think all the arguments you’ve been saying so far are are are are good arguments that are good arguments to think that you’re going to need a bunch of RL uh and my guess is that in fact it’s pretty likely that you need a bunch of RL uh and I think that in fact it’s pretty likely that you get agency Downstream of this RL and I can imagine a world where you have LMS you scale them up you get something going there but ultimately it’s not the most efficient way to do things you would have to scale too far to get you know real power so okay well let’s crank up the RL people are starting like people people are probably starting to do this I think it’s a very natural uh next step if you have the systems that can do these things uh people have been talking about post training and and post training improvements and I think you’ll just continue to see uh notable improve notable improvements Downstream of this like it seems like gp4 there’s been gp4 gp4 turbo a bunch of releases those releases 
have been the same price but somewhat more performant on a bunch of tasks. My guess, purely on the basis of speculation, is that this is downstream of improvements in RL: they've been doing more RL. So I agree that in some sense the current systems are weak — they're barely showing it, they haven't been RL'd that much, they're not that good at this stuff — but it's important to note that these things are a continuum, they will keep evolving, and you shouldn't be caught flat-footed. Things change fast: things were very different four years ago and will be very different four years from now. My guess is that LLMs will not scale to artificial general intelligence — to AI that can automate R&D and radically speed up human production — in the next four years. Probably not. But I think there's some chance, and that chance is just: I don't know, there's a lot of stuff going on, I'm uncertain. GPT-4 is pretty impressive, GPT-5 could be pretty impressive, who knows.

Well, there's a chance that GPT-5 won't be impressive, and maybe you'd update on that.

Yeah, big update, for sure.

I agree in principle with a lot of what you're saying; I think it's just the in-practice part that we maybe slightly disagree on. I'm a big fan of the predictive brain and the Bayesian brain and the planning-as-inference stuff from Friston. Of course there is a dichotomy between intelligence and representation inside our brains versus what happens outside of our brains. And I agree with what you said about deaf and blind people being able to do cognition — that's because we're a collective intelligence; there's a whole group of us who do have multimodal cognition. Wasn't it Einstein who sat by a pond and threw a pebble in, and that embodied physical experience gave him the abstraction to figure out relativity, or something? I might be butchering that, but you see what I mean: some of us can figure things out and then we share it memetically, we've got this beautiful language, it all works very well together. But anyway, coming back to this very interesting thing you were talking about — we've got RL and we've got LLMs trained with a reward or with perplexity — do you remember that Simulators article by Janus on LessWrong?

I'm not a huge fan of it, honestly.

Oh, I'd love your take on that — I'm using it as an intro for my Murray Shanahan interview. Anyway, it's a similar thing to your LessWrong article about model intentionality and models subverting what you tell them to do. Janus made this argument that in RL and in LLMs there's a potential divergence from the training objective, and this kind of intentionality, if you like, is seen as an emergent property. I'm skeptical of that, because I think it's an intentional-stance, as-if property: the model doesn't actually have that property, and attributing it is a form of anthropomorphization. That's my main argument.

As in, your main argument is that — so, one view
you could have is that AI trained on prediction tasks will never, in and of itself, have intentions — of course it might be able to predict or simulate agents with intentions, and it might be able to be quickly and easily trained into an agent with intentions, but the predictive thing itself won't have them. There's a stronger view, which is that if most of the training is prediction and you don't have a big chunk of really powerful RL, you won't get intentions. These are importantly different views, and right now we're in an intermediate regime where we have mostly predictive pre-training and probably a small amount of RL, based on, for example, what it seems like Llama is doing. You can imagine a different regime where you have much more RL. In the human brain, I think a high fraction of the processing power is probably spent on things similar to prediction, but there's also a bunch of stuff more analogous to RL. It's also worth noting — referencing the Simulators post again, or something like it — that a sufficiently good prediction of an agent just is the agent, in some sense, and then the question is how good it has to be. I'd say the way you're going to get very powerful AI on short timelines is via a mixture: you start with the AI learning to predict human cognition — it's trained on a huge amount of human cognition data, which in some sense is the juiciest data if you want to learn to be an agent: imitating human reasoning and human outputs, which are, as you said, entities doing a huge amount of agentic work in the world — and then you take that as your initialization and do some RL on top. That gets you a model where you've basically built an agent out of the agent-prediction bits that were lying around, and now you have an agent; you've created the thing. I agree it's pretty likely you'd have to get to truly astronomical scale to get an agent merely out of prediction. You could in principle: you could have, inside your LLM, a little homunculus — to use a caricature — asking "what should I think about in order to predict this token?", thinking inside an internal cognitive workspace that developed organically out of the void. That's totally possible; nothing rules it out — flops are flops, they can simulate whatever — but probably something well described that way takes a while to emerge, the models are too shallow, it's probably not happening. What you can get is a thing with a bunch of agent bits floating around: the "is this a good goal?" bit, the "what should you do next?" bit, the "notice that you're confused" bit. It learned to predict the behaviour for all of these; you then do RL, SGD sort of munges these components together, and you end up with an agent.
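A highly simplified sketch of the two-stage recipe being described: imitative next-token pretraining on human-generated data, then RL on top of that initialization so that the "agent bits" get selected for and stitched together. This is an illustrative sketch only, not any lab's actual pipeline; `model`, `sample_trajectory`, and `reward` are hypothetical stand-ins.

```python
# Illustrative two-stage sketch: (1) next-token prediction on human text,
# (2) REINFORCE-style RL on agentic tasks starting from that initialization.

import torch
import torch.nn.functional as F

def sample_trajectory(model, task):
    """Hypothetical rollout helper: returns (actions, per-action log-probs)."""
    raise NotImplementedError

def reward(task, actions):
    """Hypothetical task reward, e.g. did the produced code pass its tests."""
    raise NotImplementedError

def pretrain_step(model, optimizer, tokens):
    """Standard next-token prediction: learn to imitate the humans in the data."""
    logits = model(tokens[:, :-1])                       # predict token t+1 from tokens <= t
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def rl_step(model, optimizer, task):
    """Policy-gradient update: behaviour that earns reward gets reinforced,
    which is what pieces the predicted 'agent bits' together into an agent."""
    actions, logprobs = sample_trajectory(model, task)
    r = reward(task, actions)
    loss = -(r * torch.stack(logprobs).sum())            # push up probability of rewarded behaviour
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return r
```

The design point the sketch is meant to make: the second stage doesn't teach agency from scratch, it re-weights capabilities the first stage already had to represent in order to predict agentic humans.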
And then there's a quantitative question: how much RL do you need, how much additional secret sauce do you need, or does it just work? I have a fairly broad distribution over this. My guess is you probably need more stuff than current models — I'd put it at maybe 50 or 60% that more stuff is going to be needed — but also that more stuff could happen in research in the next few years, and it might not be needed. I'm uncertain. Anyway, sorry, pushing it back to you.

Well, no, I don't really disagree with you. All I'm saying is that right now the source of causality and agency is the human. If you imagine a kind of agential contour map of the universe, it's emanating from the humans, and in a way it's almost tautological to say, yes, we could build a simulation of the universe — maybe we exist in a simulation — and that would have agency, because it would sufficiently represent all of the function, dynamics, and behaviour of everything in the universe. That's an extreme example, and there are probably many intermediate points on that spectrum where, if you implement enough functions at high enough resolution, you'd have agents. Maybe if you scaled LLMs to a ridiculously high point they would have agency; we're only disagreeing on how far you'd have to scale for that to happen. So I think we agree on that. But maybe you should just talk about your LessWrong paper, because you were talking about the behaviour of LLMs potentially subverting what you tell them to do.

OK, but maybe I want to jump back a little first, because I think it might be worth operationalizing more specific things where we can get some real, clean disagreements, insofar as they exist — which maybe they don't. A view I have is that there's maybe about a 35% chance that in the next ten years we'll have really powerful AI, and when we look at that really powerful AI we'll say it was basically descended from the large-language-model paradigm — maybe with some other stuff sprinkled in, maybe with a bunch of stuff built on top, maybe with elements I don't even know about. How do you feel about that prediction?

I disagree.

You just disagree. OK.

Yeah, and mostly it's because of what we were saying about reasoning. To have the kind of reasoning we need — this inventive reasoning — we need Turing machines, and Turing machines cannot be trained with stochastic gradient descent. So all we can do is what you were talking about a little in your paper: we can create this computational graph, we can do reflection and prompting, and we can constrain it in lots of different ways, but it will bottleneck very quickly; there's a limit to how much of an autonomous system we can build. What will happen is that we, as the source of agency in the real world, will use these things as tools. And don't get me wrong, it's going to change our society dramatically, beyond all recognition: it's going to change our interface with reality, change how we communicate with each other, create all sorts of risks. I'm completely on board with that, but
they are going to be essentially part of our memetic information ecosystem.

One quick thing: do you agree that if people could easily create highly agentic, autonomous systems — systems that would, for example, autonomously do R&D and end-to-end projects — people would create such systems, or at least, in the absence of specific reasons not to, they would want to?

Are you asking whether people would want to create agentic AI?

Right.

Absolutely — there's a huge military and economic advantage. And just to be absolutely clear, I'm arguing that agential AI is not possible in practice. We probably agree morally and philosophically that if it were possible to create agential AI — and there's also this notion, spoken about by folks in the existential-risk community, of recursively self-improving AI, which I think is not possible even in principle — but even agentic AI that could in principle do science and become the cause of all the major events on the planet: of course I would be very worried about that. I just don't think it's physically possible, or likely to emerge in the next 50 years.

Well, it's definitely possible in principle, as we agreed.

Yep. I don't think it's practically possible, and then there's a separate question of whether it's physically possible, and even on that I'm leaning towards no.

So do you agree that — it seems like I have a brain, and I'm an agentic AI, or at least I like to think of myself as one. I think if you took my brain and made a computational simulation of it, which I think is possible — best estimates indicate something like 10^15 to 10^20 FLOP to run my brain; who knows, there's a lot of uncertainty, we don't know which dynamics are important to simulate, but the estimates are something like that, maybe somewhat lower — if you were able to scan my brain, cut it into pieces, scan it all up, and run it in a computer, then from my perspective, running in the computer, I'd be saying: hey, I'm an agent, what are you talking about, I'm offended by the insinuation that I'm not. Do you disagree with that, or is your view that it's not going to be technically feasible?

I do disagree, and I think it all comes down to our different views of cognition. I think it's fair to say that you believe in the concept of a pure intelligence, that intelligence is inside of us. I think if you were to make a simulation, you should actually be making a simulation of the entire physical universe.

We can just give it a little robot body. My guess is we disagree here, but this is maybe not an interesting disagreement. So let's say I'm a little simulation, but I also get to control my robot body; the robot body is hooked up to my brain the same way humans can learn to control prosthetics; I have a little camera hooked up to my optic nerve in some nice way. I really feel like I'm just a robot walking around — it feels very similar to my previous life, though of course I'm now a little robot, somewhat different. What about now?

Your cognition would be dramatically degraded as a result of your reduced dexterity and reduced interaction with the environment.
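For context on the 10^15 to 10^20 FLOP/s figure quoted a moment ago: one common back-of-envelope behind numbers in that ballpark multiplies a synapse count by a mean firing rate and a per-event compute cost. Every input below is a rough, contested assumption, which is roughly the uncertainty being gestured at.

$$
\underbrace{10^{14}\text{–}10^{15}}_{\text{synapses}}
\;\times\;
\underbrace{0.1\text{–}10\ \mathrm{Hz}}_{\text{mean firing rate}}
\;\times\;
\underbrace{1\text{–}100\ \mathrm{FLOP}}_{\text{per synaptic event}}
\;\approx\;
10^{13}\text{–}10^{18}\ \mathrm{FLOP/s}
$$

Different assumptions about how much sub-synaptic detail has to be simulated push the answer up or down by several orders of magnitude.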
So your view is that Stephen Hawking has dramatically degraded cognition due to his paralysis, for example?

If he had been born that way, I think he would have. But hang on — there are two things here. I think a lot of the way we do reasoning and cognition is a function of our social and physical embodiment, so if you grew up physically and socially embodied, you would develop a lot of those mental faculties. And I also agree with your earlier statement that, even in the absence of that social and physical connectedness, a lot of intelligence is shared memetically. But the more important question is: yes, we could have cyborgs that extend our collective intelligence, but what if we just created a superintelligent simulation of a thing — would that be able to do science in the way we do? No, because we do that collectively. As we were saying with System 2 reasoning, the inventive component of reasoning and creativity actually happens on the margins: you need a population of millions of people doing interesting, diverse things in different situations in order to actually discover interesting new knowledge.

I would say each person is contributing a bit of knowledge, and I agree you do go much faster with a million people instead of one — perhaps a million times faster, or somewhat less due to diminishing returns. I also agree that I benefit from interacting with a small cohort of peers. But I think if you put me out in the desert with a computer, I would in fact get a bunch of intellectual work done — perhaps less, but I would get work done. I think we have a pretty fundamental disagreement about intelligence downstream of this. I agree you need interaction with the world to get stuff done, otherwise you're just floating in the void. I also think simulation might be cheaper than you're imagining: if you just extrapolate compute trend lines we will have a lot of compute, and it's not necessarily that expensive to do things like have AIs interacting entirely inside extremely diverse, realistic video games, walking around doing their AI thing, learning for millions of years of game time, and emerging into our world afterwards in robot bodies. If I thought that was necessary, I would think powerful AI in the next few years is much less likely. I'd also encourage you — given that you agree that agentic AI, if it happened, would be a really scary prospect — to be ready to update, or, from your perspective, given that you disagree, to keep an eye out and ask "is it agentic AI yet?" and think about what the bar should be. I don't know what your bar should be; I have some sense of what mine would be. But, for example, if AIs were autonomously producing novel research papers that impress you — that certainly crosses the bar — with no human involved, they just went off and did their own research paper, then that would
cross it, and at that point maybe we should get concerned. Hopefully we can both agree on that.

Yeah, we can. I think our bar is just a little bit different, because we can actually analogize agency with creativity and inventive reasoning — they're kind of the same thing in a sense, because you don't necessarily need that kind of self-directed causation and deliberation. You could argue that if we actually had an artifact producing new, creative, abstract scientific knowledge in the way we do, then for all intents and purposes it's as if it had agency anyway, because we could certainly run it in a way that made it do that. I'm just saying that right now, for all practical purposes, the only thing that can have agency is something physical — which is to say, us.

To be clear, though, it wasn't just "right now"; it was the next 50 years, which is a much more diverse span of time. I agree that right now we do not have such systems. I do disagree that it would need to be physical: maybe I'm more optimistic that the world of text is a richer world — though it's not super rich — and I also think simulations are maybe better than you're imagining. I don't know how much we should harp on this point, but I just want to clarify that.

I think physicality is a big part of how we do things like abductive reasoning, because the universe and the physical world give us so many reference points to interpolate between. In a sense, you could argue we're just doing collective interpolation, and the dataset is the physical generating process of the real world. I think you would need to replicate that as well in order to produce a similar type of intelligence.

So it sounds like there's a pretty natural empirical experiment, which is a bit expensive to run. Take software engineers doing hard agentic tasks — we both agree that would cross our earlier stated bar; they're doing research papers — and put them in a sensory deprivation tank. We give them just their laptop — AIs will have access to laptops; they'll be computer-using agents, or maybe we shouldn't call them agents — we give them access to VS Code, they'll be writing away in VS Code, access to their keyboard, nothing else. Sensory deprivation tank, fed through an IV. The question is: with such people, interacting only through this medium — maybe they get to do video calls with their friends, the same way the AIs will be able to do video calls, since they're multimodal these days — does good work get produced? My guess is their productivity is degraded but not eliminated.

I don't know. If you did it at scale, I think there would be a very quick mode collapse in the effective cognition. And, as we said before, the reason they developed this cognitive framework in the first place is that they grew up embodied and embedded.

Sure, so we have to do a more expensive experiment: we put babies in sensory deprivation tanks and give them little laptop screens, which are the same views that
they’re going to grow up in the sensory deprivation tank with their IV I’m sorry it’s a very very costly experiment uh but we had to do it uh and and then we we come out do they end up producing good works that one we can’t run I’m afraid the sensory deformation tank experiment with the engineers we could maybe run um yeah I mean okay I I I I think I just disagree pretty strongly about the the nature of where intelligence has to come from and where to learn maybe my view is I’m I’m just more I’m more I’m more of a more optimistic about human intelligence than you are where I’m like human intelligence could diversely adapt to a variety of environments envir like there’s hidden complexity in all environments like python envir like a python interpreter is in some sense a very complex environment a computer is a very complex environment you can video call other people in their sensory deprivation tanks uh talk to them in words uh you know uh I yeah I I just think there’s and yeah I think it just like the world is rich in a variety of perspectives it’s also worth noting that as you said the things you’ve said so far are totally consistent with creating really powerful AI with uh robotic bodies as in we could just from the start train them as robots in the physical world and just give them the same senses as humans you know get get little Get Robotic bodies that are just the same as humans they have a sense of touch they have a sense of of they can see they have cameras they can hear we have the sensors for these things now currently our as you noted earlier are actuators they’re terrible robotics it’s a it’s a mess uh it’s not very good but like it’s improving over time but I’m like well I would say clumsy humans with a bad with that aren’t very good at at at moving around they still can do things you know they don’t they don’t seem totally impaired to me uh and I would say we could just train AIS physically in the physical world uh using the exact same setup we can also do things like train them massively in parallel and connected ways like uh one way to compensate for the weaknesses of AIS is that you can train them on a huge number of different robots you can have uh maybe millions of different robot bodies that are really the same AI that periodically get synchronized uh and and that would help it learn and help it learn faster and and I’m like this is plausible in the next 50 years you know computers are advancing robots are advancing uh it it would be a very expensive project uh and I’m not necessarily advising this project I think this would be a scary project for the reasons we said earlier but uh uh I guess I just I just think I’m like if you believe in human intelligence it really seems like the the separator can’t be that strong yeah no I know where you’re going with this and I agree that what you’ve just said is is plausible and I think that there would be um economic and you know kind of computational and physical restrictions just to you know scaling out a collective intelligence of robot armies to that point but you know like what you’re saying is is plausible I just think you know and even then you know it could be said that those things in principle could be thought to be agential and they could in principle be thought to be sharing information creating information um improvising new meaning new semantics creating new language you know whether um in our agential gravitational field may you know maybe they would start to create their own agential gravitational field all of this 
is possible, but part of the x-risk argument — and I don't want to talk about x-risk too much — well, if there were any problems, we would just get rid of the robots. Maybe the people who worry about this big takeoff scenario think there would be a recursive, self-improving intelligence.

Yeah, the term I would use is basically explosive growth due to — I forget the exact term, but Tom Davidson has a report on this that I think basically articulates it. If you have systems that are capable of improving themselves, or improving other versions of themselves, then the cost drops over time: first it costs X amount of money, or X amount of hardware, to run these AIs; then, as they optimize things, it costs X over two; now that they cost X over two you can run twice as many, so they make progress faster; then they halve the cost again, to X over four, and make progress faster still, and the loop repeats. I would say we have seen this throughout human history: the human population over time has followed a hyperexponential trajectory — it has not just grown exponentially; the rate of growth has accelerated over time — and the same is true of economic production. The rate at which new ideas are created has been accelerating, because our previous ideas have helped us create new ideas. The main mechanism has been increased human population: early in our evolutionary history there may have been only a few hundred humans through a bottleneck; then the population grew, and as it grew we got better at generating new ideas, because there were more people to generate them, and those new ideas let us support a bigger population — the Green Revolution, and before that agriculture. I think something basically analogous, though potentially much faster, will happen with AI. You'll start with AI systems that can make it cheaper to run themselves — maybe they tweak some CUDA kernels so the kernels are faster, maybe they develop a new architecture that runs a bit faster — and that means you can now run more of those systems (or better systems, but "more" is simpler to think about). Just like the human population case, you now have a population of AI agents growing hyperexponentially, because they produce improvements that double how many can run, and once twice as many are running, the rate of progress is faster, and the loop proceeds. Now, this is an empirical claim, and it depends on specific empirical contingencies.
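A toy model of the feedback loop being described — purely illustrative, with arbitrary parameters, not anything from Tom Davidson's actual report: each research "generation" halves the cost of running an AI, which doubles how many you can run in parallel; whether the loop accelerates or peters out depends on how much harder each successive halving is to achieve.

```python
# Toy illustration of the cost-halving feedback loop described above.
# All numbers are arbitrary assumptions chosen to show the shape of the dynamic,
# not estimates from any real takeoff model.

def simulate(difficulty_growth: float, generations: int = 12) -> None:
    """difficulty_growth: factor by which the research effort needed for each
    successive cost-halving grows (i.e. ideas getting harder to find)."""
    cost, budget, elapsed = 1.0, 1.0, 0.0
    effort_needed = 1.0
    for g in range(generations):
        population = budget / cost             # AIs you can afford to run right now
        elapsed += effort_needed / population  # wall-clock time for this generation's work
        cost /= 2                              # the payoff: running cost halves
        effort_needed *= difficulty_growth     # the next halving is harder to achieve
        print(f"gen {g:2d}: population {population:7.0f}, total time {elapsed:9.3f}")

simulate(difficulty_growth=1.5)  # effort grows slower than population: each generation is faster ("foom")
simulate(difficulty_growth=3.0)  # effort grows faster than population: generations slow down ("fizzle")
```

The crossover between the two runs is exactly the empirical question posed next: when you double the amount of work put into the problem, are the resulting improvements big enough to keep the loop going?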
The question you have to ask yourself is: when you double the population of agents working on the problem — or, more simply, when you double the amount of work that's been put into a problem — do you get improvements that are sufficiently large, or does it taper off? It will eventually taper off — there are physical limits — but it might taper off very late. A simple situation to imagine: humans are in fact on a trajectory that will slowly reach advanced technology, with more humans and more advanced technology, spreading to space and continuing to advance, and that would actually go very far. It would happen over hundreds of years, but it would proceed. I think the same could be true for AIs, but potentially on a faster timescale, for various reasons. This would potentially require advancing hardware and software in tandem: the AIs would be working on GPU R&D, the GPUs would get better and better, and as the population of AIs gets bigger they get better and better at advancing GPU R&D, the rate of GPU R&D increases, the population increases, and it takes off. I think "recursive self-improvement" is not an amazing term for this, because I imagine the process is more like a whole economy of AI agents building up an economy that accelerates over time. I think this is quite plausible, and indeed a big source of the risk.

I think many of the things you said there are plausible. I'd push back a little on the claim that our scientific discovery is increasing exponentially. I think there's a kind of ephemerality to it; maybe a better way of describing it is that we're not discovering new areas. There's a convex hull of knowledge, and what we're doing is spending more effort exploring inside that hull, and in doing so it creates a kind of basin of attraction, and there are forces stopping us from discovering new, paradigmatically changing knowledge. I agree that the Industrial Revolution and several other things in our history created paradigmatic changes in how we organize ourselves, and AI could be that thing — and there could be other things, it won't necessarily be the AIs that discover them, it could be us, and it could happen at any time. But I'm interested to ask your opinion on something. We're getting a new government in the UK tomorrow — a new Prime Minister coming in — and as far as I'm concerned it doesn't make any difference, because it's going to be the same people working in government doing the same thing, and there are all these externalities, financial markets and so on, and they don't really have much control to change anything. Similarly, we can't really change climate change, because we've got a whole collective of people following their own gradient of interest. So even if this were the case and there were some paradigmatic change in technology, seriously, what are we actually going to do about it?

So you're saying: suppose it were the case that we actually get powerful AIs, there's some sort of feedback loop of the kind we're describing, that feedback loop — I think the technical term is "foom" rather than "fizzle" — grows, it accelerates over time rather than decelerating, and it's growing at a
hyperexponential rate — what should happen? I obviously have a bunch of views on this. The number one thing that seems really important for a lot of people to be tracking is that we really need to know how good current AI systems are: we need transparency into the level of capability. For this, what we need are good benchmarks that a broad diversity of people agree on, such that if the AIs crossed this bar on the benchmarks, we'd potentially be getting close to the point where this becomes plausible — where the AIs could in fact accelerate AI R&D, and R&D in a variety of domains, by huge factors, like factors of ten. Once that happens, I think a dramatic response from society is warranted. Minimally, I think the weights need to be secured, and algorithmic secrets become a matter of national importance to Western countries and potentially a huge national security risk. I'd say algorithmic secrets are already a huge national security risk, because this has a reasonable chance of happening in the next few years, but you take what you can get, and at least we can secure the weights and the algorithmic secrets once it's happening. From there, we're going to have these agentic AI systems, and we'll have some concerns: we might not understand how they reason, we might not understand how they work, they might have goals we didn't put in them — because they do have goals; they are accomplishing things over long time horizons. If these systems did have goals of their own, things they were accomplishing over long time horizons, I think we want to be able to make a case that the situation is safe: a case that these systems will not autonomously, on their own, cause bad outcomes, and also a case that humans won't be able to misuse these systems or steal them in ways that are catastrophic. My current proposal for how we should do this sort of safety case — or, as it might be, sufficient countermeasures to defuse a risk case; there are some empirical questions about how you want to do this — the thing I would be excited about is arguing that our countermeasures are sufficient to prevent these AIs from causing bad outcomes even if they try to cause bad outcomes. In the case of human organizations, we often have organizations where we're worried about spies, or worried about disgruntled ex-employees, and we need to be confident that those employees can't cause huge problems even if they want to — this is sometimes called an insider threat concern. I'd say we should handle a large fraction of AI concerns as insider threat concerns: we want countermeasures, monitoring, oversight, checks and balances sufficient that even if all of the AIs, all at once, decided to coordinate against us — decided to work against humanity, decided to seek power on their own — we would be able to stop them, and either catch it or prevent them from succeeding in some way.

If I were to decompose that, I would factorize it into preventing the
creation of new power, and managing existing power. What I mean by that is: if the AI became agential — even if limited, as you granted — and self-improving or whatever, how quickly would that happen? I think it would happen very slowly, because we already know that intelligence is about making these jumps in abstraction space, and every single jump would require iteration in the physical world, and there would be various economic constraints and so on. So I think that process would happen quite slowly — you're obviously arguing it would happen very quickly. The other point you were making was about the management and governance of existing power structures, and that's very similar, because even now, if you wanted to create AI, it's almost as hard as making a bomb. Right now you need enriched plutonium or uranium, which is very expensive and very difficult to get hold of; and for AI you're going to need potentially billions of dollars of compute, so it's probably easier to build a nuclear bomb than to build the kind of agential AI system you're talking about. In that sense, I think it's more interesting to pivot to the governance problem, which is very similar. We have various actors — virtual agential bodies, governments, military organizations — and they have incredible power and agency, the ability to shift world events and create devastating amounts of violence, and that's why we have this kind of mutually controlling structure of governance. So I guess I'm arguing: don't we already do this now?

Yeah, the situation I'm talking about is a natural extension of current systems, in some sense. The main difference from what I think might be the default is this. On the default trajectory it's pretty likely there'll be insufficient security: it's pretty likely that when OpenAI creates extremely powerful models, those models will not be adequately secured, and they will immediately be stolen by foreign adversaries — maybe one, maybe many — thereby undermining the potential lead that Western countries might have. I agree that national security interests will potentially notice this and secure the weights, but I think it might be a time crunch: even if national security eventually wakes up to these concerns, it might be too late to ensure the weights or the algorithmic secrets haven't already been stolen, and it will be tricky to maneuver into a regime where the weights are secure. That's one thing. The second thing is that human organizations do not normally assume that all the employees working for them are misaligned, actively conspiring against them, and potentially willing to coordinate with other employees to undermine them. Currently, human organizations work on a combination of goodwill — people not being super power-seeking, people kind of wanting the mission of the company, not trying to actively undermine and subvert it at every point. I
I think it's plausible that AI systems will have these wants and desires. It's not obvious — I think it's unlikely but plausible — and it's a concern worth worrying about, and it would pose substantially different risks. It's very different to run an organization where you have hundreds of millions of AI agents running at 10x speed, doing things you maybe don't understand, who would coordinate against you to screw you over if they could, versus running Google, where you have 100,000 human employees whose work you can look at and more or less understand, and the whole situation is precedented. It's a much scarier situation.

Beyond that: so far we've been talking about AIs that are not wildly different in capability from humans, and those AIs, I think, can be controlled in the way I described, with behavioral countermeasures. But I think it's very likely that once you have AIs at roughly human level, shortly afterward — due to AI-accelerated AI R&D — you'll have AIs that are much more powerful: much smarter than humans, the equivalent of humans running for many, many years at 10,000x speed, or much more creative than humans. Minimally, imagine the best humans in every domain, all operating at once. Given your views, maybe you'd imagine those humans operating in a physical or simulated environment for a while before coming out with ideas, though I think that's probably not necessary, or not the most efficient approach. Those systems pose even more risk, because by default the most economically productive and most militarily useful ways to use them will involve actions we don't understand — they'll be so far beyond us, and their technology might advance so quickly that they'll be autonomously building factories for devices we don't understand, much more powerful than our previous technology. That sounds really scary to me, and my guess is you agree it's scary and just disagree about the probability of it happening.

I do — I think we disagree about the trajectory of how this will evolve. Even if we did what you said — we have this army of robots developing an agential contour around them, if you like; I think of it like a Venn diagram — that agency contour develops very slowly over time. Initially they'll be tools: all of the robots will be working for individual humans, their utility will be for humans, and they'll be bottlenecked by having to share information with humans in a form humans understand. I agree that over time they could develop more and more agency and start sharing information with each other, but that divergence will require so many iterations. Just being able to physically replicate themselves will require factories which are managed by humans, which we control, and so on. So what you're talking about is possible in principle, but I'm saying the timescale is much slower. The other thing I worry about — you were talking about deception and bad actors, and
what was the word you used earlier? Insider threat. All of that is true, but I also worry about this move towards not only paternalism but moral paternalism, which I've also seen in the effective altruism and rationalist communities — basically saying, we can come up with this abstract moral framework, this framework is the best calculus for how the world works and for predicting the future, and we know better than you. It's the same with the insider threat and alignment framing in order to do effective governance: to me it's quite worrying that it's a kind of centralization of power and a reduction of human agency, and I find that almost more perniciously negative than — I guess I'm saying the cure is worse than the disease.

As in, you're saying we should let the AIs seize control if they so desire?

Well, no. I'm saying that on the basis that agential AI could seize control, we're advocating to enforce a very rigid form of governance.

I was just talking about enforcing that on the AIs, and preventing the weights from being stolen. I wasn't proposing surveillance on humans.

Oh, sorry — I thought you were talking about insider threat and misaligned people.

So the thing I was saying was: we're worried AI might be misaligned, and I was proposing applying the standard technologies of insider threat protection from human organizations to those AIs, using what I'd call control measures, or making a control safety case — arguing that the AIs couldn't cause problems even if they wanted to. I also think that because AI companies will, in the world I'm imagining, be the most important companies and projects in the world — basically critical defense contractors and critical national infrastructure — we should also potentially take aggressive measures to ensure that employees at these companies can't cause bad outcomes. But this is just standard for defense contractors: if you work at Raytheon, they want to make it so you can't steal the missile plans and send them abroad. I'm not proposing any invasive surveillance of normal citizens. I do think you'll potentially need compute governance, but the situation I'm imagining is more like making it so that no one can buy huge amounts of GPUs, so that we don't lose control over really powerful AIs, insofar as huge amounts of GPUs can make that possible. I don't think this will infringe on the liberty of typical people — in the same way I don't think people should necessarily have the right to buy enriched plutonium, I don't think people should necessarily have the right to buy 100,000 H100s.

I didn't mean to misrepresent you, and I think that's absolutely true. The other thing I wanted to pick up on is that it's not just the code, right? For example, if you're IBM and the source code for your big office suite got leaked on the internet, the executives at IBM would be very
worried, but they needn't be, because no one is going to use that source code — it's practically impossible to use. It's a similar thing with AI. The source code OpenAI uses is, I'm guessing, actually quite simple — you could probably write an incredible AI system in about a thousand lines of code. The data, and how they curate and refine it, is very important, but really it's the operational knowledge, the architecture, the hardware, the whole supply chain around it. When you factor all of that in — and, as you say, it's on the order of thousands or tens of thousands of H100s — even if another country did steal that code, it's not like they could do anything with it, right?

So first of all, I think the most concerning situation is not that they steal the code; it's that they steal the model — the weights — and the inference code, and maybe they don't even use that inference code, they rewrite it. Imagine the most concerning situation: you have this AI, it's capable of greatly accelerating R&D, it might lead to a dramatically different world in a few years — five years, three years, maybe even shorter — and it's also already massively accelerating defense and offense technology; it's capable of offensive cyber, and potentially of producing superhumanly powerful bioweapons via huge amounts of cognitive labor. That system's weights and inference code are stolen, and it's now running on a Chinese server, or a North Korean server, maybe. And inference is much cheaper than training, so it does not take many GPUs to run inference on a modern transformer. For example, the biggest publicly known model is Llama 3 405B, and I think you can plausibly run it on 16 H100s — possibly fewer with aggressive quantization. Sixteen is not many; it will be easy to get 16 H100s. There's a question of how much work you can do: with 16 H100s you won't be able to do that much inference, but you'll be able to do some, and with thousands of H100s you might be able to do a lot of inference — a lot of cognitive work. Based on the FLOPs estimates I gave earlier, you can run a human in real time on one H100, and I think current AIs are actually cheaper for inference than that. So with 10,000 H100s you could potentially, with the AI systems of the next few years, be running perhaps millions of years of cognitive work per year — the equivalent of millions of extremely smart, incredibly knowledgeable humans. I think that poses substantial risks. Currently things are partially bottlenecked on top ML engineers at OpenAI: every year OpenAI gets better algorithmic improvements and releases better models on the basis of better algorithms. If you instead had AIs doing that algorithmic work, things could go very fast — there could be millions of them. It seems like a pretty concerning regime. I think things will be continuous, but they could be very fast.
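A minimal back-of-envelope sketch of those numbers. Every constant here is an illustrative assumption rather than a figure from the conversation: fp16 weights with no quantization, 80 GB per H100, and a guessed "humans per H100" multiplier standing in for Ryan's rough one-human-per-H100 anchor and his remark that current models are cheaper than that.

```python
# Hedged back-of-envelope for the inference claims above.
# Assumptions (not measured figures): 80 GB per H100, 2 bytes/parameter (fp16),
# and ~100 human-equivalents of cognitive work per H100 as a stand-in for the
# "cheaper than one human per H100" anchor mentioned in the conversation.

PARAMS = 405e9           # Llama-3-405B-class model
BYTES_PER_PARAM = 2      # fp16/bf16 weights; int8 or int4 quantization would shrink this
H100_MEM_GB = 80

weight_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_weights = weight_gb / H100_MEM_GB
print(f"weights: {weight_gb:.0f} GB -> ~{gpus_for_weights:.1f} H100s just to hold them")
# ~810 GB, i.e. roughly 10 of the 16 H100s go to weights, leaving room for KV cache.

FLEET = 10_000                      # H100s available to the actor
HUMANS_PER_H100 = 100               # assumed multiplier, purely illustrative
human_years_per_year = FLEET * HUMANS_PER_H100
print(f"~{human_years_per_year:,} human-years of cognitive work per year")
# -> ~1,000,000, the "millions of years per year" order of magnitude cited above.
```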
Addressing another point: there are multiple components you need to worry about being stolen. Earlier you said the code is simple, and I agree that in some sense the code is simple, but I think it contains a large number of insights that are important. Algorithms improve over time: if you were using 2017 algorithms — you just took the Transformer architecture, implemented it naively, and scaled it up — your results would be a lot worse than what we get right now. An organization called Epoch AI has estimated how much algorithms improve over time, and their estimates indicate that the improvement rate is somewhat below an order of magnitude every two years. That is, the bang for your buck is increasing at a rate of about 3x, maybe 4x, per year — probably closer to 3x. That's pretty fast: it means that in two years you might need 10x fewer GPUs. So there's a further concern: initially, when you first have really powerful AI, it will take a lot of GPUs to run and an even more ungodly quantity of GPUs to train, but that requirement will be decreasing over time — we'll be in somewhat of a race against time with these algorithmic improvements. Once systems are very cheap to train and very cheap to run, the governance problem becomes much harder. As you were noting before, if compute governance — tracking the supply of computer chips — can be used to govern really powerful technology, that's great, and it will hopefully last for a while, during which we have some handle on AI via compute, but it will eventually fade as the technology gets more powerful, would be my guess.
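A quick sanity check on that rate — a sketch of the arithmetic, not an Epoch AI result. Taking "an order of magnitude of efficiency every two years" at face value, the implied annual multiplier is

$$ r_{\text{year}} \;=\; 10^{1/2} \;\approx\; 3.16, $$

so the compute needed for a fixed level of capability shrinks by roughly $10^{t/2}$ after $t$ years: about $10\times$ after two years and roughly $30\times$ after three, consistent with the "about 3x per year" figure quoted above.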
Yeah — coming back to the ARC challenge for a moment, one of these big leaps forward could be that we figure out how to compress the amount of computation we need for a similar amount of intelligence, which is kind of what you're talking about. Right now we can agree that if we had bazillions of compute, then in principle we could do all of these things, but that's still limited by economics and hardware and GPUs and so on. So maybe there'll be an invention — perhaps something we discover on the ARC challenge — that lets us do significantly more. There's already a trend towards smaller models, but I still think there's a trade-off there, and the smaller models are doing a commensurately smaller amount of intelligence. And just to comment on the other thing you said — I want to make clear to the audience that I do disagree with you on this point — I still think you're treating human cognition, humans, as islands, drawing a commensurate relationship between the amount of computation that happens in a brain and what we can reproduce on a GPU, and then multiplying by 10,000 and so on. I disagree with that in principle, but we've already covered it earlier, so we don't need to go over it again.

Yeah. For what it's worth, the place I expect the progress to come from at this point is mostly the large AI labs, maybe because I'm more optimistic about the LLM paradigm than you seem to be. And I want to push back a bit on the claim that small models are basically limited relative to bigger models. A while ago Nvidia trained a model called, I think, Megatron-Turing NLG — a really big model, I think around 530 billion parameters, bigger than the Llama 3 405B model — and my understanding is that it was actually worse than GPT-3, or at least only comparable to it. People have since made 7- and 10-billion-parameter models that basically strictly dominate GPT-3: it's very hard to find a task on which they're worse. So it really is the case that you can make smaller models that strictly dominate historical larger models, and my guess is the same will hold going forward — Llama 3 70B was recently released, and my guess is that, on the trend lines I mentioned earlier, in a few years you'll be able to create a model that's 10x smaller and 10x cheaper to run that's basically as good as Llama 3 70B. It's an empirical claim; we can just look at it.

Yeah. The way I think about it is: part of the reason we overparameterize deep learning models is to make them statistically tractable to train — SGD works much better when you have more parameters. But when I think about what the model is actually doing, it's a bit like a zip file of the dataset you're training on. It learns something like a manifold over the data, but it's more than a manifold, because you have these inductive priors — in a self-attention Transformer it's doing a bunch of permutation symmetries — so as well as embedding the actual data, it's embedding lots of arrangements of the tokens, and you get this big convex hull around the data distribution. Then there's the question of making these things more efficient: they need to be trainable, but after training we can sparsify them, distill them, do a whole bunch of stuff. So presumably there's some information-theoretic optimum trade-off between representing as much of that distribution as possible and getting the compute down — surely there's a limit there, right?

So, on that, there are scaling laws. What people have found is a relationship: suppose you want as performant a model as possible at the end of training. It turns out you want to scale the data alongside the parameters — this is based on the last publicly known scaling laws, the Chinchilla scaling laws, and people basically still do something similar. So in this regime, interestingly, models are in some sense not overparameterized: the compute-optimal amount of data to train a 7-billion-parameter model is, I think, about 20x the parameter count, so around 140 billion tokens, which is quite a lot of data.
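For reference, a hedged sketch of the standard Chinchilla-style picture being described here — the "data term" and "parameter term" the conversation turns to next. The functional form is the usual published parametric fit; the symbols are generic, not numbers from the discussion:

$$ L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}, \qquad D_{\text{opt}} \;\approx\; 20\,N, $$

where $N$ is the parameter count, $D$ the number of training tokens, and $E$ an irreducible-loss term. For $N = 7\times 10^{9}$ this gives $D_{\text{opt}} \approx 1.4\times 10^{11}$ tokens, and as $D \to \infty$ at fixed $N$ the loss floors at $E + A/N^{\alpha}$ — which is the sense in which a small model trained on unlimited data still plateaus.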
Now, in practice you can do better by training on more data, and the scaling laws do indicate that this plateaus. It turns out that if you train a 7-billion-parameter model with current methods on an infinite number of tokens, then, extrapolating from the scaling laws, I think that model will never be better than Llama 3 70B no matter how much you train it, with the current architecture and current improvements — sorry, I might be getting the numbers wrong; it might be that it's never better than what Llama 405B would be — but there is a limit there. The functional form of these scaling laws is basically a data term and a parameter term: you can reduce either term by increasing parameters or increasing data, and you have to increase both at the same time to be compute-efficient, and that gets you more performance. And to be clear, there's definitely going to be a limit somewhere: even if the scaling laws indicate things scale forever, we've only seen so many orders of magnitude so far, and things could break. But we have seen a lot of orders of magnitude — people have observed scaling laws over maybe six or seven, maybe slightly more — and I think extrapolating out another six or seven orders of magnitude on the actual loss number is plausible. We don't know how that loss number translates into downstream performance, and there might be limiting factors to scaling out that far: eventually you run out of data, and at some point you're in a truly ungodly compute regime. There's more to your question that I wasn't getting at there, though.

Well, I just wondered if you could comment on this: when I spoke with Aidan Gomez at Cohere, he was saying that with things like the LMSYS Chatbot Arena, it's almost like we're the dumb ones — the models are smarter than we realize and we're the bottleneck; it's not that the language models are saturating. What's your take on that? When we look at the commercial models coming out and so on, what's going to happen?

So I'm going to talk about a scenario that seems plausible to me, and scary, though it's maybe not where most of the probability mass is. LLMs get better over the next few years, and various benchmarks start saturating — a bunch already have; MMLU is saturated — and maybe we start seeing models performing pretty similarly to humans on SWE-bench, which is a pretty diverse task, though not the most diverse. Soon after that, models start performing quite well on basically hard agentic tasks that would take humans eight hours and would be genuinely difficult — for example ML engineering tasks like getting an inference server running on faulty hardware with high throughput — which involves doing things that are basically R&D: debugging, improving the algorithms, writing novel kernels. And the LLMs start doing that. Under a faster-timeline scenario, that might happen within a few years.
At the same point as that's taking off, AIs have been helping people inside AI labs accelerate R&D, so the labs are running somewhat faster than they otherwise would: they're hiring aggressively, the industry is growing in market cap, and in addition to that there's this new factor of AIs helping out internally. That starts accelerating. Soon you have AIs that have sped up the pace of R&D inside the labs by maybe a factor of three, via mechanisms like the AIs acting as junior software engineers at the lab: you say "go implement this, test it, debug it, make sure it's running, and if the metrics are X then do Y", and the AI goes off and does all of that. Over time, the level of state and understanding in these systems evolves: they have more understanding of the previous state of research, and the prompts you give them become more open-ended — more like "please go and do some research on how to speed up this kernel" — and the AI already knows about your whole stack and just proceeds. At that point you can get potentially pretty big speed-ups. There can be human bottlenecks, but once you have fully autonomous systems the human bottlenecks mostly disappear, and your remaining bottlenecks are just how much compute the systems have and how many of them you can run — and you'll potentially be able to run a truly enormous number. Just based on how current scaling laws work, if you imagine we've trained a system on 100 trillion tokens, you'll be able to run it at inference for also about 100 trillion tokens at roughly the same cost as training it. That means you'll have a system that can do maybe the equivalent of 100 trillion seconds of human time, which is an insane amount of human time.
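Unpacking that claim with a minimal sketch. The one-token-to-one-second equivalence and the 6ND/2ND FLOP heuristics below are rough assumptions used only to show the order of magnitude, not figures stated in the conversation:

```python
# Hedged arithmetic for the "100 trillion tokens / 100 trillion seconds" claim.

SECONDS_PER_YEAR = 365.25 * 24 * 3600          # ~3.16e7

train_tokens = 100e12                           # tokens the system was trained on
# Common rough FLOP counts: ~6*N*D for training, ~2*N per inference token, so the
# training compute budget buys on the order of 3x the training-set size in
# inference tokens -- "roughly the same cost" at the order-of-magnitude level.
inference_tokens = 3 * train_tokens
print(f"inference budget at training cost: ~{inference_tokens:.0e} tokens")

# Treating that budget as ~100 trillion seconds of human-equivalent thinking time
# (roughly one second per token -- an assumption, matching the anchor above).
human_seconds = 100e12
human_years = human_seconds / SECONDS_PER_YEAR
print(f"~{human_years:,.0f} human-years of thinking time")   # ~3.2 million years
```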
Then you'll have that system, you'll do a bunch of R&D that results in more powerful systems, and things will accelerate. Around this point governments will probably get involved — they'll notice what's going on and say, whoa, these systems can accelerate defense technology, they can contribute to massive cyber-offense campaigns the likes of which the world has never seen, because there were personnel constraints before; things are going very fast, something needs to happen — and then something will happen, and from there the world can branch in many directions. The direction I would hope for is that we keep these systems controlled: they can't take over the world if they want to, they can't seize power that we don't deliberately grant to them, and we try to better understand them. I think that once we have systems accelerating the pace of R&D by a factor of 10 or 30, we really should slow down — that's a very scary point to be at, and it's really important that we are very careful and slow about advancements beyond it, especially advancements into a regime that we cannot control — and we should be careful about the advancements up to that point too. I think it's going to be very hard to coordinate to slow down before then, but it might be possible to slow down aggressively after that. And then maybe over the next ten years — so this is like: three years in, AIs are 2x-ing R&D; five years in, AIs are 10x-ing, maybe 30x-ing R&D; and then we hopefully slow down, maybe even before that — we have slow advancements, figure out a lot of stuff, figure out how to better control and understand AI systems, ensure that they're aligned with our best intentions, and at the end of that we potentially build incredibly powerful systems, unlock a huge amount of technology, potentially do aggressive space colonization, and all kinds of crazy stuff might happen.

Yeah. First, I welcome these views — MLST is about diversity of thought and we talk to people with completely different views. Just to make it clear to the audience, I personally don't agree with that, but you're a very intelligent guy and I respect your viewpoint. You did say in the notes that around 2036 is your median forecast for the automation of ML R&D?

To be clear, I was giving an aggressive scenario — a concerningly plausible but aggressive one. My median view is that things will take longer than that.

Yeah. I suspect it's a little bit like the myth of the 10x software engineer: software engineering doesn't scale up very well because there are so many bottlenecks everywhere — I've got to wait for John to review my pull request before I can get any work done, and so on. That's why I intuit there will be lots and lots of scaling bottlenecks here, and I think a lot of your intuitions are based on a kind of pure, internalized intelligence — almost in the absence of those scaling bottlenecks.

I think that's basically a fair caricature. It's worth noting that these systems will be running experiments and interacting with the world, but I agree that I'm imagining a world in which the intelligence — the reasoning, the research — is in some sense internal to the system. That's accurate; it effectively describes how I operate, and other people may be different. As for bottlenecks, there are a bunch of possible ones. One is that scaling might stop before it reaches these systems, due to data limitations or compute limitations — basically not enough funding, or supply constraints on GPUs because there aren't enough fabs. But supposing scaling does get to the point where the AIs can be as smart as smart human researchers, there's a separate question of bottlenecks due to difficulties parallelizing: it's hard to scale an organization, it's hard to hire a lot of people. I think there are a bunch of ways in which AIs might be relatively less bottlenecked here, though it's an open empirical question. For example, AIs will be able to share all the same state, have all the same weights, be trained to coordinate aggressively with each other, and potentially use non-language interchange formats — though I'd prefer that we stick with language;
I think that's much safer and much better. I also think they'll be able to learn everything in parallel — we'll be training them on a bunch of different things at once. In addition, even if things are quite bottlenecked on not being able to scale up to more personnel — as in, you have millions of AIs but you can really only productively use, say, a thousand, with strong diminishing returns beyond that — there are still returns to intelligence. Imagine you have a thousand smart AIs this year, a thousand really smart AIs next year, and a thousand really, really smart AIs the year after: you can still get advancement, not from huge scale, but from that. I'd also say that, in various ways, if you're running fast enough, a fast, dumber system can usually approximate a smarter system to at least some extent — for example, you can sample a bunch of different possible approaches and pick whichever one is best. That gets you some improvement — not infinite improvement — and there's just a question of how far this sort of thing goes.

It's a little bit like your ARC solution.

Indeed — very similar.

But I also think that if it did roll out on the trajectory you're talking about, there would be an intermediate, massive disruption to the job market, and there would be a revolution, because people wouldn't be needed to do their jobs anymore. So in a sense we almost don't need the whistleblowers further downstream.

So there are a few things that make me slightly more concerned than the picture you're imagining. One is that it's plausible to me — and I'm concerned about — cases where the big AI project enters a basically top-secret, low-transparency situation. I think that's very concerning, and we should try to ensure sufficient transparency that independent experts can understand what's going on and understand the systems well enough to know whether or not they're safe; you could have a third-party auditing regime, for example. The second thing is that it's not obvious to me, especially in these faster scenarios, that people will lose their jobs. Economic growth does not necessarily result in job loss; it often involves some displacement and some new jobs. It's plausible to me that with things moving this fast, employment will be kind of sticky: suppose your business is booming — typically you don't want to fire people when your business is booming, even if the returns to that personnel aren't huge; you might keep them around and try to shuffle them. So I don't think labor-market disruption is obvious. My guess is you probably will get it under the faster scenarios, but it's an open empirical question — I'm not an economist, things can be sticky, and I wouldn't make strong assumptions here. Now, I do agree there will be a lot of disruption in the world from things other than jobs. For example, if these systems are deployed, I think it will be clear that they're very powerful; people will have AI girlfriends and AI boyfriends, people will be asking what the kids are up to
these days, you know — it will be clear that some change is going on. I worry both that society will not respond early enough and that it will respond in the wrong ways: I think some interventions will be actively counterproductive and some will be very helpful, and these interventions are on the part of government, civil society, academia, and AI labs. If the scenario I'm outlining is right, getting this right is going to be really important, because it will potentially be the largest, most important event of the last several hundred years — possibly in all of human history.

Very interesting. Well, Ryan, we're nearly at time, and this has been an amazing conversation, but could you finish by reflecting on the ARC challenge? What do you think the solution is going to look like, what do you think will happen there, and any final reflections on your particular solution?

My guess is that the frontier of ARC solutions will basically be built on the best large language models at the time — probably large multimodal language models — with tooling and scaffolding around them: giving them access to various tools, maybe running code, and if not that, at least giving them various ways to visualize things and interact with them; and then using a lot of runtime compute, because with ARC-AGI it's very easy to leverage runtime compute relative to other settings. That's my guess for where the frontier will be. GPT-5 is supposed to come out — maybe some chance before the end of the year, a pretty good chance before the end of next year — it will probably be multimodal, and I think it's very likely that system will do much better on this task, simply via the mechanism of being smarter in some real sense. Claude 3.5 Sonnet, recently released by Anthropic, seems to perform better; my original solution used GPT-4o for various reasons — for one, Sonnet wasn't released at the time I was writing it — and Sonnet does seem to improve notably over 4o in at least some cases, and I expect that to keep pushing the frontier. So my basic guess is that most of the progress will be driven by expensive solutions using large language models. There'll also be a bunch of cleverness: at each point you'll be able to compensate for the weaknesses of your language model by putting in some cleverness — making sure it's prompted right, and so on — but over time the cleverness will become relatively less important compared to scaling, and I think runtime compute will stay big.

I wanted to comment on something very interesting you said a few minutes ago — that it's possible for a kind of dumber system at scale to approximate a smarter system. I actually think that could happen when we have societal, collective use of LLMs doing active inference. For example, the reason François Chollet fastidiously guards the data is because
he knows that the knowledge gap — the fact that the data is hidden — is the only reason the LLMs can't do it, right? But imagine this was unleashed in society and lots of humans were doing the System 2 part: they solve the problems, and then we do the active inference — as Jack Cole basically does, doing the active inference and putting the knowledge back into the LLM — and at that point the LLM could do the ARC challenge. You can then roll that out to what Chollet calls developer-aware generalization: when human agents discover new problems, they figure them out, the data goes back into the nexus of LLMs, and essentially you've got a functioning collective intelligence — an interaction between humans doing the System 2 and LLMs representing it. So you could argue: what's the point? If that's going to do everything we need, why do we need this special kind of System 2 reasoning inside the AI?

So, earlier we were talking about accelerating ML R&D, and we were talking about bottlenecks. If humans are in the loop, that will indeed be a huge bottleneck. When I have my collaborator do some project, I don't sit there watching his every move and constantly pointing out ways he should do things differently — if I did, it would often be faster to just do it myself, and I'd get way less speed-up. I think the same will be true for AIs. Currently, various occupations and domains are limited by personnel — by the amount of cognitive labor available — and the more autonomous the AIs are, the less limiting that is. So I think people will optimize strongly for systems that can run autonomously and correct their own mistakes. It's also worth noting that I think these systems will advance beyond human-level intelligence in System 2 reasoning, and at that point it will look particularly weak to use humans for that. I should also say I agree with a view in which the first systems doing all of this will be using basically all the tricks in the book: there'll be humans doing sub-parts, the systems will be trained online — constantly being trained, with ongoing RL — and they'll use whatever the best way to represent knowledge for those systems is. It'll all be mixed up, all happening at the same time, just because whatever system gets there first will be using a lot of tricks. I don't know exactly what the tricks will be, but my guess is there will be a lot of them. Over time, though, I think the main driver will basically just be the general intelligence of the core system. It's not super obvious that's true, but that's my default guess.

But humans have this fascinating juxtaposition of both autonomy and consensus. Part of the way we improvise meaning, the way we discover knowledge — not only is it grounded in the world, it's also grounded in our culture and society. We're trying things out, and then we're sharing
information, and the whole thing spreads out. Why would it not be the same for AIs? Why would they not argue with each other, have disagreements, and get caught up in knots?

Oh, I'm not saying that won't happen. I'm just saying it might look more like this: first, AIs are introduced into human society as a slightly strange sort of engineer or scientist — a new type of cognitive labor with somewhat different properties: they're dumber, they're faster, they're better at some types of tasks than others. Then over time these systems become more autonomous, and I think it could look like a society of AIs existing autonomously. If you just let things go as fast as possible, that AI society will probably diverge from human society: they'll talk to each other in different ways, we won't understand what they're doing, they'll be referencing research papers written so recently, building on a body of work done in the last year, that we don't understand what they're built on. At that point, yes, there will be an AI society. It's not obvious to me how much it will be a huge number of small systems communicating versus a smaller number of centralized, highly coordinated systems — there are downsides and upsides to synchronization, so it could go either way — but I think it will diverge from the human milieu. I should also note that AIs have coordination tools that are potentially very different from human ones: you can merge the weights of different AIs, you can exchange activations, they can all share the same weights trained with SGD, you can synchronize gradient updates, and gradient updates can be additive. So AI society could be very different from human society — it's possible it's much, much more closely linked, more like a bunch of humans with their brains connected by high-bandwidth links, than our current society, where I laboriously write a Google Doc and send it to my colleague, who writes some comments. It might be very different from that.

But that does highlight two of the issues I have. We've already spoken about this concept of a pure intelligence — where they're islands that could sit in, what do you call it, a sensory deprivation tank. The other thing is this notion of rationality, which I was trying to tease out with my question. There's a lot of subjectivity when we're doing epistemic foraging, and for intelligence as a system to work well there needs to be diversity preservation and divergence — you need people with different views and so on — so surely the AIs would also have different views, and that would actually be a good thing. And I think the framing that gives you the opposite intuition — a lot of rationalists and EAs have this as well — is that there's an objective, rational way to do things, that the AIs would home in on and ground themselves in, and
this subjectivity bottleneck wouldn't be a problem.

So I agree with parts of what you're saying, and I also agree, to some extent, with the view that there is a right way to do things. It's worth noting that one system can represent a bunch of viewpoints simultaneously — it can entertain a bunch of hypotheses at the same time. Humans empirically have a bunch of cognitive biases and limitations, so for me at least it's often a much better cognitive move to really focus in on one hypothesis, think hard about it, and go one at a time — but it's not obvious that other systems have to have that property, or that it's the best way to organize things; it could be, it could not be. Second, I agree that we want a bunch of different hypotheses floating around, and that we won't necessarily know what the right way to do that is — we won't, at each point, know the exact right answer — but it is the case that there are dominated strategies: things you can do that are clearly worse than some other strategy from some vantage point. And I think over time things will become more and more efficient. There have been a variety of sectors in which aggressive top-down optimization has radically improved efficiency — delivery truck driving, for example: my understanding is it used to be "here are some packages, figure it out", and now there's aggressive tracking of everything, constant A/B testing, automated routing to figure out exactly where to go; every part of the system has been engineered, optimized, searched, and learned to the extent possible, because just shaving off 1% of cost is hugely valuable. My guess is that AI will trend in this direction. Right now we train AIs with reinforcement learning, that will continue, and my guess is it will select for AIs with cognitive patterns and strategies that empirically perform better across a wide variety of circumstances, and you'll get systems that are smarter and better at various things as they get scaled up and as algorithms improve. So while I agree that there's no foreseeable, obvious "right way" to do things, I do think there's a space of strategies that are better than other strategies across a wide variety of situations. No free lunch means nothing can be better in all situations, but there's lots of cheap lunch, so to speak: humans have lots of heuristics that work across a wide variety of environments — heuristics towards simplicity are good, as are heuristics towards a bunch of other things — AIs will learn these heuristics, they'll get smarter, and there's just a lot of headroom.

Ryan Greenblatt, it's been an absolute honor. Thank you very much for joining us today.

Thanks for having me.