But the thing is, the model is a reflection of the user. There's a great example in chess play: because these models learn the statistical distribution of chess moves, if you play smart chess moves the model will actually play like a smart chess player, and if you play dumb chess moves the model will play like a dumb chess player. It's the same thing with coding. If you ask the wrong questions, if you write crap code and get it to complete the code, or if you do anything stupid, the model will be stupid, because the model is a reflection of you. This is just a mathematical fact. If you don't like it, I'm sorry, I'm sorry you don't like it, but it's just a mathematical fact.

The whole crux of Turing machines is that they are allowed, at runtime, to have an unbounded, a potentially infinite, amount of compute. We don't fix it ahead of time. We don't say, you get to run for, you know, 82 steps and then I terminate you.

Today is an interesting episode. It's just the two co-hosts of MLST having a chat about the new models from OpenAI, about test-time inference, about reasoning. And if you hang around to the end, or maybe you can just skip to it if you want, we actually test out all of the LLMs on a very interesting brain teaser from Duggar's early years at MIT, and we also show a little bit of us playing with some code in Cursor. I don't think we've ever shown ourselves writing any code or doing anything like that before on MLST, so it's a bit interesting. Anyway, I hope you enjoy the show. Any thoughts, Duggar? Why should people watch this show?

Well, I mean, people should watch this show because I think you and I try to offer a bit of a more critical perspective on OpenAI. We've been through a couple of hype cycles and we try to be a little bit more grounded about what to expect. So people should watch it to get a different opinion from the bulk of opinions they're getting right now, and some food for thought, and maybe also just what to watch out for when they're using these tools themselves.

So when we talk about things like the computational classes, and I'm going to rant here for a second, I'll say again that many of the things in the theory of computation, in my educated opinion, really sit on the cognitive horizon of the vast majority of even smart people. Like I and many other smart people have said, theory of computation is a class that blew their mind, and Turing was a genius beyond geniuses, because he invented this field, really. And so those critical points, like the difference between computational classes, and folks, these are not related to complexity classes, don't get confused, NP-complete, blah blah blah, none of that has anything to do with what I'm talking about. I'm talking about Turing machines versus pushdown automata versus finite state automata, et cetera. These are not trivial in the slightest. They are not technicalities. They are extremely powerful, vital, and important fundamental concepts of computation.

The Brave Search API brings affordable developer access to an independent index of over 20 billion web pages, powered by real, anonymized human page visits to filter out junk data, and refreshed daily with tens of millions of new pages. You can get started with 2,000 free queries monthly at brave.com/api.

Okay, but we're still arguing in the regime of: there is a space of effective computations which is
not reachable, and it would be useful if we could reach it. Let's go back to the conversation with Schmidhuber. He was making the argument, as many do, that it doesn't matter if you have a finite amount of computation, because the universe is finite, everything is finite, and we can just build bigger things. A lot of this comes down to your opinions about infinity, and you can bring in decidability and the halting problem as well. But explain why this is important.

Well, this is a similar comment to one that J. Mark Bishop made on LinkedIn when we posted, it was probably that video about Turing machines a while back. He said, oh yeah, but I can build anything out of NAND gates, and so therefore I can build anything out of neurons, including the computer. These people should know better. They honestly should know better, because they should remember the days, and I'm old enough to remember them, when computers literally had tapes. You would build a computer and you would program a computer, and that computer is a finite thing. It sits there in a room, it's got a fixed number of vacuum tubes or transistors or whatever the hell. It's a finite-volume, finite-mass thing sitting in a room. It's clearly a finite entity, like all computers are. But these computers had a tape deck, and that tape deck was like the tape from the Turing machine: it was rewritable working memory. So the computer could sit there and turn the wheels of the tape backwards and forwards and go along and write stuff and read stuff, and guess what, it could run out of memory, and a little light would turn on, or a little counter, which would say: insert another tape. You take off the tape and you set it in a box, and you go over to your box of empty tapes, or tapes you don't care about getting overwritten, and you put one on, and the computation continues. You don't need to reprogram the machine, you don't need to rebuild it, you don't have to retrain it like you would have to do with a neural network. That program is constructed such that it will just continue the computation, and this can go on forever. The machine can keep telling you: add next tape, add previous tape, add next tape, add previous tape, insert previous tape, and you just go back and forth forever, for some unknown amount of time. And if you run out of tapes in the room, well, you drive over to the warehouse and you grab another box, and if you run out of tapes in the warehouse, you go to the factory and you build some more. The point is, all during this time, you never had to reprogram the machine. Neural networks, when we train them, and it's critical that it's not just the neural network, it's the neural network plus the procedures we use to train it, learn algorithms that don't know how to do that. They can't tell you: I've run out of memory during inference time, I need you to add more, and then you add more and it does something, and it says, I've run out, give me another tape. We don't know how to train algorithms that know how to use an expandable, a potentially infinite, amount of memory. This is the critical difference, and those guys should know better. They should know better.
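To make the tape-deck point concrete, here is a minimal sketch, entirely our own illustration and not any real system, of a Turing-machine-style interpreter whose tape grows on demand at runtime. The transition rules and helper names are invented for the example; the point is only that the program never changes when memory runs out.

```python
# Our own illustration: a Turing-machine-style interpreter whose working memory grows
# on demand at runtime. The program (the transition rules) never changes; when the head
# runs off the tape we just "insert another tape" by appending blank cells. A fixed-size
# network has no analogous move, because its capacity is frozen when training ends.

def run(rules, tape, state="start", blank=0, max_steps=10_000):
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            return tape
        symbol = tape[head]
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head == len(tape):
            tape.append(blank)        # "insert next tape"
        elif head < 0:
            tape.insert(0, blank)     # "insert previous tape"
            head = 0
    raise RuntimeError("step budget exhausted; the halting problem is still real")

# Toy rules: scan right over 1s, append one more 1, then halt (unary increment).
rules = {("start", 1): (1, "R", "start"), ("start", 0): (1, "R", "halt")}
print(run(rules, [1, 1, 1]))  # [1, 1, 1, 1, 0]: the tape grew, nothing was reprogrammed
```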
Okay, but Turing machines suffer from this decidability problem, right? They could run forever. But you're saying there is a class of programs they can run where, of course, the program stays the same and we just expand the memory. And many of these programs, if we look at the class of useful programs we might want to run here on Earth, could still be represented with a very large but finite amount of computation in a neural network. So the argument is basically that for a large class of useful problems we can use neural networks. What is the main argument for why we need this different mode of computation?

Well, I would first say that the world has already run into plenty of problems that require Turing machines and require memory like this. This thing about inserting tapes, it's a real phenomenon, right? Adding more nodes, adding more storage, without having to reprogram the machine. And we have many theoretical results about problems we care about that require Turing machines. They require Turing machines in the sense that when some input comes in, and it can be a finite input, by the way, some finite input is given to the machine, the amount of time and storage it needs to compute the answer to the problem we care about is a variable, and it's a variable that depends on the input, and in fact it's a variable that we may not even know ahead of time. So there's a large class of things that do this. No matter how big you make your neural network, it's always going to be the case that you can run into a problem, a problem you care about, that the neural network just isn't going to be able to solve, because it doesn't have enough memory, basically. And people say, oh, it doesn't matter, we'll just build them huge, man, super massive, the size of the entire state of California, and at that point it'll be able to solve every problem human beings will ever care about. Okay, maybe. Maybe you can. But it's pretty unlikely, because of these problems: when you try to take a Turing machine program, one that has a variable amount of compute and space, and you try to convert it into a single-forward-pass sort of neural network that doesn't understand iteration and read-write memory and things like that, what happens is you get this kind of exponential blow-up in the size of the program. And this is nothing new, folks. Again, computer scientists should know this. I'll give you a perfect example. In your computer chip today, your CPU, whatever it is, I know you're always bleeding-edge Mac so it's probably an M3 or something, you probably have special access to it, but in that chip, take something like multiplication of numbers. Well, you can build a very simple circuit that can sit there and read and write to memory and do the multiplication, but it'll take a while, some number of steps. Or instead you can blow up the size of the circuit, make it quadratically larger or exponentially larger, and do the multiplication in maybe even one step. And we do these kinds of trade-offs. We do them because something like multiplication you care a lot about, and so you build a chip that has a very large
circuit, and you solve for what's called timing closure, so that you can still calculate the multiplication in a very small number of passes through the chip, maybe even one if you can get there fast enough, in a certain amount of time. But it requires much, much larger circuits. And especially the types of complex reasoning that we really think about when we talk about reasoning, solving difficult scientific problems or brain teasers or complex logic problems or whatever, usually fall into the category of things that require this type of iterative, Turing-style computation, just because things like logic are Turing complete, and the lambda calculus, et cetera. So you end up needing Turing machines. But that doesn't mean all useful things need that. So I have a lot of sympathy for somebody who argues there's value in finite but structured computations, things that are structured in a way that's very similar to the problem. Imagine we were doing some type of three-dimensional problem solving; it might be cool if we had memory architectures which were actually three-dimensionally wired. Yeah, cool, I'm sure all that kind of stuff is very cool.

Let's just start with your definition of reasoning. What is it, for you?

Well, before I give that: I understand the word is used to mean lots of different things. But if you literally just go and look in the dictionary or the wiki or whatever, you're going to see that almost all of those common-sense definitions have two components in common. One is that they all refer to a process, a process, okay, so we have to think about a kind of iteration that unfolds here, a process. And the other is that it's a process that applies logic, rational sets of rules. So to me, the most general process that applies logic is an effective computation. In other words, it's the computations that Turing machines can perform; another name for all the possible computations that Turing machines can do is effective computation. I can't really tell you why it's called effective, that's just the name for it. So to me, reasoning is an effective computation in pursuit of a goal or the inference of knowledge.

Okay, but many of the computations that neural networks do are of course a subclass of the types of computation you can do on a Turing machine. So they are still effective computations, and they are in many cases still in pursuit of deriving knowledge or achieving a goal. So why is that not reasoning?

Well, I can't disagree with that, and I would say these people are kind of arguing technical points but missing the forest for the trees. It's like, sure, the soda vending machine at the gym, and you probably shouldn't be buying soda if you're at the gym, but the drink vending machine at the gym is performing a subclass of effective computations, namely it's a finite state automaton, with the goal of collecting money and dispensing a drink. Is that reasoning? I mean, this is the problem, right? Because what we don't want is an anything-goes definition of reasoning.
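As a concrete version of that point, here is a minimal sketch, purely our own illustration, of the gym drink machine as a finite state automaton: a fixed set of states and a fixed transition table. It is an effective computation with a goal in the thin sense just described, which is exactly why a definition that stops there lets too much count as reasoning.

```python
# Our own illustration: the drink machine as a finite state automaton. A fixed, finite
# set of states and a fixed transition table, pursuing the "goal" of collecting money
# and dispensing a drink. Effective computation with a goal, but a very thin kind.

TRANSITIONS = {
    ("idle", "coin"): "credit_25",
    ("credit_25", "coin"): "credit_50",
    ("credit_50", "coin"): "credit_75",
    ("credit_75", "coin"): "credit_100",
    ("credit_100", "button"): "dispense",
    ("dispense", "taken"): "idle",
}

def vend(events):
    state = "idle"
    for event in events:
        state = TRANSITIONS.get((state, event), state)  # unknown events are ignored
    return state

print(vend(["coin", "coin", "coin", "coin", "button"]))  # -> "dispense"
```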
Right, right. So I said: reasoning is knowledge acquisition. The new OpenAI models don't reason, they simply memorize reasoning trajectories gifted from humans. Now is the best time to spot this, as over time it'll become more indistinguishable as the gap shrinks, which is to say the knowledge gap. For example, a clever human might know that a particular mathematical problem requires the use of symmetry to solve. The OpenAI model might not know this yet, because it hasn't seen it before. In that situation, when a human hints the model and tells it the answer, its chain-of-thought model will be updated, and the next time, in a similar situation, it will know what strategy to take. And this will rinse and repeat as they sponge reasoning data from users, until many of the holes in the Swiss cheese are filled up. But at the end of the day this isn't reasoning. It's still cool, though.

Now, this is the interesting thing, because a lot of reasoning is just kind of knowing what to do. I know you're saying it's an effective computation. Some of the people who responded to this said, oh, Tim, what are you talking about, reasoning is just mapping an input to an output in data space. But Walid Saba would talk about this hierarchy between data, information, and knowledge. There's a relationship between knowledge and reasoning. Reasoning is all about: this is what I know about the world, and I need to perform a computation, and in order to do so I need to basically make a new model, an effective model, to do the thing I need to do. So I'm going to be composing those models together, I've created a new model, and then I perform this effective computation. That's what reasoning is. Because there are many problems that are quite binary: if there's A, B, and C, and in order to understand C I need to understand A and B, I need to know about A and B, right? If I don't know about A and B, I can't do C. So in the absence of A and B, I can't actually solve that problem.

I agree, and so I would say that the drink machine is not performing reasoning, or, if we really have to be technical about it, it's performing reasoning at level
zero-point-something-or-other. And then an effective computation that sits there and does, let's say, some geometry calculations and some world modelling and some image recognition and some other Turing-complete computations that may not halt and whatever, that's doing reasoning at level 0.8-something-or-other. We could have a whole scale here if we needed to, right? But if we're just going to break things down into a binary: the soda machine isn't reasoning, and neither is a dictionary. If I go and look up my problem in a massive dictionary and it has the answer and I spit it out, that's not reasoning. It's not performing the computation that led from the problem to the result; it's just doing a lookup.

And so now we switch from, say, the DeepMind paper we were initially talking about to o1, right? And I think it's important to emphasize what you said about what o1 is doing and how it was trained. When they trained it, they took a whole bunch of problems and ran a bunch of passes in very creative mode, temperature one or whatever, where it could come up with all kinds of fantasies about justifications for an answer, and then they selected those which had the correct answer. So we've got all these trajectories of reasoning, and by the way, those were all, by some definition, reasoning; lots of them were complete nonsense and led to the wrong answer. So then they selected those which gave the right answers and used those to retrain the model, so it was more likely to give patterns of, quote, reasoning that gave right answers. And so, like you said, it's sort of building up this dictionary of specific rationales, let's call them rationales. It's building up a massive database of rationales: here was a problem, here was my rationale, it gave the right answer. And then when you give it a new problem, it does this context-sensitive hashing and matches the rationale that's closest to it, and then sort of fills in the blanks like a Mad Lib, let me just stick all the stuff in there, and then hopefully it works. Well, sometimes it works, sometimes it gives complete nonsense. And is that reasoning? Not in my opinion. This is not what we're talking about when we talk about reasoning. We're talking more about the application of a set of first principles to some inputs, in a process of applying logic to derive the answer.

Yeah. And to be a bit more specific, just in case folks haven't played around with this yet: it's doing the generation of chain-of-thought templates. For those that don't know about chain of thought, it's a type of in-context learning. A lot of models sometimes do better, or generalize better, if you break down a problem, and the original version of this I think was called scratchpad. It might be addition or multiplication, and you say: you need to do this, and then you need to do this, you need to think about this. So it's decomposing a problem into a rationale that could be applied in many future situations. You say to the model, think out loud, or apply this thinking protocol, and then the model works better. But the thing is, with prompt engineering and chain of thought, the human supervisor always had to put this into the model.
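Going back to the training recipe described a moment ago, sample chains of thought at high temperature, keep only the ones that land on the right answer, and retrain on those, here is a minimal sketch of that loop. It is our reading of the general idea, not OpenAI's actual pipeline, and the callables it takes are hypothetical placeholders rather than real API calls.

```python
# Sketch of the "generate, filter, retrain" loop described above; our reading of the
# general idea, not OpenAI's actual pipeline. The callables are placeholders you would
# supply: sample_cot(problem) draws one chain of thought at high temperature,
# final_answer(cot) parses its answer, fine_tune(model, data) updates the model.

def collect_rationales(sample_cot, final_answer, problems, samples_per_problem=64):
    kept = []
    for problem, known_answer in problems:
        for _ in range(samples_per_problem):
            cot = sample_cot(problem)            # high temperature: many "fantasies"
            if final_answer(cot) == known_answer:
                # Keep trajectories that landed on the right answer, sound or not.
                kept.append((problem, cot))
    return kept

def bootstrap(model, sample_cot, final_answer, fine_tune, problems, rounds=3):
    for _ in range(rounds):
        data = collect_rationales(sample_cot, final_answer, problems)
        model = fine_tune(model, data)   # make the rationales that worked more likely
    return model
```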
So what they did was come up with quite a clever idea. They said, well, why don't we get the model to prompt itself with chain of thought? So what they've done, and I'm just guessing at their architecture here, I think it's really simple: you have the base model, which is just trained to match the statistical distribution of text, basically an autocomplete, and then they do this next thing called RLHF, where they shape the distribution by supervising it with examples of conversation trajectories that are good, saying this is good, this is bad, so, you know, go here, don't go there. I think all they've done is the same thing with chain-of-thought trajectories. Now, of course, the question is where they got the chain-of-thought trajectories. I'm guessing they've done a bunch of synthesized data, and they've hired a whole load of people to reflect on what these cognitive templates are, they've done RLHF, and the interesting thing is this is all in one model. Now, it gets a bit more complicated than that, because the model has a mode. A lot of people, and my original intuition, thought that there was more than one model, that there was an adversarial chain-of-thought model which was injecting prompts and doing this big search. I don't think it's that. It's really simple. It's very similar to how RAG models go into a different mode: when they detect that this is a RAG-type query, they go into a different mode and start self-prompting themselves in a very specific way which increases their faithfulness. It's a similar thing here. It goes into a thinking mode, it starts generating these chains of thought, and they hide the chain of thought, but it's all going into the shared context for this one model. And the UI is quite interesting, because it says: I'm thinking about this, I'm thinking about this, I'm checking the policies, I'm checking this, I'm refactoring the answer, I'm doing this, and then it just gives you the result. I think that's how it works. It's really simple.

Yeah, well, we don't know, because OpenAI isn't exactly that open about what they're doing behind the scenes and how things function. It's almost like there's a missing symbol in front of the company name, the not symbol, the tilde or an exclamation point or something: not-OpenAI. So I don't know. I'm going by the blogs and whatever else they've published on the site, and I think it's something along those lines. And it's kind of more of the same. You and I have talked about this so much on the show, this idea that really neural networks are doing this kind of interpolation, these sort of locality-sensitive lookup tables. They've just gotten so massive now, they have these massive tables of lookups that they can do, and then at runtime you're allowing it to do that a few hundred or a few thousand times over some period of time and then selecting the best one. It's like what you were doing a lot early on, when you and I were using GPT-4 for programming, where you would ask it to generate something, you would give it back to it and say, this is pretty buggy, see if
you can fix it. You kind of iterate with it, really just in a loop, and then after a certain number of iterations it would kind of improve itself. So I don't know why people get so excited about this kind of stuff. In the long term, folks, this is not the path to AI.

Okay, sure, but a couple of things on that. First of all, you were talking about the supervision. So when you talk with a language model, you know, you're generating some code, the problem is actually quite ambiguous and poorly specified, because in code...

Can I just interject really quickly there and say that that can be self-supervision? In other words, you can set up a system where you just ask a large language model for some code, and you've decided ahead of time you're going to feed it back to itself five times, each time telling it it's buggy, try to improve it. So there doesn't necessarily have to be any human supervision; it can be a kind of iterative, self-supervised, finite thing, right?

It can, but my point is that that never works well. The reason why language models work so well for generating code is that it is a didactic exchange of knowledge discovery. So I tell it to do something, usually it gets it wrong, and I say, no, I actually wanted this, and then it might come back with something interesting, and I'll say, no, you just accidentally deleted a load of my code, but I quite like this thing, why don't you keep that thing but go back to that other thing. The biggest problem with generative AI coding, by the way, is that it constantly violates the golden rule, which is: if it ain't broke, don't fix it. It always wants to delete loads of stuff that it thinks isn't important, because it doesn't know all the assumptions and the reason why it was there in the first place, and then it doesn't do the thing you want. So anyway, the reason why having this tight supervision process is important is because the model will diverge. So having a reasoning system that actually goes many, many steps, and what about this, and what about this, and what about this, is actually not that useful, because it's going to be wasting loads of cycles, which you're paying for, by the way, doing things that you didn't even want it to do in the first place.

I can't go that far. I agree with you that it's very inefficient, I agree with you that it's very inefficient. But I think if we really knew how to train neural networks so that we could train them knowing that they're going to run for some unknown, and it has to be unknown, some unbounded amount of time, kind of looking for a stopping condition, once we get to that point I think the training process can grind away some of that inefficiency. I think what's happening right now, though, as far as I can tell, and again OpenAI is not that open, is that when people are using these things, the stochastic gradient optimization is done, it's over with, and then they take that thing and try to hack it to do this iterative work in a variety of kind of hacked ways, probably because if you actually tried to train it to do this inference-time iteration, to train it under the conditions it would face at inference time, it becomes too difficult to train. So we're stuck in this situation, and that's probably a lot of the blame for this very real inefficiency that you're talking about, I think.
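For concreteness, here is a minimal sketch of the purely self-supervised loop described above: ask for code, then feed it back a fixed number of times with a generic "this is buggy, improve it" instruction and no human in the loop. It is our own illustration; generate() stands in for whatever chat-completion call you would actually use, and, as discussed, the loop tends to drift without a human steering it.

```python
# Sketch of the self-supervised refinement loop described above (our own illustration).
# generate(prompt) is a hypothetical placeholder for whatever chat-completion call you
# would use; the point is the shape of the loop, not any particular API.

def self_refine(generate, task: str, rounds: int = 5) -> str:
    code = generate(f"Write code for the following task:\n{task}")
    for _ in range(rounds):
        # No human feedback: the same generic critique goes back in every time,
        # which is why this tends to drift or delete working code.
        code = generate(
            f"Task:\n{task}\n\nCurrent code:\n{code}\n\n"
            "This code is buggy. Fix the bugs and improve it. Return only the code."
        )
    return code
```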
so that’s probably a lot of the blame for the this this very real inefficiency that you’re talking about I think right I mean another another interesting thing is that because it’s all in one model the the the models actually really struggle in this autonomous setting and they really struggle with complexity and depth so if you took Claude 3.5 and you told it to be like you know 01 preview right you said I want you to generate all of this chain thought and I want you to uh it needs the tree needs to be depth 10 and I want you to just keep going and keep going it actually will it won’t do it and part of that is because these earlier versions of the models were trained to be concise right that they’ve had this rhf done to them they don’t want to be theose and they don’t want to do it but even if they were theose they would actually kind of collapse under their own weight because of the complexity of this of this tree um going up well I’m going to so I’m going to have to take your word for that cuz I don’t I don’t know that that’s the case but I would not doubt that in the slightest and and believe it or not like this may sound kind of nutty possibly I’m not sure but to me this this relates to the points that Steven Wolfram makes right which in in his in his let’s say uh Universe of thought right um you have computations that that really are they’re either in the category of they they sort of do nothing they’re just like super simple ordered they draw like a a straight line okay or they do they do total just noise they just generate you know noise or they’re on this kind of like chaotic chaotic boundary right where they they do interesting things the the kinds of computation that do interesting things and those interesting things um Can can exhibit like you know irreducible behavior and and the most kind of complex behavior that that we’re really interested in and I think that stuff like general intelligence and you know the hardest problems they sit in these in these very chaotic kinds of zones and so as you start to take a computational machine and you push it closer and closer to that greatest of capability it just yeah it becomes like a nightmare to train and operate and it’s sitting on this boundary where it can easily fall off into either total nonsense or Flatline right like it’s almost like you’ve got you you’ve configured claw to fly too close to the sun and it got too close to to what’s required for general intelligence and then it just it just flatlines and it starts you know repeating to itself like I’m bored I’m bored I’m bored I’m bored I’m bored over and over again or just outputs total gibberish like you know this the strength of the wifi signal indicates that he was farting throughout the entire dinner and you’re like what you know like that’s that’s probably what’s happening yes well that that leads me onto onto the next thing which is that um there is an interesting kind of mode collapse so because they have and I’m I’m assuming that this is what they’ve done they have got a combined model which is both a Chain of Thought reflective prompter and a standard language model um it gets kind of stuck between the two modes and the modes interfere with each other now of course if I was designing this thing I would have had um you know a separate Chain of Thought Chain of Thought reflector and I would have had them kind of interact with each other in an adversarial way maybe even in a multi-agent way they probably do that during training but not during inference but anyway my point 
is that o1-preview and o1-mini are very good from the first shot. You give them a problem, you don't need to specify it very much, and o1-mini in particular, for coding, is incredibly verbose. It'll generate a whole bunch of CoT trajectories and give you loads and loads of code. I think it's been designed for coding, because it just gives you an insane amount of code. And then you refine it, and you circle back, well, actually I want to move on to this other adjacent problem now, and it basically mode collapses. It's completely useless, well, not completely useless, but useless in many cases, and it's because in its context there's all of this CoT data, this chain-of-thought data, and that actually interferes with the new thing you want to do. Now, with something like Claude 3.5 Sonnet, that doesn't happen. It's got a 200k context, you can just continue on to a new problem and a new problem, and if anything it helps you having all of this stuff in context, because it's all contextual information: it knows what the assumptions were, it knows why you did things, and it just continues on, and it's incredibly reliable. With the new GPT models, you actually need to basically start a new session. In many cases as well, if you're using something like Cursor, it's got an apply model, so it runs another model just to apply your edits to the code, and that's currently broken, well, I say broken, it doesn't work very well on the new models. So what I found myself doing is, I start with o1-preview, it will say something interesting, and then I'll switch over to Claude and go on from there. It's not possible to continue. In the real world we are dealing with ambiguity all the time, and we do a lot of creative thinking and so on, and it's like a symphony, it's a dance, right? You're dancing with a language model, you're sharing information, you're updating your models, and you need to be supervising this thing every single step of the way. How many problems are there, really, where you can just say to a model: look, go and do this thing for me, here's some software, these are the requirements, it needs to do X, Y, and Z, just go away and create it? There's an infinite number of possible software programs that you could write that fit this set of requirements. How on Earth is it going to write the correct program? It's not.

Yeah, well, and then you start to ask, okay, somebody's going to argue, no, it is correct, because it met all the requirements, and you're like, okay, sure, but then as soon as we added this other pretty reasonable use case it just totally broke down and has to be recoded, because it's not following any kind of general principles. A lot of this is sort of a philosophical debate. If you're trying to walk across town and get from the bar where you were just drinking too much back home, and 10,000 drunks perform a drunkard's random walk, some of them will make it home and most of them won't. Would you say that the ones who made it home were reasoning their path home? No. Just by chance, some of them made it there. So any time you've set up a problem wherein it's very easy to check whether you have the right solution, of course you can randomly generate solutions, and if you generate enough of them and check each one, eventually you'll find a right one.
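As a toy version of that drunkard's-walk point, here is a minimal sketch, our own illustration, of generate-and-check: a cheap verifier plus enough random proposals will eventually "solve" any easily checkable problem, and a trained sampler just skews the proposal distribution so fewer samples are needed. Whether either of those counts as reasoning is exactly the question under discussion.

```python
import random

# Our own toy illustration of generate-and-check. A cheap verifier plus enough random
# proposals eventually "solves" any easily checkable problem; a trained model just skews
# the proposal distribution so far fewer samples are needed. Neither the blind sampler
# nor the skewed one is obviously "reasoning", which is the point being made above.

def verify(candidate, target):
    return candidate == target                   # easy to check, hard to guess

def solve_by_sampling(propose, target, budget=1_000_000):
    for n in range(1, budget + 1):
        if verify(propose(), target):
            return n                             # number of samples it took
    return None

target = 27_182                                  # the "right answer" we can recognise

blind = lambda: random.randrange(100_000)                      # drunkard's walk
skewed = lambda: random.randrange(target - 50, target + 50)    # "trained" sampler

print(solve_by_sampling(blind, target))   # typically on the order of 100,000 tries
print(solve_by_sampling(skewed, target))  # typically on the order of 100 tries
```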
you can check if it’s the right one eventually you’ll find a right one like that’s that’s just not what we’re talking about when people are talking about you know reasoning and so then neural networks take you a step kind of closer right which is because well we can do better than a random walk we can do a whole bunch of training and so then the kind of sampling the generation is much better than random you know it’s it’s heavily skewed towards a population where one out of 10,000 is the correct solution instead of one out of 100 million you know and then is it reasoning I mean these are these are some of the things people have to kind of at the end of the day it’s this it can be an argument over semantics but it can also have extremely important practical um effects on how you design a system right okay like you can choose very good very good but but this this gets to to the next point which is that people say well humans don’t reason either and and I’m I’m amendable to this view right because I I I think that all intelligence is collective now of course our brain is a collective our bodies are a collective you know we’re made up of all of these little mitochondria I was listening to Richard Dawkins on Sam Harris’s podcast the other day apparently they they they come from bacteria mitochondria is basically a bacteria that’s one Theory Yeah Yeah well yeah okay but you know we we’re we’re just made up of of all of these little you know autonomous things that have their own agendas but you know the the bacterium actually want to pass their their genetic material onto future Generations through the female line actually not the male line Dawkins were say but um but but anyway you know it also happens outside of the brain and the body so we are just like Ryan greenblatt’s 3,000 uh you know computations the way that we discover knowledge and we do reasoning a lot of it is just Serendipity and chance and because we have this amazing power of memetic information sharing try not to use big words here basically we discover something interesting we share it with the collective and that information survives after I die so as a collective we are accumulating more and and more knowledge and it survives after we die it’s a beautiful thing so you could say well Tim you’re not really reasoning you know you’re just a you’re just an automaton you’re just you’re just using reasoning bits from other people and this is true because since I met you Keith duger I’m smarter right you’ve taught me how to reason in many many ways you you’ve taught me lots of heuristics of thinking that I use every single day um does you know because because I’ve memorized those things quite you know this abstract knowledge does that make does that make me reason less well so in in my opinion there’s a uh well thanks for saying that I mean if if that if that’s true like cool like you know you’ve taught me a lot of things too I mean I my life is radically different from having met you and participated in in in MST so thank you for that I’ve thanked you many times you know personally but uh so here it is again um I think there’s uh there’s an error there which is there’s an assumption that if something is following a deterministic process it can’t be reasoning and I don’t agree with that so I I don’t have any problem with the idea that um that a robot can one day reason okay that for that matter like I don’t have any problem that a computer program like just you know can can perform reasoning I don’t have any problems with this idea it 
can be 100 percent deterministic, it can be a computer, it can perform reasoning in silico. I also don't have a problem whatsoever, as you know, with the idea that I'm a machine. I'm a biomachine of absurd complexity, which has nothing to do with me; it's our genetic endowment from an insane history of life, almost incomprehensible. I'm a machine, a biomachine. I consist of wetware and software and whatever else, but I can still perform reasoning. So I'm not saying machines can't reason. What I'm saying is that the types of neural networks we can train today are not doing the kind of reasoning, the effective-computation kind of reasoning, that we need for general intelligence, because in general, problems require Turing machines. They cannot be solved by just finite state automata, or FSAs with another finite amount of compute or whatever. So I don't see the conflict. It's like, yeah, I have a genetic endowment and an endowment of teaching, I was taught those things that you mentioned, the heuristics that we perform during this activity. And come on, people, you all know that you're reasoning. If you're watching the show, you're probably at least the kind of person who spends some of your life thinking deeply over time about a problem, applying rules and heuristics. That's reasoning. That's what we're talking about: this unfolding process.

I mean, the Turing argument to one side, the neural networks are still performing an effective computation, and here's the kicker, a subclass of effective computations. You said before that the reductio ad absurdum is: we memorize everything, and that's not reasoning. And then on the other end of the spectrum we do pure reasoning: we build something that can solve the ARC challenge, that from a base set of knowledge can automagically compose together, through meta-learning or knowledge acquisition, a new model to solve every single problem efficiently, because Chollet says that the efficiency of the reasoning process is intelligence. And then we've got all of this gray area between the two poles. The gray area is: now I've got these reasoning trajectories, and I do RLHF on my language model, and the language model has intuition, creative intuition, oh, this is an interesting problem, maybe I should use this reasoning motif, maybe I should use that reasoning motif, and it's internally doing a tree search, evaluating all of these different paths, and it's finding those motifs through something akin to a locality-sensitive hashing table lookup, pulling these things from its memory. And if it works, why is that not reasoning? How is that any different?

Well, again, maybe we need a level, a level of reasoning. It's performing a very shallow form, right? It's doing this kind of lookup in its library of templates and applying the closest matches, and then selecting the winner, the one that scores the best. So it has a very, very shallow depth, let's say, in computation time, and it's very wide. Maybe one way to think about this that may be a little bit better is the good old space-time trade-off, like I talked about earlier with circuitry. So you have a computational problem
and you’re trying to solve it well you can solve it with a small circuit that runs over some let’s say linear amount of time okay or you can solve it with an exponentially large circuit that that solves it in one step right now what I’m saying is um we call the former reasoning so we when it’s something that happens over some like linear amount of amount of time let’s say or kind of like unfolds over time we call that reasoning whereas we don’t typically call the exponentially blown out circuit that does it in one step you know reasoning okay why like why is that fair like why how come why can we do that how come I can call one reasoning and and not the other um the reason why okay is that if you go the route of building exponentially large circuits that can do things in one step the problem is that you have to know ahead of time the the range of the inputs right because I have to build the circuit large enough that it can handle all possible inputs that I will see in the future and in general you don’t know that like I don’t know that okay whereas if you go the route of building the reasoning circuits the things that can perform this this computation over time we do have the luxury of things taking longer than we expected like time as it happens is potentially infinite it so far it keeps unfolding you know we always get another minute another minute another minute and that’s the biggest difference right is either you have to know ahead of time all the input sizes you’re ever going to see um or if you come across an input that oh crap it’s too big well let’s go back and retrain it let’s create gp76 you know cuz we we hit an input that was too big now we got to spend um you know uh5 trillion to um retrain it and then we can go solve this problem well instead if you’d have just spent $100 million building a machine that could you know do iteration for some unknown amount of time you wouldn’t need to go back and do that again you just would maybe go buy some more tapes so it has like a very practical important difference right yeah but I mean something you were saying about that spectrum between you know the the massive blowup Space versus the efficient effective computation this leads to to to my kind of knowledge acquisition definition because I think there is a relationship between knowledge and and reasoning and a lot of it is about the Fidelity of the models yeah the the Fidelity the Fidelity of the models that that you build so I mean obviously if you look on the Stanford encyclopedia philosophy it would say that knowledge is a a Justified true belief but in in this case a model is just is just a high fidelity representation of the world that allows you to perform powerful inferences about the world because the model is correct in in some sense now this model could be a symbolic model it could be a basian model you know we’re not being kind of picity about that but if you actually compose the models correctly and you have uh you know um an efficient accurate inference about the state of the world then you’ve done reasoning well so there definitely was um reasoning that went into the training of 01 right I mean so so on the one hand like yeah there’s there was a reasoning process that built these lookup tables and you know the you know um like useful models and accurate you know High Fidelity models and templates and all that kind of stuff so reasoning went into training it right we’re really just just talking about what’s happening at at runtime right oh yeah but I I think you could 
Yeah, but something you were saying about that spectrum between the massive blow-up in space and the efficient effective computation leads to my kind of knowledge-acquisition definition, because I think there is a relationship between knowledge and reasoning, and a lot of it is about the fidelity of the models.

Yeah, the fidelity of the models that you build.

So, obviously, if you look in the Stanford Encyclopedia of Philosophy, it would say that knowledge is a justified true belief, but in this case a model is just a high-fidelity representation of the world that allows you to perform powerful inferences about the world, because the model is correct in some sense. Now, this model could be a symbolic model, it could be a Bayesian model, we're not being picky about that, but if you actually compose the models correctly and you have an efficient, accurate inference about the state of the world, then you've done reasoning.

Well, so there definitely was reasoning that went into the training of o1, right? On the one hand, yeah, there was a reasoning process that built these lookup tables and, you know, useful models, accurate, high-fidelity models and templates and all that kind of stuff. So reasoning went into training it. We're really just talking about what's happening at runtime, right?

Oh yeah, but I think you could make that argument much more strongly about GPT-4o, which doesn't have this thing. So let's give an example: the Walid example. Walid Saba, of course, is a rationalist, and rationalists tend to believe that there are universal rules about how the world works, and they tend to be binary. So the examples that Walid gives tend to be something along the lines of: the pen is in the cup, I move the cup over here, so the pen is over here. This is a deductive closure. What I mean by that is that we have a whole bunch of models, contains and located-at, and we know that if the container contains the thing, and the container is located at that place, then the thing is located over there. So we've performed a deductive closure: what we've done is compose together all of these deductive rules in a trajectory, a bit like this chain-of-thought thing, and we've derived new knowledge about the world. Now, I think the new OpenAI model can do that. It's going to do a trajectory, and, of course, the validation step is a prediction, so it's not going to be knowledge in the domain of certainty, but it can do some kind of inference to generate knowledge based on the user input. I believe it can do that, right?
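As a concrete version of that "pen in the cup" example, here is a minimal sketch, our own illustration, of a forward-chaining deductive closure: a couple of facts, one composition rule, and the new location knowledge that falls out of iterating the rule to a fixed point.

```python
# Our own illustration of the "deductive closure" idea above: start from a few facts
# about containment and location, apply one composition rule until nothing new can be
# derived, and the pen's location follows. Each derived fact is new knowledge obtained
# by composing models, which is the sense of reasoning being discussed.

facts = {
    ("contains", "cup", "pen"),
    ("located_at", "cup", "kitchen"),
}

def closure(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        derived = {
            ("located_at", inner, place)
            for (r1, outer, inner) in facts if r1 == "contains"
            for (r2, thing, place) in facts if r2 == "located_at" and thing == outer
        }
        if not derived <= facts:
            facts |= derived
            changed = True
    return facts

print(("located_at", "pen", "kitchen") in closure(facts))  # True: derived, not stated
```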
machine it’s your gymnasium okay like you put in a quarter it enters the state that I believe one quarter has been inserted Ed you know you put in another one it enters the state I believe two quarters have been inserted you know once you get to a dollar or whatever you press a button the user is indicating they want a uh a a vitamin drink okay like therefore that’s my belief I perform this action right so obviously 01 is far more sophisticated I don’t mean to actually compare it to a drink machine right but I mean I think it’s fair to call this this set of steps and and the knowledge under the kind of modified definition that we gave it um reasoning like I’m I’m fine with that as long as people understand the core point that that is a shadow a shadow of the kind of reasoning that your machine up here can do because your machine up here can do this process for a very very long time okay you can be Andrew WS in your addict for however many years it took him to Sol fir mats last theorum and nobody said up front you’re going to be up in the attic for only 3 months like it turned out it was a lot longer than that right okay so I agree with your with your point so denovo reasoning we are a collective intelligence we have this iterative capability right so there are billions of us on the planet and we are doing this epistemic foraging and we are mically sharing it when we find things but did we did we learn that phrase from from friston from Carl friston was who was it that said epistemic foraging I know he loved that car that phras that was was it him that said epis forging it’s beautiful right it’s beautiful it’s beautiful and and as we’ve discussed of Kenneth Stanley it’s a very creative serendipitous process so we’re this collective intelligence but the thing is you know Ai and humans we there’s a symbiosis between them so every time we learn something right we we write about it we tweet about it and it gets ingested into open Ai and then open AI knows it because this is the thing right right now is as I as I tweeted it’s the best time to spot knowledge gaps in in the new gbt model because it doesn’t know stuff they’ve only bootstrapped it with a bunch of synthetic data the real bootstrapping process is now so they are going to reflect Chain of Thought templates from the users and over time it’s going to learn more and more and more and more and more and of course it’s not going to reason creatively because it doesn’t think the way that that we do but the million doll question is there’s a kind of convex hole around colloquial you know practical reasoning so it will be able to solve you know in the reduced set of effective computations that you talking about it will be able to solve many novel practical problems that are out of distribution and you could say that it is reasoning so the million dooll question is how far does that convex hole get you I think I think there’s two there’s two questions that which is how far how far does it get you if we think let’s think about it this way just to visualize it let’s think about how far it gets us is this expanding ball of you know problem space like how far can it get us but my other question would be how much of it is Swiss cheese okay like like sure the boundary May the boundary of the ball May expand to include like oh you know let’s let’s let’s give the Fanboys you know uh the the uh benefit of the doubt everything we care about you know it’s going to um solve like medical problems and science problems and you know political problems and like 
all kinds of things, the boundary includes everything we care about. But there are going to be tons and tons and tons of holes in there, because OpenAI, or whoever-AI, just didn't quite run into that situation yet, and so they didn't patch the fact that it's missing a template that covers that one. We have to remember these templates are so fine-grained and specific. Think back to our episode where you and I learned from Bisri that it's creating this patchwork honeycomb; you have to think of it as this massive honeycomb of all these little cells where it's learned to fit everything. And lots of situations are going to arise where, when it gets pinpointed into a particular cell, it's broken. So that's the disadvantage of reasoning that is very wide, spatially, in the computational sense: it has very, very many templates, but they're all really shallow and dumb, and it just tries to find the right one and applies it. That's the disadvantage, versus a small set of highly reliable first principles that, when iterated, give you an answer. One is much more parsimonious, right? What humans develop is a knowledge corpus that's highly parsimonious. It contains a small set, I mean small versus a neural network; there's obviously lots of detail in physics and biology and science and whatever, but what science does is find parsimonious principles that, when applied in sequence, in iteration, in combination, reason out the answer. And there's a sense in which that's much more reliable, and it doesn't come as a giant ball of Swiss cheese with all kinds of holes in it that we don't even know are there. We don't know a hole is there until something blows up, and then we figure out there's a hole, because nobody bothered to look.

It's kind of interesting. There's that Moravec's paradox, which is something like, I think I've got this right: easy problems are hard and hard problems are easy. So, ironically, coding challenges and some of the things that PhD students work on might be easier to solve than a lot of common-sense reasoning, where you're dealing with ambiguity. By the way, this is another reason why, have you seen on Twitter, many people say, oh, is Claude 3 dumb today? Anthropic have dialed down the complexity of the model today, it seems really stupid today. It's like, no, I've got a news flash for you: it's stupid because you're tired. It's because you haven't had your coffee, it's getting late at night, you've switched off, you're not reading what Claude is saying back to you, you're on autopilot, you're just going generate, generate, generate, generate, you have no idea; basically your resonance and your understanding with the model has diverged, and that's why you think it's stupid. So because of this didactic exchange with the language models, imagine if you had ten of them running in unison: you would no longer be able to understand or follow what was going on.

They are running out of money, because they are serving all of this model infrastructure and hemorrhaging money all over the place, and what they've done is quite genius. They've trained these very small models, and now the customer is paying for the inference. So
they're scaling inference, but what they're also doing is scaling the amount of money that they make, and they're effectively turning a capex into an opex. Rather than needing to train these massive models and pay for them whenever the customers use them, they can now scale all of this compute at inference time, and not only are they transferring the cost of using the models onto the customer, they are basically harvesting the reasoning motifs from the customers. So it seems like a win-win situation for them.

So, yeah, you and I thought it would be fun to test it a bit. The claim was that it now reasons, and so I was sitting here thinking, what's a good test of reasoning? There are basically already all kinds of tests: there's the ARC challenge, there's whatever that other one's called, the simple reasoning test, there are all sorts of them. But I just thought a fun one to try out would be one of my favorite brain teasers. So I'll read it; it's a fun problem. I've written it out a bit more verbosely than I normally would if we were just telling a person, to try and help o1 out a bit. So this is a fun brain teaser, here it is.

There is a pillar with four holes precisely aligned at the north, south, east, and west positions. The holes are optically shielded, no light comes in or out, so you can't see inside, but you can reach inside at most two holes at once and feel a switch inside. But as soon as you remove your hands, if all four switches are not either all up or all down, the pillar spins at ultra-high velocity, ending in a random axis-aligned orientation. You cannot track the motion, so you don't know in which rotation the holes end up versus their position before the spin. Inside each hole is a switch; the switch is either up or down, and it starts in an unknown state. When you reach into at most two holes, you can feel the current switch positions and change each to either up or down before removing your hands. Come up with a procedure, a sequence of reaching into one or two holes with optional switch manipulation, that is guaranteed to get all the switches either all up or all down in at most six steps. Note: the pillar is controlled by a hyperintelligence that can predict which holes you will reach into. Therefore the procedure cannot rely on random chance, as the hyperintelligence will outwit attempts to rely on chance. It must be a sequence of steps that is deterministically guaranteed to orient the switches all up or all down in no more than six steps.

That's fun, right? Okay, well, let's run it. It's thinking; let's see how it does. And this is another interesting thing: many times these models don't even think, and the amount that they do think seems quite random, and when they don't think, you end up with the base model, which is even worse than GPT-4o. The whole thinking thing is hidden from you, the chain of thought, even though you're paying for it and it's inside the context window of this session. You don't get to see any of it; you only get to see these little motifs. And by the way, this UX is really important, because when I'm using Cursor and you don't see these little thinking steps flashing up, it feels like you're waiting for a long time. So the genius of this is the UX they've done. You can expand it out and it tells you
So it's saying: understanding the task, laying out the obstacles, evaluating the switch alignment, restating the problem, mapping the plan, formulating a strategy, weighing symmetric patterns, mapping out methods, mapping out possibilities, mapping... I mean, you know, I actually wonder whether this has basically got nothing to do with what it's actually doing behind the scenes. I don't know. Anyway, it's come up with an answer in 57 seconds, so let's see what it comes up with here.

All right. Step one: reach into the holes at positions north and east; action: switch both switches up. Okay. Reach into north and south, switch both switches up. I can tell it's wrong already, because every single action is just switching up, up, up, up, right? So this doesn't work. We can maybe post the full answer on the YouTube page or whatever, but I'll just read it out. Step one: reach into north and east, switch them up. Step two: reach into north and south, switch them up. Step three: reach into north and west, switch up. Step four: east and south, switch up. Step five: east and west, switch up. Step six: south and west, switch up. Okay, "why this works" — let's see why it thinks this thing, which won't work, works. "By systematically manipulating all six possible pairs we ensure that every switch is set to up multiple times, specifically three times. Independence from pillar rotation: since the pillar's rotation only changes the position of the holes but not the switches themselves..." What's that? Let's see. "Covering all the pairs guarantees that regardless of rotation every switch will be set up."

Okay, so it seems to think that somehow the switches and the holes are not part of each other, right? So maybe we need to explain it to it; you need to say, no, you don't understand, each switch is in a specific hole and it rotates with the hole. Now, we did this before too, and it didn't have that misunderstanding last time. Last time it sort of understood that the holes and the switches were related; it was just very confused about the fact that it had to be a guaranteed solution. It kept thinking it could rely on chance, right, like: if I keep switching them up, I'm eventually going to hit them all. And so we modified the prompt to say do it within six steps, which is still more than you need, and it's still not getting it.

And that was the purpose of your hyperintelligence constraint here, which is to say: don't try to think about this probabilistically, you should be able to deduce a precise solution.

Well, I could even get rid of that now, because it still didn't get it, and so we had to cut down the number of steps. Normally when you tell this to a human, they understand the idea that it has to be a guarantee, not a statistical thing, and you don't tell them any number of steps, because that's like a hint; like, six is enough, it's actually less than six, but you don't want to give a person a hint, right? You just kind of leave it up to them; maybe they think it takes 38 or something. But we had to give this guy hints to try and get it to understand.
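If you want to play with this at home without spoiling the answer, here's a minimal sketch of how you could check a candidate procedure mechanically. It is entirely our own framing, not anything from the models: it treats the hyperintelligence as an adversary that picks the worst-case starting switches and the worst-case rotation after every step, and a procedure as a deterministic function of the step number and everything you have felt so far. All the names (`find_counterexample` and so on) are made up for illustration.

```python
from itertools import product

HOLES = 4        # north, east, south, west
MAX_STEPS = 6

def all_equal(state):
    return len(set(state)) == 1

def find_counterexample(strategy):
    """Return a (final_state, felt_history) pair the adversary can force,
    or None if the strategy is guaranteed to succeed within MAX_STEPS.

    `strategy(step, history)` must return (positions, rule), where
    `positions` is a tuple of one or two hole indices (0..3, fixed room
    positions, since you cannot track the pillar) and `rule` maps the
    tuple of felt values to the values to write back (0=down, 1=up).
    """
    def adversary_wins(state, history, step):
        if all_equal(state):
            return None                    # already solved on this branch
        if step == MAX_STEPS:
            return (state, history)        # out of steps: adversary wins
        positions, rule = strategy(step, history)
        felt = tuple(state[p] for p in positions)
        new_state = list(state)
        for p, v in zip(positions, rule(felt)):
            new_state[p] = v
        if all_equal(new_state):
            return None                    # aligned before the spin: solved
        for r in range(HOLES):             # adversary chooses any rotation
            rotated = tuple(new_state[(i + r) % HOLES] for i in range(HOLES))
            trace = adversary_wins(rotated, history + [felt], step + 1)
            if trace is not None:
                return trace
        return None

    # The adversary also chooses the starting switches (already-aligned
    # starts are trivially solved, so skip them).
    for initial in product((0, 1), repeat=HOLES):
        if all_equal(initial):
            continue
        trace = adversary_wins(initial, [], 0)
        if trace is not None:
            return trace
    return None
```

A procedure that actually works makes `find_counterexample` return `None` for every adversary choice; constructing one is left as the exercise, just as in the episode.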
Let's re-ask it. It thought for 74 seconds: "Apologies for the misunderstanding earlier. Let's revisit the problem with the correct understanding that the switches are fixed within their respective holes and rotate along with the pillar. This means after each spin the holes and the switches inside them change positions relative to you, so you cannot track which hole is which from one step to the next." So take it away.

All right, well, let's see what it's got. Step one: reach into any two holes — I can't see... — and set both switches to up. Okay, again: reach into any two holes and set both... Well, we can tell it's wrong again, because it keeps just doing up, up, up, up; it keeps just setting everything to up. This doesn't work. Why does it think it works, though? Let me read that. Okay: "Coverage of all the holes: since the pillar can only rotate into four possible orientations and you perform six steps, you will have reached into each hole at least once, regardless of the hyperintelligence." No, that's not even true if there weren't a hyperintelligence; it spins around and ends in a random position, forget that there's a hyperintelligence. For example, suppose — and I hate to give hints here, because we're not going to tell the answer, we want our viewers to enjoy it a little bit, but you need to rely on symmetries — suppose you reached into opposite holes, say east and west, and you did something and you took your hands out, and it spins around and stops again. You don't know where east-west is now: it may still be east-west, or it may now be north-south. This is clear from the setup of the problem. But this guy seems to think, no, you can just reach into some holes and you're guaranteed to get all of them eventually. That's not true.

It's not true, yeah. But this is an interesting example, because what if we hinted — by the way, we tried this the other day; even if you hint, it still doesn't get it right — but you can see how this process of stealing the reasoning motifs works. We could hint that it should use symmetry, we could guide it to the solution, because that's how the language models work: basically the reasoning comes from us. We do the reasoning, we prompt it, we steer it in the right direction, and then OpenAI will collect that chain-of-thought trajectory and do RLHF on it, and then next week the model will be able to solve this problem, and so on.

Yeah, maybe. I mean, what I've noticed too, from us playing around with it, is that it's pretty good at counting. It does well on problems where you've got some confusing number of cookies and some confusing combination of people eating some cookies and not other cookies; it's kind of good at counting up how many things there are and figuring out what's left over, that type of thing. It gets into problems when you ask it to do sequential reasoning, where it has to think about a not-so-simple setup that then changes to another step, and it has to reason about the relationship between the prior step and this step. Some problems have the time steps all laid out for it already, right, like: there were four cookies on the table and Sally ate the blue one and Jimmy did this — if you lay out all the steps, it can kind of reason through that counting. If you ask it to imagine the time steps itself, it gets into trouble at that point.
Let's try it with o1-mini just for a laugh. I think mini is very optimized for code, so it works great for code, very bad for stuff like this, right? So it's thinking.

Well, I think this point I just made is important, because a lot of people... let's suppose we have AI and it's doing great at certain classes of problems, and you want to know what classes of problems it doesn't do well at. Until we get this Turing-complete thing I rant about all the time, it's not going to do well with things that require imagining across time steps and tracking across time steps.

Let's see what it's got here. I think I saw it: it's all up again, right? This guy still thinks you can just sit there and always switch up, up, up, up. You've got to do more than that, folks. That's the only other hint I'm going to give: rely on symmetries. And, uh — "solving this problem requires reaching into the holes." Yeah: "solving this problem requires reaching into the holes, feeling the current position and making a decision." What's this, GPT-4? Let's see.

All right, okay, let's see. "The basic idea is to enforce symmetry between the switches as soon as possible so that the hyperintelligence can no longer exploit the differences." Okay, that's good. "Once you achieve symmetry, either all the same or pairs of identical switches, you can eventually move the switches to a uniform state. Here's the guaranteed procedure." Okay, step one — it's already looking more promising so far. "Reach into two adjacent holes, north and east, and feel the switches. If they are the same, both up or both down, leave them as is; if they are different, one up and one down, change one to match the other, either both up or both down." Okay, so after this step the two switches in north and east will be the same. This is correct — until it spins. Okay, now it spins. "Step two: reach into the other two holes, south and west." It just failed. You don't know — but we'll give it the benefit of the doubt — you don't know where south and west are now; they're wherever, right, they're now east and south, or west and north, who knows. "Repeat the same procedure: if the switches are the same, leave them; if different, make them the same, either up or down. Result: after this step the two switches in south and west will be the same. Now you have two identical pairs of switches." Okay, you don't, but okay. "Now reach into any two switches that were not paired previously..." Okay, yeah, it doesn't work. It doesn't work.

At least this one mentioned some things that were useful; I really liked how it figured out that you can rely on the symmetries to prevent chance manipulation. What was that? That was GPT-4o. Yeah, that was GPT-4o.

So, okay, this is Claude, and I think that Claude is still by far the best model. Okay, well, it's given something much more complex here, so let's check it out. Let's make sure it first digests everything: so it knows that we can reach into at most two holes at once, we can't see inside, we can feel and manipulate the switches, the pillar spins randomly if all the switches aren't aligned after we remove our hands, we can't rely on chance due to the hyperintelligence, and, due to the limited number of steps, we need to solve this efficiently. Okay, great, let's see what it's got. Step one: reach into north and south, set both switches to the up position.
Okay, this is the way I typically solve it; it's the same first starting step. Okay: "reach into the east and west holes, set both... at this point, if all the switches were aligned, we're done; if not, the pillar will spin." Well, it didn't figure out that it spins after step one, but that's okay. "Step three: reach into the north and south holes again" — so it didn't figure it out, it just keeps doing... Let's see, step five, reach into... okay, hold on a second, it does actually finally do something different. Let me go back up here, because I need to follow this and make sure. All right: so step two, reach into east and west — okay, so again these are two that are across from each other, so it's possibly the same result as step one; "at this point, if all switches are aligned, we're done." Step three: "reach into the north and south holes again; if both switches are up, leave them." Okay, so once again we're still at a no-op, so steps one, two and three have all collapsed: you've just reached into the same two holes over and over again and made sure they're up. "Step four: reach into the east and west holes again; if those switches are up, leave them." Okay, once again a collapse. "At this point we are guaranteed that the north-south pair and the east-west pair are aligned internally." No, we're not. Okay, but let's see what it does. "Step five: reach into the north and east holes; if they are the same, both up or both down, we're done; if they are different, set them both to up. If we reach this step, reach into the south and west holes..." Yeah, it doesn't work. So it doesn't work.

Okay, shall we tell it that it spins after every step? Yeah, sure, try to tell it that. Say: make sure you understand it spins after every step, so you don't know whether north and south end up at north-south or east-west, so your steps one, two and three above could all just be reaching into the same set of holes. Maybe this will help it. Okay: "I apologize for the misunderstanding in my previous response; you're absolutely correct." It always says that; it's very sycophantic. Yeah, the random spinning... sometimes it doesn't have any choice. Yeah, you go ahead and read. "The random spinning after each step indeed makes the problem more complex" — doesn't it just — "let's devise a new strategy given this new understanding. Here's a revised procedure that will guarantee success in six steps or fewer. Step one: reach into any two adjacent holes, let's call them A and B, and switch A and switch B to down. Step two: reach into the same two holes A and B; if both switches are in the same position, both up or both down, set both to up, and if they're in different positions, leave them as they are." It's gone wrong. Yes, it's still gone wrong, so it still seems to somehow think it can reach into, quote, "the same two holes".

Right, so we can't seem to teach any of these LLMs to solve this problem. Maybe after we post this, somebody will; it'll be cool to see if somebody comes up with a prompt that's carefully constructed enough to get it to solve this problem, I mean, with tons and tons of hints. Yeah, but this is kind of what we were saying earlier, though: what's the point in having an autonomous reasoning thing if it goes off spinning its wheels, doing stuff without an understanding of the problem? And this is so common when you're generating code or doing a whole bunch of stuff: you always have to say, no, you got this wrong, you got this wrong, actually I want this now, there was this assumption over there. Mhm.
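Just to make concrete why all of these "keep setting pairs to up" answers fail, here is what that blind pattern looks like when fed to the verifier sketched earlier (so it assumes `find_counterexample` from that sketch is in scope). The six-pair schedule mirrors the one o1 listed; none of this comes from the models themselves, it's purely our illustration.

```python
def blind_all_up(step, history):
    # The six pairs o1 proposed, in order (N=0, E=1, S=2, W=3),
    # ignoring whatever is felt and always writing "up, up".
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    return pairs[step], lambda felt: (1, 1)

print(find_counterexample(blind_all_up))
# Prints a losing line: a final state where the switches are still mixed,
# because the spin can always keep an odd switch out of reach.
# So the plan is not guaranteed, exactly as we kept seeing on screen.
```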
Yeah. I mean, I'm not going to go so far as to say it's not useful at all. I think there is utility, let's say, in the coding space. It does do pretty well at syntax, right; it's able to write syntactically correct code, even code in languages that have atrocious syntax. So if you don't know the syntax of a language — you couldn't write it syntactically correctly yourself, but you could at least understand what's there — then you might be able to go through a procedure to get some syntactically correct, complicated TypeScript or something, or Brainfuck or whatever.

Yeah, I don't want to be overly critical: these models do understand code to a degree, there is a semantic margin, and even though it's annoying, this is why it needs to be a supervised exchange, right? Because you need to say, no, you misunderstood, this thing should be like that. And this is why having a dense context is useful, because you're not having to re-explain in every single session. And in the future I think there might be a kind of — you know, databases have a transaction log, so all of the inserts and deletes and so on go into this big log, and then you can materialize the index at any time. I think coding might be like that in the future, so when these things have a huge context length they will know why the developers did all of these different things, because there will be this log of all of the prompts, and the language model will know everything. It would actually, in some sense, have a better understanding than a new developer coming onto the team, because even though those prompts will interfere with each other and conflict in many ways, it's very useful context.

Yeah, so a key word you said previously, "autonomous", just went in one ear and out the other for me. You're exactly right: these are great examples of why we can't just let these things go off autonomously, at least for anything we care about. This human supervision, this interactive dialogue, is essential, and then in that context they can be very useful.

Yeah, totally agree. And I think you're making an interesting point: they do seem very good at digesting mounds of messy, nasty content that I would rather not have to digest myself. So exactly like you said, for example, if you could throw it at a GitHub repo and just say, hey, o2, will you digest all the commit logs and try to explain to me why this particular function here is so messy? Maybe it can give you the history of it: well, when it was first developed it was meant to support Bluetooth Classic, but then it had to also support Bluetooth Low Energy at the same time, and they're very different protocols, and Jacob made this change and then Sally found a bug in it. Maybe it can really explain all these nasty corner cases; that would be pretty useful, I would love that.

Yeah, here's an example: I do a whole bunch of automated LLM processing on MLST shows. I need to do this because I have editors that go ahead and edit this stuff, and of course I need to pre-agree in advance what references we're going to be using and whatnot.
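Coming back to that transaction-log-of-prompts idea for a moment: purely as a toy illustration of the analogy — none of this is a real tool, and every name in it is made up — you could imagine an append-only log of prompts that can be "materialized" into the history of intent behind a file, the way a database materializes an index from its log.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PromptEntry:
    """One 'transaction': the prompt a developer gave and the files it touched."""
    author: str
    prompt: str
    files: List[str]

@dataclass
class PromptLog:
    """Append-only log of prompts, analogous to a database transaction log."""
    entries: List[PromptEntry] = field(default_factory=list)

    def append(self, entry: PromptEntry) -> None:
        self.entries.append(entry)

    def materialize(self, path: str) -> List[str]:
        """Replay the log to recover the history of intent behind one file:
        here simply every prompt that ever touched it, in order."""
        return [e.prompt for e in self.entries if path in e.files]

# Hypothetical usage: a future assistant could feed this history into its
# context to explain why a function looks the way it does.
log = PromptLog()
log.append(PromptEntry("Jacob", "add Bluetooth Classic support", ["radio.py"]))
log.append(PromptEntry("Sally", "also support BLE without breaking Classic", ["radio.py"]))
print(log.materialize("radio.py"))
```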
You see, this is a very interactive process, because if you're not carefully looking at the diff, if you're on autopilot here, you're in a world of trouble. That is the first thing: if you think that this is going to be autopilot for programming, you've got another thing coming. So I can accept that, and then I might — and this is the other thing with these o1 models, right, they're really good on the first hit — but now I start saying, okay, I've accepted the code changes, but you stripped out my documentation with examples; I want you to extend the original to include the new behavior and update the examples if needed. Very cool. Well, I think we should round off. It's been a pleasure, Dr. Duggar; we'll be doing more of these, I hope. Yeah, thanks so much, cheers. All right, let's stop the recording.