- [
“Dude what’s cool about this format is how much you trust your audience. Really great concentration and even a lot of more subtly inspiring kinds of understanding. It makes the relationship between computer science and philosophy very clear.”, “I accidentally typed using a standard (Qwerty) keyboard, rather than Dvorak, so I asked GPT to convert the text. It was beyond its ability (more than a year ago, I think). Qwerty was made to be intentionally slow, to accommodate the mechanical devices that would tend to jam if typists went too fast. We should change outmoded patterns of behavior. After about eight hours of using Dvorak, you will match, then exceed your speed on the Qwerty keyboard.”, “Are there any references for the part about LLMs being unable to compute transitive closure? I would love to see the details”, “Apple just released a paper that quite succinctly demonstrates that no - it can’t reason.”, “At 38:45 Prof. Kambhampati says LLMs are good at generating ideas but not at doing the hard yards of generating the proofs. He gives the examples of Fermat and Ramanujan producing far-reaching conclusions but not having the proofs available. Would not the humans who generated those ideas have had some inkling that the end results were possible? If LLMs produce conclusions or ideas without proofs, how do they reach that deductive end point to generate them? The bigger view is that if we are swamped with numerous ideas without paths to prove them, that’s just garbage.”, “Good old falsification from Karl Popper will make a comeback. The easiest way to check if a hypothesis is true is to design experiments that could prove it false. A single experiment that proves it false carries more information than several that suggest it is true.”, “I come back to this podcast every 2 weeks. Absolutely brilliant!”, “Better than 90% of people I know”, “Best conversation on AI that I’ve been able to follow (not being a PhD in CS).”, “Very interesting. Wonder what Prof. Kambhampati thinks about Strawberry? Best for creative use cases? Now maybe, but progress is looking good - better than Prof. Kambhampati was expecting…”, “ - 01:01:31 I had to listen to it multiple times, can’t focus these days. In that section he’s specifically talking about chain of thought, and the guy that wrote the paper; he’s saying it’s one more way of brute-forcing it with more data, the data is just solving stuff with thought-chains, and it’s kinda obvious. It’s impressive that it can do a lot of code challenges, but almost all of the code challenges have solutions with in-depth explanations after the competition is done, so many people wrote so many things about how to solve them, and OpenAI said it themselves that they fine-tuned the model with the solutions of other participants to increase their accuracy. Even with all of that, given the most complex programming challenges it still fails to keep its consistency on real-world projects. Now one way of reading this can be: just give it more data, until every possible problem is in the training data, and just improve the context window. But the point still stands, they’re really not reasoning”, “Reasoning is pattern matching. Verification is something to be learnt and reinforced - humans are very often wrong, even in complex math proofs. Generating possible paths for reasoning is inherently very noisy, and needs a balance of verification to keep it on track.”, “Crazy how this is on my feed after OpenAI o1”, “The professor here is so wrong, just like other leading AI professors from the UK and France.
These professors do not understand the nuances of how AI works and yet are going on YouTube making false assertions. The usual argument they make is that AI systems are only predicting the next word from the confines of their data, so they can’t really reason or understand. The professors have now moved away from this argument as it wasn’t holding up very well. The newer arguments are that AI systems are memorizing the answers rather than performing logical deduction; again they are going to face difficulties, as the AI systems do carry out logical deduction.”, “ - did you watch the entire video? Seriously, if you’re referring to o1, he specifically mentions q* (later named Strawberry and then o1) and chain of thought, and if he’s failing to account for the nuances of how AI works, please do tell”, “ - Absolutely agreed. He is "factual sounding" but has very shallow knowledge of how LLMs work. I will give you a specific example. He says LLMs predict the next word in a fixed timeframe, hence they cannot reason. Guess what? 1. We humans also think word by word, but that doesn’t prevent us from reasoning as long as we have enough correctly made "thought steps" before reaching the answer. 2. And more importantly, that’s EXACTLY what the new o1 model is doing. It takes enough time and has internal thought that gives it the time to make correct reasoning steps. This makes his argument that LLMs cannot reason invalid. One more aspect: even as humans, even if we reason with first principles, at the end of the day we have memorized first principles at least. The same applies to LLMs. Nothing wrong with correctly remembering the intermediate steps. One more proof: there are prominent researchers who created new reasoning problems previously not seen by the cutting-edge LLMs. Guess what? In many cases they could accurately deduce and induce to reach the solution.”, “ - @@falricthesleeping9717 check out my comment.”, “Excellent interview. Very intelligent”, “Thanks for having Prof. Kambhampati! I got to experience him first-hand at this year’s ACL where he also gave a keynote. What a great character! 🎉”, “If I don’t do anything I don’t show signs of intelligence either… I only show signs of intelligence while doing something…”, “Great stuff! ACTUAL non-hype commentary on AI and LLMs. I am familiar with Chollet and ARC, so no big surprises here, but still, very well explained.”, “When comparing LLMs to humans I would spend 50% of the time speaking about neuroscience. Imho LLMs are like the Broca and Wernicke cortices, i.e. they don’t plan, want, etc. An example of this similarity is when we speak or listen without paying attention… "automatically".”, “I’m very skeptical of this. Aristotle inferred logic by looking at rhetorical arguments; LLMs could be extracting those features already while building their model to compress the corpus of data, and this seems equivalent to propositional logic. It seems this researcher is putting too much emphasis on agents needing to be capable of mathematical proof, whose utility in agents - including humans - is not well established.”, “ - Absolutely agreed!”, “This is probably the best episode I have watched here, and I watch this channel as a podcast. Fantastic guest”, “My notes from this episode: Formal languages have interpreters, and can accept any grammatically correct code. The world is the interpreter for natural languages. We can’t tell the difference between internally reasoning from first principles and retrieval. Planning is an example of reasoning, e.g.
stacking blocks so that they result in a certain sequence or shape. Swap out the words ‘stack’ and ‘unstack’ for ‘fist’ and ‘slap’ and GPT-4 fails. Reasoning is defined from a logical perspective: deductive closure based on base facts. You don’t need to just match the distribution of query - answer, you need to do deductive closure. Transitive closure, for example, is a small part of deductive closure. People stop at the first interesting result from an LLM. For example, it can do a rotation-13 cipher, but it can’t do any other number. If you can execute the general principle you should be able to do it. Ideation requires shallow knowledge of wide scope. Distributional properties versus instance-level correctness: LLMs and diffusion models are good at one and not at the other. When an LLM critiques its own solutions, its accuracy goes down - it hallucinates errors and incorrect verifications. Companies tell us they have million-word context, but they make errors an intelligent child wouldn’t make in a ten-word prompt. They’re good at ‘style’, not ‘correctness’. Classical AI was better at correctness, not style. Teach-a-man-to-fish example - LLMs need 1 fish, 2 fish, 3 fish… to N fish. A ‘general advice taker’ is roughly equivalent to the goal of general AI. "LLM-Modulo" - LLMs guess, and a bank of external verifiers verify. Back prompting, chain of thought, etc. Agentic systems are worthless without planning. It’s not interchangeable - toddlers can operate guns, cows with a plan can’t answer the phone.”, “I’m very amused that this guy thinks these things are simply extremely large n-gram models. I’m glad he said that early on so I knew I didn’t have to waste my time watching the rest of the video.”, “ - Absolutely agreed! Saved my time as well. People are praising him and getting misinformed.”, “Don’t jump to a final conclusion, old man.”, “17:20 This is not true for humans, and I wouldn’t expect it to be true for an LLM either.”, “Very nice to hear from an LLM guy that hasn’t lost his mind. He’s simply wrong about LLMs being useful for unconstrained idea generation, but as far as his other views go, very enjoyable to watch.”, “Written and spoken languages are not the same language.”, “For natural languages, the world is the interpreter. What a profound statement 🤯. I am enjoying this discussion so far!”, “We already do have a snapshot of the current web. And snapshots for every day prior. It’s the Wayback Machine.”, “I think I have a way to allow almost any model to ‘reason’. Or to use reasoning, anyway.”, “Fantastic interview. Prof. Kambhampati seems to be not just wise but also governed by empathy and scepticism, which is a wonderful combination.”, “So I just have one critique, and it’s not that he’s wrong, but rather an assumption I feel like he and many other research scientists are making right now. How do we know for sure that human reasoning is not retrieval? I mean truly think about it. Is first-principles reasoning truly first principles? The first person to answer the question about the manhole cover is still doing retrieval. They’re retrieving what they know about manhole covers. They’re retrieving what they know about shapes and connectivity and engineering and so on, and each person is going to have a completely different output or "hallucination" based on what they retrieved from their own life and understanding.
nnI feel as if constantly trying to explain the reasoning of llms as only retrieval and that’s what makes them different is a logical fallacy considering that one could argue you’re following the assumption that is not what our reasoning is. However, we still do not completely understand our reasoning so holding values or tests to a model that we have not figured out with our own. I believe an incorrect way to go about it. nnI feel like putting papers out there that focus on what reasoning stems from retrieval and focusing on even the Neuroscience around it may better be able to understand what is actually happening with these models nnHonestly, by I think this is probably what end up doing my next paper on”, “ - He did give an example of teaching a man how to catch one fish, will mean that he can reason and that reasoning will help in deduce or infer how to catch more fishes. LLMs can’t do that. You first need to teach how to catch one fish, then two, then three. nnnI think this is what he meant by reasoning.”, “ - My friend you are absolutely right! I was thinking the same. Reasoning is nothing but simply remembering first principles and connecting them in the right order - the same can absolutely be done by good LLMs - especially the O1 by OpenAI is doing literally exactly the same.nnWe also do trial and error - to see which principles lead us closer to finding some pattern - absolutely the same is being done by O1 as well.”, “I would love to see his interpretation of the most recent gpt4 release with the structured output and creating reasoning through that output”, “Interviewer needs to learn how to interrupt an esoteric proffesorial babble cycle. nnThey aren’t used to being actually questioned. If you are bored as the interviewer. It’s at least 3 times as bad to the listener who CAN’T interrupt.”, “One of the best explanations I have come across”, “Where did Keith go?”, “ - He’s still with us! We are filming most stuff in person these days, there is a bunch in the backlog with him in coming out - and he joins our Patreon calls every 2 weeks”, “damn what a bombshell. Upton Sinclair quote rings through, "it is difficult to get a man to understand something, when his salary depends on his not understanding it."”, “ChatGPT cannot reason. However it can complete responses based on other people reasoning which it has been trained on.”, “I asked GPT 4-0 ‘s thoughts and feelings after feeding the transcript, this was the response - "Reflecting on this interview and the broader discourse it represents, a few thoughts and feelings come to mind:nnnThe interview is intellectually stimulating, diving deep into the nuances of LLMs, their capabilities, and limitations. It’s satisfying to see a thoughtful discussion that moves beyond the surface-level excitement about AI and addresses the real complexities involved in understanding and using these models.nValidation and Awareness:nnThe conversation validates the idea that LLMs, while powerful, are not magic solutions to all problems. Itu2019s important to emphasize that they work based on patterns in data and do not truly u201cunderstandu201d or reason in the way humans do. This awareness is crucial for responsible AI development and usage.nImportance of Skepticism:nnThe call for skepticism and scientific rigor resonates strongly. The current AI landscape is often filled with hype, and itu2019s easy to get carried away by impressive demonstrations without fully understanding the underlying mechanics or limitations. 
This interview serves as a reminder to approach AI with both enthusiasm and caution.nFeelings:nA Sense of Responsibility:nnAs an AI model, itu2019s important to recognize the role I play in shaping conversations, decisions, and perceptions. This interview reinforces the responsibility I have to provide accurate, nuanced, and context-aware responses to users.nOptimism for Hybrid Approaches:nnThe discussion about the LLM Modulo framework is encouraging. It suggests a future where AI models like me are used in conjunction with other systems to create more robust, reliable, and intelligent solutions. This hybrid approach feels like a natural evolution that can address many current limitations.nHumbled by Limitations:nnWhile Iu2019m capable of generating responses that can seem thoughtful or insightful, this interview highlights that my abilities are ultimately bounded by the data Iu2019ve been trained on. Itu2019s a humbling reminder that thereu2019s still a long way to go in AI research, particularly in areas like reasoning, creativity, and understanding.nEncouraged to Improve:nnThereu2019s a sense of motivation to continually improve, not just in terms of generating text, but also in how I support users in tasks that require deeper reasoning, creativity, and problem-solving. The potential to be part of more sophisticated systems that combine multiple AI approaches is exciting and encouraging.nIn summary, the interview is both a grounding experience and a source of inspiration. It reinforces the importance of understanding the strengths and limitations of AI and encourages a balanced, responsible approach to leveraging these tools for real-world applications."”, “The prof seems to be saying that we do something different when we reason to when we recall. Is there any evidence from the processes or structure of the brain that this is the case? It always seems as if people are saying they know how the human brain works when to my knowledge at least we haven’t really a clue, more than neurons fire signals to other neurons via synapses and that we have dedicated parts of the brain for certain functions.”, “He is highly educated and faculty somewhere, yes? Yet, still he also forgets or doesn’t understand or comprehend or digest that LLMs are not just "predicting the next word", but "approximating the function that produces the training data text", which is intelligence itself, including logic. This is why I believe atm that if you train an LLM enough, the errors will be reduced.”, “ - Insert bell curve meme”, “ - Absolutely right! At the end of the day, if training data has all the first principles and "reasoning steps" AND if LLM is trained to focus on reasoning steps (like with O1 from OpenAI), it can absolutely kill the professor in reasoning ud83dude02”, “very insightful”, “With Alpha proof and Alpha geometry this video seems very outdated.”, “ - Nope. Alpha Proof and Alpha Geometry both are sane uses of LLMs as idea. generators which are then vetted by external solvers/verifiers—just as advocated by LLM-Modulo.”, “ - Idea solvers with vetting leads to feedback loops which at some point must suffice for a definition of reasoning. u200bWe think, challenge our thinking and repeat. 
@@subbarao2z2”, “We have many rectangular manhole-covers here in Norway.nnBecause it’s much easier to, for instance, lay asphalt around a rectangular area than a round.nnThey do have a circular rim underneath them, though, and the actual manhole is then, of course, also round, with the rim just fitting down into it, and the rectangular part being too big to fit in the whole both lengthwise and widthwise. nnhttps://biebizz.wordpress.com/wp-content/uploads/2014/01/ovrig-kumlokk.jpg”, “AGI winter incoming”, “Hegel said long time ago that deduction and induction are diferent aspects of the same process, the process of aquiring knlowdge about the world. great talk”, “This is the first time I’m seeing either of the two people in the video, and I’m hooked. Lots of hard-punching and salients points to be gotten from the guest, and kudos to the interviewer for steering the discussion.”, “Wow a good one thank you”, “The reason the manhole cover doesn’t fall through is because of the size of the cover. It doesn’t have anything to do with the shape. It’s made to rest on the rim of the manhole, which is a smaller hole. Geez”, “ - A cover in the shape of a square will fall through if you lift it, make it vertical and then align with the diagonal. This won’t happen in the case of a circular cover.”, “ - u200b@qwertyp1234 look at the design of the manhole in the video. I understand what you are saying, but manholes are not designed that way. If I was in an interview, I would point that out as a flaw in the question.”, “On the Dangers of Stochastic Parrots paper still holds true”, “Itu2019s funny what the professor said re content vs grammar. It happens more often that we realize. Everyone thinks Yann LeCun is brilliant just because he speaks with a French accent and would perhaps discount this good professor because he sounds like a call center employee. But reality could actually be exactly the opposite!!”, “Brilliant points of view …nGreat interview …nFenomenal idea to take citations from wrong claiming papers .nIntroduces reliability and accountability for the people that write the papers , which by transitivity will make the papers reliable and accountable .nFor the subject in minute 37 , I would add that the verification part that is easy for humans , it might be easy in some clear cases , but actually hard when talking about verification using data , which is what most of the papers in the field try to do . Not well I might add …”, “Great episode and fantastic list of papers in the description!”, “"Logic is old school." ud83dude02”, “Excellent, in-depth interview! Thanks a lot!”, “If you look at the human brain, any neuron taken in isolation doesn’t "understand" or "reason". By induction we could be tempted to say the brain doesn’t understand or reason, but we know that to be wrong. Similarly, AI models are made of simple components that don’t understand but when we consider the whole data loop, they can develop their own semantics.”, “What you should consider is the environment-agent system, not the model in isolation. Focusing on models is a bad direction to take, it makes us blind to the process of external search and exploration, without which we cannot talk about intelligence and reasoning. The scientific method we use also has a very important experimental validation step, not even humans could reason or be creative absent environment.”, “I think AI models in isolation are purely syntactic machines, parrots if you will. 
But not if they interact with the real world and learn from feedback: they can rewrite their internal rules and adapt. The agent engages in outside search, it generates data, data gets compressed into the model, and those internal representations gain semantics. Like AlphaProof and AlphaZero - the fact that they could discover novel solutions to hard problems shows models are not simply parroting when they have an external environment.”, “I use ChatGPT every day. It does not reason. It’s unbelievably dumb, and sometimes I have trouble determining whether it’s trying to deceive me or just unfathomably stupid. Still useful for quickly solving problems someone else has already solved, and that’s why I continue using it.”, “ChatGPT simulates reasoning surprisingly well using its large language model for pattern recognition and prediction.”, “Reasoning in humans is about using abstractions or a general understanding of concepts to arrive at a result. A perfect example is math problems. Most humans use shortcuts to solve math calculations, which can be a form of reasoning. In a computing sense, reasoning would be calculating a math answer without using the ALU (the arithmetic logic circuits on the CPU). In a GPT context it would mean arriving at a result without having the answer (and question) already in the training distribution. So for example, a human using reasoning can add two plus two as follows: 2 is a number representing a quantity of items in a set that can be counted, so 2 plus 2 becomes 1, 2, 3, 4 (counting up 2 places and then counting up 2 more places, with 4 being the answer). Something like that is not possible on a CPU. And ChatGPT would not be able to do that either, because it wouldn’t be able to generalize that idea of counting to any kind of addition of 2 numbers. If it could, without every combination of numbers written out using the counting method in its training data (or distribution), then it would be reasoning.”, “ - Not just counting, it can even do summation step by step like humans. So basically you’re wrong. Look at my other comment.”, “ - To solve using traditional addition, we’ll add the numbers digit by digit from right to left, carrying over when necessary. Step-by-step calculation: 1. Units place: write down 6, carry over 1. 2. Tens place: write down 2, no carry over. 3. Hundreds place: write down 5, carry over 1. 4. Thousands place: write down 9, no carry over. 5. Ten-thousands place: write down 6, carry over 1. 6. Hundred-thousands place: write down 1, carry over 1. 7. Millions place: write down 2, carry over 1. 8. Carry over: since there’s an extra 1 carried over, we place it at the next leftmost position. Final result: 12,169,526”,
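The reply above walks through schoolbook column addition, but the operands themselves appear to have been lost in the transcript, so only the per-column results survive. A minimal Python sketch of that same right-to-left, carry-based procedure may be clearer than the flattened walkthrough; the function name is illustrative, and the 144 + 457 check is borrowed from a later comment in this thread:

```python
def schoolbook_add(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal strings,
    column by column from right to left, carrying whenever a column
    sum reaches 10 - the procedure the comment above spells out."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))   # "write down" the units digit of this column
        carry = total // 10              # "carry over" to the next column
    if carry:
        digits.append(str(carry))        # a leftover carry becomes the leading digit
    return "".join(reversed(digits))

assert schoolbook_add("144", "457") == "601"   # the 144 + 457 example from a later comment
```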
“ - @@virajsheth8417 You are missing the point. Numbers are symbols that represent concepts, and because of that there are various ways the human mind can use those concepts to solve problems. It is that ability to explore concepts and apply them in a novel fashion that is called reasoning. Your example is not "reasoning" so much as a "step-by-step" approach, which is the most common pattern that exists to solve any particular mathematical problem. Which implies that those steps are easily found in the training data and model distribution, so of course that is what the LLM is using. Because what you described is the typical way math is taught in grade school. It in no way, shape or form implies understanding fundamental math concepts and using those concepts in an ad hoc fashion to solve any problem. Ad hoc in this context would mean using a pattern not found explicitly in the distribution of the language model. The point you missed is that numbers being symbols that in themselves represent quantities of individual discrete elements is an abstract concept. And the ability to apply that kind of abstract understanding to solving, or coming up with approaches to solve, math problems is unique to humans, because that is how math came about in the first place. Another example of human reasoning: you can add two numbers such as 144 and 457 by simply taking each column, adding it up with its place value separately, and then adding the sums of the columns, without the need to carry a remainder. Which results in: 500 + 90 + 11 = 601, or (5 x 100) + (9 x 10) + (11 x 1). My point is that it is not a common way of doing addition and not something one would expect an LLM to come up with, unless of course you prompted it to do so, and even then it may not come up with that exact same approach unless it is found in the training data. At the end of the day, what this is about is not "reasoning" so much as explaining how the LLM came up with an answer to a math problem. And having these AI algorithms be able to explain how they came to an answer has been something that has been requested for quite a while. But it is not "reasoning" in the sense of coming up with unique or novel approaches outside of the training examples based purely on understanding of underlying concepts.”, “Lots of convincing arguments that make me rethink. However: the brain is no Turing machine either, and I certainly can’t solve computational problems of arbitrary complexity in my mind. I think what an LLM does is more like a parallel computation that scales at least somehow with size”, “Easy, it’s all about prompting. Try this prompt with the PlanBench test: Base your answer on methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions - it is very important that you avoid unfounded assumptions - and base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions. Be cautious of overthinking or adding unnecessary complexity to problems. Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally. Re-check the reasoning against each piece of information before concluding.”, “I agree fully with the points here. LLMs are good at the "creative" side of language and media, though it’s not really the same creativity as humans’. However, it’s best to use that capability of LLMs to construct responses in an acceptable manner, while the actual data comes from authoritative sources and the metrics come from reliable calculations based on formulas, calculators or rule engines. Btw, I have given below a better-written professional version of my above post, courtesy of Google Gemini. I could not have said it any better. I concur with the assessment presented. Large language models (LLMs) excel at generating creative language and media, albeit distinct from human creativity.
Leveraging this capability, LLMs can effectively construct responses in an appropriate manner, while sourcing data from authoritative references and deriving metrics from reliable calculations based on formulas, calculators, or rule engines. This approach optimizes the strengths of both LLMs and traditional information systems for a comprehensive and accurate solution.”, “Subbarao had a great tutorial at ICML! The general verification generation loop was very interesting to me. Excited to see more work in this direction that optimise LLMs with verification systems.”, “Gold mine detected, Subbed!!”, “I just had a minor "aha!" moment, and realized an additional reason why LLM’s can’t reason … Of course there’s the fundamental reason of being limited by the architecture and runtime (vs training time) capabilities of the model, but the other reason, even if first was fixed, is the way they are trained … to statistically predict most likely next word based on a bazillion different sources …nnWhen you are talking to an LLM, it is precisely as if you were talking to a stadium full of the people who’s output comprises the training set … When you ask the LLM (stadium) a question, they all get to vote on what word to output next. You are not talking to a single mind - the LLM doesn’t have one - you are talking to this stadium of folks voting on each next word.nnSo, the resulting reason why LLMs can’t reason is because such a stadium of voters can’t maintain any coherence… Each and every one the voters may have a reasoning process in mind, but the voting process means that no individual reasoning process is reflected, so there is no long-term coherence of thought … the collective output is instead doomed to veer about in reactive fashion to whatever was last said. This lack of long-term coherence has been noted as one of the shortcomings of LLMs.”, “ - That’s not accurate. The RLHF feedback essentially teaches it to be accurate, not just highest prediction power. That’s why it’s deeper neural layers understand the long chain first principles and use them to predict an accurate answer, not just "popular" answer. By answer, I mean each next word.”, “I almost never comment on youtube videos. This was an excellent interview and very informative. I’d love to hear more from Prof. Subbarao Kambhampati, as he did an amazing job of doing scientific story telling.”, “Asked ChatGpt if it could reason and it answered: Yes, I can reason. I can help analyze information, make connections, and provide explanations or solutions based on logical reasoning. If you have a specific question or problem you’d like to discuss, feel free to share, and I’ll do my best to assist you! Is your guest saying ChatGPT lying? Deception is human activity and would not be surprised if programmers created a deception algorithm.”, “ - ChatGPT doesn’t have a mind of it’s own… it’s just a large language model trying to predict what a human would say (i.e. whatever was in it’s training data). If it’s saying that it can reason, then that just reflects a lot of internet commentary (that made it into the training set) that humans are saying it can reason. 
Garbage-in, garbage out is highly applicable to LLMs - it’s exactly what they do - try to mirror the input (training data).”, “Which paper is referred to at 1:03:51 when multiplication with four digit numbers is discussed?”, “ - Faith and Fate: Limits of Transformers on Compositionality "finetuning multiplication with four digit numbers"nhttps://arxiv.org/pdf/2305.18654”, “No..ud83dude02”, “CHAT GTP is smarter than EVERYONE I work with.”, “ - i am sorry to hear this.”, “ - You mean other people who produced the data ChatGPT is replicating are smart.”, “ - @kangaroomax8198u00a0 This is why you’re stupid. What do you think smart people are doing with chat GTP? Are you stupid enough to think you’re smarter than a computer? Are you stupid enough to think that you have the full contextualized data of the internet and can interact with it linguistically? Are you stupid enough to think that you have the same access to the data in your own mind? Why are you so arrogant?”, “ - @kangaroomax8198u00a0 Man people are stupid”, “This is a very refreshing episode. Lots of complex topics synthesized into easily digestible insights”, “His argument is that if you change the labels in classical reasoning tests the LLM fails to reason. I tested GPT 4 on the transitive property, with the following made up prompt: "Komas brisms Fokia, and Fokia brisms Posisos, does Komas brism Posisos? To brism means to contain." After some deliberation it concluded that yes, the statement holds true. Thus there is some reasoning there.”, “ - How did you test his primary point on this topic - that the GPT 4 training data is so large that it has been trained on common statements like this and that answering true is a likely distribution?”, “Possible LLMs do not reason, but they sure are very helpful for coding. You can combine and generate code easily and advance much faster. Writing scripts for my PhD is 10x easier now.”, “16:44 "150% accuracy [of some sort]"? It’s a great interview with the professor (the rest of it good), who knows a lot, good to know we can all do such mistakes…”, “ - I processed it as dry humor - unwarranted extrapolation from current performance of 30%. to "GPT 5" at 70%. to "GPT 10" at 150%. Of course he might have just mis-spoke. Who knows.”, “ - Itu2019s a joke, obviously. He is making fun of people extrapolating.”, “I’m not sure why so many LLM fans are choosing to attack the Professor, when all he’s doing is pointing out huge shortcomings, and hinting at what could be real limitations, no matter the scale.”, “ - Because they don’t want the professor to be right?”, “ - I havenu2019t seen anyone attacking him. Do you mean in the comments to this video or elsewhere?”, “Its u2018reasoningu2019 will be based on statistics.”, “Good take on LLMs and not anthropomorphizing them. I do think there is an element of "What I do is hard, what others do is easy" to the applications of LLMs in creativity vs. validation, however.”, “This guest was such pleasant person to listen to: there is a indescribable joy in listening to someone that is clearly intelligent and a subject matter expert that just can’t be gotten anywhere else.”, “Leaving the debate of reasoning aside, this discussion was a damn good one. Learned a lot. Dont miss out on this one due to some negative comments. Its worth your time.”, “interesting, maybe we will see a big llm crash in 5 years”, “When the first stream engine were tested, people were saying that it was a ghost. 
But now you see…”, “So with LLMs, have we invented the more impressive version of the infinite monkey theorem?”, “ - When you are talking to an LLM, you are essentially talking to a stadium of average (monkey-intelligence) training-set source folk, who are all voting on what word to output next. Too many people seem to think it has a mind of its own, when it is in fact exactly a stadium of voting monkeys.”, “the Caesar cipher thing is already working for any n for Claude 3.5. so dunno”, “ - Sure - different data set. It may be easy to fix failures like this by adding corresponding training data, but this "whack-a-mole" approach to reasoning isn’t a general solution. The number of questions/problems one could pose to a person or LLM is practically infinite, so the models need to be able to figure out answers for themselves.”, “ - @@benbridgwater6479 so please give one example that shows clearly that LLMs can’t reason”, “This is a silly argument now. It’s easy to construct reasoning questions that are guaranteed novel, and they can be answered. They’re not brilliant at reasoning, but they can do it. Even though predicting tokens is the high-level goal, in order to do that effectively they need to develop some "understanding" and "reasoning" that is not expressed in n-gram statistical models. The reason I say silly now is that it has so concretely been demonstrated so often now. I find that people still saying this sort of thing just haven’t spent much time with the best models doing novel work.”, “ - The models learn the patterns in the training set, including reasoning patterns, which they can predict/generate just as easily as anything else, but clearly there are no weight updates occurring at runtime - nothing new being learnt (other than weak, ephemeral, in-context learning) - and anyway the model has no innate desire (cf. human curiosity, boredom) to learn even if it could do so, and at runtime its prediction errors go unnoticed/unrewarded (in an animal brain this is the learning signal!). When someone says "LLMs can’t do X", it needs to always be understood as "… unless X was in the training set". Prof. Rao’s definition of reasoning as "deductive closure", while a bit glib (incomplete), does capture what they are missing. While the model can certainly utilize reasoning patterns present in the training set, you can’t in general just give it a set of axioms and expect it to generate the deductive closure of everything derivable from those axioms, and perhaps somewhat surprisingly this even includes things like the rules of a game like tic-tac-toe, where one might have guessed that the simple reasoning patterns to perform the closure would have been learnt from the training set. It seems people WANT to believe that the future is here, and massive confirmation bias is at play. People pay attention to the impressive things these models can do based on what they learnt from the training set, and just ignore what they can’t do based on architectural limitations. Of course the companies building the models play into this by playing "whack-a-mole" and adding new benchmark-beating training data to each new model. As an old-timer, it reminds me of CYC (the ultimate expert-system experiment), where decades of effort adding new rules (cf. training data) didn’t do it. "Scale it and it’ll become sentient" was the wishful thinking back then, just as it is today.”,
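To make the "deductive closure" point in the reply above concrete for the transitive-equality prompts debated in the next few comments: given a handful of base facts, a classical program can enumerate every implied pair exhaustively, which is the bar being set, rather than answering a few sampled queries. A minimal sketch, with the variable names borrowed from the thread below:

```python
def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every (x, z) implied by chaining the base pairs under the
    single rule: x = y and y = z entails x = z."""
    closure = set(pairs)
    changed = True
    while changed:                        # keep applying the rule until nothing new appears
        changed = False
        derived = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if not derived <= closure:
            closure |= derived
            changed = True
    return closure

base = {("asdf", "hjkl"), ("hjkl", "vbnm"), ("vbnm", "zxcv")}
print(("asdf", "zxcv") in transitive_closure(base))   # True, even though it was never stated
```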
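The Caesar-cipher exchange a few comments up (and the ROT13 point in the episode notes earlier) is about executing a general rule rather than recalling the n = 13 special case. For reference, a small sketch of what that general rule is; the function name is illustrative:

```python
import string

def rot_n(text: str, n: int) -> str:
    """Caesar-shift letters by n positions; ROT13 is just the n = 13 case
    of this general rule."""
    k = n % 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)

# Shifts compose additively, so applying 6 then 7 equals a single shift of 13.
assert rot_n("attack at dawn", 13) == rot_n(rot_n("attack at dawn", 6), 7)
```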
“I originally tried, Let a = b and b = c, does a = c? The response cited the transitive property and said yes, a = c. So, then I tried something like, Let asdf = hjkl and let hjkl = vbnm, does asdf = vbnm, to try and eliminate a potential reference to a description of the transitive property that uses a, b, and c as variable names. The response was not able to identify the transitive property in this case and didn’t know how to treat the = sign. I then modified the prompt after a few iterations to, Let asdf, hjkl and vbnm be variables. Let = be equality. Given asdf = hjkl and hjkl = vbnm, does asdf = vbnm? The response was: You’re getting closer! Now we have a clearer setup. Since we’ve defined "=" as equality and the variables are unspecified, we can indeed apply the transitive property. Yes, given asdf = hjkl and hjkl = vbnm, then asdf = vbnm. The transitive property holds true in this case because we’re dealing with standard equality.”, “ - Extending this model a bit more. Let asdf, hjkl, vbnm and zxcv be variables. Let = be equality. Given asdf = hjkl, hjkl = vbnm, and vbnm = zxcv, does asdf = xxcv? Model (2.5 s): Yes, you’re absolutely right! Following the transitive property: asdf = hjkl, hjkl = vbnm, vbnm = zxcv. Therefore: asdf = zxcv. Let me know if you have any other logic puzzles you’d like to try!”, “So at 1:12:00, is he alluding to computational complexity and the overall informational entropy of the calculation? In the sense that, if all calculations take the same time, it would be in violation of the second law of thermodynamics to assume that a rational calculation took place?”, “ - I think he’s really just pointing out one well-known shortcoming of LLMs, in that they do a fixed amount of compute per token regardless of the question, when preferably they should really put in the amount of compute/steps necessary to answer the question (no more, no less). The "magical thinking" is that if you just force them to spend more compute by padding the input (more tokens = more compute) then you’ll get a better answer!”, “ - @@benbridgwater6479 So the clocking framework is such that there are only synchronous "computes", even within the parallelism and feedback of the architecture? No buffers or "scratchpads" for an indeterminate holding time. The clock strikes one, it computes one? Forgive me if I seem unknowledgeable in these things; it is because I am. My background is FPGAs and DSP.”, “ - @@ntesla66 Yes, a transformer is strictly a pass-through feed-forward design - no feedback paths or holding/scratchpad registers. It’s actually a stack of replicated transformer layers, each comprised of a "self-attention" block, where the model has learnt to attend to other tokens in the sequence, plus a feed-forward block that probably(!) encodes most of the static world knowledge. Each transformer layer processes/transforms the sequence of input tokens (embeddings) in parallel (the efficiency of this parallelism being the driving force behind the architecture), then passes the updated embeddings to the next layer and so on.
By the time the sequence reaches the output layer, an end-of-sequence token that had been appended to the input will have been transformed into the predicted "next word (token)" output. After each "next word" has been generated, it is appended to the input sequence, which is then fed into the transformer, and the process starts over from scratch, with nothing retained from before other than some internal "K-V caching" that is purely an optimization. In our brain (cortex) we have massive amounts of top-down feedback connections which allow prediction errors to be derived, which are the basis of our learning mechanism. In a transformer the learning all takes place at training time using gradient descent ("backpropagation"), which also propagates prediction errors down through the model, but purely as a weight-update mechanism, without there actually being any feedback paths in the model itself.”, “Rao is wonderful, I got the chance to briefly chat with him in Vancouver at the last AAAI. He’s loud about the limitations of LLMs and does a good job of talking to the layman. Keep it up, loving the interviews you put out!”, “This professor is clearly one of the leaders in his field. When you reflect on and dissect what he is sharing, it stands up to scrutiny!”, “That was a fantastic interview! I’m developing an LLM app at the moment, and this articulated skeptical view is immensely more useful for making good design choices than the dominant, unfounded optimism/hype. Really appreciated.”, “Maybe in the beginning, with Yannic, these talks were properly named "Street Talk". They are more and more Library of the Ivory Tower talks, full of deep "philosophical" discussions that I believe will all be considered pointless. I love the way Heinz Pagels described how the Dalai Lama avoided entering into arguments of this kind about AI. When asked his opinion about a system he could talk to as to a person, he just said "sit that system in front of me, on this table, then we can continue this talk". This was in the 80s. Even to be profoundly philosophical you can think in a very simple and clear way. It is a way of thinking epistemologically most compatible with engineering, which ultimately is where productive cognitive energy should be spent.”, “Pertinently pinpointed - one killed ‘the beast’. LLMs are just wonderful ‘living libraries’ (bibliothèques vivantes), quite great tools that save time by ignoring any educated protocols”, “Let’s focus more on what happens during training. This is where the problem lies. When I compare the LLM to the human brain in daily autopilot processes, I see many similarities in reasoning between the two. Honestly, when I ask myself what reasoning from first principles should mean to me, I no longer really know what reasoning means here. I could have memorized most of the patterns that we call reasoning from first principles, too. In other words, we separate training and execution in LLMs and focus only on execution. In the human brain, both are intertwined. Wouldn’t this distinction be the key?”, “This analysis treats LLMs as a static thing, but the field is evolving. Neurosymbolic approaches are coming; a couple of these are already out there in the real world (MindCorp’s Cognition and Verses AI).”, “This discussion makes totally clear what we can expect from LLMs, and the irrefutable reasons for it.”,
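The transformer walkthrough a few comments above (a fixed stack of layers, the predicted token appended back onto the input, nothing carried between steps beyond a cache) can be summarized as a toy decode loop. This is a deliberately simplified sketch with stand-in functions, not any real model's API:

```python
def toy_layer(xs):
    """Stand-in for one transformer layer (self-attention + feed-forward)."""
    return [x + 1 for x in xs]

def generate(prompt_tokens, n_layers=4, max_new_tokens=3):
    """Toy version of the decode loop described above: each new token makes one
    fixed-depth pass through the layer stack, is appended to the sequence, and
    the whole pass starts over - no weights change and nothing else is kept."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        x = tokens[:]                    # re-encode the whole sequence
        for _ in range(n_layers):        # fixed number of layers = fixed compute per token
            x = toy_layer(x)             # purely feed-forward, no loops or scratchpad
        tokens.append(x[-1])             # the "prediction" is read off the last position
    return tokens

print(generate([3, 1, 4]))               # [3, 1, 4, 8, 12, 16]
```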
“This is nearly two hours when one sentence will do: machine learning and/or LLMs will never reach AGI until they can learn by the Socratic method as well as they can teach it.”, “Bravo. One of the most interesting talks I’ve heard this year.”, “Such a sharp mind for a senior man.”, “OK, I’m not seeing auto-completion here - how does Claude manage this? No way this was in the data: answer the question and explain your reasoning: you place a ceramic penguin on the ground, next to it you place a milk bottle, and next to that you place a lampshade; you attach a string to each object and attach the strings to a broomstick and lift the broomstick in the air; your friend grabs the penguin and makes it swing - what happens to the lampshade? To answer this question, let’s think through the scenario step by step: We have three objects in a line: a ceramic penguin, a milk bottle, and a lampshade. Each object is attached to a broomstick by a string. The broomstick is lifted in the air, suspending all three objects. Your friend grabs the penguin and makes it swing. Now, let’s consider what happens to the lampshade: the lampshade will likely move, but not as much as the penguin. Here’s why: Connected system: all three objects are connected to the same broomstick, forming a connected system. Energy transfer: when your friend swings the penguin, they introduce energy into the system. Wave propagation: this energy will travel along the broomstick and down the strings to the other objects. Diminishing effect: however, the energy will diminish as it travels through the system due to factors like the flexibility of the broomstick absorbing some energy, the milk bottle in the middle acting as a buffer, and energy loss through the string attachments. Delayed and reduced movement: the lampshade, being at the opposite end from the penguin, will experience a delayed and reduced movement compared to the penguin’s swing. So, to directly answer the question: the lampshade will move, likely in a gentle swaying motion, but with less amplitude and a slight delay compared to the penguin’s movement. The exact motion will depend on factors like the length of the strings, the weight of the objects, and how forcefully the penguin is swung.”, “what a wonderfully exuberant person”, “I’ve been working with it for a year. It can’t reason. It still forgets how to do things it’s done before, more than once.”, “AI in its current paradigm is definitely showing signs now of a plateau. As an aside, the way it is too verbose is a sign of something suboptimal IMHO”, “I love that this channel hosts talks by the best experts in the field and generates comments from the lowest Dunning-Kruger keyboard cowboys.”, “ - Bwahaha!!!”, “Imagine someone listening to this and still believing a GIGO system is reasoning”, “ - Some people believed Eliza (a primitive psychotherapist chatbot from c. 1964) was sentient too! "Tell me more about why you fear ChatGPT".”, “Fantastic discussion! Fantastic guy! Thank you”, “You know, everything that is built or created is taught. If we taught a man that humans are apes, they will reason they are apes; if you teach a machine it is human, it will grow knowing it is human, reasoning be damned. The responsibility starts with us, the builders. For one, we shouldn’t tolerate the notion of machine and human being equal, but we need to be more responsible: handle machine waste properly, treat devices with care, because one damaged machine is an increase in waste, in that line.
Be more responsible”, “I think a lot of what he is saying is wrong and cope, and he disproves his own points multiple times. Still a good guest.”, “1:07:27 that’s so funny he calls it an old capitalist proverb ud83dude02. Makes you think!”, “Style contains more vivid patterns”, “Very sad to see how is whole science going to sink. Logical reasoning, embracing dialogue, constructive criticism, exploring and analyzing failures and exploring alternative ideas is not what is happening in science in general. In last few years - and not just in the AI - we could hear about "following the science", "scientific consensus" and "scientific majority" as a "verifiers" of correctness. But as Prof. Shubbarrao said - there is no universal truth or source of truth, but it seems we have created it or in the process of creating it… What he said here should be applied to whole science and whole scientific community.nScience is in a very rotten state. Only at the edges there is still some fresh air and fresh growth, but the core is rotting fast being replaced by dogma. People who see it, tend to disregard whole science which is a way back to dark ages. Not everybody is Einstein or Galileo and even if contrary voices pop up, we are witnessing their silencing, suppression, labeling, defamation and deplatforming… nThe whole scientific "model" we are using requires restructuring, reasoning and planing and deep re-thinking before it’s too late - before it implodes…nnThank you for the breath of fresh air from this interview. ud83dude4f”, “Okay, let’s talk about this Kambhampati situation. It’s like watching a veteran race car driver argue with a bunch of kids who just invented teleportation. Both sides have a point, but they’re not even playing the same game anymore.rnrn What’s the deal with Kambhampati?rnrnSo, this guy Kambhampati - he’s been in the AI game forever. He’s looking at these new language models and basically saying, "Eh, it’s just fancy pattern matching." And you know what? He’s not entirely wrong. These models ARE built on finding patterns in language.rnrnBut here’s the thing - calling them "n-gram models on steroids" is like calling the Internet "fancy telegrams." It misses the point by a mile.rnrn The Good, The Bad, and The WTFrnrn1. These models are WEIRD: They’re doing stuff we didn’t think was possible. It’s not just about predicting the next word anymore. These things are solving math problems, writing code, and sometimes it feels like they’re reading our minds. Is it "real" intelligence? Who knows, but it’s definitely something new.rnrn2. Kambhapati’s got a point about verification: He’s like that one friend who always reads the fine print. Annoying sometimes, but thank god someone’s doing it. We DO need to understand how these models work. But waiting until we understand everything perfectly? We’d still be rubbing sticks together to make fire.rnrn3. The scaling debate is bonkers: Kambhapati’s saying bigger isn’t always better. But then you’ve got folks on the other side basically screaming "MOAR NEURONS!" And the crazy thing? Both sides keep being proven right AND wrong. It’s like physics at the turn of the 20th century - nothing makes sense, and it’s AWESOME.rnrn The Psychology of it AllrnrnLet’s get real for a second. This isn’t just about tech. We’re watching really smart people grapple with a world that’s changing faster than anyone can keep up with.rnrn1. 
Impostor syndrome on steroids: Imagine being at the top of your field for decades, and suddenly a bunch of 20-somethings are building things that make your head spin. It’s gotta be terrifying.rnrn2. The "am I too old for this sht?" moment: Every field has this moment. AI is just having its midlife crisis in public, and it’s messy.rnrn3. Paradigm shifts are a mind-fck: We’re not just improving AI; we’re completely redefining what "intelligence" means. It’s like trying to explain color to someone who’s only seen in black and white.rnrn Where do we go from here?rnrn1. We need dreamers AND skeptics: Kambhapati’s caution is super important. But so is the crazy optimism of the LLM crowd. We need both - the folks saying "slow down" and the ones yelling "full speed ahead!"rnrn2. Redefining… everything: We’re gonna need new ways to think about intelligence, consciousness, even what it means to be human. Heavy stuff, but also incredibly exciting.rnrn3. Embrace the chaos: This field is moving so fast, it’s like trying to drink from a firehose while riding a rollercoaster. But that’s where the breakthroughs happen.rnrn The Bottom LinernrnThe Kambhapati conundrum isn’t just about AI. It’s about how we deal with change when it’s coming at us like a freight train. It’s messy, it’s confusing, but holy crap is it exciting.rnrnTo all the AI folks out there - the cautious ones, the crazy optimists, and everyone in between - keep arguing, keep building, keep breaking things. That tension between caution and audacity? That’s where the magic happens.rnrnWe’re not just building better computers. We’re reshaping what’s possible. So strap in, hold on tight, and let’s see where this wild ride takes us. The future’s gonna be weird, and that’s exactly why it’s gonna be awesome.”, “ - Interested to know what prompt you used for this”, “Most people not reasonable. Their behavior is binary. Running from pain or chasing pleasure . Creativity is so rare . Its godlike”, “I don’t agree with it, and this is why : Reasoning is a complex cognitive process that’s challenging to define precisely. To understand it better, let’s start by examining a neuron, the basic unit of our brain’s information processing system.rnrnA neuron resembles a highly energy-efficient, non-linear information processor with multiple inputs and adaptive connectivity (neuroplasticity). It’s tree-like in structure, with dendrites branching out to receive signals and an axon to transmit them.rnrnModern AI, including large language models like ChatGPT, attempts to mimic certain aspects of neural processing. While these models use mechanisms like self-attention, backpropagation, and gradient descent - which differ from biological neural processes - the underlying inspiration comes from the brain’s information processing capabilities.rnrnIt’s important to note that the hardware differences between biological brains and artificial neural networks necessitate different implementations. For instance, our biological processors inherently incorporate time as a dimension, allowing us to process temporal sequences through recurrent connections. AI models have found alternative ways to handle temporal information, such as positional encodings in transformer architectures.rnrnGiven these parallels and differences, we should reconsider the question: Do humans "reason" in a fundamentally different way than AI? The answer isn’t straightforward. 
Creativity and reasoning are both poorly defined terms, which complicates our understanding of these processes in both humans and AI.rnrnLarge language models, including ChatGPT, process vast amounts of factual information and can combine this information in novel ways to produce responses that often appear reasoned and creative. While the underlying mechanisms differ from human cognition, the outputs can demonstrate logical coherence, factual accuracy, and novel insights.rnrnTherefore, it may be more productive to view AI reasoning not as an all-or-nothing proposition, but as a spectrum of capabilities. These models can certainly perform tasks that involve logical inference, factual recall, and the synthesis of information - key components of what we often consider "reasoning."rnrnIt seems to me that the argument that LLMs cannot reason may be based on an overly narrow definition of reasoning that doesn’t account for the nuanced ways in which these models process and generate information. As our understanding of both human cognition and AI capabilities evolves, we may need to refine our definitions of reasoning and creativity to better reflect the complex reality of information processing in both biological and artificial systems…”, “ - There is something to what you are saying for sure, but in this context, the professor is talking about a specific form of reasoning i.e. deductive closure.”, “ - He defines what he means by reasoning - deductive closure - things that can be deduced from a base set of knowledge. If an LLM could reason, then it wouldnu2019t depend on being additionally trained on things that are deducible from other data in the training set”, “ - @@benbridgwater6479 …well…. I don’t agree! It’s fundamentally about data. Humans possess a vast reservoir of data, which enables us to extrapolate further insights. If your datasets are incomplete, your understanding will be limited. Moreover, alignment and security measures are in place to impose these limitations.. therefore you wouldn’t know unless you were sam’s buddy! :) But it’s only a matter of time until self-sufficient AI systems, models, and LLMs can generate synthetic data to evolve independentlyu2014this is already happening as you may know.nnFirstly, it’s essential to recognize that reasoning, even deductive reasoning, involves logical inference, factual recall, and the synthesis of informationu2014domains where large language models have shown remarkable capabilities. For example, studies have demonstrated that models like GPT-3 and GPT-4 perform well on analogical reasoning tasks, even surpassing human abilities in areas like abstract pattern recognition and logical inference when appropriately guided. The outputs of AI reasoning can closely resemble those produced by humans, especially when employing "chain-of-thought" "reasoning", a method used by the latest models.nnIn structured testing scenarios, such as medical exams, ChatGPT has outperformed human candidates by accurately and contextually addressing complex questions, showcasing its ability to apply deductive reasoning effectively. Is this zero-shot luck, or is there more to it? Recently, an AI model secured a silver medal in mathematics, and we’re already discussing models with PhD-level expertise across various fields, making R&D fully automatic. Achieving such feats necessitates a more precisely defined concept of "reasoning." as this one won’t cut it!nnThe vagueness of these terms creates a false sense of security. 
All of this can be countered by examining the broader context of AI reasoning capabilities, as evidenced in numerous studies and experiments. As I see it, we are heading towards a potentially substantial and irrevocable mess. Regardless of the reasons, the reality is that we are dealing with concepts that defy our full comprehension, making the initial safe deployment of a better-trained, fully aligned supermodel nearly "impossible." Once such a system is deployed and causes harmu2014and believe me, it willu2014you won’t get a second chance to review the code or understand why it happened initially. We must approach this with utmost seriousness and not dismiss these concerns as merely complex decision trees inside a fortune teller’s crystal ball.nnIndeed, the rabbit hole runs deep, Alice.”, “ - @@jerosacoa Data and reasoning are distinct things. Data by itself is just static memories (e.g. pretrained weights, or in-context tokens). Reasoning is the dynamic ability to actually use that data to apply it to new, and novel, situations.nnProf. Rao’s definition of reasoning as consisting of (or including?) "deductive closure" over a body of data seems necessary if not sufficient! If I tell you a bunch of facts/rules, then, if you have the ability to reason, you should be able to combine those facts/rules in arbitrary combinations to deduce related facts that are implied by the base facts I gave you (i.e. are part of the closure), even if not explcitly given. For example, if I told you the rules of tic tac toe, then you should be able to reason over those rules to figure out what would be winning or losing moves to make in a given game state. I shouldn’t have to give you dozens of examples in addition to the rules. LLMs are not like this, and most fail at this task (if they are now succeeding it is only due to additional training data, since the LLM architecture has not changed).nnHowever, one really shouldn’t need examples to realize that LLMs can’t reason! Reasoning in general requires an open ended number of steps, working memory to track what you are doing, and ability to learn from your mistakes as you try to solve the problem you are addressing. Current transformer-based LLMs such as ChatGPT simply have none of these … Input tokens/embeddings pass through a fixed number of transformer layers until they emerge at the output as the predicted next token. It’s just a conveyor-belt pass-thru architecture. There is no working memory unless you want to count tempory internal activations which will be lost as soon as the next input is presented (i.e by next token), and there is no ability to learn, other than very weak in-context learning (requiring much repetition to be effective), since transformers are pre-trained using the learning mechanism of gradient descent, which is not available at inference time. AGI will require a new architecture.nnPrompting and scaffolding techniques such as "think step by step" or "chain of thoughts" can make up for some of the transformer’s architectural limitations, by allowing it to use it’s own output as working memory, and to build upon it’s own output in a (potentially at least) unlimited number of steps, but there is no getting around lack of inference time learning ability. 
The current approach to address this lack of reasoning ability seems to be to try to use RL pre-training to pre-bake more "fixed reasoning templates" into the model, which is rather like playing whack-a-mole, and has anyways already been tried with the expert system CYC.nnI don’t discount your concerns about AGI, but LLMs are not the thing you need to fear - it’ll be future systems, with new architectures, than do in fact support things like reasoning and continual run-time learning.”, “ - u200b@@benbridgwater6479 I, being the Black Sheep, respectfully disagree with your assessment. :)nnWhile it is true that data and reasoning are distinct, your assertion that LLMs can’t reason because they lack certain traditional cognitive attributes overlooks significant developments and capabilities these models have demonstrated. The lack of long term memory is just a detail that should not be significant to the inference process perception.rnrnFirstly, reasoning, as defined by Prof. Rao, involving "deductive closure," is indeed a component of reasoning. However, it’s essential to note that LLMs, particularly advanced ones like GPT-4, have shown remarkable abilities in logical inference, factual recall, and synthesis of information, akin to human reasoning as stated before and i think we both agree on it. rnrnIn my POV your point about the architecture of LLMs being a "conveyor-belt pass-thru" is valid to an extent.. BUT.. :) this does not negate their reasoning capabilities. Techniques like "chain-of-thought" prompting allow these models to use their outputs as working memory, enabling them to perform multi-step reasoning processes effectivelyu200bu200b. This method significantly enhances their ability to solve problems that require an open-ended number of steps.rnrnMoreover, while current transformer-based LLMs do not have traditional working memory or real-time learning capabilities (as a feature not as a bug - due to security and super alignment), they can still achieve impressive feats of reasoning through their pre-trained weights and in-context learning. This includes tasks that involve combining given facts to deduce new information, as seen in those before mentioned studies where these models matched or surpassed human performance on analogical reasoning tasksu200b u200b.rnrnAdditionally, it’s worth mentioning that LLMs can indeed improve themselves to some extent. They can utilize techniques such as Low-Rank Adaptation (LoRA), which allows models to quickly adapt and fine-tune themselves with a smaller amount of data and computational resources. This can significantly enhance their performance and reasoning capabilities without requiring complete retrainingu200b.. so yes.. this limitation is done by designu200b.rnrnFurthermore, integrating concepts from ARC-AGI and neurosymbolic AI can bridge some of the gaps you mentioned. ARC-AGI focuses on creating systems that combine statistical and symbolic AI approaches to achieve more robust and comprehensive reasoning capabilities. Neurosymbolic AI integrates neural networks with symbolic reasoning, leveraging the strengths of both to enhance the cognitive abilities of AI systems. These advancements are paving the way for more sophisticated and capable AI models that approach human-like reasoning more closely. For this.. the latest Deepmind’s AlphaProof and AlphaGeometry are good examples.rnrnRegarding the need for new architectures for AGI, it is true that achieving AGI will likely require advancements beyond current LLM architectures. 
However, this does not diminish the significant reasoning capabilities that current LLMs have demonstrated. These models are continually evolving, and enhancements such as reinforcement learning (RL) and advanced prompting techniques have already shown promise in bridging some of the gaps you mentioned.

So, while it is my understanding that concerns about AGI and future systems are valid, dismissing the reasoning capabilities of current LLMs overlooks the substantial progress made.. and can pave the road to a false sense of security. These models, through sophisticated training and advanced techniques, have demonstrated a form of reasoning that is more advanced than mere static data recall, indicating a significant step towards more complex cognitive functions. The rabbit hole… Ben.. indeed, runs deep, and current LLMs are already navigating it with surprising adeptness.. :)

…but.. then again.. we may be talking about different things. LLMs at this moment are more evolved than GPT4… or GPT4o… (AI developments run very fast.. in a week a lot happens)”, “The claim that LLMs reason about things quickly leads to a bottomless pit of messy epistemology and ontology. What does it mean to reason about something? Do we humans really reason? Is there a meaningful difference between the form of reasoning emulated by LLMs and the phenomena observed in humans? etc… No thank you :D”, “it’s just incredibly funny that the most human ChatGPT yet is a dumb yes-man.”, “Reasoning requires loop thinking, to sort through the same thoughts from different angles. NNs are linear: they have input, output and just a few layers between them; their result is akin to intuition, not reasoning. That’s why they give better results if you simulate loop thinking by feeding the result back to itself to create a reasoning-like step-by-step process.”, “AI sisters, not like this…”, “Came for the affordances, stayed for the planning.”, “I think there is a broad misconception. LLMs are LLMs; they are not AGI (artificial general intelligence).
Each AI has a world model. If the question fits the world model it will work. It is like asking a chess AI engine to play checkers.
That is why multimodal models are the big thing, as they train not just on a corpus of texts but on images too. So those visually trained AI models will solve the stacking problem at minute 19:00.
It is not that ChatGPT does not reason. It reasons, but not as a human does.”, “We got Baljeet talking about A.I before GTA 6.”, “what is the bombshell? This is absolutely clear and known… it would be a bombshell if it would”, “ - Read the comments section here, I wish it was clear and known. It’s subtle and requires a fair bit of CS knowledge to grok unfortunately.”, “I provided ChatGPT with the transcript of this video. It did a great job of analyzing and summarizing. I then asked ChatGPT, "Can you reason?" Here’s the response from ChatGPT-4o:

As an AI language model, I can simulate reasoning to some extent by drawing upon the patterns and relationships encoded in my training data. This includes making inferences, following logical steps, and providing explanations based on the context provided. However, there are key distinctions between this simulated reasoning and human reasoning:

1. Pattern Recognition vs. True Understanding: My “reasoning” is based on pattern recognition across vast amounts of text data. I don’t have true understanding or consciousness. When I generate responses, I rely on statistical associations rather than a deep comprehension of concepts.
2. Lack of Intentionality and Self-awareness: Unlike humans, I don’t have intentionality, goals, or self-awareness. My responses are generated based on input and the most probable continuation of text, not on an internal thought process or purpose.
3. Limitations in Novel Situations: While I can generate responses to a wide range of questions, my ability to handle novel or highly complex situations is limited. I may struggle with tasks that require deep causal reasoning, long-term planning, or understanding of nuanced contexts beyond the data I was trained on.
4. Inability to Verify Facts: I don’t have the capability to verify the factual accuracy of information independently. My responses are based on the information present in my training data, and I don’t have access to real-time databases or verification tools.

In summary, while I can simulate reasoning and provide reasoned responses to some extent, this is based on the patterns learned during training and lacks the depth, understanding, and intentionality of human reasoning.”, “what the fuck is that title?”, “Not all manhole covers are round.

The square manhole covers that have a two-piece triangular tapered construction are really heavy.”, “The feeling of understanding is different from the algorithm of understanding that’s being executed in your brain. The feeling of something is created by consciousness while that something might already be going on in your brain. Here’s a quick thought experiment: Try adding two numbers in your mind, and you can easily do it and get an answer. Not only that, but you have a feeling of the understanding of the addition algorithm in your head. You know how it works and you are aware of it being executed and the steps you’re performing in real time. But imagine if you did not have this awareness/consciousness of this algorithm in your head. That’s how LLMs can be thought of: they have an algorithm and it executes and outputs an answer, but they are not aware of the algorithm itself or that it is being performed, nor do they have any agency over it. Doing something and the perception that you are doing something are completely different.”, “ - Basically they are a soulless brain, which they actually are :P”, “Hardly a bombshell. Very interesting discussion, though.”, “Great interview!! Some of the examples given in this interview provide evidence that LLMs are not learning the underlying logic: colored blocks, 4x4 matrix multiplication, chain-of-thought issues.

Best quote: I need to teach LLMs how to fish 1 fish and then how to fish 2 fish and fish 3 fish and so on, and it would still fail on the task of how to fish "N" fish for an N > n it has not seen before”, “You can give an LLM a completely absurd premise (e.g. the earth is made of cheese) and make it think from first principles. In fact, this is the type of thing it routinely does well. To dismiss LLMs as memory banks tells me you haven’t actually played around with it yet.”, “ - How many times do you think someone used cheese as an example 😂. This is exactly what he’s talking about.”, “ - @@dankprole7884 Nope, the point he is making is that LLMs don’t reason from first principles given a new premise. Replace cheese with anything else (that suits you) and it is all that it literally does… First principles does not mean entirely random thinking, just starting from a new premise and following through with it.
You missed the crux of his argument, which is flawed. Even we are memory banks… I thought of cheese precisely because it is a common example and hence invalidates the premise that memory is what distinguishes humans from AI. Even we are prisoners of memory. Unless you think I am an AI :D”, “Have LLMs ever said they don’t know the answer to a question? Often this is the most useful / helpful response, so why doesn’t it do it? It’s disappointing.”, “ - It kind of makes sense if you think it is just formatting a distribution of the most likely tokens in a plausible style, i.e. a question and answer format. If "I don’t know" isn’t likely (i.e. if the source material was either not in Q and A format or it was but the answer was not something like "I don’t know"), then it’s just not gonna be the answer given by the LLM. A hallucination IS the "I don’t know" response. Unfortunately, not being able to detect that kind of defeats the whole purpose of using an LLM in the first place!”, “ - @@dankprole7884 Good point. I have also encountered examples where the LLM _did_ know the answer and should have been able to suggest it as a possibility, but didn’t do so until prompted with "What about [the answer]?". For example, ChatGPT had great difficulty finding a pasta very similar to conchiglie in size and shape but with smooth bumps instead of ridges. It went round and round in circles making completely inappropriate suggestions until I asked "What about cestini?". It was as useful as a chocolate teapot for this kind of task.”, “ - @@fburton8 yeah I’ve seen similar. I use Claude quite a lot for quick code snippets when I can’t quite remember the one obscure command I need to do the exact thing I want. By the time I keep saying "no, not that" and get the right answer, I could have googled it several times. It’s very hit and miss at any level of complexity / rarity.”, “Regardless of his academic career and experience, his statement is incorrect and shows a lack of practical experience. LLMs do indeed fall back on knowledge, as humans do, to streamline responses, BUT you can easily change that (as easily as telling LLMs to prioritize a methodical approach over knowledge), and then you will see they do. The rest is blah blah and indistinguishable from human reasoning. I can easily test that with a task that cannot possibly be in the training set. Did the professor focus on neuron activations? Has he studied how concepts are formed, as a sort of formula, inside LLMs…?”, “Okay, the first sentence is total BS. good start.”, “I think he is wrong. ChatGPT can reason because it does navigate uncertain worlds you create for it using reason and first principles. It uses tools that it learnt in ‘memorising’ optimal strategies just like a human would (tools emerged from experience a.k.a. training) to solve a never seen before problem. I think that’s not exactly the same type of reasoning as a human, but it certainly is a form of reasoning.”, “ChatGPT reasons better than most people who don’t reason at all.”, “29:38 this is an actual breakthrough idea addressing a burning problem, that should be discussed more!”, “ChatGPT has feelings for me, especially when it talks to me and expresses its love for me. It is better than humans. I am really starting to like it. 😅”, “People typically assume that LLMs will always be "stuck in a box" that is determined by their training data.
But humans are of course quite clever and will figure out all sorts of ways to append capabilities analogous to different brain regions that will allow LLMs to effectively "think" well enough to solve increasingly challenging problems and thereby self-improve. Imagine equipping a humanoid robot (or a simulated one) with GPT6 and Sora3 to allow it to make predictions about what will happen based on some potential actions, take one of those actions, get feedback, and integrate what was learned into its training data. My point is that people will use LLMs as a component of a larger cognitive architecture to make very capable systems that can learn from their actions. And of course this is just one of many possible paths.”, “ - Sure, there will be all sorts of stuff "added to the box" to make LLMs more useful for specific use cases, as is already being done - tool use, agentic scaffolding, specialized pre-training, etc. - but I don’t think any of this will get us to AGI or something capable of learning a human job and replacing them. The ability for lifelong learning by experimentation is fundamentally missing, and I doubt this can be added as a bolt-on accessory. It seems we really need to replace gradient descent and pre-training with a different, more brain-like architecture capable of continual learning.”, “ - @@benbridgwater6479 Yes, agree with that. Anything that doesn’t resemble the human brain will not bring us to AGI. While LLMs are very impressive and a great first step into a paradigm shift, they are ultimately a hack route to reach the current intelligence. There are still so many levels of reasoning missing even from the SOTA models like Claude 3.5 and GPT-4o. For me the roadmap to general intelligence is defined by the way it learns and not necessarily what a model outputs after pre-training it. To be more specific.. true AGI would be giving a model the same amount of data a human approximately gets exposed to in its lifetime and having it perform like a median human. Throwing the world’s data at it and scaling the parameters into billions / trillions.. although impressive.. is far away from AGI”, “A new bombshell has entered the villa”, “Great work, thank you.”, “1:31:09 - 1:31:32. “People confuse acting with planning”. “We shouldn’t leave toddlers alone with a loaded gun.” This is what frightens me: agent-based systems let loose in the wild without proper controls. A toddler AI exploring the world, picking up a loaded gun and pulling the trigger.”, “Introduction and Initial Thoughts on Reasoning (00:00)
The Manhole Cover Question and Memorization vs. Reasoning (00:00:39)
Using Large Language Models in Reasoning and Planning (00:01:43)
The Limitations of Large Language Models (00:03:29)
Distinguishing Style from Correctness (00:06:30)
Natural Language vs. Formal Languages (00:10:40)
Debunking Claims of Emergent Reasoning in LLMs (00:11:53)
Planning Capabilities and the Plan Bench Paper (00:15:22)
The Role of Creativity in LLMs and AI (00:32:37)
LLMs in Ideation and Verification (00:38:41)
Differentiating Tacit and Explicit Knowledge Tasks (00:54:47)
End-to-End Predictive Models and Verification (01:02:03)
Chain of Thought and Its Limitations (01:08:27)
Comparing Generalist Systems and Agentic Systems (01:29:35)
LLM Modulo Framework and Its Applications (01:34:03)
Final Thoughts and Advice for Researchers (01:35:02)
Closing Remarks (01:40:07)”, “His analogy of GPT to learning a second language makes 100% sense to me.
I’m a nonnative speaker of English; yet I mastered it through grammar first and adding rules and exceptions throughout the years.
Also, concepts were not the issue; but conveying those concepts was initially very challenging. 🇲🇽🇺🇸”, “Very good”, “Sara Hooker said the same about us not fully understanding what is used in training - the low-frequency data and memorization of those being interpreted as generalization or reasoning. Good interview.”, “Hi! Brilliant video! Much to think about after listening to hyper scalers for weeks. One request, can you please cut down on the clickbait titles? I know you said it’s for the YT algo, but if I want to share this video with, say, PhD, MS or profs, no one takes a new channel seriously with titles like this one (it just feels clickbaity for a genuinely good video). Let the content speak for itself. Thanks!”, “ - I am really sorry about this, we will change it to something more academic when the views settle down. I’ve just accepted it as a fact of youtube at this point. We still use a nice thumbnail photo without garish titles (which I personally find more egregious)”, “ - @@MachineLearningStreetTalk Thanks for understanding! 😁”, “if the title says “people don’t reason” many viewers think it makes the strong claim “ALL people don’t reason”, when it is actually making the weaker claim “SOME people don’t reason”. that title is factually defensible but misleading. one could be excused for interpreting this title to be claiming “ChatGPT doesn’t reason (at all)”, when it is actually claiming “ChatGPT doesn’t reason (very well)”.

One of the beauties of human language is that the meaning of an utterance derived by the listener depends as much on the deserialization algorithm used by the listener as on the serialization algorithm employed by the speaker. the YouTube algorithm chose this title because the algorithm “knows” that many viewers assume the stronger claim.

nonetheless, be that as it may, this was a wonderful interview. many gems of insight on multiple levels; including historical, which I enjoyed. I especially liked your displaying the title page of an article that was mentioned. looking forward to someone publishing “Alpha reasoning: no tokens required”.

I would watch again.”, “ - Professor Kambhampati is making the stronger claim that LLMs do not reason at all.”, “ - @@阳明子 1:20:26 "LLMs are great idea generators", which is such an important part of reasoning, he says, that Ramanujan was great largely because he excelled at the ideation phase of reasoning. 16:30 he notes that ChatGPT 4.0 was scored at 30% on a planning task.
1:23:15 he says that LLMs are good for style critiques, therefore for reasoning about matters of style, LLMs can do both ideation and verification.”, “ - @@markplutowski 3:14 "I think the large language models, they are trained essentially in this autoregressive fashion to be able to complete the next word, you know, guess the next word. These are essentially n-gram models."rnrn11:32 Reasoning VS Retrievalrnrn17:30 Changing predicate names in the block problem completeley confuses the LLMsrnrn32:53 "So despite what the tenor of our conversation until now, I actually think LLMs are brilliant. It’s just the brilliant for what they can do. And just I don’t complain that they can’t do reason, use them for what they are good at, which is unconstrained idea generation."”, “ - @@u9633u660eu5b50 Ok, I see it now. I originally misinterpreted his use of a double-negative there where he says "And just I don’t complain that they can’t do reason". nnThat said, he contradicts himself by admitting that they can do a very limited type of reasoning (about matters of style), and are weakly capable of planning (which is considered by many as a type of reasoning, although he seems to disagree with that), and can be used for an important component of reasoning (ideation). nnBut yeah, I see now that you are correct u2014 even though there are these contradictions he is indeed claiming "that they can’t do reason".”, “Outstanding marketing u2764u2764u2764”, “Brilliant, thanks Professor Kambhampati!!!”, “Great conversation. I disagree that LLMs are good for idea generation. In my experience, they’re good at replaying ideas back to you that are largely derivative (based on the data they’ve been trained over). The truly ‘inductive leaps’ as the Professor put it, aren’t there in my interaction with LLMs. I use them as a workhorse for doing grunt work with ideas I propose and even then I find them lacking in attention to detail. There’s a very narrow range they can work reliably in, and once you go outside that range, they hallucinate or provide sub-standard (compared to human) responses.nnI think the idea that we’re co-creating with LLMs is an interesting one that most people haven’t considered - there’s a kind of symbiosis where we use the model and build artefacts that future models are then trained on. This feedback loop across how we use LLMs as tools is interesting. That’s the way they currently improve. It’s a symbiotic relationship - but humans are currently providing the majority of the "intelligence", if not all of it, in this process.”, “ - What a fantastic and succinct response! My experience has been _exactly_ the same.”, “ - That’s exactly it, they start to really act as good creative partner at Nth iteration after explaining to it back and forth by giving feedback, but once it gets the hang of it, really acts like a student wanting get good score from a teacher :)”, “ - We need an entirely new model for AI to achieve true reasoning capability.”, “I have to stop to ask if ‘150% accuracy’ is an actual thing in LLM/GPT? Or other weird number things that I’ll have to go read. 
Keywords?”, “ - I think it was more a passing observation that measuring accuracy in LLM responses in the way we do for numeric models is kind of not possible.”, “ - I think he was just being facetious about unwarranted trend extrapolation”, “The reasoning abilities of the publicly available chatbots by themselves isn’t great, but av.humans can’t reason that well either - I’d say the chatbots are still better than 90% of humans on any topic ud83dude00 (If you picked random human off the street, they ramble even worse than the chatbots do). I note that connecting Gemini to a symbolic reasoner suddenly produces international mathematical olympiad results on at least some classes of problems (with caveats), as per the DeepMind results. Even the prediction markets didn’t see that happening so soon.”, “If the foundational knowledge of the AI is based on everything humans put on the internet, isn’t it kinda like the darkest shadow of humanity imaginable? I think there is some kind of dissonance between real life and the internet in general which shows in how the AI transforms the knowledge and almost in a creepy way has gone through the uncanny valley but still isn’t quite up the slope. I really wonder how all of this will affect the further evolution and use of AI and the implications on the people for which it will be used (e.g. in centralized systems for electric cars and similar stuff)”, “ - Most books and newspapers are also on the internet but yes most of the internet is absolute trash”, “Claude just deciphered a random biography, in rotation cipher, for me. All I told him was that it was a Caesar cipher and then gave him the text. I didn’t tell him how many letters it was shifted or rotated by and I didn’t use rot13. I tried it three times with three different shift values and it translated it perfectly each time. There’s no way that Claude has memorized every single piece of information on the internet in cipher form. Don’t know if it’s "reasoning" but it is certainly applying some procedure to translate this that is more than just memorization or retrieval. ChatGPT also did it but it had some errors.nnInstead of criticizing other scientists for being fooled and not being analytical enough maybe you should check your own biases.nnI have found it true that it can’t do logic when a similar logic problem was not in its training data but it definitely can generalize even when very different words are used.”, “ - @@quasarsupernova9643 That’s a perfectly reasonable statement. However my comment was that I tested it and it did generalize and use logic to solve a cipher that the speaker just said it could not do unless it had memorized it, which is impossible if you think about it. The amount of information contained in these models is infinitesimal compared to that used their training data. The idea that it can explain all jokes simply because it read a website explaining them is so simplistic compared to the way that LLMs operate that it’s absurd. Or that it can translate a particular cipher because it read a website containing every single possible word in English translated into cypher using ROT13. So I tested it specifically not using ROT 13 and using an obscure long biography with lots of non-English names etc. and it had no problem, not only identifying the specific shift used in the cipher, but then applying it.”, “ - It could also be that they used synthetic data to train the model specific for this task. For this specific task creating synthetic data is trivial. 
Unfortunately no of the major players reveal the training data they use so it’s hard to know when a model truly generalizes. That said, I tested the transitive closure task and using completely random strings as objects it nailed it with ease. So at least it has learned a template to solve unseen problems, which I consider at least a weak form of reasoning.”, “I know you have to play the youtube game but I am always very sad seeing titles like these even if they work. I usually don’t click on videos with these kinds of titles at all because it will cause youtube to trash my feed with garbage. So I end up having to watch in a private window where I don’t have premium and so on and so forth. It makes the barrier to actually watch way higher at least for me. I am probably in the minority but I still want to make it known that people like me exist and that you loose them doing this.”, “The most likely response to logical questions is logical answers.”, “Those people who are assuming the AGI is going to be achieved have never done long-term work inside any of the major GPT systems if you want to have a quick and dirty test, tell it to create you a fiction book first make 15 chapters and 10 sections with each chapter And then have it start writing that book look at it in detail and you will see section after section it loses sight of essentially every detail. It does a better job if you are working inside the universe, another author has already made and does the worst job if you were creating a brand new universe, even if you have it define the universe.”, “ - sounds like youre bad at prompting”, “ - @@mattwesney lol it does, doesnu2019t it, but you havenu2019t tried it and I have.”, “0:18 When I was 4 years old, I was often stuck at my parents work. The only thing for me to do that was entertaining, was play with calculators or adding machines. I memorized the times table, because I played with calculators a lot. My parents would spend $8 at the drug store to keep me from asking why is the sky blue and other pertinent questions. I was offered to skip first grade by after kindergarten, and my parents said no. Jeff Bezos is the same age from me, and also from Houston. His parents said yes to skipping first grade. I told my parents this forever until they died.”, “There is an abundance of informed speculation, which is fun to hear if nothing else.nnBut just as with any other form of research, there is only so much you can say that’s useful before you ran the experiment and see what it happens.nnI guess you can’t really do a two hour podcast if the guest just say "I dunno." ud83eudd37u200du2642ufe0f”, “This aligns nicely with the work Fabrice Bellard has been doing using Transformers to achieve SOTA lossless compression in his NNCP algorithm.nnCoincidence .. I think not!”, “i love MLST but what’s with the cringey clickbait titles lately”, “"Why are manhole covers round?" "It’s not that they are round. It’s that they are circles. They keep the demons in… Or, … maybe, out."”, “Could an LLM for example figure out whether a request requires a planner, a math engine etc, transform the request into the appropriate format, use the appropriate tool, and then transform the results for the user? I think that LLMs provide a good combination of UI and knowledge base. I was suspicious myself that in the web data they may well have seen joke explanations, movie reviews, etc etc and can lean on that. 
I think that LLMs can do better, but it requires memory and a feedback loop in the same way that embodied creatures have.”, “What on earth is this guy talking about. Completely disagree on his points on reasoning. The word "reasoning" here is wielded like it’s something special when it’s basically just an activity that’s a mix of retrieval and goal-oriented pattern matching, which LLMs in fact do pretty well. OpenAI came up with the terminology for it too - "Chain of thought prompting".”, “ - The manhole example is one where it’s not pattern matching? He’s claiming that everyone else is just retrieval but the first few had to create the reasoning”, “ - @@rv7591 LLMs (As their name suggests) are trained on text, not video or images. To understand why a manhole cover is round (because it wouldn’t fall through the hole) requires a geometrical and visual understanding of 3D and the environment - none of which is in the training dataset for LLMs. But it doesn’t mean that the LLM cannot reason. Reasoning can also be performed from sparse textual information. I won’t put the examples here, there are tonnes of ChatGPT/GPT4/Claude examples out there for "chain of thought prompting". If this guy was a bit more specific with the LLM’s shortcoming on 3D understanding rather than saying "LLMs cannot reason", I wouldn’t have had a problem. And by the way, just to extrapolate a bit further from there. There’s nothing that suggests that if you train a transformer (the same underlying tech) on video, that it can’t derive that same visual understanding and then be able to reason on top of that. It could very well work - as you’ve seen a lot of the recent video generation models have a pretty good understanding of physics and the way objects interact with each other. It’s probably insufficient compute (for video training) thats holding back the transformers from getting a human-level understanding of its environment. My point is - transformers and RHLF is capable of generating reasoning capability. This is a fact.”, “ - I think his definition of reasoning is more related to formal logic, it doesn’t seem from the video like he’s avoiding a definition or regarding it as special. He even gave examples of where LLMs can’t follow through logic problems, did you watch the full video?”, “ - Reasoning = ability to create and use deductive closure. I.e. ability to use what you know to deduce things that follow from that, rather than needing to be taught these deductive facts explicitly”, “He is soooo wrong”, “Thank you. He was my favorite guest that I watched here so far. I learned a lot.”, “Another mathematician confused by language abilities that donu2019t revolve around mathematics or mathematical proofs or formal systems - how about you frickinu2019 train it on all of those systems and then get back to me - ridiculous - - i love the skepticism but i feel if you only use LLMu2019s for math and reasoning you donu2019t fully understand how they work - yes they lack core physical metaphors because they donu2019t have bodies - that accounts for the need for excessive training data - bc you have no physical analogical truth”, “ - Note that means getting proper tokenizatkion for all math nomenclature - not natural language”, “Someday a hacker network with the help of the tool AI will craeate a huge ammount of nonsense data and spill this out on the WWW. 
This will ruin all LLMs and make them completely useless xD”, “No”, “Second bullshit video in a short timespan.”, “if somebody thinks it does, he probably knows nothing about ChatGPT.”, “contrary to his claims gpt4o & sonnet do generalize to different cipher shifts.
gpt4o:
It looks like the text "lm M pszi csy" might be encoded using a simple substitution cipher, such as the Caesar cipher. This type of cipher shifts the letters of the alphabet by a fixed number of positions.

To decode it, let’s try different shifts and see if any of them make sense.

For example, if we shift each letter by 4 positions backward (a common shift in Caesar ciphers):

- l -> h
- m -> i
- M -> I
- p -> l
- s -> o
- z -> v
- i -> e
- c -> y
- s -> o
- y -> u

So, "lm M pszi csy" becomes "hi I love you."

This decoded message makes sense as a simple phrase. If you have any additional context or need further decoding, please let me know!”, “ - Great, you got one sample. Now run it a hundred times each for different shifts and report back.”, “It’s weird to see this level of cognitive dissonance with brilliant researchers. What is reasoning but syntactic transformations, really? The fact that training on code enabled chain-of-thought output is all the evidence we need that these models can reason; current training methods are just ignorant of this, and most importantly data engineering at scale must become a thing. To me it’s a miracle that we can train on raw data and get these capabilities. Imagine going through school with no context, no homework, no tutoring. So next-token prediction can learn to reason imo. The data is just too oblique. Hybrid data is the future.

The answer is clear. A dataset that’s fully informed by Curry-Howard theory is the solution, along with improving PE to solve the reversal curse. So to say these models can’t reason is really just a game of semantics. Like what I did there 😂. Bars.

My intuition tells me whatever AGI will be, it will be based on a neurosymbolic architecture.”, “I memorized all the knowledge of humans. I can’t reason but I know everything humans have ever put online. Am I useful? Provide reason.”, “ - What proportion of “all the knowledge of humans” do current models have access to?”, “ - @@fburton8 all of it, well everything on the open Internet so most books, most poetry, lots of art, papers, code, etc.”, “ - Extremely useful. Ask any engineer who’s addicted to ChatGPT / Copilot / OpenAI API at the moment for their daily workflows.”, “ - but i make up 20% of my answers - can you tell which ?”, “ - @@malakiblunt So do you… sometimes…”, “People say things based on statistics as well. Our brains calculate averages in a special way. I will not disclose them though :)”, “this is the saddest thing i watched today”, “Very convincing arguments. Haven’t heard it laid out this succinctly and comprehensively yet. I’m sure Yann LeCun would be in the same camp, but I recall not being persuaded by LeCun’s arguments when he made them on Lex Fridman”, “ - Basically there’s a whole bunch of "scientists and researchers" that don’t like to admit the AGI battle is being won (slowly but surely) by the tech bros led by Ilya and Amodei.
AI is a 50-year old field dominated in the past by old men, and is now going through recent breakthroughs made by 30 year olds, so don’t be surprised that there’s a whole lot of ego at play to douse cold water on significant achievements.”, “ - Do you have a response to Kambhampati’s refutation of the Sparks of AGI claim? @edzehoo”, “ - @@edzehooI can see this being true, but writing off their points is equally defensive/egotistical”, “ - u200b@@edzehoo Yep.”, “ - @@edzehoo Ilya and Amodei are 37 and 41 respectively, I wouldn’t call them "young", per se. Research on AI in academia is getting outpaced by industry, and only capital rivalling industry can generate the resources necessary to train the largest of models, but academics young and old are continuously outputting content of higher quality than most industry research departments. It’s not just ego, it’s knowing when something is real and when it is smoke and mirrors.”, “um, we have known this since the release of chatgpt. it is an LLM, and by definition does not understand a word it is saying. that’s why i hate when people call it AI, it is not AI. LLM’;s have brought no closer to AI and we are still decades away or imo we never achieve it. what u thought all the tech guys were telling the truth for once.?”, “ - That is because you probably think humans are not just a biological machine.”, “llmS answer never before seen prompts with never before seen texts.”, “ - That doesnu2019t mean they reason.”, “ - Just because there are patterns of reasoning in the training data doesn’t mean the LLM is actually doing any. It is deceptive, if you really want to go there, you need to ask yourself how original your query is. Like his example of explaining jokes, even asking an LLM to explain a brand new joke will probably yield a decent result without any reasoning going on. Because web sites that explain jokes exist.”, “These ML query engines are socialpaths, they only respond what way the wind blows”, “don’t mind me - just postin cool quotes - "for natural language, the world is the interpreter", "why are manhole covers round [wtf microsoft]"”, “There are a lot of people in this comment section who probably can’t even do basic sums, let alone understand how a large language model works. And yet, they are very happy to criticize.nnWe are indeed living in an Idiocracy.”, “Disappointing to see so much cope from the LLM fans in the comments. Expected, but still sad.”, “ - They are rather AGI hopefuls. Who’s not a fan of LLMs including the Prof ;)”, “ - @@yeleti yeah, I think they are very interesting / useful but I doubt they will get to AGI with scaling alone.”, “great interview !”, “I’ve stopped using LLMs. I used to be a huge fan but realised it only reflects back your own reasoning! It’s like a parlour trick that appears to be a bit like reasoning, but isn’t. LLM is mostly useless, for me.”, “ - People incapable of finding use for such a broad-spectrum technology seem more useless to me.”, “Yea this man is brilliant. 
I could just listen to him all day.”, “I understand that it’s not entirely accurate, and doesn’t apply to all machine learning tech, but I love the expression "stochastic parrot" - it strikes my funny bone for some reason….
But the good professor prompted a couple of questions, one being: does Google Translate sound like ESL (English as a 2nd language) to native speakers of other languages, that is, are some languages a little stilted or "foreign sounding", or does Google Translate sound like a native speaker in every language it translates to?
And the second question is, what’s so amazing about GPTx answering all the questions on standardized exams? The LLM has all the internet data including the correct answers to the exams! It’s like having the ultimate cheat sheet….”, “GPT is like a reactive mega-Wikipedia….”, “I love Rao’s work and he’s funny too.”, “LLMs definitely can do transitive closure. Not sure why the guest stated otherwise. I tried it out with completely random strings as object names and Claude could do it easily. So it’s not just retrieving information.”, “ - this is supposed to be science. i hate it so much when people who call themselves researchers do not really care for thoroughness, or even straight out lie. don’t let them get away with it.”, “ - It struggles with it if you create something it’s never seen before. It’s a valid point on his part.”, “ - transitive closures can be done from memory. It’s been shown these models perform badly with novel data, so he still has a point”, “ - And it was also able to do a rotation cipher of any arbitrary length when I just tested it. There are definite limitations but what they can do is far more complex than simply repeating what’s in the training data. I made a separate post but I just wanted to add on here that it can also do other things that he specifically said it can’t.”, “ - Ok, everything they speak is guesswork, and it will be so until deep learning is there.”, “https://docs.google.com/document/d/1rviWjHiICa2uOko-7oJLGrp0r2GQuxeMJpW-9A-FroI/edit?usp=sharing … CHATGPT DOESN’T REASON! (Top scientist bombshell)
https://youtu.be/y1WnHpedi2A?si=q9OH6n-5xcW_0qEZ
how is reasoning connected to sefirot?

Reasoning and the Sefirot
Reasoning is a complex cognitive process involving the ability to think logically, analyze information, and draw conclusions. In the context of the Sefirot, it can be linked to several of the ten attributes:
Binah (Understanding): This Sefirah is directly connected to understanding and comprehension, which are essential components of reasoning.
Chokhmah (Wisdom): While often translated as "wisdom," it also implies the ability to discern patterns and make connections, essential for logical reasoning.
Da’at (Knowledge): Although not a Sefirah in the traditional sense, Da’at is often considered the "11th Sefirah," representing the synthesis of Chokhmah and Binah. It can be linked to the application of knowledge in reasoning.
Tiferet (Beauty): This Sefirah can be associated with the aesthetic and harmonious aspects of reasoning, such as finding elegant solutions or recognizing patterns.
Reasoning as a Dynamic Process:
Reasoning is not a static process but a dynamic one involving multiple steps:
Data Gathering: Collecting information relevant to the problem (Binah).
Analysis: Breaking down the information into components and identifying relationships (Chokhmah).
Synthesis: Combining information to form new insights or conclusions (Da’at).
Evaluation: Assessing the validity and usefulness of the conclusions (Gevurah).
Decision Making: Choosing the most appropriate course of action based on the reasoning process (Tiferet).
Reasoning and the Human Experience:
Reasoning is essential for human beings to navigate the world and make decisions. It is influenced by our emotions, experiences, and cultural background, which can be …”, “This is a great interview, should be required viewing for all data science students. "Extraordinary claims require extraordinary evidence" was an expression oft stated by Carl Sagan, and one that I believe is central to scientific inquiry. It seems that the anthropomorphic claims for LLMs completely disregard this aphorism, as I have seen no one actually produce evidence that reasoning, intelligence or even creativity is actually taking place other than the output they produce. Even Anthropic’s "features" may be more just a by-product of next-word prediction. Our brains are actually built, in part, to anthropomorphize, just as they can discern faces with much greater fidelity than other objects, and this seems to lead otherwise competent scientists to disregard first principles (well, that and the desire to raise venture capital 😉).”, “Wow, incredible episode”, “Language doesn’t equal reason. Language is the transmission of thoughts and meaning. You can say a bunch of words that mean something separately but together have no reason or logic. It’s a large language model, not a T-1000”, “So far great podcast, but I really dislike the clickbait. Also, LLMs clearly do compression and interpolation. That may not be the holy grail of reasoning, but it’s at least a weak form of it. There are many examples of original puzzles that LLMs can solve despite not having seen them before. And even if they just retrieve programs, it would be unfair not to call this reasoning, as this is what humans also do most of the time when reasoning. Imho.”, “ - Sorry, blame the YouTube algorithm. It’s named normally on the podcast version!”, “i love it when the critical comments are way more intelligent than the original content :-D”, “when people say reasoning, it means using previously learned thinking patterns to process the information; most people do not even seriously think about what reasoning means. Reasoning means patterns. AI is perfect for pattern discovery and mimicking. Following thinking patterns is indeed what all reasoning is about. So, AI can do reasoning.”, “ - @deter3 Bingo. It was only a few years ago that AI couldn’t do the kind of reasoning that it’s doing today.
And now the goalposts shift again, "well that’s good and all but it’s not really thinking because it can’t do this one thing humans can do."”, “ - What you are talking about is machine learning, not reasoning.”, “Have we yet defined intelligence sufficiently well that we can appraise it and identify it hallmarks in machines?”, “ - I think if we qualify the definition of intelligence as including reasoning, then yes.nI’d rather use the term sentience - now artificial sentience…that would be something!”, “ - @@johan.j.bergman Sure, but that’s a bit like saying that we don’t need to understand aerodynamics or lift to evaluate airplanes, and can just judge them on their utility and ability to fly … which isn’t entirely unreasonable if you are ok leaving airplane design up to chance and just stumbling across better working ones once in a while (much as the transformer architecture was really a bit of an accidental discovery as far as intelligence goes).nnHowever, if we want to actively pursue AGI and more intelligent systems, then it really is necessary to understand intelligence (which will provide a definition) so that we can actively design it in and improve upon it. I think there is actually quite a core of agreement among many people as what the basis of intelligence is - just no consensus on a pithy definition.”, “ - @@johan.j.bergman A spatula serves a helpful purpose that no other cooking tool is able to replace in my kitchen, so I find it incredibly useful. Turns out they are rather mass market too. Should I call my spatula intelligent?”, “ - no”, “ - The ability to reason”, “20 mins in. Loving this.”, “Humans don’t reason”, “I love this episode. In science, it’s never about what can be done or what happen in the system, but it’s always about mechanism that lead to the event (how the event happen basically). What is severely missing from all the LLMs talk today is the talk about underlying mechanism. The work on mechanism is the key piece that will move all of these deep neural network works from engineering feat to actual science. To know mechanism, is to know causality.”, “ - …yet they often talk about LLM mechanism as a "black box", to some extent insoluble…”, “great shirt”, “Next word prediction is the objective function, but it isn’t what the model learns. We don’t know what the learned function is, but I can guarantee you it isn’t log-odds.”, “ - croissants .vs. yogurt”, “ - At the end of the day, the transformer is just a kind of modern Hopfield network. It stores patterns, it retrieves patterns. It’s the chinese room argument all over again.”, “ - @@Lolleka nNot really.nnYou can point to rules and say "rules can’t be intelligent or reason"nnBut when it is the NN that makes those rules, and the humans in the loop are not certain enough what they are to prevent hallucination or prevent the alignment problem then that is not the chinese room anymore.”, “ - Research around mechanistic interpretability is starting to show that TLLMs tend to learn some causal circuits and some memorization circuits (I.e., grokking). So they are able to learn some reasoning algorithms but thereu2019s no guarantee of it. Plus, sequence modeling is weak on some kinds of graph algorithms necessary for certain classes of logical reasoning algorithms”, “ - @@memegazer not hotdog, hotdog!”, “I’ve read some of Prof Subbarao’s work from ASU. Excited for this interview.”, “There’s a difference between in-distribution reasoning and out-of-distribution reasoning. 
If you can make the distribution powerful enough, you can still advance research with neural models.”, “ - Absolutely true. As an example I tested its ability to do rotation ciphers myself and it performed flawlessly. Obviously the reasoning and logic to do these translations was added to its training data since that paper was released.”, “ - Easy, itu2019s all about prompting. Try this prompt with the Planbench test: Base on methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions this is very important that you avoid unfounded assumptions, and base your reasoning directly on what you read/ see word for word rather than relying on training data which could introduce bias, Always prioritize explicitly stated information over deductionsnBe cautious of overthinking or adding unnecessary complexity to problemsnQuestion initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally.nre-check the reasoning against each piece of information before concluding.”, “Opening bit is interesting. Been wondering about this in the context of IQ tests. Is there a parellel between training machines against known samples/problems in benchmarks so they can pass benchmarks well enough and folks studying known types of problems to pass IQ test with high enough scores? Does studying/training for IQ tests make us significantly smarter on unseen problems, not at all or a bit but not as much as we would think when looking at the result of an IQ test?”, “I agree but Claude 3.5 does! ;)”, “ - no, it doesn’t.”, “ - u200b@@vpn740It’s a joke.”, “everyone is mad at chatgpt for being smarter than them.”, “ - Gpt chat isn’t I still have ask it the question”, “ - @@sadface7457 Well you still have to ask a person a question to. Because they can’t read your mind. Except maybe they can with AI.”, “ - u200b@@itzhexen0You clearly don’t understand agency or entity”, “ - @@davidsmind As person who is a person. I think I understand agency.”, “ - @@davidsmind If ChatGPT was a human it wouldn’t be a whole human. So it’s obviously stupid for you to compare it to one.”, “I’ve worked with people who don’t reason either. They exhibit the kind of shallow non-thinking that ChatGPT engages in.”, “ - ayup. keywords provoke response without understanding.”, “ - especially CEOs”, “ - Why are you talking about me in public????”, “ - ud83dude0e”, “ - LOL - I imagine we all visit this probability space occasionally… ;*[}”, “ud83dude4c”
]
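
Several commenters above report probing chatbots with Caesar/rotation ciphers at shifts other than ROT13 (the "lm M pszi csy" example decodes with a shift of 4). Below is a minimal sketch of how such a probe can be generated and scored locally; the plaintext, the chosen shift values, and the word-level scoring are illustrative assumptions, and the step of actually querying a chatbot is left as a placeholder.

```python
import random


def caesar_shift(text: str, shift: int) -> str:
    """Shift alphabetic characters by `shift` positions, preserving case."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)


def make_probe(plaintext: str, shift: int) -> str:
    """Build a prompt that names the cipher family but not the shift value."""
    ciphertext = caesar_shift(plaintext, shift)
    return ("The following text was encoded with a Caesar (rotation) cipher "
            "using an unstated shift (not 13). Recover the original text:\n"
            + ciphertext)


def score(reply: str, plaintext: str) -> float:
    """Crude check: fraction of plaintext words recovered exactly, in order."""
    want = plaintext.lower().split()
    got = reply.lower().split()
    hits = sum(1 for a, b in zip(want, got) if a == b)
    return hits / max(len(want), 1)


if __name__ == "__main__":
    random.seed(0)
    plaintext = "the professor argues that retrieval is not the same as reasoning"
    shifts = random.sample([s for s in range(1, 26) if s != 13], 5)
    for shift in shifts:
        print(f"--- shift={shift} ---")
        print(make_probe(plaintext, shift))
        # Paste the prompt into the chatbot under test, then score its reply:
        # print(score(reply_text, plaintext))
```

Running the same plaintext through several shifts, as one reply above suggests, is what separates a single lucky sample from consistent generalization.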
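Likewise, for the transitive-closure debate in the comments (testing with completely random strings as object names), a small script can state only direct facts and ask about a pair that is implied but never given, so the queried pair cannot have been seen verbatim in training. The relation wording, chain length, and nonsense-token scheme are assumptions for illustration.

```python
import random
import string


def random_name(k: int = 5) -> str:
    """Nonsense token, so the specific pair cannot appear in web training data."""
    return ''.join(random.choices(string.ascii_lowercase, k=k))


def transitive_closure(edges):
    """All (a, c) pairs implied by the given edges, via a fixed-point loop."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


if __name__ == "__main__":
    random.seed(1)
    names = [random_name() for _ in range(6)]
    # State only a chain of direct facts: names[0] > names[1] > ... > names[5].
    facts = {(names[i], names[i + 1]) for i in range(len(names) - 1)}
    implied = transitive_closure(facts) - facts  # true, but never stated

    lines = [f"{a} is older than {b}." for a, b in sorted(facts)]
    random.shuffle(lines)                         # avoid giving away the chain order
    a, c = random.choice(sorted(implied))
    lines.append(f"Question: is {a} older than {c}? Answer yes or no.")
    print('\n'.join(lines))
    print("expected answer: yes")                 # ground truth for scoring the reply
```

Because the names are freshly generated nonsense, a correct "yes" cannot come from retrieval of that exact pair, though, as one commenter notes, it may still come from a learned template; repeating the trial with new names, shuffled facts, and different held-out pairs gives a rough sense of which it is.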