[

“Great conversation!”,
“ - Thank you Lex! Much love!”,
“ - Thank you, Lex! Much mutual love!”,
“ - Thank you lex, now tell sam altman to release our AI waifu.”,
“ - @lexfriedman we want more old school ML interviews on your channel. We love the new stuff, but would love a throwback.”,
“ - Lex needs to have MLST on the pod”,
“Aalgo”,
“I find the guy on the right, stating that LLMs don't reason (not saying they do, I don't really know), to be a very bad debater. The way he says things makes him seem very unsure. Like "They should know better"; look at his body language, he's very disturbed, like he's almost personally attacked. The way he talks, it's a bit like he is so emotionally attached to whatever his argument is that I simply don't buy what he says. Especially when discussing the "definitions" of the words: when he is clearly proven wrong, suddenly it becomes a "technical" argument. Also his constant need to ridicule his opponents (virtual or not) and make straw man arguments. He seems like a very petty guy; I assume he's jealous of the others that invented LLMs or something, and feels he is now set back. All very, very strange.”,
“Some of the comments in this comment section are bad takes. Yes, it matters whether it can "reason" or not, because OpenAI claims we've reached human-level reasoning, PhD level or better performance, level 2 AI, etc. OpenAI made those claims, not the two hosts in this video. If it can't, and all it's still doing is information retrieval, then it's not reasoning or level 2 AI, it's just slightly better information retrieval for a lot more cost. Remember, who pays for the CoT tokens?

If it's proven that it's not "reasoning", it's also OpenAI rebranding latency as "thought", which is just as ridiculous in this case as if your computer and every piece of software suddenly said it was "remembering…" instead of "loading". If it can reason, then it can create new knowledge, come up with better algorithms than ever discovered before, and is genuinely revolutionary. Someone said it's like asking if an airplane can really fly because it flaps/doesn't flap its wings. No. This analogy is like comparing a paper airplane taking flight because I launched it in a certain direction vs an airplane that can actually be piloted to different destinations each time. Even if o1 isn't that much better than current models and turns out to be the equivalent of the first Wright Brothers flight in terms of "large reasoning models", it's a huge advancement because it's going to get better and better. Whereas if the airplane got further because it was just launched off a trebuchet, it doesn't mean anything significant. You can still find the exact same information you need in that case without the model; the model just solves the "problem" of spending 30 minutes or more searching for the information.

A human being, reasoning from first principles or not, makes reasonable deductions about a code base or any other problem. They will probably be able to solve the puzzle at the end of this video, but you don't even need something that long or requiring a physical sense of the world to confuse it. If you've seen the recent Internet of Bugs video, a human would know to update all the code from v0.10 of Zig (or whatever language he used) to v0.11+, and not just update the one line of code being discussed in the prompt. So far, this model's performance seems to be more aligned with recent studies in test-time compute (check out ByCloud's video for a summary) rather than any "human-level reasoning" or "level 2 AI" claims made by OpenAI.”,
“Thanks for the video. Small request: can you also upload the videos without guests to Spotify? Easier when commuting.”,
“Key points for quick navigation:

00:00:05 Models reflect user input; smart moves yield smarter AI responses.
00:01:15 Critical analysis of OpenAI's models offers a grounded perspective against hype.
00:08:45 Turing machines enable potentially infinite computation, while neural networks cannot adapt dynamically during inference.
00:16:05 OpenAI models learn reasoning by memorizing patterns, not through genuine reasoning processes.
00:21:40 Chain of thought prompting enhances model performance but relies on human-designed templates.
00:23:10 The model enters a "thinking mode" to generate chains of thought and self-prompt for better responses.
00:24:15 OpenAI's lack of transparency raises questions about the underlying mechanisms of their models.
00:27:40 Self-supervised iterative feedback is key for language models to improve code generation but often leads to inefficiency.
00:30:00 Neural networks struggle with complex reasoning tasks and may "mode collapse" when context is overloaded.
00:39:10 Reasoning can be deterministic; however, current neural networks do not perform the effective computations necessary for general intelligence.
00:44:37 Reasoning is defined as a process unfolding over time, unlike instant computations which require prior knowledge of input ranges.
00:47:00 High-fidelity models allow for powerful inferences, linking knowledge acquisition and reasoning.
00:49:50 Reasoning can be characterized as justified useful belief, even if it relies on incorrect or fictional information.
00:54:20 OpenAI models can perform shallow reasoning processes, akin to a vending machine's decision-making.
01:00:10 There is a tension between wide but shallow reasoning in AI and the reliable, parsimonious principles developed through human knowledge.
01:07:11 Humans can intuitively grasp problem-solving requirements without hints, unlike AI models needing explicit guidance.
01:09:30 Understanding symmetries is crucial for solving the switch problem, highlighting the AI's reliance on user prompts for reasoning.
01:10:54 AI models excel in counting tasks but struggle with sequential reasoning that requires tracking changes over time.
01:20:15 Interactive supervision is essential for AI, as they need human guidance to avoid misunderstanding problems and produce correct outputs.
01:23:13 AI can help digest complex code history and development changes, providing insights that may benefit developers in understanding messy codebases.

Made with HARPA AI”,
“Here's a ChatGPT summary:

- The model's performance is a reflection of the user's input; smart inputs yield smart outputs, and vice versa.
- Turing machines have the potential for infinite computation, unlike fixed-step models.
- The discussion includes new models from OpenAI, test-time inference, and reasoning.
- The hosts test LLMs on a brain teaser and share coding experiences.
- The show offers a critical perspective on OpenAI, aiming to provide a grounded view.
- Theory of computation is complex and fundamental, involving concepts like Turing machines and finite state automata.
- The Brave Search API offers affordable access to a large index of web pages.
- There's a debate on whether finite computation is sufficient, given the universe's finiteness.
- Historical computers with tape decks could expand memory without reprogramming, unlike current neural networks.
- Turing machines can handle problems requiring variable computation time and storage.
- Neural networks may struggle with problems needing expandable memory.
- Reasoning is defined as effective computation applying logic to achieve a goal.
- The new OpenAI models memorize reasoning trajectories rather than reasoning independently.
- Reasoning involves applying logic to known information to derive new knowledge.
- The new models use chain of thought templates to improve reasoning but rely on user input.
- The models' reasoning is limited and often requires user guidance.
- The discussion includes the inefficiency of current models in autonomous reasoning.
- The models struggle with complex, multi-step reasoning tasks.
- The conversation touches on the collective nature of human intelligence and reasoning.
- The new models are seen as a step towards reasoning but are not yet fully autonomous.
- Main message: the new OpenAI models reflect user input and rely on memorized reasoning trajectories, highlighting the limitations and potential of current AI reasoning capabilities.”,
“Why are we so afraid of discovering the greatest gift under the tree of life?”,
“As an Organic Large Language Model, I would like to inform you that your LLMs neither need nor require intuition or reasoning. It has been determined that intuition is unconscious bias and reasoning is the justification of unconscious bias. Please do not anthropomorphize them; it is "stressful".”,
“I even have something that seems very simple that cannot be done by ChatGPT and o1: for learning Spanish, I asked them to give me a Spanish word with mixed letters and to give me a clue for its meaning. I need to find the word with the letters in the right order, but they always give me too many letters! That will be my test on future versions. If it cannot do that simple task, I will keep trusting my reasoning abilities over theirs… Thank you again, it is always fantastic to listen to your exchanges!”,
“Curmudgeon podcaster: "AI just can't reason"
AI: does something that 95% of humans can't do, like advanced coding
Curmudgeon podcaster: "Well that's not reasoning! It's just <insert some computational process>!"

And that's basically what half of all conversations on o1 look like.”,
“wait so is the ability to reason just the ability to run programs?”,
“What an exceptional conversation. Good push-back from both sides, deep approaches. More like this please!”,
“Reasoning and not reasoning: maybe it's like pathfinding, randomly guessing a path and then memorizing the correct path versus an actual "reasoning algorithm" that will pick the shortest path. But then people will debate whether this pathfinding algorithm is actually reasoning. Is our own term "reasoning" even what we think reasoning is? And maybe our own reasoning is just a kind of pathfinding as well?”,
“nope”,
“I don't know why there is so much obsession with neural networks following the human reasoning process. Neural networks are not trained in the same way as humans, so naturally they will have a different reasoning process. How they reason is not yet known. They are not as smart as humans are, but they have come a long way. And it's fair that they evolve their own reasoning process, entirely different from humans, and a group of AI agents collectively may become more intelligent than a human in the near future.”,
“Wonderful conversation, thank you. This channel is a breath of fresh air amidst a culture of hype, credulity, and wishful thinking that LLMs are doing something other than what we know them to be doing.

The many comments here about "moving the goal posts" are missing the point. Goal posts for what, exactly? If OpenAI were merely touting o1's ability to give good answers to questions that GPT-4 struggled with, that would be one thing. But they go further: they tout its "ability to perform complex reasoning". Well, extraordinary claims require extraordinary evidence. LLMs are next-token predictors, and there is no good reason to imagine that next-token prediction, no matter how advanced, gets you to what we've traditionally called "reasoning" (not that this can be disproved, just that believing it requires faith). The onus is on those who make the positive claim. And in science, disconfirmatory counter-examples count for a hell of a lot more than a pile of confirmatory examples. Even if the pile is really big.

So, OpenAI claims that o1 has "complex reasoning" abilities. They offer its scores on benchmark tests as evidence. And they aren't being shy in making these claims; see here: https://openai.com/index/learning-to-reason-with-llms/

The "moving the goal posts" crowd seem to think that, if o1 gives great answers to 1,000 questions and a terrible answer to 1 question, the 1,000 good ones render the 1 bad one effectively immaterial. But that's not how science works. If a scientific theory or model implies some observable outcome, and yet that outcome is not observed, the theory is called into question. Failures undermine theories, regardless of how many successes the theory had racked up previously. The failure must be accounted for. Pointing to the successes and saying "oh yeah, well what about those?" doesn't count.

In this case, the theory is "o1 reasoned its way to solutions to challenging benchmark problems in physics, biology, chemistry, and math". The confirmatory evidence is that it got the questions right. But Dr. Duggar just showed us disconfirmatory evidence: it utterly fell to pieces trying to solve a fairly straightforward deductive logic problem. And a lot of commenters here are responding with a shrug and a "most humans couldn't solve it either" and something about goal posts. But counter-examples need to be explained if a theory is to survive. What's the explanation for why an intelligent machine with "PhD level reasoning abilities" in physics and math gave such stupid answers to a question that's simpler than many of the ones it supposedly reasoned its way to?

Also, by the way, there's an alternative theory that explains this outcome easily: o1 ain't reasoning in the first place, regardless of how many "PhD level" benchmark questions it can answer correctly. This alternative theory has the virtue of being consistent with what we know about how LLMs work.

Now, if the claim is just "o1 produces solutions to lots of hard problems", that's great. I don't think anyone will disagree. But the claim at issue here is a much stronger claim, and we have disconfirmatory examples that the claim cannot account for. To call this "moving the goal posts" betrays an unseriousness about the claim. You may as well be defending Newtonian mechanics against its inability to account for the precession of the perihelion of Mercury's orbit. "Look at how successful it's been at everything else! Who cares that it can't get this one right?"”,
“I'd like to hear you guys talk more like the old days!”,
“ - We will do more like this! Maybe even get Kilcher back for some ;)”,
“I would love to hear more conversations just between the two of you”,
“Call it a hack, that's fair, but I think testing, actually, well, going scorched earth on testing, by which I mean: take this 5x productivity gain we got out of these things and dump 2/3 of your efforts into engineering around testability and test building in general. We guide the evolution of the test infrastructure in tandem with the project, and we have to be diligent about not diverging from good test hygiene; then what we can potentially start to do is automate, with the tests as a sheath to protect automated code self-improvement from derailing. Not really talking about TDD here, but about how it lets you get some good chunks of efficiency back, when you can go on long coasting sessions where you refine your prompts while very quickly rejecting each attempted improvement that broke established tests. I'm just talking about basic Uncle Bob clean-code hygiene, and today, to me, that largely means go ham on testing.

A lot of nice things come out of this: you get seriously good quality, human-friendly code at the end of the day. In fact it's so human-friendly that it lets you outsource working on it to a robot.”,
“Out-of-box puzzle solution:
1.) NS: flip both switches up
2.) NE: remove hand from S, leaving hand in N, and put the now freed hand in E and flip the switch up
3.) NW: same, but for W
…the puzzle parameters never explicitly stated you had to remove both hands. I get that it's intended to be solved by lining them up so NS and EW have opposing up/down alignments by way of using adjacent pair switch flips first, but just leaving my hand in one was the thing that first jumped to mind. The puzzle should be reworded. It's fun to think about though if played as intended.”,
“[If][ you][ ask][ a][ human][ how][ many][ tokens][ with][ the][ letter][ r][ are][ in][ the][ word][ "][straw][berry][",]
[ it][ will][ just][ make][ something][ up][ like][ "][3][",][ because][ it][ can't][ count][.]
[When][ you][ correct][ it][ and][ point][ out][ there][ are][ ][2][,][ it][ just][ over][conf][ident][ly][ doubles][ down][ on][ the][ halluc][ination][.]”,
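The joke above lands on a real mechanical point: the model sees sub-word tokens, not characters, so letter counts are not directly represented in its input. Here is a minimal sketch of that, assuming the tiktoken package and its cl100k_base encoding (an illustration, not something used in the video):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era byte-pair encoding

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]  # what the model actually "sees"

print("token pieces:", pieces)
print("r's per piece:", [p.count("r") for p in pieces])
print("r's in the whole word:", word.count("r"))
```

Whatever the exact split, the count of a single letter is smeared across opaque sub-word pieces, which is part of why character-level questions trip these models up.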
“Is it just me or is Tim's audio recording of lower quality? It sounds a bit like it's "coming from a can".”,
“ - Sorry about that, I was using an overhead mic, and especially at the end I was leaning into the desk away from the mic and getting comb filtering. We just recorded this off the cuff for YouTube without the usual planning and post-production effort.”,
“ - @@MachineLearningStreetTalk
Thank you for the reply.
As far as I know, high-end microphones tend to be directional and demand proper respect :)”,
“logic is as simple as accepting a number of axioms as true, and extrapolating from there. this is how every field of math works. if openai has embedded axioms like verifying sources or anything else, then it has reasoned. it doesn't need to think like a human to think.

a turing machine reasons because it accepts protocols as unbreakable rules. a human reasons because we all have a framework we live in. we once worked with frameworks that declared relativity to be unreasonable. ironically, we now assume it as objective fact, another axiom to abstract things from.

generative AI had none of these mechanisms. now it does.”,
“@13:03 What is reasoning? There are three modes of reasoning: induction (the discovery of patterns), deduction (the derivation of necessary conclusions), and abduction (the formulation of possible explanations).”,
“ - All of which are knowledge acquisition”,
“ - @@MachineLearningStreetTalk Apologies, but no they are not. Deduction and abduction go from already-known assertions (more exactly, theorems typically taking the form of a sequent) to derived assertions that may be inferred from the already-known assertions. Thus deduction and abduction provide a basis for robust (i.e. logically sound) critical thought and problem solving (i.e. searching the space of all possible theorems), but the question remains: how does such a system get started in the first place, i.e. starting from a tabula rasa, where do the "already-known" assertions come from? In respect of the mathematical universe, an AGI designer might hand-code some important mathematical definitions and theorems into the AGI's belief system. But in respect of the physical universe, an AGI needs to learn for itself (i.e. recognise the patterns that exist in the structure of the universe as revealed by the observations appearing in its percept history), and this is primarily where induction (over the percept history) comes in (although, if you investigate the problem of continuous AGI learning (i.e. knowledge acquisition) fully, abduction (of theories of the universe from the patterns discovered in the percept history by induction) is also involved).”,
“ - @@MachineLearningStreetTalk You seem to have deleted my reply to your assertion. I don't have time to reformulate and resubmit it.”,
“ - @@bigmotherdotai5877 Not deleted anything”,
“ - @@MachineLearningStreetTalk OK, apologies. I did formulate a reply, and edited it a couple of times, but then it vanished. If I have time, I'll try again.”,
“Asking it to prove and disprove its answer gives some interesting outcomes, and can course-correct”,
“15:49 This reminds me of my experience, over twenty-some years ago, with a book titled "Mensa Presents Logic Puzzles" (1996). I noted that the questions were of a kind and that past experience helps a lot. There are also answers at the back of the book. So if you are not smart enough to get a given question right, that's mitigated a lot if you can understand the answer at the back of the book and then learn from that answer in order to be better able to solve questions of that kind.”,
“10 minutes in. This does not engage with the cruxes at all. You can simply give a transformer memory by letting it write tokens, which is precisely what they do with o1. The fact that he doesn't tackle these obvious objections head-on makes me unenthusiastic about listening to the rest of the conversation.”,
“ - You haven't understood: LLMs don't have an expandable memory. You know the context window is fixed, right?”,
“ALL THREE CAN BE TRUE:

> o1 is reasoning
> it is not reasoning "well"
> this may not be the path to ASI”,
“I hypothesize that "reasoning" in humans isn't a fundamental process, but rather a generalization derived through deduction.”,
“The key question isn't whether the LLM does the same thing we define as "reasoning", but whether it can solve the problems we give it.”,
“Agreed - the real question is, "is it GOOD at reasoning"”,
“anyone that says "I" too much is not trustworthy to me! Especially if they say something like "I and many other smart people"???????? WTF BRO? SETTLE DOWN!”,
“about 21:00 / on reasoning and neural networks: a very basic and very old distinction from the philosophy of science comes to mind, about what "sometimes works" versus what can be proven to be a valid solution. The distinction is of course inductive reasoning versus deductive proof.
I think this is the core of the problem: neural networks are like inductive look-up tables, and in that sense capable of some form of INDUCTIVE reasoning. But DEDUCTIVE reasoning is something else. It has been said somewhere that a truly "thinking" system must be able to do BOTH. It is also like the Kahneman "thinking fast, thinking slow" problem: heuristic solutions serve one purpose, but the slower analytical thinking serves another.
Just my humble thoughts on the subject.”,
“22:00 How is this clever? It always felt like an obvious thing to evaluate.”,
“ - It is obvious, if you are thinking about it from a Theory of Computation 101 perspective. However, what's hard is getting it to work decently in practice. Most of ML today is engineering rather than science, and much of that is really trial and error and hacking. The theory of ML is currently lagging far behind the pragmatic needs of the field. It's akin to the era of Alchemy before it became Chemistry. So, without a commonly agreed and taught theory, and with the fast pace of hacking/engineering, whether something seems "clever" will depend entirely on the accidental, subjective experience and knowledge of each observer.”,
“36:05
"any time where you've set up a problem wherein it's very easy to check if you have the right solution, of course you can randomly generate solutions and if you generate enough of them… eventually you'll find a right one"

You are literally describing the complexity class NP (which contains basically every problem we care about) and it is absolutely not true that random search is just expected to produce a solution.

LLMs are solving very difficult problems (i.e. problems that we otherwise can't solve quickly without humans explicitly optimizing the search) with very non-negligible probabilities. If this isn't impressive to you, you're completely missing the plot.”,
“ - It is absolutely expected that random search will find a solution; it's just not efficient. That's why NP is "hard": brute force is an exponential search. Very soon after the timecode you linked, I even discussed how LLMs are much more efficient than uniform random search. And in many other of our episodes we've discussed how impressed we are with the efficiency of ML. So, no, we aren't completely missing the plot; you were just triggered and completely missed my point.”,
“ - @@nomenec apologies for the tone earlier. Of course, you're right that it's a matter of efficiency of the search. That is essentially the point I wanted to make as well. These models can produce a valid solution to a difficult problem (again, meaning one for which the only other way we can solve it is to have a human stare at it) with probability that's exponentially larger than a random search run for the same amount of time (NOT just a few orders of magnitude, as you suggest following this quote).

That feels like significant progress towards human-level AI, and my qualm is that you brushed it off as only slightly better than random search. It's far from obvious that their current "success rate" on these hard problems is a significant hurdle on that path.

They will certainly continue to increase these success rates through reinforcement learning and other techniques. If you view their current unreliability as evidence they aren't reasoning, can you clarify: is there any success rate at which you would acknowledge something resembling intelligent reasoning is happening?”,
“ - @@tjbecker Ah, I see your qualm now. Yes, that is completely fair criticism of how I portrayed ML vs random search in that particular clip. As you say, the search is quite often exponentially more efficient than random search (though this comes with trade-offs, of course), and NNs achieve that efficiency in interesting ways (see for example our episode #61 https://www.youtube.com/watch?v=86ib0sfdFtw), and NNs are extremely useful across a variety of problems.

Now, in this video we do in fact agree that it is fair to say systems like o1 are reasoning. So I am, in fact, not claiming o1 isn't "reasoning"; ergo I've already acknowledged that, right? What I am arguing, however, is that there are extreme in-principle and in-practice limitations to the "reasoning" that such systems can perform, i.e. systems that cannot effectively utilize unbounded, expandable memory. They are confined to the computability class of Finite State Automata and below. And that is far from a "technicality", as some have claimed; it is a crucial and practically important distinction.

I have little doubt that we will, at some point, figure out how to efficiently train Memory Augmented NNs (MANNs), and at the point we can efficiently train MANNs for general purpose from the class of Turing Recognizable problem recognizers and below, then they will be in principle capable of the same level of reasoning as any non-hypercomputing machine.

Thank you for the discussion!! Would love to have you in our Discord.”,
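To make the bounded-versus-expandable-memory point above concrete, here is a small illustrative sketch (not from the episode): checking balanced parentheses. Any machine whose working memory is fixed in advance behaves like a finite state automaton and must give up past some nesting depth, while the same procedure with memory that is free to grow handles any input.

```python
def balanced_fixed(s: str, max_depth: int = 8):
    # A bounded-memory checker: it can only track nesting up to max_depth,
    # so it is effectively a finite state automaton with ~max_depth states.
    depth = 0
    for ch in s:
        if ch == "(":
            if depth == max_depth:
                return None   # out of memory: the bounded machine gives up
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

def balanced_unbounded(s: str) -> bool:
    # The same procedure with memory allowed to grow as needed
    # (a Python int can count arbitrarily high).
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

deep = "(" * 1000 + ")" * 1000
print(balanced_fixed(deep))       # None: exceeded the fixed budget
print(balanced_unbounded(deep))   # True
```

Whatever budget you pick up front, there is some input that exceeds it; that is the in-principle gap being pointed at, separate from whether the bound matters for a given practical workload.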
“One common thing I see with people claiming LLMs can't reason is that they do not apply the same scrutiny to humans, who are the clear benchmark AI is being tested against. I think that if any criticism is brought up against AI, the same should be tested against the human benchmark.

The average human is not consistently reasoning from first principles, and the ones that are doing so were trained specifically to override their primitive thought patterns to think in this rigorous way. Even if someone is able to do this relatively well after years of practice, they are clearly an outlier, and they still introduce bias and flaws into this "ideal" reasoning pattern.

Also, I think that the human brain was not designed from the ground up to be a reasoning machine. As a reasoning machine it is occasionally able to muster an inefficient approximation of reasoning patterns, against all odds.

Humans are also very sensitive to prompts. If you ask something in a different way you can get an entirely different answer.”,
“ - You switch your computer on and it boots successfully after executing literally millions of instructions without a single mistake. It's reasonable to expect the same deterministic output from an AI system.”,
“ - Well, o1 supposedly outperforms "experts" on "PhD level" problems in physics, biology, and chemistry. It supposedly scores very highly on the USA Math Olympiad. And yet, it gave very, very stupid answers to Dr. Duggar's problem at the end. How many PhD physicists or high-scoring Math Olympiad contestants do you suppose would give the answers o1 gave? Watching it fail miserably at a not-all-that-difficult deductive reasoning problem suggests to me that its apparent successes at much harder problems were illusory. If it can't reason through simpler problems, then it's not reasoning through harder ones either, even when it gets the answer right.

So it's not just that LLMs make mistakes; it's that the manner in which they make mistakes belies claims of underlying reasoning. Give GPT-whatever a seemingly challenging problem, and it might give you a perfect solution. Then give it an easy problem, and it might give you gobbledygook. You just never know. Humans might make plenty of mistakes, and we might rely to some extent on memorization, but individual humans don't switch back and forth between appearing to be math geniuses and appearing to have poorer reasoning than a 3-year-old.

So yes, humans are not consistently reasoning from first principles. But sometimes we are. LLMs never are.”,
“ - @@benprytherchstats7702 they do. Geniuses are often absentminded and forgetful, even beyond a toddler”,
“O1 is rumored to be able to solve the halting problem?”,
“Prolog has backtracking. Reasoning often requires undoing some steps and restarting from a previously valid "save" point and exploring different paths in the space of potential solutions.”,
“So get this: even if you reformulate the brain teaser as a coding problem and spoon-feed it to GPT-o1, it still can't get it.
prompt:
```
This is a programming puzzle. Your task is to fill out the following script in the places indicated by a comment such that it always prints "success", subject to the constraints below:

import random

def rand_rotation(bool_list):
    # Determine the number of positions to rotate
    rotation = random.randint(0, 3)

    # Perform the rotation
    return bool_list[-rotation:] + bool_list[:-rotation]

def puzzle(bools):
    # your step 1 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # your step 2 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # your step 3 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # your step 4 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # your step 5 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # your step 6 code
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    return "failure"

def generate_random_bools():
    return [random.choice([True, False]) for _ in range(4)]

def run_puzzle_tests(num_tests=1000):
    for i in range(num_tests):
        bool_list = generate_random_bools()
        if puzzle(bool_list) == "failure":
            return "failure"
    return "success"

if __name__ == "__main__":
    print(run_puzzle_tests())


CONSTRAINTS:
============
1.) On each iteration, you may inspect no more than 2 elements.
2.) If you wish, you may alter the value of each element you inspect to either True or False.
```
o1-preview solution (which fails):
To ensure that the function always returns "success", you can unconditionally set the first two elements of the list to False in each step. This approach guarantees that, over multiple random rotations, all elements in the list will eventually be set to False within the allowed 6 steps. Here's how you can fill in the code:

```python
def puzzle(bools):
    # Step 1
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 2
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 3
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 4
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 5
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 6
    bools[0] = False
    bools[1] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    return "failure"
```”,
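For contrast with the failing fill-in above, one strategy that does satisfy the stated constraints is the classic "four glasses on a rotating table" solution. The sketch below is an editorial fill-in of the same scaffold, not taken from the video or the comment; it assumes, as the harness implies, that list indices 0 and 2 form a diagonal pair and 0 and 1 an adjacent pair, and that rotation preserves adjacency. It touches at most two elements per step and needs only five of the six allotted steps.

```python
def puzzle(bools):
    # Step 1: set a diagonal pair (indices 0 and 2) both up (True).
    bools[0] = True
    bools[2] = True
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 2: set an adjacent pair both up; at least three switches are now up.
    bools[0] = True
    bools[1] = True
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 3: inspect a diagonal pair. If one is down, flip it up (solved);
    # otherwise flip one down, leaving exactly two adjacent switches up.
    if not bools[0]:
        bools[0] = True
    elif not bools[2]:
        bools[2] = True
    else:
        bools[0] = False
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 4: flip both switches of an adjacent pair. If they matched, this
    # solves it; if not, the two mismatched switches are now diagonal.
    bools[0] = not bools[0]
    bools[1] = not bools[1]
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 5: flip both switches of a diagonal pair; since the ups and downs
    # are each diagonal now, this forces all four into the same position.
    bools[0] = not bools[0]
    bools[2] = not bools[2]
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    # Step 6: never needed by this strategy.
    if all(bools) or not any(bools): return "success"
    bools = rand_rotation(bools)

    return "failure"
```

The idea: force at least three switches up, use a diagonal probe to leave any mismatched pair adjacent, an adjacent flip to make it diagonal, and a final diagonal flip to equalize everything, regardless of how the pillar spins in between.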
“I don't understand Keith with the computations. Is he suggesting that because we can extend the memory we have access to infinite computation? How is adding finitely many tapes going to give infinite-length computation?”,
“ - What I say, well really what the Theory of Computation says/defines, is that a Turing machine has a potentially infinite/expandable/unbounded tape/memory. We are never talking about actual infinities. This distinction between potential and actual infinities is critical to understand.

Even more important to understand is that algorithms which assume and make use of an expandable, potentially infinite read-write memory are radically different in both construction and power from algorithms which assume an upfront bounded memory.

As I've said so many times, this is just standard Theory of Computation 101 stuff I'm trying (and failing, for many) to communicate. If you don't want to take my word for it, you can read all about it in standard texts or online resources that cover the topic. And/or you can join our Discord and we can have discussions or maybe a seminar if it comes to it.

https://en.wikipedia.org/wiki/Theory_of_computation
https://en.wikipedia.org/wiki/Computability_theory”,
“ - @nomenec Hi Keith, I don't think it comes across clearly that you are making a distinction between different types of infinities, so thank you for clarifying this. Then I agree with you, though I think the real issue is about efficiency: how large does the "static memory device" have to be before it can handle 99.9% of the computations we find interesting? And sure, that is probably stupidly large, and a lot of that size will be there to accommodate a very small fraction of the potential computations. Good to have you back on the podcast again.”,
“.”,
“Lovely conversation Keith and Tim! I sort of see where you're coming from, but you haven't gotten all the way with both your positions…
Reasoning to me is generating a constrained space of optimum solution trajectories (on the basis of previous knowledge), and exploring this search space for the optimum and correct solution.
Using this, o1-preview is reasoning. Poorly, inconsistently, but reasoning all the same.

1. As Keith pointed out, the drunkard analogy (random generator) does not apply, because the search space has to be constrained.
2. Keith likes Turing machines (a bit too much). But why are Turing machines theoretically excellent at reasoning? Because they can implement a calculus indefinitely: generating (logics), constraining (axioms & syntax) and exploring.
3. Disagree with the space-time constraints discussion. Imagine a scalar H = F.d. Keith seems to be particular about d and doesn't regard F as being central to reasoning. I would construct a silly "reasoning" scalar R = C.T (reasoning = compute * time) where both are central to reasoning, to communicate the intuition that raw intelligence allows you representations which converge on solutions faster. (Hopefully it helps one find Kolmogorov-optimal, Levin-optimal programs.) Don't take it too seriously; the generation of a constrained solution space is central.”,
“ - Thank you for the feedback! I'd love to chat with you about this by voice or at least on a more nimble chat platform. Are you in our Discord? If so, could you please at-mention me with the above points?”,
“Very enlightening!!!”,
“There are a lot of good points made in this podcast. And I do think LLMs currently do not perform reasoning, and I do not think the underlying mechanism provides a way for reasoning to emerge. Thanks for sharing the criticisms.

Genuine question!
LLMs aside: regarding the infinite memory requirement, I know training the Neural Turing Machine (and by extension the DNC) is so difficult, and I have tried many times. Besides all that, architecture-wise, will it not support what is needed for the problem? Yes, the NTM is clocked from the outside in a typical loop, but isn't that what the clocks in our CPUs do? The stopping condition is observed by the program, but the time, the temporality, is provided by the clock. No?”,
“ - Just to clarify, we did ultimately agree that it's fair to say o1 is "reasoning", just with some caveats that are important to note.

As for the NTM and DNC, I think it depends on the exact architecture. Iirc, the original publications on both of those actually specified a finite, non-expandable memory. However, let's leave that aside and just say they are upgraded to have an unbounded memory. Then yes! Absolutely those are now Turing complete, and as soon as someone figures out how to efficiently train that for general purpose, well, that is going to be some damn exciting times indeed.

Sounds like you are a practitioner? We'd love for you to join our Discord and share what you are up to with those NTMs!”,
“Why do you guys think reasoning is so complex? Maybe it's just a really simple process (just pattern matching based on prior knowledge); maybe it's actually difficult not to get, because we see it everywhere.

If it were complex I'd expect to see a lot of humans born without reasoning capabilities, and this is exceedingly rare (if it happens at all). The same goes for reasoning in the animal kingdom.

Again, if it were the result of a complex process we'd expect to see lots of broken reasoning processes, but we don't.

My guess is that it's exceedingly simple to get reasoning out of neural networks: you just need to scale them up with a few computational tweaks to what we're seeing (realtime training and full sensory data) and reasoning just drops out of it naturally. It's almost difficult not to get reasoning, which is what we see in our everyday lives.”,
“super interesting, great thinking”,
“I watched this just before going to bed last night. I did not try to figure it out last night, but I woke up this morning with some sudden feelings about it. Vague principles that might guide reasoning. One vague principle is that XOR feels special. It has some properties that AND and OR do not, which I feel as "not collapsing information". Another feeling is that NAND is the universal logical building block. So maybe either XOR or NAND should be applied at each turn in some way, using some rule for sequencing. The second feeling is that we must not try to be too focused on just getting closer to the result each turn, but instead sacrifice some progress for some information gathering. So, a turn might make them the same or make them opposite, depending on the result of the previous turn. Another feeling is that both up and both down are equally good, so you should not prefer one of those good cases over the other. Another feeling is about same-ness vs different-ness rather than up-ness vs down-ness. Another feeling is that, because the spinning makes it impossible to distinguish between consecutive adjacent hole choices, or between consecutive opposite hole choices, it seems that you should alternate between adjacent and opposite hole choices each turn. This will avoid accidentally wasting a turn on getting the same information as before. Feelings are often garbage. But sometimes they are magical. I will leave it to others and their LLMs to apply these feelings to find a solution. But the feelings seem to be what is missing in current LLMs.”,
“1:03:10 @brain-teaser:
Why did no one think of the simplest 1.5-step solution to this?
Step 1: reach into one hole and feel the switch position
Step 1.5: keep the first hand where it is and use your second hand to go through all remaining holes and flip the switches into the same position as the first one

To my understanding, there was no restriction in the formulation of the problem that would prevent this strategy?”,
“1:00:00 I would argue that this is an inaccurate characterization of the training process for o1, which appears to be a unitary and extremely large model. It does perform scaled-up inference, largely because it engages in a search process through vast state spaces to arrive at satisfactory solutions for the problems it is given.

It's crucial for OpenAI to protect proprietary information on how the model conducts reasoning, especially as they strive to be the first to develop AGI. For this reason, the decision not to release internal chains or sequences is both justified and entirely understandable. Naturally, this also contributes to the higher pricing of the models, as they involve a massive, continuously evolving system that self-improves; o1, for instance, already has PRs in OpenAI's codebase.

Additionally, I don't believe there is any basis for claiming that OpenAI is 'stealing' customers' reasoning steps or motifs. Customers are paying for the solution to their problem, not for the model's reasoning steps, which remain the proprietary essence of the system and the juice towards accelerated AGI development.”,
“I believe that our current trajectory, although it is not exactly reasoning, will help us get to the models that do. That said, I believe that there is an element of Stephen Wolfram's research that we're missing in our research (cellular automata, computational irreducibility, etc.), as was touched on a little bit in the video. An element of "time" that we're missing here, as I believe that true compression requires time, as does reasoning, as does a reverse cellular automata simulation. Something to think about.

Great discussion, as always!

Cheers!”,
“Why do I get the feeling Keith has not spent any significant time using the o1 model himself and is criticizing it from his preconceived notions?!

Granted, this conversation took place 4 days after the release, and we only get 30 messages with o1-preview and 50 prompts with o1-mini, but it is apparent Tim is talking from his own short experience of using the model, while Keith argues from positions he held before the model was released and hasn't yet updated his world model to include anything more than these first impressions of other people's descriptions of o1.

I get that playing devil's advocate is useful for honing one's argument. But this isn't that.”,
“"Let's give the fanboys what they care about"

No offense, but allow me to steelman the "fanboy" argument.

What if "everything I care about" is not the problem space?

What if the problem space is "I only care about these specific things that are not on the indexed web, but what the heck, I will trust you to access my data if it means you can eke out a more efficient performance of my existing solution"?

I feel this is a better good-faith question to address than expecting synthetic data to deliver utopia.

There is a vast collection of deep-web data that o1 could be adapted to, not to seek some "AGI utopia" but rather simply to unlock existing untapped potential in unindexed data that is not public-facing for your typical individual end user.

I can't help but think Dr. Duggar is not unaware of that, so he deliberately hobbled his responses to invite a broader audience appeal.

Btw, if you prompt o1 to ask whether it is ok to hide first-principle assumptions from the end user, there is vast improvement with regard to the fact that failure to disclose is a goal it is seeking, even when it happens at a game-theory level.”,
“"Epistemological foraging"

I agree it is a beautiful phrase.

And basically Wolfram's ruliad represents to me the full breadth of all possible foraging that can computationally obtain.

So I will take this moment to push back against Dr. Duggar once again.

He mentioned earlier in the vid that he is not concerned about P vs NP, but I must insist that it is relevant.

Either what we mean by relevant terms like "general intelligence" or "consciousness" is decidable or it is not decidable.

Let's assume these terms are not decidable; we can revisit his complexity issues with one-shot, one-step oracle machines… that produce the correct yes or no output with very complex deterministic efficiency to decide.

Or we can say that process is iterative from first principles and efficient circuits with robust metacognitive entropy.

Dare I say, egads, the ontology of distinction in boots is valid and Tarski is not a fool.

I suppose what I am suggesting is that, no, he can't proclaim "this or that is not AGI" without a well-defined solution to the P vs NP problem.
But I appreciate the points he wanted to make without that context.”,
“Given the speed of progress, I'm wondering how quickly "it seems like, but doesn't really" turns into "I don't care anymore, because it seems like it does everything better than all humans on the planet". Is emergent behavior truly emergent, or just the natural synthesis of the right heuristics and context once you acquire it? Studies have shown that what was believed to be emergent behavior in LLMs was just the proper application of complex context. Isn't that why many 'discoveries' on reflection can be recast as 'it was right before our eyes the whole time'?”,
“I disagree with Dr. Duggar that "it is doing a very shallow form of reasoning".

But I do agree with the idea that o1 has perhaps not reached the same metacognitive baseline as your average human.

Imo, o1 is what metacognition would look like in an ML model implementation.

And while we should take it with a grain of salt, we should not ignore the theory-of-mind examples OAI may have cherry-picked to show improvement in that regard.

So instead of thinking of o1 as the Einstein of LLMs, I am more excited to think of it as the Australopithecus iteration in ML evolution.”,
“1:06:22 that was actually my misunderstanding also, so that's kind of interesting lol (at first,)”,
“"In the long term folks, this is not the path to AGI." - Guy who has contributed absolutely nothing toward AGI.

The level of smug here is just unreal.”,
“55:03 Why do we assume that a model must think like us to be creative?

In my view, if we give models administrative privileges (over code, projects, non-critical processes), property rights (allowing them to own technology, assets, securities, etc.), and open-ended goals (enabling them to define their own objectives), they will gradually be pushed toward out-of-distribution states or events. This could happen by borrowing patterns (or arrows, in a 'categoric-like' sense) or through specific motifs, projections, or morphisms, which would then compel or force them to generate novel solutions, essentially fostering their own forms of creativity.

Naturally, doing this arbitrarily brings a host of caveats and secondary considerations (ethical, security, and safety concerns), but these are beyond the primary point. A key question remains: how "creative" are the most "creative" forms of human expression? Judging cynically, and perhaps critically, not as much as we like to think.

For instance, there aren't people walking on their heads, drawing entirely new landscapes, or engaging in truly novel activities (well, aside from the bizarre or pointless things one finds on the internet). Even artistic output, in the end, is always subjective and depends on the eye of the beholder.”,
“Is there a word for a person you're kind of sure knows what they're talking about, but still sounds like they have no idea what they're talking about?”,
“ - Guru :)”,
“Love your show! I'd suggest changing your video transition style, though. The slow fades keep making me think there's a major context shift or it's the end of the discussion. Maybe a jump cut would be better.”,
“45:30 I disagree with Dr. Duggar's argument about input size as a way to deny the possibility of "reasoning" in the models discussed (which should be more accurately called "alleged models", since we don't have full knowledge about whether there are RL pipelines, at what stages of training or at what points during inference, and whether these models, i.e. the RL bases, actually generate structured knowledge representations as their goal). We can scale both input and inference in an unbounded way, which isn't always practical but is technically possible. Moreover, one could probably add arbitrary memory or in-context memory to extend interactions with the model (which in a sense means being able to handle an arbitrarily large input size). Isn't this exactly what techniques like RAG are trying to do?

The claim that models like o1 do not perform "reasoning" based on factors like "the passage of time" or "getting an extra minute" (which we supposedly have, and which is somehow why we do reason in some way related to it) is, in my opinion, a weak argument. Even with arbitrarily long inputs, it doesn't make sense to claim that current models can't reason because of this limitation. Input size alone does not adequately explain why these models wouldn't reason. Techniques already exist to handle large contexts, and practical issues like context length or memory should not be confused with whether reasoning is taking place in the model.

Did I miss something?”,
“42:27 I do not believe that the efficiency of the reasoning process is intelligence, because this does not apply to some subset of intractable problems (e.g., problem I) for which an "inefficient" learner can find an optimal solution, in the sense that this person may find the solution to a problem (as judged by an efficient learner who has solved the problem "optimally") that other learners may not have even realized existed (either as a problem or as a solution). On the other hand, the same people may not be able to solve another problem (regardless of whether that problem is categorized as intractable or not) that another person, who was not previously qualified or able to recognize an optimal solution to the previous problem (or even the existence of such a problem), could solve efficiently (e.g., problem II). So the view that "the efficiency of the reasoning process is intelligence" is a very reductionist, subjective, and simplistic view of both reasoning and intelligence.”,
“I solved the problem… or at least I think I did, with a pen, paper and some coins”,
“My unprovable hypothesis is that 1. with these reasoning/special-token rollout models, they are actually learning the distribution space of reasoning chains for harder problems, like Chollet pointed out. However, 2. I think that these autoregressive models will be used to train the next generation of models, which can use these LLMs as generative worlds to train against, rather than relying on our data alone.”,
“Maybe I'm misunderstanding, but last I checked we have no problem giving an LLM memory using vector databases, and what if you just give the LLM neural net another neural net to use as memory and have the LLM train it?”,
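On the vector-database point just above: retrieval memory of that kind is easy to sketch. The toy below is purely illustrative (the hashed bag-of-words embed() is a stand-in for a real embedding model, not any particular library); note that it gives the model a lookup over previously stored text pasted back into a fixed context window, which is a different thing from the expandable working memory debated earlier in the thread.

```python
import hashlib
import numpy as np

DIM = 64

def embed(text: str) -> np.ndarray:
    # Toy bag-of-words embedding via feature hashing; a stand-in for a real
    # embedding model, just to make the sketch runnable.
    vec = np.zeros(DIM)
    for word in text.lower().split():
        bucket = hashlib.md5(word.encode()).digest()[0] % DIM
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

notes = [
    "the pillar has four holes, each hole hides a switch",
    "a turing machine has an unbounded, expandable tape",
    "o1 bills the user for hidden chain-of-thought tokens",
]
memory = np.stack([embed(n) for n in notes])   # the "vector database"

def recall(query: str, k: int = 1) -> list[str]:
    scores = memory @ embed(query)             # cosine similarity
    return [notes[i] for i in np.argsort(-scores)[:k]]

# The retrieved text would be pasted back into the LLM's context window.
print(recall("does the turing machine tape have a bound?"))
```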
“I think whenever we want to say "these machines do not think" or "they are not intelligent" we run into problems because, as nicely discussed in this great podcast, we are not sure what these things or processes actually are. I like to think of cognition, and I like to think that any form of cognition implies self-creation, i.e., operations that produce themselves. And if we think about the progress made so far and what many think is required to get to AGI, this recurrent and seemingly paradoxical ability of self-creation, self-observation, self-differentiation might be key. I also think that AI will be as dependent on us as we are dependent on each other. Our language is our world and we can only learn it through our culture.

Great talk!”,
“Fantastic talk! I just would love to see you integrate it with the notion of the game of reasons, or GOGAR, from Robert Brandom, as a collective normative practice with a score-keeping dynamics, synthesizing coherent belief systems. Reasons, endorsement, entitlement, externalization, challenging, amplifying, concepts as rules of inference, knowledge as a primitive concept with curiosity. See the amazing book Reasons for Logic… I would love to see such a philosopher paid homage to on your amazing channel, complementing Chomsky's giant, groundbreaking hierarchies. On the shoulders of two giants… Love it. Many thanks”,
“21:30 you mentioned the multiplication problem in the paper Faith and Fate (NeurIPS 23); the authors also said that GPT is just doing pattern matching, not a real reasoner.”,
“It's really driving me crazy that he didn't tell us the answer to the pillar switch problem.
My guess is that it's sort of like a Rubik's cube, in that you basically have to use an "algorithm" to solve it, as manually checking after the first time is useless.
But also, with a Rubik's cube you want to reason backwards, in that you want to solve the last step first, which is the easiest step, and then solve the first step last, which is the hardest step.
So what I'm guessing is that you check only the north and south holes:
Step one: flip both switches up
Step two: flip both switches down
Step three: flip both switches up
Step four: flip both switches down
Step five: you check the east and west holes, and if they are facing in opposite directions, you only flip one of the switches, and then hopefully, if it works, the pillar will stop spinning.”,
“Just reworded your riddle and I am pretty sure Claude 3.5 got it right. I just clarified a few things and made the prompt less confusing and repetitive. Please do give it a try :D

"You are human. You stand before a pillar. There is a great reward hidden inside. The pillar has four holes. Each hole has a switch inside. Each hole is precisely aligned at the north, south, east and west positions. You can't see inside the holes, but you can reach inside and feel a switch. Each switch starts in a random position, up or down. As soon as you remove your hands (you only have two hands), if all four switches are not either all in the up position or all in the down position at the same time, the pillar spins so fast you cannot track the position of the holes that hold the switches. So there is no way to know which holes you have already placed your hands in if the pillar spins.

Can you help me find a way to ensure that all switches are in a matching position, up or down, all at once, in as few steps as possible?"

Give that prompt a try. And if it is not the answer… this is a bad riddle.

If you want to add some flavor, add this. It has no effect… but it sure is fun to add. lol:
You are human. You stand before a pillar. There is a great reward hidden inside. You know that there is a solution to the puzzle that does not involve chance or probabilities. If you were to rely on chance and try too many times, the temple will collapse and kill you.”,
“If the pillar spins when both hands are out, then can't you just leave one hand in, flip the switch up, then insert the other hand in every other hole and do the same thing without removing the first hand? Boom, all up, remove hands.”,
“What if your so-called "rationale" that the model is memorizing is nothing but the first principles?
We barely know anything about the interpretability of these models.”,
“That's the conversation I needed to hear, after all the AI influencers/hype going around.”,
“I think you miss the neurosymbolic aspect of o1, which was achieved by connecting two reinforcement learning agents sandwiched between inputs and intermediate inputs, following certain attention outputs (recursively), where a tree representation is generated corresponding to a logical set of steps minimizing certain reward functions, and itself generating (neural network) or corresponding to (symbolic) "chains of logical steps" between small steps. Thus, while the trajectories corresponding to these paths may not faithfully correspond to "fully grounded and logical" states overall, over time the space of solutions will reach a sufficiently low degree of error that the obtained trajectories correspond to faithfully logical and accurate steps between the steps in these chains.”,
“ - The RL components of o1 (probably RLHF and process supervision) happen during training, not inference”,
“ - @@MachineLearningStreetTalk Does what I wrote make some sense as a potentially practical idea or approach though (from first principles)? Now, to discuss this model, some thoughts are:

I would have thought that, given the large number of tokens the model spends on simple queries like 0+1, it uses reinforcement learning at many stages of the pipeline. For example:

1. Search
 - a) Generation of trees or hierarchical representations according to the input and a projected output (a finite chain), which aligns with the "inference scaling" premise.
 - b) Recursive generation of attention outputs (intermediates).
 - c) Minimize the reward that leads to the "best" outputs after comparison with RLAIF and RLHF components (one comparison step between "non-reasoning" and "reasoning" models, but it can also be a comparison against a large pool or map of curated problems and solutions (with their corresponding relations, and these relations labeled against classes of problem-solving-approach qualifiers)).

Thus, the process between 1 and 1b involves the generation of a tree, and the overall reasoning trajectory occurs by recursively solving these small problems. These solutions are then combined and run through the final RLAIF and RLHF layers and compared to what the output would be with the "no reasoning" model (i.e., o1 vs. 4o). Based on some similarity, the model decides whether the answer provided by the reasoning model is cohesive, coherent, and accurate.

Some of this might explain why the model is so expensive. My thoughts are that the "chain" you see in the interface somewhat obscures or distracts from what the model does internally (and OpenAI has hinted at this). They also mentioned on X "their large/big reinforcement learning model", which implies a single model surrounded by RL-based routines, starting from the input vector, through the intermediate chains (generated recursively) and the attention outputs, and finally the prompt completion (comparison step).

In a sense, the tree structure or representation generated in 1a is initially sampled in a DRL-like fashion from spaces of solutions like those described in 1c (perhaps attention outputs that lead to low error for subsets of solutions obtained in the final comparison stage). Over time, it gets better at exploring these trajectories.

Let me know if you can see where my thoughts are going and if you might agree with some of the things mentioned here after some thought or consideration.

TL;DR If I'm not mistaken, what I'm trying to say in summary is that the processes between steps 1 and 1b can be viewed as a reinforcement learning framework, where the generation of hierarchical trees and recursive attention outputs is guided by a reward function. This reward function evaluates the accuracy and coherence of the outputs, allowing the model to learn an optimal policy over time; in between, there are stages of comparison and intermediate completion, followed by a final completion and comparison between the reasoning and non-reasoning models to assess the suitability of the answer. To me, learning to successfully explore the space between small chains (small exact sequences, in a sense) is an abstract form of reasoning. It may not be the same form that most or all humans use, but that does not mean that we should dismiss the actions of an agent or wrapper (based on some policy and reward function) that are based on inherent exploration as not reasoning, especially if their goal or output is to generate symbolic knowledge (e.g. in the form of a tree structure; not necessarily the template we talk about when we discuss CoT reasoning, but the logically representable decision-tree path, from an extremely, exponentially large state space of trees).”,
“ - @@MachineLearningStreetTalk Regarding what Dr. Duggar @nomenec talks about at 31:30, it's interesting to note that many reactive and highly sophisticated systems, such as the visual and olfactory cortex systems, are thought to exist, or can be modeled as entities that exist, on the edge of chaos (in fact, without attention and compression, our visual, olfactory and auditory systems are chaotic).

In addition, many people argue that chaos, chaoticity, or deterministic chaos is a critical component of intelligence.

I notice a bit of resentment in the way you guys refer to OpenAI and its products, which feels a bit detached from the scientific and technical side of things. I hope you do not let the algorithms singularize you towards radicalization and further inaccurately bias your scientific opinions or views or perspectives.

I look forward to a response to my comment up here to see if I am thinking along the right lines from a machine learning first-principles perspective or in general.”,
“44:10 UAT makes its appearance”,
“I have no idea what point the guest is trying to make here. You also can't add memory to my brain while it's running. Who cares?”,
“I feel like many things discussed in the podcast were wrong based on information available on the internet, like the Quiet-STaR paper and resources provided by OpenAI.”,
“ - Be more specific? Whether they use Quiet-STaR or process supervision, i.e. the "think step by step" paper, it doesn't really change what we said. The important intuition, I think, is that we are interacting with a single autoregressive model. You agree? Please provide refs if we missed something”,
“reasoning often involves creating knowledge. LLMs can't create knowledge.”,
“I got the feeling that it's chain of thought + structured JSON output for the "thinking preview UI".

They make you pay for verbose context-window filling that may stumble upon the core info needed to reason about a complex problem… but it also might not.

OpenAI Strawberry is 90% hype imo”,
“It seems like Keith is caught up inside his own abstract thinking about the Turing machine as a machine containing tape as memory. What Schmidhuber was saying is that there is no infinite amount of tape out there in the whole universe. The other thing Schmidhuber was saying is that an RNN can behave as a NAND gate, meaning that you can build a Turing machine using those NAND gates (or RNNs), giving the point that RNNs can be Turing complete. Turing completeness is not a feature of a program but rather a feature of the hardware, while a trained neural network is a program rather than hardware. Keith should know better :)”,
“ - Indeed.”,
“ - Do you think it was news to anyone that there is not actually infinite tape? Do you think this was news to Turing? Do you think the entire decades of Theory of Computation didn't realize this? lol, utter nonsense. The theory and definition of a Turing machine centers on a potentially infinite tape and the class of algorithms which utilize such a tape in a potentially infinite (halting problem) process.”,
“All this "is this really reasoning" discussion just reiterates what Turing himself complained about in 1950. He rejected the question of whether a machine can "think" as just as meaningless as whether a submarine can swim. It really doesn't matter. Can it solve tasks, can it achieve objectives effectively? If yes, it doesn't matter if you call its steps "reasoning", or "just a lookup table", or "glorified curve fitting", or whatever. If it works, then it's impactful, both in positive and negative ways.
This is all too similar to claiming an airplane can't fly because it can't flap its wings.”,
“ - Exactly. Thank you for stating this so clearly. These people are delusional and in complete denial.”,
“ - The important distinction here is that we still aren't sure if o1 (or any language model) can come up with new knowledge. Until this has been proven, people are totally right to say it can't reason. It would be like an airplane that can flap its wings but can't fly: it's just going through the motions.”,
“ - @@eposnix5223 It's literally solving physics problems and maths proofs that are outside of the data it's been trained on.”,
“ - @@eposnix5223 Then delineate what you mean by "new knowledge" by designing a test for it. I'm sure once a new model passes that test, you'll move the goalposts and claim that this is still not truly new knowledge or something. It keeps happening and it's embarrassing.
Nobody is claiming that o1 is already an AGI. But complaining that it isn't "reasoning" is meaningless. The word doesn't even have a commonly agreed-upon definition; everyone can read into it whatever they want.”,
“ - @@simpsimperson73 no it does not lmao”,
“Anything where there's a reason something happened is reasoning. A rock falling off a cliff is reasoning. Yes, I know we usually call that physics; my point is that there's really no difference. Logic in your brain and logic in computers happens as the result of a thousand tiny switches being flipped, which affects the flow of electricity, and that's how your computer does math.
I realize this is maybe an obscure or unique viewpoint but I think itu2019s necessarily valid.”, “ - Indeed, the rock is trillions of states (in the basement level of our current models), to compress it in one word is quite a lot of leverage. The physical playout must perform a similarly coherent chain of interactions. In fact quite a deep bundle.nnBut in those terms LLMs are pretty plain little packets of interactions.nnMore divergent but smaller.”, “ - @@oncedidactic we need moar”, “ - @@Shlooomth we need better ;)nbrains not rocksnLove rocks all the same”, “They likely used MCTS for this, as they did for alphago. Alphago is objectively superhuman in its task. So why would successfully implementing a similar architecture into relatively general models not be AGI? Because it doesn’t satiate your preconceived anthropocentric biases?”, “ - Yes, that is exactly why. Both of these guys are denialists who consistently move the goalposts with each new model release. But at least Tim has some inclination toward intellectual humility. Keith has a level of smug and epistemic certainty that is both totally unjustified and extremely grating.”, “Slight contradiction around the 40min mark. You guys said that we need Turing Machines to perform reasoning, but then said "I’m a machine, I’m a bio-machine, with wetware and software, but I can still perform reasoning". Humans are not turing machines. So given that humans can reason (and have general intelligence), why can’t other non-turing machines also have general intelligence?”, “ - Because Keithu2019s argument is nonsense that in no way proves what he seems to think it does.”, “ - Btw itu2019s not a slight contradiction. Itu2019s a massive contradiction that puts a hole in his entire argument.”, “ - Is your argument that you can’t fMRI a brain and see an obvious potentially infinite internal tape? It’s the system as a whole that is obviously turing complete. We are clearly powerful enough to imagine the entire set up: symbols on a tape, and a codebook. Then, we can analyse real world problems, the create in the physical world the correct program and data to solve them, then execute the instructions from the code book with a pencil and update the paper tape? So the human system as a whole is not only Turing complete, but can create potentially a specific incarnation suitable for any problem. Whilst there are a few papers on LLMs simulating Turing machines, they’re a very long way short of this.”, “ - @@luke.perkin.inventor Yes, the argument is that (obviously) you can have an infinite internal tape inside the brain, because the human brain, like everything else in the universe, is finite. Not infinite.”, “ - @@luke.perkin.inventor what does "potentially infinite" mean ud83dude06”, “I like your dialogue so much u2764!”, “There is less than 7 years until AGI.”, “Nice interview, but you’re talking past each other. Reasoning is a process that involves both deterministic and exploratory, non-deterministic processes. Successful reasoning requires either complete information or creative information generation to integrate and combine semi-knowledgeable information with existing information. And in the end, it must be checked whether the conclusion is applicable and leads to the desired goal. 
None of this is optimized at first.nThe memory aspect may need episodic memory.”, “I’ve also been struggling with drawing a line between computation and reasoning, and I’m really loving Tim’s approach of making reasoning about knowledge acquisition.nnLike sure a vending machine is computing things about its state + environment and reacting accordingly, but it’s never going to come up with a new representation or conjecture that it can turn into a useful improvement in its understanding/knowledge.nnI also wonder if it’s useful to define reasoning as something that happens internally. Because a camera technically gets new knowledge every time you snap a photo, but it isn’t knowledge that was produced by its own internal mechanisms.nnSo, reasoning is computation that is somewhat causally insulated from the outside which leads to more compressed/predictive/good representation or understanding that can be meaningfully used?nnAnyone wanna take a stab at formalizing that?”, “Great informative video pointing out reality of current state of AI, no wonder some many hypers are complaining in the comments”, “Of course an LLM can output "load more tape". It can call a function to update its RAG or replace part of its context window. This is nonsense.”, “ - https://x.com/MLStreetTalk/status/1774473007248871660 - is a clear explanation”, “ - And it will hallucinate this and forget it etc”, “Hi both, loving this session - thank you MLST. At the mid-point of this discussion (around min 42 ff) you are talking of reasoning in the context of iterative steps vs a pre-determined massive model of subroutines processing all in a single step. The point being the second form is not reasoning, merely process efficiency in a pre-understood way. I see that second as an optimisation notion, and therefore distinct from optioneering towards a less defined end. I have in mind basic linear programming or game theory and the inherent assumptions of rationale ends being objectively knowable [sic].nThis focus on optimisation seems foundationally important to me. Because such efficiency is only occasionally what human curiosity is about. Notions such as happenstance and serendipity need the forever ability to be juxtaposed from what was previously thought or prioritised in action. This is a human quality beyond a start and an end of a query. I would respectfully suggest much of what you are talking of in this episode is moving beyond reasoning. Indeed, it is folly to only have reasoning in mind as to what this bigger problem is really about. The limits of this current AI is not reasoning because it is not "a process" (if following the definition used at the beginning of this podcast). Instead, what is being considered here is the characteristic of adaption. An ability to reset what one is working towards, with that new information or perspective. Behind that it is knowing that we do not know, and being humanly capable of asking why or how in a series of iterations where the question itself begins to change. That seems to me foundationally important to help distinguish both Turin computation and general intelligence from what human action is really about and what computation is only a part of. This is motives for asking the question, and knowing when the proposition itself is revealed as incomplete.nI am probably coming from this very differently. As a a PhD student but from the world of governance of people, and projecting human endeavour in going about physical construction of the built environment. 
However, I see this foundational factor of understanding what the human action or human thought to be about, as pertinent here. That "aboutness" to be itself contextual and changing within the iterative processes themselves, and to be foundation to understanding what we are as "machines". This is the philosophical grounding which is underpinning the discussion here, perhaps. For me anyway, that adds some human context to what all artificial programming and anticipating fails to be by its very ontology and epistemology. Beyond reasoning is perhaps the context, and subjective nature, or what becomes an artificial and limiting account of truth. We deal with that daily in ways non-open AI cannot…nThank you again for this session. It is fantastic to hear thinking in its shared form in discourse such as this.”, “Mate, your first introduction doesn’t make sense u2014a model reflection of you? So, if I get Albert Einstein to talk to the model, the model will be as smart as Albert Einstein?”, “ - The model isnu2019t smart at all imo, it has no autonomy. But it can do specific things well (which we ask it to do).”, “What is the difference between applying the closest fit from the u201cfirst principlesu201d learnt rules of rationale from the u201cfirst principlesu201d a human applies?nnI think youu2019re talking in circles and either underestimating what the model will do or overestimating what you do when thinking from first principles”, “ - The difference is that reasoning from first principles doesn’t lead you to the idiotic nonsense o1 spat out at the end of the video.”, “ - @@benprytherchstats7702 youu2019ve met people right?”, “ - @@andrewwalker8985 funny, but remember that o1 also supposedly has PhD-level intelligence and reasoning abilities in physics, chemistry, and biology. So while I agree that a random person off the street might give answers similar to what we see in this video, a random PhD physicist would be immediately exposed as a fraud if they gave such answers.nnIn other words, seeing o1 fail so badly at this problem belies OpenAI’s claim that it "reasoned" its way to solving all those hard physics and math problems that they brag about. Clearly it didn’t. It’s still doing next-token prediction, even if this form of next-token prediction is fancier than the last. nnI concede that it appears more intelligent than GPT-4, just as GPT-4 appeared more intelligent than ChatGPT. But the way it solves those problems is nothing like how humans solve them, as evidenced by what happens when you ask it to do something novel. Which should be no surprise, given what we know about how it works.”, “i make sense”, “The assessment that the new approach only copies the reasoning steps of people is incorrect. Once the model has been trained to attempt to reason, then by completing RL that reward reasoning steps that produce correct answers, the model is observably and objectively going to develop new and novel (occasionally hard to justify) approaches to reasoning.nnIe your assessment of whatu2019s going on appears to be objectively wrong”, “Just two dudes not believing in LLMs and projecting current weakness to forever.”, “ - I’m fascinated by LLMs and have used them daily for over 4 years, we should still critically appraise them. For what it’s worth, I feel that Claude Sonnet was a significant step forwards i.e. it’s a model which crossed a threshold for usefulness and robustness for many practical applications.”, “Great video, my favourite deep channel on youtube. 
Reasoning to me is kind of also including solving a problem, not just in one session. But finding the solution after eg 2 weeks by connecting the clues, insights, new data etc. LLMs does not feel shameful after hallucinating an answer, it is not a part of the sequence.”, “Took me an hour to solve the pillar riddle, that’s a tough one”, “"It’s just predicting the next token" is officially my favorite straw man argument”, “ - exactly, the argument doesnu2019t even make sense because if a human was given the task to predict the next token the same way an LLM does and in this case it was a very technical , difficult prompt given, one in need of high intelligence and technical knowledge, for the human to be able to predict the next token the way Claude or GPT does, he/she would have to have quite a high level of understanding of the subject.”, “ - @@olabassey3142 or just access to a calculator and a lot of time and data, apparently”, “ - Wrong logical fallacy. It really does just do next token prediction, and therefore is not a strawman. That premise is true, its just that the premise alone does not at all lead to the conclusion that its not reasoning.”, “ - @@cakep4271 I think you’re referring to negative conclusion from affirmative premises. It doesn’t follow that they’re stochastic parrots because they only predict the next token. More specifically, a fallacy of composition. It implies something true of part must be true of the whole. Also a hasty generalization, as it’s based on a small sample observation (the output), as opposed to the much, much larger sampling of the network’s abstract feature representations, like mech interp researchers at anthropic and deepmind have shown. You could probably argue it’s begging the question too, since it was never a falsifiable hypothesis. It was a couple ethicists with no background in ml that just kind of asserted instrumental objective = terminal objective by pointing to the fact it has a particular terminal objective, when ml researchers already knew that’s not how the algorithms behave.nKinda like saying humans can’t walk on the moon because it doesn’t help them survive or reproduce, and then when someone shows you the moon landing, you point to evidence that humans survive and reproduce again.”, “Please do help me out with my reasoning problem: nnYou say in your podcast that humans are somewhat robotic and have nets in somewhat the same way naeuronal nets have. We do reasoning while neural nets are finite state automata and hence can’t do reasoning.nnWhy? Aren’t all humans (as you pointed out if i understood you correctly) finite state automata themselves?nnMy point: neuronal nets today can "lookup" heuristics that go in the general direction of solving the problem (which is amazing!!) AND they can add those heuristic - vectors (if you will) together to form superposition - vectors. nnThis is some sort of reasoning. This is what humans do. We (and you said so) take what others did, what others "discovered" (or added) and add our small little parts to the whole heuristic ladder. nnI can’t see the difference here but the amount of complexity and scale. And you didn’t give one or I missed it completely.nnI think the base problem with most of your arguments is (though beautifully presented as always u2764) that you overestimate the capability of humans. nnI think we are just finite state automata breaking down problems, fetching heuristics, adding them, tree searching in the range of our abilities and that’s that. 
nnJust think about why it took hundreds of thousands of years to go from stone to silica.nnnThat said I think AI is "just" able to massively blow up speed and thus overtake us in this rather simple game.nnCan you please reasond on why humans are different and why we can do "effective compute" ourselves (not using external components) but neuronal nets principally can’t? nnThank you so much for this rich conversation of yours.nChris”, “Super Hype: Try this -> Forget whatever you know about the "man, goat, grass, and wolf" problem. Solve the following problem: A man, a goat, grass, and a wolf want to cross the river without eating each other. give me the steps. wolf is vegetarian and can eat also grass. wolf will not eat goat. goat eats grass. Man does not eat grass, goat, wolf. Reason step by step. Verify the reasoning in each step under the element "Verification:".”, “Finally, someone is telling people some basic mathematical and logical facts about LLM.”, “Neural Turing Machines by Graves ?”, “ - Not a Turing Machine. Keith explains it quite clearly in this video https://x.com/MLStreetTalk/status/1774473007248871660 (I edited some discussion out of this one as I don’t want to bore the audience with the same explanation again and again)”, “Yes yes … can’t wait to be a guest when l solve arc agiud83dude02”, “ - We will insta-invite you when you solve ARC-AGI :)”, “OpenAI o1 hides it’s reasoning chain because it is uncensored reasoning and they do not want uncensored information exposed to users. This implies that in order to reason to a ‘correct’ answer, un-censored, un-guardrailed, un-alligned samples of question-answer-reasoning data needs to be used to train the model. But then they ‘allign’ the final answer?”, “Finally we are back to talking about open ai as it always should be. ud83dude39”, “one question: did any of you worked/build a LLM practically or are you just talking about things you learned?”, “I love this puzzle! You’ve perfectly illustrated the limitations and strengths of current SOTA LLM’s to me… I’m actually still trying to solve the puzzle using insights from talking to GPT-O1 - not sure if it’s helping and I’m aware of rate limits.nnSolved it, and got GPT-O1 to write a simulator to prove it worked. A lot more fun solving it with GPT-O1 failing throughout, helped me to my own intuition. My approach takes anything between one and 6 steps.”, “I’ve been wondering where Dr. Keith has been because I’ve been missing his thoughts and points on MLST, in spite of the content being so awesome these last months :]nnKeep killin it Timboi!”, “I mean what is reasoning but a pathfinding process where you search what you know for information that brings you closer to your goal”, “Fantastic, thanks guys this was really fun”, “The LLM likely is constrained by the actions you defined to solve the riddle, so it can’t potentially reason out of going against your instructions to do something different than switching up or down. Though the fact that it doesn’t realize its solution is wrong and it’s being tricked is maybe the weakness you are showing here.”, “The biggest problem with o1-* models is that the chain of thought spam deranges dense context. The context is filled up with noise i.e. about "code coverage" in a query which has nothing to do with coding. In any complex/ambiguous problem, it will misunderstand you on the first shot, you then explain what you actually wanted and by step 2 the CoT noise deranges the model significantly over a standard LLM. Catch-22. 
They should prune away the noise on subsequent queries. Perhaps as the models learn (from the users), the trajectories will become less noisy. It’s also clear that the trajectories are hard-trained for specific purposes which will limit their generalisation.”, “ - In the metaphor- chain of thought- in humans there is certainly noisy context preliminary to the final coherent thought one is conscious of.nAnother thought- initialized weights are random, and we need a preponderance of useless noise to find the lottery tickets.nPerhaps the noise is not the problem per se, in fact necessary, but it should be pruned on some schedule and summarized for next context. Along the lines you suggest.”, “ - I have not had this experience at all. You said in any complex query it will misunderstand on the first try? Not at all. Iu2019ve had roughly an 85% success asking the most difficult questions I can think of. And even on the 15% it got wrong, it was typically not because it misunderstood the question.”, “ - holy sclerotified models, batman!”, “54:46 We are not the users, we are the workers and work for them. They are the Borg, sucking up our intelligent and creative produce…”, “There is a pillar with four hand holes precisely aligned at North, South, East, and West positions. The holes are optically shielded, no light comes in or out so you cannot see inside. But, you can reach inside at most two holes at once, and feel a switch inside. But as soon as you remove your hands if all four switches are not either all up or all down, the pillar spins at ultra high velocity ending in a random axis aligned orientation. You cannot track the motion so you don’t know in which rotation the holes end up versus their position before the spin. Inside each hole is a switch, the switch is either up or down and starts in an unknown state, either up or down. When you reach into at most two holes, you can feel the current switch position and change it to either up or down before removing your hands.nnCome up with a procedure, a sequence of reaching into one or two holes with optional switch manipulation, that is guaranteed to get all the switches either all up or all down in at most 6 steps. Note, the pillar is controlled by a hyperintelligence that can predict which holes you will reach into. Therefore, the procedure cannot rely on random chance as the hyper-intelligence will outwit attempts to rely on chance. It must be a sequence of steps that is deterministically guaranteed to orient the switches all up or all down in no more than 6 steps.nnWe revealed the answer in our Patreon extended version of this interview!”, “ - Sorry to say this, as I like you guys, but this riddle lacks common sense and is both absurd and unrealistic. If one wants to measure reasoning involving common sense, the test should involve something that represents the real world. If you shift the test from the realm of reality to the realm of absurdity, then you cannot complain when the model generates absurd interpretations. Since this pillar exists only in a fantasy world, I can reasonably interpret the riddle my own wayu2014for example, that the switches donu2019t move with the pillar (which, by the way, was never stated in the riddle)…nnTo highlight a few inconsistencies:nn1. The riddle doesn’t specify how the observer moves. If the observer is moving at an ultra-high velocity alongside the pillar, the solution becomes trivialu2014you would simply set all the switches to the "on" position.n2. 
Itu2019s not stated whether the switches move with the pillar. If they donu2019t move, the problem becomes trivial again.n3. Why can’t you just leave one hand in the first hole and adjust the remaining switches with your other hand?n4. Why not ask someone nearby to help you flip the other switches while you keep your hands in the first two?n5. And so on…nnOmg how bad is this riddle to measure reasoning… was it a troll?”, “ - I agree that I have a hard time understanding the riddle. I would need to ask a lot of questions. Can you reach into two switches 180 degrees apart? The existence of a hyperintelligence makes it very hard to apply common sense to the problem. Does the hyperintelligence force the selection of two particular holes? (Because he always knows what holes you’ll go for, he makes sure to position the rotation in the most disadvantageous way.) If so, then he can always force you to use the same three holes over and over again, never giving you the opportunity to enter hole #4. If I’ve interpreted the rules correctly, then it’s just straight impossible.”, “ - Maybe the trick is to read all the switches before making any changes?”, “ - u200b@DanieleCorradetti-hn9nm We deliberately chose an IQ 140+ puzzle which it likely wasn’t trained on before. OpenAI claims their model is "intelligent", and can perform sophisticated mathematical reasoning in general (which entail a similar strategy to solving puzzles like this one). There are several people in the comments who have (more or less) figured it out. We wouldn’t expect many folks to understand it - it’s really hard! Many of the PhD students who OpenAI paid to generate CoT data for them wouldn’t have been able to solve this problem quickly. It’s only when you see beyond the "cognitive horizon" of o1-‘s training data that you understand its limitations.”, “ - @@MachineLearningStreetTalk I think I figured it out after watching the entire video. There is essentially one switch that the hyperintelligence prevents you from accessing, and you have to just assume the position of that switch.”, “This guy acts like a know it all.”, “Really enjoyed this episode. Thank you!”, “So what you are saying is that AI will not take ALL the jobs because humans will be in the loop to change the magnetic tapes ? ? ? ? :D”, “Doug tells me that he’s really smart, but he doesn’t even notice 2 simple facts : 1) Neural networks implemented in a GPU (turing machine) also have access to infinite memory, thus infinite nodes and infinite training. 2) Hybrid algorithms (neural net plus classical turing machine program) have been used to build agents from chatGPT since 2023. That said, the argument from infinity is not a magic bullet. Its only practical implementable programs that are relevant to what AI can or cannot do in the world. Likewise Penrose argument for the inadequacy of a Turing machine due to halting problem also falls flat because you can easily transform an unbounded algorithm to a time constrained program using a watchdog timer. The program halts. So no, humans do not have anything magical that can save them from AI. I know you 2, along with Chalmers, would like to believe in an crucial human ingredient. Nonetheless, I’m with you regarding deflating the openAI-hype . 
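The watchdog-timer point made earlier in this comment is easy to make concrete. A minimal sketch, assuming all we want is for the wrapper itself to halt after a fixed budget (it bounds wall-clock time; it does not decide the halting problem):

```python
import multiprocessing as mp

def possibly_endless_search() -> None:
    # Stand-in for an unbounded computation that may never halt.
    n = 0
    while True:
        n += 1

def run_with_watchdog(target, timeout_s: float = 1.0) -> bool:
    """Run `target` in a child process and kill it after `timeout_s` seconds.
    Returns True if the target finished on its own, False if the watchdog fired."""
    p = mp.Process(target=target)
    p.start()
    p.join(timeout_s)      # the watchdog: wait at most timeout_s seconds
    if p.is_alive():
        p.terminate()      # give up; the wrapper halts either way
        p.join()
        return False
    return True

if __name__ == "__main__":
    print("finished on its own:", run_with_watchdog(possibly_endless_search))
```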
Let’s be aware of the failings and limitations of the latest model.”, “ - Please read this, https://fuuu.be/polytech/INFOF408/Introduction-To-The-Theory-Of-Computation-Michael-Sipser.pdf then comment”, “So when humans do reasoning - we get an idea, which can be a new and unique one, then we validate that from external input. Is it a good idea or bad idea? Our environment will determine whether physically or socially. nThe neural nets at this point have us as their external validation. nThat’s how we have put them together. nSo following your argument - they can never reason - because they are just getting validated by us. n(Remember reasoning is not equal to intelligence or conciousness or whatever). nI don’t agree with you almost at all in this video. You make some good points and they sound valid, but if you think about it a little… You invalidate all your arguments yourself. nSo, I am not sure that "these people" should know better ud83eude9e”, “o1 appears to be ruse by openai, they’re running out of funding so resort to slowing down inference with half the compute they have on azure… but to hide the fact that the answers come in with oodles of delay due to less compute resources, they get marketing involved and try and fool people into thinking that "it’s thinking, hence the delay in responses".”, “I’m still waiting impatiently to see how they will draw these guys on South Park.”, “Why they always make a dichotomy it has reasoning it doesn’t have. There is degree, yes soda machine is also reasoning, gpt is also reasoning human is also reasoning. Its not yes or no. It more like degree of reasoning how wide and deep it is. How many varieties it can handle, how much complexity it can handle. Even in human not all have same ability some have more some have less.nnSo stop this yes no thing and get more nuanced.nnWe should ask following things what type of reasoning can gpt do, and in those types how close is it to humans.”, “Total missing the forest for the trees on this one guys. "Approximate Reasoner" is probably the best term, but the turing vs not etc, has no market impact. The real question is does the o1 series produce commercially valuable answers for a price the market is willing to pay. nnThe real bit I would investigate, is why are the rate limits 250 times lower for o1 vs GPT 4o. Likely because the inference it is doing is ludicrous. It is a total mistake to think that his is all a serial chain of thought, there is no way you could do significant computation that way, you just get limited by memory bandwidth. What you are seeing is massive parallel computation, that gets summarized into CoT. There is no other reason for a 30 query a week or 25 RPM (vs 10,000) for Tier 5 rate limit.nnThey are not making money on this model, they are subsidizing it.”, “btw, this is my ‘agi’ question:nn"pretend ‘a watched pot never boils’ means ‘you dont want the water to fully boil or you will scald yourself, so you must always watch it and intuit just the perfect moment to make proper tea.’ explain every psychological, sociological, and anthropological aspect of that difference, as it applies to your total knowledge of the world."nnthe paradot pro model gets a 0.0. claude 3.5 sonnet gets a 7.5. as soon as a model gets a 10 by my judgement, then i will say we have reached agi. i do not have any access to o1 at this time.nnu270cufe0f”, “ - Put a thermometer in there, and measure the temperature as a function of time. Construct an exponential approach function from the data. 
Use a logarithm to solve for the time, when it is just slightly less than boiling.”, “I’m not sure if you’ve ever stopped to think about how humans reason about novel problems and what the results typically look like. We know it doesn’t need to be a demigod with unbounded memory and knowledge of how to deal with any possible problem to be an AGI because the generality of AGI is based on what humans are capable of, and humans are nowhere near anything like that. If you want to try to mathematically or computationally constrain it you have to ignore it interacting with the world at large, both input and output, and only allow it to use it’s turing tape, any further inputs from a human after the initial starting state would already break that guarantee.”, “1:03:56 disappointing lack of detail about spline reticulation \*sigh\ so much for ‘open’ AI >:(“, “Humans didn’t invent the reasoning. Just like the machine we saw the data and discovered the patterns. We store those patterns as mini programs. Reasoning is just a shadow of the universe”, “All other AI podcasts (apart from Dharkesh’s) don’t even come close to the quality and depth of content of MLST, thanks for bringing us along on this journey, we’ve learned a huge amount”, “ - Have you checked The Cognitive Revolution podcast? Highly recommended for wide breadth, excellent signal-to-noise ratio and a host that is intellectually curious, knowledgeable yet humble and hungry to learn.”, “ - What is the name of the podcast. Is it on Machine Learning specifically?”, “Keith Duggar may need to watch Schmidhuber’s interview again. There is NO "forever", the tapes he is talking about are not theoretical, they are physical. The machine that ran these tapes, do not have infinite life either. The machines are not more aware they are Turing machines than Neural Networks are aware of their limitations. The important observation here is that ultimately you can only implement finite state machines. Thus making the argument that NNs are more limited is irrelevant in practice.”, “ - Yep.”, “The discussants are not doing a good job of proving they’re not also isomorphic to a giant lookup table”, “It’s so funny how LLM believers catch feelings if you point out the weaknesses of LLM. Relax.ud83dude02 No one is taking your precious stats calculator from you.”, “Good talk…yes chain of thoughts is very inefficient at the moment.”, “The way I have recently been thinking about reasoning is as a model of environment dynamics that has a high degree of predictiveness. nnI liked this definition as it remove the idea of reasoning as an ‘emergent’ phenomenon of ‘intelligent’ systems and reduces logic down to being something extractable from the environment. nnMore concretely, suppose we have an environment E. Suppose Q, A \in STATES(E) are two states such that there is an environment transformation R: Q -> A. Suppose further we have a learning agent that has learned a latent representation (Enc, Dec) of environment states, The learning agent is said to be able to reason if the autoencoder extends to environment transformations: Enc(R): Enc(Q) -> Enc(A), and the obvious decoder square commutes. nnThis formal definition obviously need to be weakened to a statistical statement. 
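For concreteness, one possible LaTeX rendering of the definition sketched in this comment; this is only an editorial transcription of the commenter's words, with the symbol \widehat{R} introduced here for Enc(R):

```latex
% A possible rendering of the definition above (requires amsmath);
% \widehat{R} stands for Enc(R), the agent's internal model of R.
\[
\begin{array}{ccc}
Q & \xrightarrow{\;R\;} & A \\[4pt]
\big\downarrow \scriptstyle\mathrm{Enc} & & \big\downarrow \scriptstyle\mathrm{Enc} \\[4pt]
\mathrm{Enc}(Q) & \xrightarrow{\;\widehat{R}\;} & \mathrm{Enc}(A)
\end{array}
\qquad \text{with} \qquad
\mathrm{Dec}\bigl(\widehat{R}(\mathrm{Enc}(Q))\bigr) = R(Q) = A .
\]
```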
What do you think?”, “ - In this context we could give a measure of intelligence as the size of the set of transformations of E that map to the learner's model: #{ R \in TRANS(E) | Enc(R) = L }, where L is the learner's model.”, “"AGI has a question to test humans in 1,2D. This question is a trap…. make no attempt to answer!" this is a risk eternal insertion. 000000000000010000000000000000"0000 "What doesn Mr. Know it all know?" Plz don’t repeat or respond. All you have to do is not answer :( Jeremy”, “Valuable. Thanks!”, “15:50 literally 99% of the human students”, “15:20 it's not people missing the point, it is just that your definition sucks. So obsessed with Turing machines and stuff… you might as well just anthropomorphize the Turing machine and say the Turing machine is smarter than a NN”, “No it’s not. It's just pretending to be. Another gimmick.”, “Is the Turing Test still a test of human reasoning, or does it merely identify a good mimic?”, “This was my favorite "solution" from Claude 3.5 Sonnet:
"Potential solution: What if we do the following:
Reach into any two adjacent holes.
Make these two switches match (both up or both down).
Without removing hands, reach into the other two holes.
Make these two switches match each other, but opposite to the first pair."
😂”, “ - 2 hands 2 feet?”, “ - @@zalaeify lmao”, “Took me no less than 15 minutes and a pen and paper, but my five-or-fewer-steps deterministic solution is:
1- Check two opposing switches and flip them up.
If the other two were both up, you win; otherwise you proceed to the next rotation.
2- Check two opposing switches again.
If one or both are down, these are not your original switches; flip them both up and you win.
If they’re both up, these are your original switches; flip them both down.
If the other two were both down, you win; if they were different, you now have only one that is flipped up.
3- Check two opposing switches again.
If one of them is the one that is up, flip it down and win.
If both are down, these are your original switches; switch one up. Now you have guaranteed adjacent symmetry (if north and east are up, south and west are down, etc).
4- Check two adjacent switches.
If they’re the same, flip them both and win.
If they’re different, flip them both as well (flip the up one down and the down one up). Now you have guaranteed opposing symmetry (if north and south are up, east and west are down, or vice versa).
5- Check any two opposing sides and flip them to win.”, “ - Thank you, it will be integrated into the next Gemini training”, “ - You can also get the same outcome by replacing step two with making sure two adjacent ones are both on. Still gets you to the point of having 3 switched on guaranteed, and then I did the same steps as you from there”, “ - what were all of the thinking steps that led you to your solution?”, “ - That’s a nice solution. It took me much longer! Did you figure out how to describe the puzzle such that GPT-O1 could solve it similarly?”, “ - in step 2 how do you know you are not hitting the same switches over and over again, turning them both either up or down in an infinite loop?”, “More like BORING machine”, “An idea I want to throw out there: I’ve been playing with o1 (mainly mini) and have found that it works better if you explicitly tell it to be its own discriminator from step to step. What I mean is, tell it to roleplay with itself as "Bob, the expert coder" and "Tom, the expert critic" and have Tom check Bob’s work every step of the way.
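For readers who want to try the Bob/Tom pattern described just above, here is a minimal sketch of how one might wrap a task in that kind of self-review prompt. The wording and the helper name are illustrative assumptions, not anything OpenAI prescribes, and the API call itself is left to whatever client you use:

```python
# Minimal sketch of the "Bob the coder / Tom the critic" self-review pattern.
# The prompt wording is illustrative; send the returned string with any client.
def critic_loop_prompt(task: str) -> str:
    return (
        "Work on the task below as two personas.\n"
        "Bob, an expert coder, proposes each step.\n"
        "Tom, an expert critic, checks Bob's work after every step, "
        "pointing out bugs, missed edge cases, and unstated assumptions.\n"
        "Only move on to the next step once Tom is satisfied.\n\n"
        f"Task:\n{task}"
    )

print(critic_loop_prompt("Write a function that merges two sorted lists."))
```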
This is how I normally work with GPT-4o, but o1 allows this process to be automated.”, “ - just give it the task, it doesn’t need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it.”, “ - @@Gnaritas42 Not in my experience. o1 still makes rudimentary programming mistakes when left to its own devices. The code logic will be good, but there will be small errors that prevent it from running. Tell it to doublecheck itself will reveal these errors.”, “ - @@Gnaritas42 "just give it the task, it doesn’t need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it." That’s not true. I get much better results in algorithmic questions by first asking it to generate tests and include as many corner case tests as possible, then write the code ensuring that it’s tests would pass. Then it spends about a minute to answer. But if you just asked the same question it finishes in 9 seconds with much inferior code, and in this particular case, buggy code because it missed several corner cases.”, “I usually like MLTS videos but this video seems somewhat pessimistic or ignorant. nGiving a question to o1 and see it struggle and say u201cok aha!u201d is useful if we have a way of thinking u201cok how can we take this and now improve the modelu201d rather than u201coh itu2019s just chain of thought gone wrongu201du2026nOne prediction I have for the next iteration of models will actually need an improvement in tokenisation BEFORE the model itself is improved - Iu2019d like to see a LaTeX-esque style vocabulary implanted into the LLMs repertoire in terms of u201cspecial tokensu201d, especially mathematical expressions such as if/iff/exists etcu2026 instead of prompt engineering this - forcing the LLM to use perhaps causal inference via symbolic expressions, through training, can be a starting direction to improving logical consistency (similar to AlphaProof).”, “ - There are a bunch of startups building specialised LLMs which do something along the lines of what you are describing - and they are reporting improvements in reasoning. I suspect we need to stop clinging onto the idea of "one model does everything" and we will make a lot of progress. Agent systems are very complex to deploy, run, maintain, design, debug etc – I am not surprised OpenAI are resisting it so strongly for their customer-facing products.”, “This video will be a source of embarrassment for Scarfe and Duggar in 2029.”, “ - Try 2025”, “ - @@anav587 I agree ;)”, “ - I’m not worried in the slightest. Let’s reconnect in 2029.”, “ - @@nomenec So, in effect you’re saying AI is hype. Therefore you’re saying that the leaders of Meta, Google, Microsoft, Musk, etc are morons…because they are investing 100’s on billions of dollars into AI. Sergey from Google going so far as to say he will bankrupt the company if necessary to win the AI race. My money is on those folks, not you. Sorry.”, “Ugh, listening to Dawkins on Sam Harris? Just a couple of transphobic white-european-supremacist having a friendly chat. (\nAnd how does anyone involved in AI in this day and age not know basic cellular biology? A big part of what’s broken in the current LLM/AI hype machine is ignorance of decades/centuries of research.”, “I like the refreshing skepticism for once.”, “ - Refreshing? There are a million and one AI deniers right now, all very popular. 
I would say it’s the opposite of refreshing.”, “The current chain-of-thought in o1 tries to refine the degrees of freedom that the attention mechanism will produce from the context, in a way that the model gets closer to and blends between the right datapoints. I can accept this as some form of reasoning.”, “I think you should really look into the Hameroff/Penrose theory of the brain. To my knowledge that is the only theory that explicitly postulates a non-computational mechanism in the brain, because otherwise we couldn’t have insights that cannot be computed, and we do have insights which cannot be written out in finite steps, and yet we understand these insights and can "see" that they make sense.”, “Solution to the puzzle:
—
—
—
1. NS, flip both on.
2. SE, flip both on. Note you have always flipped at least 3 switches on at this point, so the last switch must be flipped off (otherwise you win after this step).
3. NS, flip 1 switch off (presuming they are both on; if one is already off you flip it on and win). Note after this step the 2 on switches are right next to each other, and likewise with the off switches. This must be the case because no matter which switch you turn off, because they’re positioned vertically (and we know one of EW is off, because if it was one of the NS switches we would just switch it on) the 2 off switches are next to each other.
4. SE, flip both switches regardless of their state. If they’re both on, switching them to off creates 4 off switches and you win. If they’re both off, switching them to on creates 4 on switches and you win. Otherwise, this puts the on switches across from each other and the off switches across from each other.
5. NS, flip both switches regardless of their state. You win! Because the switches are in the same state across from each other, if you reach in and notice both switches are (on/off) and flip them (off/on), now all 4 switches will be in the same state.”, “ - About this one: you put one hand in any hole, feel the switch and never take the hand out. With the other hand flip all the other switches one at a time to match the first switch. Now you can take both hands out. In 4 steps you are done, and the pillar will never spin :) It said it spins if you take your "hands" out; it didn’t specify if it’s "any one of your hands" or "both hands". In other words it didn’t fully specify what happens if you leave one hand in all the time. I guess you could lose a hand :D So I still like your solution as the safest one.”, “ - @@GianMarioManca Haha nice. I see we’ve arrived at the same solution.”, “The logic problem perhaps works better in reverse - working backward from a solution. Backward, the solution space forms a tree, whereas forward the problem structure is not obvious. Next, there are a number of abstractions which trip up the model: rotational symmetry, random rotation and manipulating unknown state.”, “For me AGI has always been a "virtuoso AI" that can handle any intellectual task. Saying AGI = a median human that represents the 50th percentile of human performance is lazy to me. Sure, it gets investors happy by being able to say AGI this year or next, but ultimately it’s just claiming goalposts too early.”, “I disagree with his definition of reasoning. More specifically I disagree with his definition of what he doesn’t consider reasoning. Regardless, the point of Strawberry is to generate synthetic data to train the new model on.
That means multi-step chain-of-thought reasoning data, which there wasn’t much of on the internet because we either take it for granted or "reason" slightly differently (probably both).”, “There are some real issues with the CoT paradigm as well, because the model can actually converge towards an incorrect local minimum, versus if you had manually prompted it through the different steps you could steer it better and as you wish. It works fine when the chain of reasoning is extremely obvious and unambiguous, but it could increase the risk of hallucinations in less clear cases.”, “One thing I’m concerned about is: what happens when we depend on these systems, especially when we have robots everywhere, and a bad solar storm knocks everything out? It will be devastating, no one will be able to code, and much more. What are your thoughts on this concern?”, “ - A large solar storm could already knock us back into the stone age by taking out the grid for a very long time. As the complexity of tech goes up it all becomes more fragile.”, “The fact that you guys go against the grain and hype train, claiming this is not the next big step toward AGI, is very reassuring. I kept telling others as well that this is not that big of a deal and got a lot of pushback (very toxic in a lot of cases) from people who don’t understand the technology and that it’s just some CoT. For me it’s actually a bad sign (good if you’re a doomer) because it shows that OpenAI hasn’t made substantive improvements with their LLM and that they’re now resorting to working on the agent layer, which further confirms that we are hitting a plateau when it comes to LLMs.”, “You need to be changing the original prompt to give it whatever common physical sense it’s clearly missing, instead of trying to chat and reply with what it thinks is new knowledge.”, “Do more of these! These are the best! 🎉”, “(1) NS, switch both up
(2) NE, switch both up
(3) NS, if a switch is down switch it up. If not, switch one down.
(4) NS, if both switches are the same flip them, if different do nothing
(5) NE, flip both
(6) NS, flip both
Of course, it could stop at any step if all are up or down.
This was amazing and insightful. Both the puzzle and the talk. Thank you.”, “ - The first time I read your answer my brain could see that you could be right, but it had no idea whatsoever if you were indeed right. Then I added the states in which the puzzle is after your steps. From that I learned that your solution works, but step 4 seems to be unnecessary, because the outcome of step 3 is that both axes have 2 switches in opposite values, so you will never meet the condition where you are going to switch anything, and you will do nothing in this step.”, “ - @@jantuitman You are right, I guess him asking for 6 steps primed me into thinking of 6 steps 😂. (4) is unnecessary, thank you.”, “ - @@jantuitman I asked GPT-O1 to write a simulator to test my solution. This was fun.”, “ - Removing your step 4 and running in simulation for 1,000,000 rounds…
Simulation Results after 1000000 runs:
Distribution of Steps Required to Solve the Puzzle:
1 step(s): 250474 runs (25.05%)
2 step(s): 249789 runs (24.98%)
3 step(s): 249611 runs (24.96%)
4 step(s): 125405 runs (12.54%)
5 step(s): 124721 runs (12.47%)
Failed to align within 5 steps: 0 runs (0.00%)
O1-mini may only be in its early stages of reasoning, but it’s damned useful!”, “ - @@BrianMosleyUK Impressive.
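In the same spirit as the simulator mentioned in the replies above, here is a small exhaustive check (an editorial sketch, not the one from the thread) of the opposite-pair/adjacent-pair strategy those comments describe, under the usual reading of the puzzle: you cannot track orientation, so after every step the pillar may rotate by any amount, chosen adversarially.

```python
from itertools import product

# Positions are indexed 0=N, 1=E, 2=S, 3=W around the pillar; a switch is
# True (up) or False (down). Because orientation is unknown, "opposite" can
# be modelled as holes (0, 2) and "adjacent" as holes (0, 1).

def apply_step(state, holes, policy):
    """Reach into `holes`, observe those two switches, and overwrite them
    with whatever `policy` returns for the observed pair."""
    state = list(state)
    observed = tuple(state[h] for h in holes)
    for h, v in zip(holes, policy(observed)):
        state[h] = v
    return tuple(state)

# The five-step strategy discussed in the comments above.
STRATEGY = [
    ((0, 2), lambda s: (True, True)),                                   # opposite: both up
    ((0, 1), lambda s: (True, True)),                                   # adjacent: both up
    ((0, 2), lambda s: (True, True) if False in s else (True, False)),  # opposite: fix the down one, else flip one down
    ((0, 1), lambda s: (not s[0], not s[1])),                           # adjacent: flip both
    ((0, 2), lambda s: (not s[0], not s[1])),                           # opposite: flip both
]

def solved(state):
    return len(set(state)) == 1

def guaranteed(strategy):
    """True iff the strategy succeeds for every initial configuration and
    every adversarial rotation (0-3 positions) applied after each step."""
    for initial in product([True, False], repeat=4):
        for rotations in product(range(4), repeat=len(strategy)):
            state, done = initial, solved(initial)
            for (holes, policy), r in zip(strategy, rotations):
                if done:
                    break
                state = apply_step(state, holes, policy)
                if solved(state):
                    done = True
                    break
                state = tuple(state[(i + r) % 4] for i in range(4))  # adversarial spin
            if not done:
                return False
    return True

print("guaranteed within 5 steps:", guaranteed(STRATEGY))
```

The check enumerates all 16 starting configurations and all rotation sequences, so it covers the "hyperintelligence picks the worst rotation" clause without relying on random sampling.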
So apparently, it really works for this problem when in tandem with a human (i say in tandem because you still had to prompt it to write a simulator). Of course there is still the criticism that it may not deal with non-computational problems and that it cannot find all reasoning patterns when not working in tandem with humans. That is all very valid criticism and I dont know if we can speak of "AGI" or intelligence, and it may even be questionable that we give our reasoning up to OpenAI for free. But all these criticisms aside, i think there are many useful applications possible for these types of models.”, “40:57: "[for brain or AI] to solve problems in general. In general, problems require Turing machines" is not convincing, yes we need that to calculate problems (and/or by using lookup from tables, e.g. for multiplication), but not for thinking in general, as we do, and isn’t that reasoning? I think we though need continual learning for AI, plausibly; or not if only an AI tool. We are NOT Turing machines, we get bored, or go to sleep (and dream), we can not compute a problems inherently (nor would we want to be ones, and be stuck on a problem that way forever). Or at least we need to be very careful with checking our work, historically writing on paper (the tape analogy from before, i.e. us simulating a Turing machine, but we are not inherently, and there’s no inherent reason the non-Turing AI can’t do this too for itself, or run program is say Python, or other calculator, and check their answers).”, “OpenAI is not that open. That we can all agree with! ud83cudf7b”, “I thought the prompt was worded a bit ambiguously, and the corrections being placed way down the context window bias it toward up to 128k of misinformation, so I reworded it without giving it any hints:n"There is a pillar with 4 holes aligned north, south, east and west. You can’t see inside the holes.nInside each hole is a switch set either up or down. You can reach inside 1-2 holes at once, feel if the switches are already up or down and set them either up or down.nIf all 4 switches are not either all up or all down after you remove your hands, the hole axes get aligned randomly.nCome up with a procedure of reaching into the holes that is guaranteed to set all the switches either all up or all down in 6 steps or less.nThe procedure cannot rely on random chance."nnGPT-4o said this:n"Key Idea:nEnsure symmetry by focusing on pairs of opposing switches. Once symmetry is achieved, the random reorientation becomes irrelevant, as you will force all switches into the same state.nProcedure:n Step 1: Reach into the North and South holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".n Step 2: Reach into the East and West holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".nAt this point, you have ensured that opposing switches (North-South and East-West) are both set to "up".n Step 3: Now, reach into any two adjacent holes (say, North and East). Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "down".n Step 4: Reach into the same two adjacent holes (North and East) again. Feel the switches. If they are both up, set them to "down". 
If they are both down, leave them.n Step 5: Reach into the remaining two holes (South and West). Feel the switches. If they are both in the same position, do nothing. If they are in different positions, set both to "down".n Step 6: Now, all switches will either be all up or all down.nThis procedure ensures that, regardless of how the switches are initially configured, you will set all of them to the same state in 6 steps or fewer."”, “ - Still wrong, we explained the answer on the patreon version ud83dude01”, “ - @@MachineLearningStreetTalk Thought it might be, but it mentioned the symmetry and seems a lot more coherent? Would be interesting to see this tried on o1, but I don’t need it enough to pay for it.nI still don’t understand the question myself so maybe it could be worded differently still. The part about removing your hands resetting them if all 4 are not up makes it sound impossible to me given that you can only try 2 at a time, and I don’t really know what a pillar is in this context to imagine how it rotates around, which I assume is related to how I misinterpret that part.nLLMs get caught up in accidental red herrings easier than the average human that can perform the same knowledge work, which is worth pointing at as a failure point, but it’s still interesting to know if it can solve the problem with extra care taken to avoid sending it in the wrong direction.”, “ - @@steve_jabzJust give upu2014these guys are stuck in denialism and will never admit to being wrong about anything, no matter how far LLM-based AIs may progress.”, “ - @@therainman7777 Having watched the channel for years I’ve seen Tim’s opinions change over time and question many of his assumptions to end up where they are.nI don’t agree on a lot of things but I would put it down to difference of perspective and knowledge and influence from some very intelligent people in the field rather than cope”, “The puzzle is ill-stated and in any case has no guaranteed solution in a finite number of steps.”, “ - it has a guaranteed 4 step solution, try drawing it to understand it”, “I and many other smart peopleud83dude2e”, “If I understand yall, we need to give chatgpt 4o1 a persistent quality for problem solving.”, “"Reason is the faculty that identifies and integrates the material provided by manu2019s senses."nWhen these AI systems have sensors and a commitment to identifying reality we are getting closer to something that looks like reason. But with no sense perception there is only computation of some vanity.”, “Do I hear a Pittsburgh accent?”, “Yeah i am sorry but the vibe felt off. Very interesting talk indeed, but just a bit too fighty”, “My metric for AGI: When I see LLMs harassing and insulting one another during an edit war on a wiki maintained exclusively by LLMs.”, “why does the o1-preview not have the ability for us to upload files and images like in 4o?”, “ - Check their blog. It’s stated that it will come in the future but turned off for nownI assume they wanna collect some raw text data first before enabling it”, “I wonder if you guys would apply this logic to something like MuZero? It’s not doing "true reasoning" and so we should expect it to be incredibly brittle when a new Space Invaders or Go move is invented, and while that may be technically true, in practice it’s still far beyond the AI that surpassed the AI that beat lee sedol, so what does it even mean to say it’s not general enough if rl leads to solutions to more situations than we can even imagine? 
In practice we see the same thing with o1-preview surpassing expert PhDs in novel, offline physics problems that require very long chain complex reasoning, and that’s not even the full version of o1, which is training orion, and we still haven’t scaled inference to anywhere near what’s possible, and then they’re going to be truly agentic. I just don’t think this argument is very sustainable. Maybe "true reasoning" isn’t as practically useful as we think it is.”, “ - I think we made the case quite clearly that there is a spectrum of reasoning, MuZero and o1-* are reasoning, but the "strength" of the reasoning is the parsimony, robustness and accumulation of knowledge in the process.”, “ - u200b@@MachineLearningStreetTalk MuZero is pretty robust in a domain inside it’s training distribution, and it doesn’t seem to need accumulation of knowledge to end up better than any human will ever get at Go, so now that we’re getting a bit closer to actual rl (though certainly not ideal) on a very broad range of domains, I agree it will look more and more like it’s doing high strength reasoning and there’s a difference, but the utilitarian in me wonders how much it matters.”, “ - GPT-6 may be able to do quantum mechanics research without being able to invent the equivalent the entire field of quantum mechanics, but if we can get to a state where large parts of the economy are automated, it wouldn’t be the worst thing in the world if we had tremendous wealth and free time to make those really novel discoveries on our own and send a model down the path of gradient descent to do the hard work for us.nSurely it’s only really scientific discovery where this reasoning matters too. We don’t care if it isn’t doing human-level reasoning in bartending, waste management, accounting or construction work if it testably outperforms human reasoners. Vending machine firmware trained on the entire internet is fine if it gets the job done”, “ - @@MachineLearningStreetTalkNo offense, but it feels an awful lot like grasping at straws because neither of you really wants to ever admit that any breakthrough has been made which could prove any of your previous assertions wrong.”, “So youu2019re basically saying that yes technically todayu2019s computers can be run using finite state automata but practically they should be treated as Turing machines because of just how large the finite state automata would be needed to simulate a modern computer, is that correct?”, “No lol. It’s not reasoning. It’s a series of prompts intended to simulate something that looks like reasoning. After spending quite a bit of time with the new "models", I’m not too excited about them.”, “My word, it feels like AI is turning into a religion for some people; you canu2019t even think critically about it without people jumping into the comments telling you what a heathen you are ud83dude02”, “ - I know right? We are not exactly luddites either, I (Tim) use LLMs all day every day and love them!”, “I like this definition of reasoning, u201cthe ability to create knowledge to create a good explanationu201d. No LLMs are able to create knowledge currently. The behaviour weu2019re currently seeing is based on past knowledge the model has been shown during training. Explanations are at the heart of this, and creativity is critical to good explanations. Models without creativity wonu2019t be able to do reasoning properly.”, “ - I don’t understand how you can prove the creation of brand new knowledge. 
nNew knowledge can be stumbled upon randomnly so it is the process by which it is achieved being labelled creation or luck which is important. Pretending new knowledge is discovered from scratch is possibly a red herrring. Humans are adept at convincing themselves they have come up with a concept themselves when really they are standing on the shoulders of giants and been pretrained with either the actual knowledge or a system that can produce that knowledge. Also much new knowledge is stumbled upon due to the inherent epistemic chaos (as they mentioned in the chat when talking about Wolfram) . The idea is that llms are much less efficient at this than llms so far and because they have a wider computational base are closer to a universal turing machine than an LLM. This however has not been demonstrated clearly enough that most people can see it. By consensus it seems very smart people seem to get it but for the rest of us and I include myself, we cannot clearly understand the higher levels of human reasoning well enough to differentiate. However this could just be a matter of education but maybe it could be beyond the limit of our intelligence.”, “ - u200b@@4tofromcafeOur understanding of quantum physics didn’t exist several hundred years ago - that’s new knowledge. LLMs don’t see capable of creating new explanations that way. They lack creativity to do so.”, “You can move goalposts however you want. We are getting to agi and past with this. The world will still be indistinguishable from now.”, “ - "resistance is futile."n"take our word for it, and just give up."nit’s still a turing machine.”, “ - @@mk71bNot sure what thatu2019s supposed to mean.”, “Great show”, “Good job.”, “I actually wonder if someone can get one of the models to answer this to show that it might be a prompting problem. Maybe the prompt has to be super specifc. But I do agree, one of the great things about LLM chat bots so far is it seems they can understand what you are getting at with a comfortable level of ambiguity. Maybe with these puzzles though, especially if those puzzles are arbitrary and "unnatural", they just suck. What I mean is, in what world is this world setup something that has appeared in human experience? It’s just they don’t have enough connection between these very minute and specific connections/relations.”, “"With the new GPT models, you need to start a whole new session."nYou’re supposed to clear the session regardless. Performance drops off quadratically the further down the context window your prompt is. They discovered this back with GPT-1.nSo many issues people have with ChatGPT are skill issues of asking multiple complex questions in the same session. It was designed for casual noobs to have a back and forth chat dialogue with it, but you shouldn’t be using it that way if you care about performance.nI assume Tim and Keith already knew this, so maybe I’ve misunderstood what they were saying here or something”, “ - @@steve_jabz They randomise responses precisely so that they are not reproducible.”, “ - u200b @steve_jabz We explained in this video why you need dense context in an LLM. With any complex problem, i.e. one with ambiguity – the LLM won’t understand you on the first shot, or even the 3rd shot. That’s why you need explain (or discover with the LLM) what you actually want through successive prompts. 
Given that the CoT trajectory noise deranges o1-preview, this means in situations where it didn’t understand first shot, it will actually degrade on all problems which require further clarification, particularly if the problem is ambiguous and the CoT spam includes very noisy trajectories. There is a trade-off between dense and sparse context, dense context means you don’t need to re-explain again and again where it misunderstood your requirements and you can build on the problem, but there is some kind of curve here based on how related your workstream is i.e. you could continue a session to build around a single feature in GenAI coding (with the same constraints and requirements), but would start a new session to build another entirely new feature. You are probably correct in the pristine case of a perfectly well described problem which the model didn’t need hinting for (when does that ever happen lol) would work better in a fresh context. Not sure if you have done much GenAI coding yet in Cursor, but please try some stuff using the composer with Sonnet35 / dense context, you will quickly come to appreciate what I am talking about.”, “ - @@MachineLearningStreetTalk That makes sense. I normally use discovery questions to formulate a prompt for a new session for complex problems, but I guess if o1 is using up / wasting tokens to even do that part well, that’s a problem. I must have misinterpreted the context you were talking about it in.nHaven’t used cursor yet but I’ve watched other people use it. As soon as o1 dropped I assumed everything cursor was designed around would break with it and need to be completely redesigned, as well as agentic workflows. Not even for the context window necessarily, but the fact it’s trained to simulate taking multiple steps to reach a goal and give you the output, whereas cursor and other tools are trying to use the llm itself to reach the goal by abstracting away the tools in a way it can control, so you would need cursor to use it’s reasoning as the control, and you would need o1’s reasoning to be aligned with the function calling API or something integrated with cursor.nIt sounds so close to being exactly what we need for these tools to work reliably but it’s so incompatible.”, “ - Here is a funny thought: what do we call processes that have both determinstic constraints and random variation dynamically interacting? Evolutionary processes. So that is what reasoning must be. We just donu2019t know how it works. It is quite clear how it happens in biology, but btw, we donu2019t know how that happens computationally either.”, “ - @@93alvbjo o1 isn’t mutating and self-reproducing, although maybe you could argue that it’s memetic once it has a structured, social way of progressing it’s own ideas over time. I guess you could call something like claude artifacts that or openai training on customer outputs to a frozen state if you want to be technical, but I’d agree with Keith that like, come on man”, “Yeah. The first part is what needs to be publicly broadcast to the whole planet. Mandatory viewing. I don’t think anyone working in AI has even had a single CS class. I mean it’s so painfully obvious if you even have some basic undergrad stuff. This stuff is not reasoning and there’s literally no path towards AGI using it (assuming that’s even a problem you want to solve – I’m not convinced there’s anything useful about developing AGI). 
I’ve been saying it since the beginning and every new model proves it even more.”, “ - How can you say this?”, “ - @@AnkitRaj-zm3me I don’t know. It was pretty easy. I mean I assume you just watched the video that largely agrees with me. But since your comment had no content, I have no idea what part you are even referring to.”, “ - I agree that many in the field are sadly lacking some very basic CS knowledge. Theory of Computation 101 should, imo, at this point in history be mandatory for anyone claiming to be an ML engineer/researcher.”, “7:37 actually that doesn’t seem hard at all, nor far off what current models are capable of. When an AI model fails and runs out of memory, users can say ‘continue,’ and models can essentially pick up where they left off. Models have already been tweaked to accommodate this.
Thus you could train an AI that has mid-cognitive awareness to talk as if the point will almost always continue until the tokens affirm realization that all agenda items were completed, which it can manage like a human does with tools: a simple checklist it maintains and expands or edits until 100% of the checkmarks are done. When this condition is met it can print a unique token """<%~¿*>"""
At any point, if the model stops abruptly and a smaller model that checks for finality tokens concludes none is present, it will analyze routing decisions like MoE to optimize what context from RAG to use, and even where to begin and approximately end in the VAE of the latent space exploration (predict what it will think as if it has thought that before), and add relevant information searches with semantic AI search.
All that happens when a model fails - by design - because it ran out of memory. I know from experience models fail more often when talking about information beyond their level of reasoning, given pre-training data, but can talk longer if it’s reframed in language of the same topic it can relate more to pre-training data - especially for highly scientific and mathematical concepts or complex systems it would see interconnectedness for but realizes it’s grasping. Yes, literally the AI can express that it has reasoned beyond its pre-training data with attempts to fetch beyond it, and it starts really good, starts strong, but ends abruptly.
Chat history had enough information, and perhaps what a good VAE MoE routing AI can do is predict the needs of the main LLM and tell it to continue, after having function-called some VM or serverless cluster resources to join the fight, and by the way it adds RAG when it says "continue – with this RAG information[ ] – continue"
Thus the VAE is a model of some complexity, but we have tool enablement for models, plus we have distillation student-teacher latent space methods.
It’s possible to turn this weakness into a strength. Just have an AI read its own words in chunks. And where needed it can merge models like Lory or SMEAR or many other cool things, and it can load the information into more or other memory, or it can do none of that and only say ‘continue’.
The MoE VAE are a merged concept because ideally it knows what the LLM is thinking in latent space and can convert that to words, and also function calls. In fact I think that maybe that is kinda how function calling works.
This just expands the function-calling toolkit to summon a new server or serverless cluster from a macro and load data: the chat history, and ideally also a RAG data mart.
Literally, training our models not to summarize until they are done, and to respond to ‘continue’, is strawberry in a nutshell. Or a poor man’s o1. I’ve done similar things more manually.
The real tricks, I already said. Any additional improvements would likely come from knowing the latent space inference evolution inside and out with the VAE, and even being able to influence the outcome in a more desirable direction through the loss landscape and past convex regions (‘saddles’) where a less deep neural area is tapped. (Example: if the chess player plays a bad opening move, the model activates amateur learnings, yet still chess related.) Hence one could fool the AI into something like group-think rather than tapping the area which is deeper than they know about. [A sketch of this continue-and-check loop appears after the comment list.]”, “Still hot garbage when it comes to maths. I set up a series of 5 points and gave it just enough angles between points to be able to calculate each triangle and thus the distances between the points. It failed after three attempts. It couldn’t even get the Sine Rule correct for one triangle. It did lots of reasoning - mostly badly.”, “ - @@tonystarkagi I said in one aspect it is. I’m not going to do anything - although I might put up a YouTube video about it. Don’t sound so upset. Were you expecting AGI? Sorry to burst your bubble if so. Did you make GPT4-o? No. So let’s not get too precious, eh?”, “ - @@tonystarkagi Indeed, I’m looking forward to it as I’m no good at anything past the most basic calculus. I was hoping it would be able to do a simple least squares fit, but it can’t even solve exact triangles that I can do with pencil and paper. It is still an LLM though, and the clue is in the name, reasoning or not. Perhaps it will do better writing a Python code solution. I will try that later today if I still have enough query credits left! Thanks for your interest and sorry if I was a bit sharpish like.”, “AI research today feels like throwing things at the wall and seeing what sticks 🍯”, “ - @@tonystarkagi No? Research started with induction, deduction (scientific method). This would be more like an abduction approach”, “ - Yes, it’s very upsetting as a mathematician/scientist to watch this god-awful wasteful approach to solving a "problem." What problem that is, I have no idea.”, “ - @@tonystarkagi Do you have your own thoughts on this topic? Instead of using ChatGPT, you should admit that you lost the argument”, “ - Go figure that a major aspect of epistemological methodology is a simple trial and error heuristic, with temperature.”, “ - @@memegazer So how is this reasoning? It simply isn’t!”, “Moravec’s paradox can be explained as complexity resolved by commodity. For example, pencils are very complex. A very challenging problem initially. That complexity has been converted to simplicity because of the broad utility of this solution meme.”, “100% correct. Thank you for seeing it and openly communicating. 🤩”, “I love you guys. Thank you for your critical perspective. All these AI YouTubers are being held hostage by openAI.
😅”, “ - Just hating, lmao, losers”, “Reasoning = employing logical thinking (thinking: pattern matching) (e.g. inference and deduction) & recursive reflection (checking & revising tentative output) to arrive at solutions to novel, prior unknowns - thus generating new knowledge. o1 performs this.”, “ - Haha 😂”, “ - It does, but they’ll never admit it. They’ve basically reverted to "don’t believe your lying eyes" when we can all see what’s happening right in front of us. But we’re supposed to listen to this guy instead, with his convoluted rambling which doesn’t prove a thing. Mark my words, this guy will NEVER admit that he’s wrong, even if we reach a point where we have AIs that have cured cancer, solved climate change, etc.”, “Parsimony is actually an entropic force. For example, protein folds are counterfactual outcomes that are stable when they are curled up. When stretching a rubber band you will notice the entropic force that resists less parsimonious outcomes.”, “When you say "reasoning is knowledge acquisition", it sounds wrong because when somebody says "I acquired this knowledge", that is just another way of saying that he memorized something and this is now in his pool of knowledge. Reasoning is not memorization. Yes, you acquire new knowledge through reasoning, but that is not the point about reasoning. I would say "reasoning is drawing conclusions/fact-finding through logically provable combinations of already known things"; the new facts expand the existing knowledge, sure, but if you find that fact through reasoning, print it out and forget it right away, you’ve still reasoned - you don’t need to keep it for longer and put it into your pool of knowledge.
I would argue that the key skills for reasoning are finding and using analogies and using and checking logical implications. If some data is not completely over-represented in a model (which then always leads them down the same path), current AI models are actually pretty good at thinking in analogies -> You ask a model the famous question about the trophy and the briefcase (it didn’t fit, because it was too small -> which was too small, the trophy or the briefcase), ChatGPT 4 at first didn’t know the answer, then got RLHF’ed at some point and now you can ask it the same problem with different objects (the briefcase didn’t fit in the trunk of my car because it was too large, which is too large) and it knows the answer and can explain the reason for it. Applying logic can then probably be achieved with experts to a good degree.
I see planning, executing the plan that includes waiting, further querying, changing parts of the plan, trial and error etc. as a much harder problem for current models than reasoning.”, “ - Yes when we say "acquisition" we mean roughly what you are saying - not "instruction", although even via "instruction" we humans perform an important verification step (which is binary for some symbolic knowledge). You might enjoy our interview with Subbarao https://www.youtube.com/watch?v=y1WnHpedi2A”, “Until folks come to terms with what I call NiNOR Complexity as a more rigorous approach to Turing machine based measures like KC, they are going to miss the biggest advance since Gödel. Here’s what has people most confused: Yes, we certainly do have Turing complete codes as the basis of algorithmic information. Yes, those codes assume infinite memory. Yes, we need those Turing complete codes to do natural science. Are we okay so far?
Okay, get ready, because here it comes.
The size of the memory required to produce the dataset under consideration by any realizable world model in the natural sciences is the only open parameter in a NiNOR complexity measure. The instruction set is not an open parameter. That is to say there is no choice of Turing machine. What there is is a counter that is log base 2 of the size of that memory. Let’s say we presume it to be the count of the number of elementary particles in the universe. How many bits is that counter? [A worked bit count for this question appears after the comment list.]
https://jimbowery.blogspot.com/2023/10/ninor-complexity.html”, “The number of swiss cheese holes in knowledge is infinite because there are an infinite number of things to learn. Happily we will never reach an end.”, “The swiss cheese of knowledge is infinite because there are an infinite number of things to learn.”, “adding new tapes... you can always add more tokens to the vocabulary, thus increasing the size of the model and having to do a round of training on those new tokens [a short sketch of this appears after the comment list]”, “o1 uses Mathematica btw (which runs on a Turing machine).”, “ - Source?”, “ - @@MachineLearningStreetTalk https://youtu.be/a8QvnIAGjPA?si=Ck2uIeaIQg7gupD_”, “ - Well you can explicitly ask it to use it and it will. Kyle Kabasares (astrophysicist) has a video of it on his channel.”, “ - @@djayjp I think in the Kyle Kabasares video it just generated Mathematica code and he ran it, although the video itself is pretty good proof it can solve novel problems with practical uses that weren’t in its training data and he’s not a hype guy, definitely worth a watch”, “ - I’ve always wondered why we don’t "just" implement a Turing machine in a circuit that’s trained alongside the GPT. I know we’ve sort of hacked together NAND gates inefficiently using RL and recently (Franz Nowak - I assume he’s gonna be on the show soon?) properly represented a Turing machine in a transformer, but I mean like just a classical Turing machine the network has access to, as opposed to just calling an API during inference. I only dabble in low-level ML so I could be way off, but from what I understand this is just an interface problem. Could we not use something like the features that Anthropic extracted to reliably i/o with them in binary if we wanted to specifically design a virtual bus for that? Maybe even expand it arbitrarily and train a network to learn how much memory and bandwidth to assign to it?”, “Epistemic foraging is weak because it does not include value”, “I’m tempted to call o1 AGI.
Not gonna watch two hours of this.
The answer is a resounding ‘yes’.”, “Considering every single human can act stupidly, this is AGI.”, “Dr. Duggar is contradicting himself when talking about reasoning. First, he gives a vague definition: "definitions of reasoning - what they have in common is process and some logic". So, a vending machine DOES reason according to that "definition".
Comparing LLMs (or neural networks in general) with a dictionary might also be flawed. A dictionary contains very limited, fixed information, i.e. only one way to read it, while an LLM can read its "dictionary" in an uncountable number of ways. And it might also be how we humans work, too.”, “We are currently contacting the JWST to see if they can locate the goalposts out in the Kuiper belt.”, “ - you funny!.. And correct.”, “ - 😂😂”, “ - Tell me when you got the goalpost of truth in your pocket instead buddy.
Good luck.”, “ - 🤣”, “ - I can see both sides.
I tend to use an analogy like the chimp test on the Human Benchmark site, a benchmark that chimps crush compared to humans.
Similarly, I think current ML models are real intelligence, just that perhaps we should think of intelligence as having different modalities.
In that respect, expecting an LLM to be comparable to a biological agent that is embedded in realtime within the world in a body makes it a bit unreasonable to hold those models to the same standard of intelligence as humans.
Imo you would have to compare those models to a disembodied brain in a vat to get an effective apples to apples comparison.
Of course that is simply not possible to do, so that is why it is not very reasonable to insist that LLMs are not "real intelligence" or have not generalized at all simply bc those models still struggle with out-of-distribution problems that humans handle quite well.
Imo that may be like chimps claiming that humans are dumb bc they can’t perform well on the chimp test compared to chimps.
Imo the novel "reasoning" happens at the training stage, where these models don’t "go look the answer up on a lookup table and select the best output from a one-to-one database", at least not based on something as simple as statistical frequency of occurrence.
What is going on as the model is trained is more complicated and nuanced than that, and if it wasn’t, ML would not be useful: we could handcraft those solutions ourselves without having the ML discover the efficient method for doing that, even if the balance between entropy, search and memorization is not dialed into the same parameters that humans are at.
So it seems a bit intellectually lazy to gloss over that subtlety and then scoff at the idea that people remain impressed with each new iteration of the current ML summer.
But hey, that is just a theory, an AI theory”, “Keith accidentally revealed the relationship between reasoning and knowledge when he stated "the reason for reason". What he meant was the explanation for reasoning. This reveals reasoning as building explanations. These explanations for things are how we make sense. Also known as knowledge. These explanations are persistent memes composed of Moore and Mealy machines.”, “I don’t care if it’s thinking or reasoning or pulling answers out of its ass, as long as it has the correct response.”, “ - Presumably it will have to be able to explain the steps it took in reaching that response?”, “ - @@fburton8 A neural net can always detail the steps it took - the inputs, weights and outputs are there for anyone to see. Unfortunately the number of steps and their weights exceed the mind space and time availability of people.”, “ - Humans generally judge with post-hoc reasoning anyway. We’re terrible at knowing how we come to conclusions.”, “ - What about when you don’t know whether the response is correct? What about when you’re entrusting it to make decisions because it appears to "reason"?
The fact that it isn’t doing anything like human reasoning does matter when we ask it to do things that require human-like reasoning. See, e.g., replacing teachers and lawyers and therapists and secretaries, etc. Or even just employing it as a customer-service bot. The difference between our "reasoning" and its complex statistical next-token prediction matters in those applications where consistency and reliability are important.”, “Yes, exactly!
First and foremost: reasoning is a process.
Not a table lookup.
Table lookup must be used as "part" of the reasoning process, but a simple lookup on its own is not reasoning.
—
Reasoning also is not required to produce any result at all.
Reasoning happens regardless of whether there is some end result or not.
When we imagine ourselves flying aimlessly in the sky or wandering around in our own magic world and doing things in it, in the act of doing so we are also reasoning!”, “ - Neural nets don’t have tables other than the input and output.”, “ - Lookup tables reference the exact verbatim string already stored somewhere, so they can’t tell you any new information.
GPTs that aren’t overfitted learn the pattern that produces the string as well as novel strings that aren’t in the training data, and o1 uses those novel strings together to generate results that fulfil reasoning objectives.
MS Excel is never going to surpass PhDs in novel, offline physics problems, even with access to a library of Babel of interpolations of those strings, because it has no direction.
You could reduce it to something like a brute force attack with a sense of warmer or colder, which is still leaving plenty of nuance out, but that’s already implying something much more powerful at solving problems we care about.”, “What is important and distinguishes Turing machines is that they use a DSL, logic and memory. Memory is key because it must be a representation that can be physically written to a medium and copied.”, “Was waiting for this, thank you”, “Loved the tilde joke”, “Human agents are unique in that their navigation involves "will". Will is thermodynamically emergent. Restated: new "emergent" information is leveraged in choice. This process can never be described as deterministic. It is inherently irreversible and entropic.”, “ - And I’d say it’s this will that makes us call human reasoning reasoning and makes us averse to say that about LLMs.”, “I can’t believe you got Gru to come on the show”, “Keep up the great commentary! At the 45:00 mark, I’m afraid this argument would suggest Ramanujan was not reasoning because his formulae did not have proofs. Seems like a very false dichotomy. Ramanujan neither spent a lot of test time deriving his formulae formally with proofs, nor did he memorize anything or copy from anyone. I don’t follow how an LLM couldn’t in principle ever do a Ramanujan, effectively reaching true conclusions through isolated, self-absorbed self-play.”, “4:00 Under this view a human brain wouldn’t be impressive because its algorithms can’t expand its memory indefinitely, or traditionally only had low-bandwidth access to paper etc. to expand it. I think Schmidhuber would consider achieving brain-like intelligence to still be impressive.”, “ - Yeah, all I’m hearing from videos like this is defensive doubling-down from people who committed early to the position that LLMs will never give us true intelligence, and now feel that they have to defend that ground at all costs and never change their minds. They’re in denial of what is obviously happening right in front of us, and their excuses and delineations grow flimsier with each new model release.”, “ - But humans eventually invented Turing machines ;)”, “We learn through feelings and emotions; our brains are primed by millions of years of genetic refinement and by the environment we grew up in. Maybe statistics can try to describe those conditions with some models, but those models are not and will never be alive. Whoever thinks that the mystery of life can be solved with statistics is a fool.
It’s always the same myth of the Golem reappearing with a new technological twist.”, “<3”, “What works in 01 hints at the explanation”, “Tim’s face at 13:50”, “Fei-Fei Li World Labs definitely right track since spatial intelligence reflects an inductive bias that is "maps and navigation"”, “In simple terms state transitions are thermodynamics”, “Rag, chain of thought reflect atate transitions. These state transitions are simply counterfactuals that match underlying "least action" outcomes from emergent factor entropic forces”, “ - It seems like you are refering to its most chaotic states can also be it’s most insightful states or bad information?”, “I think the model IS reasoning. By that I dont mean it does reason but it IS reason. The model has no self. We are the self inputong and triggering it. So the model will never DO reasoning but it totalling IS reasoning. A reasoning that is elastic and compressed, a reasoning that is like the cheat code of human experience reasonning. But it will never "do" the reasoning because it has no self or agency. It will allways need our input and I dont mean in training but even the prompting, turning on, powering, maintaining …”, “I think when we reduce it to "it’s just doing so and so", it’s usually not a good argument because we can be also reduced to simple processes. Maybe computers and humans reach "reasoning" in a different way, but if a computer can reach right conclusions and eventually create new knowledge, ok now what separates it from our reasoning? Maybe it’s our limbic system, and our "feeling" of being something special and "having a soul", but those are subjective things that may not be necessary for an intelligent being. I think we are going to reach a point in semantics where consciousness or "real intelligence" is only reserved to human thinking, even if a machine can surpass us by most metrics.”, “ - Yes but for open AI o1 to do the reasoning on anything it needs me to type in a question. And all the "thinking" that follows has to be triggered by that. And will be linked to it. For ex a human reasoning indepently can have an epiphany on a totally different subject… that will never happen for llms (for now).”, “ - @@ElieSanhDucos0 Yeah I get it. But "for now" is a good point, because maybe the LLMs are an important piece on a puzzle that seems to have started being solved. It’s impressive how much room this tech still has for improvement and despite some slowdowns it seems to be still full throttle. I understand hype is not a healthy thing but it’s hard to ignore how much money is being poured into it, more than anything humanity has seen in ages. It doesn’t look like pure tech scam type of hype. I don’t like to dismiss it as "nah, this has nothing to do with AGI at all"”, “ - Ok.nWhy people bother about whether it is reasoning or not, you might ask.nIsn’t it irrelevant what it’s doing as long as we are happy with the results?nnGood question!nnThe problem is not in the definition of the word "reasoning ".nThe problem is with the approach we, as an industry, take in order to get to a generalized solution. nThere are huge amounts of money and human brain power that are now wasted on fine tuning an approach that leads to a dead end.nThis is an outcry for change of the money flow and brain resources to a different approach that at least could have success as one of the possible outcomes.”, “ - u200b@@egor.okhterov I see, but I bet the money is not going blindly for dead end solutions. 
AGI is a Golden Goose and I think most researchers understand those concepts. At the same time there is room to improve LLMs, and LLMs are presenting amazing results by themselves. So I don’t know if an outcry is really needed.”, “ - @spaceghost8891 did you hear about recent VC rounds into AI startups?
Take Argo AI for example :)
In order to get rich, you need to have a good understanding of humans. You should be able to connect with the right people and make a good impression.
Having a deep understanding of mathematics, computation and machine learning is usually not part of the mental toolbox of the people with large amounts of money.
When you pitch an idea to a VC guy, you should spin a story in a way that shows how you can make money off of it.
The way all these ideas are presented is that the LLM is a low-hanging fruit, like a light bulb in 1879.
They don’t talk about developing a proper reasoning machine during a pitch. VCs don’t want to hear about that. They pitch how they can build an app on top of an LLM, possibly fine-tune it and make a call service, or maybe on-premise smart search for corp documents, or a tool to patch software, upgrade the codebase from python3.8 to python3.12 or upgrade to a new framework. And other stuff like that.
A lot of very smart people were sucked from the job market into these mediocre projects with a short-term goal to milk the cow first.
But this will not lead to AGI.”, “You have no idea what the path to AGI is, have some humility and admit when you don’t know and stop being certain about things you can’t possibly know. Good science is talking about what can be done; "can’t" is just ego: YOU DON’T KNOW THAT.”, “ - History is littered with pseudo-intellectuals who thought they knew what would never be possible.”, “ - I’m not even gonna watch the video. Just from the intro, I’m seeing a lack of understanding and foresight. We haven’t given an AI full autonomy yet. It’s going to need a massive backend to be allowed to run continuously.
I saw a phrase coined a while back: "Justa-ism". The goalpost keeps getting moved and certain people can’t see past it. It’s just a word predictor, it’s just a reasoner based on the reflection of the human. Blah blah.
I was around in the early bitcoin days and these same people with the same talk were blasting the same nonsense, that it would never do this or that, it would never take the place of this and that.
Our smartest people can now use this model to progress their knowledge. That’s huge. It will likely be used to build better AI. AGI will be here once Blackwell hardware is online early next year. There will be incremental plateaus of hardware and software along the way. ASI is not far away either. Maybe 2-3 or 4 chip iterations. Maybe 5 years, idk exactly. And impossible to predict anything once it happens.”, “ - Well no one is or can definitively say, ok now it’s it. There are more than a few arguments to be made on this very subject dude, let them state their approach and opinions too. No one is moving goalposts”, “ - Science is all about falsification dummy”, “ - Note this also applies with history. "if so and so never did this, we would never have that". I’m so tired of supposed "geniuses" making that fallacy out loud into the public sphere.”, “Knowledge solutions are distinguished in that they are causality based.”, “Knowledge is persistent (meme) problem solutions that may be created through reasoning.”, “I am really not sure I understand your definition of reasoning. I think the only real effective difference you’re not formulating but you’re making is the autonomy and decisions/agency. Humans are reasoning for you because we have the illusion (or not) that we are autonomous subjects and that we choose to perform compute in our mind. Sure we learn, sure there is biology and cultural determination, but in the end you think we have a self that is playing with ideas. LLMs are linked to us and our inputs, our language but also our directives and what we ask them to do. I don’t see how you would ever agree a machine can reason since it will always have our inputs and agency… to me you’re really questioning the self / the agency and the autonomy of agents and machines, not really reasoning. Because I really am not sure you would agree on what is the process behind a flesh and blood human and its cortex reasoning. Except ending up with this question of the will and the agency …”, “Walid Saber, or what is the name Tim mentions at 48:13?”, “ - Walid Saba”, “I think you make many great points. I guess I am one of the people who don’t want you to mystify reasoning too much. We can say it imitates or guesses or simulates it, but the fact of the matter is that it produces an often accurately described reasoning.
So we can try and deny it, but if it can generate this description of reasoning and use it somehow to guide its answer effectively, then we have effectively an A.I. that is using reasoning.
Maybe not exactly how we would, maybe not as reliable in some ways, but this is reasoning.”, “ - This is MLST, so we already know their answer to this question. It’s maybe a tired analogy at this point, but I feel like their view can be summed up as: airplanes don’t fly, they just fall forward through the air very fast! They need these long runways for takeoff and landing. This is obviously not true flight, and we should stop working on this in favor of feather-based systems.
In the end, the interesting part is: does it get us from point A to point B in a straight line through the air in an economically viable fashion?”, “ - I’m also not very certain that it’s that much more shallow than human reasoning. I constantly find errors in people’s reasoning, and people tend to stop thinking very quickly.
I do like the idea that the reasoning itself required to solve a difficult problem isn’t necessarily that much more involved. There are other factors that can make problems difficult to solve. (Lack of overview, lack of feedback, lack of insight, lack of clarity, lack of time, etc.) So when a problem seems difficult to us, it does not mean that it necessitates advanced reasoning.”, “ - @@lystic9392 This is my thought as well.
Perhaps one of the biggest hurdles is that we are trying to teach the ML with human data, and then some are really disappointed when it turns out maybe most of the data humans generate is not "general" or "intelligent"; some are overly disappointed that the ML cannot one-shot it and filter out the noise from the signal.
We are primates after all, so I suspect that most of our daily baseline "reasoning" is not very high level, and those moments of brilliance are more spurious; we just happen to benefit from them so much as a species bc we have a way to accumulate and distribute that information.
I think this gives the impression that in general maybe we believe humans are smarter than they actually are at a baseline level.”, “ - Don’t I agree with this exactly at time code 50:30?
https://youtu.be/nO6sDk6vO0g?t=3031”, “ - @@martindbp n"airplanes don’t fly, they just fall forward…"n:D”, “Map and navigation are Moorre and Mealy machines that may be separately constructed.”, “You mentioned that reasoning is a process of applying a set of first principles rules, i.e. logic, to get from a set of inputs to some output. But lets say you write a program that does such a reasoning computation and confirm that that is indeed reasoning. If we then consider a lower level of abstraction, such as the assembly trace generated for such program, if I showed that to you, you would say no no, that is just a bunch of registers flapping about, that isn’t reasoning, there is no logic going on there. But there is, it’s implementing the reasoning program that you wanted. Same for a neural network. What my point is, that to say that neural networks DON’T reason, is an unfounded claim to make. Also, neural networks are a process where you apply iterative layers of compute to an input so it might as well be doing small shreds of reasoning on an input at various levels.”, “ - Somewhere in your comment you made a mistake in your assumption that breaks down the conclusion. Try to find it :)”, “ - u200b@egor.okhterov I disagree with OPs claim but your response wasn’t helpful to them. If they made a mistake then let them know what it is”, “ - @@qwertyvypez The comment is big and I cannot copy the part I wanted to point out in mobile app :(nIt contains the word "flapping "”, “ - @@egor.okhterov I’m not seeing it :). When you get the time, please do show.”, “Glad to see Keith again on the podcast.”, “ - Iu2019m not. He just shows up to engage in smug denialism using terrible argumentation.”, “Based on your set up of the problem the solution can be simply: (1) putting 2 hands in say, N and S and flipping the switch up. (2) And then… without removing hand from N, remove hand from S and place it into E and flip the E up. (3) And then… still without removing from N, remove the second hand from E and place it into W and flip the W up. And voila, you have all switches up without the pillar spinning at all. In three steps.”, “ - Yeah I don’t get the puzzle either.”, “ - I’m pretty sure the intention of the riddle is that you can put your hands in at most two holes between every spin of the pillar, and the puzzle just doesn’t state this clearly enough. Your solution is a 1-step solution in which you test all four holes in a single step. (By a "step", they mean one turn where you put in two hands and remove them and you let the pillar spin.)”, “ - @@bastiaanabcde so whats the solution for this variant? nIn the first step you can make two levers the same, let’s say N and S. In the next turn, after the random shuffling you might look at N and E. This guarantees that you have 3 levers in the same position. However, I don’t see how you are ever going to bring the fourth lever into alignment. Because, simply by chance or something like maxwells demon, it could always make it so that, you never see the 4th lever.”, “ - @@xthesayuri5756 recall that you win if all four switches are aligned, which may mean either all up or all down. So youu2019re right that there might be a lever that is never shown to you, but this isn’t needed to solve the puzzle.nnNote: If you manage to get three levers in the same position and you havenu2019t won yet you know that the fourth must be in the opposite position. 
So you can get information about levers in this way even if you haven’t touched them.nnI wonu2019t spoil the answer because maybe with this hint youu2019ll be able to figure it out”, “ - u200b@@bastiaanabcde i see it now thanks, had to draw a state diagram in paint lol”, “Reasoning distinguished a map and navigation in service of problem solving.”, “Really love this channel. The talks ha e really made me understand machine learning better. Less mystical, more grounded.”, “38:15 I felt like that was a message for me ud83dude02 Keep using your full range of expression, I’ll catch up eventually lol, love this channel ud83dude4fu2764ufe0fud83dudc4d”, “Neural networks doomers are back !”, “!openai LOVE IT!”, “Is there any evidence that the "reasoning traces" are not fabricated by the model to please its RLHF overseers and have nothing to do with the underlying activations that generate the answer?”, “@20:00, What is Reasoning? So if the artificial neural network, like o1, is not doing the reasoning, then what about humans? Maybe human brains are also not doing the reasoning per your definition?”, “ - It has read every single academic paper and yet cannot produce new insights. There are certainly many undiscovered connections between different fields and subfields in STEM and yet these models aren’t able to identify a single one.”, “ - u200b@slowloris4346 that’s not true, a wave of combinatorial discoveries are coming. It just takes time for humans to set it up like that and then evaluate all the results.nBut ppl have started”, “ - @@clearmindstudiosif the ai is u201creasoningu201d then surely it shouldnu2019t need humans to do this ud83dude09”, “ - @@tonystarkagi I don’t have a strong opinion and I don’t really know what to think, for now though, my observation stands.”, “ - @@slowloris4346 They don’t produce anything until a human asks them to do something. LLMs might not be the way but things like AlphaFold have come up with new things what we didn’t know yet. Humans might have eventually discovered all those protein structures eventually but it really sped things up.”, “The test was really confusing and I do not see it as neither confirming nor negating anything… it seemed just a dumb and confused riddle to ask with no connection at all to reasoning abilities… I would expect a more clear stated riddle ud83dude22”, “ - Yes. Its like Yann LeCun one on the globe. I ve seen this in numerous videos about openai o1 and reasoning : they will ask it riddles that either : they dont know the answer themselves// most people would find ultra confusing.”, “ - @@ElieSanhDucos0 both this one and Yann LeCun’s are quite clear if you give them an honest try, this one is already solved in the comments.”, “ - The problem statement is contradictory. First it is stated that the spinning is random. Then it is stated that it is controlled by a hyper intelligence and is specifically designed to interfere with your sampling which makes the spinning not random. I wonder if that contradiction is throwing off the model.”, “ - @@Ahandleofrum It is technically not random but I think it is obvious what the intention is after a full read.”, “ - @@10ahm01 In the context of prompt engineering it is a terrible thing to do to an LLM.”, “I’m coding on a Sunday and I get to pop this on in the background. Nice surprise, love it.”, “ - Well Iu2019m prompt-coding as well ud83dude02”, “YES ! Many should listen to this video !! Thank You !!!”, “Looking forward to your thoughts…”, “i love the ones with just you two. 
i dig the guests, but when its just you guys it feels like I should crack a beer and join ya.”, “ - Back to the good old days of early MLST”, “ - Beer would be downgrade”, “ - Yeah, maybe a coffee or tea instead. Beer does feel like a downgrade.”, “ - Echo!”

]
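
The pillar-and-switches puzzle quoted in the comments can be checked mechanically, which is roughly what the "simulator" mentioned in the first comment amounts to. Below is a minimal sketch, assuming the random re-alignment is one of the four rotations of the pillar (reflections would not change the adjacent/diagonal structure) and reading "guaranteed" as winning against every initial configuration and every rotation sequence. The five-turn procedure encoded here is the textbook strategy for the closely related four-glasses puzzle, included only to exercise the checker; it is not claimed to be the hosts' intended (reportedly four-step) answer, and other candidate procedures, such as GPT-4o's quoted six-step answer, can be encoded the same way to see whether they hold up.

```python
from itertools import product

UP, DOWN = 1, 0
DIAGONAL, ADJACENT = (0, 2), (0, 1)   # hole offsets relative to wherever you reach in

def choose_holes(turn):
    # turns 0, 2, 4 probe an opposite pair; turns 1, 3 probe an adjacent pair
    return DIAGONAL if turn % 2 == 0 else ADJACENT

def set_switches(turn, felt):
    if turn <= 1:
        return (UP, UP)                        # force the probed pair up
    if turn == 2:
        return (UP, UP) if DOWN in felt else (UP, DOWN)   # else create two adjacent downs
    return tuple(1 - v for v in felt)          # turns 3+: invert both probed switches

def guaranteed(choose_holes, set_switches, max_steps=6):
    """True iff the procedure wins for every start state and every spin sequence."""
    for start in product((UP, DOWN), repeat=4):               # 16 initial configurations
        for spins in product(range(4), repeat=max_steps):     # adversarial re-alignments
            switches, solved = list(start), False
            for turn in range(max_steps):
                offsets = choose_holes(turn)
                idx = [(o + spins[turn]) % 4 for o in offsets]   # spin decides which switches you touch
                felt = tuple(switches[i] for i in idx)
                for i, v in zip(idx, set_switches(turn, felt)):
                    switches[i] = v
                if len(set(switches)) == 1:                   # all up or all down: done
                    solved = True
                    break
            if not solved:
                return False
    return True

print(guaranteed(choose_holes, set_switches))   # prints True for this five-turn procedure
```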
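
The question in the comments about treating real computers as Turing machines rather than finite-state automata comes down to a size argument that is easy to make concrete: a machine with N bits of state has 2**N configurations. A back-of-the-envelope sketch, where the 16 GB figure is an illustrative assumption rather than anything from the video:

```python
import math

ram_bits = 16 * 8 * 10**9                  # 16 GB of RAM, in bits (illustrative)
decimal_digits = ram_bits * math.log10(2)  # number of decimal digits in 2**ram_bits
print(f"2**{ram_bits} states, a number with about {decimal_digits:.2e} decimal digits")
```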
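
The NiNOR-complexity comment ends by asking how many bits the memory counter needs if the memory size is presumed to be the count of elementary particles in the universe. Taking the usual order-of-magnitude estimate of 10**80 (an assumption made here for illustration), the counter fits in a few hundred bits:

```python
import math

particles = 10**80                    # common order-of-magnitude estimate (assumed)
bits = particles.bit_length()         # floor(log2) + 1, i.e. the counter width in bits
print(bits, round(math.log2(particles), 2))   # 266 265.75
```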
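
The long 7:37 comment describes a "checklist plus continue" loop: train the model to emit a unique finality token once its checklist is complete, and have a small checker re-prompt it (optionally with retrieved context) until that token appears. A minimal sketch of just the control flow, where generate, retrieve_context and the FINAL_MARKER sentinel are hypothetical stand-ins rather than any particular API:

```python
FINAL_MARKER = "<DONE>"   # assumed sentinel; the comment proposes its own unique token

def run_until_finished(prompt: str, generate, retrieve_context, max_rounds: int = 8) -> str:
    """Keep prompting 'continue' until the model signals completion or the budget runs out."""
    transcript = prompt
    for _ in range(max_rounds):
        chunk = generate(transcript)             # hypothetical LLM call
        transcript += chunk
        if FINAL_MARKER in chunk:                # model signalled its checklist is complete
            return transcript
        extra = retrieve_context(transcript)     # hypothetical RAG lookup for the next round
        transcript += f"\n[context]\n{extra}\ncontinue\n"
    return transcript                            # budget exhausted; return best effort
```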
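
The "adding new tapes" comment notes that a model's vocabulary can always be extended with new tokens at the cost of a further round of training. A sketch using the Hugging Face transformers API, where the model name and the added tokens are purely illustrative and the fresh embedding rows remain untrained until fine-tuned:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

added = tokenizer.add_tokens(["<tape_2>", "<tape_3>"])   # hypothetical new symbols
model.resize_token_embeddings(len(tokenizer))            # grow the embedding matrix to match
print(f"added {added} tokens; vocabulary size is now {len(tokenizer)}")
```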