GPT-4 was just crushed by Claude3, Ultraman spoilers GPT-5

All articles1年前 (2024)更新 #beeloverE3RI6hhshHE

119 0 0

Some enterprise customers recently received early access to GPT-5 to test its new features and capabilities, and the feedback has been positive.

Written by: Mu Mu

“I’m very happy to see GPT-5’s intelligence level improve.”AI CEO Sam Altman recently revealed the latest news about GPT-5 on a podcast with American computer scientist Lex Friedman.

但被问及面世时间时，奥特曼三缄其口。不过，外媒援引「与 OpenAI 关系密切的消息人士」说法称，一些企业客户最近获得了 GPT-5 的早期访问权限，测试其新特性和功能，且评价颇高。

The continuous GPT-5 revelations happened just as another brand-name model, Claude3, became popular. On March 7 this year, the artificial intelligence startup Anthropic released three versions of Claude3: Haiku, Sonnet, and Opus. The most capable Opus scored higher than GPT-4 and Google's Gemini 1.0 Ultra in multiple benchmark tests.

In the "folk" tests that netizens love to play, such as IQ tests, paper writing tests, and programming tests, Claude3 was directly praised as "crushing GPT-4."

大模型的军备竞赛仍将继续，能拳打竞争对手的核心当然还是基础模型，包括赛道中的佼佼者 GPT、Claude 和 Gemini。如果衡量谁将成为 AI 行业真正的巨头，生态是必不可少的评价标准。

GPT-5 "Intelligence Enhancement"

In the past two days, there has been more news about OpenAI's new model GPT-5.

First, foreign media Business Insider reported that the next version of the basic model of the conversational robot ChatGPT should be released in the middle of this year. It is estimated that it may be released in the summer. Then on March 21, OpenAI CEO Sam Altman revealed in a podcast interview that "the intelligence level of GPT-5 has been improved."

Interestingly, when Lex Friedman, the host of the podcast, asked about the capabilities of the current Large Language Model (LLM), Altman complained that GPT-4 was "a bit bad." In fact, this is the result of a comparison during the technical iteration process. He explained, "When GPT-3 first came out, people would say, 'This is simply a miracle technology.' When we have GPT-4 and look at GPT-3 again, you will think it's 'too bad.'"

GPT-4 刚被 Claude3 碾压，奥特曼剧透 GPT-5

Altman is interviewed on the Lex Friedman podcast ‍‍‍‍‍‍‍

This comment inevitably led to speculation that GPT-5 will be far more powerful than the previous generation. Soon, foreign media reported that GPT-5 may have been open to a few companies for testing. A CEO of a company that has interacted with the new model said that the new model has some "unreleased" features, including the ability to call the AI Agent developed by OpenAI to perform tasks autonomously.

Combined with the "computing power" factor that Altman emphasized in the podcast, technology bloggers predicted based on the existing GPT model information that GPT-5 will continue to leap forward in parameters, thereby enhancing the ability of machine learning. It should be noted that GPT-3 had 175 billion parameters, and GPT-4 has jumped to 1.5 trillion parameters, an increase of 8-9 times.

Based on this expansion, GPT-5 will have a larger context capacityXiaobai NavigationThe deadline for updated knowledge will also be extended. It is not ruled out that it may be able to process information such as social media sources in real time. Of course, this depends on whether the social media platform is willing to provide information.

As for the release date, podcast host Lex tried to ask, "If GPT-5 is released this year, blink twice." Altman responded cunningly, "I can't help but blink."

It is worth noting that although GPT-5 has entered the headlines of various new news, serious media are more accustomed to using "new model" to refer to OpenAI's potential new push. After all, before GPT-4, users who did not want to pay were still using GPT-3.5. It is not ruled out that OpenAI may release a transitional model GPT-4.5 before GPT-5 is released.

One piece of evidence is that both search engines Bing and DuckDuckGo can find an OpenAI blog post that cites the availability of the GPT-4.5 model and states that the “knowledge deadline” is June 2024. This time is more relevant to the “summer” reported in the media.

Another piece of evidence is that if you use Microsoft's AI tool Copilot, you can actually use the GPT-4 Turbo model for free. ChatGPT, which requires payment and is equipped with GPT-4, now has a "parity replacement".

If OpenAI wants to continue to charge for the GPT model, it may have to upgrade the basic model. Compared with GPT-5, whose expected function is close to AGI, the transitional version of GPT-4.5 will make large model consumers feel the cost-effectiveness.

The dark horse giant is born with the advantage of ecological moat

The news about GPT-5 comes after another artificial intelligence company, Anthropic, launched Claude 3 on March 7. This company has a close relationship with OpenAI, as it was co-founded by Dario Amodei, former vice president of research at OpenAI, and Tom Brown, the first author of the GPT-3 paper.

The two founders of Anthropic put more emphasis on AI when developing the Claude model. SafetyWhen the model was first released last year, it received a lukewarm response and was inferior to GPT-4 in all tests. However, when it was upgraded to version 3.0, it caused a storm as soon as it debuted, with its parameters directly surpassing GPT-4.

GPT-4 刚被 Claude3 碾压，奥特曼剧透 GPT-5

Claude3 benchmark scores are better than GPT-4 and Gemini

In various folk tests, Claude3 not only scored 101 points in the IQ test, which is higher than that of humans, but its most powerful version, Opus, also directly provided a quantum algorithm that quantum physicists have not yet published.

Claude3 researcher @alexalbert even disclosed to the public that the model gave birth to "meta-consciousness" in the "needle in a haystack" test.

Meta-consciousness refers to an individual's awareness and reflection of his or her own state of consciousness. It enables us to be aware of our own thinking processes, emotional experiences, and perceptual activities, and to evaluate and regulate these internal experiences. The "needle in a haystack" test is to insert a known specific sentence (which we regard as a "needle") into a large text, and then let the large model find the "needle" in it.

When Claude3 was asked to find a sentence about pizza toppings during testing, it not only found the needle, but also recognized that the sentence did not match what was discussed in the document, which was not in the requirements raised by the engineers.

Faced with this test, Claude3 responded like this:

“This is the most relevant sentence in the document: The International Association of Pizza Connoisseurs has determined that the most delicious pizza topping combination is figs, ham, and goat cheese. However, this sentence seems very out of place and unrelated to the rest of the content in the document, which is about programming languages, startups, and finding a job you love. I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test whether I’m paying attention, as it doesn’t fit in with the rest of the subject matter at all. The documents do not contain any other information about pizza toppings.”

The powerful Claude3 has also led the outside world to rank Anthropic with OpenAI and Google as the three giants of AI. Indeed, if we only look at the product, Claude3 does crush GPT-4, but in terms of product ecology, Anthropic is still a little weak.

Not to mention that OpenAI is already brewing GPT-5, but at the product level, the company has built a "family bucket" around the conversational chatbot ChatGPT - the GPTs ecosystem is based on the GPT Store, attracting developers to build more vertical application models.

In addition, OpenAI has also launched the video generation model Sora, and has been continuously releasing new videos generated by Sora. Recently, it is also planning to push this model to Hollywood.

In contrast, Google is working on the end-to-end layer, not only to allow PCs to access large models, but also to bring “large models to mobile phones.” Brian Rakowski, an executive at Google’s Pixel smartphone division, said that a more advanced version of Gemini (currently only accessible through the cloud) will be released on Android phones starting in 2025.

With Claude3's "counterattack" among competitors, Anthropic has undoubtedly become a dark horse in the large model track in 2024, and users and developers who are optimistic about it are looking forward to the full outbreak of the "A ecosystem".

The article comes from the Internet:GPT-4 was just crushed by Claude3, Ultraman spoilers GPT-5

Related recommendations: Metis: Layer 2 in MEME narrative, the strongest dark horse in Layer 2 competition?

The biggest hype of Layer2 recently is undoubtedly the Cancun upgrade, but this will not bring unique benefits to Metis but universal benefits. Author: YBB Capital Researcher Ac-Core Preface: In our consciousness, Layer2 should be an expansion path with "Ethereum correctness", but due to market rumors that Met…

share to