DeepSeek dominates the App Store charts: the week Chinese AI shook the US tech world

The DeepSeek model has shocked Silicon Valley, and its value is still rising.

Author: APPSO

Over the past week, the DeepSeek R1 model from China has stirred up the entire overseas AI community.

On the one hand, it matches the performance of OpenAI's o1, illustrating China's strengths in engineering capability and innovation at scale; on the other hand, it upholds the open-source spirit and is keen to share technical details.

Recently, a research team led by Jiayi Pan, a doctoral student at the University of California, Berkeley, successfully reproduced the key technique of DeepSeek R1-Zero, the "aha moment", at very low cost (under $30).


So it is no wonder that Meta CEO Mark Zuckerberg, Turing Award winner Yann LeCun, and DeepMind CEO Demis Hassabis have all spoken highly of DeepSeek.

As DeepSeek R1's popularity kept climbing, the DeepSeek app's servers were briefly overwhelmed by a surge in user traffic this afternoon, and the service even went down at one point.

OpenAI CEO Sam Altman, trying to win back the international headlines, went so far as to announce the usage quota for o3-mini early: ChatGPT Plus members will get 100 queries per day.

What is less well known is that before it became famous, DeepSeek's parent company, Huanfang Quantitative (High-Flyer), was already one of the leading firms in China's quantitative private-fund sector.

The DeepSeek model shocked Silicon Valley, and its value is still rising

On December 26, 2024, DeepSeek officially released the DeepSeek-V3 large model.

The model performs well across many benchmarks, surpassing the industry's mainstream top models, especially in knowledge question answering, long-text processing, code generation, and mathematics. For example, on knowledge tasks such as MMLU and GPQA, DeepSeek-V3's performance approaches that of the top international model Claude-3.5-Sonnet.


In mathematics, it set new records on tests such as AIME 2024 and CNMO 2024, surpassing all known open-source and closed-source models. At the same time, its generation speed increased by 200% over the previous generation, reaching 60 tokens per second (TPS), which greatly improves the user experience.

According to the analysis of the independent evaluation website Artificial Analysis, DeepSeek-V3 surpasses other open source models in many key indicators and is on par with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet in performance.

The core technical advantages of DeepSeek-V3 include:

  1. Mixture-of-Experts (MoE) architecture: DeepSeek-V3 has 671 billion parameters, but only 37 billion are activated for each input. This selective activation greatly reduces computational cost while maintaining high performance (see the routing sketch after this list).

  2. Multi-Head Latent Attention (MLA): This architecture has been verified in DeepSeek-V2 and enables efficient training and inference.

  3. Auxiliary-loss-free load balancing strategy: this strategy minimizes the performance degradation that balancing load across experts usually introduces.

  4. Multi-token prediction (MTP) training objective: this objective improves the model's overall performance.

  5. Efficient training framework: it adopts the HAI-LLM framework, supporting 16-way Pipeline Parallelism (PP), 64-way Expert Parallelism (EP), and ZeRO-1 Data Parallelism (DP), and reduces training costs through a variety of optimizations.
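
To make the selective activation in point 1 concrete, here is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and class name are illustrative assumptions, not DeepSeek's actual implementation (which adds finer-grained and shared experts plus its load-balancing scheme):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer: a router picks top-k experts per token,
    so only a fraction of the total parameters run on each input."""
    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)  # route each token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # run only the selected experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Only the selected experts execute for each token, which is why the active parameter count (37B) can sit far below the total (671B).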

More importantly, DeepSeek-V3's training cost was only $5.58 million, far below the roughly $78 million training cost of GPT-4. Its API pricing also continues its tradition of affordability.


Input tokens cost just 0.5 yuan per million (on a cache hit) or 2 yuan per million (on a cache miss), and output tokens cost 8 yuan per million.
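
As a quick sanity check of what this pricing implies (assuming only the listed per-million-token rates, nothing else), a small calculator:

```python
def deepseek_v3_cost(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> float:
    """Estimate DeepSeek-V3 API cost in RMB from the listed per-million-token rates."""
    input_rate = 0.5 if cache_hit else 2.0  # RMB per million input tokens
    output_rate = 8.0                       # RMB per million output tokens
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# Example: a 100k-token prompt with a 10k-token answer, no cache hit
print(f"{deepseek_v3_cost(100_000, 10_000):.2f} RMB")  # 0.28 RMB
```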

The Financial Times described it as "a dark horse that shocked the international technology community" and judged its performance comparable to models from well-funded American competitors such as OpenAI. Chris McKay, founder of Maginative, went further, arguing that DeepSeek-V3's success may redefine the established playbook for AI model development.

In other words, DeepSeek-V3's success is also read as a direct answer to US export restrictions on computing power: the external pressure has spurred Chinese innovation.

Liang Wenfeng, founder of DeepSeek: a low-key genius from Zhejiang University

The rise of DeepSeek has made Silicon Valley uneasy. Liang Wenfeng, the founder behind this model that has shaken the global AI industry, fits the classic Chinese image of a prodigy: success at a young age, and still innovating over time.

A good AI company leader needs to understand both technology and business, be visionary and pragmatic, have the courage to innovate and have engineering discipline. This kind of compound talent is a scarce resource in itself.

At 17, he was admitted to Zhejiang University to study Information and Electronic Engineering. At 30, he founded Huanfang and began leading a team exploring fully automated quantitative trading. Liang Wenfeng's story is proof that geniuses do the right thing at the right time.


  • 2010: The launch of CSI 300 index futures opened up opportunities for quantitative investing. The Huanfang team seized the moment, and its proprietary trading capital grew rapidly.

  • 2015: Liang Wenfeng co-founded Huanfang Quantitative with university alumni. The following year it launched its first AI model and put live trading positions generated by deep learning.

  • 2017: Huanfang Quantitative claimed that its investment strategies had become fully AI-driven.

  • 2018: Established AI as the company's main development direction.

  • 2019: Assets under management exceeded 10 billion yuan, making it one of the "Big Four" of Chinese quantitative private funds.

  • 2021: Huanfang Quantitative became the first large Chinese quantitative private fund to exceed 100 billion yuan under management.

It is easy to remember only a company's years of success and forget the years it spent on the bench. And however surprising a quantitative trading firm's pivot to AI may seem, it is actually a natural move: both are data-driven, technology-intensive businesses.

Jensen Huang only wanted to sell gaming graphics cards and make a little money from gamers, yet he ended up building the world's largest AI arsenal. Huanfang's entry into AI is a similar story. This kind of organic evolution has more vitality than mechanically bolting today's AI models onto one industry after another.

Through quantitative investing, Huanfang Quantitative accumulated deep experience in data processing and algorithm optimization, and it holds a large stock of A100 chips. Since 2017, it has deployed AI computing power at scale, building high-performance computing clusters such as "Firefly 1" and "Firefly 2" to provide strong hardware support for AI model training.


In 2023, Huanfang Quantitative formally established DeepSeek to focus on research and development of large AI models. Inheriting Huanfang's technology, talent, and resources, DeepSeek quickly emerged in the AI field.

In an in-depth interview with "Undercurrent", DeepSeek founder Liang Wenfeng also demonstrated a unique strategic vision.

Unlike most Chinese companies, which chose to copy the Llama architecture, DeepSeek started directly from the model structure, aiming squarely at the grand goal of AGI.

Liang Wenfeng makes no secret of the current gap: Chinese AI still trails the world's top level significantly, and the combined gap in model structure, training dynamics, and data efficiency means it takes four times the computing power to achieve the same results.


▲ Image: screenshot from CCTV News

This attitude of meeting challenges head-on stems from Liang Wenfeng's years of experience at Huanfang.

He emphasized that open source is not only about sharing technology, but also a cultural expression. The real moat lies in the team's ability to continuously innovate. DeepSeek's unique organizational culture encourages bottom-up innovation, downplays hierarchy, and values the enthusiasm and creativity of talents.

The team is mainly composed of young people from top universities, and adopts a natural division of labor model to allow employees to explore and collaborate independently. When recruiting, they value employees' passion and curiosity more than traditional experience and background.

Regarding the industry prospects, Liang Wenfeng believes that AI is in the explosive period of technological innovation, rather than the explosive period of application. He emphasized that China needs more original technological innovation and cannot always be in the imitation stage. Someone needs to stand at the forefront of technology.

Even though companies like OpenAI are currently leading the way, opportunities for innovation remain.


DeepSeek takes Silicon Valley by storm, unsettling the overseas AI community

Opinions on DeepSeek vary across the industry; we have collected some assessments from insiders.

Jim Fan, project leader of NVIDIA's GEAR Lab, spoke highly of DeepSeek-R1.

He pointed out that this represents a non-US company keeping OpenAI's original open mission alive, achieving influence by making raw algorithms and learning curves public, and taking a subtle jab at OpenAI along the way.

DeepSeek-R1 not only open-sources a collection of models but also discloses all the training details. It may be the first open-source project to demonstrate significant and sustained growth of the RL flywheel.

Influence can be achieved through legendary projects like an internal "ASI" effort or the "Strawberry" project, or simply by making raw algorithms and matplotlib learning curves public.

Marc Andreessen, co-founder of the top venture capital firm a16z, called DeepSeek R1 one of the most amazing and impressive breakthroughs he had ever seen and, as open source, a profound gift to the world.


Lu Jing, a former senior researcher at Tencent and a postdoctoral fellow in artificial intelligence at Peking University, analyzed it from the perspective of technical accumulation: DeepSeek did not become popular overnight but inherited many innovations from previous model versions; with its architecture and algorithmic innovations iteratively verified, it was bound to shake the industry.

Yann LeCun, Turing Award winner and chief AI scientist at Meta, proposed a new perspective:

"For those who saw DeepSeek's performance and thought that "China is surpassing the United States in AI", your interpretation is wrong. The correct interpretation should be that "open source models are surpassing proprietary models." "


Comments from Google DeepMind CEO Demis Hassabis revealed a hint of worry:

"It's impressive what it has achieved, and I think we need to think about how to stay ahead of the Western cutting-edge models. I think the West is still ahead, but certainly China has extremely strong engineering and scaling capabilities."

Microsoft CEO Satya Nadella said at the World Economic Forum in Davos, Switzerland, that DeepSeek has genuinely built an open-source model that not only performs well at inference-time computing but is also extremely compute-efficient.

He stressed that Microsoft must treat these breakthroughs from China with the utmost seriousness.

Meta CEO Zuckerberg's assessment went deeper. He said the technical strength and performance DeepSeek has demonstrated are impressive, noting that the AI gap between China and the United States has become minimal and that China's all-out sprint is making the competition even fiercer.

The reaction of competitors may be the best recognition of DeepSeek. According to posts by Meta employees on the anonymous workplace community TeamBlind, the arrival of DeepSeek-V3 and R1 has thrown Meta's generative AI team into a panic.

Meta's engineers are racing to analyze DeepSeek's technology and try to replicate whatever techniques they can.

The reason: DeepSeek-V3's training cost was only $5.58 million, less than the annual salary of some Meta executives. Such a disparity in input-output ratio puts pressure on Meta's management when justifying its enormous AI R&D budget.


International mainstream media have also paid close attention to the rise of DeepSeek.

The Financial Times noted that DeepSeek's success overturns the conventional wisdom that "AI R&D must rely on massive investment", proving that a precise technical route can also produce excellent research. More importantly, the team's open sharing of its technical innovations has made this research-first company an exceptionally formidable competitor.

The Economist said that China's rapid breakthroughs in cost-effectiveness in AI technology have begun to shake the United States' technological advantage, which may affect the United States' productivity improvement and economic growth potential in the next decade.


The New York Times took another angle, saying that DeepSeek-V3 is comparable in performance to the high-end chatbots of American companies, but at far lower cost.

This shows that even under chip export controls, Chinese companies can compete through innovation and efficient use of resources. Moreover, the US government's chip restrictions may backfire, accelerating China's breakthroughs in open-source AI instead.

DeepSeek "reports the wrong address" and claims to be GPT-4

Amid all the praise, DeepSeek also faces some controversy.

Many outside observers believe DeepSeek may have used the outputs of models such as ChatGPT as training material, using model distillation to transfer the "knowledge" in that data into its own model.

This practice is not uncommon in the field of AI, but skeptics are concerned about whether DeepSeek used OpenAI model outputs without full disclosure, a concern that seems to be reflected in DeepSeek-V3's own self-perception.

Earlier, some users discovered that when asking the model about its identity, it would mistake itself for GPT-4.


High-quality data has always been a key factor in AI development, and even OpenAI cannot escape controversy over data acquisition: its large-scale crawling of the Internet has drawn numerous copyright lawsuits. The first-instance ruling in OpenAI's case with the New York Times has still not been handed down, and new suits keep being added.

Against this backdrop, DeepSeek drew pointed public comments from Sam Altman and John Schulman.

“It’s [relatively] easy to replicate something you know works. It’s very hard to do something new, risky, and difficult when you don’t know if it will work.”


However, the DeepSeek team explicitly stated in the R1 technical report that it did not use output data from OpenAI models, and that the high performance was achieved through reinforcement learning and a distinctive training strategy.

For example, it adopts a multi-stage training recipe spanning base-model training, reinforcement learning (RL), and fine-tuning. This multi-stage, cyclical approach helps the model absorb different knowledge and capabilities at different stages.

Saving money is a technical skill too: the technology behind DeepSeek

A noteworthy finding in the DeepSeek-R1 technical report is the "aha moment" that occurred during R1-Zero training. In the middle stage of training, DeepSeek-R1-Zero began spontaneously re-evaluating its initial problem-solving approach and allocating more time to optimizing its strategy (for example, retrying different solutions several times).

In other words, within an RL framework, an AI may spontaneously develop human-like reasoning capabilities and even go beyond the limits of preset rules. This points toward more autonomous, adaptive AI models, for example ones that dynamically adjust their strategies in complex decision-making settings such as medical diagnosis and algorithm design.


Meanwhile, many industry insiders are digging into DeepSeek's technical reports. Andrej Karpathy, a founding member of OpenAI, said after the release of DeepSeek V3:

DeepSeek (a Chinese AI company) made it look easy today, publicly releasing a frontier-grade language model (LLM) trained on a shoestring budget (2,048 GPUs, two months, about $6 million).

For reference, this level of capability would typically require a cluster of around 16K GPUs, and most state-of-the-art systems today use on the order of 100K GPUs. For example, Llama 3 405B used 30.8 million GPU-hours, while DeepSeek-V3, which appears to be the stronger model, used only 2.8 million GPU-hours (about 1/11 of Llama 3's compute).

If the model also holds up in real-world tests (for example, the LM Arena rankings are ongoing, and my quick tests went well), this will be a very impressive demonstration of research and engineering capability under resource constraints.

So, does this mean we no longer need large GPU clusters to train frontier LLMs? Not really, but it shows that you cannot afford to waste the resources you have, and this case demonstrates that data and algorithmic optimization can still drive great progress. The technical report is also very interesting and detailed, and well worth reading.


On the controversy over DeepSeek V3 allegedly using ChatGPT data, Karpathy said that large language models do not inherently possess human-like self-awareness; whether a model answers questions about its own identity correctly depends entirely on whether the team built a dedicated self-identity training set. Without such training, the model answers from whatever is closest in its training data.

Furthermore, a model identifying itself as ChatGPT is not in itself the problem: given the ubiquity of ChatGPT-related data on the Internet, such an answer simply reflects a natural "neighborhood knowledge emergence" phenomenon.

Jim Fan pointed out after reading the technical report of DeepSeek-R1:

The most important point of this paper is that it is driven entirely by reinforcement learning, with no supervised fine-tuning (SFT) involved. The approach is reminiscent of AlphaZero, which mastered Go, shogi, and chess from scratch via "cold start", without imitating human play.

– It uses real rewards computed from hard-coded rules, rather than learned reward models that RL can easily "hack".

– The model's thinking time increases steadily as training progresses; this is not pre-programmed but emerges spontaneously.

– The model exhibits self-reflection and exploratory behavior.

– It uses GRPO instead of PPO: GRPO drops PPO's critic network and instead baselines each sample against the average reward of multiple samples. It is a simple method that reduces memory usage. Notably, GRPO was invented by the DeepSeek team in February 2024; this is clearly a very strong team. (A minimal sketch of the idea follows below.)
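
For intuition, here is a minimal sketch of the group-relative advantage at the heart of GRPO, assuming scalar rule-based rewards and a fixed group of sampled completions per prompt; the function name and shapes are illustrative, and the full algorithm also includes the clipped policy-ratio objective and a KL penalty:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """GRPO-style advantages: normalize each sample's reward against the mean
    (and std) of its own group, so no learned critic/value network is needed.
    rewards: (num_prompts, group_size) -- one row of sampled completions per prompt."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled answers each, rule-based 0/1 correctness rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

Because the baseline is just the group mean, the memory and compute of a separate critic network are avoided, which is the saving Jim Fan highlights.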

When Kimi released similar research results the same day, Jim Fan noticed that the two companies' findings converged:

  • Both abandon complex tree-search methods such as MCTS in favor of simpler linear thinking trajectories, relying on conventional autoregressive prediction.

  • Both avoid value functions that require an extra copy of the model, reducing compute requirements and improving training efficiency.

  • Both abandon dense reward modeling, relying as much as possible on verifiable final outcomes as guidance to keep training stable.


But there are also significant differences between the two:

  • DeepSeek uses an AlphaZero-style pure-RL cold start, while Kimi k1.5 opts for an AlphaGo-Master-style warm-up strategy with lightweight SFT.

  • DeepSeek is open-sourced under the MIT license, while Kimi stands out on multimodal benchmarks, and its paper covers system design in more detail: RL infrastructure, hybrid clusters, code sandboxes, and parallelism strategies.

However, in this fast-paced AI market, the leading edge is often fleeting. Other model companies will quickly learn from DeepSeek's experience and improve it, and may soon catch up.

The instigator of the large-model price war

Many people know that DeepSeek is called the "Pinduoduo of AI", but fewer know that the nickname stems from the large-model price war that began last year.

On May 6, 2024, DeepSeek released DeepSeek-V2, an open-source MoE model that achieved a double breakthrough in performance and cost through innovations such as MLA (multi-head latent attention) and the MoE (mixture-of-experts) architecture.

Inference cost fell to just 1 yuan per million tokens, roughly one-seventh of Llama 3 70B's at the time and one-seventieth of GPT-4 Turbo's. This breakthrough let DeepSeek offer highly cost-effective service without losing money, while putting enormous competitive pressure on other vendors.

The release of DeepSeek-V2 triggered a chain reaction. ByteDance, Baidu, Alibaba, Tencent, and Zhipu AI followed suit and drastically cut the prices of their large-model products. The influence of this price war even crossed the Pacific Ocean and attracted great attention from Silicon Valley.

DeepSeek is therefore dubbed the "Pinduoduo of AI".


Facing external doubts, Liang Wenfeng, founder of DeepSeek, responded in an interview with Undercurrent:

“Attracting users is not our main purpose. We lowered the price on the one hand because we are exploring the structure of the next generation model and the cost has been reduced first; on the other hand, we also think that both API and AI should be universal and affordable for everyone.”

In fact, the significance of this price war goes far beyond the competition itself. The lower entry threshold allows more companies and developers to access and apply cutting-edge AI, while also forcing the entire industry to rethink its pricing strategy. It was during this period that DeepSeek began to enter the public eye and emerge.

Lei Jun spends a fortune to poach an AI prodigy

A few weeks ago, DeepSeek also saw a notable personnel change.

According to China Business News, Lei Jun poached Luo Fuli with an annual salary in the tens of millions of yuan, entrusting her to head the large-model team at Xiaomi's AI Lab.

Luo Fuli joined DeepSeek, a subsidiary of Huanfang Quantitative, in 2022; her name appears on key reports including DeepSeek-V2 and the latest R1.


Later, DeepSeek, which had focused on the B-end, began developing the C-end as well and launched a mobile app. As of press time, the DeepSeek app ranked second on Apple's App Store free chart, showing strong competitiveness.

A string of smaller peaks made DeepSeek famous, but a bigger one was still building: on the evening of January 20, the ultra-large 660B-parameter DeepSeek R1 was officially released.

The model performs strongly on mathematical tasks, achieving a pass@1 score of 79.8% on AIME 2024, slightly above OpenAI o1, and 97.3% on MATH-500, on par with OpenAI o1.
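
As a refresher on the metric: pass@1 is the probability that a single sampled answer is correct. Below is a sketch of the standard unbiased pass@k estimator (the formulation from the HumanEval paper, not something taken from DeepSeek's report):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples is correct, given c correct out of n generated samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=16, c=8, k=1))  # 0.5 -- for k=1 this reduces to c/n
```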

On programming tasks, it earned a Codeforces Elo rating of 2029, surpassing 96.3% of human participants. On knowledge benchmarks such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek R1 scored 90.8%, 84.0%, and 71.5% respectively, slightly below OpenAI o1 but above other closed-source models.

On the latest overall LM Arena leaderboard, DeepSeek R1 ranked third, tied with o1.

  • DeepSeek R1 ranked first in areas such as "Hard Prompts", "Coding" and "Math".

  • In terms of "Style Control", DeepSeek R1 and o1 tied for first place.

  • In the "Hard Prompt with Style Control" test, DeepSeek R1 also tied with o1 for first place.


On open-source strategy, R1 adopts the MIT License, giving users maximum freedom. It permits model distillation, so its reasoning ability can be distilled into smaller models: the 32B and 70B versions match o1-mini on many capabilities (a rough sketch of distillation follows below). This open-source posture even outdoes Meta, which has drawn criticism on that front before.
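
As a rough illustration of what distillation means here, the sketch below shows the textbook logit-matching form of knowledge distillation. Note that R1's distilled models were reportedly produced by fine-tuning smaller models on R1-generated outputs, so this is a simplified stand-in rather than the actual recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Textbook knowledge-distillation loss: make the student's softened
    output distribution match the teacher's (KL divergence, scaled by T^2)."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * t * t

# Toy example: a batch of 4 "tokens" over a 10-word vocabulary
teacher = torch.randn(4, 10)
student = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```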

The arrival of DeepSeek R1 lets domestic users use a model on par with o1 for free for the first time, breaking down long-standing information barriers. The discussion it has sparked on social platforms such as Xiaohongshu rivals that of GPT-4 at its debut.

Going overseas, taking the competition global

Looking back at DeepSeek's development trajectory, its code for success is clear: strength is the foundation, but brand recognition is the moat.

In a conversation with LatePost, MiniMax CEO Yan Junjie shared his thoughts on the AI industry and the company's strategic transformation. He emphasized two key turning points: one is to recognize the importance of technology brands, and the other is to understand the value of open source strategies.

Yan Junjie believes that in AI, the speed of technological evolution matters more than current achievements, and open-source community feedback accelerates that process; second, a strong technology brand is crucial for attracting talent and resources.

Take OpenAI: despite its later management turmoil, the innovative image and open spirit it established early on earned it a first wave of goodwill. Even as Claude has matched it technically and gradually eroded its B-end customers, OpenAI still leads by a wide margin among C-end users thanks to path dependence.

In AI, the real competitive stage is always global. Going overseas and taking the competition outward is absolutely a good way forward.


This wave of going global has already made ripples in the industry: earlier Qwen and Mianbi Intelligence, and more recently DeepSeek R1, Kimi k1.5, and Doubao 1.5 Pro, have all caused quite a stir overseas.

Although 2025 has been dubbed the year of AI agents, the year of AI glasses, and much else, it will also be an important year for Chinese AI companies to embrace the global market, and going global will be an unavoidable keyword.

The open-source strategy is also a smart move, attracting large numbers of technical bloggers and developers who promote DeepSeek of their own accord. Technology for good should be more than a slogan, and in going from the slogan of "AI for All" to genuinely accessible technology, DeepSeek has taken a purer path than OpenAI.

If OpenAI showed us the power of AI, then DeepSeek made us believe:

This power will ultimately benefit everyone.
