Jen-Hsun Huang talks with seven authors of the Transformer paper about the future of large models

The world needs something better than the Transformer, and I think all of us here are hoping it will be replaced by something that takes us to a new performance plateau.

Written by: Guo Xiaojing

Source: Tencent News

In 2017, a landmark paper, "Attention Is All You Need", was published. It introduced the Transformer, the first model built on the self-attention mechanism. This innovative architecture broke away from the constraints of traditional RNNs and CNNs: its parallelizable attention mechanism effectively overcame the long-range dependency problem and dramatically sped up the processing of sequence data. The Transformer's encoder-decoder structure and multi-head attention mechanism set off a storm in the field of artificial intelligence; the popular ChatGPT is built on this architecture.

Imagine that the Transformer model is like your brain being able to focus on every word a friend says at the same time and understand the connection between these words when talking to them. It gives computers human-like language comprehension capabilities. Before this, RNN was the mainstream method for processing language, but its information processing speed was slow, just like an old-fashioned tape player that had to play word by word. The Transformer model is like an efficient DJ that can manipulate multiple tracks at the same time and quickly capture key information.

The emergence of the Transformer model has greatly improved the ability of computers to process language, making tasks such as machine translation, speech recognition, and text summarization more efficient and accurate, which is a huge leap for the entire industry.
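The mechanism behind that leap is scaled dot-product self-attention, in which every token attends to every other token in parallel. As a rough illustration of the idea (a minimal sketch, not the paper's actual implementation), it can be written in a few lines of NumPy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise token similarities, shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted sum of value vectors

# Toy example: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)              # self-attention: Q = K = V = X
print(out.shape)                                         # (3, 4)
```

Because every row of the score matrix is computed independently, all tokens are processed at once, which is exactly what an RNN's word-by-word recurrence cannot do.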

This innovation came from eight people who worked together at Google. The eight authors published the paper "Attention Is All You Need" in 2017, describing the Transformer architecture in detail and opening a new chapter in the field of generative AI.

In the world of generative AI, the scaling law is a core principle: in short, as a Transformer model grows in scale, its performance improves, but supporting larger models and deeper networks requires ever more powerful computing resources. NVIDIA, which provides that high-performance computing, has thus become a key player in this AI wave.
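The scaling law mentioned here is usually expressed as a smooth power law: loss falls predictably as parameter count grows, with diminishing returns. The constants below are illustrative placeholders, not fitted values from any published study:

```python
def scaling_loss(n_params: float, a: float = 1.0, alpha: float = 0.076) -> float:
    """Toy power-law scaling curve: loss ~ a * N^(-alpha).

    a and alpha are placeholder constants chosen for illustration only.
    """
    return a * n_params ** (-alpha)

# Bigger models give lower loss, but each 10x costs more for less gain.
for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}  loss = {scaling_loss(n):.3f}")
```

The practical consequence is the one the article draws: predictable gains from scale justify ever larger compute budgets.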

At this year's GTC conference, NVIDIA's Jen-Hsun Huang invited the authors of the Transformer paper (Niki Parmar was unable to attend) to a roundtable discussion. It was the first time the seven authors had appeared together in public.

They also expressed some impressive views in the conversation:

  • The world needs something better than the Transformer, and I think all of us here are hoping it will be replaced by something that takes us to a new performance plateau.

  • We did not succeed in our initial goal. Our original intention when we started Transformer was to simulate the evolution of tokens. It is not just a linear generation process, but a gradual evolution of text or code.

  • For a simple problem like 2+2, it might use trillions of parameters of a large model. I think adaptive computing is one of the things that has to come next, where we know how much computing resources to spend on a particular problem.

  • I think current models are too affordable and too small, at about $1 per million tokens, which is 100 times cheaper than going out and buying a paperback.

The following is a transcript of the conversation:

Jen-Hsun Huang in conversation with seven authors of the Transformer paper on the future of large models

Jen-Hsun Huang: In the past sixty years, computer technology does not seem to have undergone fundamental change, at least not since I was born. The computer systems we use today, whether in multi-tasking, the separation of hardware and software, software compatibility, or data backup, as well as the skills of software engineers, are essentially built on the design principles of the IBM System/360: the central processing unit, the I/O subsystem, multi-tasking, the separation of hardware and software, and software compatibility.

I would argue that modern computing has not fundamentally changed since 1964. Although computers underwent a major transformation in the 1980s and 1990s into the form we know today, the marginal cost of computing has continued to decline: by a factor of ten every five years, a thousand every fifteen years, and ten thousand every twenty years. In this computer revolution, the reduction was so great that over twenty years the cost of computing dropped by a factor of almost ten thousand, and that change provided a huge impetus for society.

Try to imagine: if every expensive item in your life were reduced to one ten-thousandth of its original price, a car that cost $200,000 twenty years ago would now cost $20. Can you imagine the change? The decline in computer costs did not happen overnight; it gradually reached a critical point, after which the trend of cost reduction suddenly stopped. Costs still improve a little every year, but the rate of change has stagnated.

We began to explore accelerated computing, but it is not easy to use accelerated computing. You need to design it from scratch. In the past, we may have solved the problem step by step according to the established steps, but now, we need to redesign these steps. This is a new scientific field, which reformulates the previous rules into parallel algorithms.

We recognized this and believed that if we could accelerate even 1% of code and save 99% of running time, there would be applications that could benefit. Our goal is to make the impossible possible, and to make what is already possible far more efficient; that is the meaning of accelerated computing.

Looking back at the company's history, we found that we have the ability to accelerate a variety of applications. At first, we achieved significant acceleration results in the gaming field, so good that people mistakenly thought we were a gaming company. But in fact, our goal is much more than that, because this market is huge and can drive incredible technological progress. This situation is not common, but we found such an exception.

Long story short, in 2012, AlexNet ignited the spark, which was the first collision of artificial intelligence and NVIDIA GPU. This marked the beginning of our amazing journey in this field. A few years later, we found a perfect application scenario, which laid the foundation for our development today.

In short, these achievements have laid the foundation for the development of generative AI. Generative AI can not only recognize images, but also turn text into images, and even create new content. Now, we have enough technical ability to understand pixels, recognize them, and understand the meaning behind them. Through this meaning, we can create new content. The ability of artificial intelligence to understand the meaning behind data is a huge change.

We have reason to believe that this is the beginning of a brand new industrial revolution. In this revolution, we are creating things that have never been created before. For example, in the previous industrial revolution, water was the source of energy. Water entered the device we created, and the generator started working. Water came in and electricity came out, just like magic.

Generative AI is a new type of "software" that can create software. It relies on the joint efforts of many scientists. Imagine that you give AI raw materials - data, and they enter a "building" - a machine we call a GPU, and it can output magical results. It is reshaping everything, and we are witnessing the birth of the "AI factory".

This change can be called a new industrial revolution. In the past, we have never really experienced such a change, but now it is slowly unfolding before us. Don't miss the next decade, because in this decade, we will create huge productivity. The pendulum of time has started, and our researchers have begun to act.

Today we have invited the creators of the Transformer to discuss where generative AI will take us in the future.

They are:

Ashish Vaswani: Joined the Google Brain team in 2016. In April 2022, he co-founded Adept AI with Niki Parmar, left the company in December of the same year, and co-founded another artificial intelligence startup Essential AI.

Niki Parmar: Worked at Google Brain for four years, then co-founded Adept AI and Essential AI with Ashish Vaswani.

Jakob Uszkoreit: Worked at Google from 2008 to 2021. Left Google in 2021 and co-founded Inceptive, a company focused on artificial intelligence life sciences, dedicated to using neural networks and high-throughput experiments to design next-generation RNA molecules.

Illia Polosukhin: Joined Google in 2014 and was one of the first of the eight to leave. In 2017 he left Google and co-founded the blockchain company NEAR Protocol.

Noam Shazeer: Worked at Google from 2000 to 2009 and from 2012 to 2021. In 2021, Shazeer left Google and co-founded Character.AI with former Google engineer Daniel De Freitas.

Llion Jones: Worked at Delcam and YouTube. Joined Google in 2012 as a software engineer. Later left Google to found the artificial intelligence startup Sakana AI.

Lukasz Kaiser: Former researcher at the French National Center for Scientific Research. Joined Google in 2013. In 2021, he left Google to become a researcher at OpenAI.

Aidan Gomez: Graduated from the University of Toronto, Canada, he was an intern in the Google Brain team when the Transformer paper was published. He was the second person in the eight-person team to leave Google. In 2019, he co-founded Cohere with others.


Jen-Hsun Huang: Please take this opportunity to speak freely. No topic is off-limits; you can even jump up from your chair to make a point. Let's start with the most basic question: what problem did you encounter at the time, and what inspired you to create the Transformer?

Illia Polosukhin: If you want to release a model that can actually read search results, such as processing piles of documents, you need some model that can process this information quickly. The recurrent neural network (RNN) at the time did not meet this need.

Indeed, although recurrent neural networks (RNNs) and some early attention mechanisms were attracting interest at the time, they still had to read word by word, which was not very efficient.

Jakob Uszkoreit: The rate at which we generate training data far exceeds our ability to train state-of-the-art architectures. We actually use simpler architectures, such as feed-forward networks that take n-grams as input features. These architectures, at least with large amounts of training data at Google scale, often outperform more complex, state-of-the-art models simply because they can be trained faster.

At that time, powerful RNNs, especially long short-term memory networks (LSTM), already existed.

Noam Shazeer: It seems like this is a burning problem. We started noticing these scaling laws around 2015, where you could see that as the model got bigger, it got smarter. It was like the best problem in the history of the world, and it was so simple: you’re just predicting the next token, and it gets so smart and can do a million different things, and you just want to scale it up and make it better.

And a huge frustration was that RNNs were just too cumbersome to work with. And then I overheard these guys talking about, hey, let's replace it with convolutions or attention mechanisms. And I thought to myself, great, let's do that. I like to compare the Transformer to the leap from the steam engine to the internal combustion engine. We could have done the industrial revolution with the steam engine, but it would have been very painful; the internal combustion engine made everything better.

Ashish Vaswani: I started to learn some of these lessons the hard way in graduate school, especially when I was working on machine translation. I realized, hey, I'm not going to learn the intricacies of language rules. I think Gradient Descent, the way we train these models, is a better teacher than I am. So I'm not going to learn the rules, I'm just going to let Gradient Descent do all the work for me, and that was my second lesson.

What I learned from these bitter lessons is that those general architectures that can scale will ultimately win out in the long run. Today it may be tokens, tomorrow it may be the actions we take on computers, they will start to mimic our activities and be able to automate a lot of the work we do. As we discussed, the Transformer, especially its self-attention mechanism, has very broad applicability, and it also makes gradient descent better. Another thing is physics, because one thing I learned from Noam is that matrix multiplication is a good idea.

Noam Shazeer: This pattern keeps repeating itself: every time you add a bunch of rules, gradient descent eventually gets better at learning those rules than you are. Just as we have been doing with deep learning, we used to build AI models shaped like the GPU. Now we are building AI models shaped like the supercomputer. The supercomputer is now the model; we are building supercomputers to be the shape of the model.

Jen-Hsun Huang: So what problem were you trying to solve?

Lukasz Kaiser: Machine translation. If you think back five years ago, it seemed like a very difficult process, you had to collect data, maybe translate it, but the result might be barely correct. It was very basic. But now, these models can learn to translate even without data. You just give it one language and another language, and the model learns to translate on its own, and this ability just emerges naturally, and the effect is very satisfactory.

Llion Jones: The title came from the intuition that attention is all you need. Basically, what happened while we were looking for a title is this:

We were doing ablations and started throwing away bits of the model just to see if it would get worse. To our surprise, it started getting better, including when we threw away all the convolutions, which worked much better. So that's where the title comes from.

Ashish Vaswani: Basically what's interesting is that we actually started with a very basic framework and then we added things, we added convolutions and I guess later we took them away. And multi-head attention and a lot of other very important things.

Jen-Hsun Huang: Who came up with the name Transformer? Why is it called Transformer?

Jakob Uszkoreit: We liked the name. We picked it almost at random, but we thought it was apt: the model transforms how our data is produced, and by that logic, all of machine learning is a Transformer, a disruptor.

Noam Shazeer: We hadn't settled on the name at first. I think the name is very simple, and many people think it is a good one. I proposed many names before we finally decided on "Transformer", which describes the principle of the model: it transforms the entire signal. By that logic, almost all of machine learning will be transformed.

Llion Jones: The name Transformer became familiar not just because of translation; we wanted to describe the transformation in a more general way. I don't think we did a great job with the name, but as a transformer, a driver and an engine, it makes sense. From an architectural point of view, everyone can understand a large language model in those terms, as an engine with its own logic. That was a relatively early start.

But we did realize that we were actually trying to create something very, very general, something that could really turn anything into anything else. And I don't think we predicted how well it would actually work when the Transformer was applied to images, which was a little surprising. It might seem logical now: you can chunk an image into patches and tokenize each little patch. But that was very early in the architecture's life.

So when we built the Tensor2Tensor library, we were really focused on scaling up autoregressive training, and not just for language, but also for image and audio components.

So Lukasz said he was working on translation; I think he was underselling himself. All of these ideas, these patterns we are now starting to see come together, all feed into the model.

But really, everything was there a long time ago, and these ideas were percolating, and it took some time. Lukasz’s goal was that we have all these academic datasets that go from images to text, text to images, audio to text, text to text. We should train on everything.

This idea really drove the expansion work and it eventually succeeded and it was so interesting that we could translate images to text, text to images, and text to text.

You can use it to study biology, or biological software, which may be analogous to computer software: it starts as a program and is then compiled into something that can run, in our case on a GPU.

Biological software starts its life with the specification of certain behaviors, say, that you want a cell to produce a specific protein. You then use deep learning to translate that specification into an RNA molecule that, once inside the cell, actually exhibits those behaviors. So the idea really goes well beyond translating into English.

Jen-Hsun Huang: Did you create a large lab to produce all of this?

Aidan Gomez: A lot of data is available, and in fact much of it is still public, because it is often still largely publicly funded. But in practice, you still need data that clearly illustrates the phenomenon you are trying to explore.

Trying to model that for a given product, say protein expression or mRNA vaccines: yes, in Palo Alto we have a whole bunch of robots and people in lab coats, both machine learning researchers and people who were previously biologists.

Now, we consider ourselves pioneers of something new, working on actually creating this data and validating the models that designed these molecules. But the original idea was translation.

Jen-Hsun Huang: The original idea was machine translation. What I want to ask is: what key milestones do you see in the enhancement of the architecture, and what impact did they have on the design of the Transformer?

Aidan Gomez: As you guys have seen along the way, do you think there's really been a lot of additional contributions on top of the base Transformer design? I think on the inference side, there's been a lot of work to speed up these models and make them more efficient.

I still think it's a little bit disturbing to me because of how similar we are to the original form. I think the world needs something better than the Transformer, and I think all of us here are hoping it will be replaced by something that takes us to a new performance plateau.

I want to ask everyone here a question. What do you think is going to happen next? Like is this an exciting step because I think it's too similar to what was 6-7 years ago, right?

Llion Jones: Yeah, I think people are surprised by how similar it still is, right? People do like to ask me what's going to happen next, because I'm an author of the paper, as if I could wave a magic wand. What I want to point out is the principle at work here: we don't just need to be better, we need to be significantly better.

Because if it's only slightly better, then that's not enough to push the entire AI industry to something new. So we're stuck with the original model, even though technically it may not be the most powerful thing we have right now.

But everybody knows what kinds of individual improvements they want: better context windows, faster token generation. Well, I'm not sure if you'll like this answer, but models use too much compute right now; a lot of computation is wasted. We're working on making it more efficient, thank you.

Jen-Hsun Huang: I think we are making this more efficient, thank you!

Jakob Uszkoreit: But I think it's more about how to allocate resources, rather than how much resources are consumed in total. For example, we don't want to spend too much money on an easy problem, or spend too little on a hard problem and end up with no solution.

Illia Polosukhin: Take the example of 2+2: if you feed it into one of these models, it will use a trillion parameters to answer. So I think adaptive computation is one of the things that has to come next: knowing how much computing resource to spend on a particular problem.

Aidan Gomez: We know how much compute capacity there is right now, and I think this is something we need to focus on next. I think it is a game changer, and it is the direction things will develop.

Lukasz Kaiser: This concept existed before the Transformer, it was integrated into the Transformer model. In fact, I'm not sure if it's clear to everyone here, we didn't succeed in our original goal. We started this project with the intention of simulating the evolution of tokens. It's not just a linear generation process, but a gradual evolution of text or code. We iterate, we edit, and this makes it possible for us to not only imitate how humans develop text, but also to make them part of this process. Because if you can generate content naturally like humans, they can actually provide feedback, right?

We all read Shannon's paper, and our initial thought was to just focus on language modeling and perplexity, but that didn't work out. I think that's where we can go further. It's also about how we can now organize computational resources in a smart way, which is now also applicable to image processing. I mean, diffusion models have an interesting property that they can be refined and improved through iteration. We don't have that capability yet.

I mean, this fundamental question: what knowledge should be built into the model and what knowledge should be placed outside the model? Is it using a retrieval model? The RAG (Retrieval-Augmented Generation) model is an example. Similarly, this also involves the reasoning problem, that is, which reasoning tasks should be done through an external symbolic system and which reasoning tasks should be performed directly inside the model. This is largely a discussion about efficiency. I do believe that large models will eventually learn how to do calculations like 2+2, but if you want to calculate 2+2, you do it by adding numbers, which is obviously inefficient.

Jen-Hsun Huang: If an AI only needs to calculate 2+2, it should use a calculator directly and complete the task with the least energy, because we know the calculator is the most efficient tool for computing 2+2. However, if someone asks the AI how it arrived at 2+2, and whether it knows 2+2 is the correct answer, that will consume far more resources.
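The calculator point can be pictured as a tiny adaptive router: trivial arithmetic goes to an exact, near-free tool, and everything else goes to a (hypothetical) large model. This is an illustration of the idea only, not any production system; the `model` callable here is a placeholder:

```python
import re

def answer(query, model=None):
    """Route trivial arithmetic to a cheap exact path; defer everything
    else to a hypothetical expensive model. Sketch for illustration."""
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", query)
    if m:  # exact, near-zero-cost path: no neural network involved
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    return model(query)  # expensive path: call the large model

print(answer("2+2"))  # → 4, without spending a single model parameter
```

Real adaptive-computation proposals make this decision inside the model rather than with a regex, but the cost asymmetry is the same.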


Noam Shazeer: That's true. You mentioned an example earlier, but I'm also sure that everyone here has developed an AI system that is smart enough to use a calculator proactively.

That is what these general-purpose models do right now. I think current models are too affordable and too small. They are cheap thanks to hardware like NVIDIA's.

The computational cost is about 10^-18 dollars per operation, or roughly that order of magnitude. Thank you for creating so much computational capacity. If you look at a model with five hundred billion parameters doing a trillion computations per token, that works out to about a dollar per million tokens, which is 100 times cheaper than going out and buying a paperback book and reading it. Our applications can be a million times or more valuable than that raw cost of computation on giant neural networks; applications like curing cancer are undoubtedly worth far more still.
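Shazeer's figures hang together as a back-of-the-envelope calculation. Taking his stated numbers as assumptions (roughly $10^-18 per operation and about a trillion operations per token for a 500-billion-parameter model):

```python
# Back-of-the-envelope check of the quoted per-token economics.
# All three inputs are assumptions taken from the conversation, not measurements.
cost_per_op = 1e-18           # ~$1e-18 per arithmetic operation
ops_per_token = 1e12          # ~1 trillion operations per generated token
cost_per_token = cost_per_op * ops_per_token
cost_per_million_tokens = cost_per_token * 1e6

print(f"${cost_per_million_tokens:.2f} per million tokens")  # about $1
```

A paperback of roughly 100,000 tokens at $10-15 then really is on the order of 100x more expensive per token, which is the comparison being made.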

Ashish Vaswani: I think making the world smarter is about how models get feedback from the world, and whether we can achieve multi-task, multi-threaded parallelism. If you really want to build such a model, designing it around that feedback is a very good approach.

Jen-Hsun Huang: Can you quickly share why you started your company?

Ashish Vaswani: At our company, our goal is to build models and solve new tasks. Our job is to understand the goals and content of those tasks and to meet customers' needs as they change. In fact, since the beginning of 2021, I have found that the biggest problem with models is that you can't just make them smarter; you also need the right people to interpret them. We want the world to interact with these models, making them bigger and better. There is real progress to be made in the learning process, and you can't achieve it in the vacuum of a lab.

Noam Shazeer: We co-founded the company in 2021. We had this great technology, but it wasn't reaching many people. There are billions of people who need to complete different tasks, and that is what deep learning is about: we keep improving the technology, thanks in part to Jen-Hsun Huang's work, and our ultimate goal is to help people all over the world. We need faster solutions so that more people can use these applications. At the beginning, not everyone was using them, and many people used them just for entertainment, but they did work.

Jakob Uszkoreit: Thank you. I would like to talk about the company I helped found in 2021. Our goal is to solve problems with real scientific impact. In the past, the material we dealt with was quite complicated, but when I had my first child, the way I looked at the world changed. We hope to make human life better and to contribute to protein research. Especially after having children, I want to improve the existing medical structure and ensure that the development of science and technology has a positive impact on human survival. Work on protein structure has advanced to a certain extent, but we still lack data, and we must work from data. This is not only a duty but also my responsibility as a father.

Jen-Hsun Huang: I like your point of view. I have always been interested in new drug design and in how computers can learn the process of developing and generating new drugs. If a model can learn to design new drugs, and you have a laboratory to test them, you can determine whether such a model is feasible.

Llion Jones: Yes, I am the last speaker. The company we co-founded is called Sakana AI, which means "fish" in Japanese. We named it after the Japanese word for fish because we are like a school of fish: simple elements combining naturally into something intelligent. If we can combine many well-examined elements, we can create something complex and beautiful. Many people may not understand the details of this process, but our core philosophy is "learning always wins."

Whether you want to solve a problem or learn anything, learning will always help you win; in generative AI, what the model learns is what wins. As a researcher, I want to remind everyone that we should give AI models real meaning so that they can genuinely help us understand the mysteries of the universe. In fact, we are about to announce new progress that we are very excited about. Although we already have a series of research results as a foundation, we are going through a transformative phase, and we want to make these models more practical, so that people can really participate and use these large, transformative models to change how they understand the world and the universe. That is our goal.

Aidan Gomez: I started the company with motivations similar to Noam Shazeer's. I think computing is entering a new paradigm that is changing existing products and the way we work. Our role is to bridge the gap: we build a platform that lets every company adapt and integrate these models into their own products, facing users directly. That is how we advance the technology and make it more affordable and more accessible.

Jen-Hsun Huang: I really like how excited you are while Noam Shazeer seems so calm. You two have such different personalities. Now, I'd like to give the floor to Lukasz Kaiser.

Lukasz Kaiser: My experience at OpenAI has been very disruptive. It’s a lot of fun, and we process a lot of data for computation, but at the end of the day, my role is still that of a data cruncher.

Illia Polosukhin: I was the first to leave. I firmly believed that we would make great progress and that software would change the world. The most direct way to do this was to teach machines to write code and make programming accessible to everyone.

At NEAR, although our progress on that front has been limited, we are committed to harnessing human intelligence and obtaining the relevant data, for example by helping people see that we need a basic methodology. These large models represent fundamental progress and are now widely used around the world, in aerospace and many other fields; they touch communication and interaction everywhere, and they genuinely give us new capabilities. As we use them more deeply, we find they enable still more models, and so far there have not been many disputes about copyright.

We are now in a new generative era, an era that values innovation and innovators, and we want to actively participate and embrace change, so we seek different ways to help build a very cool model.

Jen-Hsun Huang: This positive feedback loop is very beneficial to the overall economy; we can now design the economy better. Some people ask: in an era where GPT models are trained on databases of billions of tokens, what is the next step? What will the next model technology be? What do you want to explore, and what is the source of your data?

Illia Polosukhin: Our starting point is that we need models with real economic value, models people can evaluate, so that your techniques and tools are ultimately put into practice and make the whole model better.

Jen-Hsun Huang: How do you train models in a domain? What are the initial interactions and patterns? Is it communication and interaction between models, or are there generative models and techniques?

Illia Polosukhin: In our team, everyone has his or her own technical expertise.

Jakob Uszkoreit: The next step is reasoning. We all recognize its importance, but a lot of that work is currently done manually by engineers. We are teaching models to answer in an interactive, question-and-answer fashion, and we hope they can understand why and, together, form a strong reasoning model. We want models to generate the content we are after, whether video, text, or 3D information, with all of it integrated together.

Lukasz Kaiser: I think reasoning actually comes from data. If we start reasoning from a set of data and ask why this data is different, we can see that many different applications are really built on reasoning over data. Because of the power of computers and systems like these, we can go further from there: we can reason about related things and run experiments.

Much of this is derived from data. I think reasoning is developing very rapidly, data and models matter greatly, and there will be much more interactive content in the near future. We haven't trained on enough of it yet; data is a key element, and we need to make it more abundant.

Noam Shazeer: Designing data, such as for a teaching machine, may involve hundreds of millions of different tokens.

Ashish Vaswani: One point I want to make is that we have many partners in this area who have reached real milestones. What is the best automation algorithm? It is actually breaking real-world tasks down into pieces. Our models matter here too: they help us obtain data and check whether the data is in the right place. On one hand, they help us focus on the data; on the other, high-quality data gives us high-quality models for abstract tasks. So measuring this progress is itself a form of creativity, of scientific development, and of progress in automation.

Jen-Hsun Huang: You can't do great engineering without a good system of measurement. Are there any questions you would like to ask each other?

Illia Polosukhin: No one really wants to retrace every step they take, but we do want to understand and explore what we are doing, get enough data and information, and make reasonable inferences. For example, if a task takes six steps, you might skip one by reasoning in five; sometimes you need fewer steps, sometimes more. So how do you reproduce such a scenario, and what do you need beyond tokens to develop further?

Lukasz Kaiser: My personal belief is that reproducing such a large model is a very complex process. The system will keep improving, but essentially you need to design a method. Humans are good at reproduction: throughout history, we have repeatedly reproduced successful scenarios.

Jen-Hsun Huang: It has been a pleasure to talk with you all, and I hope you will have the opportunity to keep interacting with each other and create indescribable magic. Thank you all very much for participating in this conference!
