AI Day Study Series #1: James Wang (former ARK analyst)
A series for understanding the potential of Tesla AI, including Dojo and FSD.
To get the most out of Tesla's AI Day, we'll study it using Dave Lee's video.
ARK analysts really are something else. This interview is packed with exactly the information I wanted.
As many of you know, before joining ARK, Wang worked at NVIDIA. He is probably the person in finance who knows GPUs best.
Combined with Douma's FSD video series and several of Andrej Karpathy's talks, I think you can get quite close to Tesla's approach.
The topics of this interview, in bullet form:
・The amount of data Tesla holds
・The problems Tesla faces in NN training
・The fundamental differences between training and inference hardware
・The differences between training NNs and inference NNs
・The horizontal business model
・NVIDIA as a horizontal business model
・The customer service a horizontal model requires
・What it means to use GPUs, originally graphics chips, for neural-network matrix math
・The cost (time, money, degree of optimization) if Tesla trained on its data with NVIDIA's off-the-shelf hardware
・The vertical integration model
・Tesla as a vertically integrated model
・What Apple's and Tesla's business models have in common
・What Tesla must show at AI Day
・What should be presented as DOJO's specs
・OpenAI and GPT-3
・The GPT-3 ecosystem
・What AI problems you end up solving when you solve autonomous driving
・DOJO as a GPT-3 for image recognition, and its TAM
・Learning new classes incrementally (incremental learning)
・The difficulty of transfer learning
・A general model (base layer) for image recognition
・The image-recognition AI problem Tesla is solving, and FSD as its application
・The applications go beyond FSD, but those aren't Tesla's job
・Mobileye is finished
Tesla Secret AI w/ James Wang former ARK Analyst (Ep. 318)
DAVE
you worked at NVIDIA you understand the chip side
you've analyzed Tesla's hardware etc
we know that Tesla has their so-called hardware3 in their cars
they're probably working on hardware4
now they've been working on this Tesla DOJO supercomputer
neural net training computer for the past year or two
and they're prepping for a possible Tesla AI
what's your take on Tesla DOJO
do they really have to create their own neural net training supercomputer
couldn't they use some other solution
and what are the implications for Tesla creating their own supercomputer
can they use it as a kind of AWS
neural net training as a service or
what's the kind of potential going forward with that
WANG
i was surprised when they talked about building their own training hardware
because training hardware is a lot more complex to design than inference hardware
inference hardware is the hardware you use to run the neural network
training hardware is the hardware you use to create the neural network in the first place
the big difference is during training you have to feed it a lot of data
and the training happens in the data center
whereas inference is you've already got the software you just deploy it
it's like deploying your app on the iPhone
you just run it in the local environment in this case the FSD computer in the car
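That split can be shown with a toy sketch (illustrative numpy only, nothing to do with Tesla's actual stack): training loops over data computing gradients and weight updates, while inference is a single forward pass with frozen weights.

```python
import numpy as np

# Toy linear model y = w * x.
# "Training" needs lots of data, gradients, and many passes (data-center work);
# "inference" is one forward pass with frozen weights (in-car work).

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # ground-truth slope ~3

w = 0.0
lr = 0.1
for _ in range(200):                      # training: iterate over the data
    grad = np.mean(2 * (w * x - y) * x)   # backward pass (gradient of MSE)
    w -= lr * grad                        # weight update

def infer(x_new, w):                      # inference: forward pass only
    return w * x_new

print(round(w, 2))  # learned slope, close to 3.0
```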
if you look at AI chip startups there are way more startups doing inference hardware
than training because training is a lot more complicated
when i saw the announcement come out
i was like why do you need to do this and i think it comes down to the fact that
they have a very specific AI problem
and they have the largest quantity of video training data in the world and for a specific application which is driving
i think the only other one you would compare to is YouTube
for this application of driving
they have more data than every car manufacturer combined times probably a thousand
it's orders and orders of magnitude more
and if they were to use off-the-shelf hardware
if they were to order a computer from NVIDIA say like build together a cluster of NVIDIA DGX servers
i think it would cost them probably on the order of maybe 100 million dollars or close to that
it would be probably in that range
and the cost for them to build this in-house
given they already have a team for building FSD
is probably on the order of tens of millions of dollars
but that's not even the point
i'm sure it's not about saving 50 million dollars because Tesla's capex is in the billions
it's more about achieving what's not really plausible using off-the-shelf solutions
NVIDIA's hardware is designed to deal with all kinds of neural networks
language
speech
video
pure reinforcement learning
it's designed to solve all of those
their strategy is to launch one chip architecture for every industry vertical
and then address the verticals using software
Tesla has a vertical use case a single use case problem
Tesla just wants to solve driving
their motivation is basically saying we have this very specific use case
we have an abnormal amount of data that the current computers and supercomputers out there are not even designed to optimally handle
you would need a lot of them to fit it in
and we already have a generation of experience building our own chips using our internal team
think of it this way
Andrej Karpathy has a very specific set of software requirements
he can basically list in 10 bullets
if you can give me a computer with x
how many teraflops
how much memory
what kind of interconnect
and what kind of neural network architecture support
i would be able to train
at what rate
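That spec list boils down to a throughput calculation. A hypothetical back-of-envelope version (every number below is invented for illustration, not from the interview):

```python
# Back-of-envelope only — all figures are made up.
# The arithmetic behind "give me X teraflops and i can train at rate Y":
# wall-clock time = total work / (chips * per-chip throughput * utilization)

def training_days(total_flops, chips, tflops_per_chip, utilization):
    """Wall-clock days for one training run at a sustained utilization."""
    sustained_flops_per_s = chips * tflops_per_chip * 1e12 * utilization
    return total_flops / sustained_flops_per_s / 86400  # 86400 s per day

# hypothetical run: 1e23 FLOPs on 1000 chips, 100 TFLOPS each, 30% utilization
print(round(training_days(1e23, 1000, 100, 0.30), 1))  # ≈ 38.6 days
```

Halving utilization or chip count doubles the wall-clock time, which is why a chip and interconnect sized exactly to the workload matters so much.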
and if you plug those kinds of requirements back into what's available off the shelf or from amazon
it probably costs an absurd amount of money
whereas if he looks across the cubicle at the hardware team
and says hey can you build that for me?
Peter (Peter Bannon, Tesla's AI chip design lead) or whoever's running the show right now
that person will be like
yes we can build a five nanometer chip of this size
we can build a custom interconnect that's perfect for your video
in fact we can size the buffers to match the size of the video buffers
and build a super optimized chip
and attach storage and memory really close to the chip
and we could probably ship it by the end of this year
and that would allow them to basically leapfrog any competition
not that they have any real competition but it would allow them to essentially take all the data they have
which right now is too large to plausibly fit in the training hardware you can buy off the shelf
but actually make it fit in this custom computer they build
and if they can make it fit
they can train the perfect neural network that would actually solve self-driving
and you optimize that, shrink it, ship it in FSD on the inference side
okay Tesla makes their own internal neural net training clusters
it's great it works well for them
DAVE
it seems like there's a couple paths here
one path is
fine it's an internal neural net training computer fine who cares Tesla does
and the results and the benefits are purely FSD
another route to go
can they use this stuff that they've learned and that they've built
to do something else
are there other business lines
can they open it up a service
is there any potential for that
is that even like some revenue that's significant or not what's your take on that
their own training hardware
WANG
it's easy to like go down the road of
oh you have a chip now you can build an AWS or diversify your business
i don't think that's how it works at all for this kind of thing
the whole point of this is how vertical are you
your first business decision your first strategy decision you make as a business is
are you a horizontal business or a vertical business
if you're a horizontal business
you build a component like NVIDIA and you try to sell it to as many people as possible
if you're a vertical business model like Apple
you build a very specialized thing for yourself and you keep it damn well to yourself and you don't give it to anyone and
if anyone even builds something that even looks like it
you sue the hell out of them
those are the only two business models that make sense
anything in the middle doesn't make sense
it's very confused and it's not optimized for anything
Tesla is pursuing the vertical strategy
and they shouldn't have the desire to share this with anyone
because it's just literally throwing your competitive advantage to the wind
and it's not like this is part of the mission of accelerating sustainable energy
this is not battery technology where it's just good for the environment if you share it
this is proprietary software technology that will help you differentiate against everyone else
sharing it is not part of that open ethos
and secondly horizontal business models have entirely different requirements
and operating realities than vertical business models if you want to sell this chip
as a service now you have to build out a whole team that is about supporting your customers' use cases
let's say Tesla is like okay we're an all-image-based sensor array
we have no lidar and this is why we built the chip this way
then you try to sell it to someone that's using lidar
they'll be like oh can you add support for a lidar image map
can you add support for this buffer that buffer
soon you're just like you need a whole team to service customers
that's not what Tesla does
Tesla does not service the needs of VW and GM
they're in the business of serving their own teams first and foremost
just looking through the lens of Apple
i wrote a blog on this
Tesla through the lens of Apple
the strategy is exactly the same
they're going to make their own things the absolute best first and foremost
and that's their level of differentiation against competitors
they neither have the desire nor does it make any business sense to make it horizontal
because it slows them down and it makes no significant revenue
DAVE
elon musk was saying that Tesla can become one of the largest AI companies in the world
at least like shallow-mind, not DeepMind like google but
and you've got this whole Tesla AI day coming up
and if you look at historically their events
with autonomy day and battery day
they have been very significant strategy events five or ten year foundational events
and now they're hosting a Tesla AI day
one angle you could say
they'll just showcase some of the stuff they're working on autonomy or whatever that narrow case
but my question is like does that really deserve a whole Tesla AI day
then it's also in light of elon's recent comments
that they could possibly become one of the largest AI companies
is there something else you think that Tesla can showcase or really make Tesla AI day about
the other angle is like elon's saying hey we tried to solve autonomy but along the way
we've had to solve a lot of real world AI problems like physical world navigation
all this stuff in the busy world of humans and bikes and kids and pedestrians all this stuff and
there's a lot of expertise built up with that
that is not just for trying to solve autonomy
but you've built up all this extra real world solutions and expertise
like where is this headed
do you see potential for Tesla to get into other real world applications like robots like drones
WANG
that's interesting i wasn't aware AI day was coming
that's very interesting the last time they did one i think was battery day and they showcased some advances
the most obvious thing they need to show is material progress on FSD
because they've been in beta and trialing this out
they've made promises that they've broken over and over again
they need to show a demo that's far more compelling than the palo alto demo they did a few years ago
i think something on the order of complexity of the busy streets of san francisco
they need to show a jaw-dropping demo
to put some of this criticism and skepticism from the press behind them
i think they may certainly talk about DOJO and the infrastructure side of
how they're going to differentiate and the mechanisms of training
on large-scale video data which no one else is doing
those are probably nuts and bolts
but if you were to speculate on future places they could go
what's interesting is OpenAI has provided a perspective on what business you can build with really large-scale models
OpenAI started off as a research organization for AI like the DeepMind of the US
but evolved into a commercial company
and their first product is a product called GPT-3
and it is a generative language model basically a neural network that can write, call it, English and it's very generalized in the sense that
it not only writes English it can write poetry
it can translate between languages
it can write JavaScript
because it was trained on the entire corpus of text on the internet
so it's read every stack overflow
it's read every programming manual
it can actually output code
when you train across an extremely large data set
you can basically learn all the sub use cases expressed in that data set
what Tesla potentially could build with its video data set
is a generalized computer vision model
if the result of DOJO and all this data is
with very little human labeling
it can build a neural network that has robust understanding of images and video
you could think of that as a GPT-3 equivalent but for video
and that could perhaps be deployed in all kinds of adjacent industries
it could be deployed in surveillance security robotics
there are many applications that could become conceivably a SaaS product or
like an API that they could offer to developers
that could just generate pure software revenue
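Purely as an illustration of what such a developer-facing service might feel like — no such Tesla product or API exists, and every name below (Detection, analyze_frame) is invented — the GPT-3 pattern is a frozen general model behind a thin call:

```python
# Hypothetical sketch only — all names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "pedestrian"
    distance_m: float   # estimated distance from the camera
    velocity_ms: float  # estimated speed of the object

def analyze_frame(frame_bytes: bytes) -> list[Detection]:
    """Stand-in for a remote call to a hosted general vision model.

    A real service would POST frame_bytes and return the model's
    detections; here we return a canned result so the sketch runs.
    """
    return [Detection("pedestrian", 12.5, 1.4)]

detections = analyze_frame(b"\x00" * 10)
print(detections[0].label)  # pedestrian
```

The point of the pattern is that, as with GPT-3, developers never touch the model itself — they only consume its outputs.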
DAVE
if you are solving real world AI where
you actually with vision have to identify not only every single object
but also have to identify its velocity how fast it's moving its distance from you and from others
and make predictions on where things are going as well
you're solving all of these problems with understanding real world AI
actually maybe creating a 3d type of understanding of what's going on
this type of expertise in real world can possibly apply to many other scenarios or use cases
one angle is Tesla could possibly go into physical robots or drones where it needs that type of real world understanding
another angle is maybe they can open it up as a web service or API or something
where if Tesla has not just data set but this neural net vision platform
where they can identify not just objects but again it's like everything going around in that environment
they could let other companies other people latch onto
one of the questions i was having was
okay but how does it get better
if a company is using it for a specific case and it needs to be improved in that specific case
let's say they're monitoring lizards or something really niche case
and Tesla doesn't have a lot of lizards
is there a way where Tesla can run a service
where this stuff can be improved by the very developers that are using it
who actually input these images and labels or something
where it could actually train the whole neural net to make it better
is that too complicated or is that something that's possible
WANG
i think it's not very easy with current technology for that neural network to incrementally learn a new class
it has to learn from scratch again
unlike human learning
human learning is incremental so
if you have to learn a new thing today
you can just write that on top of your existing knowledge
you don't have to delete or start from scratch
but the way a neural network is typically trained is that
if your neural network has been trained on 100 classes
and you need to learn a new class
you basically add the 101st class's data into your data pool
you run it again to learn it
because it all sums up to a probability of one
typically the way it's done is not easy
you can do transfer learning but you tend to forget the older stuff as well
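A minimal numpy sketch of why "it all sums up to a probability of one" blocks incremental classes (illustrative only — real perception heads are far more complex):

```python
import numpy as np

# The output layer's softmax spreads probability 1.0 across a FIXED
# number of classes, baked into the weight shapes at training time.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=16)        # penultimate-layer activations
W_100 = rng.normal(size=(100, 16))    # trained head for 100 classes

probs = softmax(W_100 @ features)
print(probs.shape, round(probs.sum(), 6))  # (100,) 1.0 — sums to one

# Recognizing a 101st class needs a (101, 16) head, and naively
# fine-tuning it on only the new class tends to overwrite the old ones
# (catastrophic forgetting) — hence retraining on the full data pool.
W_101 = np.vstack([W_100, rng.normal(size=(1, 16))])
print(softmax(W_101 @ features).shape)     # (101,)
```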
and for GPT-3 there is no way the customer can augment the training data
OpenAI does everything
it gives you an API and you have practically no control
you can condition your prompt and answers
but you can't add to their training data set
and you certainly can't do a little bit of incremental training as a customer
and then use that as a custom solution to yourself
i think it's not very easy
like from Tesla's perspective
instead of being that flexible
i think it's more like addressing the
low-hanging fruit of
if we can just offer this base layer
generalized computer vision model
let's see what you can do with it and
without doing any customization
GPT-3 has proven out that model even with no client-side customization
it actually works pretty well
can generate many useful use cases
thousands of developers are working on it
step one you don't have to get too fancy just give people access to an incredibly robust vision model and i'm sure they'll figure out what to do with it
that's fascinating
DAVE
GPT-3 for vision or real world
one of the challenges is like with OpenAI they were able to get billions and billions of text from everywhere on the internet to analyze and feed their neural nets but in Tesla's case it's more limited
it's a narrow niche of just driving
it's not really general because there's so many ways to interact with the real world that aren't just driving
it's not as generalized as for example OpenAI's approach to language and text all that stuff
→ This comeback misses the mark a bit. Wang didn't say that.
WANG
i think it's a vertical-specific neural network
it's a generalized network for driving
not for general vision
yeah it's like it doesn't even have images of inside the house right by definition
i think that is challenging
i think probably the easiest adjacent thing they can do is
to maybe license it to other automakers who need help
because they have less than one percent of Tesla's data set
they could make that a licensing business to that industry vertical
that's probably the most obvious thing to do
but if you're a toyota or gm you would be loath to license this piece of software from Tesla
who's already killed you and now you're going to pay them to kill you more
(lol, Toyota getting treated as already dead)
but what is your choice
you're gonna use an Intel Mobileye chip which is not really a programmable stack
and still you have no data
there are not a lot of choices