AI Day Study Series #1: James Wang (former ARK analyst)
A series for understanding the potential of Tesla AI, including Dojo and FSD.
To get the most out of Tesla's AI Day, we'll study it using Dave Lee's video.
ARK analysts really are something else. This interview is packed with exactly the information I wanted.
As many of you know, before joining ARK, Wang worked at NVIDIA. He is probably the person in finance who knows GPUs best.
Combined with Douma's FSD video series and several of Andrej Karpathy's talks, I think you can get quite close to Tesla's approach.
The topics of this interview, in bullet form:
・The amount of data Tesla holds
・The problems Tesla faces in NN training
・The fundamental differences between training and inference hardware
・The differences between training NNs and inference NNs
・The horizontal business model
・NVIDIA as a horizontal business model
・The customer service a horizontal model requires
・What it means to use GPUs, originally graphics chips, for neural-network matrix math
・The cost (time, money, degree of optimization) if Tesla trained on its data with NVIDIA's off-the-shelf hardware
・The vertical integration model
・Tesla as a vertically integrated model
・What Apple's and Tesla's business models have in common
・What Tesla must show at AI Day
・What should be presented as DOJO's specs
・OpenAI and GPT-3
・The GPT-3 ecosystem
・What AI problems you end up solving when you solve autonomous driving
・DOJO as a GPT-3 for image recognition, and its TAM
・Learning new classes incrementally (incremental learning)
・The difficulty of transfer learning
・A general model (base layer) for image recognition
・The image-recognition AI problem Tesla is solving, and FSD as its application
・The applications go beyond FSD, but those aren't Tesla's job
・Mobileye is finished
Tesla Secret AI w/ James Wang former ARK Analyst (Ep. 318)
DAVE
you worked at NVIDIA you understand the chip side
you've analyzed Tesla's hardware etc
we know that Tesla has their so-called hardware3 in their cars
they're probably working on hardware4
now they've been working on this Tesla DOJO supercomputer
neural net training computer for the past year or two
and they're prepping for a possible Tesla AI
what's your take on Tesla DOJO
do they really have to create their own neural net training supercomputer
couldn't they use some other solution
and what are the implications for Tesla creating their own supercomputer
can they use it as a kind of AWS
neural net training as a service or
what's the kind of potential going forward with that
WANG
i was surprised when they talked about building their own training hardware
because training hardware is a lot more complex to design than inference hardware
inference hardware is the hardware you use to run the neural network
training hardware is the hardware you use to create the neural network in the first place
the big difference is during training you have to feed it a lot of data
and the training happens in the data center
whereas inference is you've already got the software you just deploy it
it's like deploying your app on the iPhone
you just run it in the local environment in this case the FSD computer in the car
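That split can be shown with a toy sketch (illustrative numpy only, nothing to do with Tesla's actual stack): training loops over data computing gradients and weight updates, while inference is a single forward pass with frozen weights.

```python
import numpy as np

# Toy linear model y = w * x.
# "Training" needs lots of data, gradients, and many passes (data-center work);
# "inference" is one forward pass with frozen weights (in-car work).

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # ground-truth slope ~3

w = 0.0
lr = 0.1
for _ in range(200):                      # training: iterate over the data
    grad = np.mean(2 * (w * x - y) * x)   # backward pass (gradient of MSE)
    w -= lr * grad                        # weight update

def infer(x_new, w):                      # inference: forward pass only
    return w * x_new

print(round(w, 2))  # learned slope, close to 3.0
```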
if you look at AI chip startups there are way more startups doing inference hardware
than training because training is a lot more complicated
when i saw the announcement come out
i was like why do you need to do this and i think it comes down to the fact that
they have a very specific AI problem
and they have the largest quantity of video training data in the world and for a specific application which is driving
i think the only other one you would compare to is YouTube
for this application of driving
they have more data than every car manufacturer combined times probably a thousand
it's orders and orders of magnitude more
and if they were to use off-the-shelf hardware
if they were to order a computer from NVIDIA say like build together a cluster of NVIDIA DGX servers
i think it would cost them probably on the order of maybe 100 million dollars or close to that
it would be probably in that range
and the cost for them to build this in-house
given they already have a team for building FSD
is probably on the order of tens of millions of dollars
but that's not even the point
i'm sure it's not about saving 50 million dollars because Tesla's capex is in the billions
it's more about achieving what's not really plausible using off-the-shelf solutions
NVIDIA's hardware is designed to deal with all kinds of neural networks
language
speech
video
pure reinforcement learning
it's designed to solve all of those
their strategy is to launch one chip architecture for every industry vertical
and then address the verticals using software
Tesla has a vertical use case a single use case problem
Tesla just wants to solve driving
their motivation is basically saying we have this very specific use case
we have an abnormal amount of data that the current computers and supercomputers out there are not even designed to optimally handle
you would need a lot of them to fit it in
and we already have a generation of experience building our own chips using our internal team
think of it this way
Andrej Karpathy has a very specific set of software requirements
he can basically list in 10 bullets
if you can give me a computer with x
how many teraflops
how much memory
what kind of interconnect
and what kind of neural network architecture support
i would be able to train
at what rate
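That spec list boils down to a throughput calculation. A hypothetical back-of-envelope version (every number below is invented for illustration, not from the interview):

```python
# Back-of-envelope only — all figures are made up.
# The arithmetic behind "give me X teraflops and i can train at rate Y":
# wall-clock time = total work / (chips * per-chip throughput * utilization)

def training_days(total_flops, chips, tflops_per_chip, utilization):
    """Wall-clock days for one training run at a sustained utilization."""
    sustained_flops_per_s = chips * tflops_per_chip * 1e12 * utilization
    return total_flops / sustained_flops_per_s / 86400  # 86400 s per day

# hypothetical run: 1e23 FLOPs on 1000 chips, 100 TFLOPS each, 30% utilization
print(round(training_days(1e23, 1000, 100, 0.30), 1))  # ≈ 38.6 days
```

Halving utilization or chip count doubles the wall-clock time, which is why a chip and interconnect sized exactly to the workload matters so much.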
and if you plug those kinds of requirements back into what's available off the shelf or from amazon
it probably costs an absurd amount of money
whereas if he looks across the cubicle at the hardware team
and says hey can you build that for me?
Peter (Peter Bannon, Tesla's AI chip design lead) or whoever's running the show right now
that person will be like
yes we can build a five nanometer chip of this size
we can build a custom interconnect that's perfect for your video
in fact we can size the buffers to match the size of the video buffers
and build a super optimized chip
and attach storage and memory really close to the chip
and we could probably ship it by the end of this year
and that would allow them to basically leapfrog any competition
not that they have any real competition but it would allow them to essentially take all the data they have
which right now is too large to plausibly fit in the training hardware you can buy off the shelf
but actually make it fit in this custom computer they build
and if they can make it fit
they can train the perfect neural network that would actually solve self-driving
and you optimize that, shrink it, ship it in FSD on the inference side
okay Tesla makes their own internal neural net training clusters
it's great it works well for them
DAVE
it seems like there's a couple paths here
one path is
fine it's an internal neural net training computer fine who cares Tesla does
and the results and the benefits are purely FSD
another route to go
can they use this stuff that they've learned and that they've built
to do something else
are there other business lines
can they open it up a service
is there any potential for that
is that even like some revenue that's significant or not what's your take on that
their own training hardware
WANG
it's easy to like go down the road of
oh you have a chip now you can build an AWS or diversify your business
i don't think that's how it works at all for this kind of thing
the whole point of this is how vertical are you
your first business decision your first strategy decision you make as a business is
are you a horizontal business or a vertical business
if you're a horizontal business
you build a component like NVIDIA and you try to sell it to as many people as possible
if you're a vertical business model like Apple
you build a very specialized thing for yourself and you keep it damn well to yourself and you don't give it to anyone and
if anyone even builds something that even looks like it
you sue the hell out of them
those are the only two business models that make sense
anything in the middle doesn't make sense
it's very confused and it's not optimized for anything
Tesla is pursuing the vertical strategy
and they shouldn't have the desire to share this with anyone
because it's just literally throwing your competitive advantage to the wind
and it's not like this is part of the mission of accelerating sustainable energy
this is not battery technology where it's just good for the environment if you share it
this is proprietary software technology that will help you differentiate against everyone else
sharing it is not part of that open ethos
and secondly horizontal business models have entirely different requirements
and operating realities than vertical business models if you want to sell this chip
as a service now you have to build out a whole team that is about supporting your customers' use cases
let's say Tesla is like okay we're an all-image-based sensor array
we have no lidar and this is why we built the chip this way
then you try to sell it to someone that's using lidar
they'll be like oh can you add support for a lidar image map
can you add support for this buffer that buffer
soon you're just like you need a whole team to service customers
that's not what Tesla does
Tesla does not service the needs of VW and GM
they're in the business of serving their own teams first and foremost
just looking through the lens of Apple
i wrote a blog on this
Tesla through the lens of Apple
the strategy is exactly the same
they're going to make their own things the absolute best first and foremost
and that's their level of differentiation against competitors
they neither have the desire nor does it make any business sense to make it horizontal
because it slows them down and it makes no significant revenue
DAVE
elon musk was saying that Tesla can become one of the largest AI companies in the world
at least like shallow-mind, not DeepMind like google but
and you've got this whole Tesla AI day coming up
and if you look at historically their events
with autonomy day and battery day
they have been very significant strategy events five or ten year foundational events
and now they're hosting a Tesla AI day
one angle you could say
they'll just showcase some of the stuff they're working on autonomy or whatever that narrow case
but my question is like does that really deserve a whole Tesla AI day
then it's also in light of elon's recent comments
that they could possibly become one of the largest AI companies
is there something else you think that Tesla can showcase or really make Tesla AI day about
the other angle is like elon's saying hey we tried to solve autonomy but along the way
we've had to solve a lot of real world AI problems like physical world navigation
all this stuff in the busy world of humans and bikes and kids and pedestrians all this stuff and
there's a lot of expertise built up with that
that is not just for trying to solve autonomy
but you've built up all this extra real world solutions and expertise
like where is this headed
do you see potential for Tesla to get into other real world applications like robots like drones
WANG
that's interesting i wasn't aware AI day was coming
that's very interesting the last time they did one i think was battery day and they showcased some advances
the most obvious thing they need to show is material progress on FSD
because they've been in beta and trialing this out
they've made promises that they've broken over and over again
they need to show a demo that's far more compelling than the palo alto demo they did a few years ago
i think something on the order of complexity of the busy streets of san francisco
they need to show a jaw-dropping demo
to put some of this criticism and skepticism from the press behind them
i think they may certainly talk about DOJO and the infrastructure side of
how they're going to differentiate and the mechanisms of training
on large-scale video data which no one else is doing
those are probably nuts and bolts
but if you were to speculate on future places they could go
what's interesting is OpenAI has provided a perspective on what business you can build with really large-scale models
OpenAI started off as a research organization for AI like the DeepMind of the US
but evolved into a commercial company
and their first product is a product called GPT-3
and it is a generative language model basically a neural network that can write, call it, English and it's very generalized in the sense that
it not only writes English it can write poetry
it can translate between languages
it can write JavaScript
because it was trained on the entire corpus of text on the internet
so it's read every stack overflow
it's read every programming manual
it can actually output code
when you train across an extremely large data set
you can basically learn all the sub use cases expressed in that data set
what Tesla potentially could build with its video data set
is a generalized computer vision model
if the result of DOJO and all this data is
with very little human labeling
it can build a neural network that has robust understanding of images and video
you could think of that as a GPT-3 equivalent but for video
and that could perhaps be deployed in all kinds of adjacent industries
it could be deployed in surveillance security robotics
there are many applications that could become conceivably a SaaS product or
like an API that they could offer to developers
that could just generate pure software revenue
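Purely as an illustration of what such a developer-facing service might feel like — no such Tesla product or API exists, and every name below (Detection, analyze_frame) is invented — the GPT-3 pattern is a frozen general model behind a thin call:

```python
# Hypothetical sketch only — all names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "pedestrian"
    distance_m: float   # estimated distance from the camera
    velocity_ms: float  # estimated speed of the object

def analyze_frame(frame_bytes: bytes) -> list[Detection]:
    """Stand-in for a remote call to a hosted general vision model.

    A real service would POST frame_bytes and return the model's
    detections; here we return a canned result so the sketch runs.
    """
    return [Detection("pedestrian", 12.5, 1.4)]

detections = analyze_frame(b"\x00" * 10)
print(detections[0].label)  # pedestrian
```

The point of the pattern is that, as with GPT-3, developers never touch the model itself — they only consume its outputs.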
DAVE
if you are solving real world AI where
you actually with vision have to identify not only every single object
but also have to identify its velocity how fast it's moving its distance from you and from others
and make predictions on where things are going as well
you're solving all of these problems with understanding real world AI
actually maybe creating a 3d type of understanding of what's going on
this type of expertise in real world can possibly apply to many other scenarios or use cases
one angle is Tesla could possibly go into physical robots or drones where it needs that type of real world understanding
another angle is maybe they can open it up as a web service or API or something
where if Tesla has not just data set but this neural net vision platform
where they can identify not just objects but again it's like everything going around in that environment
they could let other companies other people latch onto
one of the questions i was having was
okay but how does it get better
if a company is using it for a specific case and it needs to be improved in that specific case
let's say they're monitoring lizards or something really niche case
and Tesla doesn't have a lot of lizards
is there a way where Tesla can run a service
where this stuff can be improved by the very developers that are using it
who actually input these images and labels or something
where it could actually train the whole neural net to make it better
is that too complicated or is that something that's possible
WANG
i think it's not very easy with current technology for that neural network to incrementally learn a new class
it has to learn from scratch again
unlike human learning
human learning is incremental so
if you have to learn a new thing today
you can just write that on top of your existing knowledge
you don't have to delete or start from scratch
but the way a neural network is typically trained is that
if your neural network has been trained on 100 classes
and you need to learn a new class
you basically add the 101st class's data into your data pool
you run it again to learn it
because it all sums up to a probability of one
typically the way it's done is not easy
you can do transfer learning but you tend to forget the older stuff as well
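A minimal numpy sketch of why "it all sums up to a probability of one" blocks incremental classes (illustrative only — real perception heads are far more complex):

```python
import numpy as np

# The output layer's softmax spreads probability 1.0 across a FIXED
# number of classes, baked into the weight shapes at training time.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=16)        # penultimate-layer activations
W_100 = rng.normal(size=(100, 16))    # trained head for 100 classes

probs = softmax(W_100 @ features)
print(probs.shape, round(probs.sum(), 6))  # (100,) 1.0 — sums to one

# Recognizing a 101st class needs a (101, 16) head, and naively
# fine-tuning it on only the new class tends to overwrite the old ones
# (catastrophic forgetting) — hence retraining on the full data pool.
W_101 = np.vstack([W_100, rng.normal(size=(1, 16))])
print(softmax(W_101 @ features).shape)     # (101,)
```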
and for GPT-3 there is no way the customer can augment the training data
OpenAI does everything
it gives you an API and you have practically no control
you can condition your prompt and answers
but you can't add to their training data set
and you certainly can't do a little bit of incremental training as a customer
and then use that as a custom solution to yourself
i think it's not very easy
like from Tesla's perspective
instead of being that flexible
i think it's more like addressing the
low-hanging fruit of
if we can just offer this base layer
generalized computer vision model
let's see what you can do with it and
without doing any customization
GPT-3 has proven out that model even with no client-side customization
it actually works pretty well
can generate many useful use cases
thousands of developers are working on it
step one you don't have to get too fancy just give people access to an incredibly robust vision model and i'm sure they'll figure out what to do with it
that's fascinating
DAVE
GPT-3 for vision or real world
one of the challenges is like with OpenAI they were able to get billions and billions of text from everywhere on the internet to analyze and feed their neural nets but in Tesla's case it's more limited
it's a narrow niche of just driving
it's not really general because there's so many ways to interact with the real world that aren't just driving
it's not as generalized as for example OpenAI's approach to language and text all that stuff
→ This comeback misses the mark a bit. Wang didn't say that.
WANG
i think it's a vertical-specific neural network
it's a generalized network for driving
not for general vision
yeah it's like it doesn't even have images of inside the house right by definition
i think that is challenging
i think probably the easiest adjacent thing they can do is
to maybe license it to other automakers who need help
because they have less than one percent of Tesla's data set
they could make that a licensing business to that industry vertical
that's probably the most obvious thing to do
but if you're a toyota or gm you would be loath to license this piece of software from Tesla
who's already killed you and now you're going to pay them to kill you more
(lol, Toyota getting treated as already dead)
but what is your choice
you're gonna use an Intel Mobileye chip which is not really a programmable stack
and still you have no data
there are not a lot of choices