AI: The Future Will Be PHOTOREAL
00.00
[Music] Metaphysics, Metaphysic, excuse me, is the industry leader in developing AI technologies and machine learning research to create immersive, photorealistic content. Named to TIME's 100 Most Influential Companies of 2023, Metaphysic is focused on the ethical development of AI to support the genius of human performance. Metaphysic's cutting-edge proprietary technology, artistry, and content creation have quickly positioned the company as an in-demand partner for the biggest names in Hollywood, among others. Please welcome the co-founder and CEO of Metaphysic, Tom Graham, to share with us what will be the exciting future of AI and the evolving
01.01
ethics around it: Tom Graham! [Applause] Hello everyone, my name is Tom Graham. Thank you, Ty, for that fantastic introduction. It's great to be here in Naples; I've never been here before, but I do enjoy Florida every time I'm here. I'm going to talk today about AI-generated content and reality. Today we can create content with technology that looks exactly like the real world. I'm going to show you a bunch of videos, going through them as fast as I can to show you as much as I can, and then we're going to talk about the ethics involved and some of the solutions to the problems we're going to see develop. I think at the core of that problem is this: if you can create content that looks exactly like reality, that content is programmable. You're creating it; you
02.00
determine what's in it. If it looks exactly like reality, and it depicts someone doing something, and you can't tell the difference between analog video of them taken with a mobile phone and the AI version of them, then if you can control that version of what people can do, can you control reality? Does reality become programmable? Before we get into that: my name, obviously, is Tom Graham; sorry, I did that one already. I was a technology, internet, and society lawyer for some time, and then about 12 years ago I moved into creating AI and machine learning tech startups. A little bit of a strange change, but it now gives me, through Metaphysic, the ability to talk about a few different aspects. Our mission is to build the technology, infrastructure, and software that can deliver photorealistic AI-generated content at scale, but to do so in a safe and ethical way for individuals and society. At the core of that is empowering individuals to own and control their own
03.01
data, because as you'll see, it's that data that is required to deliver AI-generated content. You need data from the real world, pictures of our faces, recordings of our voices, to train the AI models and allow them to create content. Here I'm going to show you a video in just one second. This is from America's Got Talent: when we were on America's Got Talent, we brought Elvis back to life. What you're going to see is a face swap. We're using an AI model to generate a realistic version of Elvis on top of the performance of a 19-year-old Elvis impersonator who was live on stage, just as I am now. Up on stage at America's Got Talent, you could see his face as Elvis on the big screen. Let's have a little look at that video now. You ain't nothin' but a hound dog, cryin' all the
04.01
time; well, you ain't never caught a rabbit and you ain't no friend [Applause] of mine. You look like an angel, but I got wise: you're the devil in disguise. That was a lot of fun. We came fourth on America's Got Talent two seasons ago, which was a big departure from the things I normally do, and my mother was incredibly proud. You also saw Simon Cowell there; you saw the people in real life and then the AI version on top, and you can't really tell the difference in a context where it is just presented to you fresh. That's the quality of AI-
05.02
generated content that is coming out of models today. I want you to hold a couple of ideas in your mind as we go through more of these videos. One is that this photorealistic content, if you see it on a screen and it looks exactly like reality, doesn't look like dodgy CGI or a cartoon or an animation or a caricature; if it looks exactly like the real world, then I believe that in our minds it lives just like reality lives. You see it today and it looks like reality; then you sleep for a couple of nights, and in a month you think about it, and it's hard to distinguish whether it was real or not. You might know it wasn't real, but our minds process it like reality, and when we process things like reality, it unlocks emotional reactions which are much more powerful than looking at cartoons or movies with CGI and VFX. The next idea is that I believe
06.01
that 10 years from now, all of the content you look at online, on mobile phones, at the movies, everything on a screen, even live sports, is going to be the AI-generated version of that content. That gives the people who create the content tremendous power to change, program, and modify it, so a lot of the pixels we end up looking at may be subject to that ability to be programmed. I'm not sure, but maybe a few of you have the new Apple Vision Pro, the VR headset from Apple; you've probably seen versions of this from Facebook, etc. Apple has just started scanning people's faces and creating AI avatars of them. This goes to a point where, even if you're doing a FaceTime call with your grandkids, your friends, your parents, whomever, you may be wearing those goggles, but there is an AI
07.00
version of you that your counterparty is looking at. This is how they do the scanning: you turn the headset around, the cameras point at your face, and they scan and gather data from your face. You only need a couple of minutes of good-quality video data to create an AI model of yourself, and we're going to have a look at that in a second. Here's a video demonstrating how realistic and fantastic this kind of thing looks already, even running live, in real time, through a call. Let's play this video: "I don't know if people can see this, but this is incredible. The realism here is just incredible." "Where am I? Where are you, Mark? Where are we?" "You're in Austin, right?" "No, I mean in this place; we're shrouded by darkness." "With ultra-realistic faces, it just feels like we're in the same room." That was Mark Zuckerberg, and both of those faces are fully AI-generated versions of those people. Apple,
08.02
Facebook, Microsoft, everybody is building the technology to create photorealistic AI-generated content live, in real time. Here we're going to have a little look at voice, so if we play this video: "Well, I'm going to challenge you and your team to do a bit of a first here. I mean, there's video view right up on that screen; show us something surprising." "Oh my gosh. So there we go: this is a live, real-time model running on top of me, in real time." "And next you'll tell me that it can... oh Lord, can it do voice as well?" "We think it can. We're really pushing the limits of AI technology now, and although I'm talking as Thomas Graham, what's coming out is the one and only..."
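A live voice-conversion demo like the one in this clip typically works on small buffered chunks of microphone audio: each chunk is run through the conversion model and played back, so the end-to-end delay is roughly the chunk length plus the model's inference time. Here is a minimal sketch of that loop; the `convert` function and the timing constants are illustrative stand-ins, not Metaphysic's actual system.

```python
# Sketch of a streaming voice-conversion pipeline. The chunk and inference
# times below are assumptions chosen to illustrate the latency budget.

CHUNK_MS = 300        # audio buffered before each model call (assumed)
INFER_MS = 200        # model inference time per chunk (assumed)

def convert(chunk):
    """Stand-in for a neural voice-conversion model (identity here)."""
    return chunk

def stream(chunks):
    """Convert chunks in arrival order; return output and end-to-end delay."""
    out = [convert(c) for c in chunks]
    latency_ms = CHUNK_MS + INFER_MS   # output trails the speaker by this much
    return out, latency_ms

converted, latency = stream([b"frame1", b"frame2"])
```

With a 300 ms buffer and 200 ms of inference, the output trails the speaker by about half a second, which matches the delay described later in the talk.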
09.00
What I want you to realize about that voice you're hearing is that it's running live, in real time, through a computer. My voice was coming in, the AI model on the computer was changing it into Chris Anderson's voice, and it was coming out about half a second after I spoke. You heard it was a little bit lispy right there, a few things where it went a little bit wrong; that's a good indication of the limits of these models. That's because it's getting feedback from the auditorium, that sound is going through the microphone, and the model doesn't know what to do with it, but obviously that can be fixed. The models are that good at recreating voices, and also faces, today. Now we're going to go to a live demo. I have the laptop here running, and if we bring up that feed so you can all see me: the laptop has models of my face embedded in it, it is taking the video feed from this camera right here and running it through the laptop, and we're going to put AI models on top of
10.00
myself and an esteemed guest. Randy, I'd love for you to come to the stage. What's going to happen is we're going to do a face swap. Here's Randy; it hasn't happened yet. Here we go. Okay, if we turn it on... there's Randy as me. It's trying to interpolate between our different beards; there's not a lot of white beard in my data set, unfortunately, so it's struggling a little bit there. I also have the model running on my face, if you can see that, and Randy, you can put your hand in front of your face and kind of wipe it on and off. You see it's the same model, running on my face, and it's kind of perfect. You can see there's a little bit of a delay on those screens, but
11.02
you can't tell the difference between the AI-generated me and the real me through the screen. Randy, you can tell the difference, right, because you know it's not there. But you can see that it's doing its best to interpolate the difference in face shape, facial hair, and everything in between. This type of face swapping (and you can do bodies, you can do environments) is becoming quite effective. Thank you very much, Randy. I'll give you an indication of the actual feed coming out of here: everything inside this box is AI-generated. This is what we call the ice cube, and when I turn the ice cube off, it auto-composites the AI version into the background. Let's have a look at that quickly: turn it off, and then I can turn off the model entirely, and that's my real face. To be honest, I think I like the AI version better.
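The "ice cube" compositing step described here can be sketched in miniature: the model produces a swapped face for a cropped region, and a mask decides which pixels of the live frame get replaced. The toy frames below are tiny integer grids standing in for real images; a production pipeline would do this per pixel, with soft alpha blending, on the GPU.

```python
# Minimal sketch of mask-based compositing: blend the AI-generated face crop
# over the live camera frame wherever the mask is 1.

def composite(background, face, mask):
    """Replace background pixels with face pixels where mask is 1."""
    return [
        [f if m else b for b, f, m in zip(brow, frow, mrow)]
        for brow, frow, mrow in zip(background, face, mask)
    ]

bg   = [[10, 10], [10, 10]]    # live camera frame (stand-in values)
face = [[99, 99], [99, 99]]    # AI-generated face crop (stand-in values)
mask = [[1, 0], [0, 1]]        # 1 = take the generated pixel

frame = composite(bg, face, mask)
```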
12.00
I definitely get more beard; somehow it doubles my beard, so I have a darker, more manly beard instead of my ginger beard. Okay, that's the live demo. It's a great indication of how cost-effective it is to create this type of AI-generated content: it's running off a laptop with a gaming graphics card in it, at 100 frames a second. That means that for every single frame that comes through, it is writing the AI version on top and doing all of that rendering, and you can tell it's pretty cheap because it's running right here off a laptop. Cheap and fast real-time AI-generated content is certainly possible, and in a very short period from now, maybe a year or two, our mobile phones will be running these AI generative models and applications to create content on the fly, fully rendered on our devices. So that's scale. Here we're going to look at some of the latest things that are
13.00
photorealistic, from the company called OpenAI. This is their new Sora model. They've trained it on many, many images from the internet to create a gigantic AI generative model that can produce photorealistic video. You have to type in a prompt; this one is: "a movie trailer featuring the adventures of a 30-year-old spaceman wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors." Let's have a look at this video. This is all AI-generated; it looks, from my perspective, exactly like 4K video. The faces are really quite impressive, but what you don't see is the faces moving or talking, or anything like that. That is really one of the other limits here: even though we can create photorealistic content using these
14.01
big generative models, where you can type in anything and it pops out as a video, there is a frontier, which is human performance. That's one of the things we focus on at Metaphysic. I'm going to show you this next one, from the same model: "drone view of waves crashing against the rugged cliffs along Big Sur." Let's have a look at this video. To me, you couldn't tell that that was AI-generated. What's interesting is that when you think about old-school 3D rendering, VFX, etc., you think about physics engines and trying to do water simulations, or simulations of hair, and we put a huge amount of effort into coding the software to generate those simulations. With these models, you could see the water moving perfectly, the way water would; the model kind of understands the physics of water, yet there is no software programming in there that describes the physics of water. It's just looking at images over and over and over again and
15.01
understanding and inferring how things work based on looking at those images and understanding the relationships between them. The same goes for the models of my face that we just saw running live, in real time: they're trained off a data set of about 20 minutes of footage of me. Twenty minutes of footage at about 30 frames a second is a lot of independent images of me. The model just sees the images, trains for a month on a big GPU, and builds up an understanding of how my face works in different circumstances, how to put a version of my face on top of a smile or a frown. In the Sora case, it has understood all the different things around the world, water, cliffs, lighthouses, etc., and it has learned the physics of water, and that's really interesting. This is the cutting edge of building artificial general intelligence models that can understand how the world works just by looking at it. Now, we were talking about human performance: what are the frontiers there? Let's have a little
16.00
look at this video, and I'll describe what's happening. We have Kendall, a static image of Kendall, with an AI model on top, but we're able to go into that model and manipulate it: locate the different expressions that she has and slide them across, you know, add 50% more smile, or move her eyebrows up and down. That's what we call neural animation. This is one of the forefronts of fully AI-generated human performance: instead of the face swaps we saw, where we're using the live-action picture of my face to prompt the model, here we're using AI and human creativity to generate new human performance. So you can do things like this, which is about to pop up: anyone can sing in any language. Let's have a look at [Music] this.
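One way to picture the "add 50% more smile" slider just described: expressions live as directions in the model's latent space, so sliding an expression amounts to linear interpolation between latent codes. The vectors below are toy stand-ins for what would be high-dimensional learned embeddings; the names and numbers are invented for illustration.

```python
# Sketch of a neural-animation expression slider as latent interpolation.

def blend(neutral, smile, amount):
    """Move a latent code `amount` of the way from neutral toward smile."""
    return [n + amount * (s - n) for n, s in zip(neutral, smile)]

neutral_code = [0.0, 0.0, 0.0]   # latent code for a neutral face (assumed)
smile_code   = [1.0, 0.5, -0.2]  # latent code for a full smile (assumed)

half_smile = blend(neutral_code, smile_code, 0.5)  # "50% more smile"
```

A real system would decode `half_smile` back into pixels; here the point is only that the control is a continuous slider in embedding space, not a hand-animated rig.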
17.03
[Music] What you're seeing there is the woman's vocal performance and her facial performance carrying all the way through, but what we've done is take the AI embeddings, the model's understanding of how her face moves, and transfer them to Aloe Blacc's face. He wrote this song, this Avicii song. So now his face is moving as if he's speaking Spanish, but he can't speak Spanish, and he certainly can't sing in Spanish. On the vocal track, it's her voice all the way through, but we use a different AI model to change her voice into his voice. That AI model is trained on about an hour to an hour and a half of him singing and talking, and it's able to create something that sounds really exactly like Aloe Blacc singing in
18.01
Spanish. What I've tried to lay out there is the future of AI-generated content: we have visuals, we have scenery, we have voice, and it is all happening very quickly. I think within a couple of years we will have fully AI-generated movies; you'll be creating things on your smartphone, and you'll be able to change them, program that content, and create content like a filmmaker, just on your telephone. When you're looking at FaceTime through VR goggles, you'll see versions of people; they'll look realistic, but they'll be AI versions, and you'll be able to change your appearance, increase the quality of your beard, whatever you prefer, add some hair. This is what we call content generated through latent space: when you train an AI model, its learned understanding of the world is what we call the latent space, and the ability to generate content from it is exactly what's happening here. So the idea that latent-space-generated
19.01
content is everything is, I think, the future of all the pixels you look at on all screens. So we know that reality is going to become programmable, and as a society, what do we need to do? What are the solutions to the obvious problems of privacy, of giving data to big companies that can then use it to create versions of us, maybe without our consent, doing things that we don't want to do? If they can do that and it's photorealistic, then maybe it looks exactly like something people would believe you actually did. That's a problem. You should be able to consent to how your likeness, your face, your voice, is used. I think we need to empower individuals to own and control their data, because it's real-world data that is used to train these models. If you own that data as property (your face, all the images, all the video of your face, your voice), then if somebody uses it without your consent, they have used your property without your consent, and we have laws that cover that. But today none of us have any rights or any intellectual property
20.01
rights in how we look or how we sound. There are many avenues, through government and regulation, that are trying to address that. One thing I'm trying to do personally, as a matter of legal activism, is to enable people to own and control their likeness, the AI version of it, by getting registered copyright in it. I'm going to show you a video here, if we play it: Hi, my name is Tom Graham. Here's an example of a realistic AI version of myself. All of the AI models used to create both the visual and audio AI versions of myself have been generated for me, at my direction, by a company called Metaphysic. I'm the co-founder of Metaphysic, but I personally own this video, both the original version and the modified version that you're watching now, and I also own the input data that has been used to train the AI models that have been used to create the AI version of my image and my
21.00
voice. So what we've got there is me: I've taken a video and used the data from it, the images of my face, to train an AI model and generate an AI version of myself back on top of the video. Now I've got a video with an AI character in it, and it looks exactly like me. Then you apply to the US Copyright Office to register your copyright in that. Why would you do that? Because if you have registered copyright in IP, in this case an AI character of yourself, and someone creates an unauthorized deepfake of you without your consent and posts it onto a large social platform, or anywhere on the internet, then you can get a DMCA takedown to guarantee that it's removed from that platform, and they have to comply. If you have regular, unregistered copyright and you see something you don't like, you can ask for it to be taken down, but no one has to do anything. The fact that you have registered the copyright, and have intellectual property rights in that AI version of yourself, gives you the ability to compel social
22.01
platforms to take down that unauthorized deepfake. This is one of the front lines of defending against deepfakes and misinformation through law, which I am pursuing while other people try top-down legislation through Congress, at state and federal levels, and globally, including the EU. Maybe to leave us here: the magnitude of this problem is that all of the data we collect from the real world, our faces, our voices, our experiences, our perception of reality, encodes our history. It's how our kids are going to learn about what happened today in our world. Our values, our shared identity, all of that is rolled up in our history; it's captured and encoded in the data. So we should own and control that data as individuals, in the same way that we own and control our bodies, how we are portrayed, and what we see. We need to move into the digital AI content world with the same rights that we have today. Thank you very
23.06
much. "Thank you, Tom. Could I ask you a question before you go? I understand your point that there are copyright protections: if I register a copyright, my name, my image, my likeness, my voice, my face are mine, and I can be protected. But by the time I become aware of a deepfake, my career may be ruined, my marriage may be over, I may have done untold damage to other people. Then I have redress through the copyright laws, but by then it's fundamentally all over." Yes, this is one reason we should all be fundamentally concerned, and feel this kind of anxiety about what's happening, about the ability of bad actors to create a version of us. So let's think about what we can do. The content is created and posted somewhere. If you see it, the best thing you can do is have this registered copyright in place, and you can get a guaranteed 24-hour takedown of
24.01
that content, and you can grab it from all the places it appears. "But again, I may not see it within 24 hours, and by 24 hours it may be all over, and in that gap of misinformation, that's where the damage is done; that's where the harm is." It's going to be very difficult to use technology to detect deepfakes, because they're so realistic, so I don't think it's feasible for the big social platforms hosting the content to instantly tell, oh, that's definitely a deepfake, and refuse to post it at all. We're going to have to live in a world where you potentially can't believe what you see online, and you have to be skeptical; the first line is skepticism. I think we're all going to have to adapt to that, because the best thing we can do is police our own likeness and get it taken down within 24 hours. One other thing quickly before we move on: you mentioned earlier in your presentation that in a certain number of years, 80% of the content I consume online, including live sports, will be AI-generated. How are they going to generate live sports, and how will live sports,
25.02
then, the outcomes of live sports, not be manipulated? This is fascinating. Imagine a live sports game, an NBA game; if you think about the NBA, you've all got a vision of exactly what it looks like through the camera, and you're watching it happen live, in real time. You take all of the camera feeds from the real world and you run them through, say, the NBA's AI generative models that understand basketball, and the AI-generated output is exactly the same as what you see through the camera, but now all of the players are AI-generated, the crowds, everything. The reason that's going to happen, while looking exactly the same, is that the TV broadcasters can then program that content to change the advertising on all of the billboards, and on all of the players, for an audience of just one. If I'm streaming it on my computer, all of the players are going to have logos on their jerseys that advertise directly to me. "Why didn't I think it was going to be money-driven and advertising-driven? What was I thinking?" So, you know, we can watch
26.01
sport today; it's a great product, right? It works really well, but it could work better, and when you think about advertising driving things, it's kind of obvious that people are going to do that. "Fantastic, thank you so much. Appreciate it." [Applause] Thank you.