Painting the landscape of generative AI
A quick look at what's going on in the realm of generative AI.
Over the past year, generative AI has taken the Internet by storm. Millions of images have been created with AI systems such as DALL-E, Midjourney, and Stable Diffusion, and countless hours have been spent discussing what creative AI will mean for the future. Beyond images, new apps are popping up in copywriting, video editing, 3D modeling, and more.
While the avant-garde marches on, for most people the whole field is moving so fast that it is hard to keep up. So let's take a moment, step back, and take a quick look at what's going on in the generative AI market.
The generative AI landscape can be divided into the AI models layer and the AI applications layer. The former provides the underlying AI technology for solving a general category of problems, while the latter brings this technology to people by providing interfaces for interacting with the AI.
What is an AI model layer?
An AI can be thought of as an onion with many layers. Closest to the user is the AI model. Peel it back and you find machine learning; within that lies deep learning, a branch of machine learning that is built on artificial neural networks.
But that only describes the "architectural" point of view. We also need to understand the purpose of an AI model.
We can say that an AI model is a tool that helps machines find patterns in data so that they can draw conclusions or make forecasts without a human decision-maker.
To make a forecast or reach a conclusion, an AI needs large amounts of data in which to detect patterns. This pattern detection is done by machine learning algorithms.
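To make "finding patterns in data to make forecasts" a bit more concrete, here is a minimal sketch using about the simplest machine learning algorithm there is: ordinary least-squares linear regression. The data and the function names are made up for illustration; real generative models learn vastly more complex patterns from vastly more data, but the principle of learning from examples and then predicting is the same.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Learn the slope and intercept that best fit the data (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(model: Tuple[float, float], x: float) -> float:
    """Apply the learned pattern to an input the model has never seen."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data: hours of practice vs. test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

model = fit_line(hours, scores)
print(forecast(model, 6.0))
```

Here the "pattern" the algorithm detects is a slope of about 4.1 points per hour, so the forecast for six hours of practice comes out at roughly 72.3, with no human deciding that number.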
What is an AI application layer?
Developers can build many different applications on top of an AI model. Such an application lets humans interact with the AI: it offers tools to phrase the problem the AI needs to solve, provide additional input, preview and evaluate the output, and so on.
For example, the underlying Stable Diffusion model allows a developer to build an app that creates digital art from text input. Similarly, OpenAI's GPT-3 model makes it possible to build a tool that helps you generate catchy headlines for your blog posts.
So if someone asks which AI you used to create that image of an astronaut riding a donkey to the moon, they most likely want to know which AI application you used, and they will probably zone out once you start explaining the model part.
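The split between the two layers can be sketched in a few lines of code. This is a hypothetical illustration, not how any real product is built: `TextModel`, `FakeModel`, and `HeadlineApp` are invented names, and `FakeModel` returns canned text so the sketch runs offline instead of calling a real model such as GPT-3. The point is that the model layer only turns a prompt into text, while the application layer phrases the problem, adds the user's input, and prepares results for preview.

```python
from typing import List, Protocol


class TextModel(Protocol):
    """Model layer (hypothetical interface): turns a prompt into generated text."""

    def generate(self, prompt: str, n: int) -> List[str]: ...


class FakeModel:
    """Stand-in for a real text model; returns canned text so the sketch runs offline."""

    def generate(self, prompt: str, n: int) -> List[str]:
        return [f"Headline idea {i + 1}" for i in range(n)]


class HeadlineApp:
    """Application layer (hypothetical): wraps any TextModel in a task-specific workflow."""

    def __init__(self, model: TextModel) -> None:
        self.model = model

    def build_prompt(self, topic: str, tone: str) -> str:
        # Phrase the problem for the model and fold in the user's extra input (tone).
        return f"Write a catchy, {tone} blog post headline about: {topic}"

    def suggest(self, topic: str, tone: str = "playful", n: int = 3) -> List[str]:
        prompt = self.build_prompt(topic, tone)
        candidates = self.model.generate(prompt, n)
        # Light clean-up so the user can preview the candidates and pick one.
        return [c.strip() for c in candidates if c.strip()]


app = HeadlineApp(FakeModel())
for headline in app.suggest("the generative AI landscape"):
    print(headline)
```

Because `HeadlineApp` depends only on the `generate` interface, the same application could in principle sit on top of a different model, which is exactly why users tend to notice the app rather than the model behind it.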
From text to 3D modeling: generative AI domains
A lot is happening right now across multiple fields of generative AI. New models are emerging and improving as we speak, not to mention the new applications launched every day.
What follows is an admittedly near-futile attempt to give a basic overview of what the market landscape looks like at the moment.
Text domain
The most advanced AI field is text. Quality matters, but natural language is hard to master. The models can now write generic short- and medium-form content fairly well (though even so, their output is typically used for iteration or first drafts). We can anticipate higher-quality outputs, longer-form content, and better vertical-specific tuning as the models advance.
Code generation
GitHub Copilot has demonstrated that code generation is likely to have a significant impact on developer productivity in the near future. It is also likely to make creative coding more approachable for non-developers.
Images
AI-generated images have recently grown in popularity, particularly among social media users. People often prefer sharing images to sharing text, which has led to the emergence of numerous image models with various aesthetic styles. Users can edit and modify images with these models to produce original content.
Speech
Although speech recognition and text-to-speech AI models have been around for a while, consumer and business applications are only now becoming more advanced. The bar for human-quality speech is high in applications like video voice-overs and podcasts. But as with images, today's models offer either a starting point for further refinement or a finished product for practical applications.
Video and 3D
AI models for 3D and video are rapidly advancing. People are excited about their potential to open up major creative markets like film, gaming, virtual reality, architecture, and physical product design. Research organizations are currently releasing the first 3D and video models.
Other domains
There is fundamental model R&D happening in many fields, from audio and music to biology and chemistry.
The chart below by Sequoia Capital illustrates a possible timeline for how fundamental models might progress and which applications become possible along the way.
Source: https://www.sequoiacap.com/article/generative-ai-a-creative-new-world/
From AI domains to businesses
Each of these AI fields has the potential to give rise to new business ventures that could revolutionize the way we produce value. While there is still much to explore, we are already beginning to see commercial applications in a number of them.
Art and entertainment
A number of AI startups have built products on the DALL-E or Stable Diffusion models that let users produce imagery almost as easily as skilled graphic designers do.
After static art, the next steps are video entertainment content, gaming environments, virtual reality, and so on.
Audio, music
Soundraw, Aiva, and OpenAI's Jukebox show how you can use AI to create background music for your videos or spice up your Friday-night DJ sets. OpenAI has published a collection of Jukebox samples if you want to hear some examples.
Healthcare
Generative AI can help interpret X-rays or provide additional views. It can also help detect malignant growths by comparing scans with records of healthy images.
Product design
Visual AI models can help product designers speed up the process of turning early sketches into 3D models. Starting from basic concept art, an AI can quickly generate hundreds of potential options for your product.
Retail and commerce
Generative AI can also be very beneficial for the retail and commerce sectors. When interacting with goods, people express emotions and rate both the products they bought and the service the seller provided. User-generated texts, speech samples, and facial expressions offer insight into how customers feel about a product.
Other generative AI mechanisms can track the web activity of online users and analyze user data to determine how effective a marketing strategy or an advertisement was overall. Such information can also be used to identify different customer groups and create targeted promotional messages.
Software development
Generative AI can be employed in software development as well. Developers can automate manual coding by describing to the AI what they want to achieve, and the machine generates the requested code.
Another type of generative AI helps people with little coding expertise build apps and other solutions without knowing any programming language.
In both cases, new software products can be developed much faster, which can be a game-changer in today's fast-moving business world.
Writing and communication
Many startups have launched products built on OpenAI's GPT-3 models that help anyone write better blog posts, advertising copy, customer emails, web content, and more.
These are just a few areas where commercialization is under way and new businesses are popping up almost daily.
Summary
In broad strokes, we need to understand that generative AI development is ongoing in two main spheres: building and developing AI models—the engines of the AI world—and developing and commercializing the applications that allow users to interact with the generative AIs.
Although text is the most advanced domain of generative AI, other areas are quickly catching up. Recent improvements have been made in 3D modeling, audio synthesis, and image and video generation.
When we asked an AI itself (OpenAI's GPT-3 model) what it thought about the future of generative AI, it responded:
"There is a lot of potential for generative AI to disrupt traditional businesses and industries. For example, generative AI can be used to create new products, services, or processes. This could lead to new businesses and industries that are not yet even imagined.
Additionally, generative AI can be used to automate tasks that are currently done by humans. This could lead to a decrease in the need for human labor in many industries, which could disrupt traditional business models.
But there is a lot of potential for humans and AI to collaborate in the future. For example, humans can use AI to help them with tasks that are difficult or time-consuming. Additionally, AI can help humans make better decisions by providing them with more data and insights. Finally, AI can help humans to automate tasks so that they can focus on more important things."
So I guess this is what needs to be done now: we need to figure out what those "more important things" are that we, as humans, should focus on :)
Acknowledgments
The inspiration for this piece came from Sequoia Capital's article "Generative AI: A Creative New World".
This article was edited with the help of Quillbot, a super helpful co-writing AI that makes your blog posts grammatically correct and easy to read.