Disruptor​Digest.com

AssemblyAI: "ChatGPT" for 10 hours of Audio - Strategic Deep Dive #006

June 02, 2023 Dr. Mihaly Kertesz & Viktor Tabori Season 1 Episode 6

Prompt Protocol for Disruptors: disruptordigest.com
YouTube: youtu.be/kmnlqagutQo
Want to collaborate with us? artisan.marketing

Chat & play with episode on AssemblyAI here.

AssemblyAI is an API-first company providing an efficient and affordable audio-to-text service for developers. It offers an accurate, easy-to-use audio-to-text model and a ChatGPT-like interface that generates actionable insights from an audio file. Prompts can be customized to get the desired results, which makes it well suited to competitive analysis, sales scripts, and more. It integrates with CRM software like HubSpot, offering instant feedback that is useful in sales coaching.

Moreover, AssemblyAI helps aggregate voice communication in operations and customer support, and transcribes business meetings, negotiations, and educational lectures. For content producers, it spins various content types from the same source, and it also serves as a research tool, a customer support analysis tool, and a sales aid.

0:00:00 Why Deep Dive?
0:01:00 AssemblyAI: An API-first Company
0:02:21 APIs Simplified
0:02:47 Avoiding Early Optimization
0:03:45 Ready-to-use Tools: AssemblyAI & Midjourney
0:04:11 Beat the Time Bottleneck: The Fast Lane
0:05:24 No-Code Tools' Limits & System Rewriting Importance
0:07:10 Unboxing AssemblyAI's Audio-to-Text Model
0:08:29 Use Cases: Podcast Research & Transcription
0:09:41 Podcast Feedback & Focus Improvement
0:10:46 Engage Better: Learn from AssemblyAI's AI Coach
0:12:31 Insights Harvesting from Podcasts
0:14:29 Using Large Language Models & Prompt Engineering
0:16:07 YouTube & Podcast Analysis with AssemblyAI
0:18:12 Accelerating Podcast Interview Research
0:19:01 Sales Coaching & Compliance with AssemblyAI
0:21:27 CRM & AssemblyAI: A Feedback Fusion
0:22:03 Avoid the Zeigarnik Effect with Instant Feedback
0:25:14 Personalized Sales Scripts & Research with AI Models
0:27:30 Voice Aggregation in Operations & Customer Support
0:28:51 Improving Customer Support through Audio Analysis
0:31:00 Beyond Transcription: AssemblyAI Use Cases
0:32:15 Reducing Hiring Bias with Interview Analysis
0:33:51 Interview Summaries: Hiring Help from AssemblyAI
0:35:07 Business Opportunities atop AssemblyAI
0:36:56 Creating Lecture Summaries & Chat Interfaces
0:40:23 Elevating Udemy Courses: The AssemblyAI Way
0:41:59 AssemblyAI: A Research Tool with Testimonials
0:45:52 AssemblyAI vs OpenAI Whisper
0:48:11 A Positive Encounter with AssemblyAI's Support
0:49:46 AssemblyAI's Marketing: The Room for Growth
0:51:32 Using AI Models for Use Cases Recommendations
0:52:13 JasperAI's Onboarding Experience
0:53:03 The Power of Sequential Onboarding
0:54:02 AssemblyAI's Transcription Process & Fine-Tuning
0:56:19 AssemblyAI's Goal: Flawless Speech Recognition
0:58:11 Business Ideas: Audiobook Analysis & Video Feedback
0:59:56 Monetizing Audiobook Analysis & Content Repurposing
1:00:25 Improving Explainer Videos with AssemblyAI
1:01:07 Tool Integration with AssemblyAI for Developers
1:01:45 YouTube Analytics & Clip Generation
1:04:00 Summarizing Lectures & The Ultimate Cheating Tool
1:04:30 Rating & Summarizing Podcasts on Spotify
1:06:04 Healthcare & AssemblyAI: Transcribing Doctor-Patient Talk
1:08:04 Join the AssemblyAI Team: Open Positions
1:10:09 Non-Coders Opportunities at AssemblyAI
1:11:48 AI-powered Transcription: Grain.co & Fireflies.ai
1:13:10 Alloware.com, Vidyo.ai & RunwayML
1:15:19 AssemblyAI's Business Canvas

Transcript

Viktor:

Welcome to Disruptor Digest, the top disruption business show. We dig up the secret playbooks used by first movers, featuring the latest tools, technologies, and science, ensuring you won't fall behind or succumb to FOMO. To Singularity and beyond.

Mihaly:

Hi, Viktor. What are we gonna talk about today?

Viktor:

Hello. Yeah, we're gonna do a strategic analysis of AssemblyAI, which is basically an audio-to-text ChatGPT, quote unquote: you can upload any kind of audio, up to 10 hours, have it processed automatically, and ask questions about it. And why do we do that? Because there are too many tools on the market and it's almost impossible to even keep up with them. We just want to share with you all the tools we use, the good, the bad, and the ugly: how efficient they can be, how they can help you bring your business to the next level, how we actually use them, and what kind of businesses can be built on top of them. Can you give a quick recap of this company?

Mihaly:

Okay. So what is AssemblyAI? AssemblyAI is an API-first application, so it is built for developers. When we checked, they had already raised $63 million from big names like Accel and Insight Partners, and also from the ex-CEO of GitHub, Nat Friedman.

Viktor:

You mentioned that it's an API-first company. For those who are not familiar with this expression, take Stripe as an example. Maybe you've heard about Stripe: it's a payment processing company, and they grew by catering to developers first. They made it extremely easy to integrate payments into your applications, and that's how they grew. And AssemblyAI is kind of using the same playbook, but in the audio processing field. Is that right?

Mihaly:

Yeah, it's correct. And their founder and CEO said in those exact words that they want to be the Stripe of speech-to-text AI. And also, there are good APIs and bad APIs, and a good API is something like Stripe, which even I was able to integrate with an application. So it means that even if it's code, it's easy to handle. And there are a lot of tutorials on YouTube on how to do it. So even if it's for developers, it's for no-code and low-code developers as well. So when you hear API and you are not a coder, like me, you don't have to be afraid.

Viktor:

Yeah, don't be scared of it. That's kind of the takeaway: it shouldn't be scary, and to be honest, it's not scary. Even you, with a veterinary background, a marketing research background, and a marketing creative background, were able to integrate it. So you shouldn't be scared of APIs. But anyway, why are we talking about this? Because the number one mistake I see people make is that they optimize prematurely. What does that mean? They try to fine-tune, they try to see which model is the best, and they get themselves busy without first staying at a high level and actually trying things out. For example, OpenAI has this Foundry offering, where you basically have to make at least an $80,000 commitment per year and you get dedicated compute. So in OpenAI's case, if you pay a lot of money, you can basically fine-tune GPT-4, which is not available to the public. But it only makes sense if there's a clear use case: one use case at a huge scale. For example, you are Jasper AI or Copy.ai, companies offering a very narrow service: okay, I help you write better. That's what they offer. Then it obviously makes sense to optimize the model they use. But for everyday folks like you and me, and I guess most of the listeners, it doesn't make sense to optimize prematurely. So instead of going off-road and trying to find a better and faster track, you should stay in the fast lane and use companies like AssemblyAI and tools like Midjourney, which are usable out of the box. You don't have to get your hands dirty with the model itself and fine-tuning it. And why does that make sense? Because the bottleneck is actually not the cost.
If you think about it, a few hundred dollars spent on GPT-4 through the API, for example, is not really much. If you're a developer with a proper hourly rate, it's not really much, right? The actual bottleneck is time. So you should progress much faster and deal with the exact problem you're facing, instead of just tuning the model itself. 99% of the time, exploration and testing is the limiting factor. So always choose the fast lane: use tools that others have already optimized. Obviously, if you've found a good use case, then think about the underlying model and how it can be optimized. But I think this is the biggest mistake most people make.

Mihaly:

I heard the argument from the other side about this. Actually, this is from Petya OG, who is one of the most successful startuppers, and now investors, in Hungary. He said this to us about no-code, and I think this fast lane that you're talking about and no-code tools have some parallels. He said that maybe the first 80% is cheaper if you're building a product, but the last 20% can be impossible with this fast-lane approach and no-code tools. What do you think about this?

Viktor:

I don't agree with that. You will rebuild the whole system anyway, right? As soon as you find product-market fit, you have to rewrite it. I have a good friend who is building his company, and they have already rewritten it from the ground up like four times, and he's the smartest person in IT. He knows all the mental models, he knows the different architectures very deeply. He is really the smartest guy I know, and they still had to rewrite the whole code anyway.

Mihaly:

And they didn't even start from no-code, right?

Viktor:

Yeah, yeah, right. So they planned it properly, but you cannot plan for everything, because life unfolds as you explore. That's kind of why ChatGPT is useful: it can adapt to your needs. It works because it's not a predefined decision tree you have to stay in; it can actually adapt to new situations. And I think that at first, progressing faster means a lot more than optimizing. Obviously, the problem is that it's a siren song. It's very seductive: okay, let's explore new tools, let's explore new models, let's fine-tune them, let's check which one is even better for free, and those kinds of things. It's sexy for sure, but if you think about it, you won't have a business any faster at the end of the day. And we are here to talk about the business of AI. So if you're interested in making money, then being quicker and iterating faster makes more sense than just trying to optimize the underlying models. That's what AssemblyAI is: a kind of fast lane in audio AI. And as I said, it's basically ChatGPT for audio. What does that mean? You can upload up to 10 hours of audio and they transcribe it. They use a Conformer model, which was published in 2020 by Google Brain, and it uses the transformer model, the attention mechanism, and so on. You don't have to get into the details, but just so the listeners have a good anchor point: it's similar to what Whisper is doing, and Whisper was published by OpenAI. It's kind of a similar system, but AssemblyAI's audio-to-text model actually produces fewer errors, so it's more accurate, and the pricing is quite reasonable. What does reasonable mean?
If you want to transcribe and analyze an hour of audio, it's just a couple of bucks: one to two dollars. That's all it takes. So you can basically upload it and ask any kind of question. Let's get into how it is already useful. If you want to use it out of the box, even without much coding experience, they have a nice playground: you can upload audio and you can ask questions. So how can you use it today? How do we use it?

Mihaly:

And what is the reason that we talk about them? Because we use AssemblyAI for every single episode. We used it for this episode in the research phase, and we also use it in post-production when we write the show notes. What we do in preparation is look for similar episodes that talk about a similar topic. For example, in the previous episode we talked about Midjourney image generation, and I gathered five long podcasts about Midjourney. And it's a socio black privacy reasons and every kind of topic. What we did was just get the YouTube link and put it into the playground in AssemblyAI. It took about five minutes and it had the whole transcript. After that, there is an interface there as well, a ChatGPT-like chat interface, where you can ask about the transcript, so you don't have to read it all. And it's very long: currently it's longer than what ChatGPT can handle at most, so it's very hard to analyze with ChatGPT. But in AssemblyAI you can ask: what are the key takeaways of this podcast? What are the highlights? What are the most actionable steps? We gather the answers, take notes, and use them. We also ask how this podcast could be more viral and how it could be more engaging. And this is what we try to incorporate in our podcasts.

Viktor:

Can you give an example of how useful it is? What was the feedback that was either just reaffirming, or was completely new, or that you didn't have top of mind? So it makes sense after the fact, but we weren't actively thinking about it.

Mihaly:

Yeah. So for example, there's a recurring piece of feedback from AssemblyAI, and this is lack of focus in podcast episodes. I think this is sometimes true for us as well: we touch on a lot of topics. For example, with Midjourney, it was actual feedback from AssemblyAI on other podcasts' episodes to stay focused on something and go deep. And this is what we try to achieve. For example, we talked a lot about how to create a logo and how to get inspiration for a logo. So we try to make chapters and dig deep within the chapters.

Viktor:

Oh, okay. That makes a lot of sense.

Mihaly:

This is the actual interface of AssemblyAI, and this is our episode four, where we talked about marketing research. The question to AssemblyAI is to summarize this transcript like I'm 12 years old, and actually it's quite good. It starts with: Viktor and his friend (oh, I guess that's me) have worked together for a long time, having different companies. They want to show how they helped a small college called Magdia College (okay, it's McDaniel, so it's not 100%, but still) that was losing students. And here is the next paragraph, I just want to read this: they told the college to first talk to students instead of making a website. Viktor's friend gave students pizza to ask why they chose the school. Students said they like meeting people from other countries. Viktor, what do you think?

Viktor:

This is exactly what happened. That's exactly how I would summarize it for a 12-year-old. You did the research, and yeah, you offered pizza: that was the incentive to involve the students in the research itself.

Mihaly:

We don't usually use this one, but this is the default first thing that AssemblyAI offers you. The second one is what we actually ask about other podcasts in pre-production, in the preparation of our podcast: how can I make this episode more viral and more engaging? And it says: to make this episode more viral and engaging, focus more on specific stories and examples from the research. The transcript focuses on manual research methods and tools, but concrete examples of insights gained and how they shaped the cooking class concept would make it more actionable and interesting for listeners. I totally agree with it. What do you think, Viktor?

Viktor:

Yeah, that's, that's spot on.

Mihaly:

Yeah, we didn't mention enough concrete insights. Okay, one more paragraph: providing a full example of how one insight led to a key decision or marketing message would have demonstrated the value of the research techniques discussed. Yeah, okay. We were focusing too much on the marketing research and not on how it was implemented.

Viktor:

Yeah, how to make it more tangible, I guess. So it's not just: this is how we do it. We normally give examples when we present at conferences: okay, we ran a test on Facebook, for example for a classical music concert, and we realized through testing that having a blue background doesn't hurt, and having a red background, even though it contrasts with the blue Facebook color, doesn't help. What helps is having an extremely strong emotion. And that's what we implemented in the actual communication and creatives as well: we showed the band, the Piano Guys in this example, in pictures where they were really enjoying playing music.

Mihaly:

Okay. And the last one is: what are the most actionable key insights of this episode? Again, this is something that we use when we create the show notes. Oh, we got a very short answer: the most actionable key insights are to focus research on understanding customers' pains and goals, identify specific benefits to emphasize, and segment the market into different customer groups that may have different needs. The speakers could have achieved their objective more effectively by conducting in-depth interviews with potential customers early on to gain a deeper understanding of customer needs. Viktor, I don't like this answer too much, because it's not enough. What are the most actionable key insights of this episode? I need to refine this question a bit. Let's see, and we can see how fast it is as well.

Viktor:

I think it's very fast. This is something about these large language models: they're not perfectly deterministic. If you run the same prompt multiple times, you can get different results. But obviously you can play around with the prompt to make sure you get the answer you're looking for.
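
A toy sketch of why the same prompt can give different answers: language models pick each next word by sampling from a probability distribution, and a "temperature" setting controls how random that pick is. The vocabulary, the scores, and the function names below are made up for illustration; this is not how AssemblyAI or any specific model is configured.

```python
import math
import random

# Hypothetical next-word candidates and made-up model scores (logits).
VOCAB = ["interviews", "surveys", "reviews"]
LOGITS = [2.0, 1.5, 1.0]


def sample_index(logits, temperature, rng):
    """Pick one index; temperature 0 is treated as greedy (always the top score)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def run(temperature, seed, n=5):
    """Simulate asking the same 'prompt' n times with a given random seed."""
    rng = random.Random(seed)
    return [VOCAB[sample_index(LOGITS, temperature, rng)] for _ in range(n)]


if __name__ == "__main__":
    print("temperature 0:", run(0, seed=1))
    print("temperature 1:", run(1.0, seed=1))
```

With temperature 0 every run returns the top-scoring word, while at temperature 1 the lower-scoring words show up some of the time, which is the "different results for the same prompt" behaviour described above.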

Mihaly:

Yeah. Okay, so these are more useful. For example, the second one: use ChatGPT to identify customer pains, gains, and jobs. Really good. Do keyword research to find related questions and keywords. This is actually what we mentioned, and it's distilled down to one sentence. Analyze competitors' benefits and website headlines. Great. And just one more.

Viktor:

It's good. I mean, we can read on: read reviews to uncover additional benefits. That's what we do: we actually read reviews to understand a product or service. Use ChatGPT to summarize reviews and identify benefits. Talk to friends who own companies to learn from their experiences. Yeah, that's talking to other companies and friends, people who are easy to talk to. Create a basic landing page, sales script, or email to start testing. Yes, that's something we also mentioned: start lean, and then you can refine later on. Focus on iterating and testing as much as possible. Exactly. And gather data to double down on what is working. Yeah, if something is working, then turn it up to 11, right?

Mihaly:

After a little bit of editing, and maybe refining the prose a little bit, it'll be a good start for a LinkedIn post, for example, or a Twitter thread. What do you think?

Viktor:

Yeah, sure, sure. And also, the easiest, lowest-hanging fruit is feedback, as you said. After we do an episode, it's: okay, how could it be better? We get the feedback and we try to incorporate it. But this research use is also quite ingenious, because we can go ahead and see what others produce. You can do the same: whatever field you are operating in, just go on YouTube, find some videos, and analyze them. Basically, just feed them into AssemblyAI and ask questions. That's quite neat. And I guess it's not just for content creators; it's quite useful for enterprises as well. For example, AssemblyAI themselves can use the same thing: they can search YouTube for Whisper and Google's products, these transcription services, and see what people are talking about.

Mihaly:

Yeah. So for example, in the case of OpenAI, they are famous enough that a lot of YouTubers talk about their products, and one of them can be Whisper, right? So they don't have to listen to hours of YouTube. Some of the YouTubers are beginners, and it's very hard to extract the insights from these videos. But you can feed AssemblyAI like 100 hours of podcasts and talking-head YouTube videos, and you can just gather the insights about what they are talking about, right?

Viktor:

It's insane, not just on the cost side, but on the time-saving side as well. You don't need to employ someone, or several someones, to process everything in parallel and do your research, which takes weeks or months. In this case, it literally takes about an hour. You can go through hours of video, analyze them, summarize them, see the big points, see the missing points, and by summarizing them you have a good outline of what you want to cover. And it's also good for podcasters, because if you think about it, it takes at least 10 to 15 hours to research someone, right? To read everything they've produced, listen to everything they've produced.
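
A back-of-the-envelope version of the cost side, assuming the "one to two dollars per hour of audio" figure quoted earlier in the episode. The function name and default rates are illustrative only and are not AssemblyAI's price list.

```python
def transcription_cost_range(hours, low_per_hour=1.0, high_per_hour=2.0):
    """Return the (low, high) estimated cost in dollars for `hours` of audio.

    The per-hour rates default to the rough figures mentioned in the episode
    and should be replaced with current pricing before relying on the result.
    """
    return hours * low_per_hour, hours * high_per_hour


if __name__ == "__main__":
    low, high = transcription_cost_range(100)
    print(f"100 hours of audio: roughly ${low:.0f} to ${high:.0f}")
```

Under those assumptions, the 100 hours of podcasts and videos mentioned above would come to somewhere between one and two hundred dollars, before counting the listening time saved.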

Mihaly:

So when you invite a guest to interview on your podcast, you don't want to ask the same questions that everybody asks him or her; you want to get a new angle. And the actual process is to read a lot and listen to a lot, like 10 to 15 hours, right? But how can AssemblyAI help with this process?

Viktor:

Yeah. In this case, if you just make a quick search on YouTube, the results are already sorted by some relevance factor and those kinds of things. So if you search for someone's name, you can basically copy the top five YouTube videos, feed them into AssemblyAI, and ask these exact questions: okay, what are the main takeaways? How could these interviews be better? What are the missing points? If you just ask these questions and summarize the answers, literally in 15 minutes, you save days or weeks of work and you can create much better work. This is obviously great for content creators, small content creators like us, but it is also good for enterprises. Sales coaching and compliance just come to mind. That's something which is quite hard to do otherwise, right? If you want to keep a close eye on how your salespeople are following sales scripts, it's quite tough, because you have to listen. And what's crazy is that it can already be done. We're not talking about what is going to be possible in the future. What you can do now is integrate AssemblyAI so that after every single sales call, it transcribes the call. You provide the sales script, because you can provide context: this is the script which has to be followed. And you ask: okay, was it followed, and how could it be better? You don't have to do anything else. Just by giving a score after each call for how well the script was followed, just by measuring, everything gets improved. That's management 101: you have to measure first, and what gets measured gets improved, right? And you feed it back in a timely manner, as soon as someone finishes a call. You don't even have to incorporate any kind of incentives.
What you have to do is really just feed it back, right? So they see on a dashboard how they fare. How many calls they make, they obviously know that already. They know the closing rate. But there's a big gap between the two, right? They want a higher closing rate, but what they see now is basically how many calls they made and how many successful sales or lead generations were done. With this, it's fine-grained. It's not a binary thing, and it's not just a single number. It's a sliding scale of how well they do. And if they see it on a graph, obviously they're gonna improve. Most people are going to improve if they get timely feedback.
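
A minimal sketch of the feedback loop described above, under stated assumptions: the prompt wording, the 0 to 100 scale, and both function names are hypothetical, and the per-call scores are assumed to have already been obtained from whatever language model evaluates the transcript. The rolling average stands in for the "graph on a dashboard".

```python
from collections import deque


def build_compliance_prompt(sales_script):
    """Build a hypothetical question to ask about a call transcript,
    passing the expected sales script along as context."""
    return (
        "The following sales script was supposed to be followed on this call:\n"
        f"{sales_script}\n\n"
        "Rate from 0 to 100 how closely the call followed the script, "
        "then explain how the call could have been better."
    )


def rolling_average(scores, window=3):
    """Smooth per-call scores over the last `window` calls for a dashboard line."""
    recent = deque(maxlen=window)  # drops the oldest score automatically
    averages = []
    for score in scores:
        recent.append(score)
        averages.append(round(sum(recent) / len(recent), 1))
    return averages


if __name__ == "__main__":
    print(build_compliance_prompt("1. Greet. 2. Ask about current tools. 3. Offer a demo."))
    print(rolling_average([60, 70, 80, 90]))
```

Plotting the smoothed series per salesperson gives the sliding scale mentioned in the conversation, rather than only the binary closed or not closed outcome.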

Mihaly:

Currently, it's reality. You can do it now, right? There is an API from AssemblyAI directly, and there are also some third-party applications that integrate with CRM software like HubSpot and others. We'll talk about them. So this is reality today. This is not science fiction, what we're talking about.
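
For readers who want to see what "there is an API" can look like in practice, here is a hedged sketch using only the Python standard library. The endpoint paths, the `authorization` header, and the `audio_url`, `id`, `status`, and `text` fields reflect the public v2 REST API as best understood at the time of writing and should be checked against the current AssemblyAI documentation; the helper names (`build_transcript_request`, `transcribe`, `send`) and the example URL are made up for illustration.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"  # assumed base URL, verify in the docs


def build_transcript_request(api_key, audio_url):
    """Build (but do not send) the POST request that starts a transcription job."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def _send(request):
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcribe(api_key, audio_url, poll_seconds=5, send=_send):
    """Start a job, poll until it finishes, and return the transcript text."""
    job = send(build_transcript_request(api_key, audio_url))
    status_request = urllib.request.Request(
        f"{API_BASE}/transcript/{job['id']}",
        headers={"authorization": api_key},
    )
    while job["status"] not in ("completed", "error"):
        time.sleep(poll_seconds)
        job = send(status_request)
    if job["status"] == "error":
        raise RuntimeError(job.get("error", "transcription failed"))
    return job["text"]


if __name__ == "__main__":
    # Placeholder values: supply a real API key and a publicly reachable audio file.
    print(transcribe("YOUR_API_KEY", "https://example.com/sales-call.mp3"))
```

The `send` parameter exists only so the network call can be swapped out in tests; a CRM integration of the kind mentioned above would call something like `transcribe` after each recorded call and store the result alongside the contact.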

Viktor:

Yeah. Right. And I mean, nothing beats instant. That's the beauty of ChatGPT as well. I can ask stupid questions which I wouldn't ask otherwise, like what the difference is between different kinds of peppers. Like spices, because there are actually lots of peppers. I wouldn't have asked, but I could, so I did, and my mind was blown. I ordered like 15 different peppers and they are all different, like Sichuan pepper and tea pepper, and some of them have numbing effects, some have very citrusy flavors. A whole new world opened up in five minutes, just because ChatGPT can answer in an instant. That's for fun, right? It's not making more money for me, I just get more knowledgeable because I can ask this question, because I know it won't take half an hour of my time. In two minutes I will know everything I want to know about a new topic. But in a business setting, if timely, instant feedback is given to your employees, to your salespeople, it's invaluable. Right? And they actually want to do better, I guess. They want to learn more. They want to be more successful, because that's why they chose the field of sales. But now there's a tool to not just do a weekly, monthly, quarterly, I dunno, review where you listen to a few phone calls and go through them. For every single phone call, specifically, there's feedback. And that's insane. This is mind blowing. And what I also want to mention regarding this is that this is just the easiest way of integrating it. You don't have to do anything fancy. You basically just give feedback with the existing tool. But what you can also do, and it's not any more complicated or complex, is ask: okay, give me the top three things which went well and the top three things which could have been better. Right. And already, AssemblyAI
can coach and can give answers to these kinds of questions. Right. That's also good, because it's not just that you have a score of 78% and you did not follow these steps; you also get positive feedback about what went well, and you can get very specific feedback about what could have been better. And there's this, um, Zeigarnik effect. I'm not sure that you're familiar with that. Can you tell me what it is? The Zeigarnik effect basically describes why waiters can keep lots of orders in mind, but as soon as the orders are done, they completely forget them. So it's the big contrast: they seemingly have a very good working memory, so they store lots of orders in their head, and suddenly, as soon as that table is gone, the memory has vanished. That's the intriguing thing. The Zeigarnik effect basically says that open issues take up mind space. Right. That's what's happening with waiters as well: they have the open orders in mind, and each one takes up mind space until it's closed. So closure frees up mind space. It helps you. And LLMs, these large language models, are good because they get you instant closure, right? For example, if I'm reading a tweet about some economic thing which I don't understand, it's not bothering me, because I just ask ChatGPT to explain it in plain English, copy-paste the tweet, and it explains it to me. And it's the same here with salespeople as well. Alright, here was a sales call, let's get closure. You can get timely feedback, timely closure: as soon as the call is done, the evaluation can be done. Like, okay, what percentage you got, what were the top three best things, and the top three things which could be better. 
And that frees up mind space, and then you can move on instead of just reliving it in your head, like, okay, I'm not sure what could have been better. Right? It takes this whole uncertainty out of the picture. One more thing here, which just came to mind: we talked about research, and coaching, and feedback for content creators. But it's not just post hoc, not just after the fact. It can be used before making a call. And obviously it doesn't have to be AssemblyAI; it can be ChatGPT, any kind of large language model where you feed into the system: okay, this is the company I'm calling, this is the person I'm calling, this is the LinkedIn profile of the person I'm calling, this is the website of the person I'm calling. And then: give me the top three things I should focus on, considering this is our sales script. Right? So it's not just, okay, there's a sales script and I don't really follow it because it's the same words all the time, so it's not really personalized. You can personalize it and distill it to the top three things you should focus on, even with the history: what was the chain of communication with that exact person. A good friend of mine, they created a CRM system and they have 2,500 clients. What he did is just scrape all the websites of their companies and feed them into ChatGPT, and he got back the gist of it, so what they are working on, and then did some clustering on top of it. And now he has an understanding of the different needs which are actually solved by his software. So that's the kind of idea you can do today as well. It's not even complicated. It's quite trivial to do. But that's just food for thought: you can do it the same way as with content creation for research purposes. You can do the same thing for sales as well. Yeah, it's

Mihaly:

very interesting. I just started the research for a new client, and this sparked an idea for me: uh, they have 400 clients as well, so this is big enough to scrape their websites, understand the profiles, and cluster them. Of course, they have segmentation and things like this, but yeah, uh, it's a good idea, Viktor.

Viktor:

Thank you. I'm gonna send you an invoice. No worries. So, uh, let's move on to a business setting and what you could use, again, today, so you don't have to wait; we are still not speaking about the future. In operations and customer support, aggregating voice communications is a huge plus. Once again, nothing fancy, right? It's not like you need to have a complicated pipeline of different, uh, AI modules and those kinds of things. It's extremely easy. You have a shitload of voice data, you just feed it through AssemblyAI or, uh, a similar system. And then nothing else: basically just aggregate them, or make a summary of what the need is and what category it should be put into, and aggregate similar things together and batch process them. Like, um, variant issues for an e-commerce company: processing variant issues is completely different from processing payment issues, right? So if you have to process like 10 variant issues first, obviously it's much easier for whoever is doing it if they're already aggregated and categorized, right? And they just do the same task. That's, once again, management 101: it's easier and better and higher quality, and less mind space is taken up, if your job is batched into similar tasks. So that's what we can do today as well. And obviously on top of it, you can extract information and ask, okay, how could we do better? Right? Here is lots of feedback which we get; how can we improve? You just ask questions like: what is missing? How could we serve this customer better? And you get this answer and aggregate those answers. So you do this for the audio files, then get the result and summarize it with one tool or another, it doesn't really matter. Then you can obviously get very specific feedback about how you can improve. 
And an interesting thing is, okay, obviously this is something which you could do by just employing lots of human capital, right? Just throw a lot of people at the problem; they have to listen. This is just a workflow, right? You have to listen to the call, you have to answer these questions: okay, what is the problem? How could we do better? Then give the task to someone to summarize it. So this can be done already. And if you just sit down with the customer support team and have a good discussion with them, it's going to be there already, like what the top three issues are. But what's intriguing, what gets unlocked, and this is the first time it can be used at scale, is emerging problems. So what is a new problem? Right? What is still below the threshold of really being noticed, but is starting to bubble up? What are those issues? So it's tackling these issues before they grow into a huge and even bigger problem, which is a bigger challenge for us. So that's, uh, also what's possible regarding customer support already. We are not speaking about sci-fi and those kinds of things; we only talk about what's possible. Also, it's gonna be in the show notes: there's a TL;DR, uh, too long; didn't read summary example on OpenAI. It has an exact example prompt of, okay, this is the input text, and what is the summary of it, so you can just copy this. And that's reducing all the audio to TL;DRs, then getting the TL;DRs and reducing them to the biggest takeaways. So that's the workflow, if it makes sense.
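Editor's sketch: the batching and "emerging problems" ideas from this turn can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not anything the speakers or AssemblyAI provide: it assumes each call has already been transcribed and labelled with a category by some language model step (not shown), and the function names, thresholds, and sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def batch_by_category(calls):
    """Group (category, summary) pairs so similar issues are handled together."""
    batches = defaultdict(list)
    for category, summary in calls:
        batches[category].append(summary)
    return dict(batches)


def emerging_categories(previous, current, min_count=2, min_growth=2.0):
    """Flag categories whose call count grew at least `min_growth` times
    between two periods. Both thresholds are illustrative assumptions."""
    before = Counter(previous)
    now = Counter(current)
    flagged = []
    for category, count in now.items():
        baseline = max(before[category], 1)  # unseen categories count as 1
        if count >= min_count and count / baseline >= min_growth:
            flagged.append(category)
    return sorted(flagged)


# Hypothetical labelled calls for an e-commerce support team.
calls = [
    ("variant", "Wrong size shipped"),
    ("payment", "Card declined at checkout"),
    ("variant", "Color option missing"),
]
batches = batch_by_category(calls)

# Hypothetical category labels per call for two consecutive weeks.
last_week = ["payment", "payment", "variant", "payment"]
this_week = ["payment", "payment", "variant", "login", "login", "login"]
emerging = emerging_categories(last_week, this_week)
```

With this sample data, the two variant issues end up in one batch, and "login" is the only category flagged as emerging, because it went from zero calls to three while the established categories stayed flat or shrank.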

Mihaly:

I would like to mention two other use cases here, which are currently available and which a lot of companies are already using. The first one is meeting notes, and the second one is hiring. So about meeting notes: there are several applications that will transcribe your, uh, company meetings and also pull out the key highlights. Currently, if people are taking notes, it takes a lot of time and money from your coworkers. And also, uh, it makes it very easy to not forget things, for example by highlighting the to-dos in a meeting note. And you can integrate it in, in your

Viktor:

Humans make mistakes as well. We love to make fun of AI models making mistakes, like hallucinations, coming up with something which is not true and those kinds of things, but humans actually make mistakes as well. The quality of the output we produce is not a hundred percent. Depending on the task, it can be much lower, like 60%. And definitely, if someone is having a bad day or they're doing the same task for a long time, they get fatigued. This task fatigue kicks in. So that's the beauty of these models: they don't get fatigued. Yeah, obviously they can make mistakes, but they constantly improve, and once they improve they stay at that level, and they don't have a bad day after that.

Mihaly:

There's a famous story, I think about the New York Philharmonic, that their admission was 15% women, and they changed something, and the next year, uh, they had almost 50% admission of women. And Viktor, can you guess what did

Viktor:

they change? I guess they just, uh, masked the gender of the applicant. So they, the, the hiring managers didn't see it.

Mihaly:

Yeah. Actually, these are very experienced musicians who are listening to the applicants, and what they changed is this: in the previous year they were able to see them, how they perform, and they also saw their gender. Next year, it was like, they

Viktor:

just hear it. Yeah. Yeah. So they reduced it to audio. It wasn't video anymore. Because that's the output you're gonna produce, right? So whatever role you are getting hired for, you should be judged based on the output which is required, which most of the time is not tied to your gender or race or whatever, right? In this case, because you're gonna be a musician, you're gonna produce music. So you should be judged based on the music you produce. And this is genius, right? So I love it.

Mihaly:

In the case of big companies like Google, they have about five rounds of interviews for new hires, and in this case AssemblyAI can summarize these interviews and give them a rating, and you can compare it with the ratings of the people, who are of course biased. I'm not sure, but maybe AssemblyAI is less biased than people. But it is also developed by people, and built on content created by people. Still, I guess there is an opportunity to make less biased decisions in hiring.

Viktor:

Yeah. And just to make it clear, we are not talking about automating away the decision making, right? We're talking about augmenting it and making it less biased, or, uh, just the same as with the sales script, right? Following the sales script, following the hiring script better. So these are the objective criteria which we look at, so please judge by them. And obviously it has to be re-evaluated how well it performs, right? But as an addition, it can be huge in getting through more candidates, judging them faster, and also giving them personalized feedback. Like, okay, this time it wasn't a good fit because of this, this, and this; or, we'll keep you in the loop because you were a very good fit but you just didn't make the cut, so in the future maybe we would like to hire you. So communication-wise and personalization-wise, and basically as a supporting tool aiding the decision, it can be invaluable, and it can even make the decision much better and much more objective, in my mind. But obviously you have to be mindful about the downside. These are not perfect. You shouldn't treat them as perfect. But just testing them makes a lot of sense if hiring is an issue for you.

Mihaly:

Okay, so we talked about, uh, compliance, quality assurance, hiring, meeting notes, and CS as well. This

Viktor:

is nice. Let's move on to the next topic, which is giving ideas to entrepreneurs out there who want to build on top of assembly or similar tools, so they want to create businesses because now we have just covered how businesses can use it or the existing businesses, but obviously there is opportunity unlocked and let's look at that. What kind of opportunities are there? Okay, so

Mihaly:

first transcription for niches. What

Viktor:

is it, Viktor? Transcription is like, yeah, okay, AssemblyAI is already providing the same solution, so how can we build a business on top of it? But that's the beauty of niches. Niching down to, uh, a blue ocean is always a valid strategy. So you just focus on, for example, business meetings, or negotiations, or interviews for UX researchers, or lectures for educational institutions. So MOOCs, massive open online courses. If you think about these niches, they have specific needs. Even though the main engine is the same, like in a car, right? The engine can be the same, but the car itself may be good for off-roading, or for Formula One, or for putting out flames as a fire truck, or something like that. So even though the main engine is the same, these different niches need different things. And if you understand them, and if you have a good understanding of their pains, you can be a better fit. So it's not just about, uh, transcription. For example, if you are an educational institution, it's not just transcribing: you would also like to create notes for the different lectures, and study cards for the students, and tests, and mock tests.

Mihaly:

So the thing here is that you have study material, which can be a lecture in school, and usually one semester is about 10 to 20 hours of lectures in a given, uh, subject. And you can chat with this content: you can ask what the key highlights are. You can ask specific questions: how can I solve this? What is mentioned about this topic? Right?

Viktor:

Yeah. So it's kinda like two ways. The first is, as the content producer yourself, you can spin out different kinds of content from the same content, right? So it's like, I'm giving a lecture in English, but if I transcribe it, I can internationalize it, right? I can internationalize the text, and I can hire someone or use text-to-speech solutions to produce the same material in different languages, right? Or I can just create more value for the listeners or for my students by creating summaries automatically, right? And I can do it en masse, because I have lots of data, lots of audio. It would take an enormous effort on my side, but now it's easy: if I just want to make a summary, or a study card and those kinds of things, I can provide it. I can give bigger and better value to my customers, right? And also what's unlocked is that I can create a chat interface. So the students, if they're stuck, can ask in their own language, at the specific point where they're stuck, and they can get timely, instant feedback on it, right? So whatever the sticking point is, it can be resolved instantly. So there are two ways: either as a service for their customers, or just producing more content. And for content producers, it's easy. For us, for example, specifically, if we have an episode, we have to create, uh, a transcript of course, but also, uh, show notes, LinkedIn posts, Twitter posts, email threads, written PDF protocols. So different kinds of formats can be generated from the same source. And it's scalable, right? It's not like you have to hire someone to do it. If you have a process, it's scalable.

Mihaly:

Viktor, I think we should include the assembly summary and chat interface for this episode in the show notes, right? So our listeners can ask about, uh, the content of this episode.

Viktor:

Yeah, sure, sure. Absolutely. We can do that. It's, uh, not even hard. So you're gonna do that. There's gonna be a link in the show notes and you can click on it, AssemblyAI is going to pop up, and you can ask any kind of question about our episode. So that's quite neat. Yes. Also, not just translating, but video can be generated too, right? So text to video with, like, D-ID, or text to audio with, like, ElevenLabs and those kinds of things. So you can generate more audio from the text which you generated from the original audio, so you can actually put out a huge new batch of content, and that's quite useful. I think that wasn't really possible before at scale, or at least wasn't really affordable. Okay, so just

Mihaly:

one example here that we, we talked about that Udemy is the biggest online course platform currently with, uh, about 130,000 courses. They have the biggest traffic on the internet. It's bigger than Coursera and, and Skillshare

Viktor:

and other ones. How many people are visiting them in a month? About 60 million. Oh, wow.

Mihaly:

Okay. And we checked, and 97% of these, uh, 130,000 courses are below 15 hours. So currently what they could do, maybe just start with the most popular courses, but they can make a, a transcription from the whole course, and, and students can ask about its content. So it can be an interactive experience, uh, without

Viktor:

any extensive investment. Yeah, yeah. Right. So that's neat. It's not like you have to do machine learning operations and find new models and test models and those kinds of things. You just, kind of like, plug it in and do it. And even at a smaller scale: you don't have to do it for, uh, 130,000 courses, but as you said, maybe just get the top thousand and split them into A/B groups. Do this chat integration for the B group and leave the A group alone. So for 500 nothing changes, and for the other 500 it's integrated, and let's see. You just put it out and you can actually measure the difference in maybe engagement, or maybe, uh, purchasing, or maybe, uh, satisfaction. So actually, uh, Udemy can do this test and evaluate whether the investment makes sense or not, because if it makes sense, they can, once again, just roll it out to other courses, or maybe iterate quicker, uh, before rolling out. So they can do tests quite quickly instead of trying to figure out how this AI model works. All right, let's go to the next business, which is, uh, a research tool for content creators: radio, TV, podcasters. So automating the finding and analysis of content. We already shared that we use it; we already use it for every single episode for researching.
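Editor's sketch: the A/B test described in this turn (take the top thousand courses, give 500 the chat integration, leave 500 alone, compare a metric) could look roughly like this in Python. It is a minimal illustration, not Udemy's or AssemblyAI's method; the course IDs, the engagement numbers, and the function names are hypothetical, and a real test would also need a significance check.

```python
import random
from statistics import mean


def split_ab(course_ids, seed=0):
    """Randomly split courses into a control group (A) and a treatment group (B)."""
    shuffled = list(course_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def relative_lift(control_values, treatment_values):
    """Relative change of the treatment mean over the control mean."""
    control_mean = mean(control_values)
    return (mean(treatment_values) - control_mean) / control_mean


# Hypothetical: the top 1,000 courses, identified by made-up IDs.
top_courses = [f"course-{i}" for i in range(1000)]
group_a, group_b = split_ab(top_courses)  # A stays unchanged, B gets the chat

# Hypothetical engagement rates measured after the test period.
lift = relative_lift([0.10, 0.12, 0.11], [0.13, 0.12, 0.14])
```

Random assignment matters here: if the chat were added only to hand-picked courses, any difference in engagement could come from the selection rather than from the chat.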

Mihaly:

And also, as far as I know, a lot of media companies are using it to find the topics in their extensive amount of media, videos, yes, audios, and they use it to decide how they should promote it on social

Viktor:

media? Yeah. Right. So, um, we already use it, but this can be a tool as well. If someone out there wants to make it into a tool and make it easy for anyone without any coding experience, this could be a tool. And, um, last time we talked about using testimonials instead of personas, because personas can be hard to relate to, and a testimonial can convey the gist quicker and better. Right. And I just coined it as a testimodel. So using the testimodel for this research tool, let me give you, uh, a short testimonial, you know. So: as an overburdened astrophysics podcaster, the content research automation tool has been my lifeline. It has miraculously transformed 15-hour research slogs into mere minutes, sourcing and analyzing obscure yet crucial content with pinpoint accuracy. It's not just a tool, but an invaluable team member, keeping my podcast fresh and my life balanced. For content creators drowning in research, I wholeheartedly recommend this game changer. So this is kinda like a short testimonial, which I just made using my testimodel tool, to bring it alive. So if creating this content research automation tool is resonating with you, please do it, because I guess the market would be extremely happy about that. Alright, let's do two more of these and then we can move to the next step. So there's also customer support analysis. We talked about that. Okay. We use it, right? Yes, we use it, uh, to analyze feedback, and it can already be used and implemented, but this could be an easy-to-use tool in any kind of niche. So once again, using the testimodel: as a customer support manager in a bustling tech firm, the [unclear] analysis tool has revolutionized our service. It's like having a vigilant analyst on board, proficiently deciphering customer sentiment from hundreds of support calls daily. Not only has it improved our understanding of customer pain points, but it has also tremendously streamlined our workflow. 
It is a priceless asset for any customer-centric organization seeking to optimize their service and delivery. So once again, it gives you a better understanding of what this specific tool could give someone. And let's do one more and then we can move on. So it's a sales coach as a service. Once again, upload calls and get specific recommendations to improve. So the testimodel for this: as the head of sales at a fast-growing startup, the sales coach service has been instrumental to our success. By simply uploading our sales calls, we get clear, actionable feedback to refine our strategies. The precision and relevance of the recommendations have led to notable improvements in our team's performance. I wholeheartedly recommend this service to any sales team looking to step up their game. So once again, I guess this brings the whole tool to life much better than a persona could. And if it's resonating with you and you want to create a business around this, I can wholeheartedly recommend it because, uh, yes, this is painful, and this is actually useful and practical in any field where, yes, salespeople are employed. Okay, let's move to the community part. If you want to build a business on top, let's see how supportive they are, and how AssemblyAI could be better. So let's do some quick analysis of the good and the ugly parts, or how they could be better. Let's start with a good one. Yes. I guess most of the listeners, uh, who are familiar with Whisper may ask, okay, OpenAI has a similar service: you just upload an audio file and you get a transcript. What's different? First, uh, it's more accurate, so AssemblyAI makes fewer errors. And also, Whisper through the API has a file size limit of 20 megabytes. So what does it mean in practice? 
If you want to use this out of the box, and you don't want to run the model yourself but want to use it through an API, then you first have to chunk the audio file, right? You have to chunk it up and make sure that each chunk is at most 20 megabytes. Then you have to feed it in. Sounds very painful to me. You have to feed it in, and you get back a transcript, obviously with timestamps, but then you have to offset each timestamp, because you are making 20-minute chunks. Right. It takes a lot of steps, at least more steps than you would guess. Right. And in comparison, with AssemblyAI, yeah, you can feed up to 10 hours to the LeMUR, uh, model, and I guess it's easier and quicker to use. And it sounds like it's priced similarly. Yeah, so for our
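Editor's sketch: the timestamp-offset step mentioned in this turn is easy to get wrong, so here is a minimal Python illustration. It assumes the audio has already been cut into fixed-length chunks and each chunk has been transcribed separately by some service (not shown); the `Segment` type, the function name, and the sample data are hypothetical and do not correspond to any particular API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, relative to the start of its own chunk
    end: float
    text: str


def merge_chunk_transcripts(chunks, chunk_seconds):
    """Shift each chunk's timestamps onto the timeline of the full recording.

    `chunks` is a list of segment lists in playback order. Every chunk except
    possibly the last is assumed to be exactly `chunk_seconds` long.
    """
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds  # where this chunk starts in the full file
        for seg in segments:
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged


# Two hypothetical 20-minute chunks (1,200 seconds each).
chunk_a = [Segment(0.0, 4.5, "Welcome to the show."), Segment(4.5, 9.0, "Today: audio.")]
chunk_b = [Segment(2.0, 6.0, "Back to the transcript.")]
full = merge_chunk_transcripts([chunk_a, chunk_b], chunk_seconds=1200)
```

The segment that started 2 seconds into the second chunk now starts at 1,202 seconds in the merged transcript. Cutting at fixed times can also split a word in half at a chunk boundary, which is one more reason a single long-file upload is simpler.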

Mihaly:

videos, it's about two gigabytes per hour. So 10 hours of video can be 20 gigabytes, and that's compared to 20 megabytes. I mean, it's 1000 times more,

Viktor:

right? In our case, because it's fed from YouTube, I'm not sure whether they use the two-gigabyte, uh, file size or file resolution. I'm not sure about that. Uh, but yes, it's bigger. Also, they just

Mihaly:

use the audio, so it's not a perfect

Viktor:

comparison. We are talking about hundreds of megabytes. It's like 150 for a video like ours, if you only look at the audio. So it still takes like seven to eight chunks, and it's kinda painful, uh, in OpenAI's case. So yeah. And also the customer support: you have some experience with them, so I guess you can share what your experience was so far. I was using it
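Editor's sketch: the arithmetic in this exchange can be checked in a few lines of Python. The figures are the ones quoted by the speakers (a 20-megabyte limit, about 150 megabytes of audio per episode, about 2 gigabytes of video per hour); decimal units (1 GB = 1,000 MB) are an assumption, and the function name is hypothetical.

```python
import math


def chunks_needed(file_mb, limit_mb):
    """Smallest number of pieces so that no piece exceeds the size limit."""
    return math.ceil(file_mb / limit_mb)


# About 150 MB of audio against a 20 MB limit: 150 / 20 = 7.5, rounded up.
audio_chunks = chunks_needed(150, 20)

# 10 hours of video at about 2 GB per hour, compared with the 20 MB limit.
video_mb = 10 * 2 * 1000  # assuming 1 GB = 1,000 MB
video_ratio = video_mb / 20
```

This gives 8 chunks for the audio-only case, in line with the "seven to eight" estimate, and a ratio of 1,000 for the full-video comparison.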

Mihaly:

for episode two or three, so it was a few weeks ago. And the interface got stuck at one point in the process of generating the transcript. I was very frustrated because I wanted to get the whole editing of this episode done fast, and I just wrote to the online chat on their website, and in two minutes they replied: oh, I see you have a problem, I will check it with the same video link you are using. They checked it in one minute and said: oh, this is a big problem here, so I will escalate it to the developers. And in 10 minutes they wrote: okay, we fixed the problem, now your transcript is ready and you can do it with other videos. So it was, like, mind blowing, 10 minutes,

Viktor:

Viktor. That shows that AI as a service, or AI model as a service, has an upside: you don't have to spend days and lots of, uh, engineering hours to fix issues if they come up, because their main and only job is to provide a model and make it reliable. Right. That's extremely good. Also, their social media presence is quite neat. And even the developer education is quite good as well. I remember I came across them at the end of last year, and they made a blog post about how to use Whisper at that time. But they also make lots of posts about different technological, uh, aspects of what they do. And on LinkedIn they have 21,000 followers. They are constantly posting videos, case studies, and use cases. So their marketing communication is quite neat, I guess.

Mihaly:

Yes, they're focusing on YouTube, and focusing on LinkedIn. But for example, they have only 30 likes on Facebook and not too much content. So I think, while they have 60 million in funding, they are very focused on where the target audience is. They're very focused on developer education, and also on their, uh, hiring page they are hiring marketing people, and almost all of the roles are in developer education. Yeah. So they, they know what their way of

Viktor:

doing things. Yeah. Right. And, uh, I mean, as a developer I can attest that I learn a lot on YouTube. If I want to learn something, the first thing I do is go on YouTube and see who did, uh, a recap or overview of the topic I'm interested in. So that's quite neat. But there are a few aspects where they could do better, and let's go through them. So for example, in OpenAI's case, if you go to platform.openai.com, which is for developers, on the first page you see the use cases, and with just one click you can get to the examples: you get a big list of examples and use cases and you can instantly try them. And that's quite neat. So that's something they could copy: to have a huge list of how what we just discussed works in practice, in each field. Like, just click here: this is a sales call example, and this is the exact question, how you can improve it, and those kinds of things. So making it extremely simple to understand the use case, and with just one click you get the result, and you can actually change the prompt a little bit and mess around. So it's reducing the cognitive load of trying to come up with something useful. Just to stay on this train of thought, uh, actually, a quick chat or quiz: understanding the use case of the person and recommending use cases based on it. If you have the library of examples, it's quite trivial to ask a few questions and recommend: okay, these are the top three examples we can recommend for you. So it's not just, here are the examples, you can search them, you can go through them one by one, but also just, I have three questions, really: okay, what's the field you are in? What are you dealing with? What is the biggest hurdle for you regarding audio? And those kinds of things. 
So just ask a few questions, and actually these large language models are insanely good at picking the top three examples which are relevant for you. So that could be done as well. Then Jasper AI, which has the best onboarding experience ever. If you are a marketer out there, if you're an entrepreneur out there, just go to Jasper AI and check out how they onboard you, and be really mindful, because what they do is push you towards the education journey. If you watch educational videos, you get points, and you can use those points as credits. So you can actually spend them on their service. Right? And it's ingenious, because they push you from not being really familiar with their offering to having watched all their videos: you know exactly how it can be done, and you actually earn money to spend in the meantime. So they gamify the onboarding

Mihaly:

experience, right? Yes, sure. Which is very rare. I work with a software service company, and with onboarding it's very hard to find the right amount of pushing people in one direction, because the thing is that you want to show just one kind of function and you want your users to experience value and be happy. Then comes the next, uh, function, and they experience value again. And if you do it too much, they will, like, burn out from cognitive overload and go away. But this gamification thing is something I don't see with a lot

Viktor:

of companies, right? Yes. It's not common. Doing it well is not common, and Jasper is doing it, like, world class. I honestly don't know anyone else doing it better than them, so, uh, you can absolutely learn a lot from them. As you said, it has to be sequential; it cannot overwhelm the users. You have to be quite mindful about, okay, what's the next step. That's actually what happens in physical education as well. In sports and fitness, if you have a client who has never worked out in their life, uh, before, then you have to offer them something which is easy and at their level, right? And this progressive nature of giving harder and harder exercises is the gist, the bread and butter, of a good coach, right? They pinpoint exactly where you are currently in the big picture, and then they can help you make the next step, basically. And that's not trivial, but you can get lots of inspiration from the Jasper guys. Okay. One

Mihaly:

more thing. When we started to prep, uh, for this episode, I asked you why they do audio only, because they have a longer context window than ChatGPT, which means you can feed in longer text, and currently you can only feed audio to AssemblyAI. And I was listening to a few podcasts with the founder of, uh, AssemblyAI, and I understand they do three things. First they do the transcription part: they get the audio and it generates a text. But this text is usually not very, uh, human readable. So the next step is to make it more readable. What does that mean? Make sentences. Capitalize the sentences. Also make paragraphs. Make chapters. So this is the

Viktor:

second part. This is for those who are listening normally. Back in the day, these first transcription services basically transcribed words. So what you got back was a sea of words, without punctuation, without proper paragraphs or sentences, and so on. Just a sea of words. And what they do is, yeah, obviously first they transcribe, but then they kind of, quote unquote, edit the transcription to make it a proper text. Yes. And the,

Mihaly:

and the third part is the fine-tuning. They can fine-tune it to make a summary, or for quality assurance of sales calls, or in customer support they can fine-tune it to create, uh, an SMS based on the customer support call and make it available in the CRM system. In this case, the employee only has to click one button to send the SMS to the user, based on the content of the call. So there are three parts: transcribing the text, making it more readable, and then the fine-tuning. And this is what AssemblyAI is doing at scale: they try to fine-tune for a lot of use cases. So they try to find a lot of product-market fits, not just one, but they say the ultimate product-market fit is 100% speech recognition without errors. It's very similar to autonomous driving. There is a use case for having an autonomous car on a university campus, and there is a use case for using it on a highway, so it can go on a highway and turn, but after that you have to steer manually. Yeah. But the ultimate use case for autonomous cars is to handle every kind of environment: handle the city, handle heavy traffic, light traffic, the highway. So this is what they are doing. They're focusing on the big use cases.

Viktor:

Yeah, that makes a lot of sense. And also, they have their own models, so it's like the engine; they made their own engine, basically. They trained it on 65 terabytes of data or something like that.

Mihaly:

Yeah. And it's very interesting: you mentioned in the previous episode that ChatGPT was trained on less than one terabyte of data, since it's text only, but now it's audio, so it's an order of magnitude bigger. It's 60 terabytes, about 650,000 hours of audio, and now we can say that

Viktor:

they're very good. The error rate is lower than Whisper's, and Whisper is already mind-blowing. So obviously, since they're building everything on that engine, for now it is audio only, but in the future it's going to make a lot of sense to focus on the problems they solve. Their audio engine is one thing, but they could also open it up to other kinds of inputs as well. Let's discuss some possible business ideas, right?
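
As a rough sanity check on the figures quoted above (about 65 terabytes and about 650,000 hours of audio), the implied data rate per hour of audio can be worked out in a few lines. This is only arithmetic on the numbers as spoken; it assumes decimal terabytes and that the whole dataset is audio, neither of which the episode confirms, and the function name is hypothetical.

```python
def audio_data_rate(total_terabytes: float, total_hours: float) -> dict:
    """Return the average size per hour and bitrate implied by a dataset size."""
    total_bytes = total_terabytes * 1e12        # assumption: 1 TB = 10**12 bytes
    bytes_per_hour = total_bytes / total_hours
    kbit_per_second = bytes_per_hour * 8 / 3600 / 1000
    return {
        "mb_per_hour": bytes_per_hour / 1e6,
        "kbit_per_second": kbit_per_second,
    }


rate = audio_data_rate(total_terabytes=65, total_hours=650_000)
print(f"{rate['mb_per_hour']:.0f} MB per hour, "
      f"{rate['kbit_per_second']:.0f} kbit/s on average")
```

With these inputs the result is roughly 100 MB per hour of audio, or a little over 220 kbit/s, which is in the range of ordinary compressed audio, so the two quoted numbers are at least consistent with each other.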

Mihaly:

Great. Okay, first one: audiobook analysis. What do

Viktor:

you think about this sector? Sell it to me. So what's the problem here? I think for

Mihaly:

marketing. Think about a writer who writes a book and has to generate a lot of social media content: for example, social media posts, summaries, blog posts. I think AssemblyAI makes it very easy. First, 95% of books are less than 10 hours long if somebody reads them aloud, so they fit within AssemblyAI's current limits. It can chunk the book up into different kinds of content, and a 10-hour book can give writers enough content for about half a year of posting on every kind of social media channel.
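
Whether a given book fits the 10-hour limit mentioned above can be estimated from its word count. The sketch below is a back-of-the-envelope check, not an AssemblyAI feature: the function names are hypothetical, and the narration pace of 150 words per minute is an assumed typical value that varies by narrator.

```python
def narration_hours(word_count: int, words_per_minute: float = 150.0) -> float:
    """Estimate audiobook length in hours from a manuscript word count.

    Assumption: a steady narration pace of `words_per_minute`.
    """
    return word_count / words_per_minute / 60


def fits_limit(word_count: int, limit_hours: float = 10.0,
               words_per_minute: float = 150.0) -> bool:
    """True if the estimated narration fits within `limit_hours`."""
    return narration_hours(word_count, words_per_minute) <= limit_hours


for words in (80_000, 120_000):
    hours = narration_hours(words)
    print(f"{words} words: about {hours:.1f} h, fits: {fits_limit(words)}")
```

Under these assumptions an 80,000-word book comes to just under 9 hours and fits in one file, while a 120,000-word book would need to be split.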

Viktor:

It's basically content repurposing and promotional materials for writers who are writing books. Yeah, you need a landing page, right? What should the landing page be like? What should the heading be, the subheading, what structure should it have? And also emails, right? An email sequence if someone is signing up: what value could be given in the email sequence? What could be a Twitter post? What could be LinkedIn posts, and those kinds of things. So it's extracting and repurposing a book. It makes a lot of sense. Currently, to use AssemblyAI you have to make audio from it first, so you generate the audio and then you can get all this information out of it. But if they find a product-market fit here, it makes a lot of sense to just upload the raw data of the book itself, which is already written, because it's easier to work from. I like this idea. I'm curious, though, how it could be priced. The median book writer doesn't earn a lot of money, so they don't have a huge budget. So it's a question how well this could be monetized. But it's definitely saving a lot of time. And it doesn't have to be sold to writers directly. It can be sold to editors, to publishing houses. Yeah. It can be sold to the aggregators, basically.

Mihaly:

Next one is video explainer feedback. In one of our previous companies, we created a two-minute explainer video, and it took at least one month, or maybe six weeks. The process was very slow, and that was just the creation. Now we can write a script with the help of ChatGPT and get feedback on it. We can also generate the audio with our own voice from that script. So we have the audio of the explainer video, and now we can get feedback from AssemblyAI: how can we make it more viral? How can we make it more engaging? Where should we put hooks in the video? We can ask these coaching questions and produce an explainer video, I guess, in two weeks. And this could be way better quality

Viktor:

also, it's about integrating into other tools as well. You mentioned lots of no-code and low-code tools, and you used them in the past. So I guess they could grow by integrating into tools like Bubble, ggl, Adalo, and chat interfaces like a ChatGPT plugin, or databases like database or Supabase. So just aim at the developers, or low-code developers, who are using different tools to speed up their workflows, so it would be easier for developers to actually use and build on top of AssemblyAI. Next one is

Mihaly:

for YouTube creators. YouTube creators are a huge community, Viktor. There are, I think, more than a million, millions of YouTube creators. There are excellent tools like TubeBuddy for A/B testing thumbnail images of YouTube videos, so there are a lot of companies focusing on these creators. I think a company could move into this space with the help of AssemblyAI as well: for example, instantly analyzing a video and asking how to be like Mr. Beast, such as having a hook at the beginning and keeping the attention at a world-class level like Mr. Beast. These are the questions you can actually ask AssemblyAI.

Viktor:

Yeah, that's quite neat. Mr. Beast is so inspirational for most creators, and he is quite generous about sharing all his secrets. He's willing to share what makes a good video, but it's not practical in the sense that, okay, he shares that the first 30 seconds should have a lot of cuts, at least 15 cuts in the first 30 seconds, to engage and pull in the viewer. But that's not necessarily relevant or instantly applicable for you. With a large language model, with the AssemblyAI model, these kinds of things can be tailor-made for your own content. Whether you have a cooking channel, a coaching channel, or you're just testing gadgets, no matter what you have, it can be tailor-made for you. So a tool could be extremely useful, I guess, where you say: okay, here are my videos, how do I make them better? Just coach me to be more like Mr. Beast. But it can be others as well, so it can be aspirational toward other kinds of creators too.

Mihaly:

Okay. It can also generate short clips. This is one of our challenges as well. We want to cut up this long podcast into very short TikTok-style videos and also longer YouTube clips, like five minutes, and it's very hard to find the interesting part, the one that would go viral. And this is, again, a question you can ask AssemblyAI.

Viktor:

Yeah. And creating show notes is quite trivial as well, and we already use it for that, so that's quite neat. What other use cases are

Mihaly:

there? With AssemblyAI, you can feed all the lectures from a semester into their interface and ask questions about all the lectures. And think about it: it can be a great cheating tool as well. During the exam, you can ask AssemblyAI the actual exam questions about the lecture content, and it'll just give you the answer.

Viktor:

Yeah, it's the holy grail of cheating at university. But if we want to frame it positively and ethically, then it's a good coaching tool for sure. If you want to get ready for an exam, then obviously you feed all the lectures in and ask all the questions. And most of the time they have example exams, right? These mock exams. Taking the questions from the mock exams, you can generate similar questions with ChatGPT or even AssemblyAI. Then you can try to answer those questions and compare with what AssemblyAI tells you back. And if you get stuck, you just type manually and ask follow-up questions. It's insane, because it's timely, it's well tailored, it's fine-tuned for your specific need, and it can already be done. It's already there. Nothing else is needed.

Mihaly:

Okay. And the last business idea that I would like to mention is about Spotify. In my Spotify, in the so-called Your Episodes list, I have about 300 episodes. At least 60% of them are boring and I don't want to hear them, but I usually listen to podcasts when I'm training, when I'm running or working out, and it's very hard to change tracks. And it's like, oh no, again a bad episode. But think about it: AssemblyAI could listen to all these 300 episodes. It could give me a summary of them, or it could just rate them based on my preferences and give me a list by subject, by topic: which podcast is the best on my Your Episodes list?

Viktor:

Yes. It's kind of a personalized recommendation tool within Spotify. Just have a personal assistant: you give it a list of 300 podcasts and say, okay, listen to them and give me the five I would enjoy the most. And it can be done now. And one last idea: healthcare. When I was studying on site at Stanford, doing the Innovative Healthcare Leader program, and I'm an engineer, so I don't normally have a healthcare background, what was stunning for me is that for doctors it's a lot of pain to sit down and type up whatever they discussed with the patient. What they use is assistants, right? But that's not always scalable, and not everyone has assistants. But now it's possible: just speak with the patient, and everything is transcribed. You can even give it orders, so it can be treated like a digital assistant: okay, make sure that you make a note about this, or put it in the important section. And this can be done now as well. Once again, nothing has to be invented for this to work. It's already there. Every building block is there, and I think it's insanely valuable. What they do now, or at least back when I was at Stanford, is outsource it to India, for example. On the other side there was someone sitting who was actually a medical doctor, acting as a virtual assistant for doctors. But that's not scalable either. So I think this can unlock huge value and can also increase empathy toward the patient, because the doctor can be 100% present, can look you in the eye, can talk to you, and only needs to say: okay, this is important, let me make a note. And just like that, in two seconds, in five seconds, it's done, and you can keep on speaking. It has some implications for privacy and those kinds of things. 
Of course. I mean, besides that, technically this is already possible.

Mihaly:

Okay. Can we touch on our last part? Recruitment.

Viktor:

This is the part of these strategic deep dives where we look at the recruitment page of the company and try to understand what they are aiming for now and what their direction is. And for the listeners: if you prefer companies like these, which are well funded, and you think their mission resonates with you, then maybe we can give you an idea of what kind of open positions they have, so maybe you can jump on board. And obviously, since this whole field is booming, the stock option pool can be quite valuable. So it makes a lot of sense on a timeframe of a few years: working for a company like this is extremely low risk as an employee, and the reward is outsized. So you don't have to be an entrepreneur to make huge wealth. With

Mihaly:

low risk. Okay. So we checked their positions. They say that in their engineering team there are small engineering sub-teams: one researcher, and two research engineers for every researcher. What they are doing in these small teams is looking for a possible solution, what they can do, and fine-tuning their model for it. So there are a few open positions for engineers. In the marketing part, most of the open positions are for developer education. We already touched on this topic: they have a lot of content on YouTube and also push it to LinkedIn. And the third one is product development. For example, one of their open positions in product development is growth product manager, and what that role works on is the API onboarding experience. So this is the same as at the software-as-a-service companies you mentioned, just for AI: they are focusing on the onboarding process because it will improve retention and decrease churn in the long term. So onboarding is very important, but they're doing this for the API, the API experience for developers.

Viktor:

This is a good point for those who like to develop but don't like to code that much. Even just educating developers and working on these kinds of things makes a lot of sense. You don't have to be a hardcore machine learning engineer to find a position here, because I have a hunch from the open positions that they're quite well positioned on the deep technical side, and what they need help with is actually scaling it, finding ways to teach it, finding product-market fits, and helping others use their tools. So it's not that much on the deep, deep technical side, but I guess if you are deep into this technology, like audio-to-text, you could find a position for sure. Another point: the CS engineer role is kind of similar. If I had to choose one position that I personally like from the open positions, it's CS engineer, because if you love solving problems, you get an inside look at how to solve real problems for lots of clients. And that's kind of a cheat sheet, in the sense that you go there, you understand their technology and how to use it, you apply it to lots of clients and lots of use cases, and you can see not just the use cases but how people build and what they build. If you spend a few years there, you can get a good understanding, and maybe you can do your own shtick based on that, once you see where the needs are. So I think that position is quite sexy for me as an engineer, but I'm curious what jumps out for you. The growth product

Mihaly:

manager, which I mentioned, the API onboarding

Viktor:

experience manager. Yeah. Okay. The reason we have this segment is that it's low risk, high return: big funding, good product. If you listen to My First Million, it's kind of like a sellers list. So that's kind of a shout-out; it's basically about working for startup companies with good funding, where the stock option pool can be quite reasonable even in a timeframe of a few years. So just to wrap this up, one final question. We talked about lots of ideas: how they can improve; if you're an entrepreneur, how you can use them; if you want to be an employee, how you can join them. These topics were covered well, but what kind of companies are already using them? This is kind of missing, so let's wrap up with this segment. Okay. So

Mihaly:

I would like to quickly mention a few companies. I think the most accessible is called grain.co, and they are making AI-powered meetings: they make the transcription, the highlights, and also key takeaways from your meetings. It can be for remote teams, customer success, high-end recruiting. Very similar to what we already mentioned is fireflies.ai, also meeting notes, for bigger companies. I think allo all.com is interesting, because they manage the whole phone system and all the phone processes and transcribe your calls, so you can make the calls through their system and they put the transcription of the phone calls directly into your CRM system, like HubSpot, Salesforce, and all the big CRM systems. And for us, I just started to experiment with one called video.ai. This is a service that makes short clips, like TikTok-style clips, from long videos, and I think it's very useful for everybody who makes content in longer form. And the last one is runwayml.com. It will be, I think, a very huge company in a few months or a few years, because they are doing text-to-video generation, and I think this is the next big milestone in generative AI. It's the so-called generation two model, which is available to a few people but not to the general public, but they say it will be available soon. So what can you do? You write a prompt, as in ChatGPT or Midjourney, but you get a whole video. And think about it, Viktor: what new dimensions will it open up?

Viktor:

Last time we talked about how even Midjourney 5x-ed the click-through rate on Facebook ads for you. And those were static images, right? So imagine if you could quickly explore different kinds of video formats for advertisements. There is a reason why Facebook already integrated text-to-image generation within their ads interface, and why Runway ML, who were actually behind the Stable Diffusion models as well, are also deeply integrated into this ecosystem. They are creating web-based video which is enhanced by AI. That's their shtick, and what they do is mind-blowing, because you can just remove people, remove objects, you can do basically magical things, so we can cover them as well in the future. And we are using lots of AI technology for this podcast as well. If you're listening to this episode and you are still with us at the end, I'm really grateful for that. Also, please give us feedback: we would like to know whether you are interested in further episodes where we dig deep into the strategy of these companies, analyzing how they do what they do, what their strengths and weaknesses are, and how you can apply them as an entrepreneur or developer, or as an employee if you're interested in working for these companies. If this is something valuable, please give us feedback, and we are going to make more episodes like this because, as I said, we are using lots of AI tools and they're amazing. Thank you for your attention. Thank you, Viktor. Yeah, that's a wrap. Bye bye. Okay, just quickly, in two minutes, let's look at the lean canvas, or business model canvas, of AssemblyAI. First, let's look at what problem it's solving. First, there is an overwhelming amount of audio, so you cannot process it. And also you don't know how it could be better, whether in sales or podcasting or whatever. 
And there are also existing alternatives like Whisper and Otter, and ChatGPT could also be an alternative, helping to coach you, but it's not audio. The solution is a transcription service, chat with the transcription, and also sentiment analysis. The key metrics are audio processed per month and also recurring API calls per month. So it's not just having lots of people do audio processing one time, but hopefully people are returning, so you can help them more. The unique value proposition is the 10-hour audio processing, which is unlike any other solution out there. The unfair advantage is the Conformer-1 audio-to-text model, so they can have their own, even better than Whisper, which is currently one of the best models out there. They're using channels like LinkedIn and their blog. They could also use more channels like YouTube or low-code tools, so integrating with low-code tools. Customer segments are small content creators like us, but also sales teams, customer support, or publishing houses. Early adopters are AI tool creators, so those entrepreneurs and developers who wish to create and sell their own tool. The cost structure is basically employees, mostly on the fixed side, and they have the variable cost of GPUs, so basically running their own models. These are the two big costs. The main source of income is the metered API, but they could also expand this in the future, with consultation or creating a marketplace, for example. So connecting consultants with companies needing help could be something they expand into, and that's how they could grow as well. But this is kind of a high-level overview of their lean or business model canvas.