AI Strategy Dimensions & NLP Pipeline

Keep in mind the dimensions of AI Strategy, Live Bootcamp

Hi You!

In today’s edition, I will share some insights into the field, things I have been working on, AI resources, and the upcoming Live bootcamp.

Happy AI.

Today’s Content →

  • AI Leadership 👑 → Data and AI Strategy Dimensions

  • Concept 🧑‍💻 → NLP Data Cleaning Pipeline

  • Sponsor 🦾 → AI Tool Report

  • Community ⭐ → Like-Minded Humans

AI Leadership 👑

Data and AI Strategy Dimensions


I discussed the dimensions with a Data Professional from Europe.

There are 2 major dimensions of AI strategy →

  1. Data Engineering/Architecture

  2. AI Product Management with Business ROI

You need both to reduce your risk of building a failed AI product, whether you are integrating AI into an existing product or building a separate tool.

Data Engineering/Architecture

Data is all around us, yet you still crave insights.

I recommend one book for this.

Modern Data Architecture on AWS by Behram Irani


You will find amazing content to build your knowledge around how data platforms work.

The book is specific to AWS, but once you are comfortable with one cloud provider, you can easily switch to another such as GCP or Azure.

I recommend spending a few months on a single cloud provider of your choice. After that, you can switch if you want.

The second part is understanding the technology in terms of the business requirements of your product.

AI Product Management with Business ROI

You need to make sure the product you are building is aligned with the customer's needs.

Otherwise, your product will be the next entry in the long AI graveyard.

You need to understand the technology well enough to communicate easily with the engineering team as well as the stakeholders.

You have probably struggled to follow the AI and Data jargon used by the engineering team.

To solve this problem, I am starting a Live Bootcamp:

Neurons to GenerativeAI → Beginner Friendly, 64 Chapters, 5 Weeks, 2 hours each session.

PS → 1:1 AI solutions brainstorm: let’s develop your AI idea.

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.

Concept 🧑‍💻

NLP Data Cleaning Pipeline

As humans, when we read a text, we remember only the important words.

We keep the key information in mind, not every word.

Similarly, we want the model to process only the important words, not the ones that do not help with prediction.

The unhelpful words, such as “is”, “are”, and “the”, are called stop words.

Removing stop words is one part of text pre-processing.

Let me give you a pipeline that will help you understand the whole pre-processing of text, before feeding it into any language model.

Text Pre-Processing Pipeline

We will start with raw text data; at the end of the process, you will see what the data will look like.

You already know that LLMs like GPT were trained on internet data.

You cannot just feed that raw internet text directly to the model.

Why?

Because the model will consider “India” and “INDIA” as 2 different entities.

We say that such data is not normalized.

We want our text data to be normalized, meaning the same word always appears in the same form, so that the model can learn from it.

The following steps clean the text data and make it ready to feed into the model.

Raw Text Data

<SUBJECT LINE>Leaders details.<END><BODY>Heres are 2 content for you, 1st is technologist and 2nd is medicals@.

When you pull data from the internet, it can be in any form. The line above is just an example.

Encoding Removal

Leaders details. Heres are 2 content for you, 1st is technologist and 2nd is medicals@.

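This step strips the markup tags, such as <SUBJECT LINE> and <BODY>, and decodes the raw bytes into plain text.

Common encodings include UTF-8 (an encoding of Unicode) and ASCII.

An encoding maps each character to a number, which is then stored in binary for the computer to understand.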
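
Here is a minimal sketch of this step in Python, using only the standard library. The helper names decode_bytes and strip_markup are my own for illustration, and I am assuming the tags are simple angle-bracket markers like the ones in the example line.

```python
import re

# Matches any angle-bracket tag such as <SUBJECT LINE>, <END>, or <BODY>.
TAG_PATTERN = re.compile(r"<[^>]+>")

def decode_bytes(data: bytes) -> str:
    # Assumption: the source is UTF-8; undecodable bytes become U+FFFD.
    return data.decode("utf-8", errors="replace")

def strip_markup(raw: str) -> str:
    # Swap each tag for a space so neighbouring words do not fuse together.
    no_tags = TAG_PATTERN.sub(" ", raw)
    # Collapse the extra whitespace left behind.
    return " ".join(no_tags.split())

raw = ("<SUBJECT LINE>Leaders details.<END><BODY>Heres are 2 content for you, "
       "1st is technologist and 2nd is medicals@.")
cleaned = strip_markup(decode_bytes(raw.encode("utf-8")))
print(cleaned)
```

Replacing tags with a space instead of an empty string keeps “details.” and “Heres” apart once the tags between them are gone.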

Lower casing

leaders details. heres are 2 content for you, 1st is technologist and 2nd is medicals@.

All uppercase letters are changed to lowercase.
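
A quick sketch of why this matters, assuming a simple word-count view of the data. Before lowercasing, the variants of “India” are counted as separate entries. After lowercasing, they collapse into one.

```python
from collections import Counter

tokens = ["India", "INDIA", "india"]

# Before normalization: three distinct keys, one count each.
before = Counter(tokens)

# After lowercasing: a single key with all three counts.
after = Counter(token.lower() for token in tokens)

print(len(before), len(after))
print(after["india"])
```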

Digits to Words

leaders details. heres are two content for you, first is technologist and second is medicals@.
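
One way to sketch this step is a lookup table plus a regular expression. The tables below are illustrative and cover only small numbers; a real pipeline would need far wider coverage. The function name digits_to_words is my own.

```python
import re

# Illustrative lookup tables (assumption: only small numbers appear).
CARDINALS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
             "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third",
            "4th": "fourth", "5th": "fifth"}

# A run of digits with an optional ordinal suffix, as a whole token.
NUMBER_TOKEN = re.compile(r"\b\d+(?:st|nd|rd|th)?\b")

def digits_to_words(text: str) -> str:
    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Leave the token unchanged when it is not in either table.
        return ORDINALS.get(token) or CARDINALS.get(token) or token
    return NUMBER_TOKEN.sub(replace, text)

text = "leaders details. heres are 2 content for you, 1st is technologist and 2nd is medicals@."
converted = digits_to_words(text)
print(converted)
```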

Remove Special Characters - @!#$.%^&*,

leaders details heres are two content for you first is technologist and second is medicals
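
A minimal sketch of this step: keep letters, digits, and whitespace, and drop everything else. The function name is my own.

```python
import re

def remove_special_characters(text: str) -> str:
    # Delete every character that is not a letter, a digit, or whitespace.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse any double spaces created by the deletions.
    return " ".join(cleaned.split())

text = "leaders details. heres are two content for you, first is technologist and second is medicals@."
no_specials = remove_special_characters(text)
print(no_specials)
```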

Correction of Spelling

leaders detail here are two content for you first is technologist and second is medicals
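
Spelling correction usually relies on a large dictionary. As a rough sketch, the standard library's difflib can pick the closest known word. The tiny VOCABULARY list and the 0.8 similarity cutoff below are assumptions for illustration only.

```python
from difflib import get_close_matches

# Hypothetical mini-dictionary; a real one would hold many thousands of words.
VOCABULARY = ["here", "are", "two", "content", "first", "second", "technologist"]

def correct_word(word: str) -> str:
    if word in VOCABULARY:
        return word
    # Take the single closest known word, if it is similar enough.
    matches = get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word

def correct_text(text: str) -> str:
    return " ".join(correct_word(word) for word in text.split())

corrected = correct_text("heres are two content")
print(corrected)
```

Words with no close match are left untouched, so names and rare terms are not silently rewritten.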

Remove Stop Words

leaders detail two content first technologist second medicals
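
A minimal sketch of stop-word removal with a hand-written set. The STOP_WORDS set here is a small assumption for the example; libraries such as NLTK ship much longer lists.

```python
# Small illustrative stop-word set (assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "for", "you", "here"}

def remove_stop_words(text: str) -> str:
    # Keep only the words that are not in the stop-word set.
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

text = "leaders detail here are two content for you first is technologist and second is medicals"
filtered = remove_stop_words(text)
print(filtered)
```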

Stemming

leader detail two content first technologist second medical
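
Stemming chops word endings with simple rules. The sketch below is a deliberately naive suffix stripper of my own, not the Porter algorithm that libraries like NLTK provide, but it shows the idea.

```python
def naive_stem(word: str) -> str:
    # Do not touch words ending in "ss" (for example "class").
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_text(text: str) -> str:
    return " ".join(naive_stem(word) for word in text.split())

text = "leaders detail two content first technologist second medicals"
stemmed = stem_text(text)
print(stemmed)
```

Because the rules are blind to meaning, a stemmer can produce strings that are not real words. That is the trade-off for its speed.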

Lemmatization

leader detail two content first technologist second medical
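
Lemmatization maps each word to its dictionary form instead of chopping suffixes. Real lemmatizers use a full lexicon. The LEMMA_TABLE below is a tiny hypothetical stand-in to show the lookup idea.

```python
# Hypothetical lemma lookup (assumption: a real lexicon is far larger).
LEMMA_TABLE = {
    "leaders": "leader",
    "details": "detail",
    "medicals": "medical",
    "is": "be",
    "are": "be",
    "better": "good",
}

def lemmatize(text: str) -> str:
    # Words missing from the table are returned unchanged.
    return " ".join(LEMMA_TABLE.get(word, word) for word in text.split())

lemmas = lemmatize("leaders detail two content first technologist second medicals")
print(lemmas)
print(lemmatize("results are better"))
```

Unlike a suffix rule, the lookup can handle irregular forms such as “better” → “good”.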

Now the final text is ready to feed into a model.

Socials

Be part of 50,000+ like-minded AI professionals across the platform


Please reply to this email with your requirements or suggestions on what you want in future newsletter content.

PS: build your newsletter, → Here
