What to learn in AI as a Leader?

AI leadership, plus text feature engineering pipeline components: Text Normalization and Tokenization


In today’s edition, I will share what to learn in AI as a Leader, and what you should keep in mind while learning it.

I will also cover what Text Normalization and Tokenization are and how you can implement them in Python.

Let’s dive in.

Today’s Content →

  • AI Leadership 👑 → What to Learn in AI as a Leader?

  • Today’s Sponsor → Growth School

  • Concept 🧑‍💻 → NLP Data Cleaning Pipeline

AI Leadership 👑

What to Learn in AI as a Leader?

Himanshu Ramchandani

That’s a tricky question.

You will not find an exact list of topics to learn and improve on in this field.

Mostly, you will find these →

  • how the model works

  • where to use that model in the real world

  • using the model to solve a particular business problem

Everything you learn will revolve around these three things.

As a leader, you don’t need to learn Python, RAG strategies, transformers, etc.

But,

If you have a technical understanding, you will see a positive change in how you make decisions about this technology.

You already know the user end of the product; you only have to work on the engineering end.

It’s like building a hybrid skill of knowing the business end as well as the engineering end.

Here are some key pointers to focus on →

  • Data knowledge → quality, privacy, and the set of rules around it.

  • AI strategy implementation → aligning AI with your business goals.

  • Monitoring and change management → AI is not a superpower, so keep its limitations in mind.

  • Framing the right questions in team meetings → this is only possible if you have a basic knowledge of the technology.

Note → You should not ignore the technical part entirely, as it ultimately has an impact on making the right decisions.

Specific to Generative AI, I have a roadmap you can follow to make yourself bulletproof against the dumbness of AI.

Live Bootcamp

I am starting a Live Bootcamp; enrollment closes in 3 days. Lifetime access and assistance included.

PS → Neurons to GenerativeAI → Beginner friendly, 64 chapters, 5 weeks, 2-hour sessions.

FREE AI & ChatGPT Masterclass to automate 50% of your workflow

More than 300 million people use AI across the globe, but just the top 1% know the right tools for the right use cases.

Join this free masterclass on AI tools that will teach you the 25 most useful AI tools on the internet – for $0 (they have only 100 free seats!)

This masterclass will teach you how to:

  • Build business strategies & solve problems like a pro

  • Write content for emails, socials & more in minutes

  • Build AI assistants & custom bots in minutes

  • Research 10x faster, do more in less time & make your life easier

You’ll wish you knew about this FREE AI masterclass sooner 😉

Concept 🧑‍💻

Text Normalization and Tokenization

This is part of the feature engineering process in an NLP pipeline.

Text Normalization

Normalization means bringing all the words to the same standard form so that variants of the same word (such as 'Sentence', 'sentence', and 'sentence.') are treated identically.

All the words should pass through the NLP pre-processing pipeline we discussed earlier.

Whatever format the text arrives in, it will be normalized into a consistent form by that pipeline, as mentioned above.
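
Here is a minimal sketch of a normalization step, assuming lowercasing, punctuation stripping, and whitespace cleanup as the rules (your pipeline may add more, such as stemming or expanding contractions) →

import re

def normalize(text):
    # Lowercase so 'Sentence' and 'sentence' become the same token
    text = text.lower()
    # Strip everything except letters, digits, and whitespace
    text = re.sub(r'[^a-z0-9\s]', '', text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r'\s+', ' ', text).strip()

print(normalize('This is   the FIRST sentence!'))

Output →

this is the first sentence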

Tokenization

It means we are breaking down the whole corpus of text into smaller chunks, called tokens.

There are different ways to do that →

  • sentence tokenization

  • word tokenization

  • regular expression tokenization (if you want sub-words also)

You can create smaller chunks of sentences, individual words, and sub-words.

Following are the code snippets for text tokenization →

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

# Download the required NLTK resources (only needed once)
nltk.download('punkt')
nltk.download('stopwords')

text = 'this is a single sentence.'

tokens = word_tokenize(text)

print(tokens)

Output →

['this', 'is', 'a', 'single', 'sentence', '.']

# Keep only alphabetic tokens and lowercase them (this drops the '.')
no_punctuation = [word.lower() for word in tokens if word.isalpha()]
print(no_punctuation)

Output →

['this', 'is', 'a', 'single', 'sentence']

text = 'this is the first sentence. this is the second sentence. this is the document.'

# Split the document into sentences
print(sent_tokenize(text))

Output →

['this is the first sentence.', 'this is the second sentence.', 'this is the document.']

# Word-tokenize each sentence separately
print([word_tokenize(sentence) for sentence in sent_tokenize(text)])

Output →

[['this', 'is', 'the', 'first', 'sentence', '.'], ['this', 'is', 'the', 'second', 'sentence', '.'], ['this', 'is', 'the', 'document', '.']]

# NLTK ships a built-in list of common English stop words
stop_words = stopwords.words('english')

print(stop_words[:20])

Output →

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his']

text = 'this is the first sentence. this is the second sentence. this is the document.'

# Drop stop words, keeping only the informative tokens
tokens = [token for token in word_tokenize(text) if token not in stop_words]

print(tokens)

Output →

['first', 'sentence', '.', 'second', 'sentence', '.', 'document', '.']
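
The snippets above cover sentence tokenization, word tokenization, and stop-word removal. For the regular expression tokenization mentioned earlier, here is a minimal sketch using NLTK's RegexpTokenizer; the r'\w+' pattern, which keeps only runs of word characters, is just one possible choice →

from nltk.tokenize import RegexpTokenizer

# Tokenize on word characters only, dropping punctuation in one step
tokenizer = RegexpTokenizer(r'\w+')

print(tokenizer.tokenize('this is the first sentence.'))

Output →

['this', 'is', 'the', 'first', 'sentence']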

Socials

Be part of 50,000+ like-minded AI professionals across our platforms.

Please reply to this email with your requirements or suggestions for future newsletter content.

PS → Build your own newsletter → Here
