What to learn in AI as a Leader?

AI leadership, plus the text feature engineering pipeline components: Text Normalization and Tokenization

In today’s edition, I will share what to learn in AI as a Leader, and what you should keep in mind while learning about it.

I also cover what Text Normalization and Tokenization are, and how you can implement them in Python.

Let’s dive in.

Today’s Content →

  • AI Leadership 👑 → What to learn in AI as a Leader?

  • Today’s Sponsor → Growth School

  • Concept 🧑‍💻 → NLP Data Cleaning Pipeline

AI Leadership 👑

What to Learn in AI as a Leader?

Himanshu Ramchandani

That’s a tricky question.

You will not find an exact list of topics to learn to improve in the field.

Mostly, you will find these →

  • how the model works

  • where to use that model in the real world

  • using the model to solve a particular business problem

Everything you learn will revolve around these three things.

As a leader, you don’t need to learn Python, RAG strategies, transformers, etc.

But,

If you have the technical understanding, it will change how you make decisions about this technology (for the better).

You already know the user end of the product; you only have to work on the engineering end.

It’s like building a hybrid skill of knowing the business end as well as the engineering end.

Here are some key pointers to focus on →

  • Data knowledge: quality, privacy, and the governance rules around it.

  • Implementation of an AI strategy that aligns with your business goals.

  • Monitoring and change management; AI is not a superpower, so keep its limitations in mind.

  • Framing the right questions in team meetings; this is only possible with a basic knowledge of the technology.

Note → Do not ignore the technical side entirely, as it ultimately impacts your ability to make the right decisions.

Specific to Generative AI, I have a roadmap you can follow to make yourself bulletproof against the dumbness of AI.

Live Bootcamp

I am starting a Live Bootcamp; enrollment closes in 3 days. It comes with lifetime access and assistance.

PS → Neurons to GenerativeAI → beginner friendly, 64 chapters, 5 weeks, 2-hour sessions.

FREE AI & ChatGPT Masterclass to automate 50% of your workflow

More than 300 million people use AI across the globe, but just the top 1% know the right tools for the right use cases.

Join this free masterclass on AI tools that will teach you the 25 most useful AI tools on the internet – for $0 (they have only 100 free seats!)

This masterclass will teach you how to:

  • Build business strategies & solve problems like a pro

  • Write content for emails, socials & more in minutes

  • Build AI assistants & custom bots in minutes

  • Research 10x faster, do more in less time & make your life easier

You’ll wish you knew about this FREE AI masterclass sooner 😉

Concept 🧑‍💻

Text Normalization and Tokenization

This is part of the feature engineering process in an NLP pipeline.

Text Normalization

Normalization means bringing all the words to the same scale, for example, lowercasing so that 'Apple' and 'apple' are treated as the same token.

All the words should pass through the NLP pre-processing pipeline we discussed earlier.

Whatever format the text arrives in, it will be normalized into a consistent form by that pipeline.
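
Here is a minimal sketch of a normalization step (assuming simple lowercasing and whitespace cleanup; the exact steps depend on the pre-processing pipeline you set up earlier):

import re

def normalize(text):
    # Lowercase so 'Apple' and 'apple' map to the same token (an assumed step).
    text = text.lower()
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

print(normalize('  This   IS a\tSENTENCE. '))

Output →

this is a sentence.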

Tokenization

It means we are breaking down the whole corpus of text into smaller chunks.

There are different ways to do that →

  • sentence tokenization

  • word tokenization

  • regular expression tokenization (useful if you also want sub-words; see the sketch after the snippets below)

You can create smaller chunks of sentences, individual words, and sub-words.

Following are the code snippets for text tokenization →

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

# Download the tokenizer models and stopword list (only needed once).
nltk.download('punkt')
nltk.download('stopwords')

text = 'this is a single sentence.'

# Split the sentence into word-level tokens.
tokens = word_tokenize(text)

print(tokens)

Output →

['this', 'is', 'a', 'single', 'sentence', '.']

# Keep only alphabetic tokens and lowercase them, dropping the punctuation.
no_punctuation = [word.lower() for word in tokens if word.isalpha()]

print(no_punctuation)

['this', 'is', 'a', 'single', 'sentence']

text = 'this is the first sentence. this is the second sentence. this is the document.'

# Split the document into sentence-level tokens.
print(sent_tokenize(text))

['this is the first sentence.', 'this is the second sentence.', 'this is the document.']

# Tokenize each sentence into words, producing a list of token lists.
print([word_tokenize(sentence) for sentence in sent_tokenize(text)])

[['this', 'is', 'the', 'first', 'sentence', '.'], ['this', 'is', 'the', 'second', 'sentence', '.'], ['this', 'is', 'the', 'document', '.']]

# NLTK ships a list of common English stopwords.
stop_words = stopwords.words('english')

print(stop_words[:20])

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his']

text = 'this is the first sentence. this is the second sentence. this is the document.'

# Drop the stopwords; this works here because the text is already lowercase.
tokens = [token for token in word_tokenize(text) if token not in stop_words]

print(tokens)

['first', 'sentence', '.', 'second', 'sentence', '.', 'document', '.']
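
The tokenizer list above also mentioned regular expression tokenization, which is handy when you want sub-words or custom token boundaries. Here is a minimal sketch using NLTK's RegexpTokenizer; the pattern is an illustrative choice, not the only one:

from nltk.tokenize import RegexpTokenizer

# Keep runs of word characters as tokens, dropping punctuation in one pass.
tokenizer = RegexpTokenizer(r'\w+')

print(tokenizer.tokenize('this is the first sentence. this is the document.'))

Output →

['this', 'is', 'the', 'first', 'sentence', 'this', 'is', 'the', 'document']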

Socials

Be part of 50,000+ like-minded AI professionals across platforms.

How satisfied are you with today's Newsletter?

This will help me serve you better

Please reply to this email with your requirements or suggestions on what you want in future newsletter content.

PS: build your own newsletter → Here
