Chapter 15: Natural Language Processing Assignments

Practice your NLP skills with the following questions, categorized by difficulty.

Basic (10 Questions)

  1. Write Python code to tokenize a given paragraph into sentences and words using NLTK.
  2. Clean a sample text by removing punctuation, digits, and converting to lowercase.
  3. Use NLTK’s stopwords list to remove stopwords from a given text.
  4. Write a program to perform stemming on a list of words using NLTK’s PorterStemmer.
  5. Write a program to perform lemmatization using NLTK’s WordNetLemmatizer.
  6. Compare stemming and lemmatization on the same word list and explain the differences.
  7. Extract named entities from a sample sentence using NLTK’s ne_chunk.
  8. Perform POS tagging on a given sentence and display the tags.
  9. Use TextBlob to get the polarity and subjectivity of a given sentence.
  10. Write a simple program that produces an extractive summary of a short paragraph (e.g., by scoring sentences on word frequency), using NLTK or TextBlob for tokenization.
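
As a possible starting point for Basic question 2, the sketch below cleans a string using only the Python standard library. The function name clean_text and the sample sentence are illustrative choices, not part of the assignment. It assumes ASCII punctuation only; typographic quotes and other Unicode punctuation would need extra handling.

```python
import re
import string


def clean_text(text):
    """Lowercase the text and remove ASCII punctuation and digits."""
    text = text.lower()
    # Delete every ASCII punctuation mark and digit in a single pass.
    table = str.maketrans("", "", string.punctuation + string.digits)
    text = text.translate(table)
    # Removing characters can leave double spaces behind; collapse them.
    return re.sub(r"\s+", " ", text).strip()


sample = "Hello, World! NLP in 2024 is fun."  # illustrative input
print(clean_text(sample))
```

The same function can be reused as the first stage of the cleaning pipeline asked for in the Intermediate section.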

Intermediate (10 Questions)

  1. Implement a custom text cleaning pipeline that removes special characters, stopwords, and normalizes whitespace.
  2. Write a function to tokenize text and filter out tokens with length less than 3.
  3. Train a simple Naive Bayes classifier for sentiment analysis on a small labeled dataset.
  4. Perform topic modeling on a set of documents using Latent Dirichlet Allocation (LDA).
  5. Implement aspect-based sentiment analysis to identify sentiment toward product features.
  6. Use spaCy or NLTK to extract noun phrases from text.
  7. Write code to detect sarcasm or negation in sentences (basic heuristic approach).
  8. Visualize sentiment scores over time for a collection of tweets using Matplotlib or Seaborn.
  9. Use pre-trained Hugging Face transformer models for sentiment classification and interpret results.
  10. Compare the output of different stemming algorithms (e.g., Porter, Snowball, Lancaster) on the same dataset and analyze how their stems and running times differ.
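
For Intermediate question 7, one common heuristic is to treat every word between a negation cue and the next punctuation mark as negated. The sketch below illustrates that idea with the standard library only. The cue list, the NOT_ prefix, and the function name mark_negation are assumptions made for this example, and the approach does not attempt sarcasm, which generally needs more than a rule of this kind.

```python
import re

# Small illustrative cue list; a real exercise solution would likely extend it.
NEGATION_WORDS = {"not", "no", "never", "cannot", "neither", "nor"}
CLAUSE_END = {".", ",", ";", ":", "!", "?"}


def mark_negation(sentence):
    """Prefix tokens inside a negation scope with 'NOT_'.

    A scope opens at a negation cue (or any token ending in "n't")
    and closes at the next punctuation mark.
    """
    # Words (optionally with an ASCII apostrophe part) or single punctuation marks.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;:!?]", sentence.lower())
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in CLAUSE_END:
            in_scope = False
            marked.append(tok)
        elif tok in NEGATION_WORDS or tok.endswith("n't"):
            in_scope = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if in_scope else tok)
    return marked


print(mark_negation("I don't like this movie, but the cast is great."))
```

Marked tokens such as NOT_like can then be fed to a classifier as distinct features, so that "like" and "not like" are no longer counted as the same evidence.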

Advanced (10 Questions)

  1. Build a custom deep learning model (e.g., LSTM) for sentiment analysis using Keras or PyTorch.
  2. Implement a multi-class emotion detection system from text using transformers.
  3. Use word embeddings (e.g., Word2Vec, GloVe) to improve sentiment classification accuracy.
  4. Perform multilingual sentiment analysis by detecting the language of each text and routing it to a model suited to that language.
  5. Develop a pipeline to combine topic modeling and sentiment analysis for social media monitoring.
  6. Implement a summarization algorithm using sequence-to-sequence deep learning models.
  7. Train a Named Entity Recognition (NER) model on a custom labeled dataset.
  8. Analyze the impact of different text preprocessing techniques on downstream NLP tasks.
  9. Develop a sarcasm detection system using deep learning approaches.
  10. Evaluate and compare the performance of lexicon-based and machine learning sentiment analysis methods on real-world datasets.
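
For Advanced question 10, it may help to score every method through one shared evaluation function so the numbers are directly comparable. The sketch below shows one way to set that up. The word lists, the four labeled sentences, and the names lexicon_predict, majority_predict, and evaluate are all invented for illustration; the majority-class baseline only stands in for the slot where a trained classifier's prediction function would go.

```python
from collections import Counter

# Toy lexicon, made up for this example (not a published resource).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def lexicon_predict(text):
    """Label text 'pos' or 'neg' by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def majority_predict(text):
    """Placeholder baseline: always predict 'pos'."""
    return "pos"


def evaluate(predict, dataset):
    """Return (accuracy, Counter of (gold, predicted) label pairs)."""
    confusion = Counter((gold, predict(text)) for text, gold in dataset)
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    return correct / len(dataset), confusion


# Toy labeled data, invented for illustration only.
toy_data = [
    ("great plot and excellent acting", "pos"),
    ("terrible pacing and poor script", "neg"),
    ("i love it", "pos"),
    ("not good at all", "neg"),  # the lexicon ignores the negation here
]

for name, predict in [("lexicon", lexicon_predict), ("majority", majority_predict)]:
    accuracy, confusion = evaluate(predict, toy_data)
    print(name, accuracy, dict(confusion))
```

Any function that maps a text to a label can be passed to evaluate, so a machine learning model can be compared against the lexicon method on the same real-world dataset without changing the scoring code.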