Exception handling and bug removal #211
Merged
Related Issue
[Use a more memory-efficient data structure and add exception handling code]
Description
[Fixed a bug that caused OSError and PermissionError when saving the scraped data. The output directory is now created before the CSV is written, and exist_ok=True prevents an exception if the directory already exists:

    import os

    os.makedirs('data_scrapped', exist_ok=True)
    df.to_csv('data_scrapped/data_rotten_tomatoes.csv', index=False)
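
A minimal stdlib-only sketch of the same idea, so it runs without pandas. The helper save_rows_as_csv is hypothetical and stands in for df.to_csv: it derives the parent directory from the output path, creates it with exist_ok=True, and then writes the rows, so running the scraper twice does not fail on an existing directory.

```python
import csv
import os
from typing import Iterable, Sequence


def save_rows_as_csv(rows: Iterable[Sequence[str]], path: str) -> None:
    """Hypothetical helper: write rows to a CSV file, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True: no FileExistsError if the directory is already there
        os.makedirs(parent, exist_ok=True)
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)
```

Calling it twice with a path such as 'data_scrapped/data_rotten_tomatoes.csv' should simply overwrite the file the second time instead of raising.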
Also added handling for pages where the movie title or review text doesn't exist: both functions now return None instead of raising an exception.

    def getReviewText(review_url):
        '''Returns the user review text given the review soup, or None if it is missing.'''
        # Note: despite its name, review_url is the parsed review page (soup), not a URL
        tag = review_url.find('p', attrs={'class': 'review-text'})
        if tag:
            return tag.get_text(strip=True)  # strip=True removes surrounding whitespace
        return None  # Review text not found

    def getMovieTitle(review_url):
        '''Returns the movie title from the review soup, or None if it is missing.'''
        tag = review_url.find('title')
        if tag:
            # get_text() returns '' for an empty <title> instead of raising IndexError
            title_text = tag.get_text()
            movie_title = title_text.split(' - Movie Reviews | Rotten Tomatoes')[0]
            return movie_title
        return None  # Title not found
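
Because the getters now return None rather than raising, the calling code has to decide what to do with incomplete pages. A minimal sketch, assuming a hypothetical collect_reviews function that skips any page where either field is missing; in the scraper, getMovieTitle and getReviewText would be passed as the two getters.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def collect_reviews(
    pages: Iterable[Any],
    get_title: Callable[[Any], Optional[str]],
    get_review: Callable[[Any], Optional[str]],
) -> List[Dict[str, str]]:
    """Hypothetical consumer: build one record per page, skipping pages with a missing field."""
    records = []
    for page in pages:
        title = get_title(page)
        review = get_review(page)
        if title is None or review is None:
            continue  # getter signalled "not found" with None; skip instead of crashing
        records.append({'title': title, 'review': review})
    return records
```

Skipping is only one choice; keeping the row with an empty field would also work if partial records are useful downstream.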
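To use less memory, duplicate links are now removed with set() instead of dict.fromkeys(). Note that, unlike dict.fromkeys(), a set does not preserve the original order of the links:

    # remove duplicate links
    unique_movie_links = list(set(tag['href'] for tag in movie_tags))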
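
The trade-off between the two deduplication approaches can be seen side by side. In this sketch plain dicts stand in for the BeautifulSoup tags in movie_tags (tag['href'] works the same way on both); the function names are hypothetical. The set version makes no ordering promise, while dict.fromkeys keeps the first occurrence of each link in its original position (guaranteed since Python 3.7).

```python
from typing import Iterable, List, Mapping


def unique_links_unordered(movie_tags: Iterable[Mapping[str, str]]) -> List[str]:
    """Set-based dedup, as in this PR: removes duplicates, no ordering guarantee."""
    return list(set(tag['href'] for tag in movie_tags))


def unique_links_ordered(movie_tags: Iterable[Mapping[str, str]]) -> List[str]:
    """dict.fromkeys-based dedup: keeps each link's first occurrence, in order."""
    return list(dict.fromkeys(tag['href'] for tag in movie_tags))
```

If later code only iterates over every link, either version is fine; if it relies on page order (for example, taking the first few links), the ordered version is the safer choice.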
To fix ModuleNotFoundError: No module named 'textblob', added pip install textblob.]
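
Running pip install textblob fixes only the environment it is run in. As a possible complement, a small guard built on the standard importlib machinery could turn the bare error into an actionable message for other contributors. The helper require_module is hypothetical and not part of this PR.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def require_module(name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Hypothetical helper: import a top-level module or fail with an install hint."""
    if importlib.util.find_spec(name) is None:
        package = pip_name or name  # PyPI name may differ from the import name
        raise ModuleNotFoundError(
            f"No module named '{name}'. Install it with: pip install {package}",
            name=name,
        )
    return importlib.import_module(name)
```

In the scraper this would be used as textblob = require_module('textblob') in place of a plain import.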
Type of PR
Screenshots / videos (if applicable)
[Attach any relevant screenshots or videos demonstrating the changes]

Checklist:
Additional context:
[I would also like to add more documentation to the code snippets to help others understand the code better.]