pawankg
diff --git a/‎SpellChecker_using_NLP/EditDistExample1.PNG
82.2 KB b/‎SpellChecker_using_NLP/EditDistExample1.PNG
82.2 KB
diff --git a/‎SpellChecker_using_NLP/EditDistFill2.PNG
109 KB b/‎SpellChecker_using_NLP/EditDistFill2.PNG
109 KB
diff --git a/‎SpellChecker_using_NLP/EditDistInit4.PNG
68.7 KB b/‎SpellChecker_using_NLP/EditDistInit4.PNG
68.7 KB
diff --git a/‎SpellChecker_using_NLP/GenericListComp3.PNG
36.8 KB b/‎SpellChecker_using_NLP/GenericListComp3.PNG
36.8 KB
diff --git a/‎SpellChecker_using_NLP/III_Spellchecker_using_NLP.ipynb
+2,250 b/‎SpellChecker_using_NLP/III_Spellchecker_using_NLP.ipynb
+2,250
diff --git a/‎SpellChecker_using_NLP/II_Candidates_from_String_Edits.ipynb
+311 b/‎SpellChecker_using_NLP/II_Candidates_from_String_Edits.ipynb
+311
@@ -0,0 +1,311 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# NLP : Building The Model\n",
+    "\n",
+    "<br>\n",
+    "# Candidates from String Edits\n",
+    "Create a list of candidate strings by applying an edit operation\n",
+    "<br>\n",
+    "### Imports and Data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# data\n",
+    "word = 'dearz' # 🦌"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Splits\n",
+    "Find all the ways you can split a word into 2 parts !"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['', 'dearz']\n",
+      "['d', 'earz']\n",
+      "['de', 'arz']\n",
+      "['dea', 'rz']\n",
+      "['dear', 'z']\n",
+      "['dearz', '']\n"
+     ]
+    }
+   ],
+   "source": [
+    "# splits with a loop\n",
+    "splits_a = []\n",
+    "for i in range(len(word)+1):\n",
+    "    splits_a.append([word[:i],word[i:]])\n",
+    "\n",
+    "for i in splits_a:\n",
+    "    print(i)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "('', 'dearz')\n",
+      "('d', 'earz')\n",
+      "('de', 'arz')\n",
+      "('dea', 'rz')\n",
+      "('dear', 'z')\n",
+      "('dearz', '')\n"
+     ]
+    }
+   ],
+   "source": [
+    "# same splits, done using a list comprehension\n",
+    "splits_b = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n",
+    "\n",
+    "for i in splits_b:\n",
+    "    print(i)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Delete Edit\n",
+    "Delete a letter from each string in the `splits` list.\n",
+    "<br>\n",
+    "What this does is effectivly delete each possible letter from the original word being edited. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "word :  dearz\n",
+      "earz  <-- delete  d\n",
+      "darz  <-- delete  e\n",
+      "derz  <-- delete  a\n",
+      "deaz  <-- delete  r\n",
+      "dear  <-- delete  z\n"
+     ]
+    }
+   ],
+   "source": [
+    "# deletes with a loop\n",
+    "splits = splits_a\n",
+    "deletes = []\n",
+    "\n",
+    "print('word : ', word)\n",
+    "for L,R in splits:\n",
+    "    if R:\n",
+    "        print(L + R[1:], ' <-- delete ', R[0])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It's worth taking a closer look at how this is excecuting a 'delete'.\n",
+    "<br>\n",
+    "Taking the first item from the `splits` list :"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "word :  dearz\n",
+      "first item from the splits list :  ['', 'dearz']\n",
+      "L :  \n",
+      "R :  dearz\n",
+      "*** now implicit delete by excluding the leading letter ***\n",
+      "L + R[1:] :  earz  <-- delete  d\n"
+     ]
+    }
+   ],
+   "source": [
+    "# breaking it down\n",
+    "print('word : ', word)\n",
+    "one_split = splits[0]\n",
+    "print('first item from the splits list : ', one_split)\n",
+    "L = one_split[0]\n",
+    "R = one_split[1]\n",
+    "print('L : ', L)\n",
+    "print('R : ', R)\n",
+    "print('*** now implicit delete by excluding the leading letter ***')\n",
+    "print('L + R[1:] : ',L + R[1:], ' <-- delete ', R[0])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So the end result transforms **'dearz'** to **'earz'** by deleting the first character.\n",
+    "<br>\n",
+    "And you use a **loop** (code block above) or a **list comprehension** (code block below) to do\n",
+    "<br>\n",
+    "this for the entire `splits` list."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['earz', 'darz', 'derz', 'deaz', 'dear']\n",
+      "*** which is the same as ***\n",
+      "earz\n",
+      "darz\n",
+      "derz\n",
+      "deaz\n",
+      "dear\n"
+     ]
+    }
+   ],
+   "source": [
+    "# deletes with a list comprehension\n",
+    "splits = splits_a\n",
+    "deletes = [L + R[1:] for L, R in splits if R]\n",
+    "\n",
+    "print(deletes)\n",
+    "print('*** which is the same as ***')\n",
+    "for i in deletes:\n",
+    "    print(i)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Ungraded Exercise\n",
+    "You now have a list of ***candidate strings*** created after performing a **delete** edit.\n",
+    "<br>\n",
+    "Next step will be to filter this list for ***candidate words*** found in a vocabulary.\n",
+    "<br>\n",
+    "Given the example vocab below, can you think of a way to create a list of candidate words ? \n",
+    "<br>\n",
+    "Remember, you already have a list of candidate strings, some of which are certainly not actual words you might find in your vocabulary !\n",
+    "<br>\n",
+    "<br>\n",
+    "So from the above list **earz, darz, derz, deaz, dear**. \n",
+    "<br>\n",
+    "You're really only interested in **dear**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "vocab :  ['dean', 'deer', 'dear', 'fries', 'and', 'coke']\n",
+      "edits :  ['earz', 'darz', 'derz', 'deaz', 'dear']\n",
+      "candidate words :  {'dear'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "vocab = ['dean','deer','dear','fries','and','coke']\n",
+    "# edits = list(deletes)\n",
+    "\n",
+    "print('vocab : ', vocab)\n",
+    "print('edits : ', edits)\n",
+    "\n",
+    "# vocal = set(vocab)\n",
+    "# edits = set(edits)\n",
+    "\n",
+    "candidates=[]\n",
+    "\n",
+    "### START CODE HERE ###\n",
+    "#candidates = ??  # hint: 'set.intersection'\n",
+    "# cadidates = set(vocab).intersection(set(edits))\n",
+    "### END CODE HERE ###\n",
+    "\n",
+    "print('candidate words : ', set(vocab).intersection(set(edits)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Expected Outcome:\n",
+    "\n",
+    "vocab :  ['dean', 'deer', 'dear', 'fries', 'and', 'coke']\n",
+    "<br>\n",
+    "edits :  ['earz', 'darz', 'derz', 'deaz', 'dear']\n",
+    "<br>\n",
+    "candidate words :  {'dear'}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Summary\n",
+    "You've unpacked an integral part of the assignment by breaking down **splits** and **edits**, specifically looking at **deletes** here.\n",
+    "<br>\n",
+    "Implementation of the other edit types (insert, replace, switch) follow a similar methodology and should now feel somewhat familiar when you see them.\n",
+    "<br>\n",
+    "This bit of the code isn't as intuitive as other sections, so well done!\n",
+    "<br>\n",
+    "You should now feel confident facing some of the more technical parts of the assignment at the end of the week."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}