Added test and removed dead code for Sanity Checker dealing with maps with same key #153

leahmcguire · 2018-10-08T19:55:43Z

Related issues
#151

Describe the proposed solution
Looks like this error was caused by some dead code in sanity checker. I removed this code and added a test for maps with the same keys.

tovbinm · 2018-10-08T19:59:26Z

core/src/test/scala/com/salesforce/op/stages/impl/preparators/SanityCheckerTest.scala

+    val features = Seq(id, target, textMap1, textMap2, doubleMap).transmogrify()
+    val checked = targetResponse.sanityCheck(features)
+    val output = new OpWorkflow().setResultFeatures(checked).transform(mapDataFrame)
+    output.show()


please add some actual checks. perhaps check metadata produced?

please remove output.show()

tovbinm · 2018-10-08T20:00:57Z

core/src/main/scala/com/salesforce/op/stages/impl/preparators/SanityChecker.scala

-
-    nullGroups.groupBy(_._1).foreach {
-      case (group, cols) =>
-        require(cols.length == 1, s"Vector column $group has multiple null indicator fields: $cols")


since you are removing this check, also remove this ignored test -

TransmogrifAI/core/src/test/scala/com/salesforce/op/stages/impl/preparators/BadFeatureZooTest.scala

Line 107 in f37113b

ignore should "Group groupings separately for transformations computed on same feature" in {

codecov · 2018-10-08T21:02:51Z

Codecov Report

❗ No coverage uploaded for pull request base (master@b1aec92).
The diff coverage is 95.34%.

@@            Coverage Diff            @@
##             master     #153   +/-   ##
=========================================
  Coverage          ?   86.36%           
=========================================
  Files             ?      299           
  Lines             ?     9749           
  Branches          ?      551           
=========================================
  Hits              ?     8420           
  Misses            ?     1329           
  Partials          ?        0

Impacted Files	Coverage Δ
...sforce/op/utils/spark/OpVectorColumnMetadata.scala	`75.55% <0%> (ø)`
...rce/op/stages/impl/preparators/SanityChecker.scala	`91.82% <97.61%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1aec92...44c14bb.

Jauntbox · 2018-10-08T21:55:23Z

core/src/main/scala/com/salesforce/op/stages/impl/preparators/SanityChecker.scala

-        grouping <- col.grouping
-      } yield (grouping, (col, col.index))
-
-    nullGroups.groupBy(_._1).foreach {


Do you think we should take out the whole check for multiple null indicators right now?

Afaik, we still have an issue when a single parent feature is vectorized with two different unary stages (e.g. if you were to vectorize a real with both the normal RealVectorizer and also bucketize it). SanityChecker will still group these into a single contingency matrix, since their grouping will just be the parent feature name, so I think we should leave this check in until we fix that issue.

leahmcguire · 2018-10-09

Do you really think it is better to fail the whole flow when that happens? It would be better to automatically drop one than to fail the flow...

leahmcguire · 2018-10-09

Or even to have some incorrect statistics (in my opinion).

Jauntbox · 2018-10-09

Can we at least keep it in as a warning we log, so that we can monitor when it's happening?

leahmcguire · 2018-10-12

@Jauntbox I fixed it so that it will remove duplicates from the categorical calculations.
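To make the collision discussed in this thread concrete, here is a minimal, self-contained sketch. It does not use the TransmogrifAI API: `ColMeta` is a hypothetical, stripped-down stand-in for the real column metadata, and the feature names and the key "color" are invented. It shows that keying by grouping alone merges columns from two different maps that share a key, that keying by parent feature name plus grouping keeps them apart, and how the multiple-null-indicator condition could be reported as warning messages instead of failing through `require`.

```scala
// Hypothetical, simplified stand-in for vector column metadata (not the real class).
final case class ColMeta(
  parentFeatureName: Seq[String],
  grouping: Option[String],
  isNullIndicator: Boolean,
  index: Int
)

object GroupingCollisionSketch {
  // Two text maps that both contain the key "color" (invented example data).
  val cols: Seq[ColMeta] = Seq(
    ColMeta(Seq("textMap1"), Some("color"), isNullIndicator = false, index = 0),
    ColMeta(Seq("textMap1"), Some("color"), isNullIndicator = true, index = 1),
    ColMeta(Seq("textMap2"), Some("color"), isNullIndicator = false, index = 2),
    ColMeta(Seq("textMap2"), Some("color"), isNullIndicator = true, index = 3)
  )

  // Keyed by grouping only: columns from both maps land in the same group.
  def byGrouping(cs: Seq[ColMeta]): Map[String, Seq[ColMeta]] = {
    val keyed = for { c <- cs; g <- c.grouping } yield g -> c
    keyed.groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
  }

  // Keyed by parent feature name plus grouping: each map keeps its own group.
  def byQualifiedGroup(cs: Seq[ColMeta]): Map[String, Seq[ColMeta]] = {
    val keyed = for { c <- cs; g <- c.grouping } yield {
      val parent = c.parentFeatureName.mkString("_")
      s"${parent}_$g" -> c
    }
    keyed.groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
  }

  // Collects one message per group that has more than one null indicator column,
  // so a caller can log them as warnings rather than abort the whole flow.
  def nullIndicatorWarnings(groups: Map[String, Seq[ColMeta]]): Seq[String] =
    groups.toSeq.collect {
      case (group, cs) if cs.count(_.isNullIndicator) > 1 =>
        val indices = cs.filter(_.isNullIndicator).map(_.index).mkString(", ")
        s"Vector column $group has multiple null indicator fields at indices: $indices"
    }
}
```

With the example data, grouping by key alone produces a single "color" group with two null indicators, while the qualified key yields one group per map and no warnings. How the merged change actually removes duplicates is not shown in this thread, so that part is left out of the sketch.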

tovbinm · 2018-10-11T02:19:31Z

features/src/main/scala/com/salesforce/op/utils/spark/OpVectorColumnMetadata.scala

+   * Get the feature grouping qualified by the parent feature name
+   * @return Optional string of feature grouping
+   */
+  def featureGroup(): Option[String] = grouping.map(g => s"${parentFeatureName.mkString}${g}")


Don't you want to do something similar to s"${parentFeatureName.mkString("_")}${grouping.map("_" + _).getOrElse("")}" (same as we do in the makeColName() function)?
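The difference between the two formulations can be sketched in isolation. The helper names `unseparated` and `separated` below are hypothetical and exist only for this illustration; the expressions inside them are the ones quoted in the diff and in the comment above. Without a separator, different (parent, grouping) pairs can concatenate to the same key, and the `Option`-returning version has no key at all when there is no grouping.

```scala
object FeatureGroupKeySketch {
  // Key built without a separator, as in the first version of the diff.
  // Returns None when the column has no grouping.
  def unseparated(parent: Seq[String], grouping: Option[String]): Option[String] =
    grouping.map(g => s"${parent.mkString}$g")

  // Key built with "_" separators, as in the reviewer's suggestion.
  // Always returns a key; falls back to the parent name when there is no grouping.
  def separated(parent: Seq[String], grouping: Option[String]): String =
    s"${parent.mkString("_")}${grouping.map("_" + _).getOrElse("")}"
}
```

For example, `unseparated(Seq("ab"), Some("c"))` and `unseparated(Seq("a"), Some("bc"))` both give `Some("abc")`, whereas `separated` gives `"ab_c"` and `"a_bc"`. A separator reduces such collisions but does not rule them out if feature names or keys themselves contain "_".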

tovbinm · 2018-10-15T17:20:24Z

core/src/test/scala/com/salesforce/op/stages/impl/preparators/SanityCheckerTest.scala

 import com.salesforce.op.stages.impl.feature.{HashSpaceStrategy, RealNNVectorizer, SmartTextMapVectorizer}
 import com.salesforce.op.test.{OpEstimatorSpec, TestFeatureBuilder, TestSparkContext}
 import com.salesforce.op.utils.json.JsonUtils
+=======


@leahmcguire merge conflict?

tovbinm · 2018-10-15T17:29:59Z

features/src/main/scala/com/salesforce/op/utils/spark/OpVectorColumnMetadata.scala

+   * Get the feature grouping qualified by the parent feature name
+   * @return Optional string of feature grouping
+   */
+  def featureGroup(): Option[String] = grouping.map(g => s"${parentFeatureName.mkString("_")}_$g")


@leahmcguire repeating my previous comment here: don't you want to do something similar to s"${parentFeatureName.mkString("_")}${grouping.map("_" + _).getOrElse("")}" (same as we do in the makeColName() function)?

Especially because you do call .get here - https://github.com/salesforce/TransmogrifAI/pull/153/files#diff-55c6247553ff142cb1657c1bac3c728fR459

leahmcguire · 2018-10-15

No, because before that get is called there is a check on whether this is None, which determines whether the categorical stats are computed.
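As a small illustration of the point about `.get`, the sketch below contrasts a guarded `.get` with an equivalent that never calls it. `Col` and both helper functions are hypothetical and are not taken from SanityChecker; the sketch only shows the general pattern under the assumption that columns without a feature group are skipped.

```scala
object GuardedGetSketch {
  // Hypothetical column with an optional feature group key.
  final case class Col(name: String, featureGroup: Option[String])

  // Guarded style: None values are filtered out first, so the later .get cannot throw.
  def groupsGuarded(cols: Seq[Col]): Seq[String] =
    cols.filter(_.featureGroup.isDefined).map(_.featureGroup.get)

  // Same result without .get: flatMap drops None and unwraps Some in one step.
  def groupsFlat(cols: Seq[Col]): Seq[String] =
    cols.flatMap(_.featureGroup)
}
```

Both functions return the same keys; the second makes the safety visible at the call site instead of relying on an earlier check.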

tovbinm

lgtm

added test and removed dead code

dfbb6fe

leahmcguire requested a review from tovbinm as a code owner October 8, 2018 19:55

leahmcguire requested a review from Jauntbox October 8, 2018 19:55

tovbinm reviewed Oct 8, 2018

Merge branch 'master' into lm/sanityFix

7b2c0dd

leahmcguire added 3 commits October 8, 2018 14:36

addressing comments

11bce1a

Merge branch 'lm/sanityFix' of github.com:salesforce/TransmogrifAI into lm/sanityFix

c44c2fa

added meta check

5d2c26c

Jauntbox reviewed Oct 8, 2018

leahmcguire added the work in progress label Oct 10, 2018

changed group usage in sanity checker to qualify by parent feature

9b79f27

tovbinm reviewed Oct 11, 2018

leahmcguire added 2 commits October 15, 2018 10:18

added test of stats and cleaned up

5c086b3

merged

c348134

tovbinm reviewed Oct 15, 2018

merged

44c14bb

leahmcguire added ready for review and removed work in progress labels Oct 15, 2018

tovbinm reviewed Oct 15, 2018

tovbinm approved these changes Oct 16, 2018

leahmcguire merged commit a35108b into master Oct 16, 2018

leahmcguire deleted the lm/sanityFix branch October 16, 2018 17:20

ericwayman pushed a commit that referenced this pull request Feb 8, 2019

Added test and removed dead code for Sanity Checker dealing with maps with same key (#153)

d4e804b
