Skip to content

Fix how quotes are handled in guess phase #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 12, 2019

Conversation

davidanthoff
Copy link
Member

@davidanthoff davidanthoff commented Dec 31, 2018

Fixes #97.

The problem with the old version is that the whole guess phase breaks down if there are line feeds inside quotes in the rows that are used to guess the column types.

I want to trigger code coverage first to get a sense what extra tests I need to add.

@shashi Alright, now ready to be reviewed.

@codecov-io
Copy link

codecov-io commented Dec 31, 2018

Codecov Report

Merging #98 into master will increase coverage by 0.92%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #98      +/-   ##
==========================================
+ Coverage   74.76%   75.68%   +0.92%     
==========================================
  Files          11       11              
  Lines        1153     1197      +44     
==========================================
+ Hits          862      906      +44     
  Misses        291      291
Impacted Files Coverage Δ
src/util.jl 74.13% <100%> (+8.75%) ⬆️
src/csv.jl 78.99% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e34a12a...aa6993f. Read the comment docs.

@davidanthoff davidanthoff changed the title WIP Fix how quotes are handled in guess phase Fix how quotes are handled in guess phase Jan 1, 2019
@davidanthoff davidanthoff requested a review from shashi January 1, 2019 18:56
@shashi
Copy link
Collaborator

shashi commented Jan 3, 2019

Is there any hope of using the existing quoted String token parser with includenewline=true? I think it will be awesome if we can use the same code since this code is non trivial.

I think one can just parse quoted stringtoken fields until a field ends exactly at a new line. There's no need to getrowend or even quotedsplit.

@davidanthoff
Copy link
Member Author

Yeah, probably. I started writing getrowend, thinking it would be trivial. Then I thought of more and more corner situations, addressed them, and ended up with this PR... At that point I also felt that there must be a better way to reuse existing stuff.

I think one can just parse quoted stringtoken fields until a field ends exactly at a new line. There's no need to getrowend or even quotedsplit.

That sounds very plausible. I'm not sure when I'll have time again to get to that, though... Any chance you might take a stab at this?

@davidanthoff
Copy link
Member Author

@shashi what should we do about this? Could we maybe merge it for now? It seems to fix a pretty clear bug for now. I agree it is not the most elegant way to do this, and it would be nicer to use the existing machinery for that, but right now from my point of view it would be preferable to fix this bug, and then we can see if someone has time later to make the code more elegant?

@shashi shashi merged commit 529b529 into queryverse:master Jan 12, 2019
@shashi
Copy link
Collaborator

shashi commented Jan 12, 2019

Sounds good! :-)

@davidanthoff davidanthoff deleted the fix-guess-with-quotes branch January 12, 2019 16:44
@davidanthoff davidanthoff mentioned this pull request Jan 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants