Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #2811 - Fixes mime type for some endpoints #2812

Merged
merged 3 commits into from
Mar 7, 2019

Conversation

marimeireles
Copy link
Member

Hey @karlcow this problem is similar to #2802.

The problem here is that untriaged.js uses a different endpoint that I hadn't fixed: /api/issues/category/needstriage?per_page=5

@marimeireles
Copy link
Member Author

The same issue was also happening to when you make a request to this route: @api.route('/issues/<username>/<parameter>'). I fixed the same way.
You think I should open a new issue and PR for that or we can do this in this one?

@marimeireles marimeireles changed the title Fixes 2811 - Fixes mime type of untriaged bugs Fixes 2811 - Fixes mime type for some endpoints Feb 22, 2019
@miketaylr miketaylr changed the title Fixes 2811 - Fixes mime type for some endpoints Fixes #2811 - Fixes mime type for some endpoints Feb 22, 2019
@miketaylr miketaylr requested a review from karlcow February 22, 2019 19:47
@miketaylr
Copy link
Member

@marimeireles I still see a request when testing out this branch on localhost:

screen shot 2019-02-22 at 1 45 44 pm

@marimeireles
Copy link
Member Author

Thanks @miketaylr. I hadn't noticed that.
I'll keep looking for a solution.

@marimeireles
Copy link
Member Author

@marimeireles I still see a request when testing out this branch on localhost:

@miketaylr have you tested on a clean profile? I've tested on firefox and chrome and both of them don't show the images anymore.

@miketaylr
Copy link
Member

Yes, new profile in Nightly:

screen shot 2019-02-22 at 3 36 00 pm

And same results from my regular profile in DevEdition.

And from Chrome:

screen shot 2019-02-22 at 3 37 02 pm

@miketaylr
Copy link
Member

getDescription is the initiator of the request, so our approach of parsing the body into the DOM to get the description is the culprit.

@miketaylr
Copy link
Member

It's a dumb hack, but what if we inserted the body content into an inert element.. maybe <script type=whatever>, or <pre>? I would guess the browser wouldn't bother fetching image sources.

@marimeireles
Copy link
Member Author

That's weird.
Do you think running npm run build could make any difference here?

My firefox release version:
screen shot 2019-02-22 at 18 39 59

And chrome:
screen shot 2019-02-22 at 18 40 48

@marimeireles
Copy link
Member Author

@miketaylr maybe I could also try https://developer.github.com/v3/media/#text

@miketaylr
Copy link
Member

No, npm run build makes no difference -- the JS is served unminified on localhost anyways. @marimeireles which test repo are you pointing at? If it's not webcompat/webcompat-tests, I would point it to that and see if it reproduces.

@marimeireles
Copy link
Member Author

Good point!
Ok. Now I can reproduce it.
I'll try to request text only.

@marimeireles
Copy link
Member Author

marimeireles commented Feb 22, 2019

Putting stuff inside a <pre> tag didn't work, neither the request for text only.
That's super weird.. I thought my answer here #2811 (comment) would be correct because it was happening something alike in #2802 and the HTML request was causing it.

@marimeireles
Copy link
Member Author

The images on the requests are the images from the first comment of the issue, aka, the description of the issue.
And this is happening for the home page and the sort by user page see https://staging.webcompat.com/activity/marimeireles for instance...
I was thinking about trying to clean the body issue object that the files untriaged.js and user-activity.js receives. Is this a terrible idea?

@miketaylr
Copy link
Member

I was thinking about trying to clean the body issue object that the files untriaged.js and user-activity.js receives. Is this a terrible idea?

I'm not sure it's terrible. What if we just added the title to the issue model and used that to populate the description? Isn't it the same thing?

@miketaylr
Copy link
Member

Currently we do:

title: this.getTitle(
        this.getDomain(response.title),
        this.getDescription(response.body_html),
        response.title
      )

Ah, I see why not.

However, why don't we just fix this on the python side of things on issue creation and make what we want the real title? @karlcow is there any reason we don't do that now?

@marimeireles
Copy link
Member Author

This solution Mike proposed didn't work. :/

@karlcow
Copy link
Member

karlcow commented Feb 25, 2019

so let's summarize here :)

  1. We removed the markdown thing for improving performance on client side and get the html directly
  2. Having the HTML, made it harder to parse the body
  3. Parsing the html body to convert the body into DOM, grab the images… (Why on earth there is no plain_text->html_nodes without downloading secondary resources. See below DOMParser())
  4. We are trying to jungle around it.

So that's an interesting conundrum.

When we discussed about the Content type, I pointed out to the comments API

In there, there is the application/vnd.github.VERSION.full+json which returns both JSON and HTML: body, body_text, and body_html.

If I do http GET https://api.github.com/repos/webcompat/web-bugs/issues/15000 "Accept: application/vnd.github.v3.full+json"

I get in return (I removed the other parts of the issue):

    "body": "<!-- @browser: Firefox 59.0 -->\n<!-- @ua_header: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0 -->\n<!-- @reported_with: desktop-reporter -->\n<!-- @extra_labels: type-stylo -->\n\n**URL**: https://www.prudential.com.sg/en/our-solutions/products/prushield/\n\n**Browser / Version**: Firefox 59.0\n**Operating System**: Windows 7\n**Tested Another Browser**: Yes\n\n**Problem type**: Design is broken\n**Description**: Text rendering missing\n**Steps to Reproduce**:\n\r\nlayout.css.servo.enabled: true\r\n[![Screenshot Description](https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16-thumb.jpg)](https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16.jpg)\n\n\n\n_From [webcompat.com](https://webcompat.com/) with ❤️_", 
    "body_html": "\n\n\n\n<p><strong>URL</strong>: <a rel=\"nofollow\" href=\"https://www.prudential.com.sg/en/our-solutions/products/prushield/\">https://www.prudential.com.sg/en/our-solutions/products/prushield/</a></p>\n<p><strong>Browser / Version</strong>: Firefox 59.0<br>\n<strong>Operating System</strong>: Windows 7<br>\n<strong>Tested Another Browser</strong>: Yes</p>\n<p><strong>Problem type</strong>: Design is broken<br>\n<strong>Description</strong>: Text rendering missing<br>\n<strong>Steps to Reproduce</strong>:</p>\n<p>layout.css.servo.enabled: true<br>\n<a href=\"https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16.jpg\" rel=\"nofollow\"><img src=\"https://camo.githubusercontent.com/c2797655bd8d6452c61f922cfb3c0bfc87afebd5/68747470733a2f2f776562636f6d7061742e636f6d2f75706c6f6164732f323031382f312f36393836356533312d383966302d346332342d613430642d6632366466636266326431362d7468756d622e6a7067\" alt=\"Screenshot Description\" data-canonical-src=\"https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16-thumb.jpg\" style=\"max-width:100%;\"></a></p>\n<p><em>From <a href=\"https://webcompat.com/\" rel=\"nofollow\">webcompat.com</a> with <g-emoji class=\"g-emoji\" alias=\"heart\" fallback-src=\"https://github.githubassets.com/images/icons/emoji/unicode/2764.png\">❤️</g-emoji></em></p>", 
    "body_text": "URL: https://www.prudential.com.sg/en/our-solutions/products/prushield/\nBrowser / Version: Firefox 59.0\nOperating System: Windows 7\nTested Another Browser: Yes\nProblem type: Design is broken\nDescription: Text rendering missing\nSteps to Reproduce:\nlayout.css.servo.enabled: true\n\nFrom webcompat.com with ❤️", 

DOMParser

So indeed this doesn't show image download, you can query the html, but unfortunately you can't convert to text.

var myText = "<p><img src='http://localhost:5000/uploads/2019/2/36b4786d-da2a-41f6-9646-4662794c8aaa.jpeg'> </p>";
var myparser = new DOMParser();
var myfragment = myparser.parseFromString(myText , "text/html");
myfragment.body.innerHTML
// "<p><img src=\"http://localhost:5000/uploads/2019/2/36b4786d-da2a-41f6-9646-4662794c8aaa.jpeg\"> </p>"
myfragment.body.getElementsByTagName('p')
// HTMLCollection [ p ]
 myfragment.body.getElementsByTagName('img')
// HTMLCollection [ img ]
myfragment.body.textContent
// " "
myfragment.body.innerText
// " "

@karlcow
Copy link
Member

karlcow commented Feb 25, 2019

@miketaylr

However, why don't we just fix this on the python side of things on issue creation and make what we want the real title? @karlcow is there any reason we don't do that now?

Let's rewind a bit the history of this:

  1. we were asking for the title
  2. then we added the possibility to get the description instead if it's there, and fallback on the title (cut at 75)
  3. then we are trying to reparse the html body as text to get the description.

Now we have on the python side access to

  1. The body_html "body_html": "…………"
  2. the title "title": "www.prudential.com.sg - design is broken",

Some ideas:

  • We could parse the html, then extract the text on the python side. This will have a CPU cost for html parsing. Then convert to text. Then parse the text.
  • We could request the 3 bodies then extract the text directly on the python side. This will have a bytes cost for transfert. no need to convert to text. then parse the text.

@karlcow
Copy link
Member

karlcow commented Feb 25, 2019

Ah I had missed about the context of @miketaylr asking
on issue creation and make what we want the real title?

We can for the future. :) But that doesn't change we need to address the past. :)
And that's partly already what is proposed in #2629. By creating a limit there, we can reuse that directly as a title.

@karlcow
Copy link
Member

karlcow commented Feb 25, 2019

another funny thing, we could be less constrained if we had -webkit-line-clamp ;)

@miketaylr
Copy link
Member

another funny thing, we could be less constraint if we had -webkit-line-clamp ;)

that's coming in the not too distant future. :)

@marimeireles
Copy link
Member Author

Thanks for trying to parse the content on DOM @karlcow.
I'll try to extract the text on the python side!

@marimeireles
Copy link
Member Author

This new change fixed the issue for me...
I forgot to erase one mime_type that was inside an if 🙈

@marimeireles
Copy link
Member Author

Can you please test @miketaylr ? If it works for you. -when you have the time
Thanks!

@miketaylr
Copy link
Member

Can you please test @miketaylr ? If it works for you. -when you have the time

Yep! Working here.

@marimeireles
Copy link
Member Author

Oh, so. What do we do @karlcow ?
You still think it's a better idea to extract the text on python?
I think we could leave it as it is and we can fix this #2629 in a different way.

@marimeireles
Copy link
Member Author

@karlcow does it look good to merge?

@karlcow
Copy link
Member

karlcow commented Mar 7, 2019

Ah missed this one. Sorry. I will check tonight in ~12h.

Copy link
Member

@karlcow karlcow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marimeireles shortest 12h… review. I just decided to look at it anyway, before leaving here. And yes this is good. Thanks a lot @marimeireles

@karlcow karlcow merged commit a1f1f1f into webcompat:master Mar 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants