Fixes #2811 - Fixes mime type for some endpoints #2812

marimeireles · 2019-02-22T15:42:03Z

Hey @karlcow this problem is similar to #2802.

The problem here is that untriaged.js uses a different endpoint that I hadn't fixed: /api/issues/category/needstriage?per_page=5

marimeireles · 2019-02-22T15:45:27Z

The same issue was also happening to when you make a request to this route: @api.route('/issues/<username>/<parameter>'). I fixed the same way.
You think I should open a new issue and PR for that or we can do this in this one?

miketaylr · 2019-02-22T19:47:37Z

@marimeireles I still see a request when testing out this branch on localhost:

marimeireles · 2019-02-22T20:42:55Z

Thanks @miketaylr. I hadn't noticed that.
I'll keep looking for a solution.

marimeireles · 2019-02-22T21:33:53Z

@marimeireles I still see a request when testing out this branch on localhost:

@miketaylr have you tested on a clean profile? I've tested on firefox and chrome and both of them don't show the images anymore.

miketaylr · 2019-02-22T21:37:17Z

Yes, new profile in Nightly:

And same results from my regular profile in DevEdition.

And from Chrome:

miketaylr · 2019-02-22T21:40:38Z

getDescription is the initiator of the request, so our approach of parsing the body into the DOM to get the description is the culprit.

miketaylr · 2019-02-22T21:42:43Z

It's a dumb hack, but what if we inserted the body content into an inert element.. maybe <script type=whatever>, or <pre>? I would guess the browser wouldn't bother fetching image sources.

marimeireles · 2019-02-22T21:42:45Z

That's weird.
Do you think running npm run build could make any difference here?

My firefox release version:

And chrome:

marimeireles · 2019-02-22T21:43:39Z

@miketaylr maybe I could also try https://developer.github.com/v3/media/#text

miketaylr · 2019-02-22T21:44:13Z

No, npm run build makes no difference -- the JS is served unminified on localhost anyways. @marimeireles which test repo are you pointing at? If it's not webcompat/webcompat-tests, I would point it to that and see if it reproduces.

marimeireles · 2019-02-22T21:47:22Z

Good point!
Ok. Now I can reproduce it.
I'll try to request text only.

marimeireles · 2019-02-22T22:02:46Z

Putting stuff inside a <pre> tag didn't work, neither the request for text only.
That's super weird.. I thought my answer here #2811 (comment) would be correct because it was happening something alike in #2802 and the HTML request was causing it.

marimeireles · 2019-02-22T22:22:34Z

The images on the requests are the images from the first comment of the issue, aka, the description of the issue.
And this is happening for the home page and the sort by user page see https://staging.webcompat.com/activity/marimeireles for instance...
I was thinking about trying to clean the body issue object that the files untriaged.js and user-activity.js receives. Is this a terrible idea?

miketaylr · 2019-02-22T23:04:17Z

I was thinking about trying to clean the body issue object that the files untriaged.js and user-activity.js receives. Is this a terrible idea?

I'm not sure it's terrible. What if we just added the title to the issue model and used that to populate the description? Isn't it the same thing?

miketaylr · 2019-02-22T23:07:05Z

Currently we do:

title: this.getTitle(
        this.getDomain(response.title),
        this.getDescription(response.body_html),
        response.title
      )

Ah, I see why not.

However, why don't we just fix this on the python side of things on issue creation and make what we want the real title? @karlcow is there any reason we don't do that now?

marimeireles · 2019-02-22T23:30:41Z

This solution Mike proposed didn't work. :/

karlcow · 2019-02-25T00:36:16Z

so let's summarize here :)

We removed the markdown thing for improving performance on client side and get the html directly
Having the HTML, made it harder to parse the body
Parsing the html body to convert the body into DOM, grab the images… (Why on earth there is no plain_text->html_nodes without downloading secondary resources. See below DOMParser())
We are trying to jungle around it.

So that's an interesting conundrum.

When we discussed about the Content type, I pointed out to the comments API

In there, there is the application/vnd.github.VERSION.full+json which returns both JSON and HTML: body, body_text, and body_html.

If I do http GET https://api.github.com/repos/webcompat/web-bugs/issues/15000 "Accept: application/vnd.github.v3.full+json"

I get in return (I removed the other parts of the issue):

    "body": "<!-- @browser: Firefox 59.0 -->\n<!-- @ua_header: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0 -->\n<!-- @reported_with: desktop-reporter -->\n<!-- @extra_labels: type-stylo -->\n\n**URL**: https://www.prudential.com.sg/en/our-solutions/products/prushield/\n\n**Browser / Version**: Firefox 59.0\n**Operating System**: Windows 7\n**Tested Another Browser**: Yes\n\n**Problem type**: Design is broken\n**Description**: Text rendering missing\n**Steps to Reproduce**:\n\r\nlayout.css.servo.enabled: true\r\n[![Screenshot Description](https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16-thumb.jpg)](https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16.jpg)\n\n\n\n_From [webcompat.com](https://webcompat.com/) with ❤️_", 
    "body_html": "\n\n\n\n<p><strong>URL</strong>: <a rel=\"nofollow\" href=\"https://www.prudential.com.sg/en/our-solutions/products/prushield/\">https://www.prudential.com.sg/en/our-solutions/products/prushield/</a></p>\n<p><strong>Browser / Version</strong>: Firefox 59.0<br>\n<strong>Operating System</strong>: Windows 7<br>\n<strong>Tested Another Browser</strong>: Yes</p>\n<p><strong>Problem type</strong>: Design is broken<br>\n<strong>Description</strong>: Text rendering missing<br>\n<strong>Steps to Reproduce</strong>:</p>\n<p>layout.css.servo.enabled: true<br>\n<a href=\"https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16.jpg\" rel=\"nofollow\"><img src=\"https://camo.githubusercontent.com/c2797655bd8d6452c61f922cfb3c0bfc87afebd5/68747470733a2f2f776562636f6d7061742e636f6d2f75706c6f6164732f323031382f312f36393836356533312d383966302d346332342d613430642d6632366466636266326431362d7468756d622e6a7067\" alt=\"Screenshot Description\" data-canonical-src=\"https://webcompat.com/uploads/2018/1/69865e31-89f0-4c24-a40d-f26dfcbf2d16-thumb.jpg\" style=\"max-width:100%;\"></a></p>\n<p><em>From <a href=\"https://webcompat.com/\" rel=\"nofollow\">webcompat.com</a> with <g-emoji class=\"g-emoji\" alias=\"heart\" fallback-src=\"https://github.githubassets.com/images/icons/emoji/unicode/2764.png\">❤️</g-emoji></em></p>", 
    "body_text": "URL: https://www.prudential.com.sg/en/our-solutions/products/prushield/\nBrowser / Version: Firefox 59.0\nOperating System: Windows 7\nTested Another Browser: Yes\nProblem type: Design is broken\nDescription: Text rendering missing\nSteps to Reproduce:\nlayout.css.servo.enabled: true\n\nFrom webcompat.com with ❤️",

DOMParser

So indeed this doesn't show image download, you can query the html, but unfortunately you can't convert to text.

var myText = "<p><img src='http://localhost:5000/uploads/2019/2/36b4786d-da2a-41f6-9646-4662794c8aaa.jpeg'> </p>";
var myparser = new DOMParser();
var myfragment = myparser.parseFromString(myText , "text/html");
myfragment.body.innerHTML
// "<p><img src=\"http://localhost:5000/uploads/2019/2/36b4786d-da2a-41f6-9646-4662794c8aaa.jpeg\"> </p>"
myfragment.body.getElementsByTagName('p')
// HTMLCollection [ p ]
 myfragment.body.getElementsByTagName('img')
// HTMLCollection [ img ]
myfragment.body.textContent
// " "
myfragment.body.innerText
// " "

karlcow · 2019-02-25T00:44:42Z

@miketaylr

However, why don't we just fix this on the python side of things on issue creation and make what we want the real title? @karlcow is there any reason we don't do that now?

Let's rewind a bit the history of this:

we were asking for the title
then we added the possibility to get the description instead if it's there, and fallback on the title (cut at 75)
then we are trying to reparse the html body as text to get the description.

Now we have on the python side access to

The body_html "body_html": "…………"
the title "title": "www.prudential.com.sg - design is broken",

Some ideas:

We could parse the html, then extract the text on the python side. This will have a CPU cost for html parsing. Then convert to text. Then parse the text.
We could request the 3 bodies then extract the text directly on the python side. This will have a bytes cost for transfert. no need to convert to text. then parse the text.

karlcow · 2019-02-25T01:26:57Z

Ah I had missed about the context of @miketaylr asking
on issue creation and make what we want the real title?

We can for the future. :) But that doesn't change we need to address the past. :)
And that's partly already what is proposed in #2629. By creating a limit there, we can reuse that directly as a title.

karlcow · 2019-02-25T01:27:30Z

another funny thing, we could be less constrained if we had -webkit-line-clamp ;)

miketaylr · 2019-02-25T15:04:51Z

another funny thing, we could be less constraint if we had -webkit-line-clamp ;)

that's coming in the not too distant future. :)

marimeireles · 2019-02-26T17:46:05Z

Thanks for trying to parse the content on DOM @karlcow.
I'll try to extract the text on the python side!

marimeireles · 2019-02-26T18:42:05Z

This new change fixed the issue for me...
I forgot to erase one mime_type that was inside an if 🙈

marimeireles · 2019-02-26T18:42:40Z

Can you please test @miketaylr ? If it works for you. -when you have the time
Thanks!

miketaylr · 2019-02-26T18:46:33Z

Can you please test @miketaylr ? If it works for you. -when you have the time

Yep! Working here.

marimeireles · 2019-02-26T18:50:05Z

Oh, so. What do we do @karlcow ?
You still think it's a better idea to extract the text on python?
I think we could leave it as it is and we can fix this #2629 in a different way.

marimeireles · 2019-03-06T16:43:48Z

@karlcow does it look good to merge?

karlcow · 2019-03-07T07:26:26Z

Ah missed this one. Sorry. I will check tonight in ~12h.

karlcow

@marimeireles shortest 12h… review. I just decided to look at it anyway, before leaving here. And yes this is good. Thanks a lot @marimeireles

marimeireles added 2 commits February 22, 2019 12:38

Fixes 2811 - Fixes mime type of untriaged bugs

6a43c2b

Fixes 2811 - Fixes the mime type of issues by user

16e3912

marimeireles changed the title ~~Fixes 2811 - Fixes mime type of untriaged bugs~~ Fixes 2811 - Fixes mime type for some endpoints Feb 22, 2019

miketaylr changed the title ~~Fixes 2811 - Fixes mime type for some endpoints~~ Fixes #2811 - Fixes mime type for some endpoints Feb 22, 2019

miketaylr requested a review from karlcow February 22, 2019 19:47

marimeireles added the status: do not merge label Feb 22, 2019

karlcow mentioned this pull request Feb 25, 2019

Set a max characters number on Description #2629

Open

Fixes 2811 - Removes html mime type from endpoint

bb112d6

marimeireles removed the status: do not merge label Feb 27, 2019

karlcow approved these changes Mar 7, 2019

View reviewed changes

karlcow merged commit a1f1f1f into webcompat:master Mar 7, 2019

karlcow added status: add to changelog and removed status: add to changelog labels Mar 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #2811 - Fixes mime type for some endpoints #2812

Fixes #2811 - Fixes mime type for some endpoints #2812

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019 •

edited

Loading

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

karlcow commented Feb 25, 2019

karlcow commented Feb 25, 2019

karlcow commented Feb 25, 2019

karlcow commented Feb 25, 2019 •

edited

Loading

miketaylr commented Feb 25, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Feb 26, 2019

miketaylr commented Feb 26, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Mar 6, 2019

karlcow commented Mar 7, 2019 •

edited

Loading

karlcow left a comment

Fixes #2811 - Fixes mime type for some endpoints #2812

Fixes #2811 - Fixes mime type for some endpoints #2812

Conversation

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

marimeireles commented Feb 22, 2019 • edited Loading

marimeireles commented Feb 22, 2019

miketaylr commented Feb 22, 2019

miketaylr commented Feb 22, 2019

marimeireles commented Feb 22, 2019

karlcow commented Feb 25, 2019

DOMParser

karlcow commented Feb 25, 2019

karlcow commented Feb 25, 2019

karlcow commented Feb 25, 2019 • edited Loading

miketaylr commented Feb 25, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Feb 26, 2019

miketaylr commented Feb 26, 2019

marimeireles commented Feb 26, 2019

marimeireles commented Mar 6, 2019

karlcow commented Mar 7, 2019 • edited Loading

karlcow left a comment

Choose a reason for hiding this comment

marimeireles commented Feb 22, 2019 •

edited

Loading

karlcow commented Feb 25, 2019 •

edited

Loading

karlcow commented Mar 7, 2019 •

edited

Loading