Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #2659 - Make console logs in browser configuration details easier to read #3103

Merged
merged 7 commits into from
Dec 16, 2019

Conversation

ksy36
Copy link
Contributor

@ksy36 ksy36 commented Dec 3, 2019


from webcompat import app

console_logs = Blueprint('console_logs', __name__, url_prefix='/console_logs')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have a super strong opinion, but is there a reason you didn't just add support for uploading JSON content in the /uploads route? What are the advantages of having a new route to handle uploads of a different kind of file type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should probably make the code versatile enough so that it can handle images and json.So it would be probably better to have the uploads route (blueprint) to behave differently with the content-type. So depending on the Content-Type, there is a validation process depending probably on the content of the file, size, etc. Like we do already for images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I've moved console log saving to uploads blueprint, could you please take another look @miketaylr @karlcow

Copy link
Member

@karlcow karlcow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksy36 Thanks a lot for working on this.
So sorry for the delay in reviewing.
A couple of things in there and Probably a big change on making the Upload more generic.


from webcompat import app

console_logs = Blueprint('console_logs', __name__, url_prefix='/console_logs')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should probably make the code versatile enough so that it can handle images and json.So it would be probably better to have the uploads route (blueprint) to behave differently with the content-type. So depending on the Content-Type, there is a validation process depending probably on the content of the file, size, etc. Like we do already for images.

@@ -151,6 +151,11 @@ def test_contact(self):
rv = self.app.get('/contact')
self.assertEqual(rv.status_code, 200)

def test_console_logs(self):
"""Test that /console_logs exists."""
rv = self.app.get('/console_logs/11/20/1c0a7174-2c15-480f-9cee-e6183e6a781b') # noqa
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self, we probably needs to create a console_logs directory

Copy link
Member

@karlcow karlcow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot better. Wonderful @ksy36 for the nicely done job.
I will check tomorrow morning again to make extra-sure I didn'T miss anything and I will torture a bit the uploads.

Thanks 👩‍🚀 🚀 ⭐️

@miketaylr
Copy link
Member

miketaylr commented Dec 10, 2019

🎈 I'll deploy this to staging so we can play around with it.

(before merging)

@miketaylr
Copy link
Member

@ksy36 I filed a test issue on staging:

https://staging.webcompat.com/issues/2274

Screen Shot 2019-12-10 at 12 57 03 PM

@miketaylr
Copy link
Member

Screen Shot 2019-12-10 at 1 00 14 PM

Looks like we just need to sanitize the JSON, as it may contain HTML strings (and would be an XSS attack vector).

@karlcow
Copy link
Member

karlcow commented Dec 11, 2019

@ksy36

Looks like we just need to sanitize the JSON, as it may contain HTML strings (and would be an XSS attack vector).

so that's what I wanted to check but @miketaylr beat me to it. Cool!
my hunch was motivated by

def get_data_from_request(request):
if 'image' in request.files and request.files['image'].filename:
return True, request.files['image']
elif 'image' in request.form:
return True, request.form['image']
elif 'console_logs' in request.form:
return False, request.form['console_logs']
else:
return False, None

Instead of True, False here the function should probably report a type, be Image or JSON (These could be strings). To return this specific type we need to validate it, the validation criteria needs to make it clear what we accept or not.

Then there is the question that @miketaylr was pointing about sanitization. server side or client side?

@ksy36
Copy link
Contributor Author

ksy36 commented Dec 11, 2019

Instead of True, False here the function should probably report a type, be Image or JSON (These could be strings). To return this specific type we need to validate it, the validation criteria needs to make it clear what we accept or not.

For both of them validation happens later on, in their corresponding classes after an instance is created and before saving attempt. I wanted to separate these two paths (image/json) and have related methods contained within their classes.

So get_data_from_request method just meant to indicate which class should be used based on request contents. I'll change True/False to return type as string (image/json) as @karlcow suggests, since now it's a bit confusing.

And validation we have these:

For image (existing validation):

def to_image_object(self, imagedata):
'''Method to return a Pillow Image object from the raw imagedata.'''
try:
# Is this a file upload (i.e., issue form upload)?
if isinstance(imagedata, FileStorage):
return Image.open(imagedata)
# Is this a base64 encoded image (i.e., bug report screenshot)?
if (isinstance(imagedata, str) and
imagedata.startswith('data:image/')):
# Chop off 'data:image/.+;base64,' before decoding
imagedata = re.sub('^data:image/.+;base64,', '', imagedata)
# This will fix any incorrectly padded data.
imagedata = imagedata + '==='
return Image.open(io.BytesIO(base64.b64decode(imagedata)))
raise TypeError('TypeError: Not a valid image format')
except TypeError:
# Not a valid format
abort(415)

For json, it just checks that passed string is a valid JSON

def process_data(self, data):
try:
return json.loads(data)
except ValueError:
abort(400)

Maybe we could use something like https://python-jsonschema.readthedocs.io/en/latest/ to validate json schema.

What do you think @karlcow ?

@ksy36
Copy link
Contributor Author

ksy36 commented Dec 11, 2019

For sanitization part, @miketaylr mentioned https://api.jquery.com/text/ will escape HTML tags. So I've tried it and it works ok, I'll test it a bit more :)

@karlcow
Copy link
Member

karlcow commented Dec 11, 2019

So get_data_from_request method just meant to indicate which class should be used based on request contents. I'll change True/False to return type as string (image/json) as @karlcow suggests, since now it's a bit confusing.

Good idea.

For image (existing validation):

def to_image_object(self, imagedata):
'''Method to return a Pillow Image object from the raw imagedata.'''
try:
# Is this a file upload (i.e., issue form upload)?
if isinstance(imagedata, FileStorage):
return Image.open(imagedata)
# Is this a base64 encoded image (i.e., bug report screenshot)?
if (isinstance(imagedata, str) and
imagedata.startswith('data:image/')):
# Chop off 'data:image/.+;base64,' before decoding
imagedata = re.sub('^data:image/.+;base64,', '', imagedata)
# This will fix any incorrectly padded data.
imagedata = imagedata + '==='
return Image.open(io.BytesIO(base64.b64decode(imagedata)))
raise TypeError('TypeError: Not a valid image format')
except TypeError:
# Not a valid format
abort(415)

For json, it just checks that passed string is a valid JSON

def process_data(self, data):
try:
return json.loads(data)
except ValueError:
abort(400)

So for images we try to convert to a Pillow Object (and Pillow handles the validation by raising TypeError if it doesn't work)

Probably we should rename the process_data as process_json
The schema validation is an interesting idea, but I have the fear that we would put ourselves in an dependency maintenance nightmare. Aka each time devtools is changing the format we would have to update the schema. So probably not necessary right now, but open question.

Thanks a lot @ksy36

@ksy36
Copy link
Contributor Author

ksy36 commented Dec 11, 2019

So I'm thinking maybe a safer way for sanitization is to escape html tags before saving to the server. That way we don't have malicious code uploaded (in theory :)).

I've tried this and it seems to be working good given that the structure of json we get from the reporter is consistent:

    import html
    ....
    def process_json(self, data):
        try:
            d = json.loads(data)
            new_set = []
            for item in d:
                new_dict = {}
                for key, value in item.items():
                    if isinstance(value, list):
                        new_dict[key] = ', '.join(list(map(lambda x: html.escape(x), value)))
                    else:
                        new_dict[key] = html.escape(value)
                new_set.append(new_dict)
            return new_set
        except ValueError:
            abort(400)

So it converts strings like this

Loading failed for the <script> with source https://...ads.js.

To be:
Loading failed for the &lt;script&gt; with source https://...ads.js.

This involves some extra processing on the python side though. Do you have any thoughts on this @karlcow @miketaylr ?

@miketaylr
Copy link
Member

I agree escaping before we save to disk is a better idea - we don't want to accidentally regress the client side escape...

Looking at https://github.com/webcompat/webcompat.com/pull/3103/files#diff-fbcd0a4c752e94ffddefb108ac14160fR196

I found the following answer on stackoverflow: https://stackoverflow.com/a/27382959

Which points to htmlsafe_dumps which comes with the flask json module: https://github.com/pallets/flask/blob/1351d0a56580df36872b466eb245e7634c20dab5/src/flask/json/__init__.py#L264.

Can we just use that instead of dumps?

@ksy36
Copy link
Contributor Author

ksy36 commented Dec 15, 2019

Moving json rendering to the server (with jinja) helped with sanitization. So a <script> element, for example, is returned as &lt;script&gt; with no extra processing on our side. Also the js script is no longer needed :)

Copy link
Member

@miketaylr miketaylr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants