Confusing "Upload is in an invalid state" error on GCS permission failure #858

Open
@nelhage

I attempted to write to a GCS object for which I didn't have permission, like so:

import smart_open

# path points at a GCS object that I don't have permission to write
with smart_open.smart_open(path, 'w') as fh:
    fh.write("hello\n")

On exit from the context manager, we have this chain of events:

  • We enter smart_open.utils.FileLikeProxy.__exit__ (source link)
  • That first calls super().__exit__(*args, **kwargs). The wrapped TextIOWrapper tries to flush/close the underlying buffer, which fails with an InvalidResponse 403 error
  • In the finally block, we then call self.__inner.__exit__. This is the same Google API object as self.__wrapped__.buffer, which we already tried to close above. Because that close failed, the object is not marked as closed, so it attempts to close again.
  • It notices the previous failure, and raises "Upload is in an invalid state. To recover call recover()" (source link)
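
The chain of events above can be reproduced without GCS. The following is only a minimal sketch: FakeInvalidResponse, FakeWriter and FakeProxy are hypothetical stand-ins, not the real smart_open or google-cloud-storage classes, and they model only the "failed close leaves the object open" behavior described in the bullets.

```python
class FakeInvalidResponse(Exception):
    """Hypothetical stand-in for the InvalidResponse raised on a 403."""


class FakeWriter:
    """Hypothetical stand-in for the GCS writer: a failed close leaves it open."""

    def __init__(self):
        self.closed = False
        self.invalid = False

    def close(self):
        if self.closed:
            return
        if self.invalid:
            # Second attempt: the upload remembers the earlier failure.
            raise ValueError(
                "Upload is in an invalid state. To recover call `recover()`."
            )
        # First attempt: the request is rejected, the upload is marked
        # invalid, and self.closed is never set to True.
        self.invalid = True
        raise FakeInvalidResponse("Request failed with status code", 403)

    def __exit__(self, *args):
        self.close()


class FakeProxy:
    """Hypothetical stand-in for FileLikeProxy: closes the same object twice."""

    def __init__(self, inner):
        self.inner = inner

    def __enter__(self):
        return self

    def __exit__(self, *args):
        try:
            self.inner.close()  # plays the role of super().__exit__()
        finally:
            self.inner.__exit__(*args)  # second close of the same object


def run():
    """Return the exception that escapes the with-block."""
    try:
        with FakeProxy(FakeWriter()):
            pass
    except ValueError as err:
        return err
```

Calling run() returns the ValueError, with the FakeInvalidResponse reachable only through its __context__ attribute, which mirrors the traceback below.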

The user is now presented with a very confusing ValueError about the inconsistent state of an implementation detail that is invisible to them, instead of the permission error. The InvalidResponse is preserved in the exception's __context__, but the result is still very confusing, especially for less experienced users.
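
Until this is addressed, a caller can dig the original error out of the exception chain. This is a rough sketch of one possible workaround, not something smart_open provides: root_cause and demo are hypothetical helper names, and PermissionError merely stands in for the real InvalidResponse.

```python
def root_cause(exc):
    """Follow __cause__/__context__ links back to the first exception raised."""
    seen = {id(exc)}
    while True:
        earlier = exc.__cause__ or exc.__context__
        # Stop at the end of the chain, or if the chain loops back on itself.
        if earlier is None or id(earlier) in seen:
            return exc
        seen.add(id(earlier))
        exc = earlier


def demo():
    """Mimic the failure shape: a second error raised while handling the first."""
    try:
        try:
            raise PermissionError("403")  # stand-in for InvalidResponse
        finally:
            raise ValueError("Upload is in an invalid state.")
    except ValueError as err:
        return root_cause(err)
```

Here demo() returns the PermissionError rather than the ValueError that actually escaped, which is the error the user needs to see.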

Example stack trace spew:

---------------------------------------------------------------------------
InvalidResponse                           Traceback (most recent call last)
File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:220, in FileLikeProxy.__exit__(self, *args, **kwargs)
    219 try:
--> 220     return super().__exit__(*args, **kwargs)
    221 finally:

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:207, in TextIOWrapper.__exit__(self, exc_type, exc_val, exc_tb)
    206 if exc_type is None:
--> 207     self.close()

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:437, in BlobWriter.close(self)
    436 if not self._buffer.closed:
--> 437     self._upload_chunks_from_buffer(1)
    438 self._buffer.close()

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:417, in BlobWriter._upload_chunks_from_buffer(self, num_chunks)
    416 for _ in range(num_chunks):
--> 417     upload.transmit_next_chunk(transport, **kwargs)
    419 # Wipe the buffer of chunks uploaded, preserving any remaining data.

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:515, in ResumableUpload.transmit_next_chunk(self, transport, timeout)
    513     return result
--> 515 return _request_helpers.wait_and_retry(
    516     retriable_request, self._get_status_code, self._retry_strategy
    517 )

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/_request_helpers.py:155, in wait_and_retry(func, get_status_code, retry_strategy)
    154 try:
--> 155     response = func()
    156 except _CONNECTION_ERROR_CLASSES as e:

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:511, in ResumableUpload.transmit_next_chunk.<locals>.retriable_request()
    507 result = transport.request(
    508     method, url, data=payload, headers=headers, timeout=timeout
    509 )
--> 511 self._process_resumable_response(result, len(payload))
    513 return result

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_upload.py:690, in ResumableUpload._process_resumable_response(self, response, bytes_sent)
    670 """Process the response from an HTTP request.
    671
    672 This is everything that must be done after a request that doesn't
   (...)
    688 .. _sans-I/O: https://sans-io.readthedocs.io/
    689 """
--> 690 status_code = _helpers.require_status_code(
    691     response,
    692     (http.client.OK, http.client.PERMANENT_REDIRECT),
    693     self._get_status_code,
    694     callback=self._make_invalid,
    695 )
    696 if status_code == http.client.OK:
    697     # NOTE: We use the "local" information of ``bytes_sent`` to update
    698     #       ``bytes_uploaded``, but do not verify this against other
   (...)
    703     #       * ``stream.tell()`` (relying on fact that ``initiate()``
    704     #         requires stream to be at the beginning)

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_helpers.py:108, in require_status_code(response, status_codes, get_status_code, callback)
    107         callback()
--> 108     raise common.InvalidResponse(
    109         response,
    110         "Request failed with status code",
    111         status_code,
    112         "Expected one of",
    113         *status_codes
    114     )
    115 return status_code

InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PERMANENT_REDIRECT: 308>)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 with smart_open.smart_open(path, 'w') as fh:
      2     fh.write("hello\n")

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/smart_open/utils.py:222, in FileLikeProxy.__exit__(self, *args, **kwargs)
    220     return super().__exit__(*args, **kwargs)
    221 finally:
--> 222     self.__inner.__exit__(*args, **kwargs)

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:437, in BlobWriter.close(self)
    435 def close(self):
    436     if not self._buffer.closed:
--> 437         self._upload_chunks_from_buffer(1)
    438     self._buffer.close()

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/cloud/storage/fileio.py:417, in BlobWriter._upload_chunks_from_buffer(self, num_chunks)
    415 # Upload chunks. The SlidingBuffer class will manage seek position.
    416 for _ in range(num_chunks):
--> 417     upload.transmit_next_chunk(transport, **kwargs)
    419 # Wipe the buffer of chunks uploaded, preserving any remaining data.
    420 self._buffer.flush()

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/requests/upload.py:503, in ResumableUpload.transmit_next_chunk(self, transport, timeout)
    424 def transmit_next_chunk(
    425     self,
    426     transport,
   (...)
    430     ),
    431 ):
    432     """Transmit the next chunk of the resource to be uploaded.
    433
    434     If the current upload was initiated with ``stream_final=False``,
   (...)
    501             does not match or is not available.
    502     """
--> 503     method, url, payload, headers = self._prepare_request()
    505     # Wrap the request business logic in a function to be retried.
    506     def retriable_request():

File ~/.pyenv/versions/3.11.11/lib/python3.11/site-packages/google/resumable_media/_upload.py:613, in ResumableUpload._prepare_request(self)
    611     raise ValueError("Upload has finished.")
    612 if self.invalid:
--> 613     raise ValueError(
    614         "Upload is in an invalid state. To recover call `recover()`."
    615     )
    616 if self.resumable_url is None:
    617     raise ValueError(
    618         "This upload has not been initiated. Please call "
    619         "initiate() before beginning to transmit chunks."
    620     )

ValueError: Upload is in an invalid state. To recover call `recover()`.
