feat: allow mime_type to be guessed for ByteStream #9563

Open
@kanenorman

Description

Is your feature request related to a problem? Please describe.
When using ByteStream.from_file_path, the mime type must be provided manually. Since it can often be inferred from the file, adding support for automatic detection would streamline usage.

Describe the solution you'd like
Add a guess_mime_type parameter to from_file_path. If it is True and mime_type is not provided, the method should infer the MIME type using a new helper method. (Similar behavior is already implemented in FileTypeRouter.)

Example:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(repr=False)
class ByteStream:
    ...

    @classmethod
    def from_file_path(
        cls,
        filepath: Path,
        mime_type: Optional[str] = None,
        meta: Optional[Dict[str, Any]] = None,
        guess_mime_type: bool = False,
    ) -> "ByteStream":
        if not mime_type and guess_mime_type:
            mime_type = cls._guess_mime_type(filepath)
        with open(filepath, "rb") as fd:
            return cls(data=fd.read(), mime_type=mime_type, meta=meta or {})

    @staticmethod
    def _guess_mime_type(path: Path) -> Optional[str]:
        ...
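
For illustration, here is one way the elided helper could be filled in. This is only a sketch under assumptions, not the proposed implementation: it assumes the standard-library mimetypes module (extension-based lookup) is an acceptable basis for the guess, and it uses a hypothetical stand-in class, ByteStreamSketch, reduced to the three fields the example touches so that it runs without Haystack installed.

```python
import mimetypes
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(repr=False)
class ByteStreamSketch:
    """Hypothetical stand-in for ByteStream, reduced to the fields used here."""

    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)
    mime_type: Optional[str] = None

    @classmethod
    def from_file_path(
        cls,
        filepath: Path,
        mime_type: Optional[str] = None,
        meta: Optional[Dict[str, Any]] = None,
        guess_mime_type: bool = False,
    ) -> "ByteStreamSketch":
        # An explicit mime_type always wins; guessing is strictly opt-in.
        if not mime_type and guess_mime_type:
            mime_type = cls._guess_mime_type(filepath)
        with open(filepath, "rb") as fd:
            return cls(data=fd.read(), mime_type=mime_type, meta=meta or {})

    @staticmethod
    def _guess_mime_type(path: Path) -> Optional[str]:
        # Assumption: extension-based lookup is good enough. The file content
        # is never inspected, and unknown extensions yield None.
        return mimetypes.guess_type(path.name)[0]
```

With this sketch, calling ByteStreamSketch.from_file_path(path, guess_mime_type=True) leaves mime_type as None when the extension is unknown, so existing callers that rely on the default of guess_mime_type=False would see no change in behavior.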

Describe alternatives you've considered
One alternative is to write an external utility function that guesses the MIME type and pass its result to ByteStream as an argument. This works, but it feels redundant; the logic could be abstracted into the ByteStream class itself.

Additional context
This logic exists in components like FileTypeRouter, but centralizing it in ByteStream would reduce duplication and simplify usage across the codebase (e.g., in get_bytestream_from_source). Similar issue: #7670.

Metadata

Labels

P2: Medium priority, add to the next sprint if no P1 available

Projects

Status

In Progress