Missing File::Info data. #8357


Open
didactic-drunk opened this issue Oct 20, 2019 · 26 comments

@didactic-drunk
Contributor

Crystal::System::FileInfo is missing public accessors. I'd like to create a PR to expose some or all of the information below.

Most of the data is cross platform with some exceptions. The list is not exhaustive.

Name        Platforms                                    Notes
atime       POSIX, Windows
ctime       POSIX, Windows
birth_time  DragonFly, FreeBSD, Linux, macOS, Windows
ino         POSIX, Windows*
dev         POSIX, ?                                     Is the volume serial number on Windows equivalent?
nlink       POSIX, Windows
blocks      POSIX                                        Not technically required by the spec, but implemented almost universally.
blksize     POSIX                                        Not technically required by the spec, but implemented almost universally.
flags       Linux, *BSD, macOS, ?

Suggested names:

  • access_time or last_access_time.
  • creation_time or birth_time.
  • change_time or metadata_change_time.
  • link_count.
  • io_block_size.
  • flags is used by something else. I don't know what to call it.

Currently I need ctime, birth_time, ino, dev, nlink and flags. I assume others may want the full stat structure as more non-web applications are written.

@ysbaddaden
Contributor

What are your use cases?

@didactic-drunk
Contributor Author

didactic-drunk commented Oct 21, 2019

Detecting hard links needs [ino, dev, nlink], and ideally birth_time, with ctime as a fallback. I also need ctime when displaying file info.

Also getting or setting flags like APPEND.

@RX14
Member

RX14 commented Oct 29, 2019

We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.

Exposing the number of links would be fine, given a use case.

Exposing more times is interesting: many filesystems don't support the birth time, and there are discrepancies between Windows and Linux on ctime and atime, IIRC. Go supports only modification time for this same reason.

Stat flags are already exposed.

The blocks and block size are useless on modern-day filesystems.

@j8r
Contributor

j8r commented Oct 30, 2019

st_size is also missing, to get the disk size.

@ysbaddaden
Contributor

File::Info#same_file? is nice, but maybe hard to find? There is the higher-level File.same?(a, b), but I had to dig through the docs to find it, and the documentation is lacking details. Just "compares the device and inode on UNIX to detect hard links" would be a nice addition.

I recently had a use case for atime: a disk cache with automated cleanup of files not accessed in N days (mounted with relatime on Linux, for 24h granularity).
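The cleanup rule above boils down to comparing each file's last access time against a cutoff. A minimal sketch, assuming a hypothetical `CachedFile` record whose `access_time` would come from an atime accessor that `File::Info` does not provide at the time of this discussion. Because relatime only refreshes atime roughly once a day on reads, the cutoff should be measured in days.

```crystal
# Hypothetical record: `access_time` stands in for an atime accessor
# that File::Info does not expose at the time of this discussion.
record CachedFile, path : String, access_time : Time

# Returns the paths of cache entries not accessed within `max_age`.
# With relatime, atime is refreshed at most about once per day on reads,
# so `max_age` should be several days rather than minutes.
def stale_entries(files : Array(CachedFile), max_age : Time::Span, now : Time = Time.utc) : Array(String)
  files.select { |f| now - f.access_time > max_age }.map(&.path)
end

now = Time.utc(2019, 10, 30)
files = [
  CachedFile.new("fresh.bin", now - 2.days),
  CachedFile.new("old.bin", now - 10.days),
]
stale_entries(files, 7.days, now) # => ["old.bin"]
```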

ctime is useful to detect that metadata changed, for example the file owner or permissions changed, but there was no writes so mtime didn't change; a backup application could use it.
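The backup case can be made concrete by comparing two snapshots of the same file: mtime moves on writes, while ctime also moves on metadata changes such as chmod or chown. A minimal sketch, assuming a hypothetical `Snapshot` record where `change_time` stands in for ctime, which `File::Info` does not expose here.

```crystal
# Hypothetical snapshot a backup tool would record per file;
# `change_time` stands in for ctime, which File::Info does not expose here.
record Snapshot, modification_time : Time, change_time : Time

enum FileChange
  Unchanged
  MetadataOnly # owner/permissions changed, no writes: ctime moved, mtime didn't
  Content      # data was written: mtime moved
end

def classify(before : Snapshot, after : Snapshot) : FileChange
  if after.modification_time != before.modification_time
    FileChange::Content
  elsif after.change_time != before.change_time
    FileChange::MetadataOnly
  else
    FileChange::Unchanged
  end
end

t0 = Time.utc(2019, 10, 30, 12, 0, 0)
before = Snapshot.new(t0, t0)
chmodded = Snapshot.new(t0, t0 + 5.minutes)
classify(before, chmodded) # => FileChange::MetadataOnly
```

A backup tool would then re-copy data only for `Content` and update stored metadata for `MetadataOnly`.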

Note that Go has non-portable direct access to the stat struct through FileInfo's Sys() method. See https://stackoverflow.com/a/55303743/199791 for an example. It's a nice solution: the FileInfo interface only covers high-level portable data, but Sys() allows applications to handle very specific use cases for hardly portable data (e.g. atime on a relatime-mounted device on a Linux target) that can be legitimate, but are too specific to bother with platform-agnostic methods.

@j8r
Contributor

j8r commented Oct 30, 2019

A platform-specific API should exist for this kind of non-portable, low-level operation, like Rust's std::os::unix and Go's unix package.
It may already be partly present in src/crystal/system/unix and src/crystal/system/win32, but these files aren't meant to be used as-is.

@didactic-drunk
Contributor Author

> Stat flags are already exposed.

No, they aren't. Crystal split mode into permissions and setuid/setgid/sticky, calling the latter flags.

I need what BSD systems refer to as flags or Linux as attributes (not to be confused with extended attributes). On BSD it's part of the stat structure in st_flags. On Linux it's available through one of the stat interfaces, but I don't remember how exactly.

https://en.wikipedia.org/wiki/Chattr

@didactic-drunk
Contributor Author

> We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.

My laptop has 3 million files. How do I find the hard links? I can't keep 3 million files open. Comparing every file against every other file is unacceptably slow, so that won't work either. I also can't compare incrementally, or between program runs by saving state. So no, this doesn't work for my use cases at all. I've mentioned what I need for porting one Ruby program.
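With dev and ino available, hard links can be found in a single pass by grouping files under a `(dev, ino)` key in a hash, instead of comparing every pair with `same_file?`; the key is also plain data that could be saved between runs. A minimal sketch, assuming a hypothetical `LinkEntry` record carrying the values a public accessor would provide (Crystal's `File::Info` does not expose them at the time of this discussion).

```crystal
# Hypothetical record: File::Info does not expose dev/ino/nlink here,
# so these values are assumed to come from a platform-specific source.
record LinkEntry, path : String, dev : UInt64, ino : UInt64, nlink : UInt64

# Groups paths sharing the same (dev, ino) pair, i.e. hard links to the
# same file. One hash lookup per entry gives O(n) work instead of O(n^2)
# pairwise same_file? calls, and no files need to stay open.
def hard_link_groups(entries : Array(LinkEntry)) : Array(Array(String))
  index = Hash({UInt64, UInt64}, Array(String)).new
  entries.each do |entry|
    # A file with a single link cannot have hard-linked siblings.
    next if entry.nlink < 2
    (index[{entry.dev, entry.ino}] ||= [] of String) << entry.path
  end
  # Keep only groups where more than one path was actually seen.
  index.values.select { |paths| paths.size > 1 }
end

entries = [
  LinkEntry.new("/data/a.txt", 1_u64, 42_u64, 2_u64),
  LinkEntry.new("/data/b.txt", 1_u64, 42_u64, 2_u64),
  LinkEntry.new("/data/c.txt", 1_u64, 43_u64, 1_u64),
  # Same inode number on another device: a different file.
  LinkEntry.new("/mnt/x.txt", 2_u64, 42_u64, 2_u64),
]
hard_link_groups(entries) # => [["/data/a.txt", "/data/b.txt"]]
```

Keying on the pair rather than ino alone matters because inode numbers are only unique per device.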

What is the solution? Either I make my own shard which monkey patches or duplicates the stat structure already available in src/crystal/system/unix and src/crystal/system/win32, or Crystal exposes things like atime/ctime/birth_time, whose meaning may vary between platforms.

mtime can vary. I've encountered NFS systems that return epoch mtime/ctime/atime for every file. Every reported time in the structure including mtime is not just platform specific but file system specific. Especially so when using network file systems.

I can handle the differences and expect to. On systems supporting birth time (which includes Windows) I use birth time. Otherwise I fall back to POSIX ctime and nlink. Both solutions need dev and ino.

Additional use cases that I need dev and ino for:

  • Detecting when crossing a mount point and running user defined triggers.
  • Comparing [dev, ino] with [birth_time, ctime, nlink] and between program runs.
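The mount-point trigger in the first bullet relies on the fact that a directory on another file system reports a different device ID than its parent. A minimal sketch, assuming a hypothetical `WalkNode` record whose `dev` stands in for st_dev; note that bind mounts of the same device would not be detected this way.

```crystal
# Hypothetical record: `dev` stands in for st_dev, which File::Info
# does not expose at the time of this discussion.
record WalkNode, path : String, dev : UInt64

# A child whose device ID differs from its parent's lives on another
# file system, so a tree walker could run user-defined triggers there.
def mount_points(parent : WalkNode, children : Array(WalkNode)) : Array(String)
  children.select { |child| child.dev != parent.dev }.map(&.path)
end

root = WalkNode.new("/", 1_u64)
children = [WalkNode.new("/home", 1_u64), WalkNode.new("/boot", 7_u64)]
mount_points(root, children) # => ["/boot"]
```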

Additional use cases:

  • Getting and setting flags similar to chflags/chattr.

Most of the flags could be handled by an enum. The ones I need, like immutable and append, are mostly portable. Linux's version attribute is an outlier with additional data; I have no need for it.
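The enum idea could look like a Crystal `@[Flags]` enum covering a portable subset, with per-platform code translating raw bits into it. A minimal sketch with a hypothetical `FileAttributes` enum and `from_linux_attributes` helper; the bit values are those defined in Linux's `<linux/fs.h>` (worth double-checking against your headers), and the BSD `st_flags` constants would need a separate mapping.

```crystal
# Hypothetical portable subset of BSD st_flags / Linux chattr attributes.
@[Flags]
enum FileAttributes
  NoDump
  Immutable
  AppendOnly
end

# Linux attribute bits from <linux/fs.h> (read via FS_IOC_GETFLAGS).
LINUX_FS_IMMUTABLE_FL = 0x10_u32
LINUX_FS_APPEND_FL    = 0x20_u32
LINUX_FS_NODUMP_FL    = 0x40_u32

# Maps raw Linux attribute bits onto the portable enum, ignoring
# bits that have no portable equivalent.
def from_linux_attributes(raw : UInt32) : FileAttributes
  attrs = FileAttributes::None
  attrs |= FileAttributes::NoDump if (raw & LINUX_FS_NODUMP_FL) != 0
  attrs |= FileAttributes::Immutable if (raw & LINUX_FS_IMMUTABLE_FL) != 0
  attrs |= FileAttributes::AppendOnly if (raw & LINUX_FS_APPEND_FL) != 0
  attrs
end

attrs = from_linux_attributes(0x30_u32)
attrs.includes?(FileAttributes::Immutable)  # => true
attrs.includes?(FileAttributes::AppendOnly) # => true
attrs.includes?(FileAttributes::NoDump)     # => false
```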

So how do I make this work in Crystal, considering that everything except birth_time has been available and working in Ruby for ~12 years?

@RX14
Member

RX14 commented Oct 31, 2019

> Note that Go has non-portable direct access to the stat struct through FileInfo Sys().

Yeah, this is the solution I prefer too: just expose a stat struct with an info.platform_specific.foo.

> No, they aren't. Crystal split mode into permissions and setuid/setgid/sticky, calling the latter flags.

My bad. Looks like this needs statx though, just like birth_time. I don't think we should bind statx, or use statx by default on Linux. A statx binding should be a shard, since the difference is only visible in the platform-specific members.

> My laptop has 3 million files. How do I find the hard links?

Thanks for the use case! I agree, indexing hard links is impossible with same_file?. We should expose the platform-specific members.

@didactic-drunk
Contributor Author

> Yeah, this is the solution I prefer too: just expose a stat struct with an info.platform_specific.foo.

So make info.stat public and document that it's platform-specific? That would solve almost every use case.

@ysbaddaden
Contributor

What about File::Info#raw? It would return @stat on UNIX and @file_attributes on Windows.
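A minimal sketch of the `raw` idea: a portable struct whose API is the same everywhere, plus one explicitly non-portable member chosen per target with `flag?`. All names (`RawInfo`, `SketchInfo`) are hypothetical; the Windows record is only loosely modeled on `BY_HANDLE_FILE_INFORMATION` and the UNIX one on a subset of `struct stat`, not on Crystal's actual internal layout.

```crystal
# All names are hypothetical; this is not Crystal's real File::Info.
{% if flag?(:win32) %}
  # Loosely modeled on BY_HANDLE_FILE_INFORMATION.
  record RawInfo, volume_serial_number : UInt32, file_index : UInt64, number_of_links : UInt32
{% else %}
  # Loosely modeled on a subset of struct stat.
  record RawInfo, dev : UInt64, ino : UInt64, nlink : UInt64
{% end %}

struct SketchInfo
  # Portable data: the same API on every platform.
  getter size : Int64
  getter modification_time : Time

  # Non-portable escape hatch: its fields differ per target, so code
  # reading them needs the same flag? guards.
  getter raw : RawInfo

  def initialize(@size : Int64, @modification_time : Time, @raw : RawInfo)
  end
end

{% unless flag?(:win32) %}
  info = SketchInfo.new(0_i64, Time.utc(2019, 10, 31), RawInfo.new(1_u64, 42_u64, 2_u64))
  # (dev, ino) identifies the underlying file for hard-link indexing.
  key = {info.raw.dev, info.raw.ino}
{% end %}
```

The trade-off is that code touching `raw` compiles only where its fields exist, which is the "won't compile on some platforms" behavior rather than a runtime exception.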

@RX14
Member

RX14 commented Oct 31, 2019

I prefer platform_specific - but I'd rather ensure everyone agrees with this approach (:+1: / :-1: this comment) before bikeshedding on that.

@asterite
Member

No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.

@didactic-drunk
Contributor Author

> No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.

Instead of handling it at compile time:

if stat.responds_to?(:birth_time)
  ...
else
  # ctime handler
end

I have to use exception handling at runtime:

begin
  stat.birth_time
rescue
  # ctime handler
end

@asterite How much overhead does that add when traversing file systems with > 10 million files? 100 million? They won't use spinning rust so seek times are not as much of a concern.

But it's not one exception handler. It's several: one for flags, another for ACLs, another for resource forks, another for extended attributes, plus anything I missed.

I'd much rather use feature checks than exception handling. To me the code is clearer. I know it's a platform feature check and only runs on specific platforms. With the exception handler am I handling an os error or the platform unsupported? It's even less clear when trying to understand someone else's code. What did they intend?
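For reference, this kind of compile-time feature check already works in Crystal when the per-platform types differ: `responds_to?` narrows the argument's type, and a branch that can never match is simply not typed. A minimal sketch with two hypothetical record types standing in for per-platform `File::Info` variants.

```crystal
# Hypothetical per-platform info types: only one of them has birth_time.
record BirthTimeInfo, birth_time : Time, ctime : Time
record CtimeOnlyInfo, ctime : Time

# For CtimeOnlyInfo the first branch can never match, so the compiler
# never types `info.birth_time` and no method_missing error occurs.
def creation_or_change_time(info) : Time
  if info.responds_to?(:birth_time)
    info.birth_time
  else
    info.ctime
  end
end

born = Time.utc(2019, 1, 1)
changed = Time.utc(2019, 6, 1)
creation_or_change_time(BirthTimeInfo.new(born, changed)) == born # => true
creation_or_change_time(CtimeOnlyInfo.new(changed)) == changed    # => true
```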

@asterite
Member

> How much overhead does that add when traversing file systems with > 10 million files? 100 million?

Exception handling doesn't add overhead unless an exception is raised (as far as I know).

Additionally, if a method doesn't raise (the compiler knows which methods raise), as is the case for the methods you use on an OS where they are available, then the compiler (or LLVM) will skip the entire exception handler. So zero overhead, really.

> I'd much rather use feature checks than exception handling.

The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.

@didactic-drunk
Contributor Author

> I'd much rather use feature checks than exception handling.
>
> The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.

@asterite If they forget a rescue, nobody can use that library on a different OS either. How is that different?

If anything it's worse. A clear compile-time error is turned into a possible runtime error. Do the specs test that part of the code? If not, the program appears to compile and function correctly but raises unhandled exceptions when run in the real world.

In both compile-time feature checking and exception handling:

  • They need additional code.
  • They need additional testing.
  • The code is nearly identical: if responds_to? vs rescue.

The difference is when the error occurs: at compile time or at runtime. Compile time is more robust.

@didactic-drunk
Contributor Author

> Additionally, if a method doesn't raise (the compiler knows which methods raise), as is the case for the methods you use on an OS where they are available, then the compiler (or LLVM) will skip the entire exception handler. So zero overhead, really.

@asterite No OS has all the features I'm checking. That means 2-3 exceptions on average, not zero overhead.

@straight-shoota
Member

When the methods raise on specific platforms, you can use conditional macro branches:

{% if flag?(:win32) %}
  stat.birth_time
{% else %}
  stat.ctime
{% end %}

This ensures only the non-raising methods are invoked on a platform.

Some of these features however are not even platform-specific but depend on the file system. In that case, the API should provide nilable getters to avoid exception overhead.

if birth_time = stat.birth_time?
  birth_time
elsif ctime = stat.ctime?
  ctime
end
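The nilable-getter pattern can be sketched end to end: `nil` means "not supported on this platform or file system", so nothing raises and the caller picks the fallback. A minimal sketch, assuming a hypothetical `NilableInfo` struct whose constructor takes the optional times directly; a real implementation would fill them from the OS.

```crystal
# Hypothetical info type with nilable getters: nil means "not supported
# on this platform or file system", so no exception is raised.
struct NilableInfo
  getter modification_time : Time

  def initialize(@modification_time : Time, @birth_time : Time? = nil, @change_time : Time? = nil)
  end

  def birth_time? : Time?
    @birth_time
  end

  def ctime? : Time?
    @change_time
  end
end

# Prefer birth time, fall back to ctime, or report nil if neither exists.
def birth_or_change_time(info : NilableInfo) : Time?
  info.birth_time? || info.ctime?
end

mtime = Time.utc(2019, 10, 31)
btime = Time.utc(2019, 1, 1)
birth_or_change_time(NilableInfo.new(mtime, birth_time: btime)) == btime # => true
birth_or_change_time(NilableInfo.new(mtime)).nil?                       # => true
```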

@didactic-drunk
Contributor Author

didactic-drunk commented Oct 31, 2019

> When the methods raise on specific platforms, you can use conditional macro branches:
>
> {% if flag?(:win32) %}
>   stat.birth_time
> {% else %}
>   stat.ctime
> {% end %}
>
> This ensures only the non-raising methods are invoked on a platform.

But that's worse than a feature flag!

What's the point of raising an exception? Without rescues the code won't run on unsupported platforms, exactly the opposite of what @asterite claims the exceptions are for.

You also duplicated code. Crystal already has to know which platforms are supported in order to raise. Instead of if responds_to?, every developer needs to figure out which platforms are supported and use if flag?(:platform) for every feature tested.

@asterite
Member

asterite commented Nov 1, 2019

It would be interesting to know how this is solved in Go, Java and other languages.

@ysbaddaden
Contributor

ysbaddaden commented Nov 1, 2019

@asterite as said above, Go has a platform agnostic interface (FileInfo) with the portable info that works everywhere the same, but also exposes the raw, system specific, info (FileInfo Sys()).

This is IMO an acceptable solution. We have a platform agnostic API that works everywhere for 99% of use cases, but allow the 1% remaining use cases to be implemented.

Also, I prefer a program that won't compile on some platforms (I must deal with it) over a program that silently compiles but raises NotImplementedError exceptions at runtime (useless).

It could just be that some methods don't exist for some targets, but I think the Go way to handle FileInfo is better, and makes the distinction between (not) portable API.

@didactic-drunk
Contributor Author

didactic-drunk commented Nov 1, 2019

Rather than do it the Java/Go way, I'd rather ask: "What would @asterite do?" @asterite had this great idea to take Ruby and add nil checking. Maybe he had the answer.

So I crossed the vast ocean, climbed the highest mountain and slept with the ugliest of mountain goats (it was cold... and lonely).

He wasn't there.

So I used the internet to ask the great sage: "Is a method that may or may not exist like the nil problem you originally solved? Could you treat it exactly like the compile-time nil checking already used?"

stat.ino # => Compile error: "Not available on all platforms. Check with .responds_to?"
if stat.responds_to?(:birth_time)
  stat.birth_time
else
  stat.ctime
end

Side note, if you use lambskin they think it's another goat.

And in a booming voice the great sage @asterite squeaked:

@asterite
Member

asterite commented Nov 1, 2019

My child.

You have come far in your journey and learned much.
You have served our cause with the truest faith.
Therefore I name you blessed and beloved.

The nilable approach suggested by @straight-shoota seems to be a good option: you ask it, but you have to check whether it's really supported. But maybe it depends on the API.

@didactic-drunk
Contributor Author

That's not quite what I had in mind, but I'll take it.

It'd be nice to have a clear split between platform if's and runtime nil checks.

# baz may return nil
if foo = bar.baz
  ...
end

# baz may not exist, depending on platform and OS version.
if foo = bar(.)(.)baz
  ...
end

The operator above is only an example. Feel free to change it. I prefer to check if the bar has (.)(.)'s.

@didactic-drunk
Contributor Author

I'd like to propose a standard annotation or other method to indicate platform variant behavior.

Advantages:

  • Documentation can show platform specific behavior in a standardized format.
  • Shards and applications can use the same annotation.
  • Code coverage tools can show red flags when missing platform specific handlers.
  • Code coverage tools can have different coverage settings for platform specific behavior. 95% for normal. 100% for cross platform.
  • Code coverage tools can turn off extra platform coverage errors when working on the target platform to get the current platform to 100%.
  • Or maybe just split the coverage % between current and variant platforms.

This should work regardless of whether using nil, responds_to?, or other methods.
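Crystal's user-defined annotations could carry this information today, and macros can read it back at compile time, which is what documentation or coverage tools would build on. A minimal sketch with a hypothetical `PlatformSpecific` annotation and `DemoInfo` type; the `platforms:` key and the helper method are illustrative, not an existing convention.

```crystal
# Hypothetical annotation marking methods whose availability varies by platform.
annotation PlatformSpecific
end

struct DemoInfo
  @[PlatformSpecific(platforms: ["linux", "freebsd", "darwin", "windows"])]
  def birth_time? : Time?
    nil
  end

  def size : Int64
    0_i64
  end
end

# Collects, at compile time, every DemoInfo method tagged
# @[PlatformSpecific] together with the platforms listed on it.
def platform_specific_methods : Hash(String, Array(String))
  result = {} of String => Array(String)
  {% for m in DemoInfo.methods %}
    {% ann = m.annotation(PlatformSpecific) %}
    {% if ann %}
      result[{{ m.name.stringify }}] = {{ ann[:platforms] }}
    {% end %}
  {% end %}
  result
end

platform_specific_methods["birth_time?"] # => ["linux", "freebsd", "darwin", "windows"]
```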

@HertzDevil
Contributor

HertzDevil commented Apr 28, 2023

It looks like atime and birthtime are available on all supported platforms already?

EDIT: birth time is available via statx on Linux. Not sure if WebAssembly exposes the same thing. I couldn't find any references to file creation time in DragonFly BSD's libc.

On Windows ctime is accessible in the Win32 API via FILE_BASIC_INFO using GetFileInformationByHandleEx. Ruby is definitely incorrect here as ftCreationTime is the birth time. Python is correct.
