
[Feature] Make scoop info accept pipeline input as object-stream in the powershell spirit #4889


Open
hgkamath opened this issue Apr 26, 2022 · 6 comments

Comments

@hgkamath

hgkamath commented Apr 26, 2022

Bug Report

I filed this as a bug instead of a feature request because the gap is rather glaring. The general PowerShell philosophy (in contrast to that of Unix shells) is that output should be an object stream.

Current Behavior

There is no way to get a scoop info * object stream.
So I tried to work around it using scoop search, whose output is a text stream, perhaps a mix of console-host text and debug/error text.
Consider how long the following script takes: 33 minutes on my i7 Haswell laptop, with Scoop on an HDD.

PS C:\WINDOWS\system32> scoop search | Select-String -Pattern "^   " |  ConvertFrom-String | %{ scoop info $_.P2.ToString() } | %{ -join($_.Name, " : " , $_.Description ) }  > C:\tmpq\Downloads\scoop_search_op.txt
'extras' bucket:
'java' bucket:
'main' bucket:
PS C:\WINDOWS\system32>

The $_.P2.ToString() was required to avoid wrong-type errors, because app names that are a single character long come through as type System.Char instead of String.
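The single-character problem can also be sidestepped by not using ConvertFrom-String at all. The sketch below assumes the old text-mode scoop search output shown in this issue (bucket header lines, then app lines indented by spaces with the app name first). Get-AppNameFromSearchLine is a hypothetical helper, not a Scoop command; because -split always returns strings, a name like z stays a String.

```powershell
function Get-AppNameFromSearchLine {
    # Hypothetical helper (not part of Scoop): pulls the app name out of the
    # indented lines of the old text-mode 'scoop search' output.
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Line
    )
    process {
        # App lines start with indentation; bucket headers such as
        # "'main' bucket:" do not, so they are skipped.
        if ($Line -match '^\s{3}') {
            # -split always yields [string] elements, so a one-letter name
            # like 'z' stays a string instead of becoming a [char].
            ($Line.Trim() -split '\s+')[0]
        }
    }
}

# Sample lines shaped like the search output (assumed format).
$names = @("'main' bucket:", '    7zip (21.07)', '    z (1.2.0)' | Get-AppNameFromSearchLine)
```

Under the same assumption, the original scriptlet would become scoop search | Get-AppNameFromSearchLine | ForEach-Object { scoop info $_ }, with no .ToString() needed.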
The output file I was after is available for your perusal: scoop_search_op.txt
My motivation for writing the above was to introduce myself to the many apps/commands I had not heard of. There are so many apps/commands in Scoop (which is good!). When I run scoop update and see the list of updated manifests in a bucket for apps other than those I have installed, I'd like to know what those apps are, just to learn about them and see whether any might be useful to me. I could run scoop info manually for each such curiosity. I wonder whether there is an easy way to capture the scoop update output and run scoop info on each listed app. I'm unsure whether you'd see that as a sufficiently valuable feature addition.
The above script takes a long time to finish. The trouble is that PowerShell has to make a second command invocation, scoop info, for every app, which could be avoided if there were a way to get object-stream output.
scoop info does not accept * as an argument, i.e., there is no scoop info "*".
If scoop search / scoop info * output an object stream, and the objects had a suitable default ToString() rendering, then the output objects could be parsed more quickly and easily, and complex queries could be performed.
This could in theory apply to any scoop sub-command that has informative output.
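A rough sketch of the object-stream idea: one process reads every manifest in a bucket folder and emits one object per app. It assumes manifests are *.json files in a bucket directory (the default location is assumed to be something like ~\scoop\buckets\<bucket>\bucket; check your install). Get-BucketAppInfo is a hypothetical name, not a Scoop command, and it only reads fields stored in the manifest JSON (description, version, homepage, license), so no git lookups are involved.

```powershell
function Get-BucketAppInfo {
    # Hypothetical sketch, not Scoop's implementation: reads every manifest
    # in one bucket directory inside the current process and emits one
    # PSCustomObject per app instead of formatted text.
    param(
        [Parameter(Mandatory)][string]$BucketDir,
        [string]$BucketName = (Split-Path -Leaf $BucketDir)
    )
    Get-ChildItem -LiteralPath $BucketDir -Filter '*.json' -File | ForEach-Object {
        $manifest = Get-Content -LiteralPath $_.FullName -Raw | ConvertFrom-Json
        # The manifest file name (without .json) is the app name.
        [PSCustomObject]@{
            Name        = $_.BaseName
            Description = $manifest.description
            Version     = $manifest.version
            Bucket      = $BucketName
            Website     = $manifest.homepage
            License     = $manifest.license
        }
    }
}
```

Usage, assuming the default layout: Get-BucketAppInfo "$HOME\scoop\buckets\main\bucket" -BucketName main | Where-Object Description -match 'compress' | Select-Object Name, Version, Description. Since the output is objects, the usual Where-Object / Sort-Object / Format-Table filtering applies directly.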

Expected Behavior

An object stream can be processed more quickly by PowerShell. Objects can be manipulated through their methods and properties.

Additional context/output

PS C:\WINDOWS\system32> scoop info "*"
Compare-Version : Cannot process argument transformation on parameter 'ReferenceVersion'. Cannot convert value to type
System.String.
At C:\vol\scoop_01\SCOOP\apps\scoop\current\lib\core.ps1:353 char:68
+ ... utdated = ((Compare-Version -ReferenceVersion $status.version -Differ ...
+                                                   ~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Compare-Version], ParameterBindingArgumentTransformationException
    + FullyQualifiedErrorId : ParameterArgumentTransformationError,Compare-Version

WARN  error: The given path's format is not supported.
Could not find manifest for 'MixedRealityRuntime'.

An example of scoop info output:

PS C:\WINDOWS\system32> scoop info mc


Name        : mc
Description : Native GNU Midnight Commander for Win32
Version     : 4.8.27
Bucket      : extras
Website     : https://midnight-commander.org
License     : GPL-3.0-or-later
Updated at  : 3/7/2022 1:56:35 AM
Updated by  : github-actions[bot]
Binaries    : mc.exe
Shortcuts   : mc

Possible Solution

Perhaps an implementation of scoop info * could internally invoke scoop search and output a stream of scoop info objects, similar to what the script above does. This would be more performant.

System details

Windows version: [e.g. 7, 8, 10]
10

OS architecture: [e.g. 32bit, 64bit]
64bit

PowerShell version: [output of "$($PSVersionTable.PSVersion)"]

PS C:\WINDOWS\system32> $($PSVersionTable.PSVersion)

Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      19041  1645

Additional software: [(optional) e.g. ConEmu, Git]

Scoop Configuration

NA

@hgkamath hgkamath added the bug label Apr 26, 2022
@rashil2000
Member

The output of something like scoop info * would be an immensely large PSCustomObject array containing roughly 2000-3000 items, each similar to the above example. What is the possible use case for this?

@hgkamath
Author

hgkamath commented Apr 26, 2022

The use case is whenever a user wants to query or explore apps out of curiosity. It helps by
a) being faster
b) lending itself to complex query filtering in PowerShell scripts.
There is a similar desire to query the available apps in a bucket: #4852.
Users want web search for the same reason: #4627.
I'll admit it is a rare use case. Now that I have made a text file, I'll regenerate it the next time I want to explore, and I'd like to save time by not invoking scoop info one app at a time. Once enough time has passed and many new apps have been added, the text file will have become obsolete.
What slows the given script down isn't scoop search, which fetches the names of all 2700 apps in under 7 seconds. So it may not be the size, number, or reading of the manifest files that slows it down, but the fact that a new PowerShell process is invoked for each scoop info query.
So scoop info * may take less time if it runs as a function call within a single PowerShell process. This claim ought to be tested and measured.
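One way to test the process-spawn part of this claim in isolation is to time the same trivial work done in-process versus in a fresh PowerShell process per item. This is a hypothetical measurement sketch: the script block stands in for a per-app lookup, and N is kept small because every child process is slow to start.

```powershell
# Hypothetical measurement sketch: same trivial per-item work, done either
# in-process or by launching a fresh PowerShell process for every item.
$exe  = (Get-Process -Id $PID).Path   # path of the running powershell.exe / pwsh
$n    = 3                              # keep small: each child process is slow
$work = { param($i) "item $i" }        # stand-in for a per-app lookup

# N calls of a script block inside the current process.
$inProcess = Measure-Command {
    1..$n | ForEach-Object { & $work $_ } | Out-Null
}

# N separate child processes, one per item.
$outOfProcess = Measure-Command {
    1..$n | ForEach-Object { & $exe -NoProfile -Command "'item $_'" } | Out-Null
}

'{0:N0} ms in-process vs {1:N0} ms with one process per item' -f $inProcess.TotalMilliseconds, $outOfProcess.TotalMilliseconds
```

Whether scoop info actually starts a new process depends on what the scoop command resolves to: Get-Command scoop reports CommandType ExternalScript for a .ps1 script (which runs inside the current process) and Application for an .exe or .cmd file (which starts a new one).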

PS C:\WINDOWS\system32> Measure-Command -Expression {scoop search}
'extras' bucket:
'java' bucket:
'main' bucket:


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 6
Milliseconds      : 212
Ticks             : 62129437
TotalDays         : 7.19090706018518E-05
TotalHours        : 0.00172581769444444
TotalMinutes      : 0.103549061666667
TotalSeconds      : 6.2129437
TotalMilliseconds : 6212.9437

PS C:\WINDOWS\system32> scoop search | Measure-Object
'extras' bucket:
'java' bucket:
'main' bucket:


Count    : 2713
Average  :
Sum      :
Maximum  :
Minimum  :
Property :

P.S. 1: I was checking out https://rasa.github.io/scoop-directory/by-bucket#ScoopInstaller_Main . Awesome website.
It would be interesting if the table of apps in a bucket (ScoopInstaller/Main) could also be sorted by last-updated date, in addition to the current fixed alphabetical sort. This would match the scoop update output. Perhaps also a sort by license. This way, one could get to know the apps in order of how recently they were updated.

P.S. 2: Additionally, maybe expand the command-line syntax:

  • Expand the scoop info command-line syntax to
    1. accept a regexp and find all app names matching it: scoop info <app-regexp>
    2. recognize an optional bucket prefix: scoop info <bucket-regexp/app-regexp>, facilitating [Feature] List app manifests from user added/official buckets. #4852.
    3. accept multiple arguments: scoop info app1 <app2> <app3>
  • Alternatively, first improve scoop search to support regexps, a bucket prefix and multiple arguments, as just described. Then, to avoid duplicating the search functionality of scoop search, make scoop search output an object stream, and make scoop info accept that object stream piped in from scoop search. This way, the object stream from scoop search could be piped into scoop info or passed as an argument, and scoop info would in turn output the corresponding info objects. Being two PowerShell invocations, this could take about 7 × 2 = 14 seconds. For example:
    • scoop search <bucket-regexp/app-regexp> | scoop info
    • or scoop info -Input $(scoop search <bucket-regexp/app-regexp>)
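A sketch of how the pipeline part of this proposal could bind in PowerShell: a parameter that accepts both a piped string and the Name property of a piped object, plus an optional bucket/app prefix. Get-AppInfoSketch is hypothetical; it does not load any manifest, it only shows what would be looked up. Multiple apps arrive through the pipeline rather than as separate positional arguments.

```powershell
function Get-AppInfoSketch {
    # Hypothetical front end for a pipeline-aware 'scoop info': it only
    # resolves what would be looked up and emits one object per request.
    [CmdletBinding()]
    param(
        # Binds a piped string, or the Name property of a piped object.
        [Parameter(Mandatory, Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [string]$Name,
        # Binds the Bucket property of a piped object, if it has one.
        [Parameter(ValueFromPipelineByPropertyName)]
        [string]$Bucket
    )
    process {
        $app = $Name
        $bucketName = $Bucket
        # Optional 'bucket/app' prefix, as proposed for 'scoop info <bucket/app>'.
        if ($Name -match '^(?<bucket>[^/]+)/(?<app>.+)$') {
            $bucketName = $Matches['bucket']
            $app = $Matches['app']
        }
        # A real implementation would locate and parse the manifest here.
        [PSCustomObject]@{ Bucket = $bucketName; Name = $app }
    }
}
```

For example, 'main/7zip', 'git' | Get-AppInfoSketch binds the strings by value, while an object such as [PSCustomObject]@{ Name = 'mc'; Bucket = 'extras' } binds by property name. A future scoop search | scoop info would work the same way, assuming search emitted objects with Name (and Bucket) properties.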

@rashil2000
Member

The command scoop info xxx takes 7-8 seconds on a cold cache. A typical user will have around 2500 manifests (main + extras). Summing that up gives well over 4 hours (2500 × 7 s ≈ 4.9 hours). I still don't see a use case for scoop info *, but it is clearly impractical.

But I do like the idea of converting the output of scoop search into PowerShell object(s). I've recently been refactoring the codebase to return PSCustomObjects wherever possible (already implemented for info and list), and search was next on my list. I haven't started on it yet (and I'm not sure when I will), but if you want to take a stab at this, please do! I want it to happen myself.

@hgkamath
Author

hgkamath commented Apr 27, 2022

Definitely not 4 hours. The unoptimized scriptlet in the description took about 33 minutes. The gain perhaps comes from executable caching, directory-listing caching, or file-system caching. So the run time of a single-invocation script should fall between 7 seconds and 33 minutes. If I were to guess, it would be around 30 seconds. (Edit: the guess turned out to be wrong, which raised more questions, but 15 minutes is better than 33 minutes, i.e., a saving of 18 minutes.)

This was my thinking: I have 3 buckets (main, extras, java), so around 2700 app manifests. In the given scriptlet, in %{ scoop info $_.P2 }, % is an alias for ForEach-Object, and my assumption was that PowerShell evaluates the expression inside the braces as a new command invocation, as bash would, rather than as an internal call. Unlike Linux, Windows is slow at starting new processes/subshells. A single process reading 2700 manifest files would be faster than 2700 processes each reading a single manifest file.

I experimented and took some measurements:

  • I first made a new file listing the apps, one app name per line.
    Get-Content C:\tmpq\Downloads\scoop_search_op.txt | ConvertFrom-String | % { $_.P1.ToString() } | ?{$_.Trim() -ne "" } > C:\tmpq\Downloads\scoop_app_list.txt
  • Then I added a loop inside scoop-info.ps1 that iterates $app over the app list, ignoring the given argument:
    PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> C:\vol\scoop_01\scoopg\apps\git\current\usr\bin\diff.exe .\scoop-info_orig.ps1 .\scoop-info.ps1
    17c17,18
    <
    ---
    > Get-Content -Path 'C:\tmpq\Downloads\scoop_app_list.txt' | % {
    > $app = "$_"
    179c180
    <
    ---
    > }
  • The command works as expected, but took 15 minutes:
    PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-Command -Expression {scoop info 7zip}
    
    
    Days              : 0
    Hours             : 0
    Minutes           : 14
    Seconds           : 47
    Milliseconds      : 981
    Ticks             : 8879816015
    TotalDays         : 0.0102775648321759
    TotalHours        : 0.246661555972222
    TotalMinutes      : 14.7996933583333
    TotalSeconds      : 887.9816015
    TotalMilliseconds : 887981.6015
    
    
  • I also observed in Task Manager that there were only two powershell.exe processes, corresponding to the two windows I had open, so it wasn't spawning multiple processes.
  • I ran the original scriptlet and again saw no new PowerShell processes in Task Manager. The original scriptlet takes 33 minutes. The errors here are perhaps due to blank lines. [EDIT] This is fixed; the errors were due to single-character app names being typed as System.Char (see the p.s.).
    PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-Command -Expression {scoop search | Select-String -Pattern "^   " |  ConvertFrom-String | %{ scoop info $_.P2.ToString() } | %{ -join($_.Name, " : " , $_.Description ) }}
    'extras' bucket:
    'java' bucket:
    'main' bucket:
    
    
    Days              : 0
    Hours             : 0
    Minutes           : 32
    Seconds           : 49
    Milliseconds      : 330
    Ticks             : 19693305775
    TotalDays         : 0.0227931779803241
    TotalHours        : 0.547036271527778
    TotalMinutes      : 32.8221762916667
    TotalSeconds      : 1969.3305775
    TotalMilliseconds : 1969330.5775
    
    
  • Perhaps PowerShell avoids creating subprocesses here, or Task Manager is not catching short-lived processes. Resource Monitor shows that the PowerShell processes have about 14 threads each, but the one running the script has higher CPU utilization. The experiment did show that there is an invocation overhead; avoiding it saved 18 minutes.
  • I conclude that scoop info is too slow for what it does and needs to be profiled. As far as I can tell, all it does is load a manifest file and print some properties. Why should printing a little text from 2700 files take 15 minutes, i.e., only 3 files per second? Perhaps the algorithm for locating the manifest/bucket, or the JSON parsing, is inefficient.
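A first profiling step could isolate the "read file + parse JSON" cost suspected above from everything else scoop info does. This is a hypothetical helper, not a Scoop command; it times only Get-Content and ConvertFrom-Json over every *.json file in one directory and makes no git calls.

```powershell
function Measure-ManifestParse {
    # Hypothetical profiling helper: times only "read file + parse JSON" for
    # every *.json manifest in one directory. No git calls are made, so the
    # result isolates the manifest-parsing cost.
    param([Parameter(Mandatory)][string]$BucketDir)

    $files = @(Get-ChildItem -LiteralPath $BucketDir -Filter '*.json' -File)
    $elapsed = Measure-Command {
        foreach ($f in $files) {
            $null = Get-Content -LiteralPath $f.FullName -Raw | ConvertFrom-Json
        }
    }
    $seconds = $elapsed.TotalSeconds
    # Guard against a zero-length measurement on tiny inputs.
    $rate = if ($seconds -gt 0) { $files.Count / $seconds } else { $null }
    [PSCustomObject]@{ Files = $files.Count; Seconds = $seconds; FilesPerSecond = $rate }
}
```

Pointed at a real bucket folder (for example, assuming the default layout, Measure-ManifestParse "$HOME\scoop\buckets\main\bucket"), a FilesPerSecond figure far above 3 would indicate that parsing is not the bottleneck.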

EDIT:
p.s. I edited some of the previous code pastes to fix and remove the following error, which appeared 4 times:

Method invocation failed because [System.Char] does not contain a method named 'startswith'.
At C:\vol\scoop_01\SCOOP\apps\scoop\current\lib\getopt.ps1:34 char:12
+         if($arg.startswith('--')) {
+            ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : MethodNotFound

This happened because there are 4 apps whose names are single letters: z, q, r and v. The ConvertFrom-String cmdlet converts them to type System.Char, and scoop info choked on the System.Char argument (getopt.ps1 calls .startswith(), which System.Char does not have). So I used $_.P2.ToString() to force conversion back to String.

@rashil2000
Member

The bulk of the time taken by scoop info is spent in the git subprocess (for fetching the last author and date), not in PowerShell reading the manifest slowly.
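The fix this points to is to compute the git-derived fields only on request. The sketch below is hypothetical (Get-ManifestSummary and its -Detailed switch are illustrative names, not Scoop's code or flags); it reads the cheap fields from the manifest every time and runs a single git log call only when -Detailed is passed.

```powershell
function Get-ManifestSummary {
    # Hypothetical sketch (not Scoop's code): cheap fields always, the
    # git-derived fields only when -Detailed is passed.
    param(
        [Parameter(Mandatory)][string]$ManifestPath,
        [switch]$Detailed
    )
    $manifest = Get-Content -LiteralPath $ManifestPath -Raw | ConvertFrom-Json
    $info = [ordered]@{
        Name        = [System.IO.Path]::GetFileNameWithoutExtension($ManifestPath)
        Description = $manifest.description
        Version     = $manifest.version
    }
    if ($Detailed) {
        # One git subprocess per manifest: cheap once, expensive for ~2700 apps.
        $dir  = Split-Path -Parent $ManifestPath
        $leaf = Split-Path -Leaf $ManifestPath
        # %aI = author date (ISO 8601), %an = author name of the last commit.
        $last = git -C $dir log -1 '--format=%aI|%an' $leaf
        if ($last) {
            $date, $author = $last -split '\|', 2
            $info['Updated at'] = $date
            $info['Updated by'] = $author
        }
    }
    [PSCustomObject]$info
}
```

Listing thousands of apps would then call it without -Detailed, while a single-app lookup could still ask for the git fields.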

@hgkamath
Author

hgkamath commented Apr 27, 2022

I moved the git-using chunks of code that determine the properties 'Updated at', 'Updated by' and Installed inside if ($verbose) { } guards so that they aren't evaluated, and sure enough ...

... Finishes in 37 seconds as expected.

:
:
Name        : zstd
Description : High compression ratios compression algorithm
Version     : 1.5.2
Bucket      : main
Website     : https://facebook.github.io/zstd
License     : BSD-3-Clause
Binaries    : zstd.exe

PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-command -Expression { scoop info 7zip }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 36
Milliseconds      : 631
Ticks             : 366311764
TotalDays         : 0.000423971949074074
TotalHours        : 0.0101753267777778
TotalMinutes      : 0.610519606666667
TotalSeconds      : 36.6311764
TotalMilliseconds : 36631.1764
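Converting the three timings in this thread into per-app throughput makes the git cost easy to see. This assumes roughly 2713 apps per run (the scoop search count above; the app list used in the later runs may differ slightly):

```latex
\begin{aligned}
\text{original scriptlet:}\quad & \frac{2713}{1969.3\ \text{s}} \approx 1.38\ \text{apps/s} \\
\text{loop inside scoop-info.ps1:}\quad & \frac{2713}{888.0\ \text{s}} \approx 3.06\ \text{apps/s} \\
\text{same loop, git calls skipped:}\quad & \frac{2713}{36.6\ \text{s}} \approx 74\ \text{apps/s}
\end{aligned}
```

Skipping the git calls made the loop roughly 24 times faster (888.0 / 36.6 ≈ 24).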

From here on, what to do (add an option, generate a static JSON, etc.) is your decision.

@rashil2000 rashil2000 changed the title [Bug] Ensure scoop info * outputs an object-stream in the powershell spirit [Feature] Ensure scoop info * outputs an object-stream in the powershell spirit Jun 10, 2022
@rashil2000 rashil2000 changed the title [Feature] Ensure scoop info * outputs an object-stream in the powershell spirit [Feature] Make scoop info accept pipeline input as object-stream in the powershell spirit Jun 18, 2022
Projects
None yet
Development

No branches or pull requests

2 participants