Commit 86ea726

Refactor: CLI and package support with v2
1 parent 2f687c7 commit 86ea726

8 files changed: +648 -328 lines changed

LICENSE (+21)
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 003random
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md (+123 -92)
@@ -1,109 +1,140 @@
-# GetJS
-[![License](https://img.shields.io/badge/license-MIT-_red.svg)](https://opensource.org/licenses/MIT)
-[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/003random/getJS/issues)
+<h2 align="center">JavaScript Extraction CLI & Package</h2>
+<p align="center">
+<a href="https://pkg.go.dev/github.com/003random/getJS">
+<img src="https://pkg.go.dev/badge/github.com/003random/getJS">
+</a>
+<a href="https://github.com/003random/getJS/releases">
+<img src="https://img.shields.io/github/release/003random/getJS.svg">
+</a>
+<a href="https://github.com/003random/getJS/blob/master/LICENSE">
+<img src="https://img.shields.io/badge/license-MIT-blue.svg">
+</a>
+</p>
 
-getJS is a tool to extract all the javascript files from a set of given urls.
 
-The urls can also be piped to getJS, or you can specify a singel url with the -url argument. getJS offers a range of options,
+[getJS](https://github.com/003random/getJS) is a versatile tool designed to extract JavaScript sources from web pages. It offers both a command-line interface (CLI) for straightforward URL processing and a package interface for more customized integrations.
 
-varying from completing the urls, to resolving the files.
+## Table of Contents
 
-## Prerequisites
+- [Installation](#installation)
+- [CLI Usage](#cli-usage)
+  - [Options](#options)
+  - [Examples](#examples)
+- [Package Usage](#package-usage)
+  - [Importing the Extractor](#importing-the-extractor)
+  - [Example](#example)
+- [Version Information](#version-information)
+- [Contributing](#contributing)
+- [License](#license)
 
-Make sure you have [GO](https://golang.org/) installed on your system.
+## Installation
 
-### Installing
+To install `getJS`, use the following command:
 
-getJS is written in GO. You can install it with `go get`:
+`go get github.com/003random/getJS`
 
-```
-go install github.com/003random/getJS@latest
-```
+## CLI Usage
 
-# Usage
-Note: When you supply urls from different sources, e.g. with stdin and an input file, it will add all the urls together :)
-Example: `echo "https://github.com" | getJS --url https://example.com --input domains.txt`
-
-To get all options, do:
-```bash
-getJS -h
-```
-
-
-| Flag | Description | Example |
-|------|-------------|---------|
-| --url | The url to get the javascript sources from | getJS --url https://poc-server.com |
-| --method | The request method. e.g. POST or GET. Default: "GET"| getJS --url https://poc-server.com --method POST |
-| --timeout | The request timeout. Default: 10 (secs) | getJS --url https://poc-server.com --timeout 15 |
-| --insecure | Skip SSL certificate verification. Use when the cert is expired or invalid | getJS --url https://poc-server.com --insecure |
-| --header | Custom request header(s) | getJS --url https://poc-server.com --header "Authorization: Bearer token" |
-| --input | Input file with urls | getJS --input domains.txt |
-| --output | The file where to save the output to | getJS --output output.txt |
-| --verbose | Display info of what is going on | getJS --verbose |
-| --complete | Complete the urls. e.g. /js/index.js -> htt<span></span>ps://example.<span></span>com/js/index.js | getJS --complete |
-| --resolve | Resolve the output and filter out the non existing files (Can only be used in combination with --complete) | getJS --complete --resolve |
-| --nocolors | Don't color the output | getJS --nocolors |
-
-## Examples
-
-![screenshot](https://poc-server.com/getJS/screenshot_.png)
-
-
-getJS supports stdin data. To pipe urls to getJS, use the following:
-
-```bash
-$ cat domains.txt | getJS
-```
-
-To save the js files, you can use:
-```bash
-$ getJS --complete --url https://poc-server.com | xargs wget
+### Options
+
+`getJS` provides several command-line options to customize its behavior:
+
+- `-url string`: The URL from which JavaScript sources should be extracted.
+- `-input string`: Optional URLs input files. Each URL should be on a new line in plain text format. Can be used multiple times.
+- `-output string`: Optional output file where results are written to. Can be used multiple times.
+- `-complete`: Complete/Autofill relative URLs by adding the current origin.
+- `-resolve`: Resolve the JavaScript files. Can only be used in combination with `--complete`.
+- `-threads int`: The number of processing threads to spawn (default: 2).
+- `-verbose`: Print verbose runtime information and errors.
+- `-method string`: The request method used to fetch remote contents (default: "GET").
+- `-header string`: Optional request headers to add to the requests. Can be used multiple times.
+- `-timeout duration`: The request timeout while fetching remote contents (default: 5s).
+
+### Examples
+
+#### Extracting JavaScript from a Single URL
+
+`getJS -url https://destroy.ai`
+
+or
+
+`curl https://destroy.ai | getJS`
+
+#### Using Custom Request Options
+
+`getJS -url "http://example.com" -header "User-Agent: foo bar" -method POST --timeout=15s`
+
+#### Processing Multiple URLs from a File
+
+`getJS -input foo.txt -input bar.txt`
+
+#### Saving Results to an Output File
+
+`getJS -url "http://example.com" -output results.txt`
+
+## Package Usage
+
+### Importing the Extractor
+
+To use `getJS` as a package, you need to import the `extractor` package and utilize its functions directly.
+
+### Example
+
+```Go
+package main
+
+import (
+	"fmt"
+	"log"
+	"net/http"
+	"net/url"
+
+	"github.com/003random/getJS/extractor"
+)
+
+func main() {
+	baseURL, err := url.Parse("https://google.com")
+	if (err != nil) {
+		log.Fatalf("Error parsing base URL: %v", err)
+	}
+
+	resp, err := extractor.FetchResponse(baseURL.String(), "GET", http.Header{})
+	if (err != nil) {
+		log.Fatalf("Error fetching response: %v", err)
+	}
+	defer resp.Body.Close()
+
+	// Custom extraction points (optional).
+	extractionPoints := map[string][]string{
+		"script": {"src", "data-src"},
+		"a": {"href"},
+	}
+
+	sources, err := extractor.ExtractSources(resp.Body, extractionPoints)
+	if (err != nil) {
+		log.Fatalf("Error extracting sources: %v", err)
+	}
+
+	// Filtering and extending extracted sources.
+	filtered, err := extractor.Filter(sources, extractor.WithComplete(baseURL), extractor.WithResolve())
+	if (err != nil) {
+		log.Fatalf("Error filtering sources: %v", err)
+	}
+
+	for source := range filtered {
+		fmt.Println(source.String())
+	}
+}
 ```
-
-If you would like the output to be in JSON format, you can combine it with [@Tomnomnom's](https://github.com/tomnomnom) [toJSON](https://github.com/tomnomnom/hacks/tree/master/tojson):
-```bash
-$ getJS --url https://poc-server.com | tojson
-```
-
-To feed urls from a file use:
-```bash
-$ getJS --input domains.txt
-```
-
-To save the results to a file, and don't display anything, use:
-```bash
-$ getJS --url https://poc-server.com --output results.txt
-```
-
-If you want to have a list of full urls as output use:
-```bash
-$ getJS --url domains.txt -complete
-```
-
-If you want to only show the existing js files, use:
-```bash
-$ getJS --url domains.txt --complete --resolve
-```
-
-## Built With
-
-* [GO](http://golang.org/) - GOlanguage
-* [Goquery](https://github.com/PuerkitoBio/goquery) - HTML parser with syntaxes like jquery, in GO
 
+## Version Information
+
+This is the v2 version of `getJS`. The original version can be found under the tag [v1](https://github.com/003random/getJS/tree/v1).
 
 ## Contributing
 
-You are free to submit any issues and/or pull requests :)
+Contributions are welcome! Please open an issue or submit a pull request for any bugs, feature requests, or improvements.
 
 ## License
 
-This project is licensed under the MIT License.
-
-## Acknowledgments
-
-* [@jimen0](https://github.com/jimen0) for helping getting me started with GO
-
-
----
-
-*This is my first tool written in GO. I created it to learn the language more. (useful feeback is always welcome!)*
+This project is licensed under the MIT License. See the [LICENSE](https://github.com/003random/getJS/blob/master/LICENSE) file for details.
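
Review note on the README's `-complete` option ("Complete/Autofill relative URLs by adding the current origin"): the sketch below shows how such completion behaves when done with the standard library's `net/url`, which is also what `complete` in `extractor/extractor.go` relies on. The function name `completeURL` and the example URLs are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// completeURL is a hypothetical helper: absolute sources are returned
// unchanged, everything else is resolved against the base page URL.
func completeURL(source, base string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	s, err := url.Parse(source)
	if err != nil {
		return "", err
	}
	if s.IsAbs() {
		return s.String(), nil
	}
	return b.ResolveReference(s).String(), nil
}

func main() {
	base := "https://example.com/docs/page.html" // assumed page URL
	sources := []string{
		"/js/index.js",               // root-relative
		"app.js",                     // relative to the page's directory
		"//cdn.example.org/lib.js",   // scheme-relative
		"https://other.example/x.js", // already absolute
	}
	for _, src := range sources {
		out, err := completeURL(src, base)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(src, "->", out)
	}
}
```

Note that a path-relative source such as `app.js` resolves against the page's directory rather than the bare origin, so the result depends on the full base URL passed in, not only on the host.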

extractor/extractor.go (+133)
@@ -0,0 +1,133 @@
+package extractor
+
+import (
+	"fmt"
+	"io"
+	"log"
+	"net/http"
+	"net/url"
+
+	"github.com/PuerkitoBio/goquery"
+)
+
+// ExtractionPoints defines the default HTML tags and their attributes from which JavaScript sources are extracted.
+var ExtractionPoints = map[string][]string{
+	"script": {"src", "data-src"},
+}
+
+// FetchResponse fetches the HTTP response for the given URL.
+func FetchResponse(u string, method string, headers http.Header) (*http.Response, error) {
+	req, err := http.NewRequest(method, u, nil)
+	if err != nil {
+		return nil, err
+	}
+
+	req.Header = headers
+
+	return http.DefaultClient.Do(req)
+}
+
+// ExtractSources extracts all JavaScript sources found in the provided HTTP response reader.
+// The optional extractionPoints can be used to overwrite the default extraction points map
+// with a set of HTML tag names, together with a list of what attributes to extract from.
+func ExtractSources(input io.Reader, extractionPoints ...map[string][]string) (<-chan url.URL, error) {
+	doc, err := goquery.NewDocumentFromReader(input)
+	if err != nil {
+		return nil, err
+	}
+
+	var (
+		urls   = make(chan url.URL)
+		points = ExtractionPoints
+	)
+
+	if len(extractionPoints) > 0 {
+		points = extractionPoints[0]
+	}
+
+	go func() {
+		defer close(urls)
+		for tag, attributes := range points {
+			doc.Find(tag).Each(func(i int, s *goquery.Selection) {
+				for _, a := range attributes {
+					if value, exists := s.Attr(a); exists {
+						u, err := url.Parse(value)
+						if err != nil {
+							log.Println(fmt.Errorf("invalid attribute value %s cannot be parsed to a URL: %w", value, err))
+							continue
+						}
+
+						urls <- *u
+					}
+				}
+			})
+		}
+	}()
+
+	return urls, nil
+}
+
+// Filter applies options to filter URLs from the input channel.
+func Filter(input <-chan url.URL, options ...func([]url.URL) []url.URL) (<-chan url.URL, error) {
+	output := make(chan url.URL)
+	go func() {
+		defer close(output)
+		var urls []url.URL
+		for u := range input {
+			urls = append(urls, u)
+		}
+
+		for _, option := range options {
+			urls = option(urls)
+		}
+
+		for _, u := range urls {
+			output <- u
+		}
+	}()
+	return output, nil
+}
+
+// WithComplete is an option to complete relative URLs.
+func WithComplete(base *url.URL) func([]url.URL) []url.URL {
+	return func(urls []url.URL) []url.URL {
+		var result []url.URL
+		for _, u := range urls {
+			result = append(result, complete(u, base))
+		}
+		return result
+	}
+}
+
+// WithResolve is an option to filter URLs that resolve successfully.
+func WithResolve() func([]url.URL) []url.URL {
+	return func(urls []url.URL) []url.URL {
+		var result []url.URL
+		for _, u := range urls {
+			if resolve(u) {
+				result = append(result, u)
+			}
+		}
+		return result
+	}
+}
+
+// complete completes relative URLs by adding the base URL.
+func complete(source url.URL, base *url.URL) url.URL {
+	if source.IsAbs() {
+		return source
+	}
+	return *base.ResolveReference(&source)
+}
+
+// resolve checks if the provided URL resolves successfully.
+func resolve(source url.URL) bool {
+	resp, err := http.Get(source.String())
+	if err != nil {
+		return false
+	}
+	defer resp.Body.Close()
+
+	_, err = io.Copy(io.Discard, resp.Body)
+	return err == nil && (resp.StatusCode >= http.StatusOK && resp.StatusCode < http.StatusMultipleChoices)
+}

go.mod (+10)
@@ -0,0 +1,10 @@
+module github.com/003random/getJS/v2
+
+go 1.22
+
+require github.com/PuerkitoBio/goquery v1.8.1
+
+require (
+	github.com/andybalholm/cascadia v1.3.1 // indirect
+	golang.org/x/net v0.7.0 // indirect
+)
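
Review note on the module path: `go.mod` declares `github.com/003random/getJS/v2`, while the README in this commit installs and imports from `github.com/003random/getJS` without the `/v2` suffix. Under Go's semantic import versioning, packages in a module are addressed by paths that begin with the module path, so the README paths would likely need the suffix (for example `github.com/003random/getJS/v2/extractor`). The sketch below is only a simplified prefix check to make the mismatch visible; `inModule` is a hypothetical helper, not how the `go` command resolves packages.

```go
package main

import (
	"fmt"
	"strings"
)

// inModule reports whether importPath names the module root or a package
// beneath it. This is a simplified illustration of the prefix rule only.
func inModule(importPath, modulePath string) bool {
	return importPath == modulePath || strings.HasPrefix(importPath, modulePath+"/")
}

func main() {
	const module = "github.com/003random/getJS/v2" // module path from go.mod in this commit

	paths := []string{
		"github.com/003random/getJS/extractor",    // import path used in the README example
		"github.com/003random/getJS/v2/extractor", // same package with the /v2 suffix
	}
	for _, p := range paths {
		fmt.Printf("%s in module: %v\n", p, inModule(p, module))
	}
}
```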
