|
4 | 4 | [](https://crates.io/crates/webgraph/reverse_dependencies)
|
5 | 5 | 
|
6 | 6 | 
|
7 |
| -[](https://github.com/vigna/webgraph-rs). |
| 7 | +[](https://github.com/vigna/webgraph-rs) |
| 8 | +[](https://crates.io/crates/webgraph) |
| 9 | +[](https://docs.rs/webgraph) |
8 | 10 |
|
9 |
| -A Rust implementation of the [WebGraph framework](https://webgraph.di.unimi.it/) |
10 |
| -for graph compression. |
| 11 | +A Rust implementation of the [WebGraph framework] for graph compression. |
11 | 12 |
|
| 13 | +WebGraph is a framework for graph compression aimed at studying web graphs, but |
| 14 | +currently being applied to several other types of graphs. It |
| 15 | +provides simple ways to manage very large graphs, exploiting modern compression |
| 16 | +techniques. More precisely, it is currently made of: |
| 17 | + |
| 18 | +- A set of simple codes, called ζ _codes_, which are particularly suitable for |
| 19 | + storing web graphs (or, in general, integers with a power-law distribution in a |
| 20 | + certain exponent range). |
| 21 | + |
| 22 | +- Algorithms for compressing web graphs that exploit gap compression and |
| 23 | + differential compression (à la |
| 24 | + [LINK](http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-175.html)), |
| 25 | + intervalisation, and ζ codes to provide a high compression ratio (see [our |
| 26 | + datasets](http://law.di.unimi.it/datasets.php)). The algorithms are controlled |
| 27 | + by several parameters, which provide different tradeoffs between access speed |
| 28 | + and compression ratio. |
| 29 | + |
| 30 | +- Algorithms for accessing a compressed graph without actually decompressing |
| 31 | + it, using lazy techniques that delay the decompression until it is actually |
| 32 | + necessary. |
| 33 | + |
| 34 | +- Algorithms for analysing very large graphs, such as HyperBall |
| 35 | + (`it.unimi.dsi.webgraph.algo.HyperBall` in the Java version), which has been used to show that |
| 36 | + Facebook has just [four degrees of |
| 37 | + separation](http://vigna.di.unimi.it/papers.php#BBRFDS). |
| 38 | + |
| 39 | +- A [Java implementation](http://webgraph.di.unimi.it/) of the algorithms above, |
| 40 | + now in maintenance mode. |
| 41 | + |
| 42 | +- This crate, providing a complete, documented implementation of the algorithms |
| 43 | + above in Rust. It is free software distributed under either the [GNU Lesser |
| 44 | + General Public License |
| 45 | + 2.1+](https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html) or the [Apache |
| 46 | + Software License 2.0](https://www.apache.org/licenses/LICENSE-2.0). |
| 47 | + |
| 48 | +- [Data sets](http://law.di.unimi.it/datasets.php) for large graphs (e.g., |
| 49 | + billions of links). |
| 50 | + |
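The gap compression and ζ codes mentioned in the list above can be illustrated with a small, self-contained sketch. It is not the crate's API: `to_gaps`, `from_gaps`, and `zeta_len` are hypothetical helper names, the gap scheme is simplified (the first successor is stored as is), and the code length assumes the usual definition of ζ_k codes from the literature, so the actual bit layout used by the crate may differ.

```rust
/// Turns a strictly increasing successor list into gaps (simplified scheme):
/// the first successor is stored as is, every later one as the distance from
/// its predecessor minus one, since consecutive successors differ by >= 1.
fn to_gaps(successors: &[u64]) -> Vec<u64> {
    let mut gaps = Vec::with_capacity(successors.len());
    let mut prev: Option<u64> = None;
    for &s in successors {
        gaps.push(match prev {
            None => s,
            Some(p) => s - p - 1,
        });
        prev = Some(s);
    }
    gaps
}

/// Inverse of `to_gaps`: rebuilds the successor list from the gaps.
fn from_gaps(gaps: &[u64]) -> Vec<u64> {
    let mut successors = Vec::with_capacity(gaps.len());
    let mut prev: Option<u64> = None;
    for &g in gaps {
        let s = match prev {
            None => g,
            Some(p) => p + g + 1,
        };
        successors.push(s);
        prev = Some(s);
    }
    successors
}

/// Length in bits of the ζ_k code of a natural number `n` (coded as n + 1),
/// assuming the usual definition: h + 1 bits of unary for the bucket
/// [2^(hk), 2^((h+1)k)), followed by a minimal binary code inside the bucket.
fn zeta_len(n: u64, k: u32) -> u32 {
    assert!(k >= 1 && n < u64::MAX);
    let x = n + 1;
    let msb = 63 - x.leading_zeros(); // floor(log2(x))
    let h = msb / k;
    let left = 1u64 << (h * k); // lower bound of the bucket
    let unary = h + 1;
    // Minimal binary coding: the first `left` values of the bucket need one
    // bit less than the remaining ones.
    let binary = if x - left < left { h * k + k - 1 } else { h * k + k };
    unary + binary
}
```

Small gaps are the common case when successors are clustered, and those are exactly the values to which ζ codes assign short codewords.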
| 51 | +## Citation |
| 52 | + |
| 53 | +You are welcome to use and improve WebGraph for your research work! If you find |
| 54 | +our software useful for research, please cite the following papers in your own: |
| 55 | + |
| 56 | +- [“WebGraph: The Next Generation (Is in |
| 57 | + Rust)”](http://vigna.di.unimi.it/papers.php#FVZWNG), by Tommaso Fontana, |
| 58 | + Sebastiano Vigna, and Stefano Zacchiroli, in WWW '24: Companion Proceedings |
| 59 | + of the ACM on Web Conference 2024, pages 686-689. [DOI |
| 60 | + 10.1145/3589335.3651581](https://dl.acm.org/doi/10.1145/3589335.3651581) |
| 61 | + |
| 62 | +- [“The WebGraph Framework I: Compression |
| 63 | + Techniques”](http://vigna.di.unimi.it/papers.php#BoVWFI), by Paolo Boldi and |
| 64 | + Sebastiano Vigna, in _Proc. of the 13th international conference on World |
| 65 | + Wide Web_, WWW 2004, pages 595-602, ACM. [DOI |
| 66 | + 10.1145/988672.988752](https://dl.acm.org/doi/10.1145/988672.988752) |
| 67 | + |
12 | 68 | ## Quick Setup
|
13 | 69 |
|
14 | 70 | Assuming you have built all binaries, you will first need a graph in BV format,
|
15 |
| -for example downloading it from the [LAW website](http://law.di.unimi.it/). You |
16 |
| -will need the `.graph` file (the bitstream containing a compressed representation |
17 |
| -of the graph), the `.properties` file (metadata) and the `.offsets` file (a |
18 |
| -bitstream containing pointers into the graph bitstream). As a first step, if |
19 |
| -you need random access to the successors of a node, you need |
20 |
| -to build an [Elias--Fano](sux::dict::EliasFano) representation of the |
21 |
| -offsets with the command `build_ef` (this part can be skipped if you just need |
22 |
| -sequential access), which will generate an `.ef` file. Then, to load a graph |
23 |
| -with basename `BASENAME` you need to call |
| 71 | +for example downloading it from the [LAW website]. For a graph with basename |
| 72 | +BASENAME, you will need the `BASENAME.graph` file (the bitstream containing a |
| 73 | +compressed representation of the graph), the `BASENAME.properties` file |
| 74 | +(metadata) and the `BASENAME.offsets` file (a bitstream containing pointers into |
| 75 | +the graph bitstream). |
| 76 | + |
| 77 | +As a first step, if you need random access to the successors of a node, you need |
| 78 | +to build an [Elias–Fano] representation of the offsets (this part can be skipped |
| 79 | +if you just need sequential access). There is a CLI command `webgraph` with many |
| 80 | +subcommands, among which `build`, and `webgraph build ef BASENAME` will build |
| 81 | +the representation for you, serializing it with [ε-serde] in a file |
| 82 | +named `BASENAME.ef`. |
| 83 | + |
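To see why random access needs the offsets, consider the following toy model. It is only a sketch under simplifying assumptions: `ToyGraph` is a hypothetical type, the "bitstream" is a plain vector of successors, and the offsets are stored in an ordinary `Vec` rather than in an Elias–Fano structure, which represents the same monotone sequence in compressed form.

```rust
/// A toy stand-in for a compressed graph: all successor lists concatenated
/// in one stream, plus the position at which each list starts.
struct ToyGraph {
    stream: Vec<usize>,
    /// offsets[i] is the start of node i's list; offsets[n] is the stream length.
    offsets: Vec<usize>,
}

impl ToyGraph {
    fn from_lists(lists: &[Vec<usize>]) -> Self {
        let mut stream = Vec::new();
        let mut offsets = vec![0];
        for list in lists {
            stream.extend_from_slice(list);
            offsets.push(stream.len());
        }
        ToyGraph { stream, offsets }
    }

    /// Random access: jump straight to the list of `node` using the offsets.
    /// Without them, the stream could only be scanned from the beginning.
    fn successors(&self, node: usize) -> &[usize] {
        &self.stream[self.offsets[node]..self.offsets[node + 1]]
    }
}
```

Sequential access never needs `offsets`, which matches the remark above that the Elias–Fano step can be skipped in that case.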
| 84 | +Then, to load the graph you need to call |
24 | 85 |
|
25 | 86 | ```[ignore]
|
26 | 87 | let graph = BVGraph::with_basename("BASENAME").load()?;
|
27 | 88 | ```
|
28 | 89 |
|
29 |
| -The [`with_basename`] method returns a [`LoadConfig`] instance that can be further |
30 |
| -customized, selecting endianness, type of memory access, etc. By default you |
31 |
| -will get big endianness, memory mapping for both the graph and the offsets, and |
32 |
| -dynamic code dispatch. |
| 90 | +The [`with_basename`] method returns a [`LoadConfig`] instance that can be |
| 91 | +further customized, selecting endianness, type of memory access, and so on. By |
| 92 | +default you will get big endianness, memory mapping for both the graph and the |
| 93 | +offsets, and dynamic code dispatch. |
| 94 | + |
| 95 | +Once you load the graph, you can [retrieve the successors of a node] or |
| 96 | +[iterate on the whole graph]. In particular, using the handy [`for_`] macro, |
| 97 | +you can write an iteration on the graph as |
33 | 98 |
|
34 |
| -Once you loaded the [graph](), you can [retrieve the successors of a node]() |
35 |
| -or [iterate on the whole graph](). |
| 99 | +```[ignore] |
| 100 | +for_!((src, succ) in graph { |
| 101 | + for dst in succ { |
| 102 | + // do something with the arc src -> dst |
| 103 | + } |
| 104 | +}); |
| 105 | +``` |
36 | 106 |
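The `for_` macro comes from the `lender` crate, which suggests that the iteration over a graph is a lending iterator: each item may borrow from the iterator itself, so a plain `for` loop cannot be used. The sketch below illustrates that idea with the standard library only. The `Lending` trait and `ToyGraphIter` are hypothetical names for illustration and are not the API of `lender` or of this crate.

```rust
/// A hypothetical, minimal lending-iterator trait: the returned item may
/// borrow from the iterator, so it must be dropped before the next call.
trait Lending {
    type Item<'a>
    where
        Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Iterates over a toy adjacency list, reusing one buffer for the
/// successors of every node (as a decoder of a compressed graph might do).
struct ToyGraphIter {
    adjacency: Vec<Vec<usize>>,
    node: usize,
    buffer: Vec<usize>,
}

impl ToyGraphIter {
    fn new(adjacency: Vec<Vec<usize>>) -> Self {
        ToyGraphIter { adjacency, node: 0, buffer: Vec::new() }
    }
}

impl Lending for ToyGraphIter {
    type Item<'a> = (usize, &'a [usize]) where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let succ = self.adjacency.get(self.node)?;
        // "Decode" the successors into the shared buffer.
        self.buffer.clear();
        self.buffer.extend_from_slice(succ);
        let src = self.node;
        self.node += 1;
        Some((src, self.buffer.as_slice()))
    }
}

/// Collects all arcs; the `while let` loop plays the role of `for_!`.
fn arcs(mut iter: ToyGraphIter) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    while let Some((src, succ)) = iter.next() {
        for &dst in succ {
            out.push((src, dst));
        }
    }
    out
}
```

Because the successors live in a buffer owned by the iterator, the standard `Iterator` trait cannot express this borrow; a `while let` loop, or a macro that expands to one, can.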
|
37 | 107 | ## More Options
|
38 | 108 |
|
39 |
| -- By starting from the [`BVGraphSeq`] class you can obtain an instance that |
40 |
| -does not need the `.ef` file, but provides only [iteration](). |
| 109 | +- By starting from the [`BVGraphSeq`] struct you can obtain an instance that does |
| 110 | + not need the `BASENAME.ef` file, but provides only [iteration]. |
41 | 111 |
|
42 |
| -- Graphs can be labeled by [zipping]() then together with a [labeling](). In fact, |
| 112 | +- Graphs can be labeled by [zipping] them together with a [labeling]. In fact, |
43 | 113 | graphs are just labelings with `usize` labels.
|
44 | 114 |
|
45 | 115 | ## Operating on Graphs
|
46 | 116 |
|
47 |
| -There are many operations available on graphs, such as [`transpose`] or [`simplify`]. |
| 117 | +There are many operations available on graphs, such as [`transpose`] and |
| 118 | +[`simplify`]. You can [permute] a graph. |
48 | 119 |
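As a rough illustration of what these operations compute, here is a sketch on plain arc lists. It does not use the crate's functions: the three helpers below are hypothetical, work on in-memory `Vec`s rather than compressed graphs, and assume that simplifying means making the graph symmetric and removing loops, and that permuting means renaming node `i` to `perm[i]`.

```rust
/// Sorts and deduplicates an arc list so results are in a canonical order.
fn normalize(mut arcs: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    arcs.sort_unstable();
    arcs.dedup();
    arcs
}

/// Transpose: reverse the direction of every arc.
fn transpose(arcs: &[(usize, usize)]) -> Vec<(usize, usize)> {
    normalize(arcs.iter().map(|&(src, dst)| (dst, src)).collect())
}

/// Simplify (assumed meaning): drop loops and add the reverse of every arc.
fn simplify(arcs: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for &(src, dst) in arcs {
        if src != dst {
            out.push((src, dst));
            out.push((dst, src));
        }
    }
    normalize(out)
}

/// Permute (assumed meaning): node `i` becomes node `perm[i]`.
fn permute(arcs: &[(usize, usize)], perm: &[usize]) -> Vec<(usize, usize)> {
    normalize(arcs.iter().map(|&(src, dst)| (perm[src], perm[dst])).collect())
}
```

For the arcs `0 -> 1`, `1 -> 1`, `2 -> 0`, transposing gives `0 -> 2`, `1 -> 0`, `1 -> 1`, and simplifying gives the four arcs between the pairs `{0, 1}` and `{0, 2}`.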
|
49 | 120 | ## Acknowledgments
|
50 | 121 |
|
51 |
| -This software has been partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU, |
52 |
| -and by project ANR COREGRAPHIE, grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. |
| 122 | +This software has been partially supported by project SERICS (PE00000014) under |
| 123 | +the NRRP MUR program funded by the EU - NGEU, and by project ANR COREGRAPHIE, |
| 124 | +grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. |
53 | 125 |
|
54 |
| -[`LoadConfig`]: |
55 |
| -[`with_basename`]: |
56 |
| -[`transpose`] |
| 126 | +[`transpose`]: <https://docs.rs/webgraph/latest/webgraph/transform/transpose/index.html> |
| 127 | +[`simplify`]: <https://docs.rs/webgraph/latest/webgraph/transform/simplify/index.html> |
| 128 | +[`with_basename`]: <https://docs.rs/webgraph/latest/webgraph/struct.BVGraph.html#method.with_basename> |
| 129 | +[`BVGraphSeq`]: <https://docs.rs/webgraph/latest/webgraph/struct.BVGraphSeq.html> |
| 130 | +[`LoadConfig`]: <https://docs.rs/webgraph/latest/webgraph/struct.LoadConfig.html> |
| 131 | +[iterate on the whole graph]: <https://docs.rs/webgraph/latest/webgraph/trait/SequentialLabeling.html#method.iter> |
| 132 | +[zipping]: <https://docs.rs/webgraph/latest/webgraph/struct/Zip.html> |
| 133 | +[labeling]: <https://docs.rs/webgraph/latest/webgraph/trait/SequentialLabeling.html> |
| 134 | +[iteration]: <https://docs.rs/webgraph/latest/webgraph/trait/SequentialLabeling.html#method.iter> |
| 135 | +[retrieve the successors of a node]: <https://docs.rs/webgraph/latest/webgraph/trait/RandomAccessGraph.html#method.successors> |
| 136 | +[LAW website]: <http://law.di.unimi.it/> |
| 137 | +[Elias–Fano]: <sux::dict::EliasFano> |
| 138 | +[WebGraph framework]: <https://webgraph.di.unimi.it/> |
| 139 | +[permute]: <https://docs.rs/webgraph/latest/webgraph/transform/permute/index.html> |
| 140 | +[ε-serde]: <https://crates.io/crates/epserde/> |
| 141 | +[`for_`]: <https://docs.rs/lender/latest/lender/macro.for_.html> |