
[docs] Fix various syntax and rendering errors #2378

Open · wants to merge 2 commits into main
1 change: 1 addition & 0 deletions docs/docset.yml
@@ -6,6 +6,7 @@ toc:
- toc: reference
- toc: release-notes
subs:
version: "9.0.0"
Contributor:

IMO we will likely need to be more specific, per discussions in elastic/docs-builder#737

Suggested change:
-version: "9.0.0"
+version-90: "9.0.0"

Contributor (author):

This was intended to be a temporary solution until elastic/docs-builder#737 is addressed and there's a central place to manage versions.

I had to add version subs to multiple repos in related PRs. I'm not planning to go back and update all other repos unless elastic/docs-builder#737 can't be addressed before the next release, but you're welcome to make this change in this repo (and any other repos) if you think that would be better.

es: "Elasticsearch"
esh: "ES-Hadoop"
esh-full: "Elasticsearch for Apache Hadoop"
28 changes: 18 additions & 10 deletions docs/reference/apache-spark-support.md
@@ -510,7 +510,11 @@ In case where the results from {{es}} need to be in JSON format (typically to be
#### Type conversion [spark-type-conversion]

::::{important}
-When dealing with multi-value/array fields, please see [this](/reference/mapping-types.md#mapping-multi-values) section and in particular [these](/reference/configuration.md#cfg-field-info) configuration options. IMPORTANT: If automatic index creation is used, please review [this](/reference/mapping-types.md#auto-mapping-type-loss) section for more information.
+When dealing with multi-value/array fields, please see [this](/reference/mapping-types.md#mapping-multi-values) section and in particular [these](/reference/configuration.md#cfg-field-info) configuration options.
+::::
+
+::::{important}
+If automatic index creation is used, please review [this](/reference/mapping-types.md#auto-mapping-type-loss) section for more information.
::::


@@ -562,7 +566,7 @@ Added in 5.0.
::::


-[TBC: FANCY QUOTE]
+% [TBC: FANCY QUOTE]
Spark Streaming is an extension on top of the core Spark functionality that allows near-real-time processing of stream data. Spark Streaming works around the idea of `DStream`s, or *Discretized Streams*. `DStreams` operate by collecting newly arrived records into a small `RDD` and executing it. This repeats every few seconds with a new `RDD` in a process called *microbatching*. The `DStream` API includes many of the same processing operations as the `RDD` API, plus a few other streaming-specific methods. elasticsearch-hadoop provides native integration with Spark Streaming as of version 5.0.

When using the elasticsearch-hadoop Spark Streaming support, {{es}} can be targeted as an output location to index data into from a Spark Streaming job in the same way that one might persist the results from an `RDD`. Unlike `RDD`s, however, you are unable to read data out of {{es}} using a `DStream` due to its continuous nature.
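
As a minimal sketch of this write path (the socket source, index name, and batch interval below are illustrative assumptions, not part of this page), the `org.elasticsearch.spark.streaming` package enriches `DStream`s with a `saveToEs` method:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark.streaming._ // implicit saveToEs on DStream

val conf = new SparkConf().setAppName("es-streaming").set("es.nodes", "localhost")
val ssc = new StreamingContext(conf, Seconds(5)) // 5-second microbatches

// Hypothetical socket source; each incoming line becomes one document.
val docs = ssc.socketTextStream("localhost", 9999)
  .map(line => Map("message" -> line))

docs.saveToEs("spark-streaming-docs") // each microbatch is indexed as it arrives
ssc.start()
ssc.awaitTermination()
```
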
@@ -1074,7 +1078,7 @@ Added in 2.1.
::::


-[TBC: FANCY QUOTE]
+% [TBC: FANCY QUOTE]
On top of the core Spark support, elasticsearch-hadoop also provides integration with Spark SQL. In other words, {{es}} becomes a *native* source for Spark SQL so that data can be indexed and queried from Spark SQL *transparently*.

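For the write side, a minimal sketch under stated assumptions (a Spark 3 `SparkSession`; the column names and the `spark-people` index are illustrative) uses the `saveToEs` method brought in by `org.elasticsearch.spark.sql._`:

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // implicit saveToEs on DataFrame

val spark = SparkSession.builder()
  .appName("es-sql")
  .config("es.nodes", "localhost")
  .getOrCreate()
import spark.implicits._

val people = Seq(("John", "Smith", 29L), ("Jane", "Doe", 31L))
  .toDF("first_name", "last_name", "age")

people.saveToEs("spark-people") // indexed transparently as documents
```
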
::::{important}
@@ -1210,7 +1214,7 @@ val df = sql.load( <1>
```

1. `SQLContext` *experimental* `load` method for arbitrary data sources
-2. path or resource to load - in this case the index/type in {es}
+2. path or resource to load - in this case the index/type in {{es}}
3. the data source provider - `org.elasticsearch.spark.sql`


@@ -1225,7 +1229,7 @@ val df = sql.read <1>

1. `SQLContext` *experimental* `read` method for arbitrary data sources
2. the data source provider - `org.elasticsearch.spark.sql`
-3. path or resource to load - in this case the index/type in {es}
+3. path or resource to load - in this case the index/type in {{es}}


In Spark 1.5, this can be further simplified to:
@@ -1441,8 +1445,8 @@ println(people.schema.treeString) <4>

1. Spark SQL Scala imports
2. elasticsearch-hadoop SQL Scala imports
-3. create a `DataFrame` backed by the `spark/people` index in {es}
-4. the `DataFrame` associated schema discovered from {es}
+3. create a `DataFrame` backed by the `spark/people` index in {{es}}
+4. the `DataFrame` associated schema discovered from {{es}}
5. notice how the `age` field was transformed into a `Long` when using the default {{es}} mapping as discussed in the [*Mapping and Types*](/reference/mapping-types.md) chapter.


@@ -1506,7 +1510,11 @@ DataFrame people = JavaEsSparkSQL.esDF(sql, "spark/people", "?q=Smith"); <1>
#### Spark SQL Type conversion [spark-sql-type-conversion]

::::{important}
-When dealing with multi-value/array fields, please see [this](/reference/mapping-types.md#mapping-multi-values) section and in particular [these](/reference/configuration.md#cfg-field-info) configuration options. IMPORTANT: If automatic index creation is used, please review [this](/reference/mapping-types.md#auto-mapping-type-loss) section for more information.
+When dealing with multi-value/array fields, please see [this](/reference/mapping-types.md#mapping-multi-values) section and in particular [these](/reference/configuration.md#cfg-field-info) configuration options.
+::::
+
+::::{important}
+If automatic index creation is used, please review [this](/reference/mapping-types.md#auto-mapping-type-loss) section for more information.
::::


@@ -1547,7 +1555,7 @@ Added in 6.0.
::::


-[TBC: FANCY QUOTE]
+% [TBC: FANCY QUOTE]
Released as an experimental feature in Spark 2.0, Spark Structured Streaming provides a unified streaming and batch interface built into the Spark SQL integration. As of elasticsearch-hadoop 6.0, we provide native functionality to index streaming data into {{es}}.

::::{important}
@@ -1601,7 +1609,7 @@ people.writeStream
3. Instead of calling `read`, call `readStream` to get instance of `DataStreamReader`
4. Read a directory of text files continuously and convert them into `Person` objects
5. Provide a location to save the offsets and commit logs for the streaming query
-6. Start the stream using the `"es"` format to index the contents of the `Dataset` continuously to {es}
+6. Start the stream using the `"es"` format to index the contents of the `Dataset` continuously to {{es}}


::::{warning}
6 changes: 5 additions & 1 deletion docs/reference/configuration.md
@@ -730,7 +730,11 @@ Added in 2.2.
: Whether to use the system Socks proxy properties (namely `socksProxyHost` and `socksProxyPort`) or not

::::{note}
-elasticsearch-hadoop allows proxy settings to be applied only to its connection using the setting above. Take extra care when there is already a JVM-wide proxy setting (typically through system properties) to avoid unexpected behavior. IMPORTANT: The semantics of these properties are described in the JVM [docs](http://docs.oracle.com/javase/8/docs/api/java/net/doc-files/net-properties.md#Proxies). In some cases, setting up the JVM property `java.net.useSystemProxies` to `true` works better then setting these properties manually.
+elasticsearch-hadoop allows proxy settings to be applied only to its connection using the setting above. Take extra care when there is already a JVM-wide proxy setting (typically through system properties) to avoid unexpected behavior.
+::::
+
+::::{important}
+The semantics of these properties are described in the JVM [docs](http://docs.oracle.com/javase/8/docs/api/java/net/doc-files/net-properties.html#Proxies). In some cases, setting the JVM property `java.net.useSystemProxies` to `true` works better than setting these properties manually.
::::
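
As a hedged sketch (the host names and port below are placeholders), the proxy can be scoped to the connector through its `es.net.proxy.*` settings, or delegated to the JVM entirely:

```scala
import org.apache.spark.SparkConf

// Apply the SOCKS proxy to elasticsearch-hadoop's connections only.
val conf = new SparkConf()
  .set("es.nodes", "es.internal:9200")
  .set("es.net.proxy.socks.host", "proxy.internal")
  .set("es.net.proxy.socks.port", "1080")

// Or defer to the JVM-wide proxy selection discussed above:
// System.setProperty("java.net.useSystemProxies", "true")
```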


6 changes: 4 additions & 2 deletions docs/reference/error-handlers.md
@@ -28,7 +28,7 @@ Elasticsearch for Apache Hadoop provides an API to handle document level errors
* The raw JSON bulk entry that was tried
* Error message
* HTTP status code for the document
-* Number of times that the current document has been sent to {es}
+* Number of times that the current document has been sent to {{es}}

There are a few default error handlers provided by the connector:

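As a sketch of how a handler is wired up (the logger name is a placeholder; treat the exact keys as assumptions to verify in this chapter), the stock `log` handler can be enabled to record rejected documents instead of failing the job:

```scala
import org.apache.spark.SparkConf

// Log bulk entries that {{es}} rejects, then keep writing.
val conf = new SparkConf()
  .set("es.write.rest.error.handlers", "log")
  .set("es.write.rest.error.handler.log.logger.name", "BulkErrors")
```
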
@@ -622,7 +622,9 @@ Elasticsearch for Apache Hadoop provides an API to handle document level deserialization
* The raw JSON search result that was tried
* Exception encountered

-Note: Deserialization Error Handlers only allow handling of errors that occur when parsing documents from scroll responses. It may be possible that a search result can be successfully read, but is still malformed, thus causing an exception when it is used in a completely different part of the framework. This Error Handler is called from the top of the most reasonable place to handle exceptions in the scroll reading process, but this does not encapsulate all logic for each integration.
+::::{note}
+Deserialization Error Handlers only allow handling of errors that occur when parsing documents from scroll responses. A search result may be read successfully yet still be malformed, causing an exception when it is used in a completely different part of the framework. This Error Handler is called at the most reasonable place in the scroll-reading process to handle exceptions, but it does not encapsulate all logic for each integration.
+::::

There are a few default error handlers provided by the connector:

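A corresponding sketch for the read side, assuming the `es.read.data.error.handlers` key mirrors the write-side naming (an assumption to verify against this chapter):

```scala
import org.apache.spark.SparkConf

// Log documents that fail deserialization from scroll responses, then continue.
val conf = new SparkConf()
  .set("es.read.data.error.handlers", "log")
  .set("es.read.data.error.handler.log.logger.name", "ScrollErrors")
```
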
16 changes: 8 additions & 8 deletions docs/reference/installation.md
@@ -7,11 +7,11 @@ navigation_title: Installation

elasticsearch-hadoop binaries can be obtained either by downloading them from the [elastic.co](http://elastic.co) site as a ZIP (containing project jars, sources and documentation) or by using any [Maven](http://maven.apache.org/)-compatible tool with the following dependency:

-```xml
+```xml subs=true
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
-<version>9.0.0-beta1</version>
+<version>{{version}}</version>
</dependency>
```

Contributor:

Suggested change:
-<version>{{version}}</version>
+<version>{{version-90}}</version>

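For sbt builds, the same coordinates can be declared as follows (a sketch; the literal version stands in for the `{{version}}` substitution used above):

```scala
// build.sbt
libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "9.0.0"
```
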
@@ -24,33 +24,33 @@ elasticsearch-hadoop binary is suitable for Hadoop 2.x (also known as YARN) environments

In addition to the *uber* jar, elasticsearch-hadoop provides minimalistic jars for each integration, tailored for those who use just *one* module (in all other situations the `uber` jar is recommended); the jars are smaller in size and use a dedicated pom, covering only the needed dependencies. These are available under the same `groupId`, using an `artifactId` with the pattern `elasticsearch-hadoop-{{integration}}`:

-```xml
+```xml subs=true
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop-mr</artifactId> <1>
-<version>9.0.0-beta1</version>
+<version>{{version}}</version>
</dependency>
```

Contributor:

Suggested change:
-<version>{{version}}</version>
+<version>{{version-90}}</version>

1. *mr* artifact


-```xml
+```xml subs=true
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop-hive</artifactId> <1>
-<version>9.0.0-beta1</version>
+<version>{{version}}</version>
</dependency>
```

Contributor:

Suggested change:
-<version>{{version}}</version>
+<version>{{version-90}}</version>

1. *hive* artifact


-```xml
+```xml subs=true
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-30_2.12</artifactId> <1>
-<version>9.0.0-beta1</version>
+<version>{{version}}</version>
</dependency>
```

Contributor:

Suggested change:
-<version>{{version}}</version>
+<version>{{version-90}}</version>

2 changes: 1 addition & 1 deletion docs/reference/kerberos.md
@@ -145,7 +145,7 @@ if (!job.waitForCompletion(true)) { <3>
```

1. Creating a new job instance
-2. EsMapReduceUtil obtains job delegation tokens for {es}
+2. EsMapReduceUtil obtains job delegation tokens for {{es}}
3. Submit the job to the cluster


2 changes: 1 addition & 1 deletion docs/reference/runtime-options.md
@@ -15,7 +15,7 @@ Unfortunately, these settings need to be setup **manually** **before** the job /

## Speculative execution [_speculative_execution]

-[TBC: FANCY QUOTE]
+% [TBC: FANCY QUOTE]
In other words, speculative execution is an **optimization**, enabled by default, that allows Hadoop to create duplicate tasks for those which it considers hung or slowed down. When doing data crunching or reading resources, having duplicate tasks is harmless and means at most a waste of computation resources; however, when writing data to an external store, this can cause data corruption through duplicates or unnecessary updates. Since the *speculative execution* behavior can be triggered by external factors (such as network or CPU load, which in turn cause false positives) even in stable environments (virtualized clusters are particularly prone to this) and has a direct impact on data, elasticsearch-hadoop disables this optimization for data safety.

Please check your library setting and disable this feature. If you encounter more data than expected, double- and triple-check this setting.
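
A hedged sketch of keeping the feature off (the Spark key is the standard switch; the commented keys are the Hadoop MapReduce counterparts):

```scala
import org.apache.spark.SparkConf

// Ensure speculative task duplicates stay disabled before writing to the external store.
val conf = new SparkConf().set("spark.speculation", "false")

// Hadoop MapReduce equivalents (set on org.apache.hadoop.conf.Configuration):
// hadoopConf.setBoolean("mapreduce.map.speculative", false)
// hadoopConf.setBoolean("mapreduce.reduce.speculative", false)
```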