Commit 8f89239

prabod authored and DevinTDHa committed
[SPARKNLP-1131] - Introducing Florence-2 (#14585)
* Florence2 Scala API
* preprocessing and postprocessing for images
* test images
* python API and tests
* notebooks and docs
* changed postprocessing and added tests for all the tasks
1 parent 8f16acf commit 8f89239

17 files changed: +3817 -2 lines changed
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
{%- capture title -%}
Florence2Transformer
{%- endcapture -%}

{%- capture description -%}
Florence2Transformer loads Florence-2 models for a wide range of vision and vision-language tasks using prompt-based inference.

Florence-2 is a vision foundation model from Microsoft that uses a prompt-based approach to handle tasks such as image captioning, object detection, segmentation, and OCR. The model is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, for multi-task learning. Its sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings.

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val florence2 = Florence2Transformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")
```
If no name is provided, the default model is `"florence2_base_ft_int4"`.

For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Vision+Tasks).

==Supported Tasks==

Florence-2 selects the task from a token at the start of the prompt. The following task tokens can be used:

- `<CAPTION>`: Image captioning
- `<DETAILED_CAPTION>`: Detailed image captioning
- `<MORE_DETAILED_CAPTION>`: Paragraph-level captioning
- `<CAPTION_TO_PHRASE_GROUNDING>`: Phrase grounding from caption (requires additional text input)
- `<OD>`: Object detection
- `<DENSE_REGION_CAPTION>`: Dense region captioning
- `<REGION_PROPOSAL>`: Region proposal
- `<OCR>`: Optical Character Recognition (plain text extraction)
- `<OCR_WITH_REGION>`: OCR with region information
- `<REFERRING_EXPRESSION_SEGMENTATION>`: Segmentation for a referred phrase (requires additional text input)
- `<REGION_TO_SEGMENTATION>`: Polygon mask for a region (requires additional text input)
- `<OPEN_VOCABULARY_DETECTION>`: Open vocabulary detection for a phrase (requires additional text input)
- `<REGION_TO_CATEGORY>`: Category of a region (requires additional text input)
- `<REGION_TO_DESCRIPTION>`: Description of a region (requires additional text input)
- `<REGION_TO_OCR>`: OCR for a region (requires additional text input)
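
The task list above can be turned into a small prompt-building helper. The following is a minimal sketch, not part of the Spark NLP API: `build_prompt` and `box_to_loc_tokens` are hypothetical helper names. It assumes the upstream Florence-2 convention that the extra text is appended directly after the task token, and that region inputs are written as four location tokens quantized to 1000 bins. Whether Florence2Transformer expects exactly this format should be checked against the annotator's tests.

```python
from typing import Optional, Tuple

# Task tokens copied from the list above. The split into two sets follows
# the "(requires additional text input)" notes in that list.
IMAGE_ONLY_TASKS = {
    "<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>", "<OD>",
    "<DENSE_REGION_CAPTION>", "<REGION_PROPOSAL>", "<OCR>", "<OCR_WITH_REGION>",
}
TEXT_INPUT_TASKS = {
    "<CAPTION_TO_PHRASE_GROUNDING>", "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<REGION_TO_SEGMENTATION>", "<OPEN_VOCABULARY_DETECTION>",
    "<REGION_TO_CATEGORY>", "<REGION_TO_DESCRIPTION>", "<REGION_TO_OCR>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Hypothetical helper: return the prompt string for a task token."""
    if task in IMAGE_ONLY_TASKS:
        if text_input:
            raise ValueError(f"{task} does not take additional text input")
        return task
    if task in TEXT_INPUT_TASKS:
        if not text_input:
            raise ValueError(f"{task} requires additional text input")
        # Assumption: the text follows the task token with no separator.
        return task + text_input
    raise ValueError(f"Unknown task token: {task}")


def box_to_loc_tokens(box: Tuple[float, float, float, float],
                      width: int, height: int, bins: int = 1000) -> str:
    """Hypothetical helper: encode a pixel box (x1, y1, x2, y2) as <loc_N> tokens.

    Assumption: coordinates are quantized to `bins` steps per axis and
    clamped to the range 0..bins-1.
    """
    x1, y1, x2, y2 = box

    def quantize(value: float, size: int) -> int:
        return min(bins - 1, max(0, int(value * bins / size)))

    pairs = ((x1, width), (y1, height), (x2, width), (y2, height))
    return "".join(f"<loc_{quantize(v, s)}>" for v, s in pairs)


print(build_prompt("<OD>"))
print(build_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "A green car"))
print(build_prompt("<REGION_TO_CATEGORY>",
                   box_to_loc_tokens((160, 120, 320, 240), 640, 480)))
```

The resulting string is what would go into the prompt column of the input DataFrame in place of a bare task token. Validating the token before running the pipeline catches typos early, since an unknown token would otherwise only show up as unexpected model output.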

{%- endcapture -%}

{%- capture input_anno -%}
IMAGE
{%- endcapture -%}

{%- capture output_anno -%}
DOCUMENT
{%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

spark = sparknlp.start()

images_path = "path/to/your/images"  # Replace with your image path
image_df = spark.read.format("image").load(path=images_path)
# The "text" column holds the task prompt for each image
test_df = image_df.withColumn("text", lit("<OD>"))

imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

florence2 = Florence2Transformer.pretrained() \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([
    imageAssembler,
    florence2
])

result = pipeline.fit(test_df).transform(test_df)
# Pass truncate by keyword: the first positional argument of show() is the row count
result.select("image_assembler.origin", "answer.result").show(truncate=False)
{%- endcapture -%}

{%- capture scala_example -%}
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val imageFolder = "path/to/your/images" // Replace with your image path

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

val testDF: DataFrame = imageDF.withColumn("text", lit("<OD>"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val florence2 = Florence2Transformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  florence2
))

val result = pipeline.fit(testDF).transform(testDF)

result.select("image_assembler.origin", "answer.result").show(false)
{%- endcapture -%}

{%- capture api_link -%}
[Florence2Transformer](/api/com/johnsnowlabs/nlp/annotators/cv/Florence2Transformer)
{%- endcapture -%}

{%- capture python_api_link -%}
[Florence2Transformer](/api/python/reference/autosummary/sparknlp/annotator/cv/florence2_transformer/index.html#sparknlp.annotator.cv.florence2_transformer.Florence2Transformer)
{%- endcapture -%}

{%- capture source_link -%}
[Florence2Transformer](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/cv/Florence2Transformer.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}
