{%- capture title -%}
Florence2Transformer
{%- endcapture -%}

{%- capture description -%}
Florence2Transformer can load Florence-2 models for a wide variety of vision and vision-language tasks using prompt-based inference.

Florence-2 is an advanced vision foundation model from Microsoft that uses a prompt-based approach to handle tasks such as image captioning, object detection, segmentation, and OCR. It is trained for multi-task learning on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, and its sequence-to-sequence architecture performs well in both zero-shot and fine-tuned settings.

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val florence2 = Florence2Transformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")
```

The default model is `"florence2_base_ft_int4"`, if no name is provided.
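
A specific model can also be requested by name and language; a minimal sketch, assuming the standard Spark NLP `pretrained(name, lang)` overload and the default model name noted above:

```scala
// Sketch: explicitly pinning a model. The name and the "en" language code
// are assumptions based on the default mentioned above.
val florence2Named = Florence2Transformer.pretrained("florence2_base_ft_int4", "en")
  .setInputCols("image_assembler")
  .setOutputCol("answer")
```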

For available pretrained models please see the [Models Hub](https://sparknlp.org/models?task=Vision+Tasks).

**Supported Tasks**

Florence-2 supports a variety of tasks through prompt engineering. The following prompt tokens can be used (a prompt-switching sketch follows the list):

- `<CAPTION>`: Image captioning
- `<DETAILED_CAPTION>`: Detailed image captioning
- `<MORE_DETAILED_CAPTION>`: Paragraph-level captioning
- `<CAPTION_TO_PHRASE_GROUNDING>`: Phrase grounding from a caption (requires additional text input)
- `<OD>`: Object detection
- `<DENSE_REGION_CAPTION>`: Dense region captioning
- `<REGION_PROPOSAL>`: Region proposal
- `<OCR>`: Optical character recognition (plain text extraction)
- `<OCR_WITH_REGION>`: OCR with region information
- `<REFERRING_EXPRESSION_SEGMENTATION>`: Segmentation for a referred phrase (requires additional text input)
- `<REGION_TO_SEGMENTATION>`: Polygon mask for a region (requires additional text input)
- `<OPEN_VOCABULARY_DETECTION>`: Open vocabulary detection for a phrase (requires additional text input)
- `<REGION_TO_CATEGORY>`: Category of a region (requires additional text input)
- `<REGION_TO_DESCRIPTION>`: Description of a region (requires additional text input)
- `<REGION_TO_OCR>`: OCR for a region (requires additional text input)

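The prompt is supplied through a string column next to the image (see the full pipelines below). A minimal sketch of switching tasks, assuming an `imageDF` loaded with Spark's image data source and, for tasks marked "requires additional text input", that the extra text is appended directly after the task token as in upstream Florence-2 usage:

```scala
import org.apache.spark.sql.functions.lit

// Plain task: the prompt column holds just the task token.
val captionDF = imageDF.withColumn("text", lit("<CAPTION>"))

// Grounding-style task: the phrase to ground follows the task token in the
// same string (assumed to mirror upstream Florence-2 prompting).
val groundingDF = imageDF.withColumn(
  "text",
  lit("<CAPTION_TO_PHRASE_GROUNDING>A green car parked next to a building."))
```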
{%- endcapture -%}

{%- capture input_anno -%}
IMAGE
{%- endcapture -%}

{%- capture output_anno -%}
DOCUMENT
{%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

# Start a Spark NLP session and load images; the task is selected through a
# prompt column ("<OD>" runs object detection).
spark = sparknlp.start()

images_path = "path/to/your/images"  # Replace with your image path
image_df = spark.read.format("image").load(path=images_path)
test_df = image_df.withColumn("text", lit("<OD>"))

imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

florence2 = Florence2Transformer.pretrained() \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([
    imageAssembler,
    florence2
])

result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(truncate=False)
{%- endcapture -%}

{%- capture scala_example -%}
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val imageFolder = "path/to/your/images" // Replace with your image path

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

val testDF: DataFrame = imageDF.withColumn("text", lit("<OD>"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val florence2 = Florence2Transformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  florence2
))

val result = pipeline.fit(testDF).transform(testDF)

result.select("image_assembler.origin", "answer.result").show(false)
{%- endcapture -%}

{%- capture api_link -%}
[Florence2Transformer](/api/com/johnsnowlabs/nlp/annotators/cv/Florence2Transformer)
{%- endcapture -%}

{%- capture python_api_link -%}
[Florence2Transformer](/api/python/reference/autosummary/sparknlp/annotator/cv/florence2_transformer/index.html#sparknlp.annotator.cv.florence2_transformer.Florence2Transformer)
{%- endcapture -%}

{%- capture source_link -%}
[Florence2Transformer](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/cv/Florence2Transformer.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}