Florence-2: Advancing Multiple Vision Tasks with a Single VLM Model | by Lihi Gur Arie, PhD | Oct, 2024

October 15, 2024

Loading Florence-2 model and a sample image

After installing and importing the necessary libraries (as demonstrated in the accompanying Colab notebook), we begin by loading the Florence-2 model, processor and the input image of a camera:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Load model:
model_id = 'microsoft/Florence-2-large'
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype='auto').eval().cuda()
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load image (img_path is the path to the input image):
image = Image.open(img_path)

Auxiliary Functions

In this tutorial, we will use several auxiliary functions. The most important is the run_example core function, which generates a response from the Florence-2 model.

The run_example function combines the task prompt with any additional text input (if provided) into a single prompt. The processor then converts the prompt and the image into the token IDs and pixel values that serve as inputs to the model. The response itself is produced in the model.generate step. Here's a breakdown of some key parameters:

  • max_new_tokens=1024: Sets the maximum length of the output, allowing for detailed responses.
  • do_sample=False: Ensures a deterministic response.
  • num_beams=3: Uses beam search, keeping the 3 highest-scoring partial sequences at each step so that several candidate outputs are explored before the best overall one is chosen.
  • early_stopping=False: Keeps the default stopping heuristic, where beam search ends only when better candidates are very unlikely to be found, rather than as soon as num_beams complete candidates exist.

Lastly, the model’s output is decoded and post-processed with processor.batch_decode and processor.post_process_generation to produce the final text response, which is returned by the run_example function.

def run_example(image, task_prompt, text_input=''):
    # Combine the task prompt with any optional text input.
    prompt = task_prompt + text_input

    # Tokenize the prompt and preprocess the image.
    inputs = processor(text=prompt, images=image, return_tensors="pt").to('cuda', torch.float16)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"].cuda(),
        pixel_values=inputs["pixel_values"].cuda(),
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
        early_stopping=False,
    )

    # Decode with special tokens kept so the post-processing step can parse them.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )

    return parsed_answer
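To make the num_beams and do_sample settings more concrete, here is a toy beam search over a hand-written next-token table. This is only a sketch under stated assumptions: the vocabulary, the probabilities and the beam_search function are hypothetical and unrelated to Florence-2's actual decoder. It shows how keeping the 3 best partial sequences, with no sampling, gives a repeatable result.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
# The numbers are made up for illustration only.
NEXT = {
    "<s>": {"a": 0.5, "the": 0.4, "camera": 0.1},
    "a": {"camera": 0.6, "table": 0.3, "</s>": 0.1},
    "the": {"camera": 0.9, "table": 0.05, "</s>": 0.05},
    "camera": {"</s>": 0.8, "table": 0.2},
    "table": {"</s>": 1.0},
}

def beam_search(num_beams=3, max_new_tokens=4):
    # Each beam is a (log_probability, token_list) pair.
    beams = [(0.0, ["<s>"])]
    for _ in range(max_new_tokens):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished beams carry over
                continue
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        # Keep the num_beams highest-scoring sequences; nothing is sampled.
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams

best_score, best_tokens = beam_search()[0]
print(" ".join(best_tokens))  # <s> the camera </s>
```

With num_beams=1 the same table yields "a camera" (probability 0.24), because the locally best first token is "a". With three beams the search recovers "the camera" (probability 0.288), which is the kind of gain beam search is meant to provide.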

Additionally, we use auxiliary functions to visualize the results (draw_bbox, draw_ocr_bboxes and draw_polygons) and to convert between bounding box formats (convert_bbox_to_florence-2 and convert_florence-2_to_bbox). These can be explored in the attached Colab notebook.

Florence-2 can perform a variety of visual tasks. Let’s explore some of its capabilities, starting with image captioning.

1. Caption Generation Related Tasks:

1.1 Generate Captions

Florence-2 can generate image captions at various levels of detail, using the '<CAPTION>', '<DETAILED_CAPTION>' or '<MORE_DETAILED_CAPTION>' task prompts.

print(run_example(image, task_prompt='<CAPTION>'))
# Output: 'A black camera sitting on top of a wooden table.'

print(run_example(image, task_prompt='<DETAILED_CAPTION>'))
# Output: 'The image shows a black Kodak V35 35mm film camera sitting on top of a wooden table with a blurred background.'

print(run_example(image, task_prompt='<MORE_DETAILED_CAPTION>'))
# Output: 'The image is a close-up of a Kodak VR35 digital camera. The camera is black in color and has the Kodak logo on the top left corner. The body of the camera is made of wood and has a textured grip for easy handling. The lens is in the center of the body and is surrounded by a gold-colored ring. On the top right corner, there is a small LCD screen and a flash. The background is blurred, but it appears to be a wooded area with trees and greenery.'

The model accurately describes the image and its surroundings. It even identifies the camera's brand and model, demonstrating its OCR ability. However, the '<MORE_DETAILED_CAPTION>' output contains minor inconsistencies (for example, it calls the film camera digital and describes its body as made of wood), which is expected from a zero-shot model.

1.2 Generate Caption for a Given Bounding Box

Florence-2 can generate captions for specific regions of an image defined by bounding boxes. For this, it takes the bounding box location as input. You can extract the category with '<REGION_TO_CATEGORY>' or a description with '<REGION_TO_DESCRIPTION>'.

For your convenience, I added a widget to the Colab notebook that enables you to draw a bounding box on the image, and code to convert it to Florence-2 format.
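As a rough illustration of what such a conversion involves, the sketch below maps a pixel bounding box to a string of location tokens and back. It assumes coordinates are scaled onto a 1000-bin grid and written as <loc_N> tokens in x1, y1, x2, y2 order. The function names bbox_to_loc_string and loc_string_to_bbox are hypothetical and are not the notebook's helpers, so treat the exact rounding as an assumption.

```python
import math
import re

NUM_BINS = 1000  # assumption: location tokens range over <loc_0> .. <loc_999>

def bbox_to_loc_string(bbox, image_width, image_height):
    """Convert a pixel box (x1, y1, x2, y2) into a '<loc_..>' string."""
    x1, y1, x2, y2 = bbox

    def quantize(value, size):
        # Scale onto the 0..999 grid and clamp so edge pixels stay in range.
        return min(NUM_BINS - 1, max(0, math.floor(value * NUM_BINS / size)))

    bins = [quantize(x1, image_width), quantize(y1, image_height),
            quantize(x2, image_width), quantize(y2, image_height)]
    return "".join(f"<loc_{b}>" for b in bins)

def loc_string_to_bbox(loc_string, image_width, image_height):
    """Convert a '<loc_..>' string back to approximate pixel coordinates."""
    bins = [int(n) for n in re.findall(r"<loc_(\d+)>", loc_string)]
    if len(bins) != 4:
        raise ValueError("expected exactly four location tokens")
    sizes = [image_width, image_height, image_width, image_height]
    # Use the centre of each bin as the pixel estimate.
    return [(b + 0.5) * s / NUM_BINS for b, s in zip(bins, sizes)]

box_str = bbox_to_loc_string((100, 200, 300, 400), image_width=800, image_height=600)
print(box_str)  # <loc_125><loc_333><loc_375><loc_666>
```

Because the grid is coarser than the pixel grid, the round trip is only accurate to within one bin (0.8 px wide and 0.6 px high for this 800x600 example).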

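task_prompt = '<REGION_TO_CATEGORY>'
box_str = '<loc_x1><loc_y1><loc_x2><loc_y2>'  # placeholder: replace with the four location tokens of your box
results = run_example(image, task_prompt, text_input=box_str)
# Output: 'camera lens'

task_prompt = '<REGION_TO_DESCRIPTION>'
box_str = '<loc_x1><loc_y1><loc_x2><loc_y2>'  # placeholder: same box as above
results = run_example(image, task_prompt, text_input=box_str)
# Output: 'camera'

In this case, '<REGION_TO_CATEGORY>' identified the lens, while '<REGION_TO_DESCRIPTION>' was less specific. However, this performance may vary with different images.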

2. Object Detection Related Tasks:

2.1 Generate Bounding Boxes and Text for Objects

Florence-2 can identify densely packed regions in the image and provide their bounding box coordinates along with related labels or captions. To extract bounding boxes with labels, use the '<OD>' task prompt:

results = run_example(image, task_prompt='<OD>')
draw_bbox(image, results['<OD>'])

To extract bounding boxes with captions, use the '<DENSE_REGION_CAPTION>' task prompt:

results = run_example(image, task_prompt='<DENSE_REGION_CAPTION>')
draw_bbox(image, results['<DENSE_REGION_CAPTION>'])

The image on the left shows the results of the '<OD>' task prompt, while the image on the right demonstrates '<DENSE_REGION_CAPTION>'.
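Before drawing, it can help to inspect the detections as plain data. The sketch below assumes the parsed detection result is a dict with parallel 'bboxes' and 'labels' lists, with boxes in (x1, y1, x2, y2) pixel order. The sample values are made up for illustration and are not real model output, and to_records is a hypothetical helper.

```python
# Illustrative stand-in for a parsed detection result; the numbers are
# made up, not real model output.
sample_od = {
    "bboxes": [[34.0, 160.0, 597.0, 371.0], [272.0, 241.0, 303.0, 247.0]],
    "labels": ["camera", "lens"],
}

def to_records(detections):
    """Pair each label with its box and add the box area in square pixels."""
    records = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        records.append({
            "label": label,
            "bbox": (x1, y1, x2, y2),
            "area": max(0.0, x2 - x1) * max(0.0, y2 - y1),
        })
    # Largest regions first, which is handy when boxes overlap.
    return sorted(records, key=lambda r: r["area"], reverse=True)

for rec in to_records(sample_od):
    print(f"{rec['label']}: {rec['bbox']} area={rec['area']:.0f}")
```

Sorting by area is just one convenient choice; the same records could equally be filtered by label or by a minimum size before being passed to a drawing function.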

2.2 Text Grounded Object Detection

Florence-2 can also perform text-grounded object detection. By providing specific object names or descriptions as input, Florence-2 detects bounding boxes around the specified objects.

task_prompt = '<CAPTION_TO_PHRASE_GROUNDING>'
results = run_example(image, task_prompt, text_input="lens. camera. table. logo. flash.")
draw_bbox(image, results['<CAPTION_TO_PHRASE_GROUNDING>'])

CAPTION_TO_PHRASE_GROUNDING task with the text input: "lens. camera. table. logo. flash."
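When the list of objects comes from code rather than a hand-typed string, a small helper can build the period-separated input and then group the returned boxes per phrase. This is a sketch with hypothetical helper names (build_phrase_input, group_by_label), and it assumes the grounding result has parallel 'bboxes' and 'labels' lists. The sample result is made up.

```python
from collections import defaultdict

def build_phrase_input(phrases):
    """Join phrases into a period-separated string such as 'lens. camera.'."""
    return " ".join(f"{p.strip()}." for p in phrases if p.strip())

def group_by_label(grounding):
    """Collect all boxes returned for each phrase."""
    grouped = defaultdict(list)
    for bbox, label in zip(grounding["bboxes"], grounding["labels"]):
        grouped[label].append(bbox)
    return dict(grouped)

text_input = build_phrase_input(["lens", "camera", "table", "logo", "flash"])
print(text_input)  # lens. camera. table. logo. flash.

# Made-up result in the assumed {'bboxes': [...], 'labels': [...]} shape.
sample = {
    "bboxes": [[10, 10, 50, 50], [0, 0, 100, 80], [60, 5, 70, 15]],
    "labels": ["lens", "camera", "lens"],
}
print(group_by_label(sample))
```

Grouping makes it easy to see which phrases were matched more than once and which were not found at all.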

3. Segmentation Related Tasks:

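Florence-2 can also generate segmentation polygons grounded by text ('<REFERRING_EXPRESSION_SEGMENTATION>') or by bounding boxes ('<REGION_TO_SEGMENTATION>'):

results = run_example(image, task_prompt='<REFERRING_EXPRESSION_SEGMENTATION>', text_input="camera")
draw_polygons(image, results['<REFERRING_EXPRESSION_SEGMENTATION>'])

# Placeholder: replace the text input with the four location tokens of your box.
results = run_example(image, task_prompt='<REGION_TO_SEGMENTATION>', text_input="<loc_x1><loc_y1><loc_x2><loc_y2>")
draw_polygons(output_image, results['<REGION_TO_SEGMENTATION>'])

The image on the left shows the results of the REFERRING_EXPRESSION_SEGMENTATION task with 'camera' text as input. The image on the right demonstrates the REGION_TO_SEGMENTATION task with a bounding box around the lens provided as input.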
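The returned polygons can also be processed numerically, for instance to estimate the size of a segmented region. The sketch below assumes the segmentation result holds a 'polygons' list with one entry per instance, each containing flat [x1, y1, x2, y2, ...] coordinate lists. The sample data and the helper names flat_to_points and polygon_area are hypothetical.

```python
def flat_to_points(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("flat polygon needs an even number of values")
    return list(zip(flat[0::2], flat[1::2]))

def polygon_area(points):
    """Shoelace formula; returns the absolute area in square pixels."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Made-up stand-in: one instance containing one rectangular polygon.
sample_seg = {"polygons": [[[10, 10, 110, 10, 110, 60, 10, 60]]], "labels": [""]}

for instance in sample_seg["polygons"]:
    for flat in instance:
        pts = flat_to_points(flat)
        print(len(pts), polygon_area(pts))  # 4 5000.0
```

The rectangle here is 100 by 50 pixels, so the shoelace formula returns 5000.0, which makes the helper easy to sanity-check before applying it to real masks.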

4. OCR Related Tasks:

Florence-2 demonstrates strong OCR capabilities. It can extract text from an image with the '<OCR>' task prompt, and extract both text and its location with '<OCR_WITH_REGION>':

results = run_example(image, task_prompt='<OCR_WITH_REGION>')
draw_ocr_bboxes(image, results['<OCR_WITH_REGION>'])
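Text regions are typically returned as four-corner quadrilaterals rather than axis-aligned boxes. The sketch below assumes the region-aware OCR result has parallel 'quad_boxes' (eight values per region) and 'labels' lists, and shows one way to turn them into enclosing boxes sorted in reading order. The sample values and both helper names are made up for illustration.

```python
def quad_to_bbox(quad):
    """Axis-aligned (x1, y1, x2, y2) box enclosing an 8-value quadrilateral."""
    xs, ys = quad[0::2], quad[1::2]
    return (min(xs), min(ys), max(xs), max(ys))

def reading_order(ocr):
    """Return (text, bbox) pairs sorted top-to-bottom, then left-to-right."""
    items = [(label.strip(), quad_to_bbox(quad))
             for quad, label in zip(ocr["quad_boxes"], ocr["labels"])]
    return sorted(items, key=lambda item: (item[1][1], item[1][0]))

# Made-up stand-in for a region-aware OCR result.
sample_ocr = {
    "quad_boxes": [[200, 90, 260, 92, 260, 110, 200, 108],
                   [40, 20, 120, 22, 120, 40, 40, 38]],
    "labels": ["VR35", "Kodak"],
}

for text, bbox in reading_order(sample_ocr):
    print(text, bbox)
```

Sorting by the top edge first is a simple heuristic; for multi-column layouts or rotated text a more careful ordering would be needed.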

Florence-2 is a versatile Vision-Language Model (VLM), capable of handling multiple vision tasks within a single model. Its zero-shot capabilities are impressive across diverse tasks such as image captioning, object detection, segmentation and OCR. While Florence-2 performs well out-of-the-box, additional fine-tuning can further adapt the model to new tasks or improve its performance on unique, custom datasets.
