Florence-2: Advancing Multiple Vision Tasks with a Single VLM Model | by Lihi Gur Arie, PhD | Oct, 2024

October 15, 2024

Loading Florence-2 model and a sample image

After installing and importing the necessary libraries (as demonstrated in the accompanying Colab notebook), we begin by loading the Florence-2 model, processor and the input image of a camera:

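# Imports (installation is covered in the accompanying Colab notebook):
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Load model:
model_id = 'microsoft/Florence-2-large'
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype='auto').eval().cuda()
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load image (img_path is the path to the sample image):
image = Image.open(img_path)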

Auxiliary Functions

In this tutorial, we will use several auxiliary functions. The most important is the run_example core function, which generates a response from the Florence-2 model.

The run_example function combines the task prompt with any additional text input (if provided) into a single prompt. The processor then converts the prompt and the image into token IDs and pixel values, which serve as inputs to the model. The magic happens during the model.generate step, where the model's response is generated. Here's a breakdown of some key parameters:

  • max_new_tokens=1024: Sets the maximum length of the output, allowing for detailed responses.
  • do_sample=False: Ensures a deterministic response.
  • num_beams=3: Uses beam search that keeps the 3 highest-scoring candidate sequences at each step, exploring multiple potential sequences to find the best overall output.
  • early_stopping=False: Does not stop as soon as 3 complete candidates are found. In the Hugging Face transformers library, this setting applies a heuristic that ends beam search once finding a better candidate becomes very unlikely.

Lastly, the model’s output is decoded and post-processed with processor.batch_decode and processor.post_process_generation to produce the final text response, which is returned by the run_example function.
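
To make the num_beams setting more concrete, here is a minimal, self-contained sketch of beam search over a toy next-token table. The TOY_MODEL probabilities and the beam_search function are hypothetical and exist only for illustration. This is a simplified version of the technique, not the implementation behind model.generate, which adds features such as a length penalty and stopping heuristics.

```python
import math

# Hypothetical toy model: probability of each next token given the last token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def beam_search(num_beams, max_new_tokens=10):
    """Return the highest-scoring token sequence found with `num_beams` beams."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, prob in TOY_MODEL[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the `num_beams` best candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:num_beams]:
            if seq[-1] == "</s>":
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    best_score, best_seq = max(finished + beams, key=lambda c: c[0])
    return best_seq


print(beam_search(num_beams=1))  # greedy decoding
print(beam_search(num_beams=3))
```

In this toy setup, num_beams=1 behaves greedily and returns ['<s>', 'a', 'x', '</s>'] (probability 0.30), while num_beams=3 recovers the more probable sequence ['<s>', 'b', 'z', '</s>'] (probability 0.36).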

def run_example(image, task_prompt, text_input=''):
    # Combine the task prompt with any additional text input
    prompt = task_prompt + text_input

    # Tokenize the prompt and preprocess the image
    inputs = processor(text=prompt, images=image, return_tensors="pt").to('cuda', torch.float16)

    # Generate output token IDs with deterministic beam search
    generated_ids = model.generate(
        input_ids=inputs["input_ids"].cuda(),
        pixel_values=inputs["pixel_values"].cuda(),
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
        early_stopping=False,
    )

    # Decode the token IDs, keeping special tokens for post-processing
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Parse the raw text into a task-specific result
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )

    return parsed_answer

Additionally, we use auxiliary functions to visualize the results (draw_bbox, draw_ocr_bboxes and draw_polygons) and to convert between bounding box formats (convert_bbox_to_florence-2 and convert_florence-2_to_bbox). These can be explored in the attached Colab notebook.
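
As a rough illustration of what such a format conversion involves, the sketch below converts between pixel coordinates and location tokens. It assumes that Florence-2 expresses box coordinates as '<loc_N>' tokens quantized into 1000 bins per axis, and that decoding maps a bin back to its centre. The function names bbox_to_loc_string and loc_string_to_bbox are hypothetical and are not the notebook's helpers.

```python
import re

NUM_BINS = 1000  # assumption: coordinates are quantized into 1000 bins per axis


def bbox_to_loc_string(bbox, image_width, image_height):
    """Convert a pixel-space (x1, y1, x2, y2) box into '<loc_N>' tokens."""
    sizes = (image_width, image_height, image_width, image_height)
    tokens = []
    for value, size in zip(bbox, sizes):
        # Scale to a bin index and clamp it to the valid range [0, NUM_BINS - 1].
        bin_index = int(value * NUM_BINS / size)
        bin_index = min(NUM_BINS - 1, max(0, bin_index))
        tokens.append(f"<loc_{bin_index}>")
    return "".join(tokens)


def loc_string_to_bbox(loc_string, image_width, image_height):
    """Convert four '<loc_N>' tokens back into a pixel-space box."""
    bins = [int(b) for b in re.findall(r"<loc_(\d+)>", loc_string)]
    if len(bins) != 4:
        raise ValueError("expected exactly four <loc_N> tokens")
    sizes = (image_width, image_height, image_width, image_height)
    # Map each bin index to the centre of its bin in pixel space.
    return tuple((b + 0.5) * size / NUM_BINS for b, size in zip(bins, sizes))


# Example with hypothetical values: a box in a 1000x800 image.
loc = bbox_to_loc_string((100, 200, 300, 400), 1000, 800)
print(loc)
print(loc_string_to_bbox(loc, 1000, 800))
```

Because of the quantization, a round trip returns a box that is close to, but not exactly, the original pixel coordinates.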

Florence-2 can perform a variety of visual tasks. Let’s explore some of its capabilities, starting with image captioning.

1. Captioning Generation Related Tasks:

1.1 Generate Captions

Florence-2 can generate image captions at various levels of detail, using the '<CAPTION>', '<DETAILED_CAPTION>' or '<MORE_DETAILED_CAPTION>' task prompts.

print(run_example(image, task_prompt='<CAPTION>'))
# Output: 'A black camera sitting on top of a wooden table.'

print(run_example(image, task_prompt='<DETAILED_CAPTION>'))
# Output: 'The image shows a black Kodak V35 35mm film camera sitting on top of a wooden table with a blurred background.'

print(run_example(image, task_prompt='<MORE_DETAILED_CAPTION>'))
# Output: 'The image is a close-up of a Kodak VR35 digital camera. The camera is black in color and has the Kodak logo on the top left corner. The body of the camera is made of wood and has a textured grip for easy handling. The lens is in the center of the body and is surrounded by a gold-colored ring. On the top right corner, there is a small LCD screen and a flash. The background is blurred, but it appears to be a wooded area with trees and greenery.'

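The model accurately describes the image and its surroundings. It even identifies the camera's brand and model, demonstrating its OCR ability. However, the '<MORE_DETAILED_CAPTION>' output contains minor inconsistencies (for example, it calls the camera digital, whereas the shorter caption describes a 35mm film camera), which is expected from a zero-shot model.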

1.2 Generate Caption for a Given Bounding Box

Florence-2 can generate captions for specific regions of an image defined by bounding boxes. For this, it takes the bounding box location as input. You can extract the category with '<REGION_TO_CATEGORY>' or a description with '<REGION_TO_DESCRIPTION>'.

For your convenience, I added a widget to the Colab notebook that enables you to draw a bounding box on the image, and code to convert it to Florence-2 format.

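task_prompt = '<REGION_TO_CATEGORY>'
# Placeholder: four location tokens (each an integer from 0 to 999) from the notebook widget
box_str = '<loc_x1><loc_y1><loc_x2><loc_y2>'
results = run_example(image, task_prompt, text_input=box_str)
# Output: 'camera lens'

task_prompt = '<REGION_TO_DESCRIPTION>'
# Same placeholder box as above
box_str = '<loc_x1><loc_y1><loc_x2><loc_y2>'
results = run_example(image, task_prompt, text_input=box_str)
# Output: 'camera'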

In this case, '<REGION_TO_CATEGORY>' identified the lens, while '<REGION_TO_DESCRIPTION>' was less specific. However, this performance may vary with different images.

2. Object Detection Related Tasks:

2.1 Generate Bounding Boxes and Text for Objects

Florence-2 can identify densely packed regions in the image and provide their bounding box coordinates along with related labels or captions. To extract bounding boxes with labels, use the '<OD>' task prompt:

results = run_example(image, task_prompt='<OD>')
draw_bbox(image, results['<OD>'])

To extract bounding boxes with captions, use the '<DENSE_REGION_CAPTION>' task prompt:

results = run_example(image, task_prompt='<DENSE_REGION_CAPTION>')
draw_bbox(image, results['<DENSE_REGION_CAPTION>'])
The image on the left shows the results of the '<OD>' task prompt, while the image on the right demonstrates '<DENSE_REGION_CAPTION>'.
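
If you want to work with the detections directly instead of drawing them, a sketch like the following may help. It assumes the parsed detection result is a dictionary with parallel 'bboxes' (pixel-space [x1, y1, x2, y2]) and 'labels' lists, as shown in the Florence-2 model card. The summarize_detections helper and the example values are hypothetical.

```python
def summarize_detections(detections):
    """Pair each label with its box and compute the box area in pixels."""
    summary = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        summary.append({"label": label, "bbox": (x1, y1, x2, y2), "area": area})
    # Largest objects first.
    return sorted(summary, key=lambda d: d["area"], reverse=True)


# Hypothetical example values, standing in for a parsed detection result.
example = {
    "bboxes": [[40.0, 60.0, 90.0, 110.0], [10.0, 20.0, 110.0, 220.0]],
    "labels": ["lens", "camera"],
}

for detection in summarize_detections(example):
    print(detection["label"], detection["area"])
```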

2.2 Text Grounded Object Detection

Florence-2 can also perform text-grounded object detection. By providing specific object names or descriptions as input, Florence-2 detects bounding boxes around the specified objects.

task_prompt = '<CAPTION_TO_PHRASE_GROUNDING>'
results = run_example(image, task_prompt, text_input="lens. camera. table. logo. flash.")
draw_bbox(image, results['<CAPTION_TO_PHRASE_GROUNDING>'])
CAPTION_TO_PHRASE_GROUNDING task with the text input: “lens. camera. table. logo. flash.”
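
The text input above is a list of phrases separated by periods. As a small convenience, the sketch below builds such a string from a Python list and groups the returned boxes by phrase. Both helpers (build_grounding_text and group_boxes_by_label) are hypothetical, and the grouping assumes the parsed result contains parallel 'bboxes' and 'labels' lists, as for object detection.

```python
from collections import defaultdict


def build_grounding_text(phrases):
    """Join phrases into a period-separated string such as 'lens. camera.'"""
    cleaned = [p.strip().rstrip(".") for p in phrases if p.strip()]
    return ". ".join(cleaned) + "." if cleaned else ""


def group_boxes_by_label(result):
    """Collect all boxes found for each grounded phrase."""
    grouped = defaultdict(list)
    for bbox, label in zip(result["bboxes"], result["labels"]):
        grouped[label].append(tuple(bbox))
    return dict(grouped)


text_input = build_grounding_text(["lens", "camera", "table", "logo", "flash"])
print(text_input)

# Hypothetical example values, standing in for a parsed grounding result.
example = {
    "bboxes": [[40, 60, 90, 110], [10, 20, 110, 220], [12, 22, 30, 40]],
    "labels": ["lens", "camera", "lens"],
}
print(group_boxes_by_label(example))
```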

3. Segmentation Related Tasks:

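Florence-2 can also generate segmentation polygons grounded by text ('<REFERRING_EXPRESSION_SEGMENTATION>') or by bounding boxes ('<REGION_TO_SEGMENTATION>'):

results = run_example(image, task_prompt='<REFERRING_EXPRESSION_SEGMENTATION>', text_input="camera")
draw_polygons(image, results['<REFERRING_EXPRESSION_SEGMENTATION>'])

# Placeholder: four location tokens (each an integer from 0 to 999) describing the box
results = run_example(image, task_prompt='<REGION_TO_SEGMENTATION>', text_input="<loc_x1><loc_y1><loc_x2><loc_y2>")
draw_polygons(output_image, results['<REGION_TO_SEGMENTATION>'])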
The image on the left shows the results of the REFERRING_EXPRESSION_SEGMENTATION task with ‘camera’ text as input. The image on the right demonstrates REGION_TO_SEGMENTATION task with a bounding box around the lens provided as input.
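
To use the segmentation output numerically, the polygons can be converted into point lists. The sketch below assumes the parsed result has a 'polygons' entry (one list of flat [x1, y1, x2, y2, ...] polygons per instance) and a parallel 'labels' entry, as in the Florence-2 model card. The helper names and example values are hypothetical. It computes each instance's area with the shoelace formula.

```python
def flat_to_points(flat_polygon):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat_polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def segmentation_areas(result):
    """Return (label, total area in pixels) for each segmented instance."""
    areas = []
    for instance_polygons, label in zip(result["polygons"], result["labels"]):
        area = sum(polygon_area(flat_to_points(p)) for p in instance_polygons)
        areas.append((label, area))
    return areas


# Hypothetical example: one instance made of a single 4x3 rectangle.
example = {"polygons": [[[0, 0, 4, 0, 4, 3, 0, 3]]], "labels": [""]}
print(segmentation_areas(example))
```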

4. OCR Related Tasks:

Florence-2 demonstrates strong OCR capabilities. It can extract text from an image with the '<OCR>' task prompt, and extract both text and its location with '<OCR_WITH_REGION>':

task_prompt = '<OCR_WITH_REGION>'
results = run_example(image, task_prompt)
draw_ocr_bboxes(image, results['<OCR_WITH_REGION>'])
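
The located text can also be post-processed without drawing. The sketch below assumes the parsed result contains 'quad_boxes' (eight values per text region: the x, y coordinates of four corners) and a parallel 'labels' list with the recognized text, as in the Florence-2 model card. The helpers and example values are hypothetical, and the reading order used here (top to bottom, then left to right) is a simple heuristic.

```python
def quad_to_bbox(quad):
    """Convert [x1, y1, ..., x4, y4] into an axis-aligned (x_min, y_min, x_max, y_max)."""
    xs = quad[0::2]
    ys = quad[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def ocr_text_in_reading_order(result):
    """Return recognized strings sorted top to bottom, then left to right."""
    items = [
        (quad_to_bbox(quad), text.strip())
        for quad, text in zip(result["quad_boxes"], result["labels"])
    ]
    # Sort by the top edge first, then by the left edge.
    items.sort(key=lambda item: (item[0][1], item[0][0]))
    return [text for _, text in items]


# Hypothetical example values, standing in for a parsed OCR result.
example = {
    "quad_boxes": [
        [10, 50, 90, 50, 90, 70, 10, 70],
        [10, 10, 60, 10, 60, 30, 10, 30],
    ],
    "labels": ["VR35", "Kodak"],
}
print(ocr_text_in_reading_order(example))
```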

Florence-2 is a versatile Vision-Language Model (VLM), capable of handling multiple vision tasks within a single model. Its zero-shot capabilities are impressive across diverse tasks such as image captioning, object detection, segmentation and OCR. While Florence-2 performs well out-of-the-box, additional fine-tuning can further adapt the model to new tasks or improve its performance on unique, custom datasets.
