Improve can_infer_option and can_infer_text #1175

Open: wants to merge 2 commits into base: main

@Ezra-Yu (Contributor) commented Jul 23, 2025

Restrict when can_infer is allowed to return a match, so that more responses fall through to the model matching phase.

Some answers are too complex for rule-based matching, which causes can_infer to return the wrong option.
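A minimal sketch of the idea, not the code in this PR: a rule-based matcher that commits to an option only when the response is short and contains exactly one option letter, and otherwise returns `None` so the caller can pass the response on to the model matching phase. The name `strict_infer_option` and the `max_tokens` cutoff are hypothetical.

```python
import re


def strict_infer_option(answer, choices, max_tokens=10):
    """Return an option letter only when the match is unambiguous, else None.

    Hypothetical helper for illustration; `max_tokens` is an assumed cutoff.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", answer)
    # Long, multi-step responses are left to the model matching phase.
    if len(tokens) > max_tokens:
        return None
    hits = {t for t in tokens if t in choices}
    # Commit only when exactly one option letter is present.
    return hits.pop() if len(hits) == 1 else None
```

With this sketch, a short reply such as "The answer is D." still resolves by rule, while a long derivation that happens to mention "Company B" is deferred instead of being matched to B.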

Results before the fix:

dev model:
xhs_api-temperature0.7-32k_MMMU_DEV_VAL_openai_result.xlsx

seedvl1.5:
xhs-seedvl-1.5_old_MMMU_DEV_VAL_openai_result.xlsx

Results after the fix:

dev model:
fixed_xhs_api-temperature0.7-32k_MMMU_DEV_VAL_openai_result.xlsx

seedvl1.5:

fixed_xhs-seedvl-1.5_MMMU_DEV_VAL_openai_result.xlsx

Score

seedvl: 78.56 -> 79.67 (the paper reports 77+)

Some cases (the match logs are in the xlsx files)

I logged which function can_infer used for each match.
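One way such logging could be wired up, as a sketch under assumptions: the wrapper below tries a list of matchers in order and returns the result together with the name of the matcher that produced it. `infer_with_log` and the `(name, matcher)` pair convention are hypothetical and are not taken from this PR.

```python
def infer_with_log(answer, choices, matchers):
    """Try each (name, matcher) pair in order.

    Returns (result, name) for the first matcher that yields a truthy
    result, or (None, None) when no rule-based matcher commits.
    Hypothetical helper for illustration only.
    """
    for name, matcher in matchers:
        ret = matcher(answer, choices)
        if ret:
            return ret, name
    return None, None
```

The returned name can then be written next to the prediction, the choices, and the result, which makes it easy to see which rule was responsible for a wrong match.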

"Match Log: 【PREDIDCTION】: To determine the missing gain for Company B, we use the formula for net income (or loss):  

\[ \text{Net Income (Loss)} = \text{Revenues} - \text{Expenses} + \text{Gains} - \text{Losses} \]  


For Company B:  
- Revenues = \( \$1,480,500 \)  
- Expenses = \( \$1,518,300 \)  
- Gains = \( ? \) (let this be \( G \))  
- Losses = \( 0 \)  
- Net Income (Loss) = \( \$39,690 \)  


Substitute the known values into the formula:  
\[ 39,690 = 1,480,500 - 1,518,300 + G - 0 \]  


First, calculate \( 1,480,500 - 1,518,300 \):  
\[ 1,480,500 - 1,518,300 = -37,800 \]  


Now solve for \( G \):  
\[ 39,690 = -37,800 + G \]  
\[ G = 39,690 + 37,800 \]  
\[ G = 77,490 \]  


Thus, the missing gain for Company B is \(\boxed{77490}\).
【CHOICES】: {'A': '$63,020', 'B': '$58,410', 'C': '$71,320', 'D': '$77,490'}
 【RET】: B
【LOG】: can_infer_option.
"Match Log: 【PREDIDCTION】: To determine the work done during the cycle \( ABCA \) in a \( PV \)-diagram, we use the fact that **the work done by a gas in a cyclic process is equal to the area enclosed by the cycle** on the \( PV \)-diagram.  


### Step 1: Identify the shape of the cycle  
The cycle \( ABCA \) forms a **triangle** with vertices at:  
- \( A(P, V) \)  
- \( B(2P, 2V) \)  
- \( C(P, 2V) \)  


### Step 2: Calculate the area of the triangular cycle  
The area of a triangle is given by \( \text{Area} = \frac{1}{2} \times \text{base} \times \text{height} \).  

- **Base**: The horizontal side \( AC \) (constant pressure \( P \)).  
  The length of \( AC \) is the change in volume: \( \Delta V = 2V - V = V \).  

- **Height**: The vertical side \( CB \) (constant volume \( 2V \)).  
  The length of \( CB \) is the change in pressure: \( \Delta P = 2P - P = P \).  


### Step 3: Compute the area (work done)  
Substitute the base and height into the area formula:  
\[
\text{Area} = \frac{1}{2} \times \text{base} \times \text{height} = \frac{1}{2} \times V \times P = \frac{PV}{2}
\]  


Thus, the work done during the cycle \( ABCA \) is \( \frac{PV}{2} \), which corresponds to **Option B**.
【CHOICES】: {'A': 'pv', 'B': 'pv/2', 'C': '2pv', 'D': 'pv/3'}
 【RET】: A
【LOG】: can_infer_text. "
"Match Log: 【PREDIDCTION】: To determine the correct answer, we analyze the context of ukiyo - e (Japanese woodblock print) art, which is the style of the work in the image:  

- **Option A**: Ukiyo - e prints were not created for “private meditation” in household niches. They were popular, accessible art, not specialized for private spiritual or meditative use.  
- **Option B**: Ukiyo - e was not about “special commissions for wealthy patrons”. These works were mass - produced, not tailored to individual wealthy clients.  
- **Option C**: Ukiyo - e art (like works by Hokusai, Hiroshige, etc.) was designed for **mass production** to be affordable and accessible to a broad audience. Prints were made in large quantities so that people from various social classes could purchase them. This matches the purpose of ukiyo - e.  
- **Option D**: Ukiyo - e was not “personal artistic experimentation with new media”. It was a commercial art form, focused on widespread distribution rather than individual artistic experimentation for its own sake.  


Thus, the correct answer is **C**.
【CHOICES】: {'A': 'hanging in a household niche as an object of private meditation', 'B': 'fulfilling a special commission for a wealthy patron', 'C': 'mass production for purchase by a wide audience', 'D': 'personal artistic experimentation with new media'}
 【RET】: D
【LOG】: can_infer_text. "
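The sketch below shows one plausible way mismatches like these can arise. The two matchers are simplified stand-ins written for illustration, not the repository's actual can_infer_option and can_infer_text; `naive_option_match` and `naive_text_match` are hypothetical names, and the inputs are shortened excerpts of the logged predictions.

```python
import re


def naive_option_match(answer, choices):
    """Return the option letter if exactly one appears as a standalone token."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", answer))
    hits = [k for k in choices if k in tokens]
    return hits[0] if len(hits) == 1 else None


def naive_text_match(answer, choices):
    """Return the option whose text is the only one contained in the answer."""
    lowered = answer.lower()
    hits = [k for k, v in choices.items() if v.lower() in lowered]
    return hits[0] if len(hits) == 1 else None


# "Company B" is read as option B, although the boxed value corresponds to D.
gain = "For Company B: ... Thus, the missing gain for Company B is 77490."
gain_choices = {'A': '$63,020', 'B': '$58,410', 'C': '$71,320', 'D': '$77,490'}
print(naive_option_match(gain, gain_choices))  # B

# "pv" occurs inside "PV-diagram", while "pv/2" is written as \frac{PV}{2}.
work = r"In a PV-diagram ... the work done is \frac{PV}{2}, which is Option B."
work_choices = {'A': 'pv', 'B': 'pv/2', 'C': '2pv', 'D': 'pv/3'}
print(naive_text_match(work, work_choices))  # A
```

The third case follows the same pattern for text matching: the response quotes the text of option D verbatim while rejecting it, and paraphrases option C, so only D is found as a substring.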

@Ezra-Yu Ezra-Yu changed the title fix can infer options and can infer text Improve can_infer_option and can_infer_text Jul 23, 2025
@FangXinyu-0913 FangXinyu-0913 self-assigned this Jul 29, 2025