Current Configuration
Judge/Synthesis Model: Claude 3 Haiku
Models Being Queried: GPT-4 (OpenAI), Claude 3 Sonnet (Anthropic)
Max Iterations: 3

Frequently Asked Questions

The LLM Consortium works by sending your prompt to multiple AI models (currently GPT-4 and Claude 3 Sonnet) in parallel. Each model processes your request independently, and then a judge model (Claude 3 Haiku) analyzes and synthesizes their responses to provide the best possible answer.
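As a rough illustration, the parallel fan-out step can be sketched in Python as below. The helper names and the stubbed query_model body are assumptions for illustration only; the real system calls the OpenAI and Anthropic APIs.

import asyncio

# Minimal sketch of the parallel fan-out, assuming a hypothetical query_model helper.
async def query_model(model_name: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API here.
    return f"[{model_name} response to: {prompt!r}]"

async def query_all_members(prompt: str) -> dict[str, str]:
    members = ["gpt-4", "claude-3-sonnet"]
    # Every member model answers the same prompt independently and concurrently.
    responses = await asyncio.gather(*(query_model(m, prompt) for m in members))
    return dict(zip(members, responses))

# Example: responses = asyncio.run(query_all_members("Summarize the CAP theorem"))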

If the judge model's confidence in the synthesized answer is below 0.8, the system will automatically initiate another iteration. In each iteration, the models receive refined prompts based on previous responses. This process continues until either the confidence threshold is met or the maximum of 3 iterations is reached.
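A hedged sketch of that control loop is below. The query_all and judge callables stand in for the query and synthesis steps; they are illustrative parameters, not the project's actual function names.

from typing import Callable

CONFIDENCE_THRESHOLD = 0.8  # threshold described above
MAX_ITERATIONS = 3          # matches the configured maximum

def run_with_iteration(prompt: str,
                       query_all: Callable[[str], dict[str, str]],
                       judge: Callable[[str, dict[str, str]], dict]) -> dict:
    # Repeat query -> judge rounds until the judge is confident enough or iterations run out.
    current_prompt = prompt
    verdict: dict = {"confidence": 0.0, "synthesis": "", "refinement_areas": ""}
    for _ in range(MAX_ITERATIONS):
        member_responses = query_all(current_prompt)
        verdict = judge(prompt, member_responses)
        if verdict["confidence"] >= CONFIDENCE_THRESHOLD:
            break
        # Refine the next round's prompt using the judge's feedback.
        current_prompt = (
            f"{prompt}\n\nPrevious synthesis:\n{verdict['synthesis']}\n"
            f"Please improve on: {verdict['refinement_areas']}"
        )
    return verdict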

The judge model (Claude 3 Haiku) analyzes responses based on multiple criteria:
  • Completeness and accuracy of information
  • Consistency between model responses
  • Presence of any dissenting views
  • Areas that might need refinement
This analysis results in a confidence score, and the system will iterate if needed to improve the response quality.

The synthesis process involves the following steps (see the prompt-construction sketch after this list):
  • Combining the best insights from all model responses
  • Identifying and resolving any contradictions
  • Highlighting important dissenting views
  • Providing a confidence score for the final answer
  • Suggesting areas that might need further exploration
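One way to picture this step is the judge-prompt construction below. The wording of the instructions is a paraphrase of the criteria above, not the project's actual prompt.

def build_judge_prompt(original_prompt: str, member_responses: dict[str, str]) -> str:
    # Collect every member model's answer into one block for the judge to compare.
    answers = "\n\n".join(
        f"--- Response from {model} ---\n{text}"
        for model, text in member_responses.items()
    )
    return (
        f"Original question:\n{original_prompt}\n\n"
        f"Candidate responses:\n{answers}\n\n"
        "Combine the best insights into a single answer, resolve any contradictions, "
        "note important dissenting views, give a confidence score between 0 and 1, "
        "and suggest areas that need further exploration."
    )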

Each model (GPT-4 and Claude 3 Sonnet) provides a confidence score (0-1) indicating how certain it is about its own response (see the extraction sketch after this list). This score is based on:
  • The model's understanding of the prompt
  • The completeness of its response
  • The reliability of information provided
  • Any ambiguities or uncertainties in the response
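Because each response carries its self-reported score inside <confidence> tags (see the prompt structure later in this FAQ), extracting it can be as simple as the sketch below. The fallback default is an assumption, not part of the system's behaviour.

import re

def extract_member_confidence(raw_response: str, default: float = 0.5) -> float:
    # Pull the self-reported 0-1 score out of the model's <confidence> tags.
    match = re.search(r"<confidence>\s*([01](?:\.\d+)?)\s*</confidence>", raw_response)
    return float(match.group(1)) if match else default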

The final confidence score is determined by the judge model (Claude 3 Haiku) based on multiple factors:
  • Agreement between model responses
  • Completeness of the synthesized answer
  • Quality of supporting evidence
  • Resolution of any contradictions
  • Overall coherence of the final response
This score represents the judge's confidence in the quality and reliability of the synthesized response.

Each model (GPT-4 and Claude 3 Sonnet) receives this structured prompt:
1. Begin by carefully considering the specific instructions provided.

2. Write your thought process inside <thought_process> tags, including:
   - Key aspects relevant to the query
   - Potential challenges or limitations
   - How response instructions affect the approach
   - Different angles and step-by-step logic

3. Provide your confidence level (0-1) in <confidence> tags

4. Present your final answer in <answer> tags

This ensures consistent, well-structured responses from all models.
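A minimal template producing this four-part structure might look like the sketch below; the exact wording is illustrative, not the verbatim prompt the system sends.

MEMBER_PROMPT_TEMPLATE = """\
1. Begin by carefully considering the specific instructions provided.
2. Write your thought process inside <thought_process> tags.
3. Provide your confidence level (0-1) inside <confidence> tags.
4. Present your final answer inside <answer> tags.

User query:
{query}
"""

def build_member_prompt(query: str) -> str:
    # Wrap the user's query in the structured instructions described above.
    return MEMBER_PROMPT_TEMPLATE.format(query=query)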

The judge (Claude 3 Haiku) receives an analysis prompt instructing it to review all member responses and provide:
1. A synthesized answer combining best insights
2. Confidence in synthesis (0-1)
3. Analysis of responses
4. Notable dissenting views
5. Whether further iteration is needed

The judge's response is structured with XML tags (parsed in the sketch after this example):
<synthesis>[Combined response]</synthesis>
<confidence>[0-1 score]</confidence>
<analysis>[Response analysis]</analysis>
<dissent>[Dissenting views]</dissent>
<needs_iteration>true/false</needs_iteration>
<refinement_areas>[Areas needing exploration]</refinement_areas>
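A small parser for this tag structure might look like the following; the field names mirror the tags above, and the defaults used for missing tags are assumptions.

import re

def _tag(name: str, text: str) -> str:
    # Return the contents of the first <name>...</name> block, or "" if absent.
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

def parse_judge_response(raw: str) -> dict:
    return {
        "synthesis": _tag("synthesis", raw),
        "confidence": float(_tag("confidence", raw) or 0.0),
        "analysis": _tag("analysis", raw),
        "dissent": _tag("dissent", raw),
        "needs_iteration": _tag("needs_iteration", raw).lower() == "true",
        "refinement_areas": _tag("refinement_areas", raw),
    }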

The final confidence score can be higher or lower than individual model scores because:
  • Strong agreement between models can increase confidence
  • Complementary insights from different models can create a more complete answer
  • The judge model validates and fact-checks the combined response
  • Contradictions between models might lower the final confidence
  • The synthesis process may resolve uncertainties present in individual responses