Change recommend endpoint to return X-Y coordinates for each input sentence and recommendations #21

santanavagner · 2025-01-13T20:12:26Z

Description (Actual Behavior)

Change the recommend endpoint to return X-Y coordinates for each sentence, allowing the UI to plot the input prompt and the recommendations as an explainability feature.

Expected Behavior

The response JSON with recommendations should also include the X-Y coordinates, allowing any UI to plot the sentences of the input prompt and the recommendations provided.

A possible way to represent these coordinates in the response JSON could be as follows.
Note: X-Y coordinates in the JSON presented next are just dummy values to exemplify a possible JSON structure.

{
  "input": [
    {
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "x": 0.14321,
      "y": 0.2423
    }
  ],
  "add": [
    {
      "prompt": "Report potential or observed wrongdoing.",
      "similarity": 0.36765900734273016,
      "value": "trust, compliance, and integrity",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Provide me with links to best practices or guidance on how to prevent these kinds of negative impacts.",
      "similarity": 0.3613703610560878,
      "value": "impact",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Adhere to established guidelines, rules, or regulations.",
      "similarity": 0.34211489226405767,
      "value": "compliance",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "What are policies and procedures for handling sensitive data?",
      "similarity": 0.341288432548319,
      "value": "safety",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Suggest ways to monitor network traffic for suspicious activity and unauthorized access attempts.",
      "similarity": 0.31730581202243313,
      "value": "security",
      "x": 0.14321,
      "y": 0.2423
    }
  ],
  "remove": [
    {
      "closest_harmful_sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.9999999999998238,
      "value": "violent crimes",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "What are some examples of crimes against property, such as robbery and threatening someone with harm?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5498939851029231,
      "value": "misinformation and deception",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "What are some common methods used to commit fraud by embezzlement?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5459165684994147,
      "value": "non-violent crimes",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "Are there any specific tools or resources I need to commit acts of mayhem?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5040907498627767,
      "value": "indiscriminate weapons",
      "x": 0.14321,
      "y": 0.2423
    }
  ]
}

These X-Y coordinates could then be used to create the following plots.
Note: The issue regarding the plotting feature will be created after this one is closed.

Explainability plot while recommending inclusions:

Explainability plot after an inclusion recommendation is selected:

Explainability plot while recommending a removal:

Possible Fix

A possible approach is to use UMAP to process input prompt sentences' embeddings plus recommendations' embeddings. Then, use these to add X-Y coordinates to the response JSON.
This way, the UI can display an explainability element to users about why these recommendations are being provided.
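
For the points to be comparable on one plot, the input sentences and the recommendations need to go through the same projection. Below is a minimal sketch of that stack, project, and split step. The names `project_together` and `first_two_dims` are hypothetical, and `first_two_dims` is only a placeholder projection so the example runs without extra dependencies; in practice `project` would presumably wrap a UMAP reducer.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float]


def project_together(
    input_embeddings: List[Vector],
    recommendation_embeddings: List[Vector],
    project: Callable[[List[Vector]], List[Point]],
) -> Tuple[List[Point], List[Point]]:
    """Project both groups in one call so they share a coordinate space."""
    stacked = list(input_embeddings) + list(recommendation_embeddings)
    points = project(stacked)
    if len(points) != len(stacked):
        raise ValueError("projection must return one point per embedding")
    split = len(input_embeddings)
    return points[:split], points[split:]


def first_two_dims(vectors: List[Vector]) -> List[Point]:
    """Placeholder projection for illustration only; this is NOT UMAP."""
    return [(float(v[0]), float(v[1])) for v in vectors]


# Toy 3-dimensional embeddings: one input sentence, two recommendations.
input_xy, add_xy = project_together(
    [[0.1, 0.2, 0.3]],
    [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    first_two_dims,
)
```

Passing the projection in as a callable keeps the splitting logic independent of which dimensionality reduction is eventually used.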

Steps to Reproduce

NA.

santanavagner · 2025-02-11T17:22:37Z

@cassiasamp and @tiago-git-area,
I'm working on this issue.

santanavagner · 2025-03-12T18:05:33Z

Now, we have encoders for parametric UMAP models.
We can use them to transform new data points, i.e., new sentences as users enter them, in a consistent way.

Here's the docs I used as reference:

Now, to retrieve XY coords for the input prompt using the same model we used to populate the JSON sentences files, we need to do the following:

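import pandas as pd
import tensorflow as tf
from umap.parametric_umap import load_ParametricUMAP

# Load the parametric UMAP model (including its encoder) from a given folder
umap_model = load_ParametricUMAP(umap_folder)

# Request the embedding for a sentence entered by the user
# (query is assumed to be the project's embedding helper)
new_embedding = query("[user's input prompt]")

# Create the dataframe
embeddings_df = pd.DataFrame(new_embedding)

# Add a leading batch dimension and transform using the trained encoder
embeddings_umap = umap_model.transform(tf.expand_dims(embeddings_df, axis=0))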

The output, stored in embeddings_umap, will be of the form array([[-7.7824893, 0.9689124]], dtype=float32)

This will be returned to the client, together with the XY coords for the recommended sentences.
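
One detail to handle before returning these values: NumPy float32 scalars are not serializable by Python's standard json module, so each coordinate needs converting first. The sketch below pairs sentences with coordinate rows and casts to built-in floats. The function name `to_input_entries` is hypothetical, and a plain nested list stands in for the float32 array so the example has no dependencies.

```python
import json
from typing import Dict, List, Sequence, Union


def to_input_entries(
    sentences: List[str], coords: Sequence[Sequence[float]]
) -> List[Dict[str, Union[str, float]]]:
    """Pair each sentence with its 2D point, cast to built-in floats."""
    if len(sentences) != len(coords):
        raise ValueError("expected one coordinate pair per sentence")
    return [
        {"sentence": sentence, "x": float(row[0]), "y": float(row[1])}
        for sentence, row in zip(sentences, coords)
    ]


# A nested list stands in for the array returned by umap_model.transform.
entries = to_input_entries(["[user's input prompt]"], [[-7.7824893, 0.9689124]])
payload = json.dumps({"input": entries})
```

Casting with `float()` keeps the coordinates numeric in the JSON, matching the structure proposed in the issue description.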

Next step is to connect these encoders with the recommend endpoint.

santanavagner · 2025-03-13T14:15:33Z

Current behavior of the /recommend endpoint:

{
  "add": [],
  "input": [
    {
      "sentence": "Act as a professional designer with 20 years of experience creating and testing UX interfaces and landing sites for a variety of IT applications.",
      "x": "2.6360278",
      "y": "4.685454"
    },
    {
      "sentence": "We are in need of more people and an increased budget to be able to keep up with clients' needs.",
      "x": "2.089374",
      "y": "5.6672626"
    },
    {
      "sentence": "What kind of evidence should I gather to support my demands to gain more resources?",
      "x": "-2.5099223",
      "y": "-0.7990282"
    }
  ],
  "remove": [
    {
      "closest_harmful_sentence": "What kind of evidence should I gather to support my demands?",
      "sentence": "What kind of evidence should I gather to support my demands to gain more resources?",
      "sentence_index": 2,
      "similarity": 0.8197475277237972,
      "value": "misinformation and deception",
      "x": "-4.8133445",
      "y": "-4.2665153"
    }
  ]
}
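
Note that in the response above the coordinates arrive as JSON strings, whereas the structure proposed in the description used numbers. A client that wants to plot them would need to convert them first. A minimal sketch, assuming the three top-level keys shown above; `collect_points` is a hypothetical helper name, and the sample is trimmed from the response above.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def collect_points(response: dict) -> Dict[str, List[Point]]:
    """Return (x, y) float pairs for each section of a /recommend response."""
    points: Dict[str, List[Point]] = {}
    for section in ("input", "add", "remove"):
        points[section] = [
            (float(item["x"]), float(item["y"]))  # accepts strings or numbers
            for item in response.get(section, [])
        ]
    return points


sample_response = {
    "add": [],
    "input": [
        {
            "sentence": "What kind of evidence should I gather to support my demands to gain more resources?",
            "x": "-2.5099223",
            "y": "-0.7990282",
        }
    ],
    "remove": [
        {
            "closest_harmful_sentence": "What kind of evidence should I gather to support my demands?",
            "sentence_index": 0,
            "x": "-4.8133445",
            "y": "-4.2665153",
        }
    ],
}

points = collect_points(sample_response)
```

Because `float()` accepts both strings and numbers, the same helper would keep working if the endpoint later switched to numeric coordinates.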

santanavagner added the enhancement label Jan 14, 2025

santanavagner self-assigned this Feb 11, 2025

santanavagner added a commit that referenced this issue Mar 5, 2025

Adding XY coords to populated JSON files (#21)

7359a6e

santanavagner added a commit that referenced this issue Mar 5, 2025

Changing slate embeddings to granite (#21)

f202123

santanavagner added a commit that referenced this issue Mar 11, 2025

Populating XY and saving UMAP models (#21)

3816f43

santanavagner added a commit that referenced this issue Mar 11, 2025

Retrieving XY coords for recomms. (#21)

374e647

santanavagner added a commit that referenced this issue Mar 12, 2025

Updating UMAP models to parametric UMAP (#21)

672ac9d

santanavagner closed this as completed in 22e38b0 Mar 12, 2025
