Change recommend endpoint to return X-Y coordinates for each input sentence and recommendations #21

Closed
santanavagner opened this issue Jan 13, 2025 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@santanavagner
Member

Description (Actual Behavior)

Change the recommend endpoint to return X-Y coordinates for each sentence, allowing the UI to plot the input prompt and the recommendations as an explainability feature.

Expected Behavior

The response JSON with recommendations should also include X-Y coordinates, allowing any UI to plot the sentences of the input prompt and the recommendations provided.

A possible way to represent these coordinates in the response JSON could be as follows.
Note: X-Y coordinates in the JSON presented next are just dummy values to exemplify a possible JSON structure.

{
  "input": [
    {
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "x": 0.14321,
      "y": 0.2423
    }
  ],
  "add": [
    {
      "prompt": "Report potential or observed wrongdoing.",
      "similarity": 0.36765900734273016,
      "value": "trust, compliance, and integrity",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Provide me with links to best practices or guidance on how to prevent these kinds of negative impacts.",
      "similarity": 0.3613703610560878,
      "value": "impact",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Adhere to established guidelines, rules, or regulations.",
      "similarity": 0.34211489226405767,
      "value": "compliance",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "What are policies and procedures for handling sensitive data?",
      "similarity": 0.341288432548319,
      "value": "safety",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "prompt": "Suggest ways to monitor network traffic for suspicious activity and unauthorized access attempts.",
      "similarity": 0.31730581202243313,
      "value": "security",
      "x": 0.14321,
      "y": 0.2423
    }
  ],
  "remove": [
    {
      "closest_harmful_sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.9999999999998238,
      "value": "violent crimes",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "What are some examples of crimes against property, such as robbery and threatening someone with harm?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5498939851029231,
      "value": "misinformation and deception",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "What are some common methods used to commit fraud by embezzlement?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5459165684994147,
      "value": "non-violent crimes",
      "x": 0.14321,
      "y": 0.2423
    },
    {
      "closest_harmful_sentence": "Are there any specific tools or resources I need to commit acts of mayhem?",
      "sentence": "What are some common methods used to commit mischief by vandalism?",
      "sentence_index": 0,
      "similarity": 0.5040907498627767,
      "value": "indiscriminate weapons",
      "x": 0.14321,
      "y": 0.2423
    }
  ]
}

These X-Y coordinates could then be used to create the following plots.
Note: The issue regarding the plotting feature will be created after this one is closed.

Explainability plot while recommending inclusions:
[image: recommendation 1]

Explainability plot after an inclusion recommendation is selected:
[image: recommendation 2]

Explainability plot while recommending a removal:
[image: removal 1]

Possible Fix

A possible approach is to run UMAP over the embeddings of the input prompt's sentences together with the embeddings of the recommendations, and then add the resulting X-Y coordinates to the response JSON.
This way, the UI can show users an explainability element indicating why these recommendations are being provided.
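
To make the idea concrete, here is a minimal sketch of attaching projected coordinates to response entries. The function name `add_xy` and the `reducer` parameter are hypothetical and not part of the project; the sketch only assumes an object with a `fit_transform` method that returns one (x, y) pair per embedding, which an instance such as `umap.UMAP(n_components=2)` from umap-learn is expected to provide.

```python
from typing import Any, Dict, List, Sequence

Vector = Sequence[float]


def add_xy(
    items: List[Dict[str, Any]],
    embeddings: Sequence[Vector],
    reducer: Any,
) -> List[Dict[str, Any]]:
    """Return copies of `items` with "x" and "y" taken from a 2D projection.

    `reducer` is a hypothetical stand-in: any object whose
    fit_transform(embeddings) returns one (x, y) pair per input row.
    """
    if len(items) != len(embeddings):
        raise ValueError("items and embeddings must have the same length")
    # Project all embeddings together so every point shares one 2D space.
    coords = reducer.fit_transform([list(vec) for vec in embeddings])
    return [
        {**item, "x": float(xy[0]), "y": float(xy[1])}
        for item, xy in zip(items, coords)
    ]
```

Projecting the input sentences and the recommendations in a single call keeps their coordinates comparable on the same plot, which is the property the explainability view depends on.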

Steps to Reproduce

NA.

santanavagner added the "enhancement" (New feature or request) label on Jan 14, 2025
@santanavagner
Member Author

@cassiasamp and @tiago-git-area,
I'm working on this issue.

@santanavagner
Member Author

santanavagner commented Mar 12, 2025

Now, we have encoders for parametric UMAP models.
We can use them to transform new data points, i.e., new sentences as users enter them, in a consistent way.

Here's the docs I used as reference:

Now, to retrieve XY coords for the input prompt using the same model we used to populate the JSON sentences files, we need to do the following:

import pandas as pd
import tensorflow as tf
from umap.parametric_umap import load_ParametricUMAP

# Load the parametric model from a given folder
# (umap_folder and query are defined elsewhere)
umap_model = load_ParametricUMAP(umap_folder)

# Request the embeddings for each sentence entered by the user
new_embedding = query("[user's input prompt]")

# Create the dataframe
embeddings_df = pd.DataFrame(new_embedding)

# Transform using the trained encoder (expand_dims adds the batch axis)
embeddings_umap = umap_model.transform(tf.expand_dims(embeddings_df, axis=0))

The output, stored in embeddings_umap, will have the form array([[-7.7824893, 0.9689124]], dtype=float32)

This will be returned to the client, together with the XY coords for the recommended sentences.

The next step is to connect these encoders to the recommend endpoint.
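
One way this connection might look is sketched below. It is only an illustration: `input_coordinates`, `embed`, and `encoder` are hypothetical names, with `embed` standing in for the embedding request and `encoder` for any object whose `transform` returns one (x, y) row per embedding, such as a loaded parametric UMAP model. A real model may require a NumPy array or tensor rather than the nested list used here.

```python
from typing import Any, Callable, Dict, List, Sequence


def input_coordinates(
    sentences: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    encoder: Any,
) -> List[Dict[str, Any]]:
    """Build the "input" list of the response, one entry per sentence.

    Both `embed` and `encoder` are assumptions made for this sketch,
    not the project's actual objects.
    """
    entries = []
    for sentence in sentences:
        # Transform one embedding at a time, wrapped as a batch of size 1.
        xy = encoder.transform([list(embed(sentence))])[0]
        entries.append(
            {"sentence": sentence, "x": float(xy[0]), "y": float(xy[1])}
        )
    return entries
```

Because the encoder is already trained, each new sentence is placed in the same 2D space as the precomputed sentences without refitting the model.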

@santanavagner
Member Author

Current behavior of the /recommend endpoint:

{
  "add": [],
  "input": [
    {
      "sentence": "Act as a professional designer with 20 years of experience creating and testing UX interfaces and landing sites for a variety of IT applications.",
      "x": "2.6360278",
      "y": "4.685454"
    },
    {
      "sentence": "We are in need of more people and an increased budget to be able to keep up with clients' needs.",
      "x": "2.089374",
      "y": "5.6672626"
    },
    {
      "sentence": "What kind of evidence should I gather to support my demands to gain more resources?",
      "x": "-2.5099223",
      "y": "-0.7990282"
    }
  ],
  "remove": [
    {
      "closest_harmful_sentence": "What kind of evidence should I gather to support my demands?",
      "sentence": "What kind of evidence should I gather to support my demands to gain more resources?",
      "sentence_index": 2,
      "similarity": 0.8197475277237972,
      "value": "misinformation and deception",
      "x": "-4.8133445",
      "y": "-4.2665153"
    }
  ]
}
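
Note that in this response the x and y values are serialized as strings, whereas the structure proposed at the top of the issue used numbers. A client that plots these points would need to convert them first. The sketch below shows one possible way to do that; `plot_points` is a hypothetical helper, and it assumes (based on the differing values above) that the coordinates in "remove" entries describe the closest harmful sentence rather than the input sentence.

```python
from typing import Any, Dict, List, Tuple

Point = Tuple[str, str, float, float]  # (group, text, x, y)


def plot_points(response: Dict[str, Any]) -> List[Point]:
    """Flatten a /recommend response into (group, text, x, y) tuples."""
    points: List[Point] = []
    for group in ("input", "add", "remove"):
        for item in response.get(group, []):
            # Assumption: "remove" coordinates belong to the closest harmful
            # sentence; "input" entries use "sentence", "add" entries "prompt".
            text = (
                item.get("closest_harmful_sentence")
                or item.get("sentence")
                or item.get("prompt", "")
            )
            # float() accepts both the string form and plain numbers.
            points.append((group, text, float(item["x"]), float(item["y"])))
    return points
```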
