Synthetic Data + Model #12

Open
jack89roberts opened this issue May 15, 2025 · 0 comments
jack89roberts commented May 15, 2025

Implement experiments with a synthetic dataset + model, as a quick baseline that makes it easy to vary a lot of parameters (e.g. class imbalance and per-class error rates).

E.g. something like

Data

from random import random

n = 1000
imbalance = 0.1
labels = [1 if random() < imbalance else 0 for _ in range(n)]

Model

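pos_err_rate = 0.1  # probability of a wrong prediction when the true label is 1
neg_err_rate = 0.2  # probability of a wrong prediction when the true label is 0

preds = []
for label in labels:
    err_rate = pos_err_rate if label == 1 else neg_err_rate
    # keep the true label with probability 1 - err_rate, otherwise flip it
    preds.append(label if random() > err_rate else 1 - label)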
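
To vary these parameters across many runs, the data and model snippets above could be wrapped into one seeded function. This is only a rough sketch: `simulate` and `recall_and_specificity` are hypothetical names, not existing code in this project, and the seeding approach is an assumption.

```python
import random


def simulate(n=1000, imbalance=0.1, pos_err_rate=0.1, neg_err_rate=0.2, seed=None):
    """Return (labels, preds) for a synthetic binary task (hypothetical helper)."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    labels = [1 if rng.random() < imbalance else 0 for _ in range(n)]
    preds = []
    for label in labels:
        err_rate = pos_err_rate if label == 1 else neg_err_rate
        # flip the true label with probability err_rate
        preds.append(1 - label if rng.random() < err_rate else label)
    return labels, preds


def recall_and_specificity(labels, preds):
    """Fraction of positives and of negatives that were predicted correctly."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    recall = tp / n_pos if n_pos else float("nan")
    specificity = tn / n_neg if n_neg else float("nan")
    return recall, specificity


if __name__ == "__main__":
    # sweep one parameter while holding the others fixed
    for imbalance in (0.01, 0.1, 0.5):
        labels, preds = simulate(n=10_000, imbalance=imbalance, seed=0)
        print(imbalance, recall_and_specificity(labels, preds))
```

With enough samples, recall should approach `1 - pos_err_rate` and specificity `1 - neg_err_rate`, which gives a simple sanity check on the simulation.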

Need to figure out:

  • Predicting model scores, not only labels.
  • Systematic bias in errors, e.g. the model is correct only when a certain feature is present/absent.
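
One possible starting point for both open questions, sketched under assumptions that are not part of the original proposal: scores are drawn from a Beta distribution skewed toward the true label, and bias is modelled with a single binary feature that switches the error rate. The names `simulate_scores` and `simulate_biased_preds` and all default values are hypothetical.

```python
import random


def simulate_scores(labels, sharpness=4.0, seed=None):
    """Draw a score in [0, 1] per example, skewed toward the true label.

    Assumption: Beta(sharpness, 1) scores for positives, mirrored for
    negatives. A larger sharpness means a more confident, more accurate model.
    """
    rng = random.Random(seed)
    scores = []
    for label in labels:
        s = rng.betavariate(sharpness, 1.0)  # mass concentrated near 1
        scores.append(s if label == 1 else 1.0 - s)
    return scores


def simulate_biased_preds(labels, feature_rate=0.3, err_with_feature=0.0,
                          err_without_feature=0.5, seed=None):
    """Predictions whose error rate depends on a synthetic binary feature."""
    rng = random.Random(seed)
    features, preds = [], []
    for label in labels:
        has_feature = rng.random() < feature_rate
        err_rate = err_with_feature if has_feature else err_without_feature
        features.append(has_feature)
        # flip the true label with the feature-dependent error rate
        preds.append(1 - label if rng.random() < err_rate else label)
    return features, preds


if __name__ == "__main__":
    labels = [1 if random.random() < 0.1 else 0 for _ in range(1000)]
    scores = simulate_scores(labels, seed=0)
    thresholded = [1 if s >= 0.5 else 0 for s in scores]  # scores -> labels
    features, preds = simulate_biased_preds(labels, seed=0)
```

Thresholding the scores recovers hard predictions, so the label-only experiments can still be run on top of the score-based model.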