Change references to "Synthetic Data" #1036

Closed
wants to merge 13 commits into from

ParadaCarleton

The term "synthetic data" is ambiguous here, because in statistics, it usually refers to "fake data" created by fitting a generative model to a dataset, then drawing random data with the correct statistical properties from it. (Usually to preserve the privacy of participants while still providing a dataset for replication.)

ablaom and others added 13 commits October 27, 2022 10:18
Document changes and sundries.  No new release.
Fix TransformedTarget example in manual (no new release)
Some small documentations improvements. Not to trigger a new release.
Add new auto-generated Model Browser section to the manual. Not to trigger new release.
Update to the manual. No new release.
The term "synthetic data" is ambiguous here, because in statistics, it usually refers to "fake data" created by fitting a generative model to a dataset, then drawing random data with the correct statistical properties from it. (Usually to preserve the privacy of participants while still providing a dataset for replication.)
@ablaom
Member

ablaom commented Aug 24, 2023

Thanks @ParadaCarleton for this.

Sure, within some statistical communities there might be some confusion. But looking around (see e.g. https://en.wikipedia.org/wiki/Synthetic_data), it seems a more general conception of "synthetic data" is pretty common.

How about we just add a clarifying sentence at the top of the "Generating Synthetic Data" section; something like

Here *synthetic data* means artificially generated data, with no reference to a "real world" data set. Not to be confused with "fake data" obtained by resampling from a distribution fit to some actual real data.

?

I think "example data" is too broad a term. This could be anything, real or imagined.
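
To make the two senses discussed above concrete, here is a minimal sketch in Python using only the standard library. It is not taken from the MLJ manual: the function names, the normal-distribution model, and the "real" measurements are all hypothetical, chosen only to illustrate the distinction.

```python
import random
import statistics


def generate_from_scratch(n, seed=0):
    """Sense 1: artificially generated data, with no reference to a real data set.

    The distribution (standard normal) is simply chosen by the author.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def resample_from_fitted_model(real_data, n, seed=0):
    """Sense 2: "fake data" drawn from a generative model fit to real data.

    Assumption for illustration: the generative model is a single normal
    distribution whose mean and standard deviation are estimated from real_data.
    """
    mu = statistics.mean(real_data)
    sigma = statistics.stdev(real_data)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" measurements, invented for this example only.
real_heights_cm = [162.0, 175.5, 168.2, 181.0, 170.3, 158.9, 177.4]

scratch = generate_from_scratch(5)
fake = resample_from_fitted_model(real_heights_cm, 5)
```

In the first function nothing is estimated from observations, while in the second the output inherits the statistical properties of `real_heights_cm`, which is the usage the PR description has in mind.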

@ParadaCarleton
Author

Mostly I suggested this because when I tried to look up "Synthetic Data Generation Julia" or "Synthetic Population Julia" I kept getting this as a result 😅

@ablaom
Member

ablaom commented Sep 11, 2023

@ParadaCarleton Are you not happy with the smaller clarification I suggested? Your latest commit does not reflect it.

@codecov-commenter

Codecov Report

Merging #1036 (75c5346) into dev (dd852d4) will not change coverage.
Report is 28 commits behind head on dev.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev    #1036   +/-   ##
=======================================
  Coverage   60.97%   60.97%           
=======================================
  Files           2        2           
  Lines          41       41           
=======================================
  Hits           25       25           
  Misses         16       16           

see 1 file with indirect coverage changes

ablaom added a commit that referenced this pull request Sep 22, 2023
@ablaom
Member

ablaom commented Sep 22, 2023

Closed as rendered redundant by 2178c10

@ablaom ablaom closed this Sep 22, 2023