Integrate the SimilarWeb top 50+ Adult sites into the filter #4

Open
pdehaan opened this issue Sep 15, 2016 · 3 comments

pdehaan commented Sep 15, 2016

Ref #1

https://www.similarweb.com/top-websites/category/adult would also be worth a look. I don't have an API key [yet], but it'd be interesting to see how the top 50 list on that page compares to the Alexa results.

pdehaan commented Sep 15, 2016

I did a manual scan and search and found at least 1-2 sites in the SimilarWeb top 10 that weren't in the nofap.txt list.

After a very brief comparison, it looks like the SimilarWeb top 50 list is different from the Alexa list (that's not to say the two sites don't return largely the same results, just in a different order).
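
For something more systematic than an eyeball comparison, a quick sketch like the one below could quantify the overlap. It assumes the Alexa and SimilarWeb results are each saved as one-hostname-per-line files (alexa.txt and similarweb.txt here are placeholder names):

const readFile = require('fs').readFileSync;

// Load one hostname per line, dropping any blank lines.
function load(file) {
  return readFile(file, 'utf-8').trim().split('\n').filter(Boolean);
}

const alexa = load('alexa.txt');
const similarweb = load('similarweb.txt');

// Hosts both services report, and hosts only SimilarWeb reports.
const overlap = similarweb.filter((host) => alexa.indexOf(host) !== -1);
const onlySimilarweb = similarweb.filter((host) => alexa.indexOf(host) === -1);

console.log(`overlap: ${overlap.length} of ${similarweb.length}`);
console.log(JSON.stringify(onlySimilarweb.sort(), null, 2));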

pdehaan commented Sep 15, 2016

Scraper:

const url = require('url');

const fetch = require('node-fetch');
const cheerio = require('cheerio');

const CATEGORY_BASE_URL = 'https://www.similarweb.com/top-websites/category';

function getCategory(category = 'news-and-media') {
  const CATEGORY_URL = `${CATEGORY_BASE_URL}/${category}`;
  return fetch(CATEGORY_URL)
    .then((res) => res.text())
    .then((html) => cheerio.load(html))
    .then(($) => {
      // Grab the outbound link from each row of the top-websites grid.
      const urls = $('td.topWebsitesGrid-cellWebsite a.linkout[itemprop="url"]')
        .map((idx, el) => $(el).attr('href'))
        .get();
      return urls;
    })
    // Reduce each full URL to just its hostname.
    .then((urls) => urls.map((uri) => url.parse(uri).hostname))
    // .then((urls) => urls.sort())
}

getCategory('Adult')
  .then((urls) => console.log(urls))
  .catch((err) => console.error(err));

Output:

[
  "xvideos.com",
  "pornhub.com",
  "xhamster.com",
  "xnxx.com",
  "redtube.com",
  "chaturbate.com",
  "youporn.com",
  "e-hentai.org",
  "dmm.co.jp",
  "bongacams.com",
  "spankbang.com",
  "livejasmin.com",
  "beeg.com",
  "imagefap.com",
  "tube8.com",
  "4chan.org",
  "backpage.com",
  "youjizz.com",
  "nhentai.net",
  "txxx.com",
  "motherless.com",
  "vporn.com",
  "reallifecam.com",
  "porn.com",
  "tnaflix.com",
  "abril.com.br",
  "cam4.com",
  "myfreecams.com",
  "ab4hr.com",
  "nudevista.com",
  "gotporn.com",
  "wellhello.com",
  "drtuber.com",
  "hclips.com",
  "dropbooks.tv",
  "adultfriendfinder.com",
  "pornmd.com",
  "perfectgirls.net",
  "xtube.com",
  "sunporno.com",
  "fetlife.com",
  "serviporno.com",
  "sankakucomplex.com",
  "brazzers.com",
  "upornia.com",
  "eroprofile.com",
  "porntube.com",
  "imagetwist.com",
  "iwank.tv",
  "planetromeo.com"
]

UPDATE: SimilarWeb Top 50 results not in nofap.txt:

[
  "4chan.org",
  "ab4hr.com",
  "abril.com.br",
  "backpage.com",
  "beeg.com",
  "bongacams.com",
  "dmm.co.jp",
  "dropbooks.tv",
  "drtuber.com",
  "e-hentai.org",
  "eroprofile.com",
  "fetlife.com",
  "gotporn.com",
  "hclips.com",
  "imagefap.com",
  "imagetwist.com",
  "iwank.tv",
  "livejasmin.com",
  "motherless.com",
  "myfreecams.com",
  "nhentai.net",
  "nudevista.com",
  "planetromeo.com",
  "pornmd.com",
  "porntube.com",
  "reallifecam.com",
  "sankakucomplex.com",
  "serviporno.com",
  "txxx.com",
  "upornia.com",
  "wellhello.com",
  "xtube.com"
]

Comparison script:

const readFile = require('fs').readFileSync;
const Adult = require('./Adult.json');

// nofap.txt holds one hostname per line.
const nofap = readFile('nofap.txt', 'utf-8').trim().split('\n');

// Keep only the SimilarWeb hosts that aren't already in nofap.txt.
const cacheMiss = Adult.filter((url) => nofap.indexOf(url) === -1);

console.log(JSON.stringify(cacheMiss.sort(), null, 2));
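
Note: ./Adult.json above is assumed to be the saved output of the scraper; appended to that script (which defines getCategory()), a rough sketch to write the file could be:

const fs = require('fs');

// Hypothetical: reuses getCategory() from the scraper above and writes its
// result to the Adult.json file the comparison script expects.
getCategory('Adult')
  .then((urls) => fs.writeFileSync('Adult.json', JSON.stringify(urls, null, 2)))
  .catch((err) => console.error(err));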

pdehaan commented Sep 15, 2016

I can do this, but I'm not sure of the best approach.
Should we have one master "nofap.txt" list, or create separate lists which get merged into a single list somehow?

For example, should we have "alexa.txt", "similarweb.txt", and "nofap.txt", so we know where the data came from? That would mean, though, that we'd potentially end up with the same entries in all 3 lists.
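
If we do go with separate per-source lists, merging them into a single deduplicated list at build time seems straightforward; a rough sketch (assuming each of the three files above holds one hostname per line, and merged.txt as a placeholder output name):

const fs = require('fs');

const SOURCES = ['alexa.txt', 'similarweb.txt', 'nofap.txt'];

// Concatenate every source list, drop blanks and duplicates, and sort.
const merged = SOURCES
  .map((file) => fs.readFileSync(file, 'utf-8').trim().split('\n'))
  .reduce((all, list) => all.concat(list), [])
  .filter((host, idx, arr) => host && arr.indexOf(host) === idx)
  .sort();

fs.writeFileSync('merged.txt', merged.join('\n') + '\n');
console.log(`merged ${merged.length} unique hosts`);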
