messede-degod/sstable-migrator

sstable-migrator

Building

  • install and use Java 8; check the active version with java -version
  • compile - mvn compile
  • run - MAVEN_OPTS="-Xmx7114M" mvn exec:java -DargLine="-Xms6144m -Xmx7144m" converts the files in input/ to SSTables in output/

Setup Cassandra

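  • Create Network (once, if it does not exist yet; the container below joins it) - sudo docker network create cassandra

  • Start Container - sudo docker run -v ./output/:/ferret/dnsdata -d --name cassandra --hostname cassandra --network cassandra cassandra (allow up to a minute for startup)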

  • Start a cqlsh shell - sudo docker exec -it cassandra cqlsh

  • Create Keyspace

    CREATE KEYSPACE ferret WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
    
  • Create RDNS Table

    CREATE TABLE ferret.rdnsv4 (
        ip8 INET,
        ip16 INET,
        ip24 INET,
        ipAddress INET,
        p1 VARCHAR,
        p2 VARCHAR,
        p3 VARCHAR,
        p4 VARCHAR,
        p5 VARCHAR,
        p6 VARCHAR,
        p7 VARCHAR,
        country VARCHAR,
        city VARCHAR,
        asn INT,
        as_name VARCHAR,
        source VARCHAR,
        sourceRecordType VARCHAR,
        firstSeen timestamp,
        lastSeen timestamp,
        updatedAt timestamp,
        PRIMARY KEY (ip8, ip16, ip24, ipAddress, p1, p2, p3, p4, p5, p6, p7)
    );
    
  • Create SubDomains Table

    CREATE TABLE ferret.subdomains (
        p1 VARCHAR, 
        p2 VARCHAR, 
        p3 VARCHAR, 
        p4 VARCHAR, 
        p5 VARCHAR, 
        p6 VARCHAR, 
        p7 VARCHAR, 
        source VARCHAR,
        sourceRecordType VARCHAR,
        firstSeen timestamp,
        lastSeen timestamp,
        updatedAt timestamp,
        PRIMARY KEY ((p1, p2, p3), p4, p5, p6, p7)
    );
    
  • Create CNAME Table

    CREATE TABLE ferret.cnames (
        target VARCHAR, 
        apexDomain VARCHAR, 
        domain VARCHAR,
        source VARCHAR,
        firstSeen timestamp,
        lastSeen timestamp,
        updatedAt timestamp,
        PRIMARY KEY (target, apexDomain, domain)
    );
    
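  • Move Data - sudo docker container exec -it cassandra sstableloader -d 172.18.0.2 /ferret/dnsdata/ (-d takes the address of a Cassandra node, here the container's IP on the Docker network; adjust it if yours differs)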
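
The rdnsv4 table keys each row on ip8, ip16 and ip24 ahead of the full ipAddress. The README does not say how those columns are filled, so the following is only a minimal sketch under the assumption that they hold the address with everything past the first 8, 16 or 24 bits zeroed. The class IpPrefixes and its methods are hypothetical names, not part of this repository.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class IpPrefixes {
    /** Parses an IPv4 literal such as "198.51.100.77" (a literal needs no DNS lookup). */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    /** Returns addr with every bit after the first prefixBits bits set to zero. */
    static InetAddress prefix(InetAddress addr, int prefixBits) {
        byte[] raw = addr.getAddress();
        if (raw.length != 4) {
            throw new IllegalArgumentException("IPv4 only");
        }
        if (prefixBits < 0 || prefixBits > 32) {
            throw new IllegalArgumentException("prefixBits must be in 0..32");
        }
        // Pack the four octets into one int so a single mask can be applied.
        int value = ((raw[0] & 0xFF) << 24) | ((raw[1] & 0xFF) << 16)
                  | ((raw[2] & 0xFF) << 8) | (raw[3] & 0xFF);
        // Java shifts ints modulo 32, so a zero-length prefix is handled separately.
        int mask = prefixBits == 0 ? 0 : -1 << (32 - prefixBits);
        int masked = value & mask;
        byte[] out = {
            (byte) (masked >>> 24), (byte) (masked >>> 16),
            (byte) (masked >>> 8), (byte) masked
        };
        try {
            return InetAddress.getByAddress(out);
        } catch (UnknownHostException e) {
            throw new AssertionError("unreachable: array length is always 4", e);
        }
    }

    public static void main(String[] args) {
        InetAddress ip = parse("198.51.100.77");
        System.out.println(prefix(ip, 8).getHostAddress());   // 198.0.0.0   (assumed ip8)
        System.out.println(prefix(ip, 16).getHostAddress());  // 198.51.0.0  (assumed ip16)
        System.out.println(prefix(ip, 24).getHostAddress());  // 198.51.100.0 (assumed ip24)
    }
}
```

If the assumption holds, a lookup for a whole /8, /16 or /24 range can be expressed as an equality match on the leading key columns instead of a range scan over ipAddress.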

Possible Improvements

  • use Java FileChannel to read files for possible performance gains (tried; no improvement observed)
  • use the fastjson parser
  • use multithreaded writes to CQLSSTableWriter (https://issues.apache.org/jira/browse/CASSANDRA-7463) (tried; a bad idea. Write performance is far better when keys arrive in order. Writes with out-of-order keys use a lot of CPU and yield no improvement in conversion time)
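
The last point suggests ordering rows by key before they reach the writer. As a rough illustration only, the sketch below sorts a batch of key tuples (for example p1 to p7 of the subdomains table) column by column using plain Java string order. KeyOrder, BY_KEY and sortedByKey are hypothetical names, the sample values are invented, and whether plain string order matches the ordering that benefits CQLSSTableWriter is an assumption: Cassandra places partitions by token, not by raw key text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class KeyOrder {
    /** Compares two key tuples column by column; on a shared prefix the shorter tuple sorts first. */
    static final Comparator<String[]> BY_KEY = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    /** Returns a sorted copy so the caller's batch is left untouched. */
    static List<String[]> sortedByKey(List<String[]> rows) {
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort(BY_KEY);
        return copy;
    }

    public static void main(String[] args) {
        // Invented sample keys; real rows would also carry the non-key columns.
        List<String[]> rows = Arrays.asList(
            new String[] {"org", "thc", "ip", "www"},
            new String[] {"com", "example", "www", "a"},
            new String[] {"com", "example", "api", "b"});
        for (String[] row : sortedByKey(rows)) {
            // A single-threaded writer would consume the rows in this order.
            System.out.println(String.join(".", row));
        }
        // prints:
        // com.example.api.b
        // com.example.www.a
        // org.thc.ip.www
    }
}
```

Sorting a batch in memory keeps the write path single-threaded, which is in line with the observation above that parallel, out-of-order writes cost CPU without shortening the conversion.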

TLD Source

About

Generates SSTables from CSV. The data conversion workhorse behind https://ip.thc.org
