gsd

Overview

A python package of utilities to help Get Sh*t Done

Installation

Include in requirements.txt as:

git+https://github.com/someuser/mypackage.git
#or
git+https://github.com/someuser/mypackage.git@<tag>

then:

pip install -r requirements.txt

or pip install from github via:

pip install git+https://github.com/essans/gsd.git

#or

pip install git+https://github.com/essans/gsd.git@<tag>

#or

pip install git+https://github.com/essans/gsd.git@<commit-hash>

or git clone https://github.com/essans/gsd.git to local and install via:

git tag # to see available tagged versions
git checkout v0.1.0 #for example

pip install .
#or
pip install git+file:///Users/<user_name>/code_dir/gsd@<tag>

Usage

from gsd import *

or

from gsd import ProjectConfigs, Timer, gsdIO, gsdUtil

Structure


Features



ProjectConfigs

class ProjectConfigs:

Class in the project.py module with methods for working with project configurations: determining the root directory, obtaining project configs/settings, and reading from and writing to yaml.

Instantiate with:

project=ProjectConfigs()

Available methods:

project.root_dir() 
       .configs_from_yaml(dir='configs', filename='settings.yaml') 
       .print_global_configs() 
       .yaml_to_dict(filename) 
       .dict_to_yaml(data_dict, filename, append=False) 

eg:

project=ProjectConfigs()

root_dir = project.root_dir()
configs = project.configs_from_yaml() #assuming file is in default location based on cookiecutter template

project.print_global_configs() #shows defaults and shows dir content for any config label ending in '_dir'
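
How root_dir() locates the project root is not described here. One common approach, sketched below as an assumption rather than as the package's actual logic, is to walk upward from a starting directory until a marker file or folder is found. The function name find_root_dir and the markers argument are hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_root_dir(start: Path, markers: Sequence[str] = (".git", "configs")) -> Optional[Path]:
    """Walk upward from `start` and return the first directory holding a marker."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # A directory counts as the root if any marker exists directly inside it.
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return None  # no marker found anywhere up to the filesystem root
```

Calling find_root_dir(Path.cwd()) from a notebook or script nested inside the project would return the nearest enclosing directory that contains one of the markers.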

eg

#settings.yaml

project_name: "project_name"

input_data:
  raw_dir: "data/raw/"
  processed_dir: "data/processed/"
  
outputs:
  outputs_dir: "outputs/"
  logs_dir: "outputs/logs/"

credentials:
  aws_credentials: "/USERS/<user_name>/.aws/credentials"

The input data path can then be set programmatically from the project config info:

data_dir = root_dir / configs['input_data']['raw_dir']
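
As a rough illustration of how the nested settings map onto paths, the sketch below uses a plain dictionary as a stand-in for what configs_from_yaml() might return (an assumption about its shape) and collects every entry whose label ends in '_dir', similar in spirit to what print_global_configs() is described as doing. The helper collect_dirs and the root path are hypothetical.

```python
from pathlib import Path

# Stand-in for the dictionary that configs_from_yaml() might return for the
# example settings.yaml (assumption: nested dicts keyed by section).
configs = {
    "project_name": "project_name",
    "input_data": {"raw_dir": "data/raw/", "processed_dir": "data/processed/"},
    "outputs": {"outputs_dir": "outputs/", "logs_dir": "outputs/logs/"},
}

root_dir = Path("/tmp/example_project")  # hypothetical project root


def collect_dirs(section, root):
    """Recursively gather every value whose key ends in '_dir' as a full path."""
    found = {}
    for key, value in section.items():
        if isinstance(value, dict):
            found.update(collect_dirs(value, root))
        elif key.endswith("_dir"):
            found[key] = root / value
    return found


data_dir = root_dir / configs["input_data"]["raw_dir"]
all_dirs = collect_dirs(configs, root_dir)
```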

yaml_to_dict(full_path)

eg. pulling specific information out of project and user configs, based on the example yaml above:

project=ProjectConfigs()
configs = project.configs_from_yaml()

aws_credentials_dict = project.yaml_to_dict(configs['credentials']['aws_credentials'])

load_credentials(credentials_json, aws_profile='default')

Used to extract and load credentials

project=ProjectConfigs()
configs = project.configs_from_yaml()

credentials = project.load_credentials(configs['credentials'], aws_profile='default')
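
The internals of load_credentials are not documented here, and its credentials_json parameter suggests it may expect a different input format. For reference only, the sketch below shows how one profile could be read from text in the INI layout normally used by ~/.aws/credentials, using the standard library. The function read_aws_profile and the dummy values are hypothetical.

```python
import configparser

# Example content in the INI layout used by ~/.aws/credentials (dummy values).
EXAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET

[analytics]
aws_access_key_id = OTHER_KEY_ID
aws_secret_access_key = OTHER_SECRET
"""


def read_aws_profile(text: str, aws_profile: str = "default") -> dict:
    """Return the key/value pairs of one profile from credentials-file text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if aws_profile not in parser:
        raise KeyError(f"profile {aws_profile!r} not found")
    return dict(parser[aws_profile])


creds = read_aws_profile(EXAMPLE_CREDENTIALS, aws_profile="default")
```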


gsdIO

Functions to handle files, directories, and input/output operations.

Available functions:
  • create_dir(..): creates a new directory
  • join_csv(..): joins multiple csv files as they are read into memory

create_dir(full_path)

gsdIO.create_dir('/USERS/<user_name>/dir')

# creates new dir if needed based on full_path.

join_csv(path, file_list=None, wildcard='', return_as='df')

#read and join all files in provided path.  Return a dataframe
gsdIO.join_csv('/USERS/<user_name>/dir', file_list=None, wildcard='', return_as='df')

#read and join all files in provided path containing "_prod.csv" string pattern. Return a dataframe
gsdIO.join_csv('/USERS/<user_name>/dir', file_list=None, wildcard='_prod.csv', return_as='df')

#read and join the provided list of files.  Return a dataframe.
gsdIO.join_csv('/USERS/<user_name>/dir', 
               file_list=['/USERS/<user_name>/dir/filename1.csv',
                          '/USERS/<user_name>/dir/filename2.csv'
                         ],
               wildcard=None, return_as='df')

gsdUtil

Various gsd functions:

  • gsdUtil.snake2camel(snake_str)
  • gsdUtil.col_to_str(df, colname, fill_na=False)
  • gsdUtil.col_cast(dataframe, colname, cast_as, fill_na=False)
  • gsdUtil.rename_col(df, current_colname, desired_colname)
  • gsdUtil.print_vars(str_pattern, match_criteria='contains', local_scope=None)
  • gsdUtil.pd_format(maxRows=50, maxCols=20, maxColWidth=50, displayWidth=250)
  • gsdUtil.merge2(df1, df2, left_on, right_on, how, cast_keys_as, fill_NA_keys=False)

gsdUtil.snake2camel(snake_str)

gsdUtil.snake2camel("my_string")

# outputs "myString"
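
For readers curious about the conversion itself, one straightforward way to do it is sketched below. This is an illustrative stand-alone function, not the package's source, and the name snake_to_camel is hypothetical.

```python
def snake_to_camel(snake_str: str) -> str:
    """Convert 'my_string' style names to 'myString' style."""
    first, *rest = snake_str.split("_")
    # Keep the first word as-is and capitalise each following word.
    return first + "".join(word.capitalize() for word in rest)
```

Note that str.capitalize() lowercases the remainder of each word, so an input such as "my_HTTP_call" would become "myHttpCall" under this sketch.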

col_to_str(df, colname, fill_na=False)

Force formatting (in place) of a dataframe column to be of type string

col_to_str(data_df, '<col_name>', fill_na=False) #set fill_na=True to handle any NAs first

col_cast(dataframe, [colnames], cast_as, fill_na=False)

Force format / recast columns (in place) to a specified format with fine-grained control of NAs

col_cast(data_df,'<col_name>', 'int32', fill_na=-1)

col_cast(data_df,['<col_name>','ticker'], 'str', fill_na='NA')

col_cast(data_df,['<col_name>'], 'boolean', fill_na='pd.NA')

col_cast(data_df,'<col_name>', 'float64', fill_na='pd.NA')

col_cast(data_df,'<col_name>', 'str', fill_na='NA')

rename_col(df, current_colname, desired_colname)

Convenience function for in-place renaming of a single df column


print_vars(str_pattern, match_criteria='contains', local_scope=None)

Helper function for printing variables from memory.

eg:

print_vars('_prod') #print all variable names and their values when variable name contains '_prod'

print_vars('_prod', 'endswith') # where variable name ends with '_prod'

The current default implementation (local_scope=None) looks up variables in the function's default scope; a specific local scope can be supplied with the local_scope input parameter.
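
The general technique behind a helper like this is filtering a namespace dictionary by variable name. The sketch below assumes that passing locals() is how a local scope would be supplied. It returns the matches instead of printing them, and the names find_vars and demo are hypothetical.

```python
def find_vars(str_pattern, match_criteria="contains", scope=None):
    """Return {name: value} for names in `scope` that match `str_pattern`."""
    # Assumption: fall back to module-level globals when no scope is given.
    scope = globals() if scope is None else scope
    matchers = {
        "contains": lambda name: str_pattern in name,
        "endswith": lambda name: name.endswith(str_pattern),
    }
    matches = matchers[match_criteria]
    return {name: value for name, value in scope.items() if matches(name)}


def demo():
    rate_prod, rate_prod_old, rate_dev = 0.9, 0.8, 0.5
    # locals() is a snapshot of this function's variables at the call site.
    return find_vars("_prod", "endswith", scope=locals())
```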



pd_format(maxRows=50, maxCols=20, maxColWidth=50, displayWidth=250)

Convenience function for adjusting common pandas dataframe display settings


merge2(df1, df2, left_on, right_on, how, cast_keys_as, fill_NA_keys=False)

Convenience function to minimize the steps needed to merge two dataframes whose key columns have different formats.

eg:

merge2(df1, df2, left_on='<col_name_1>', right_on='<col_name_2>',how='left', cast_keys_as='str', fill_NA_keys='N/A')

Timer

Class and methods for setting and tracking timers, eg. for lightweight logging / debugging

eg.

from gsd import Timer

timer=Timer()

timer.start(message='')

timer.elapsed(message='', periodicity='s') # 's' to display in seconds, 'm' for minutes

timer.end(periodicity='s')

timer.get_timestamp(format="YYYY-MM-DD_HHMMSS") #alternatively provide a valid Python strftime format (eg: "%Y-%m-%d_%H%M%S")
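
For context, the sketch below shows roughly how a stopwatch of this kind can be built on the standard library. It is an assumption-based illustration, not the Timer class itself. The names SimpleTimer and get_timestamp (as a free function) are hypothetical, and message handling is omitted.

```python
import time
from datetime import datetime


class SimpleTimer:
    """Minimal stopwatch: start it, then ask how much time has elapsed."""

    def __init__(self):
        self._started = None

    def start(self):
        # perf_counter is monotonic, so it is unaffected by system clock changes.
        self._started = time.perf_counter()

    def elapsed(self, periodicity="s"):
        if self._started is None:
            raise RuntimeError("call start() before elapsed()")
        seconds = time.perf_counter() - self._started
        return seconds / 60 if periodicity == "m" else seconds


def get_timestamp(fmt="%Y-%m-%d_%H%M%S"):
    """Format the current local time with a strftime pattern."""
    return datetime.now().strftime(fmt)
```

A timestamp in this format sorts chronologically as plain text, which is convenient for naming log or output files.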

Notes:

git tag -a v0.1.2.post2 -m "new version"
git push origin v0.1.2.post2
