Data Science
Databases
Data Loading
- GitHub - medialab/xan: The CSV magician - xan is a command line tool for processing CSV files directly from the shell. It is written in Rust to be as fast as possible and to use as little memory as possible, and it can easily handle very large CSV files (gigabytes).
Deduplication
- GitHub - Cynosureprime/rlite: A lightweight alternative to rling
- GitHub - Cynosureprime/rling: RLI Next Gen (Rling), a faster multi-threaded, feature rich alternative to rli found in hashcat utilities.
- hashcat_utils
Pipelines
Tools
- Microsoft VS Code IPE Documentation - combines the rich editor experience of Visual Studio Code with the interactive programming model of Jupyter Notebooks to make VS Code the tool of choice for data scientists
- ricklamers/gridstudio: Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
- The Visual Python Debugger for Jupyter Notebooks You’ve Always Wanted | by David Taieb | Center for Open Source Data and AI Technologies | Medium
Charting
- Data Gif Maker with example: Louise E. Sinks - TidyTuesday Week 28: Global Surface Temperature, h/t Sharon Machlis (@smach@masto.machlis.com) - Sharon's Mastodon