datafuselabs/datafuse: An elastic and scalable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy [https://github.com/datafuselabs/datafuse/] - 2021-08-16 01:32:14 - public:aguynamedryan cloud, data, rust - 3 | id:747701 -
apache/arrow-datafusion: Apache Arrow DataFusion and Ballista query engines [https://github.com/apache/arrow-datafusion] - 2021-08-16 01:31:11 - public:aguynamedryan data, etl, pipeline, rust - 4 | id:747700 -
q - Text as Data [https://harelba.github.io/q/] - 2021-06-14 15:33:23 - public:aguynamedryan cli, csv, data, sql, tools, try - 6 | id:684375 -
data-cleaning/validate: Professional data validation for the R environment [https://github.com/data-cleaning/validate] - 2021-05-23 03:44:53 - public:aguynamedryan data, package, R, try - 4 | id:684109 -
ropensci/skimr: A frictionless, pipeable approach to dealing with summary statistics [https://github.com/ropensci/skimr] - 2021-05-23 03:44:32 - public:aguynamedryan data, package, R - 3 | id:684108 -
choonghyunryu/dlookr: Tools for Data Diagnosis, Exploration, Transformation [https://github.com/choonghyunryu/dlookr] - 2021-05-23 03:43:59 - public:aguynamedryan data, package, R - 3 | id:684107 -
data-cleaning/dcmodify: Modify data records using separately defined modification rules [https://github.com/data-cleaning/dcmodify] - 2021-05-23 03:43:08 - public:aguynamedryan data, package, R - 3 | id:684106 -
data-cleaning/deductive: Methods for deductive data correction and imputation [https://github.com/data-cleaning/deductive] - 2021-05-23 03:42:42 - public:aguynamedryan data, package, R - 3 | id:684105 -
data-cleaning/errorlocate: Find and replace erroneous fields in data using validation rules [https://github.com/data-cleaning/errorlocate] - 2021-05-23 03:41:43 - public:aguynamedryan data, package, R - 3 | id:684104 -
Introducing Amazon S3 Object Lambda – Use Your Code to Process Data as It Is Being Retrieved from S3 | AWS News Blog [https://aws.amazon.com/blogs/aws/introducing-amazon-s3-object-lambda-use-your-code-to-process-data-as-it-is-being-retrieved-from-s3/] - 2021-04-16 17:07:32 - public:aguynamedryan aws, data, s3 - 3 | id:683416 -
Using S3 Object Lambdas to Generate and Transform on the fly | by Eoin Shanaghy | Mar, 2021 | Medium [https://eoins.medium.com/using-s3-object-lambdas-to-generate-and-transform-on-the-fly-874b0f27fb84] - 2021-04-01 02:42:07 - public:aguynamedryan data, serverless - 2 | id:678582 -
A Data Pipeline Is a Materialized View | Hacker News [https://news.ycombinator.com/item?id=26217911&utm_term=comment] - 2021-03-01 03:40:03 - public:aguynamedryan data, pipeline - 2 | id:574069 -
Estuary Flow (Preview) — Estuary Flow (Preview) documentation [https://estuary.readthedocs.io/en/latest/README.html] - 2021-03-01 03:39:33 - public:aguynamedryan data, pipeline, python - 3 | id:574068 -
Building Rich Terminal Dashboards | Hacker News [https://news.ycombinator.com/item?id=26149488&utm_term=comment] - 2021-03-01 03:30:59 - public:aguynamedryan cli, data - 2 | id:574062 -
Show HN: I wrote a book about using data science to solve “everyday” problems | Hacker News [https://news.ycombinator.com/item?id=26253281&utm_term=comment] - 2021-03-01 03:28:57 - public:aguynamedryan data, programming - 2 | id:574061 -
Hierarchical Structures in PostgreSQL [https://hoverbear.org/blog/postgresql-hierarchical-structures/] - 2021-01-20 22:19:40 - public:aguynamedryan data, pg - 2 | id:488445 -
Datasette: An open source multi-tool for exploring and publishing data [https://datasette.io/] - 2020-12-23 19:21:20 - public:aguynamedryan data, programming, try - 3 | id:485248 -
Using PostgreSQL and SQL to Randomly Sample Data [https://info.crunchydata.com/blog/randomly-sampling-data-using-sql-and-postgresql] - 2020-10-28 16:22:58 - public:aguynamedryan data, pg, stats, try - 4 | id:436596 -
Wikidata:SPARQL query service/A gentle introduction to the Wikidata Query Service - Wikidata [https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service#A_gentle_introduction_to_the_Wikidata_Query_Service] - 2020-10-26 16:57:54 - public:aguynamedryan api, data, information model, ontology - 4 | id:426300 -
Simple Anomaly Detection Using Plain SQL | Haki Benita [https://hakibenita.com/sql-anomaly-detection] - 2020-09-25 17:45:53 - public:aguynamedryan data, rdbms - 2 | id:388117 -
DuckDB - An embeddable SQL OLAP database management system [https://duckdb.org/] - 2020-08-21 15:29:03 - public:aguynamedryan data, try - 2 | id:366387 -
mlin/GenomicSQLite: Genomics Extension for SQLite [https://github.com/mlin/GenomicSQLite] - 2020-08-21 15:28:40 - public:aguynamedryan data, sql, try - 3 | id:366386 -
Running Awk in parallel to process 256M records [https://ketancmaheshwari.github.io/posts/2020/05/24/SMC18-Data-Challenge-4.html] - 2020-06-05 17:22:49 - public:aguynamedryan awk, data, etl - 3 | id:321757 -
TXR Language [https://www.nongnu.org/txr/] - 2020-04-19 18:46:37 - public:aguynamedryan data, etl, try - 3 | id:309584 -
What's new in Kiba ETL v3 (visually explained) [https://thibautbarrere.com/2020/03/05/new-in-kiba-etl-v3] - 2020-03-12 23:01:49 - public:aguynamedryan data, etl, kiba, library, ruby - 5 | id:290722 -
thewhitetulip/awk-anti-textbook: learn awk by example [https://github.com/thewhitetulip/awk-anti-textbook] - 2020-03-09 18:10:46 - public:aguynamedryan awk, cli, data - 3 | id:285274 -
Preparing your Postgres data for scale-out - DEV Community [https://dev.to/heroku/preparing-your-postgres-data-for-scale-out-km] - 2020-02-26 16:23:22 - public:aguynamedryan data, pg, scale, shard - 4 | id:283237 -
In Loving Memory of Strictly-Typed Schemas - ssense-tech - Medium [https://medium.com/ssense-tech/in-loving-memory-of-strictly-typed-schemas-89ae6e186202] - 2020-02-21 19:38:53 - public:aguynamedryan data, db, nosql, thinkpiece - 4 | id:283175 -
Command Line Tricks For Data Scientists [https://kadekillary.work/post/cli-4-ds/] - 2019-10-04 21:42:44 - public:aguynamedryan cli, data, tools - 3 | id:277799 -