Building the data warehouse in Python (07/2014 - present)
Over the past two years, I have focused on offline data processing. The major part is
building a data warehouse that provides reports for millions of users. Along the way, I
built several useful tools and published them all under the https://github.com/luiti organization.
Hadoop is our fundamental infrastructure, including HDFS, YARN, and Hive. On top of
Hadoop, we use luigi, hue, and luiti to manage the business code.
luiti is an offline task management framework built on top of luigi. It is the largest
project I have created, and it has been used and developed for more than half a year.
etl_utils includes lots of useful utilities, e.g. printing the processing speed of any
enumerable object.
rsyncrun rsyncs your code to a server and runs it.
validata is a data validation library, based on MongoEngine, that detects invalid data
and reports error information.
model_cache caches data in { item_id => item_content } format; the supported storage
backends are memory, sqlite, and redis.
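
To illustrate the { item_id => item_content } idea behind model_cache, here is a minimal sketch of an id-to-content cache with interchangeable storage backends. It is not model_cache's actual API: the class names MemoryStore and SqliteStore, the get/set methods, and the JSON serialization are all assumptions made for illustration, and the redis backend is left out to keep the sketch dependency-free.

```python
import json
import sqlite3


class MemoryStore:
    """Keeps items in a plain dict (hypothetical name, not model_cache's API)."""

    def __init__(self):
        self._items = {}

    def set(self, item_id, content):
        self._items[item_id] = content

    def get(self, item_id, default=None):
        return self._items.get(item_id, default)


class SqliteStore:
    """Keeps items in a SQLite table, serialized as JSON (hypothetical name)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(item_id TEXT PRIMARY KEY, content TEXT)"
        )

    def set(self, item_id, content):
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO items (item_id, content) VALUES (?, ?)",
                (str(item_id), json.dumps(content)),
            )

    def get(self, item_id, default=None):
        row = self._conn.execute(
            "SELECT content FROM items WHERE item_id = ?", (str(item_id),)
        ).fetchone()
        return default if row is None else json.loads(row[0])


# Both backends expose the same get/set interface, so callers can swap them.
for store in (MemoryStore(), SqliteStore()):
    store.set("42", {"title": "hello"})
    print(type(store).__name__, store.get("42"), store.get("missing"))
```

Because both stores share one interface, the calling code does not need to know which backend holds the data; a redis-backed store could follow the same shape.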