tidyextractors is a Python package that makes extracting data from supported sources (e.g. email mbox files, source code log files) as painless as possible, delivering you a populated Pandas DataFrame in just a few lines of code.
tidyextractors was inspired by Hadley Wickham’s (2014) paper, which introduces “tidy data” as a conceptual framework for data preparation.
- Extracts data with minimal effort.
- Creates readable code that requires minimal explanation.
- Exports Pandas DataFrames to maximize compatibility with the Python data science ecosystem.
Currently implemented data sources
- Local Git repositories
- Twitter user data (including Tweets) using the Twitter API
- Emails stored in the Mbox file format
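To give a feel for what "tidy" extraction from an Mbox file means, here is a minimal sketch that uses only Python's standard `mailbox` module. It is not the tidyextractors API: the helper names `mbox_to_rows` and `write_sample_mbox`, the chosen columns, and the sample addresses are all assumptions made for illustration. The idea it shows is one row per message and one column per field.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def mbox_to_rows(path):
    """Return one dict per message (hypothetical helper, not part of tidyextractors).

    Each dict is a row and each key is a column, which is the shape
    a tidy table expects.
    """
    rows = []
    box = mailbox.mbox(path)
    try:
        for message in box:
            rows.append({
                "from": message["From"],
                "to": message["To"],
                "subject": message["Subject"],
            })
    finally:
        box.close()
    return rows


def write_sample_mbox(path):
    """Write a two-message mbox file so the sketch can run on its own."""
    box = mailbox.mbox(path)
    try:
        samples = [("ann@example.com", "Hello"), ("bob@example.com", "Re: Hello")]
        for sender, subject in samples:
            msg = EmailMessage()
            msg["From"] = sender
            msg["To"] = "list@example.com"
            msg["Subject"] = subject
            msg.set_content("Body text")
            box.add(msg)
        box.flush()
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, "sample.mbox")
        write_sample_mbox(sample_path)
        for row in mbox_to_rows(sample_path):
            print(row)
```

A list of dicts like this can be passed straight to `pandas.DataFrame(rows)` to get a table with `from`, `to`, and `subject` columns. tidyextractors is meant to spare you from writing this kind of boilerplate yourself; see its docs for the actual interface.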
See the tidyextractors docs for more information, including code examples, API reference, and general documentation.