FreeComputerBooks.com
Links to Free Computer, Mathematics, Technical Books all over the World
- Title: Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools
- Author(s): Jeroen Janssens
- Publisher: O'Reilly Media; 2nd edition (September 7, 2021); eBook (Creative Commons Licensed)
- License(s): CC BY-ND 4.0
- Paperback: 282 pages
- eBook: HTML
- Language: English
- ISBN-10: 1492087912
- ISBN-13: 978-1492087915
This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux.
You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power.
This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.
- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on plain text, CSV, HTML/XML, and JSON
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow using Make
- Create reusable tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines using GNU Parallel
- Model data with dimensionality reduction, clustering, regression, and classification algorithms
- Jeroen Janssens teaches data science, often through training and coaching, occasionally through speaking, and infrequently through writing. Jeroen is currently the CEO of Data Science Workshops, which organises open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups.
- Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools
- The Mirror Site (1) - PDF, ePub, Kindle, etc.
- The Mirror Site (2) - PDF
- The Linux Command Line: A Complete Introduction (W. Shotts, Jr.)
This book takes you from your very first terminal keystrokes to writing full programs in Bash, the most popular Linux shell. Along the way you'll learn the timeless skills handed down by generations of gray-bearded, mouse-shunning gurus.
- Data Engineering Cookbook: The Plumbing of Data Science
This is a practical and comprehensive guide. You will learn the basics of data engineering. Then you will learn the technologies and frameworks required to build data pipelines to work with large datasets.
- The Ultimate Guide to Effective Data Cleaning
With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Experts share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges.
- The Data Engineer's Guide to Apache Spark (Databricks)
This book is for data engineers looking to leverage the immense growth of Apache Spark to build faster and more reliable data pipelines. It leverages Spark's amazing speed, scalability, simplicity, and versatility to build practical Big Data solutions.
- Computational and Inferential: The Foundations of Data Science
Step by step, you'll learn how to leverage algorithmic thinking and the power of code, gain intuition about the power and limitations of current machine learning methods, and effectively apply them to real business problems.
- Data Science: Theories, Models, Algorithms, and Analytics
This book offers a wealth of information on data science, covering a wide range of theories, algorithms, tools, and analytics. You'll also find best practices to guide you along the way.
- Building the Data Lakehouse (Bill Inmon, et al.)
Learn about the features and architecture of the data lakehouse, along with its powerful analytical infrastructure. The data lakehouse is the next generation of the data warehouse and data lake.