FreeComputerBooks.com
Links to Free Computer, Mathematics, Technical Books all over the World
- Title: Big Data Processing with Apache Spark
- Author(s): Srini Penchikala
- Publisher: InfoQ (2018)
- Paperback: N/A
- eBook: PDF (104 pages)
- Language: English
- ISBN-10: N/A
- ISBN-13: N/A
Apache Spark is an open-source big-data processing framework built around speed, ease of use, and sophisticated analytics.
Spark has several advantages compared to other big-data and MapReduce technologies like Hadoop and Storm. It provides a comprehensive, unified framework with which to manage big-data processing requirements for datasets that are diverse in nature (text data, graph data, etc.) and that come from a variety of sources (batch versus real-time streaming data).
Spark enables applications in HDFS clusters to run up to a hundred times faster than Hadoop MapReduce when working in memory, and ten times faster even when running on disk.
In this mini-book, the reader will learn about the Apache Spark framework and will develop Spark programs for use cases in big-data analysis. The book covers all the libraries that are part of the Spark ecosystem: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX.
About the Author: Srini Penchikala currently works as a software architect at a financial services organization in Austin, Texas. He has over 20 years of experience in software architecture, design, and development.
- Big Data
- Data Engineering and Data Science
- Data Analysis and Data Mining
- Non-relational/NoSQL Databases
- Mastering Spark with R: Large-Scale Analysis and Modeling
  With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
- The Internals of Apache Spark (Jacek Laskowski)
  This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala.
- The Data Engineer’s Guide to Apache Spark (Databricks)
  This book is for data engineers looking to leverage the immense growth of Apache Spark to build faster and more reliable data pipelines. It leverages Spark's amazing speed, scalability, simplicity, and versatility to build practical Big Data solutions.
- Graph Algorithms: Practical Examples in Apache Spark and Neo4j
  This book is a practical guide to getting started with graph algorithms for developers and data scientists who have experience using Apache Spark or Neo4j. You'll walk through hands-on examples that show you how to use graph algorithms in Apache Spark/Neo4j.
- Knowledge Graphs and Big Data Processing (Valentina Janev, et al)
  Each chapter in this book addresses some pertinent aspect of the data processing chain, with a specific focus on understanding Enterprise Knowledge Graphs, Semantic Big Data Architectures, and Smart Data Analytics solutions.
- Kafka: The Definitive Guide: Real-Time Data and Stream Processing
  Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.
- Artificial Intelligence for Big Data (Anand Deshpande, et al)
  You will learn to use machine learning algorithms such as k-means, SVM, RBF, and regression to perform advanced data analysis. You will understand the current status of machine and deep learning techniques to work on genetic and neuro-fuzzy algorithms.
- Hadoop with Python (Zachary Radtka, et al)
  This book takes you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.