End-to-End Data Pipelines

We offer three carefully selected full-day trainings, each a deep dive into a vertical or horizontal that is a key theme of the Data By the Bay conference matrix.


Agile Data Science and End-to-End Data Pipelines with Andy Petrella

Building a distributed pipeline is a huge and complex undertaking. If you want to ensure yours is scalable, has fast in-memory processing, can handle real-time or streaming data feeds with high throughput and low latency, is well suited for ad-hoc queries, can be spread across multiple data centers, is built to allocate resources efficiently, and is designed to allow for future changes, join Andy Petrella from Data Fellas for this immensely practical hands-on course.

  • Introduction
  • Data Collection
  • Akka Collectors
  • Queuing
  • Kafka


  • Interactive Programming
  • Spark Notebook
  • Produce (Notebook example)
  • Live Coding Exercise


  • Streaming Data
  • Spark Streaming
  • Consume Streaming
  • In-Memory Data
  • Cassandra


  • Store
  • Data Analysis
  • Spark Core and MLlib
  • Model (Notebook example)
  • Live Coding Exercise


  • Access Layer
  • Akka Micro Service
  • Serve (Notebook example)


  • Cluster Manager and Orchestration
  • Mesos
  • Marathon and Chronos
  • Live Coding Exercise
  • Wrap Up


Andy Petrella

Andy is a mathematician turned distributed computing entrepreneur. Besides being a Scala/Spark trainer, Andy has also participated in many projects built using Spark, Cassandra, and other distributed technologies, in various fields including geospatial, IoT, automotive, and smart cities projects. He is the creator of the Spark Notebook, the only reactive and fully Scala notebook for Apache Spark.

In 2015, Xavier Tordoir and Andy founded Data Fellas around their product, the Agile Data Science Toolkit, which facilitates the productization of data science projects and guarantees their maintainability and sustainability over time. Andy is also a member of the program committees of the O’Reilly Strata, Scala eXchange, Data Science eXchange, and Devoxx events.


Andy Petrella presents Agile Data Science with Scala at SF Scala