Cascading: A Java Tool for Hadoop


This presentation provides an introduction to Cascading, an open source application development framework that allows Java developers to build applications on top of Hadoop through its Java API.

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster. Whether a data problem is simple or complex, Cascading balances a high level of abstraction with the degrees of freedom developers need, combining a computation engine, a systems integration framework, and data processing and scheduling capabilities. Cascading was designed to fit into any enterprise development environment. Because it cleanly separates “data processing” from “data integration”, offers a clean Java API, and supports testing with JUnit, Cascading applications can be tested easily and deployed at any scale.

Besides its core SDK, the Cascading project also provides several related tools:
* Lingual: Simplifies systems integration through ANSI SQL compatibility and a JDBC driver
* Pattern: Enables various machine learning scoring algorithms through PMML compatibility
* Scalding: Enables development with Scala, a powerful language for functional programming
* Cascalog: Enables development with Clojure, a Lisp dialect

Video producer: https://twitter.com/university