Spring XD is a unified, distributed, and extensible service for data ingestion, real-time analytics, batch processing, and data export. It is an open source project, licensed under the Apache License 2.0, whose goal is to tackle big data complexity. Much of the complexity in building real-world big data applications comes from integrating many disparate systems into one cohesive solution across a range of use cases.
Common use cases encountered in creating a comprehensive big data solution are:
- High-throughput distributed data ingestion from a variety of input sources into a big data store such as HDFS or Splunk.
- Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
- Workflow management via batch jobs. The jobs combine interactions with standard enterprise systems (e.g. RDBMS) as well as Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or HBase).
- High-throughput data export, e.g. from HDFS to an RDBMS or NoSQL database.
The Spring XD project aims to provide a one-stop solution for these use cases.
Handling data flows and pipelines has always been a challenging task, and with the advent of IoT both the volume and the variety of data that must be handled have grown. Spring XD was designed to ease the task of handling data from multiple streams, allowing developers to focus on their logic.
Spring XD is developed by Pivotal, the same team behind the Spring Framework.
Spring XD vs Apache NiFi
Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for design, control, feedback, and monitoring of dataflows. It is highly configurable along several dimensions of quality of service, such as loss-tolerant versus guaranteed delivery, low latency versus high throughput, and priority-based queuing. NiFi provides fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped upon reaching its configured end-state.
Apache NiFi is a tool and platform for moving data from source to destination, both at the edge (ingestion) and at the core (transport). It has built-in tools to track data, and it can support more use cases at the edge when you supply your own services and processors. It also has a very operator-centric UI for building and maintaining flows.
The scope of Spring XD appears to be more that of an application orchestration/integration framework (analytic glue). From my perspective, its UIs and programming interfaces are very programmer-centric. It also has some overlap with Apache Falcon.
There is overlap between Spring XD and NiFi. Both do ingestion (into a myriad of sinks), and both can do 'analysis'.
It would be very interesting to see how Spring XD and Apache NiFi will compete or perhaps work together.
One of the beauties of Spring XD is that you can get started very easily. Once you have downloaded a stable release of Spring XD, change into its directory:
- cd spring-xd-<version>.RELEASE
Spring XD can be run in two different modes. There is a single-node runtime option for testing and development, and a distributed runtime that supports distributing processing tasks across multiple nodes. This document will get you up and running quickly with the single-node runtime.
Start the Runtime and the XD Shell
The single-node option is the easiest to get started with. It runs everything you need in a single process. To start it, cd to the xd directory and run the following command:
- xd/bin>$ ./xd-singlenode
In a separate terminal, cd into the shell directory and start the XD shell, which you can use to issue commands.
- shell/bin>$ ./xd-shell
- 1.3.0.RELEASE | Admin Server Target: http://localhost:9393
- Welcome to the Spring XD shell. For assistance hit TAB or type "help".
Spring XD uses ZooKeeper internally; in a distributed deployment, ZooKeeper typically runs as an external process. XD singlenode instead runs with an embedded ZooKeeper server and assigns it a random available port.
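The random-port behaviour can be illustrated with a common JVM idiom. The code binds a server socket to port 0 so the operating system chooses a free ephemeral port, reads that port back, and then releases it. This is a minimal sketch for illustration only. The names `RandomPortSketch` and `findAvailablePort` are hypothetical, and it is not claimed that Spring XD's embedded ZooKeeper uses exactly this code.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative sketch of a common JVM idiom for picking a random free port.
// Hypothetical names; NOT taken from Spring XD's source, which may do this differently.
class RandomPortSketch {

    /** Binds to port 0 so the OS picks a free ephemeral port, then releases it. */
    static int findAvailablePort() throws IOException {
        // try-with-resources closes the probe socket so the port can be handed to another server.
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("An embedded server could listen on port " + findAvailablePort());
    }
}
```

Running `main` prints whichever port the OS happened to pick. One caveat applies to this idiom: the port is only known to be free at the moment the probe socket closes. Another process could take it before the embedded server binds it.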
Create a Stream
In Spring XD, a basic stream defines the ingestion of event-driven data from a source to a sink, passing through any number of processors. You can create a new stream by issuing a stream create command from the XD shell. Stream definitions are built from a simple DSL. For example, execute:
- xd:> stream create --name ticktock --definition "time | log" --deploy
This defines a stream named ticktock based on the DSL expression time | log. The DSL uses the "pipe" symbol (|) to connect a source to a sink. The stream server finds the time and log definitions in the modules directory and uses them to set up the stream. In this simple example, the time source sends the current time as a message every second, and the log sink outputs it through the logging framework at the WARN level. Because the --deploy flag was provided, the stream is deployed immediately. In the console where you started the server, you will see log output similar to the following:
- 13:09:53,812 INFO http-bio-8080-exec-1 module.SimpleModule:109 - started module: Module [name=log, type=sink]
- 13:09:53,813 INFO http-bio-8080-exec-1 module.ModuleDeployer:111 - launched sink module: ticktock:log:1
- 13:09:53,911 INFO http-bio-8080-exec-1 module.SimpleModule:109 - started module: Module [name=time, type=source]
- 13:09:53,912 INFO http-bio-8080-exec-1 module.ModuleDeployer:111 - launched source module: ticktock:time:0
- 13:09:53,945 WARN task-scheduler-1 logger.ticktock:141 - 2013-06-11 13:09:53
- 13:09:54,948 WARN task-scheduler-1 logger.ticktock:141 - 2013-06-11 13:09:54
- 13:09:55,949 WARN task-scheduler-2 logger.ticktock:141 - 2013-06-11 13:09:55
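The "source | processors | sink" model above can be made concrete with a small, hypothetical Java sketch. It is not Spring XD's DSL parser or runtime. `TickTockSketch`, `parseModules` and `runOnce` are invented names, and the sketch ignores module options and other DSL features. The `WARNING` level of `java.util.logging` stands in for the log sink's WARN level. The sketch splits a definition on the pipe symbol and treats the first module as the source and the last as the sink. It then pushes each message through any processors in between.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.logging.Logger;
import java.util.stream.Collectors;

// Hypothetical sketch, NOT Spring XD's DSL parser or runtime: it only mimics the
// "source | processor* | sink" shape of a stream definition such as "time | log".
class TickTockSketch {

    private static final Logger LOG = Logger.getLogger("ticktock");

    // Same pattern as the timestamps in the sample output, e.g. 2013-06-11 13:09:53
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Splits a definition on the pipe symbol into trimmed module names. */
    static List<String> parseModules(String definition) {
        List<String> modules = Arrays.stream(definition.split("\\|"))
                .map(String::trim)
                .collect(Collectors.toList());
        // A valid definition needs at least a source and a sink, and no empty segments.
        if (modules.size() < 2 || modules.contains("")) {
            throw new IllegalArgumentException("expected 'source | ... | sink': " + definition);
        }
        return modules; // first = source, last = sink, anything in between = processors
    }

    /** Pushes one message from the source through each processor, in order, into the sink. */
    static String runOnce(Supplier<String> source,
                          List<Function<String, String>> processors,
                          Consumer<String> sink) {
        String message = source.get();
        for (Function<String, String> processor : processors) {
            message = processor.apply(message);
        }
        sink.accept(message);
        return message;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> modules = parseModules("time | log");
        System.out.println("source=" + modules.get(0) + ", sink=" + modules.get(modules.size() - 1));

        // Stand-ins for the time source and the log sink; java.util.logging's
        // WARNING level plays the role of the log sink's WARN level.
        Supplier<String> time = () -> LocalDateTime.now().format(FORMAT);
        Consumer<String> log = LOG::warning;

        // "time | log" has no processors; emit one message per second for three ticks.
        for (int tick = 0; tick < 3; tick++) {
            runOnce(time, Collections.emptyList(), log);
            Thread.sleep(1000);
        }
    }
}
```

Running `main` prints `source=time, sink=log` and then logs three timestamps about one second apart, loosely mirroring the WARN lines above. A definition with a processor, such as a hypothetical `time | upper | log`, would be modeled by passing a non-empty list of functions to `runOnce`.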
Important Video Links
- A great video that walks through the architecture of Spring XD as well as its inner workings in depth
- IoT Realized with Spring XD - The Connected Car
Top 5 Recent Tweets
| Date | Account | Tweet |
|---|---|---|
| 29 Dec 2015 | @LPSIntegration | You know Twitter, but do you know Twitter with #SpringXD? http://bit.ly/1Mw6gZG #LPSTechBlog #BestOf2015 |
| 20 Dec 2015 | @dinodb | #SpringXD: The Foundation for Real-time #Streaming and #machinelearning Systems https://shar.es/1GlWtx #bigdata #lambda @pivotal |
| 21 Nov 2015 | @Pivotal | The New Flo for #SpringXD http://ow.ly/UStvu #bigdata via @Pivotal |
| 19 Nov 2015 | @springcentral | #SpringXD 1.3 GA introduces Flo for Spring XD 1.0 + a job composition DSL http://bit.ly/1MXdPZy #hadoop #bigdata #springframework @java |
| 9 Nov 2015 | @PivotalBigData | Pivotal #partner @zDataInc’s Analytics Sandbox helps you get started with @Greenplum, #SpringXD and @ApacheMADlib |
Top 5 Recent News Headlines
| Date | Headline | Summary |
|---|---|---|
| 8 Jan 2016 | Spring XD Today and Tomorrow | Mark Pollack covers the major new features added to Spring XD since last year as well as upcoming changes for the next major version. He introduces and demonstrates key integrations driven by the Big Data ecosystem at large such as Kafka, Spark, functional programming, integration with Python, and designer/monitoring UIs. |
| 24 Dec 2015 | Stream Processing at Scale with Spring XD and Kafka | Marius Bogoevici discusses how Spring XD integrates with Kafka as an external datasource and transport. Marius performs a demo that shows how to unleash the power of Kafka with Spring XD, by building a highly scalable data pipeline with RxJava and Kafka, using Spring XD as a platform. |
| 10 Nov 2015 | Spring XD 1.3 RC1 released | The first release candidate of Spring XD 1.3 is available, providing new functionality for batch jobs. Release target is currently set for 11/17/2015. |
| 25 Sept 2015 | SpringXD being Re-architected and Re-branded to Spring Cloud Data Flow | Pivotal announced a complete re-design of Spring XD, its big data offering, during last week’s SpringOne2GX conference, with a corresponding re-brand from Spring XD to Spring Cloud Data Flow. The new product uses executable applications as the foundation for modules, and focuses on the orchestration of them. Whilst at the top level the REST API, shell and UI have survived from Spring XD, maintaining backwards compatibility, below that the two products are very different. |
| 5 Mar 2015 | Spring XD 1.1: Simplifying Big Data like Spring Did for Java EE | Pivotal recently released Spring XD 1.1 GA with new features including stream processing with Reactor, RxJava, Spark Streaming and Python. Additionally support for Kafka, batching and compression with RabbitMQ, and support for container group management when running on YARN are now featured. The Spring XD project provides over 25 sample applications for developers. |
Top 5 Lifetime Tweets
| Date | Account | Tweet |
|---|---|---|
| 8 Oct 2015 | @springcentral | #SpringXD 1.3M1 updates versions of @ApacheSpark, @apachehadoop #springintegration and introduces a @cassandra sink http://bit.ly/1R1F9VB |
| 17 Jun 2015 | @ryanpmorgan | So many great features in the latest #SpringXD 1.2 GA. One of my favorites is Flo, check it out here: https://www.youtube.com/watch?v=17pLpcdIu_M … |
| 16 Jun 2015 | @cmani | #SpringXD 1.2GA debuts > 1mil msgs/sec perf, on par with native @apachekafka WITH reproducible benchmark kit http://bit.ly/1J3D8s9 #iot |
| 13 Apr 2015 | @fredmelo_br | Real-time Stock Prediction System with #R, #Geode and #SpringXD for @ApacheCon almost ready! http://sched.co/2xFS |
| 26 Feb 2015 | @springcentral | Honored to have visionaries like Benjamin Black joining Pivotal to work on #IoT - incredible http://www.wired.com/2015/02/cloud-computing-pioneer-moves-emcs-pivotal/ … #springxd @b6n |