GemFire XD

About

Pivotal GemFire XD is a memory-optimized, distributed data store that is designed for applications that have demanding scalability and availability requirements. With GemFire XD you can manage data entirely using in-memory tables, or you can persist very large tables to local disk store files or to a Hadoop Distributed File System (HDFS) for big data deployments. In this initial release, GemFire XD provides a low-latency SQL interface to in-memory table data, while seamlessly integrating data that is persisted in HDFS. A single GemFire XD distributed system can be easily scaled out using commodity hardware to support thousands of concurrent clients, and you can also replicate data between distinct GemFire XD clusters over a WAN interface. GemFire XD also provides easy access to persisted HDFS data.

GemFire XD offers extremely high throughput, predictable latency, dynamic and linear scalability, and continuous availability of data. GemFire XD is implemented entirely in Java, and it can be embedded directly within a Java application. You can also deploy GemFire XD members as standalone servers that participate in a cluster.

GemFire XD integrates Pivotal GemFire functionality with several components of the Apache Derby relational database management system (RDBMS).[1]

Controversies

One can question the performance and stability of applications that use GemFire XD, because it can consume all of the memory available in the cluster. At the same time, Pivotal has introduced measures that let you limit the amount of memory GemFire XD uses.

History

Big data is not only about storing large volumes of data. Enabling analysis of that data to derive intelligence is the most important part, because it supports making the right decisions.

With the announcement of Pivotal GemFire XD, Pivotal closed the analytics loop with Hadoop. Pivotal released Pivotal HD 2.0, the company's supported version of Apache Hadoop 2.2, on 2014-09-15. The release was accompanied by the general availability of GemFire XD, an in-memory database that adds real-time analytics support. The integration of GemFire XD with Pivotal HD 2.0 is meant to deliver a Business Data Lake (BDL) architecture for the enterprise.[2]

Pivotal GemFire XD 1.3.0[3] was released on 2014-09-15.

Version    Release Date
1.4.1      2015-04-30
1.4.0      2015-01-07
1.3.1      2014-11-24
1.3.0      2014-05-15

Feature Comparison

GemFire XD and traditional relational databases

GemFire XD has few administrative requirements and no scaling limitations, and offers an alternative to the traditional, centralized database architecture.

  1. GemFire XD data can be partitioned and/or replicated in memory across a cluster of machines. In some respects, this is similar to embedding a database like Derby, except that the data can be managed primarily in memory, replicated for high availability, and partitioned for scale. Data can be managed either in an embedded fashion (collocated in the application server JVM) or in standalone JVMs.
  2. Unlike traditional databases, not all operations with GemFire XD are constrained by ACID properties. Instead, GemFire XD gives this control to application designers, who can choose ACID constraints based on performance, availability, and consistency requirements. Although data consistency and data integrity are key elements of the GemFire XD design, the design assumes access to abundant memory and relaxes some of the transactional properties.
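
The partitioning and replication described in the first point can be pictured with a toy model. The sketch below is an illustration only and is not GemFire XD's actual placement algorithm: the member names, the place helper, and the choice of a CRC32 hash are all assumptions made for this example.

```python
import zlib


def place(key, members, redundant_copies=1):
    """Return the members that hold a row: one primary plus redundant copies.

    Hypothetical helper: hashes the key to choose a primary member, then
    puts each redundant copy on the next member in the list.
    """
    if not 0 <= redundant_copies < len(members):
        raise ValueError("need more members than redundant copies")
    # zlib.crc32 is stable across runs, unlike Python's built-in hash().
    primary = zlib.crc32(str(key).encode("utf-8")) % len(members)
    return [members[(primary + i) % len(members)]
            for i in range(redundant_copies + 1)]


# Assumed member names, chosen only for the illustration.
members = ["server1", "server2", "server3"]
for row_id in (1, 2, 3, 4):
    print(row_id, place(row_id, members))
```

In this model, setting redundant_copies to one less than the number of members places a copy on every member, which roughly corresponds to a fully replicated table, while redundant_copies=0 corresponds to pure partitioning with no copy for failover.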

GemFire XD and Pivotal GemFire

  1. GemFire XD incorporates a more sophisticated SQL query engine that compiles a query plan into byte code.
  2. GemFire XD also has a much more sophisticated cost-based optimizer.
  3. The configuration and deployment model for GemFire XD is simpler and is designed to be intuitive to anyone having experience with relational database systems.
  4. GemFire XD is based on standards such as SQL, JDBC, and ODBC, making it straightforward to adopt in existing applications that use relational databases.

GemFire XD and other in-memory databases

Many in-memory database offerings were designed to keep all data in memory in a single process space. Although many support replication for failover, they are similar to relational databases in that all updates are routed through a single primary process. The design of such systems does not support partitioning data sets across servers, and does not allow the cluster size to grow or shrink at runtime. Databases of this type are best suited to cases where the data set is small and the concurrent access requirements are modest.

In contrast, GemFire XD data and processing can be distributed to as many servers as are required to handle the volume and load at a given time.

GemFire XD and other cloud or NoSQL databases

NoSQL and cloud databases provide scalability and high availability properties for Web applications, but they typically do so at the expense of data consistency. Products such as Amazon's Dynamo use an eventually consistent transaction model, while others allow only a single entry update as part of a transaction.

GemFire XD's design, though driven by horizontal scalability and availability, imposes fewer restrictions. NoSQL databases provide only key-based access, or use a proprietary syntax for queries. Those products that do support queries do not support joins.

The premise behind GemFire XD is to capitalize on the power of SQL as an expressive, flexible, and very well-understood query language.

Hello World Example

GemFire XD in 15 minutes[4]

Download the latest GemFire XD .zip file distribution from the product download page: https://network.pivotal.io/products/gemfirexd. Save the downloaded file in your home directory.

  • Install GemFire XD by uncompressing the downloaded file. For example:

$ cd ~

$ unzip Pivotal_GemFireXD_130_b50226_Linux.zip

Substitute the exact filename that you downloaded. This installs GemFire XD in a new Pivotal_GemFireXD_XXX_bNNNNN_Linux subdirectory in your home directory, where XXX is the version of GemFire XD and NNNNN is the specific GemFire XD build number that you downloaded.

  • If you have not already done so, download and install Java.
  • Set your PATH environment variable to include the bin subdirectory of the GemFire XD directory. For example:

$ export PATH=$PATH:$HOME/Pivotal_GemFireXD_XXX_bNNNNN_Linux/bin

  • Create three new directories in your home directory for the locator and two servers that will make up the GemFire XD distributed system:

$ mkdir locator1 server1 server2

  • Start the locator:

$ gfxd locator start -peer-discovery-address=localhost -dir=locator1 -jmx-manager-start=true -jmx-manager-http-port=7075

Starting GemFireXD Locator using peer discovery on: localhost[10334]

Starting network server for GemFireXD Locator at address localhost/127.0.0.1[1527]

GemFireXD Locator pid: 8787 status: running

Logs generated in /home/yozie/Pivotal_GemFireXD_XXX_bNNNNN_Linux/quickstart/locator1/gfxdlocator.log

This command starts a default locator that accepts connections on the localhost address. The default port of 10334 is used for communication with other members of the distributed system. (You can double-check that this port is used by examining the locator1/gfxdlocator.log file.) All new members of the distributed system must specify this locator's address and peer discovery port, localhost[10334], in order to join the system.
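
The localhost[10334] notation used for the locator puts the port in square brackets after the host. If you script your own deployment, a small helper along the following lines could split such an address. This is a sketch under stated assumptions: the name parse_locator is made up for this example, and falling back to 10334 for a bare host is a choice of the sketch that mirrors the default port mentioned above, not documented gfxd behavior.

```python
import re

DEFAULT_LOCATOR_PORT = 10334  # default peer discovery port cited in the text


def parse_locator(address):
    """Split 'host[port]' into (host, port).

    Assumption of this sketch: a bare host gets the default port.
    """
    match = re.fullmatch(r"([^\[\]\s]+)(?:\[(\d+)\])?", address.strip())
    if match is None:
        raise ValueError(f"not a host[port] address: {address!r}")
    host, port = match.group(1), match.group(2)
    return host, int(port) if port else DEFAULT_LOCATOR_PORT


print(parse_locator("localhost[10334]"))
```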

The default port of 1527 is used for client connections to the distributed system.

Specifying -jmx-manager-start=true starts an embedded JMX Manager within the locator. By starting a JMX Manager, you can monitor and browse the GemFire XD distributed system through the Pulse graphical interface.

  • Start the Pulse monitoring tool by opening a browser and entering the following URL:

http://localhost:7075/pulse

At the Pulse login screen, enter the default username admin and password admin. Keep Pulse open and watch for the corresponding changes as you perform the activities in this quickstart.

  • Start both servers:

$ gfxd server start -locators=localhost[10334] -bind-address=localhost -client-port=1528 -dir=server1

$ gfxd server start -locators=localhost[10334] -bind-address=localhost -client-port=1529 -dir=server2

Starting GemFireXD Server using locators for peer discovery:

localhost[10334]

Starting network server for GemFireXD Server at address

localhost/127.0.0.1[1528]

GemFireXD Server pid: 8897 status: running

Logs generated in /home/yozie/Pivotal_GemFireXD_XXX_bNNNNN_Linux/quickstart/server1/gfxdserver.log

Starting GemFireXD Server using locators for peer discovery:

localhost[10334]

Starting network server for GemFireXD Server at address

localhost/127.0.0.1[1529]

GemFireXD Server pid: 9003 status: running

Logs generated in /home/yozie/Pivotal_GemFireXD_XXX_bNNNNN_Linux/quickstart/server2/gfxdserver.log

Both servers also bind to the localhost address. They must specify unique client ports in order to avoid conflicts with the locator's default client port. As an alternative, they could disable the network server entirely by specifying -run-netserver=false, and all clients would need to connect through the locator.
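
When you start more than two servers on one host, it is easy to reuse a client port by mistake. The sketch below only assembles command lines of the form shown above and does not run gfxd. The function name server_start_commands and its parameters are hypothetical, and it uses only the flags that already appear in this quickstart.

```python
def server_start_commands(count, locator="localhost[10334]",
                          bind_address="localhost", first_client_port=1528):
    """Build one 'gfxd server start' command line per server.

    Client ports count up from first_client_port, so each server gets a
    unique one. With the default of 1528 they also stay clear of the
    locator's default client port, 1527.
    """
    commands = []
    for i in range(1, count + 1):
        port = first_client_port + i - 1
        commands.append(
            f"gfxd server start -locators={locator} "
            f"-bind-address={bind_address} -client-port={port} -dir=server{i}"
        )
    return commands


for command in server_start_commands(2):
    print(command)
```

Each generated command expects a matching serverN directory to exist, as created with mkdir earlier in the quickstart.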

  • Connect to the distributed system as a thin client:

$ gfxd

gfxd> connect client 'localhost:1527';

  • Now that you're connected to the system, run a simple query to display information about the GemFire XD system members:

gfxd> select id, kind, netservers from sys.members;

ID                         |KIND             |NETSERVERS
------------------------------------------------------------------------
localhost(17355):1374      |locator(normal)  |localhost/127.0.0.1[1527]
localhost(17535)<v2>:52946 |datastore(normal)|localhost/127.0.0.1[1529]
localhost(17438)<v1>:1230  |datastore(normal)|localhost/127.0.0.1[1528]

3 rows selected

By default, GemFire XD servers are started as data stores, so that they can host database schemas. In this cluster, you can connect as a client to any member by specifying localhost with the unique port number of the member (the one specified in the NETSERVERS column). However, connecting to the locator provides basic load balancing by routing the connection request to an available server member.
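
The routing role of the locator can be pictured with a toy model. The source only says that the locator provides basic load balancing, so the least-connections policy below is an assumption, and the ToyLocator class is invented purely for this illustration. It is not part of any GemFire XD API.

```python
class ToyLocator:
    """Toy stand-in for a locator that hands out server addresses."""

    def __init__(self, servers):
        # Track how many client connections each server has been given.
        self.connections = {server: 0 for server in servers}

    def route(self):
        """Return the server with the fewest connections (ties: first listed)."""
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server


# Addresses taken from the NETSERVERS column of the two data stores.
locator = ToyLocator(["localhost:1528", "localhost:1529"])
print([locator.route() for _ in range(4)])
```

A client that connects straight to localhost:1528 bypasses this step entirely, which is why connecting through the locator is the more convenient default for spreading clients across servers.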

  • Create a simple table and insert a few rows:

gfxd> create table quicktable (id int generated always as identity, item char(25));

0 rows inserted/updated/deleted

gfxd> insert into quicktable values (default, 'widget');

1 row inserted/updated/deleted

gfxd> insert into quicktable values (default, 'gadget');

1 row inserted/updated/deleted

gfxd> select * from quicktable;

ID         |ITEM
-------------------------------------
2          |gadget
1          |widget

2 rows selected


Top 5 Recent Tweets

Date Author Tweet
12 Nov 2015 @Pivotal Catch The Wave! @Forrester evaluated 11 companies across 32 criteria and positioned Pivotal as a Leader. #GemFire
5 Nov 2015 @PivotalANZ Performance & scalability: 300% increase in app performance with #IndiaRail. How did they do it? http://bit.ly/1GJ4lAr #GemFire
19 Oct 2015 @cmani Pivotal GemFire cited as a leader in the Forrester Wave™: In-Memory Data Grids, Q3 2015 report https://lnkd.in/e3mdDDw #pivotal #gemfire
20 Feb 2015 @vGazza @pinrojas #Spark is promising and @PivotalGemFire #GemFireXD is rock solid for in-memory analytics workloads today
6 Jan 2015 @arsenyspb 10000 char limit... Now we talking! #datalake #storm #kafka #gemfire http://buff.ly/1VHLvfS

Top 5 Recent News Headlines

Date Headline First Paragraph
29 Sept 2015 Pivotal open sources tech for SQL and machine learning on Hadoop The two technologies—HAWQ, a scale-out SQL database on Hadoop, and MADlib, a library of machine learning algorithms for databases like HAWQ—will be released as open source projects to the Apache Software Foundation. (MADlib was technically open source already, but was not an Apache project). The move is a continuation of what Pivotal started in February, when it open sourced the code for its Greenplum database software and its proprietary distribution of the Hadoop software.[5]
8 May 2015 Pivotal rolls out Hadoop distro update, new query optimizer Pivotal releases a new version of its Pivotal HD Hadoop distribution built on the open source ODP big data kernel, along with a new cost-based query optimizer that promises up to 100x performance upgrades for Greenplum and HAWQ.[6]
07 Jun 2015 Gartner's 19 In-memory Databases for Big Data Analytics Amid the big data boom, the in-memory database market will enjoy a 43 percent compound annual growth rate (CAGR) – leaping from $2.21 billion in 2013 to $13.23 billion in 2018, predicts Markets and Markets, a global research firm. What’s driving that demand? Simply put, in-memory databases allow real-time analytics and situation awareness on "live" transaction data – rather than after-the-fact analysis on "stale data," notes a recent Gartner market guide. Here are 19 in-memory database options mentioned in that Gartner market guide.[7]
20 Feb 2015 Pivotal pivots to open source and Hortonworks A few days ago Pivotal made three major announcements: the creation of a Big Data Product Suite, a partnership with Hortonworks and the launch of an 'Open Data Platform'.[8]
17 Mar 2014 The Data Economy: With GemFire XD, Pivotal closes the Big Data analytics loop The team at Pivotal made a handful of announcements today focused on the Big Data and analytics layer of its enterprise platform play. Namely, the company announced Pivotal HD 2, the latest version of its Hadoop distribution that incorporates HAWQ, a modified version of the Greenplum database, for SQL analytics. Version 2 is based on Hadoop 2.2, which as you’ll remember is the remastered version of Apache Hadoop that leverages YARN to enable multiple types of applications (not just MapReduce applications) to run on top of HDFS.[9]

Top 5 Lifetime Tweets

Date Author Tweet
5 Nov 2015 @PivotalANZ Performance & scalability: 300% increase in app performance with #IndiaRail. How did they do it? http://bit.ly/1GJ4lAr #GemFire
19 Oct 2015 @cmani Pivotal GemFire cited as a leader in the Forrester Wave™: In-Memory Data Grids, Q3 2015 report https://lnkd.in/e3mdDDw #pivotal #gemfire
18 Jun 2015 @making Open Sourcing GemFire - Apache Geode by @ApacheGeode #apache #gemfire http://www.slideshare.net/ApacheGeode/open-sourcing-gemfire-apache-geode … @SlideShare
28 Aug 2015 @GemDataGuy Pivotal #GemFire 8.2 w/ support for JDK 8 and RH7 is here! https://network.pivotal.io/products/pivotal-gemfire … @pivotal @PivotalBigData #IMDG #NoSQL #InMemoryComputing
6 Jan 2015 @arsenyspb 10000 char limit... Now we talking! #datalake #storm #kafka #gemfire http://buff.ly/1VHLvfS

Top 5 Lifetime News Headlines

Date Headline First Paragraph
29 Sept 2015 Pivotal open sources tech for SQL and machine learning on Hadoop The two technologies—HAWQ, a scale-out SQL database on Hadoop, and MADlib, a library of machine learning algorithms for databases like HAWQ—will be released as open source projects to the Apache Software Foundation. (MADlib was technically open source already, but was not an Apache project). The move is a continuation of what Pivotal started in February, when it open sourced the code for its Greenplum database software and its proprietary distribution of the Hadoop software.[10]
20 Feb 2015 Pivotal pivots to open source and Hortonworks A few days ago Pivotal made three major announcements: the creation of a Big Data Product Suite, a partnership with Hortonworks and the launch of an 'Open Data Platform'.[11]
17 Feb 2015 Big Data Suite Goes Open Source Last spring, Pivotal unveiled its Pivotal Big Data Suite, a subscription-based software, support and maintenance package that bundled its big data components into a single, simple licensing structure. The Big Data Suite was responsible for $40 million of the $100 million in total business Pivotal did in 2014. Today, the company took the unprecedented step of open sourcing all those components.[12]
17 Mar 2014 The Data Economy: With GemFire XD, Pivotal closes the Big Data analytics loop The team at Pivotal made a handful of announcements today focused on the Big Data and analytics layer of its enterprise platform play. Namely, the company announced Pivotal HD 2, the latest version of its Hadoop distribution that incorporates HAWQ, a modified version of the Greenplum database, for SQL analytics. Version 2 is based on Hadoop 2.2, which as you’ll remember is the remastered version of Apache Hadoop that leverages YARN to enable multiple types of applications (not just MapReduce applications) to run on top of HDFS.[13]
24 Oct 2013 A heaping helping of Hadoop products just hit the market The newswires were inundated with Hadoop-related products announcements this week, and next week will be even crazier. Some are interesting, some are less interesting, but they all underscore the trend we’ve been seeing shape up for a couple years now: Big data, and Hadoop specifically, is the new cloud computing.[14]

References

  1. http://gemfirexd.docs.pivotal.io/docs/1.4.1/pdf/pivotal-gemfirexd-ug-1.4.1.pdf
  2. http://www.tomsitpro.com/articles/pivotal-hadoop-in-memory-database-gemini-xd,1-1794.html
  3. https://network.pivotal.io/products/gemfirexd#/releases/67
  4. http://gemfirexd.docs.pivotal.io/docs/1.4.1/pdf/pivotal-gemfirexd-ug-1.4.1.pdf
  5. http://fortune.com/2015/09/29/pivotal-open-source/
  6. http://www.cio.com/article/2920126/big-data/pivotal-rolls-out-hadoop-distro-update-new-query-optimizer.html
  7. http://www.information-management.com/gallery/In-memory-database-list-gartner-big-data-analytics-10027047-1.html
  8. http://www.zdnet.com/article/pivotal-pivots-to-open-source-and-hortonworks/
  9. http://siliconangle.com/blog/2014/03/17/with-gemfire-xd-pivotal-closes-the-big-data-analytics-loop/
  10. http://fortune.com/2015/09/29/pivotal-open-source/
  11. http://www.zdnet.com/article/pivotal-pivots-to-open-source-and-hortonworks/
  12. http://www.cio.com/article/2884323/big-data/big-data-suite-goes-open-source.html
  13. http://siliconangle.com/blog/2014/03/17/with-gemfire-xd-pivotal-closes-the-big-data-analytics-loop/
  14. http://gigaom.com/2013/10/24/a-heaping-helping-of-hadoop-products-just-hit-the-market/
