New Aerospike release brings Spark 3.0 compatibility

At its Digital Summit virtual event today, real-time NoSQL database player Aerospike announced a new release of its eponymous product. The v5.6 release adds a few features that, together, are designed to optimize the loop of real-time data processing and machine learning at the edge and the “core” (cloud or corporate data center). These scenarios involve training machine learning (ML) models at the core on edge data, then pushing the models back to the edge for inferencing.

Three legs of the stool

ZDNet spoke with Aerospike founder and Chief Product Officer Srini Srinivasan, who briefed us on the three features that facilitate and optimize this virtuous data/ML cycle. They are:

  • Set indexes: these accelerate access to data in Aerospike sets (comparable to tables). The company says this feature makes for fast queries of sets, even in a petabyte-scale database.
  • Enhancements to Aerospike expressions: read and write operations can now be embedded within expressions. Srinivasan explained that expressions, which are implemented in C, execute much more efficiently than user-defined functions and move processing closer to the data.
  • Updated Aerospike Connect for Spark: this connector is now compatible with Apache Spark 3.0. This, in turn, allows developers to use Spark 3.0 and its APIs directly against Aerospike (bringing back data as Spark DataFrames).
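
To make the first item concrete, here is a minimal conceptual sketch in Python of why a per-set index speeds up set queries. It is not Aerospike's implementation or API; the `ToyNamespace` class and its methods are hypothetical names invented for illustration. It only contrasts visiting every record in a namespace with visiting just the records that an index lists for one set.

```python
from collections import defaultdict


class ToyNamespace:
    """Hypothetical in-memory stand-in for a namespace that stores records in named sets."""

    def __init__(self):
        self.records = {}                  # key -> (set_name, bins)
        self.set_index = defaultdict(set)  # set_name -> keys belonging to that set

    def put(self, key, set_name, bins):
        # Keep the index consistent if a key moves from one set to another.
        old = self.records.get(key)
        if old is not None:
            self.set_index[old[0]].discard(key)
        self.records[key] = (set_name, bins)
        self.set_index[set_name].add(key)

    def scan_set_full(self, set_name):
        """Without an index: touch every record and keep the matching ones."""
        visited = 0
        matches = []
        for stored_set, bins in self.records.values():
            visited += 1
            if stored_set == set_name:
                matches.append(bins)
        return matches, visited

    def scan_set_indexed(self, set_name):
        """With an index: touch only the keys recorded for this set."""
        keys = self.set_index.get(set_name, ())
        return [self.records[k][1] for k in keys], len(keys)


ns = ToyNamespace()
for i in range(1000):
    ns.put(("events", i), "events", {"id": i})
for i in range(10):
    ns.put(("users", i), "users", {"id": i})

full_rows, full_visited = ns.scan_set_full("users")
idx_rows, idx_visited = ns.scan_set_indexed("users")
print(full_visited, idx_visited)
```

Both scans return the same ten records, but the indexed scan touches only those ten rather than all 1,010, which is the effect the company describes at a much larger scale.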

Lighting up Spark

The Aerospike connector for Spark allows real-time and historical data in the database to be used for training ML models, without requiring that data to be exported first. Also, explained Srinivasan, Aerospike can manage data sets larger than what might fit in memory, which enables the otherwise memory-oriented Spark to work with high-volume data, potentially much faster than Spark working against, say, Parquet files in cloud storage.

The more Spark can “push down” data operations to Aerospike, the better, and the Aerospike connector aggressively delegates work to the database when Spark code queries it. Such operations then further benefit from the set indexes and expression enhancements that are also introduced in the new release. Aerospike’s connector for Presto (and, one would assume, Trino) operates similarly and benefits Presto users in a comparable fashion.
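
The idea of pushdown can be illustrated with a small, self-contained Python sketch. This is a toy model, not the Aerospike connector or any Spark API: `ToyStore`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names. It counts how many rows leave the store when the filter runs on the client side versus inside the store.

```python
from dataclasses import dataclass


@dataclass
class ToyStore:
    """Hypothetical data store that counts how many rows it ships to the caller."""
    rows: list
    rows_shipped: int = 0

    def scan(self, predicate=None):
        # If a predicate is supplied, it is evaluated inside the store,
        # so only matching rows are shipped.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(store, predicate):
    # The engine pulls every row, then filters on its own side.
    return [row for row in store.scan() if predicate(row)]


def query_with_pushdown(store, predicate):
    # The engine hands the filter to the store and receives only matches.
    return list(store.scan(predicate))


data = [{"id": i, "score": i % 100} for i in range(1000)]


def high_score(row):
    return row["score"] >= 95


plain = ToyStore(rows=data)
pushed = ToyStore(rows=data)

result_plain = query_without_pushdown(plain, high_score)
result_pushed = query_with_pushdown(pushed, high_score)
print(plain.rows_shipped, pushed.rows_shipped)
```

Both queries return the same rows, but the pushed-down version ships far fewer of them out of the store, which is why delegating filters to the database matters for high-volume data.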

(Data)frame of reference

This pattern of allowing Spark developers to work natively against external databases is gaining momentum. Other databases, like Splice Machine, have enabled similar interfaces. Spark is now such a standard that its DataFrames are becoming a developer’s universal abstraction layer over data for the purposes of stream processing, querying, data engineering and ML.

Given the huge array of database and analytics platforms that have emerged over the last decade, it’s good to see that one of them is becoming a tool of consensus for working with several of the others. It’s also good to see that Aerospike now enables this for Spark 3.0.  

By ZDNet
