
The Silicon Review Asia

“Apache Arrow” is the new open-source project for big data

Hadoop, Spark and Kafka have already had a defining influence on the world of big data, and now there is yet another Apache project with the potential to shape the landscape even further: Apache Arrow. The Apache Software Foundation recently launched Arrow as a top-level project designed to provide a high-performance data layer for columnar in-memory analytics across disparate systems. Built on code from the related Apache Drill project, Apache Arrow can deliver performance improvements of more than 100x on analytical workloads, the foundation said. More generally, it enables multi-system workloads by eliminating cross-system communication overhead.

Code committers to the project include developers from other Apache big-data projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu, Parquet, Phoenix, Spark and Storm. “The open-source community has joined forces on Apache Arrow,” said Jacques Nadeau, vice president of both the new project and Apache Drill. “We anticipate the majority of the world’s data will be processed through Arrow within the next few years.” In many workloads, 70 to 80 percent of CPU cycles are spent serializing and deserializing data. Arrow alleviates that burden by letting systems share and process data with no serialization, deserialization or memory copies, the foundation said.

“An industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead,” said Ted Dunning, vice president of the Apache Incubator and member of the Apache Arrow Project Management Committee. Arrow also supports complex data with dynamic schemas in addition to traditional relational data. For instance, it can handle JSON data, which is commonly used in Internet-of-Things (IoT) workloads, modern applications and log files. Implementations are also available for a number of programming languages for greater interoperability.

Apache Arrow software is available under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project.
