Rust Open Source, 2023
Rust Open Source Projects
If you are into Rust, and I know you are, this list may interest you. It is a short list of significant open source projects that are gaining momentum.
The thing with Rust is that, to grow, its tools and projects always need more developers to make them work as well as they can. If you are looking for side projects to start with while learning Rust, try one of these. The last two, especially, are looking for first-time contributions.
Velox
Meta released Velox in August 2022, with contributions from Ahana, Intel, and our Voltron Data team. Velox is built to speed up data management systems and simplify their development. It uses an Apache Arrow-compatible memory layout and covers feature engineering, data preprocessing, and other rapidly expanding machine learning (ML) and artificial intelligence (AI) use cases.
The goal of the project is to lessen the churn in the development process.
Velox centralizes compute components so that engineers can connect to engines more quickly and concentrate on results (for their apps, use cases, and clients) rather than building specialized solutions from scratch.
It’s crucial to emphasize that Velox is designed from the ground up to run on modern processors, which matters for the development of ML and AI. Current work centers on the most recent Intel CPUs with Advanced Vector Extensions (AVX).
DuckDB
DuckDB is frequently described as a SQLite for analytics: it provides embeddable, column-oriented data storage with vectorized processing to speed up OLAP workloads. That’s a lot to say. We also like the project’s zero-copy data integration with Apache Arrow, which enables quick analysis of larger-than-memory datasets.
Developers are adopting DuckDB because it is simple to install, lightweight, and fast for analytics. It is CPU-based and optimized for a single node. We also see it attracting users from conventional relational database systems such as MySQL and Postgres, because it makes language library integrations straightforward.
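To make "column-oriented storage with vectorized processing" more concrete, here is a minimal sketch in plain Rust. It is not DuckDB code and does not use any DuckDB API; the `Row` and `Columns` types, the function names, and the batch size are all hypothetical and exist only for illustration. The idea is that an analytical query reads whole columns in fixed-size batches instead of walking records one at a time.

```rust
// Hypothetical record type: the row-oriented view of the data.
struct Row {
    price: f64,
    quantity: u32,
}

// Hypothetical column-oriented view: each field is its own contiguous vector.
struct Columns {
    price: Vec<f64>,
    quantity: Vec<u32>,
}

// Convert rows into columns (one pass over the data).
fn to_columns(rows: &[Row]) -> Columns {
    Columns {
        price: rows.iter().map(|r| r.price).collect(),
        quantity: rows.iter().map(|r| r.quantity).collect(),
    }
}

// Row-at-a-time evaluation: one record per step.
fn total_revenue_rows(rows: &[Row]) -> f64 {
    rows.iter().map(|r| r.price * f64::from(r.quantity)).sum()
}

// Batch-at-a-time evaluation over columns. `batch_size` must be non-zero,
// because `chunks` panics on a chunk size of zero.
fn total_revenue_columns(cols: &Columns, batch_size: usize) -> f64 {
    cols.price
        .chunks(batch_size)
        .zip(cols.quantity.chunks(batch_size))
        .map(|(prices, quantities)| {
            // Tight loop over two contiguous slices: easy for the compiler
            // to optimize and friendly to CPU caches.
            prices
                .iter()
                .zip(quantities)
                .map(|(p, q)| p * f64::from(*q))
                .sum::<f64>()
        })
        .sum()
}

fn main() {
    let rows = vec![
        Row { price: 2.0, quantity: 3 },
        Row { price: 3.5, quantity: 2 },
        Row { price: 1.0, quantity: 10 },
    ];
    let cols = to_columns(&rows);
    println!("row-at-a-time:   {}", total_revenue_rows(&rows));
    println!("batch-at-a-time: {}", total_revenue_columns(&cols, 2));
}
```

Both functions compute the same total. The columnar version only touches the two columns the query needs, which is the property analytical engines rely on.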
RAPIDS
RAPIDS is a collection of open source analytics, machine learning, graph analytics, and visualization packages that expose C++ primitives through Pythonic interfaces for low-level computation optimizations on NVIDIA GPUs. The RAPIDS C++ dataframe library, libcuDF, was built from the ground up with the Arrow columnar, in-memory data format in mind to provide quick and effective data exchange. As described in the blog post Relentlessly Increasing Performance, RAPIDS has been laser-focused on enhancing data preparation performance for a number of years.
RAPIDS libcuDF stores string data in device memory using the Arrow format, which speeds up string data transformations by more than 10x compared with pandas and makes it easier to extract useful information from string data. Other significant advancements include support for user-defined functions (UDFs) on strings.
To reduce compute latency, RMM (the RAPIDS Memory Manager) builds and shares a single memory pool on the GPU. With this integration, memory reuse between RAPIDS, PyTorch, and XGBoost becomes easier and the handoff between libraries is simplified.
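The pooling idea can be sketched on the CPU in plain Rust. This is only an analogy under stated assumptions: it is not RMM, it does not touch GPU memory, and the `BufferPool` type and its methods are hypothetical names invented for this example. It shows why a shared pool helps: buffers that one consumer hands back can be reused by the next one instead of triggering a fresh allocation.

```rust
// Hypothetical CPU-side buffer pool, used only to illustrate pooling.
struct BufferPool {
    free: Vec<Vec<u8>>, // buffers that were handed back and can be reused
    allocations: usize, // how many fresh allocations the pool had to make
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Vec::new(), allocations: 0 }
    }

    // Hand out a zeroed buffer of `len` bytes, reusing a returned buffer
    // when one with enough capacity is available.
    fn acquire(&mut self, len: usize) -> Vec<u8> {
        if let Some(pos) = self.free.iter().position(|b| b.capacity() >= len) {
            let mut buf = self.free.swap_remove(pos);
            buf.clear();
            buf.resize(len, 0); // stays within capacity, so no reallocation
            return buf;
        }
        self.allocations += 1;
        vec![0; len]
    }

    // Return a buffer to the pool so another consumer can reuse it.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();

    // The first consumer needs scratch space and returns it when done.
    let a = pool.acquire(1024);
    pool.release(a);

    // A second consumer asks for less; the pool reuses the same buffer.
    let b = pool.acquire(512);
    println!("len = {}, fresh allocations = {}", b.len(), pool.allocations);
}
```

A real GPU memory pool has to deal with streams, alignment, and fragmentation, none of which this sketch attempts.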
Substrait
Data analysts and data engineers suffer from the lack of standardization in their data stacks. Data manipulation APIs and DSLs don’t always work together with other compute engines, and compute engines don’t always work together with lower-level parts like compute primitives, query optimizers, hardware acceleration libraries, and computational storage systems.
Substrait is a cross-language, interoperable specification for data compute operations. By connecting analytic tools with compute engines, and compute engines with underlying computing components, it enables all these layers of the stack to communicate in a standard, flexible manner. When combined with Apache Arrow, a uniform standard for encoding tabular data, users can gain greater performance in a way that is modular and composable.
The Substrait project grew substantially in 2022. The fundamental specification, which is built upon Protocol Buffers, is well known and can be expressed as binary or JSON. Bindings and other integrations have been developed for eight languages: C++, C#, Go, Java, Python, R, Ruby, and Rust. To guarantee that the project can accommodate the requirements of a wide range of stakeholders, a formal governance structure has been established.
Polars
Polars was introduced by Ritchie Vink as a fast dataframe library that is based on Apache Arrow and makes use of all the cores on a developer’s computer. Polars is written in Rust and was created to parallelize dataframe query processing. The community is embracing it as a replacement for pandas because it addresses the memory-related issues that many Python developers run into. It is a powerful single-node engine that lets programmers scale in the face of enormous datasets.
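As a rough illustration of what "using all the cores" means for a column aggregation, here is a sketch that uses only the Rust standard library. It does not use the Polars crate and is not how Polars is implemented; `parallel_sum` and its `workers` parameter are hypothetical names for this example. The column is split into chunks, each chunk is summed on its own thread, and the partial results are combined.

```rust
use std::thread;

// Sum a column by splitting it into roughly equal chunks, one per worker.
// `workers` is a hypothetical tuning parameter for this sketch.
fn parallel_sum(values: &[i64], workers: usize) -> i64 {
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 so `chunks` never receives zero.
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `values` because they are guaranteed
    // to finish before `scope` returns.
    thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<i64>()))
            .collect();

        // Combine the partial sums produced by each worker.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<i64>()
    })
}

fn main() {
    let column: Vec<i64> = (1..=1_000_000).collect();
    // Ask the OS how many threads are worth using; fall back to one.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("sum = {}", parallel_sum(&column, workers));
}
```

Spawning a thread per chunk is wasteful for small inputs; a production engine would use a thread pool and a query planner, which this sketch leaves out.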
Malloy
Malloy, new to the scene from Google, is a relational algebra query language that is gaining popularity outside the realm of mainstream programming languages.
Malloy is experimental and provides data professionals with new(ish) methods for creating nested data. Each query a developer creates serves as a foundation stone for the next level of comprehension; hence, the more you construct, the more you comprehend.
Numba
Numba compiles Python code so that you keep Python’s convenience while getting C-like execution speed and parallelism. It includes memory management that you can use to control GPU and CPU arrays, and it is compatible with NumPy and CuPy.
Tantivy
Tantivy is a Rust-based full-text search engine library.
It is closer to Apache Lucene than to Elasticsearch or Apache Solr: it is not a ready-made search engine server but a crate that can be used to build one.
Tantivy’s design is very heavily influenced by Lucene’s. If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, our search engine built on top of Tantivy.
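The core data structure behind a full-text search library of this kind is the inverted index, which maps each term to the documents that contain it. The sketch below uses only the Rust standard library and is not Tantivy's API or implementation; `InvertedIndex`, `add_document`, `search`, and `tokenize` are hypothetical names for this example, and the tokenizer is deliberately naive.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical toy index: term -> sorted set of document ids.
#[derive(Default)]
struct InvertedIndex {
    postings: HashMap<String, BTreeSet<usize>>,
}

// Naive tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

impl InvertedIndex {
    // Record every term of `text` as occurring in document `doc_id`.
    fn add_document(&mut self, doc_id: usize, text: &str) {
        for token in tokenize(text) {
            self.postings.entry(token).or_default().insert(doc_id);
        }
    }

    // AND query: return ids of documents that contain every query term.
    fn search(&self, query: &str) -> Vec<usize> {
        let mut result: Option<BTreeSet<usize>> = None;
        for token in tokenize(query) {
            let docs = self.postings.get(&token).cloned().unwrap_or_default();
            result = Some(match result {
                None => docs,
                Some(acc) => acc.intersection(&docs).copied().collect(),
            });
        }
        result.unwrap_or_default().into_iter().collect()
    }
}

fn main() {
    let mut index = InvertedIndex::default();
    index.add_document(0, "Tantivy is a search engine library");
    index.add_document(1, "Lucene is a Java search library");
    index.add_document(2, "Polars is a dataframe library");

    println!("{:?}", index.search("search library"));
}
```

A real engine adds relevance scoring, compressed posting lists, and on-disk segments; this sketch only shows the lookup structure.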
Grin
Grin is an open source software project that implements a Mimblewimble blockchain and fills in the gaps needed for a complete blockchain and cryptocurrency deployment.
The Grin project’s primary objectives and features are:
Privacy by default. This permits complete fungibility without precluding the ability to selectively share information as needed.
Scalability. In comparison to other blockchains, Grin scales primarily with the number of users and very little with the number of transactions (100-byte kernel).
Proven cryptography. Mimblewimble relies only on Elliptic Curve Cryptography, which has been tried and tested for years.
Simple design that is easy to audit and maintain over time.
Community driven, fostering mining decentralization.