Wendelin is a convergent platform for Big Data and Machine Learning. It is a variant of ERP5 with extensions for ndarrays, a core module that manages RAM beyond physical limits, and interfaces with libraries and tools such as scikit-learn, Jupyter, pandas, Fluentd and Embulk. Wendelin is hosted on SlapOS and uses NEO for data storage, which makes it possible to manage the whole data life cycle from ingestion to commercialisation. It is developed and maintained by Nexedi.
Wendelin originated from an idea of Jean-Paul Smets and Alexandre Gramfort and was launched by Nexedi at the MariaDB conference in 2014. It is being jointly developed by Nexedi and Télécom ParisTech.
Wendelin = scikit-learn + NEO
Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage in order to provide out-of-core processing of large data sets, that is, processing of data that does not fit in RAM. The main application fields are industrial big data collection, processing and storage. Any industrial prediction problem can be addressed with Wendelin: mechanical health prediction, intrusion prediction, power prediction, etc. In addition, support for other NumPy-based libraries such as OpenCV or Pandas allows Wendelin to be used in other fields such as video processing or finance.
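To make the idea of out-of-core processing concrete, here is a minimal sketch that uses plain NumPy only. It is not Wendelin's API: `np.memmap` stands in for a NEO-backed array, and the function names, file layout (float64 rows of features followed by a target) and chunk size are all assumptions made for illustration. The model is an ordinary least squares fit whose sufficient statistics are accumulated chunk by chunk, so only one slice of the data is held in RAM at a time.

```python
import os
import tempfile

import numpy as np


def fit_linear_out_of_core(path, n_rows, n_features, chunk_rows=10_000):
    """Fit ordinary least squares on a disk-backed array, one chunk at a time.

    Assumption: the file holds float64 rows laid out as [x_1 .. x_n, y].
    """
    data = np.memmap(path, dtype=np.float64, mode="r",
                     shape=(n_rows, n_features + 1))
    xtx = np.zeros((n_features, n_features))  # running sum of X^T X
    xty = np.zeros(n_features)                # running sum of X^T y
    for start in range(0, n_rows, chunk_rows):
        # Copy only the current slice from disk into memory.
        chunk = np.asarray(data[start:start + chunk_rows])
        x, y = chunk[:, :-1], chunk[:, -1]
        xtx += x.T @ x
        xty += x.T @ y
    # Solve the normal equations built from the accumulated statistics.
    return np.linalg.solve(xtx, xty)


def make_demo_file(n_rows=50_000, seed=0):
    """Write a small synthetic data set to a temporary file (demo helper).

    For simplicity the demo data is generated in memory; a real data set
    would be written incrementally.
    """
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0, 0.5])
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    out = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, 4))
    x = rng.normal(size=(n_rows, 3))
    out[:, :3] = x
    out[:, 3] = x @ true_w + rng.normal(scale=0.01, size=n_rows)
    out.flush()
    del out  # release the mapping before the file is reopened
    return path, true_w
```

Calling `make_demo_file()` and then `fit_linear_out_of_core(path, 50_000, 3)` should recover weights close to the ones used to generate the data. With scikit-learn, the same chunk-by-chunk pattern is usually expressed through estimators that offer `partial_fit`.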
Aside from support, consulting and custom development provided by Nexedi, Wendelin can be extended with open source or proprietary components to fit a given vertical big data market. The Wendelin project is looking for industrial partners willing to adapt Wendelin to more vertical markets and reinvest part of their revenue into Wendelin core and in particular into scikit-learn.
The Wendelin architecture is based on five layers:
- Analytics layer: Wendelin leverages a wide variety of NumPy-based analytics libraries such as scikit-learn, Pandas, NLTK, OpenCV-python, etc.
- Storage layer: Wendelin stores native python objects on NEO distributed storage and thus eliminates format conversion steps found in other NoSQL technologies.
- Elasticity layer: Wendelin distributes data processing scripts on a cluster thanks to ERP5 active python object technology. Scripts are stored on NEO and can be modified in real time without any system restart.
- Deployment layer: Wendelin deployment is automated thanks to the SlapOS mesh computing operating system. Analytics libraries are optimized automatically by SlapOS based on the target CPU.
- Infrastructure Layer: Wendelin can be deployed on commodity hardware, private cloud or public cloud.
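The elasticity layer described above rests on a simple pattern: split the data into independent chunks, process each chunk wherever a worker is available, then merge the partial results. The sketch below illustrates that pattern with the Python standard library only. It is an assumption-laden stand-in, not ERP5 or Wendelin code: a local thread pool plays the role of the cluster, and `chunk_stats` and `distributed_mean_variance` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk):
    """Partial result for one chunk: count, sum and sum of squares."""
    return len(chunk), sum(chunk), sum(v * v for v in chunk)


def distributed_mean_variance(values, chunk_size=1000, workers=4):
    """Compute mean and population variance from per-chunk partial results."""
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    # Each chunk is processed independently; a cluster scheduler would
    # dispatch these tasks to different nodes instead of local threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    return mean, total_sq / n - mean * mean
```

Because every partial result is a small tuple, the merge step costs almost nothing compared with the per-chunk work, which is what lets this kind of job scale with the number of workers.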
The Wendelin architecture provides key features not found in other platforms:
- python based
- native code compiler for key algorithms
- GPU compiler for key algorithms
- native storage of low level matrix data structure
- best machine learning algorithms
- wide scientific community thanks to NumPy
- support for 30+ years of FORTRAN optimizations
- distributed multi-index
- orthogonal index/storage topology for high throughput and fast access
Wendelin vs. HADOOP
Wendelin focuses on Python-based data analytics, and in particular on the NumPy standard, whereas HADOOP is mostly tied to the Java programming world. Thanks to this, Wendelin can benefit more quickly from the growing homogenization of scientific computing around Python. Some similarities nevertheless exist between the two architectures, as illustrated in the following table, which lists typical examples of software components used in each case.
|High-level programming language
|Standard data structure
|Native x86 compiler
|Natural language processing
|Cloud deployment and orchestration
Automatic tests for Wendelin are run within the Nexedi test environment. The latest test results can be seen in the Nexedi Test Status for Wendelin.
Nexedi is working with Wendelin on client implementations and research projects. Please refer to the following examples for ideas on how Wendelin can be used:
Wendelin is Free Software, licensed under the terms of the GNU GPL v3 (or later). For rationale, please see Nexedi licensing.