    Wendelin

    Wendelin is a convergent platform for Big Data and Machine Learning. It is a variant of ERP5 with extensions for ndarrays, a core module that manages RAM beyond physical limits, and interfaces with libraries such as scikit-learn, Jupyter, pandas, Fluentd or Embulk. Wendelin is hosted on SlapOS and uses NEO for data storage, which makes it possible to manage the data life cycle from ingestion to commercialisation. It is developed and maintained by Nexedi.
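
    The API of Wendelin's core module is not described on this page. As a minimal sketch of the out-of-core idea only, the example below uses NumPy's standard numpy.memmap on a local temporary file as a stand-in (an assumption, not the Wendelin core module itself): the array lives on disk, and a reduction walks it in fixed-size chunks so that only one chunk needs to be resident in RAM at a time. The names chunked_mean and demo and the default chunk size are hypothetical.

```python
import os
import tempfile

import numpy as np


def chunked_mean(path, length, dtype=np.float64, chunk_rows=1024):
    """Mean of a 1-D array stored in a file, read one chunk at a time."""
    # np.memmap maps the file into virtual memory; pages are loaded on demand.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(length,))
    total = 0.0
    for start in range(0, length, chunk_rows):
        # Only this slice is touched, so working memory is bounded by chunk_rows.
        total += float(data[start:start + chunk_rows].sum())
    del data  # release the mapping before the file is removed
    return total / length


def demo(length=10_000, chunk_rows=1024):
    """Write 0, 1, ..., length-1 to a temporary file, then average it out of core."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.dat")
        out = np.memmap(path, dtype=np.float64, mode="w+", shape=(length,))
        out[:] = np.arange(length, dtype=np.float64)
        out.flush()
        del out  # close the writable mapping
        return chunked_mean(path, length, chunk_rows=chunk_rows)


if __name__ == "__main__":
    print(demo())  # prints 4999.5
```

    A single memory-mapped file is only a local analogue; it shows the access pattern (bounded working set, sequential chunks) rather than how Wendelin distributes array data over NEO.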

    Wendelin originated from an idea of Jean-Paul Smets and Alexandre Gramfort and was launched by Nexedi at the MariaDB conference in 2014. It is being developed jointly by Nexedi and Télécom ParisTech.

    Wendelin = scikit-learn + NEO

    Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage to provide out-of-core processing of large data sets. Its main application fields are industrial big data collection, processing and storage. Any industrial prediction problem can be addressed with Wendelin: mechanical health prediction, intrusion prediction, power prediction, etc. In addition, support for other NumPy-based libraries such as OpenCV or Pandas allows Wendelin to be used in other fields such as video processing or finance.
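
    Reading training data from NEO is not shown here. As a minimal sketch of out-of-core learning, assuming the data arrives in chunks, the class below fits an ordinary least-squares model (a simple stand-in for a prediction model such as a power predictor) by accumulating the normal equations chunk by chunk with plain NumPy. IncrementalLeastSquares and fit_synthetic are hypothetical names, and the training data is synthetic. In scikit-learn, estimators that implement partial_fit, such as SGDRegressor, support the same chunked training pattern.

```python
import numpy as np


class IncrementalLeastSquares:
    """Linear least squares fitted chunk by chunk (hypothetical helper).

    Only the (d+1) x (d+1) matrix X^T X and the vector X^T y are kept in
    memory, so the full training set never has to be loaded at once.
    """

    def __init__(self, n_features):
        size = n_features + 1  # one extra column for the intercept
        self.xtx = np.zeros((size, size))
        self.xty = np.zeros(size)
        self.coef_ = None

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        # Prepend a column of ones so that coef_[0] is the intercept.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        self.xtx += Xb.T @ Xb
        self.xty += Xb.T @ y
        return self

    def finalize(self):
        # Solve the accumulated normal equations (X^T X) w = X^T y.
        self.coef_ = np.linalg.solve(self.xtx, self.xty)
        return self.coef_

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        return self.coef_[0] + X @ self.coef_[1:]


def fit_synthetic(n_chunks=10, chunk_size=100, seed=0):
    """Train on synthetic chunks drawn from y = 1 + 2*x0 - 3*x1 (made-up data)."""
    rng = np.random.default_rng(seed)
    model = IncrementalLeastSquares(n_features=2)
    for _ in range(n_chunks):
        # Each chunk stands in for a block of rows read from storage.
        X = rng.normal(size=(chunk_size, 2))
        model.partial_fit(X, 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1])
    model.finalize()
    return model
```

    The normal-equation form keeps the state small and independent of the number of rows; for ill-conditioned data, a QR-based incremental update would be numerically more robust.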

    Support Services

    Aside from support, consulting and custom development provided by Nexedi, Wendelin can be extended with open source or proprietary components to fit a given vertical big data market. The Wendelin project is looking for industrial partners willing to adapt Wendelin to more vertical markets and reinvest part of their revenue into Wendelin core and in particular into scikit-learn.

    News

    Architecture

    The Wendelin architecture is based on five layers:

    • Analytics layer: Wendelin leverages a wide variety of NumPy-based analytics libraries such as scikit-learn, Pandas, NLTK, OpenCV-python, etc.
    • Storage layer: Wendelin stores native Python objects on NEO distributed storage and thus eliminates the format conversion steps found in other NoSQL technologies.
    • Elasticity layer: Wendelin distributes data processing scripts across a cluster using ERP5's active Python object technology. Scripts are stored on NEO and can be modified in real time without any system restart.
    • Deployment layer: Wendelin deployment is automated by the SlapOS mesh computing operating system. SlapOS automatically optimizes analytics libraries for the target CPU.
    • Infrastructure layer: Wendelin can be deployed on commodity hardware, in a private cloud or in a public cloud.
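
    ERP5's actual activity API is not reproduced here. The following hypothetical sketch only illustrates the general pattern described for the elasticity layer: calls to named scripts are queued first and executed later by a pool of workers, and because each script is looked up by name when the call runs, replacing it takes effect without a restart. Threads in one process stand in for cluster nodes, and a dictionary stands in for scripts stored on NEO; ActivityQueue, register, defer and process are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor


class ActivityQueue:
    """Hypothetical queue of deferred script calls run by a worker pool."""

    def __init__(self, workers=4):
        self.workers = workers
        self.scripts = {}  # script name -> callable (stand-in for stored scripts)
        self.pending = []  # (script name, args) waiting to run

    def register(self, name, func):
        # Registering again under the same name replaces the script; calls
        # already queued use the new version because lookup happens at run time.
        self.scripts[name] = func

    def defer(self, name, *args):
        # Record the call instead of executing it immediately.
        self.pending.append((name, args))

    def _run(self, name, args):
        return self.scripts[name](*args)

    def process(self):
        # Drain the queue and run every deferred call on the worker pool.
        batch, self.pending = self.pending, []
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(self._run, name, args) for name, args in batch]
            # Results are collected in submission order.
            return [f.result() for f in futures]


def demo():
    queue = ActivityQueue(workers=2)
    queue.register("scale", lambda x: x * 2)
    for value in range(5):
        queue.defer("scale", value)
    # Update the script before the queue is processed: no restart needed.
    queue.register("scale", lambda x: x * 10)
    return queue.process()  # [0, 10, 20, 30, 40]
```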

    The Wendelin architecture provides key features not found in other platforms:

    • Python-based
    • native code compiler for key algorithms
    • GPU compiler for key algorithms
    • native storage of low-level matrix data structures
    • state-of-the-art machine learning algorithms from scikit-learn
    • a wide scientific community thanks to NumPy
    • support for 30+ years of FORTRAN optimizations
    • distributed multi-index
    • orthogonal index/storage topology for high throughput and fast access
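
    One possible reading of "orthogonal index/storage topology", offered here as an assumption rather than a description of NEO or ERP5 internals, is that objects are placed on storage nodes by object id while a separate multi-attribute index answers queries, so each side can be scaled independently. The comparison table on this page likewise lists separate components for the distributed index (MariaDB) and distributed storage (NEO). The hypothetical sketch below queries the index first and then fetches only the matching objects from storage; the class names and sensor records are made up.

```python
import hashlib
from collections import defaultdict


class ShardedStore:
    """Objects spread over a fixed number of storage nodes by object id."""

    def __init__(self, n_nodes):
        self.nodes = [{} for _ in range(n_nodes)]

    def _node(self, oid):
        # Stable hash: the same oid always maps to the same node.
        digest = hashlib.sha1(str(oid).encode()).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def put(self, oid, obj):
        self._node(oid)[oid] = obj

    def get(self, oid):
        return self._node(oid)[oid]


class MultiIndex:
    """Secondary indexes kept apart from storage: attribute -> value -> oids."""

    def __init__(self, attributes):
        self.indexes = {attr: defaultdict(set) for attr in attributes}

    def add(self, oid, record):
        for attr, index in self.indexes.items():
            if attr in record:
                index[record[attr]].add(oid)

    def lookup(self, attr, value):
        # .get() avoids creating empty entries for unknown values.
        return set(self.indexes[attr].get(value, ()))


def demo():
    store = ShardedStore(n_nodes=3)
    index = MultiIndex(["site", "sensor"])
    records = {  # made-up example records
        1: {"site": "plant-a", "sensor": "vibration", "value": 0.7},
        2: {"site": "plant-a", "sensor": "power", "value": 12.5},
        3: {"site": "plant-b", "sensor": "vibration", "value": 0.2},
    }
    for oid, record in records.items():
        store.put(oid, record)  # storage decides where the object lives
        index.add(oid, record)  # the index only records how to find it
    # Query the index, then read just the matching objects from storage.
    oids = index.lookup("site", "plant-a") & index.lookup("sensor", "vibration")
    return [store.get(oid)["value"] for oid in sorted(oids)]
```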

    Wendelin vs. HADOOP

    Wendelin focuses on Python-based data analytics, and in particular on the NumPy standard, whereas HADOOP is mostly tied to the Java programming world. As a result, Wendelin can benefit more quickly from the growing homogenization of scientific computing around Python. Some similarities nevertheless exist between both architectures, as illustrated in the following table with typical examples of software components used in each case.

    Category                            Wendelin                         HADOOP
    High-level programming language     Python                           Java
    Low-level language                  C/C++/FORTRAN                    N/A
    Standard data structure             NumPy                            N/A
    Native x86 compiler                 Numba                            N/A
    GPU compiler                        Parakeet                         N/A
    Machine learning                    scikit-learn                     Weka
    Distributed storage                 NEO                              Spark
    Distributed processing              ERP5 Activity                    Job Tracker
    Management portal                   ERP5 Data                        Cloudera Manager
    Natural language processing         NLTK                             Lucene
    Video processing                    OpenCV-python                    N/A
    Financial statistics                Pandas                           N/A
    Distributed index                   MariaDB, TokuDB, Spider, Sphinx  Solr
    Cloud deployment and orchestration  SlapOS                           ZooKeeper

    Documentation

    Source Code

    Tutorials

    HowTo

    FAQ

    Other documents

    Tests

    Automatic tests for Wendelin are run within the Nexedi test environment. The latest test results can be seen in the Nexedi Test Status for Wendelin.

    Examples

    Nexedi is working with Wendelin on client implementations and research projects. Please refer to the following examples for ideas on how Wendelin can be used:

    Licence

    Wendelin is Free Software, licensed under the terms of the GNU GPL v3 (or later). For rationale, please see Nexedi licensing.