
    Analyzing Data With Wendelin

    A tutorial explaining how to use Wendelin to work with ingested data, including using out-of-core capabilities.
    • Last Update: 2016-06-24

    Analyse: Work with Ingested Data

    Out-of-Core

    Wendelin Out of Core Computing
    • Wendelin.Core enables computation beyond the limits of available RAM
    • We have integrated Wendelin and Wendelin.Core with Jupyter
    • ERP5 Kernel (out-of-core compliant) vs. Python 2 Kernel (default)

    Todo: Head to Jupyter (Notebook)

    Wendelin-ERP5 - Jupyter Interface
    • Head to Jupyter at http://[x].pydata-class.erp5.cn
    • Start a new ERP5 Notebook
    • This will make sure you use the ERP5 Kernel
    • The Python 2 Kernel is the default Jupyter Kernel
    • Using the Python 2 Kernel bypasses Wendelin and Wendelin.Core, so you get plain Jupyter
    • Using the ERP5 Kernel uses Wendelin.Core in the background
    • To make good use of it, all code you write should be Out-of-Core "compatible"
    • For example, you should not simply load a large file into memory (see below)

    Todo: Learn ERP5 Kernel (Notebook)

    Wendelin-ERP5 - Jupyter Interface Introduction
    • Note that you have to connect to Wendelin/ERP5
    • Your notebook will be stored in the Data Notebook Module under the reference you set
    • Passing login/password will authenticate Jupyter with Wendelin/ERP5
    • Note that your ERP5_URL in this case should be your internal URL
    • You can retrieve it by running erp5-show -s in your webrunner terminal
    • Note that outside of the tutorial we would set the external IPv6 address of Zope instead

    Todo: Getting Started (Notebook)

    Wendelin-ERP5 - Jupyter Access Wendelin
    • Connect, set an arbitrary reference and authenticate

    Todo: Accessing Objects (Notebook)

    Wendelin-ERP5 - Jupyter Accessing Objects
    • Import the necessary libs
    • Type context; this will give you the Wendelin/ERP5 object
    • Type context.data_stream_module["1"] to get your uploaded sound file
    • Accessing data works the same way everywhere: [IPv6]:30002/erp5/[module_name]/[id]
    • All modules you see on the Wendelin/ERP5 start page can be accessed like this
    • Once you have an object, you can manipulate it
    • Note that accessing a file by its internal id (1) is only one way
    • The standard way is to use the reference of the respective object, which also allows you to use portal_catalog to query for it
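
    The relationship between context.data_stream_module["1"] and the path [module_name]/[id] can be illustrated with a small stand-in. This is only a sketch: StandInContext, StandInModule, StandInObject and relative_path are hypothetical names invented for illustration and are not part of Wendelin/ERP5. They merely mimic attribute-then-item traversal.

```python
class StandInObject:
    """Hypothetical stand-in for a stored document (not the ERP5 class)."""

    def __init__(self, module_name, object_id):
        self.module_name = module_name
        self.object_id = object_id

    def relative_path(self):
        # Mirrors the erp5/[module_name]/[id] layout described above.
        return "erp5/%s/%s" % (self.module_name, self.object_id)


class StandInModule:
    """Hypothetical stand-in for a module; items are looked up by id."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, object_id):
        return StandInObject(self.name, str(object_id))


class StandInContext:
    """Hypothetical stand-in for `context`; attributes resolve to modules."""

    def __getattr__(self, module_name):
        # Only called for names that are not regular attributes.
        if module_name.startswith("_"):
            raise AttributeError(module_name)
        return StandInModule(module_name)


context = StandInContext()
sound_file = context.data_stream_module["1"]
sound_path = sound_file.relative_path()
```

    In the real notebook the ERP5 Kernel provides context; the stand-in only shows why the same pattern works for every module name.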

    Todo: Accessing Data Itself (Notebook)

    Wendelin-ERP5 - Jupyter Accessing Data Itself
    • Try to get the length of the file, once using getData and once via iterate
    • Note that when using the ERP5 Kernel, all manipulations should be "Big Data Aware"
    • Just loading a file via getData() works for small files, but will break as the volume grows
    • It's important to understand that manipulations outside of Wendelin.Core need to be Big Data "compatible"
    • Internally, Wendelin.Core will run all manipulations "context-aware"
    • An alternative way to work would be to create your scripts inside Wendelin/ERP5 and call them from Jupyter
    • Scripts/Manipulations are stored in Data Operations Module
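
    The difference between the two ways of measuring the file can be sketched as follows. StandInStream is a hypothetical in-memory stand-in, not the Wendelin data stream class; the method names getData and iterate are taken from the text above, but their signatures here (in particular the chunk_size parameter) are assumptions.

```python
import io


class StandInStream:
    """Hypothetical stand-in for a stored data stream (assumed interface)."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def getData(self):
        # Returns the whole payload at once: fine for small files only.
        self._buffer.seek(0)
        return self._buffer.read()

    def iterate(self, chunk_size=4096):
        # Yields the payload piece by piece, so only one chunk is held
        # by the caller at any time.
        self._buffer.seek(0)
        while True:
            chunk = self._buffer.read(chunk_size)
            if not chunk:
                break
            yield chunk


def length_by_loading(stream):
    """Measure the length by materialising everything in memory."""
    return len(stream.getData())


def length_by_iterating(stream, chunk_size=4096):
    """Measure the length while keeping at most one chunk in memory."""
    total = 0
    for chunk in stream.iterate(chunk_size):
        total += len(chunk)
    return total


stream = StandInStream(b"\x00\x01" * 50000)
total_loaded = length_by_loading(stream)
total_iterated = length_by_iterating(stream, chunk_size=1024)
```

    Both functions return the same number; only the second keeps its memory use bounded by chunk_size regardless of how large the stream is.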

    Todo: Compute Fourier (Notebook)

    Wendelin-ERP5 - Jupyter Compute Fourier Series
    • Proceed to fetch data using getData for now
    • Extract one channel, save it back to Wendelin and compute the FFT
    • Note that the ERP5 Kernel does not currently support %matplotlib inline
    • Note the way to call methods from Wendelin/ERP5 (Base_renderAsHtml)
    • Wendelin/ERP5 has a system of method acquisition. Every module can come with its own module-specific methods, and method names are always context-specific ([object_name]_[method_name]). Base methods, on the other hand, are core methods of Wendelin/ERP5 and apply to more than one object.
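
    The "extract one channel and compute the FFT" step can be sketched without any Wendelin dependency. The code below is an assumption-laden illustration: it uses a pure-Python radix-2 Cooley-Tukey FFT in place of whatever numerical library the notebook actually uses, and a synthetic stereo signal (a 5 Hz sine on the left channel, silence on the right) in place of the recorded sound file.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


# Synthetic stereo frames (left, right), sampled at 64 Hz for one second.
sample_rate = 64
frames = [
    (math.sin(2 * math.pi * 5 * t / sample_rate), 0.0)
    for t in range(sample_rate)
]

# Extract one channel, then transform it.
left_channel = [left for left, _right in frames]
spectrum = fft(left_channel)

# Keep the non-redundant half of the spectrum and locate the strongest bin.
magnitudes = [abs(value) for value in spectrum[: sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

    With one second of data the bin spacing is 1 Hz, so the index of the strongest bin is the dominant frequency of the extracted channel in Hz.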

    Todo: Display Fourier (Notebook)

    Wendelin-ERP5 - Jupyter Display Fourier Series
    • Check the rendered Fourier graphs of your recorded sound file

    Todo: Save Image (Notebook)

    Wendelin-ERP5 - Jupyter Save Image
    • Save the image back to Wendelin/ERP5.

    Todo: Create BigFile Reader (Notebook)

    Wendelin-ERP5 - Create Big File Class
    • Add a new class BigFileReader
    • It allows out-of-core objects to be passed to code that expects a file
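
    One possible shape for such a class is sketched below. It is not the tutorial's actual BigFileReader: the backend interface (size and read_range) and the InMemoryBackend class are hypothetical, chosen only so the example runs on its own. The idea shown is a file-like wrapper with read, seek and tell that fetches just the requested byte range from the underlying object.

```python
class InMemoryBackend:
    """Hypothetical stand-in for an out-of-core object serving byte ranges."""

    def __init__(self, payload):
        self._payload = payload

    def size(self):
        return len(self._payload)

    def read_range(self, start, end):
        return self._payload[start:end]


class BigFileReader:
    """File-like view that only fetches the byte range being read."""

    def __init__(self, backend):
        self.backend = backend
        self.pos = 0

    def tell(self):
        return self.pos

    def seek(self, offset, whence=0):
        if whence == 0:      # absolute position
            new_pos = offset
        elif whence == 1:    # relative to the current position
            new_pos = self.pos + offset
        elif whence == 2:    # relative to the end
            new_pos = self.backend.size() + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self.pos = new_pos
        return self.pos

    def read(self, n=-1):
        size = self.backend.size()
        # A negative or missing n reads to the end; callers handling large
        # data should always pass a bounded n.
        end = size if n is None or n < 0 else min(self.pos + n, size)
        if self.pos >= end:
            return b""
        data = self.backend.read_range(self.pos, end)
        self.pos += len(data)
        return data


reader = BigFileReader(InMemoryBackend(bytes(range(256)) * 4))
header = reader.read(4)   # first four bytes
reader.seek(-2, 2)        # jump to two bytes before the end
tail = reader.read()      # remaining two bytes
```

    Because the wrapper exposes the usual read/seek/tell trio, code that consumes a file in bounded reads can work on it without ever holding the whole payload.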

    Todo: Rerun using Big File Reader (Notebook)

    Wendelin-ERP5 - Jupyter Rerun using Big File Reader
    • Rerun using the Big File Reader
    • Now one more step is Out-of-Core compliant
    • Verify that the graphs render the same
    • This shows how to convert our code, step by step, to being Out-of-Core compatible
    • This will only be possible for code we write ourselves
    • Whenever we have to rely on 3rd party libraries, there is no guarantee that data will be handled in the correct way. The only way to be truly Out-of-Core is either to make sure the 3rd party methods used are compatible, fixing them and committing the fixes back where necessary, or to reimplement the 3rd party library completely.

    Todo: Redraw from Wendelin (Notebook)

    Wendelin-ERP5 - Jupyter Recover Plot
    • Redraw the plot directly from data stored in Wendelin/ERP5

    Todo: Verify Images are Stored

    Wendelin-ERP5 - Image Module
    • Head back to Wendelin/ERP5
    • Go to the Image module and verify that your stored images are there.

    Todo: Verify Data Arrays are Stored

    Wendelin-ERP5 - Data Array Module
    • Switch to the Data Array module
    • Verify that all computed arrays are there.