Time | 20.07 | 21.07 | 22.07 | 23.07 | 24.07 | 25.07 | 26.07 | 27.07 | 28.07 |
---|---|---|---|---|---|---|---|---|---|
08.00-09.00 | Breakfast | ||||||||
09.00-12.00 | Statistics for Data Science | Statistical learning (hands-on) | Machine Learning (hands-on) | Time Series (incl. hands-on) | Knowledge Graphs for conversational AI (incl. hands-on) | FAIR Data and best practices in data sharing | Operationalizing Machine Learning pipelines | Departures | |
12.00-14.30 | Arrivals | Lunch break and socializing activities | |||||||
14.30-17.30 | Data Science with Python (hands-on) | Intro to Machine Learning | Deep Learning with Neural Networks (incl. hands-on) | Introduction to Graph Data (incl. hands-on) | Social Event | Data enrichment (incl. hands-on) | Operationalizing Machine Learning pipelines (hands-on) | ||
17.30-19.00 | Intro event | Free time | Free time | ||||||
19.00-21.00 | Dinner |
Detailed information about the activities
Statistics for data science (Dan Nicolae)
- A data science pipeline
- Data exploration
- Statistical inference with resampling methods
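As a taste of the resampling topic, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. It is an illustration and not material taken from the session: the function name `bootstrap_mean_ci`, its parameters, and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, made up for illustration.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval comes from the spread of the resampled means, so no distributional formula is needed; more resamples give a smoother estimate at the cost of run time.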
Data science with Python (hands-on) (Dan Nicolae, Razvan Bunescu)
- Intro to Python
- Pandas and data frames
- Probability and simulations
Statistical learning (hands-on) (Dan Nicolae)
- Regression models and inference
- Prediction and classification
Intro to machine learning (Razvan Bunescu & ChatGPT)
- Feature vector representations
- ML for Classification
- ML for Regression
- Clustering
Machine learning (hands-on) (Razvan Bunescu & ChatGPT)
- ML algorithms in Python
- Implementation using NumPy
- The sklearn library
- Visualization using Matplotlib
- Experimental evaluation of ML models
- Linear vs. non-linear classification
Deep learning with neural networks (incl. hands-on) (Pawel Gasiorowski)
- Deep learning with Artificial Neural Networks
- Image Processing, Object Classification and Detection with Convolutional Neural Networks
- Implementation in TensorFlow/Keras
- Regression and gradient descent
- Activation Functions, Feedforward Process, Error Functions, Optimizers, Backpropagation
- Logistic regression and NNs for non-linear classification
- Transfer Learning technique
Time-series analytics (Jože Rožanec)
- Time series analytics techniques: filtering methods, interpolation, extrapolation, prediction with ML
- Time series databases
- Python libraries for working with time-series data
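To make two of the techniques named above concrete, the sketch below fills gaps by linear interpolation and then smooths the series with a moving-average filter, in plain Python. It is only an illustration under simple assumptions (evenly spaced samples, gaps marked as `None`, known first and last values); the function names and the sample readings are hypothetical and are not taken from the session material.

```python
def interpolate_gaps(values):
    """Fill None entries by linear interpolation between known neighbours.

    Assumes evenly spaced samples and that the first and last values are known.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant slope between two neighbouring known points.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def moving_average(values, window=3):
    """Smooth a series with a trailing moving-average filter."""
    smoothed = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical sensor readings with missing samples.
readings = [10.0, None, 14.0, 13.0, None, None, 19.0]
filled = interpolate_gaps(readings)
print(filled)
print(moving_average(filled, window=3))
```

The smoothed series is shorter than the input by `window - 1` samples, because the filter only emits a value once a full window is available.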
Introduction to graph data (incl. hands-on) (Dumitru Roman, Brian Elvesæter, Radu Prodan)
- Introduction to graph data
- Knowledge Graphs
- NoSQL databases
- Graph databases (focus on Neo4j)
- Data model and data modeling
- Query language
- Graph algorithms / analytics / ML
- Introduction to massive graphs
Knowledge graphs for conversational AI (Ioan Toma)
- Introduction to Semantic Knowledge Graphs and their role in building intelligent chatbots
- Understanding knowledge modeling and ontology development for building Knowledge Graphs for Conversational AI
- Data import and mapping techniques to populate the Knowledge Graphs
- Overview of conversational setup and designing a chatbot interface
- Building a chatbot using Onlim Conversational AI framework
- Integration of Knowledge Graph data with the chatbot using API calls
- Querying and accessing Knowledge Graph data through Chatbots
FAIR data and best practices in data sharing (Anna Fensel)
- Introduction to FAIR data
- How to make data FAIR?
- Sharing data effectively with semantic technology:
- Open data vs. closed data
- Consent, contracts, licenses, legal compliance
- Research data infrastructures
Data enrichment (Dumitru Roman, Nikolay Nikolov)
- Data preparation: cleaning, annotating, and enriching data
- Semantic data enrichment
- Tools for semantic enrichment
- Data enrichment pipelines
- Example application for data enrichment
Operationalizing machine learning pipelines (Wiktor Sowinski-Mydlarz)
- What are Machine Learning pipelines?
- Introduction to Software Containers and Cloud
- Deployment, orchestration, and monitoring of ML pipelines on the Cloud using Python libraries
- Example applications
Required software
Software tools/services to be used during the sessions and hands-on include:
- Anaconda (https://www.anaconda.com): Installation instructions for various platforms can be found at: https://docs.anaconda.com/anaconda/install
- A number of relevant tools and libraries that we will use can be configured from Anaconda: Python 3, NumPy, SciPy, Matplotlib, Jupyter Notebook, IPython, Pandas, and Scikit-learn.
- For Deep Learning we will use Keras with TensorFlow (https://keras.io) to build and train ANN/CNN models. It can be installed from Anaconda using 'conda' on the command line; the exact command depends on the platform and version (for example, conda install -c conda-forge keras).
- Onlim Platform (https://app.onlim.com/): Conversational and Knowledge Graph Platform. Accounts can be created at https://auth.onlim.com/auth/realms/onlim/login-actions/registration?client_id=onlim&tab_id=gmTCMEh3-6U
- DataGraft (https://datagraft.io): Software as a service that requires signing up for a free DataGraft platform account at https://datagraft.io/users/sign_up, where you will be asked to provide a username, email, and password to access the website.
- Neo4j (https://neo4j.com): Installation instructions and documentation can be found at https://neo4j.com/developer/get-started. We will use the online sandbox service provided at https://neo4j.com/sandbox, so no installation on local machines is needed for experimenting with Neo4j. Alternatively, you can download and install Neo4j Desktop, which provides a convenient way for developers to work with local Neo4j databases (available from https://neo4j.com/download-center/#desktop). We will also use Neo4j Graph Data Science (https://neo4j.com/product/graph-data-science), which comes with Neo4j.
- Docker (https://www.docker.com): An open-source containerization platform that will be used for ML pipelines. Installation instructions can be found at https://docs.docker.com/engine/install.
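Before the first hands-on session, it may help to confirm that the Python libraries listed above can be imported. The following is a minimal sketch, not an official setup check: the helper name `installed_versions` and the list of import names are assumptions made for illustration (for example, Scikit-learn is imported as `sklearn` and Jupyter Notebook as `notebook`).

```python
import importlib


def installed_versions(module_names):
    """Return {module_name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing or broken installation is reported as None.
            versions[name] = None
            continue
        # Not every module exposes __version__; fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Import names assumed for the libraries listed above.
    required = ["numpy", "scipy", "matplotlib", "pandas", "sklearn",
                "IPython", "notebook", "tensorflow", "keras"]
    for name, version in installed_versions(required).items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

Run it inside the Anaconda environment you plan to use for the sessions; any library reported as NOT INSTALLED can then be added from Anaconda. Docker, Neo4j, Onlim, and DataGraft are not Python packages and are not covered by this check.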