Schedule – datascience.ase.ro

Detailed information about the activities

Statistics for data science (Dan Nicolae)

Foundations of data analysis
Statistical inference with resampling methods
Probability and simulations

Machine learning (Dan Nicolae)

Linear models and inference
Model complexity
Prediction and classification
Neural networks

Large Language Models (LLMs) – reasoning capabilities and model calibration (Cornelia Caragea)

Prompting strategies in LLMs – Zero-Shot vs. In-Context Learning
LLM reasoning capabilities
LLM calibration – do they know what they do not know?

Knowledge graphs (Dumitru Roman and Roberto Avogadro)

Introduction to graph data structures
Knowledge Graphs
Graph data management (graph databases with Neo4j, graph data model, graph construction and querying)

LLMs and Agentic AI (Ioan Toma)

Introduction to Agentic AI
Agent Frameworks

Conversational AI (Ioan Toma)

Conversational AI setup and designing a chatbot interface
Semantic Knowledge Graphs and their role in Conversational AI
Building a chatbot using the Onlim Conversational AI framework

Time series analysis and forecasting (Jože Rožanec)

Introduction to time series
Analysis tools and real-world examples
Time series forecasting

Time series: Forecasting, XAI, and databases (Jože Rožanec)

Using network models to represent and forecast time series
Introduction to explainability methods
Introduction to time series databases

High performance data processing (Radu Prodan)

Parallel computing architectures
Multiprocessing
Parallel algorithms
Parallel computing for AI and data science

Data/AI pipelines (Nikolay Nikolov)

Introduction to data/AI pipelines
Data/AI pipelines using containers

Operationalizing data and AI pipelines (Wiktor Sowinski-Mydlarz)

Contemporary data processing
GATE Institute Data Platform
Alternatives and decisions
Pipeline lifecycle

Management of data and AI pipelines (Wiktor Sowinski-Mydlarz)

Deployment of data and ML pipelines
Orchestration of data and ML pipelines
Monitoring of data and ML pipelines

Findable, Accessible, Interoperable, Reusable (FAIR) data (Anna Fensel)

Introduction to FAIR data, with examples from the agri-food and health domains
How to make data FAIR? Open data, closed data and everything in between
Research data infrastructures

Best practices in data sharing (Anna Fensel)

Legal compliance (GDPR, AI Act, Data Act)
Consent, contracts and licenses, empowered with knowledge graphs
Incentivising data sharing

Petrom Case Study

Fabian Wedam – VP Controlling, Performance and Advanced Analytics
Simona Gradinaru – Senior Expert Data Science
Adrian Domenteanu – Professional Controlling Data Science

Software (preliminary): The software tools and services to be used during the sessions include:

Anaconda (https://www.anaconda.com): Installation instructions for various platforms can be found at https://docs.anaconda.com/anaconda/install
- Several of the tools and libraries we will use can be configured from Anaconda: Python 3, NumPy, SciPy, Matplotlib, Jupyter Notebook, IPython, Pandas, and Scikit-learn.
Other Python packages: statsmodels, transformers, lingam
Onlim Platform (https://app.onlim.com/): Conversational and Knowledge Graph Platform. Accounts can be created at https://auth.onlim.com/auth/realms/onlim/login-actions/registration?client_id=onlim&tab_id=gmTCMEh3-6U
Neo4j (https://neo4j.com): Installation instructions and documentation can be found at https://neo4j.com/developer/get-started. We will use the online sandbox service at https://neo4j.com/sandbox, so no local installation is needed to experiment with Neo4j. Alternatively, you can download and install Neo4j Desktop, which provides a convenient way for developers to work with local Neo4j databases (available from https://neo4j.com/download-center/#desktop). We will also use Neo4j Graph Data Science (https://neo4j.com/product/graph-data-science), which comes with Neo4j.
Docker (https://www.docker.com): An open-source containerization platform that will be used for the ML pipelines. Installation instructions can be found at https://docs.docker.com/engine/install.
SIM-PIPE (https://github.com/DataCloud-project/SIM-PIPE): An open-source tool for dry-running Big Data pipelines using sample data. The tool allows evaluating pipeline performance and resource requirements at scale. An open version of the tool is available on
.
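Before the first session, it may help to confirm that the Anaconda environment provides the Python packages listed above. The script below is a minimal sketch, not part of the official course materials: it only checks whether each package can be located for import, without importing it. The mapping from package names to import names (for example Scikit-learn to sklearn, Jupyter Notebook to notebook) is an assumption based on the packages' usual import names.

```python
import importlib.util
import sys

# Assumed import names for the packages listed in the software section.
# The keys are the names used on this page; the values are the modules to look up.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "Jupyter Notebook": "notebook",
    "IPython": "IPython",
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "transformers": "transformers",
    "lingam": "lingam",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return {package name: True if the module can be found, else False}.

    importlib.util.find_spec locates a top-level module without importing it,
    so the check stays fast even for heavy packages such as transformers.
    """
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


if __name__ == "__main__":
    # The sessions assume Python 3.
    print(f"Python {sys.version.split()[0]}")
    for name, found in check_environment().items():
        print(f"{name:<18} {'OK' if found else 'MISSING'}")
```

Running it inside the activated Anaconda environment prints one line per package. Any package reported as MISSING can then be installed with conda or pip before the sessions.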