THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2017/2018


Postgraduate Course: Advanced Topics in Foundations of Databases (INFR11122)

Course Outline
School: School of Informatics
College: College of Science and Engineering
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Available to all students
SCQF Credits: 20
ECTS Credits: 10
Summary: The course focuses on three central aspects of big data: Volume, Variety and Veracity. It will cover tractability and parallel scalability of querying big data (volume), data models and data interoperability (variety), and foundations of data quality and uncertainty (veracity). It aims to expose students to current research and development in connection with big data theory, and prepare them for conducting research in this emerging area. The course content is dynamic and continuously updated to cover the state-of-the-art in big data theory.
Course description
* Background: Fundamental challenges introduced by querying big data; the need to revise classical computational complexity theory in the context of big data; modelling computational costs and communication costs; BD-tractability, i.e. the tractability of queries on big data; the challenges of querying data residing in multiple sources; the need to study data quality, the other side of big data.

* Volume: (1) the feasibility of computing exact query answers in big data within our available resources: parallel scalability, scale independence, techniques for making big data small; (2) approximate query answering: (a) query-driven approximation, envelopes with absolute approximation bounds, (b) data-driven approximation, synopsis-based approximate query answering, and (c) resource-bounded approximate query answering and anytime approximation.

* Variety: data can be in different formats, and come from different sources and/or applications. We shall cover: (a) popular data models, including relational, XML, and graph models, and languages for them, and (b) handling queries over data residing in multiple sources, focusing on both virtual and materialized integration, and efficient query answering.

* Veracity: big data = data quantity + data quality; (1) central issues of data quality: data consistency, data accuracy, information completeness, data currency (timeliness) and entity resolution; (2) improving data quality: consistent query answering, data repairing and certain fixes; (3) knowledge bases as master data, deducing the true values of entities; (4) handling poor-quality information, understanding current technologies and their deficiencies, and correctness guarantees.
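As an illustration of consistent query answering from the Veracity topics, here is a minimal sketch assuming a hypothetical relation `Emp(name, dept)` whose first attribute is intended to be a key but is violated by the data. A repair keeps exactly one tuple from each group of tuples sharing a key value, and a consistent (certain) answer is one the query returns on every repair. The relation, the query and the helper names are illustrative assumptions. Enumerating all repairs is exponential in general, so this brute-force version only makes the definition concrete; practical approaches avoid the enumeration, for example by rewriting the query.

```python
from collections import defaultdict
from itertools import product

# Hypothetical relation Emp(name, dept); `name` should be a key, but
# "alice" is recorded with two departments, so the key is violated.
emp = [
    ("alice", "sales"),
    ("alice", "hr"),
    ("bob", "sales"),
]

def repairs(rel):
    """Enumerate subset repairs w.r.t. a key on the first attribute:
    each repair keeps exactly one tuple per key value."""
    groups = defaultdict(list)
    for t in rel:
        groups[t[0]].append(t)
    for choice in product(*groups.values()):
        yield set(choice)

def query(db):
    """Hypothetical query Q(n) :- Emp(n, 'sales')."""
    return {name for (name, dept) in db if dept == "sales"}

def consistent_answers(rel, q):
    """Answers returned by q on every repair (intersection over repairs)."""
    answers = None
    for r in repairs(rel):
        result = q(r)
        answers = result if answers is None else answers & result
    return answers or set()

print(sorted(query(set(emp))))              # query on the inconsistent data
print(sorted(consistent_answers(emp, query)))
```

On this data the plain query returns both alice and bob, whereas only bob is a consistent answer, because one of the two repairs places alice in hr.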


Big data is the next frontier for innovation, competition and productivity. This course covers fundamental issues in connection with three of the four V's commonly used to characterize big data, namely Volume, Variety and Veracity.
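To make the data-driven, synopsis-based approximate query answering listed under Volume more concrete, the following minimal sketch answers a COUNT query from a uniform random sample instead of the full table. The choice of synopsis (sampling with replacement), the Hoeffding-based absolute error bound, the table and the names `build_sample` and `approx_count` are all illustrative assumptions, not the specific methods covered in the course.

```python
import math
import random

def build_sample(rows, m, seed=0):
    """Hypothetical synopsis: m rows drawn uniformly at random, with replacement."""
    rng = random.Random(seed)
    return rng.choices(rows, k=m)

def approx_count(sample, n_total, pred, delta=0.05):
    """Estimate how many of the n_total rows satisfy pred.

    Returns (estimate, bound): by Hoeffding's inequality, the estimate is
    within `bound` of the exact count with probability at least 1 - delta.
    """
    m = len(sample)
    p_hat = sum(1 for r in sample if pred(r)) / m   # sample selectivity
    eps = math.sqrt(math.log(2 / delta) / (2 * m))  # bound on selectivity error
    return n_total * p_hat, n_total * eps

# Hypothetical table with one integer column; query: COUNT(*) WHERE x % 10 = 0.
rows = list(range(100_000))
sample = build_sample(rows, m=10_000)
estimate, bound = approx_count(sample, len(rows), lambda x: x % 10 == 0)
exact = sum(1 for x in rows if x % 10 == 0)
print(f"estimate={estimate:.0f} +/- {bound:.0f}, exact={exact}")
```

The relative error bound (the returned bound divided by the table size) depends only on the sample size m and on delta, shrinking as 1/sqrt(m), which is what makes a fixed-size synopsis attractive when the table is very large.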
Entry Requirements (not applicable to Visiting Students)
Pre-requisites: none listed
Co-requisites: none listed
Prohibited Combinations: none listed
Other requirements: The course assumes a strong computer science background, in particular algorithm design and the ability to prove intractability. A background in data management, such as relational databases and query languages, is welcome.
Information for Visiting Students
Pre-requisites: None
High Demand Course? Yes
Course Delivery Information
Academic year: 2017/18, Available to all students (SV1); Quota: None
Course Start: Semester 2
Learning and Teaching activities: Total Hours: 200 (Lecture Hours 14, Seminar/Tutorial Hours 6, Programme Level Learning and Teaching Hours 4, Directed Learning and Independent Learning Hours 176)
Assessment: Written Exam 0%, Coursework 100%, Practical Exam 0%
Additional Information (Assessment): For proper evaluation, students must work on real problems rather than 'toy' ones that can be solved in a very limited time. At the beginning of the semester, students will be given a list of projects, from which each student picks one. The project is research-oriented: students solve a simple research problem by developing algorithms, proofs and analyses. Each student is expected to present their work in class.

The students will deliver their work in five instalments:
- an essay on the volume of big data at the end of Week 3 (worth 15%);
- an essay on the variety of big data at the end of Week 6 (worth 15%);
- an essay on the veracity of big data at the end of Week 9 (worth 15%);
- a final project report (worth 40%);
- a presentation of their work at the end of the semester (worth 15%).

Students are expected to spend around 90 hours working independently on their assessed coursework, outside of lectures and direct supervision hours. This includes writing reports during the semester and preparing an oral presentation of their project work to the class. The 90 hours break down as follows:
* Three essays: 24 hours in total, 8 hours each;
* Project: 60 hours;
* Oral presentation preparation: 6 hours.
Feedback: Not entered
No Exam Information
Learning Outcomes
On completion of this course, the student will be able to:
  1. Demonstrate an understanding of theory and techniques for querying big data (volume), including BD-tractability, parallel scalability, scale-independent queries, query-driven approximation and data-driven approximation.
  2. Demonstrate knowledge of coping with the variety of big data, including popular data models and languages for them, and techniques for answering queries over big data residing in multiple sources, focusing on both virtual and materialized integration.
  3. Demonstrate an understanding of techniques for improving the quality of big data (veracity): data consistency, data accuracy, data currency, information completeness, and entity resolution; data quality rule discovery, error detection, data repairing, consistent query answering, certain fixes and conflict resolution.
  4. Complete a project that solves a simple research problem, by providing proofs, algorithms and analyses.
  5. Write a project report and present the project in class.
Reading List
* Marcelo Arenas, Pablo Barceló, Leonid Libkin, Filip Murlak: Foundations of Data Exchange. Cambridge University Press, 2014 (a shorter Morgan & Claypool version from 2010 is available for free with an institutional subscription);
* Wenfei Fan, Floris Geerts: Foundations of Data Quality Management. Morgan & Claypool Publishers, 2012 (available for free with an institutional subscription).
Additional Information
Course URL: http://course.inf.ed.ac.uk/atfd/
Graduate Attributes and Skills: Not entered
Keywords: Database systems, data management, big data, scalability, data exchange and integration, data quality
Contacts
Course organiser: Dr Andreas Pieris
Tel: (0131 6)51 5606
Email:
Course secretary: Ms Katey Lee
Tel: (0131 6)50 2701
Email: