Title: Natural Language Processing
Code: ZPJe
Ac. Year: 2016/2017
Term: Winter
Curriculums:
  Programme   Branch  Year  Duty
  IT-MGR-1H   MGH     -     Recommended
  IT-MSC-2    MBI     -     Compulsory-Elective - group S
  IT-MSC-2    MBS     -     Elective
  IT-MSC-2    MGM     -     Elective
  IT-MSC-2    MIN     -     Elective
  IT-MSC-2    MIS     -     Elective
  IT-MSC-2    MMI     -     Elective
  IT-MSC-2    MMM     -     Elective
  IT-MSC-2    MPV     -     Elective
  IT-MSC-2    MSK     -     Elective
Language: English
Credits: 5
Completion: examination (written)
Type of instruction:
  Hours/sem: Lectures 26, Sem. exercises 0, Lab. exercises 0, Comp. exercises 0, Other 26
  Points: Examination 51, Tests 9, Exercises 0, Laboratories 0, Other 40
Guarantor: Smrž Pavel, doc. RNDr., Ph.D., DCGM
Lecturer: Smrž Pavel, doc. RNDr., Ph.D., DCGM
Faculty: Faculty of Information Technology BUT
Department: Department of Computer Graphics and Multimedia FIT BUT
 
Learning objectives:
  To understand natural language processing and to learn how to apply basic algorithms in this field. To get acquainted with the algorithmic description of the main language levels: morphology, syntax, semantics, and pragmatics, as well as with corpora as resources of natural language data. To grasp the basics of knowledge representation, inference, and their relation to artificial intelligence.
Description:
  Foundations of natural language processing, language data in corpora, levels of description: phonetics and phonology, morphology, syntax, semantics, and pragmatics. Traditional vs. formal grammars: representation of morphological and syntactic structures, meaning representation. Context-free grammars and their context-sensitive extensions, DCG (Definite Clause Grammars), the CKY (Cocke-Kasami-Younger) algorithm, chart parsing. The problem of ambiguity. Electronic dictionaries: representation of lexical knowledge, types of machine-readable dictionaries. Semantic representation of sentence meaning: the compositionality principle, composition of meaning. Semantic classification: valency frames, predicates, ontologies, transparent intensional logic (TIL) and its application to the semantic analysis of sentences. Pragmatics: semantic and pragmatic nature of noun groups, discourse structure, deictic expressions, verbal and non-verbal contexts. Natural language understanding: semantic representation, inference, and knowledge representation.
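  To illustrate the CKY algorithm named above, the following is a minimal recogniser sketch for a toy grammar in Chomsky normal form; the grammar, the example sentence, and the function name cky_recognise are illustrative assumptions, not material from the course.

  # Minimal CKY recogniser sketch (illustrative toy grammar, not course code)
  from itertools import product

  binary_rules = {                 # rules A -> B C, keyed by (B, C)
      ("NP", "VP"): {"S"},
      ("Det", "N"): {"NP"},
      ("V", "NP"): {"VP"},
  }
  lexical_rules = {                # rules A -> w, keyed by the word w
      "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"},
  }

  def cky_recognise(words):
      n = len(words)
      # chart[i][j] holds the non-terminals that can span words[i:j]
      chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
      for i, w in enumerate(words):
          chart[i][i + 1] |= lexical_rules.get(w, set())
      for span in range(2, n + 1):
          for i in range(n - span + 1):
              j = i + span
              for k in range(i + 1, j):
                  for B, C in product(chart[i][k], chart[k][j]):
                      chart[i][j] |= binary_rules.get((B, C), set())
      return "S" in chart[0][n]

  print(cky_recognise("the dog chased the cat".split()))  # True

  A chart parser extends the same table-filling idea by also recording the back-pointers needed to reconstruct parse trees.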
Knowledge and skills required for the course:
  Basic knowledge of C/C++ programming or of a scripting language (Perl, Python, Ruby).
Subject specific learning outcomes and competences:
  The students will get acquainted with natural language processing and learn how to apply basic algorithms in this field. They will understand the algorithmic description of the main language levels: morphology, syntax, semantics, and pragmatics, as well as corpora as resources of natural language data. They will also grasp the basics of knowledge representation, inference, and their relation to artificial intelligence.
Generic learning outcomes and competences:
  The students will learn to work in a team. They will also improve their programming skills and their knowledge of development tools.
Syllabus of lectures:
 
  1. Introduction, history of NLP, subdisciplines
  2. How to build a Google-like search engine, text categorization, document similarity
  3. Morphological analysis, inflective and derivational morphology, trie structure for dictionaries (a minimal trie sketch follows this list)
  4. Syntactical analysis, constituent and dependency structures, feature structures, grammar specification formats
  5. Grammar formalisms, categorial grammars, LFG, HPSG, LTAG
  6. Methods of syntactic analysis, the CKY algorithm, chart parsing
  7. Corpus linguistics, treebanks, the TBL method
  8. Probabilistic context-free analysis, automatic alignment, machine translation
  9. Lexical semantics, dictionaries vs. encyclopedias, compositionality
  10. Transparent intensional logic for the description of meaning
  11. Pragmatics, contextual meaning relations, dynamic semantics
  12. Knowledge representation, possible-world semantics, inference
  13. The Semantic Web technologies, ontologies, OWL
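
  Lecture 3 mentions the trie structure for dictionaries; the sketch below shows a minimal trie-based lexicon lookup. The class names, the stored entries, and the example words are illustrative assumptions, not code from the course.

  # Minimal trie-based dictionary lookup sketch (illustrative only)
  class TrieNode:
      def __init__(self):
          self.children = {}   # character -> child TrieNode
          self.entry = None    # lexical information stored where a word ends

  class Trie:
      def __init__(self):
          self.root = TrieNode()

      def insert(self, word, entry):
          node = self.root
          for ch in word:
              node = node.children.setdefault(ch, TrieNode())
          node.entry = entry

      def lookup(self, word):
          node = self.root
          for ch in word:
              node = node.children.get(ch)
              if node is None:
                  return None
          return node.entry

  lexicon = Trie()
  lexicon.insert("walk", {"pos": "verb"})
  lexicon.insert("walks", {"pos": "verb", "form": "3sg"})
  print(lexicon.lookup("walks"))    # {'pos': 'verb', 'form': '3sg'}
  print(lexicon.lookup("walking"))  # None (not in the dictionary)

  Because entries sharing a prefix share a path in the trie, the same structure also supports the prefix and stem lookups used in morphological analysis.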
Syllabus - others, projects and individual work of students:
 
  • Individually assigned projects
Fundamental literature:
 
  • Allen, J., Natural Language Understanding, 2nd ed., Redwood City: Benjamin/Cummings Publishing Company, 1995, ISBN 0-8053-0334-0.
  • Manning, C. D., Schütze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN 0-262-13360-1.
Study literature:
 
  • Manning, C. D., Schütze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN 0-262-13360-1.
Controlled instruction:
  The evaluation includes a mid-term test, an individual project, and the final exam. The mid-term test cannot be retaken; the final exam has two possible resit terms.
Progress assessment:
  
  • Mid-term test - up to 9 points
  • Individual project - up to 40 points
  • Written final exam - up to 51 points
Exam prerequisites:
  
  • Completed individual project