Artificial Intelligence & Big Data

Content Processing and Acquisition

The robustness recently achieved by NLP technologies makes them a promising support for the design and development of advanced systems. Human Language Technologies (HLT) for Content Acquisition favor, through reuse, the incremental design of systems that process unstructured data.

HLTs are crucial for the robust and accurate analysis of unstructured text, and for enriching it with semantic metadata or other implicit information. They allow interesting semantic phenomena to be extracted and mapped into structured representations of a target domain.

When a semantic meta-model is available, for example in the form of an existing ontology, HLT allows concepts to be located in text, independently of the variable forms in which they appear, and marked up according to Knowledge Representation Languages (such as RDF or OWL). Different surface representations of the same concept are thereby unified.

In this way, semantic annotations are obtained for the original document, making it more suitable for clustering, retrieval, and browsing. In short, HLT enables and simplifies advanced functionalities over text, such as semantic search.
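The idea of locating a concept regardless of its surface form and marking it with an RDF-style annotation can be sketched as follows. This is only a minimal illustration under stated assumptions: the ontology, the `example.org` URIs, the `mentions` property, and the `annotate` function are all hypothetical, and plain string matching stands in for the far richer linguistic analysis a real HLT pipeline would perform.

```python
import re

# Hypothetical mini-ontology: concept URI -> surface forms that may appear in text.
ONTOLOGY = {
    "http://example.org/onto#NaturalLanguageProcessing": [
        "natural language processing", "NLP", "human language technology",
    ],
    "http://example.org/onto#Ontology": ["ontology", "ontologies"],
}

# Hypothetical property linking a document to a concept it mentions.
MENTIONS = "http://example.org/onto#mentions"


def annotate(doc_uri, text):
    """Return sorted N-Triples-style lines linking the document to each concept found."""
    triples = set()
    for concept, forms in ONTOLOGY.items():
        for form in forms:
            # Whole-word, case-insensitive match of one surface form.
            if re.search(r"\b" + re.escape(form) + r"\b", text, flags=re.IGNORECASE):
                triples.add(f"<{doc_uri}> <{MENTIONS}> <{concept}> .")
                break  # one annotation per concept is enough
    return sorted(triples)


# Two documents that use different surface forms for the same concepts.
doc_a = annotate("http://example.org/doc/a", "NLP systems often rely on ontologies.")
doc_b = annotate("http://example.org/doc/b", "Natural Language Processing needs an ontology.")

for line in doc_a + doc_b:
    print(line)
```

Both documents end up annotated with the same two concept URIs even though they share almost no wording, which is what makes the annotated text easier to cluster or retrieve by concept rather than by keyword.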

Knowledge Engineering, Semantic Web and Linked Data

Knowledge Engineering (KE) deals with the translation of human expertise about a specific field or domain into artifacts known as knowledge bases. The term is broad in scope, encompassing all aspects (scientific, technical, and social) related to this creative act.

In the era of the Web, the need emerged to collect, represent, and expose data so that information can be consumed not only by humans but also by machines in convenient ways. This fosters a global ecosystem of agents able to retrieve, exchange, and understand data, and to cooperate in solving complex tasks.

This extension of the Web is called the Semantic Web. Data published on the Web, represented according to well-defined standards and interconnected with other data, is called Linked Data.
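A minimal sketch of what "interconnected with other data" can mean in practice is given below. It assumes two hypothetical datasets held in memory as sets of (subject, predicate, object) triples; all `example.org` URIs and the population value are made up for illustration. Only the `owl:sameAs` and `rdfs:label` predicate URIs come from the W3C vocabularies. The `match` and `follow_links` helpers are illustrative names, not part of any library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two hypothetical datasets, each a set of (subject, predicate, object) triples.
dataset_a = {
    ("http://a.example.org/city/rome", RDFS_LABEL, "Rome"),
    # Link stating that this resource denotes the same thing as one in dataset B.
    ("http://a.example.org/city/rome", OWL_SAME_AS, "http://b.example.org/place/42"),
}
dataset_b = {
    # Made-up value, for illustration only.
    ("http://b.example.org/place/42", "http://b.example.org/vocab#population", "12345"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def follow_links(start, local, remote):
    """Collect facts about `start` locally, then about its sameAs targets remotely."""
    facts = match(local, s=start)
    for _, _, target in match(local, s=start, p=OWL_SAME_AS):
        facts.extend(match(remote, s=target))
    return facts


facts = follow_links("http://a.example.org/city/rome", dataset_a, dataset_b)
for subject, predicate, obj in sorted(facts):
    print(subject, predicate, obj)
```

An agent starting from the resource in the first dataset reaches the population fact in the second one only because of the explicit link between the two, which is the property that distinguishes Linked Data from isolated structured data.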

Machine Learning

Statistical learning methods assume that lexical or grammatical observations provide useful hints for modeling semantic inferences. Linguistic observations supply the features, which learning methods generalize into predictive components of the final model induced from training examples.

Mitchell (1997) gives the following definition of a learning program:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
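The definition can be made concrete with a toy sketch in which the task T is labeling short texts by topic, the performance measure P is accuracy on a fixed test set, and the experience E is the number of training examples seen. Everything here is an assumption made for illustration: the sentences, the two labels, and the deliberately naive word-overlap classifier, which uses lexical observations as its only features and is not a method proposed in the text.

```python
# Experience E: labeled examples, consumed in order.
TRAIN = [
    ("the match ended with a late goal", "sport"),
    ("the parliament passed the budget law", "politics"),
    ("the striker scored twice in the final", "sport"),
    ("the minister announced a new election", "politics"),
]

# Fixed test set used by the performance measure P.
TEST = [
    ("a goal in the final match", "sport"),
    ("the election law was passed", "politics"),
    ("the striker scored", "sport"),
    ("the minister and the parliament", "politics"),
]


def train(examples):
    """Build one vocabulary (set of words) per label from the examples seen."""
    vocab = {}
    for text, label in examples:
        vocab.setdefault(label, set()).update(text.split())
    return vocab


def predict(vocab, text, default="sport"):
    """Task T: pick the label whose vocabulary overlaps most with the text."""
    if not vocab:
        return default  # no experience yet, so fall back to a fixed guess
    words = set(text.split())
    # Ties go to the alphabetically first label because max keeps the first maximum.
    return max(sorted(vocab), key=lambda label: len(words & vocab[label]))


def accuracy(vocab, test):
    """Performance measure P: fraction of test items labeled correctly."""
    return sum(predict(vocab, text) == label for text, label in test) / len(test)


# Measure P after 0, 1, 2, ... training examples.
curve = [accuracy(train(TRAIN[:n]), TEST) for n in range(len(TRAIN) + 1)]
print(curve)
```

On this toy data the accuracy does not decrease as more examples are seen, so the program "learns" in Mitchell's sense. On real data the curve is typically noisier, and improvement is expected only on average.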

Web & Information Retrieval

Modern Information Technology systems need to access the vast amount of information that is stored and continuously produced on the Web.

Most human knowledge is represented and expressed through language, and the proper application of Natural Language Processing (NLP) techniques is crucial for effectively exploiting such data.