dc.contributor.author: Higuchi, Suemi
dc.contributor.author: Freitas, Cláudia
dc.contributor.author: Claro, Bruno Cuconato
dc.contributor.author: Rademaker, Alexandre
dc.date.accessioned: 2020-05-25T21:16:08Z
dc.date.available: 2020-05-25T21:16:08Z
dc.date.issued: 2018
dc.identifier.uri: https://hdl.handle.net/10438/29143
dc.description.abstract: This paper presents the initial efforts towards the creation of a new corpus in the history domain. Motivated by historians’ need to interrogate a vast body of material - almost 9 million words - in a non-linear way, our approach privileges deep linguistic analysis of encyclopaedic-style data. In this context, the work presented here focuses on the preparation of the corpus, which precedes the mining activity: the morphosyntactic annotation and the definition of semantic types for named entities (NE) and of named-entity relations relevant to the history domain. Taking advantage of the semantic nature of appositive structures, we manually analysed a sample of 1,049 sentences in order to verify their potential as additional semantic clues. The results show that we are on the right track. [por]
dc.language.iso: eng
dc.subject: Digital humanities [por]
dc.subject: Text mining [por]
dc.subject: Corpus annotation [por]
dc.subject: Appositives [por]
dc.title: Text mining for history: first steps on building a large dataset [por]
dc.type: Preprint [eng]
dc.subject.area: Social sciences [por]
dc.contributor.unidadefgv: Other units::RPCA [por]
dc.subject.bibliodata: Data mining (Computing) [por]
dc.subject.bibliodata: History [por]
fgv.relation.ispartof: Technology applied to research with primary sources [por]
fgv.relation.ispartof: Applied Research Projects [por]

