Global RCAAP Repository | PISTA Digital

Automatic label correction while learning classification models with genetic programming

One of the main types of machine learning is supervised learning, which uses datasets labeled by experts in the field to which the dataset belongs. These datasets are used to train predictive models that, when given new inputs from the same problem domain they learned, can reliably predict their value. However, owing to factors such as the volume of data in these datasets, the ease with which human error occurs, and the complexity of the problem making classification harder (among others), the presence of errors in datasets is almost inevitable. These errors regularly cause problems in models evolved from mislabeled data and, depending on the parameters of the algorithm used, even a very low percentage of errors can have clearly negative consequences for the classifications these models make. Since searching for these errors manually is impractical for datasets of this size, several methods have been created to avoid the consequences of this label noise. They fall either into the category of robust algorithms (resistant to the presence of noise) or into that of data-treatment methods (finding and handling the errors in the dataset before running the algorithm itself). In this document we present a new method that belongs to neither category and that attempts to let the algorithm detect the errors during learning and correct them. The approach uses Genetic Programming, the evolutionary computation algorithm that evolves programs starting from a randomly generated population and based on the concept of Darwinian evolution (applying genetic operations and breeding new programs using the best individuals of the previous generation as parents). To guide itself while evolving new individuals, Genetic Programming uses a measure called fitness, which determines how well the evolved programs solved the problem presented to them. The method is based on the idea of trusting the evolved models more than the data labels, its main idea being to reduce the weight of the original labels in the final determination of whether a sample was correctly labeled. Before the method allows an original label to be changed, the models' predictions must pass certain tests. These predictions are converted into predicted labels (by applying a threshold) that are then used in one or more checks; if these are satisfied, the predicted label is considered correct and is compared with the original label in the dataset. If the two labels differ, the original label is marked for correction by the method, which yields, at the end of the evolution, a new dataset containing the corrections made over the generations. The method presented was based on another implementation that uses semi-supervised learning, a type of learning more recent than the two main ones (supervised and unsupervised) that applies concepts from both original approaches, the most relevant here being its labeled dataset extended with an unlabeled component. That implementation is adapted with a special way of calculating the fitness of the evolved models using the extended component of the dataset. This fitness function performs the necessary calculations in the same way as a normal function, using the original labels, until it is presented with an unlabeled sample from the extended set. In that case the model's prediction is converted into a predicted label (as mentioned above) and used in the calculation, which allows the evolved models to learn from not only the labels of the original dataset but also the labels predicted by the models. So that both methods could be compared, this approach was also implemented. Each implementation had its own methodology. Building the label-correction method required starting with an exploratory analysis, since no documentation was found on label correction in supervised classification. This analysis involved creating several versions of the corrective method and testing them on synthetic datasets, created for their ease of manipulation and low complexity. Eventually the best approach was determined, based on a combination of several criteria that must all be met before a label is corrected. During this analysis some bias was also detected in the models' predictions: samples of one class were corrected more easily than those of the other. Several tests were carried out to check whether this behavior was consistent, which it was, so the final implementations and tests of both methods were done using bipolar rather than binary encoding. Because of this, scaling adjustments were made so that the attribute and label values of the datasets used fell within the range [-1, 1]. The second stage of this methodology involved implementing several criteria the approach could use to determine whether a label is correct. Three criteria were created (two of them based on earlier ideas for correction criteria from the first versions of the exploratory analysis), which compare the models' predictions with certain aspects of the evolutionary process relevant to determining the correct value of the label. The third stage consisted of tests to determine the best combination of these criteria, which proved to be a mix of all three. For the semi-supervised approach, the methodology consisted of replicating the steps followed by the authors of the article in which the implementation was presented. The first of these steps was to implement in code the extension of the training set (using the same samples, but without labels) for each dataset provided; this was necessary to avoid supplying extra information to the training set that could affect the results of the tests that followed. The second step was to implement a new version of the function responsible for calculating the fitness. In our case this calculation was adjusted in the way mentioned above, also using the predicted labels when necessary. For both approaches, data-collection scripts had to be implemented. For the corrective method, the script was responsible for obtaining the datasets altered by the method (one per run of the test) and checking which labels were changed and how many times, determining the labels most often considered wrong by the approach. For the semi-supervised method, the script was responsible for obtaining the evolved models and using them to predict the label values of the dataset used initially. The same script runs the analysis of the collected data, comparing the predictions with the original labels and checking how many and which samples had the highest percentage of changes to their label. The results of both approaches were compared with each other and with the results of models evolved normally, without applying either approach, which allowed us to make two observations: first, normal models have some ability to detect wrong labels and to avoid learning these errors; second, both approaches result in a higher percentage of changes to the dataset's labels, and the part of the dataset most affected by this increase is the same for both approaches. This section of the dataset turned out to be precisely where the errors are located, and although both implemented methods lead to an increase in presumably incorrect changes around the samples known to be wrong, they also raise the percentage of change for those samples to 100%, something the normal models did not achieve. Despite this, the corrective method clearly has flaws, because it allows too many correct labels to be changed compared with the other approaches, so the approach must be improved so that it detects and corrects errors before learning them, thereby avoiding incorrect changes.
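
As a rough illustration of the mechanics this abstract describes (bipolar scaling to [-1, 1], thresholding model outputs into predicted labels, a fitness that falls back on predicted labels for unlabeled samples, and flagging labels for correction only when every check passes), a minimal Python sketch follows. All names are hypothetical. The abstract does not specify the error measure, the threshold value, or the three correction criteria, so the squared error, the threshold of 0, and the `confident` check used here are assumptions rather than the author's method.

```python
from typing import Callable, List, Optional, Sequence

# A check receives (sample index, raw model output, predicted label).
Check = Callable[[int, float, int], bool]


def to_bipolar(value: float, lo: float, hi: float) -> float:
    """Min-max rescale a value from [lo, hi] to [-1, 1]."""
    if hi == lo:
        return 0.0
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def predicted_label(output: float, threshold: float = 0.0) -> int:
    """Turn a raw model output into a bipolar label (assumed threshold: 0)."""
    return 1 if output >= threshold else -1


def semi_supervised_error(outputs: Sequence[float],
                          labels: Sequence[Optional[int]]) -> float:
    """Mean squared error (an assumed measure). For unlabeled samples
    (label is None) the model's own predicted label is the target."""
    total = 0.0
    for out, lab in zip(outputs, labels):
        target = predicted_label(out) if lab is None else lab
        total += (out - target) ** 2
    return total / len(outputs)


def flag_corrections(outputs: Sequence[float],
                     labels: Sequence[int],
                     checks: Sequence[Check]) -> List[int]:
    """Return indices whose original label would be marked for correction:
    every check must pass, and the predicted label must differ."""
    flagged = []
    for i, (out, lab) in enumerate(zip(outputs, labels)):
        pred = predicted_label(out)
        if all(check(i, out, pred) for check in checks) and pred != lab:
            flagged.append(i)
    return flagged


def confident(index: int, output: float, pred: int) -> bool:
    """Hypothetical criterion: the output is far from the threshold."""
    return abs(output) >= 0.5
```

For example, `flag_corrections([0.9, -0.8, 0.7, -0.1], [1, -1, -1, 1], [confident])` flags only the third sample: the fourth also disagrees with its label, but its output is too close to the threshold to pass the hypothetical check.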

2025-10-28T12:08:55Z

http://hdl.handle.net/10451/63617

Ferreira, Tomás da Silva

DNA metabarcoding reveals the diet of the invasive fish Oreochromis mossambicus in mangroves of São Tomé Island (Gulf of Guinea)

Invasive species can trigger profound effects on recipient ecosystems, namely through the food web. Despite being recognized as one of the worst invasive species, little is known about the feeding ecology of the Mozambique tilapia Oreochromis mossambicus. To understand how this invasive species might impact food webs, we applied metabarcoding to analyze the composition of its diet in two African mangroves, in the Obô Natural Park on the oceanic island of São Tomé. Given the particular importance of mangroves as fish nurseries, we specifically aimed to determine if this invader might prey on other fish species. However, we found that tilapia were mostly phytoplanktivorous, and evidence of predation on other fish was very limited. Instead, due to their high local densities, tilapia may impact basal trophic levels and nutrient availability, with the potential to cascade through the food web by means of bottom-up disruption. In addition, we recorded important changes in the taxonomic composition of the diet, linked to locations and life stages, suggesting that its opportunistic feeding, combined with its aggressive territorial behavior, may result in resource competition with native species with which it has overlapping dietary niches.

2025-10-28T12:17:32Z

http://hdl.handle.net/10451/63618

Nogueira, S. Curto, M. Gkenas, C. Afonso, F. Dias, D. Heumüller, J. Félix, P. M. Silva De Lima, Rivete Chaínho, P. Brito, A. C. Ribeiro, F.

Assessing the Impact of Charcoal Production on Southern Angolan Miombo and Mopane Woodlands

About 80% of Angola’s forest surface is covered by Miombo and Mopane woodlands, which are exploited for diverse purposes such as fuelwood and food. This study aimed to assess the recovery dynamics of Miombo and Mopane woodlands after the selective cutting of tree species for charcoal production. To that end, the structure and composition of plant communities in 37 plots, located in southwestern Angola, were characterized in fallows of different ages. Results showed that the diameter at breast height, basal area, biomass, and biovolume of trees all rose as the age of the fallow increased, and there were no significant differences in richness, diversity, or dominance of trees between adult–young classes or recent–older fallows. In Mopane, fallows took longer to regenerate, were more affected by environmental and anthropogenic factors, and also presented a higher species adaptation to disturbance. There were more sprouter and seeder trees in Miombo, and new kilns were more distant from roads and villages. Moreover, the selective removal of species deeply altered the community structure and dynamics, despite not directly affecting tree diversity. Thus, new management strategies, such as expanding protected areas and increasing systematic research, are needed to ensure the survival of these woodlands.

2025-10-28T12:27:27Z

http://hdl.handle.net/10451/63619

Kissanga, Raquel Catarino, Luís Máguas, C. Cabral, Ana I. R. Chozas, Sergio

Lineage development of cell fusion hybrids upon somatic reprogramming

Somatic cell reprogramming has been extensively studied in recent years and has opened new perspectives on the use of pluripotent cells for regenerative biomedical purposes. Spontaneous cell fusion has been suggested to be involved in regenerative processes in vivo. Strong evidence supports the hypothesis that the reprogrammed hybrids resulting from the fusion between a pluripotent cell and a somatic cell exhibit pluripotent characteristics and may provide a source for cell therapy in the future. Previous evidence shows that tetraploid hybrid cells originate after the fusion event and that, both in vitro and in vivo, these cells can give rise to diploid cells by mitotic processes that are not fully understood. This “ploidy reduction” is the focus of this project. The fate of the hybrid cells was characterized by addressing the karyotype of the reprogrammed cells that originated after fusion between Embryonic Stem cells and multipotent Neural Stem cells. We identified stable tetraploid and diploid clones that resisted the selection system after fusion and exhibit pluripotent characteristics. Furthermore, we showed that the obtained diploid cells have a fusion origin and are not a result of transdifferentiation or resistant Embryonic Stem cells. This study shows that ploidy reduction could be a consequence of fusion-mediated reprogramming, corroborating the results published by another research group. We hypothesize that fusion-derived diploid cells might have been ignored in other studies or confounded with transdifferentiation events. The characterization of ploidy reduction is important to understand the role of cell fusion-mediated reprogramming during tissue regeneration and to uncover how these hybrids proliferate to eventually repopulate the damaged area.

2025-10-28T12:14:15Z

http://hdl.handle.net/10451/6362

Frade, João Manuel Rodrigues, 1988

Sex and season explain spleen weight variation in the Egyptian mongoose

The Egyptian mongoose (Herpestes ichneumon Linnaeus, 1758) is a medium-sized carnivore that experienced remarkable geographic expansion over the last 3 decades in the Iberian Peninsula. In this study, we investigated the association of species-related and abiotic factors with spleen weight (as a proxy for immunocompetence) in the species. We assessed the relationship of body condition, sex, age, season, and environmental conditions with spleen weight established for 508 hunted specimens. Our results indicate that the effects of sex and season outweigh those of all other variables, including body condition. Spleen weight is higher in males than in females, and heavier spleens are more likely to be found in spring, coinciding with the highest period of investment in reproduction due to mating, gestation, birth, and lactation. Coupled with the absence of an effect of body condition, our findings suggest that spleen weight variation in this species is mostly influenced by life-history traits linked to reproduction, rather than overall energy availability, winter immunoenhancement, or energy partitioning effects, and prompt further research focusing on this topic.

2025-10-28T12:14:01Z

http://hdl.handle.net/10451/63620

Bandeira, Victor Virgós, Emilio Azevedo, Alexandre Carvalho, João Cunha, Mónica V. Fonseca, Carlos

Pathways towards a sustainable future envisioned by early‐career conservation researchers

Scientists have warned decision-makers about the severe consequences of the global environmental crisis since the 1970s. Yet ecological degradation continues and little has been done to address climate change. We investigated early-career conservation researchers' (ECR) perspectives on, and prioritization of, actions furthering sustainability. We conducted a survey (n = 67) and an interactive workshop (n = 35) for ECR attendees of the 5th European Congress of Conservation Biology (2018). Building on these data and discussions, we identified ongoing and forthcoming advances in conservation science. These include increased transdisciplinarity, science communication, advocacy in conservation, and adoption of a transformation-oriented social–ecological systems approach to research. The respondents and participants had diverse perspectives on how to achieve sustainability. Reformist actions were emphasized as paving the way for more radical changes in the economic system and societal values linked to the environment and inequality. Our findings suggest that achieving sustainability requires a strategy that (1) incorporates the multiplicity of people's views, (2) places a greater value on nature, and (3) encourages systemic transformation across political, social, educational, and economic realms on multiple levels. We introduce a framework for ECRs to inspire their research and practice within conservation science to achieve real change in protecting biological diversity.

2025-10-28T12:26:21Z

http://hdl.handle.net/10451/63621

Raatikainen, Kaisa J. Purhonen, Jenna Pohjanmies, Tähti Peura, Maiju Nieminen, Eini Mustajärvi, Linda Helle, Ilona Shennan‐Farpón, Yara Ahti, Pauliina A. Basile, Marco Bernardo, Nicola Bertram, Michael G. Bouarakia, Oussama Brias‐Guinart, Aina Fijen, Thijs Froidevaux, Jérémy S. P. Hemmingmoore, Heather Hocevar, Sara Kendall, Liam Lampinen, Jussi Marjakangas, Emma‐Liina Martin, Jake M. Oomen, Rebekah A. Segre, Hila Sidemo‐Holm, William Silva, André Thorbjørnsen, Susanna Huneide Torrents‐Ticó, Miquel Zhang, Di Ziemacki, Jasmin

What Is the Giant Wall Gecko Having for Dinner? Conservation Genetics for Guiding Reserve Management in Cabo Verde

Knowledge of the diet composition of a species is an important step to unveil its ecology and guide conservation actions. This is especially important for species that inhabit remote areas within biodiversity hotspots, with little information about their ecological roles. The emblematic giant wall gecko of Cabo Verde, Tarentola gigas, is restricted to the uninhabited Branco and Raso islets, and presents two subspecies. It is classified as Endangered, and locally Extinct on Santa Luzia Island; however, little is known about its diet and behaviour. In this study, we identified the main plants, arthropods, and vertebrates consumed by both gecko subspecies using next generation sequencing (NGS) (metabarcoding of faecal pellets), and compared them with the species known to occur on Santa Luzia. Results showed that plants have a significant role as diet items and identified vertebrate and invertebrate taxa with higher taxonomic resolution than traditional methods. With this study, we now have data on the diet of both subspecies for evaluating the reintroduction of this threatened gecko on Santa Luzia as potentially successful, considering the generalist character of both populations. The information revealed by these ecological networks is important for the development of conservation plans by governmental authorities, and reinforces the essential and commonly neglected role of reptiles on island systems.

2025-10-28T12:13:47Z

http://hdl.handle.net/10451/63622

Pinho, Catarina Santos, Bárbara Mata, Vanessa Seguro, Mariana M. Romeiras, Maria Lopes, Ricardo Vasconcelos, Raquel

The expansion and establishment of the New Zealand mud snail Potamopyrgus antipodarum (Gray, 1843) in the freshwater ecosystems of Madeira Island (NE Atlantic)

This study reports the spread of the New Zealand mud snail Potamopyrgus antipodarum throughout freshwater ecosystems of Madeira Island, located in the NE Atlantic. Potamopyrgus antipodarum was first detected in 2017–2018 in two streams located on the north coast of the island. Since then, we have visually inspected the island's freshwater ecosystems and detected this gastropod in nine other streams. Previous evidence suggests that this species was introduced to Madeira no later than 2017, likely in the northern part. Our findings indicate that P. antipodarum is now well established in the initially invaded locations and has since spread to the southern region and upper streams of the island. Although it is difficult to conclusively determine the origin and vector of this introduction, it is plausible to assume that humans and fish may have contributed to its current distribution. Our records represent the first evidence of a wide geographical distribution of P. antipodarum on Madeira Island. Madeira seems to be the first invaded oceanic island of Macaronesia and the westernmost European distribution range for this invasive species.

2025-10-28T12:12:52Z

http://hdl.handle.net/10451/63623

Órfão, Inês Ramalhosa, Patrício Kerckhof, Francis João Canning-Clode, João

Is holistic processing of written words modulated by phonology?

Holistic processing, a hallmark of face processing, has been shown for written words, signaled by the word composite effect. Fluent readers find it harder to focus on one half of a written word (e.g., the first syllable of a CV.CV word) while ignoring the other half (e.g., the second syllable), especially when the two halves are aligned rather than misaligned. Given the linguistic nature of written words, in the present study, we examined whether the word composite effect is modulated by phonology. In Experiment 1, participants saw two sequentially presented CV.CV words and had to decide if the left half (first syllable) was the same or not, regardless of the right half. The word pairs were either phonologically consistent (univocal orthography to phonology mapping; e.g., TI is always /ti/ in Portuguese) or inconsistent (orthography can map onto different phonological representations; e.g., CA can correspond to /ka/ or /kɐ/). The word composite effect was found for phonologically consistent words but not for phonologically inconsistent words. In Experiment 2, timing of trial events was reduced to test whether the influence of phonology was fast and automatic. Similar to what was found in Experiment 1, the word composite effect was found only for phonologically consistent words. The faster trial events in Experiment 2 rendered it less likely that the influence of phonology on the word composite effect is merely a result of strategic processing. These findings suggest that holistic processing of visual words is modulated by fast and automatic activation of lexical phonological representations.

2025-10-28T12:28:59Z

http://hdl.handle.net/10451/63624

Ventura, Paulo Fernandes, Tânia Leite, Isabel Pereira, Alexandre Wong, Alan C.-N.

The development of holistic face processing: An evaluation with the complete design of the composite task

The composite paradigm is widely used to quantify holistic processing (HP) of faces: participants perform a sequential same-different task on one half (e.g., top) of a test-face relative to the corresponding half of a study-face. There is, however, debate regarding the appropriate design in this task. In the partial design, the irrelevant halves (e.g., bottom) of test- and study-faces are always different; an alignment effect indexes HP. In the complete design, besides alignment, congruency between the irrelevant and critical halves of the test-face is manipulated regarding the same/different response status of the study-face. The HP indexed in the complete design does not confound congruency and alignment and has good construct and convergent validities. De Heering, Houthuys, & Rossion (2007) argued that HP is mature as early as 4 years of age but employed the partial design. Here we revisit this claim, testing four groups of 4- to 9/10-year-old children and two groups of adults. With the complete design, we found evidence of HP only from 6-year-olds on. The index adopted in the partial design, by contrast, showed significant alignment effects already in 4-year-olds, but we demonstrate that this index reflects other factors besides HP, including response bias associated with congruency.

2025-10-28T12:16:34Z

http://hdl.handle.net/10451/63625

Ventura, Paulo Leite, Isabel Fernandes, Tânia

Cork-associated genes in the suberization dynamics of Arabidopsis tissues: a contribution for their functional characterization

Cork (or phellem), from cork oak (Quercus suber), is produced by the secondary meristem phellogen (or cork cambium), and constitutes the external barrier that protects the tree trunk and roots. Despite its socio-economic relevance, little is known about the molecular mechanisms controlling cork formation. Recently, a comparative transcriptomic study of phellogen/phellem and xylem tissues of cork oak identified a list of candidate genes involved in phellogen activity and phellem differentiation. Subsequent studies using the Arabidopsis root and hypocotyl models, during primary and secondary growth, identified transcription factors possibly involved in root endodermis/periderm suberization: WOX9/STIP, required for meristem growth and early development, and ANT, involved in floral organ development. Here we further characterize WOX9 and ANT functions in Arabidopsis root endodermis suberization. Reverse transcription quantitative real-time PCR (RT-qPCR) analysis of root-hypocotyl tissues, targeting genes of the abscisic acid (ABA) pathway, revealed that lack of or over-expression of WOX9 significantly affects ABA biosynthesis and signaling, suggesting WOX9 might be a positive regulator of suberization through regulation of the ABA pathway. Additionally, we built a genetic construct expressing QsWOX9 under the CaMV35S constitutive promoter, enabling complementation experiments to study functional similarities between Arabidopsis and cork oak homologues. Histochemical analysis of post-embryonic ant loss-of-function roots showed only a slight reduction in suberized zones. However, through RT-qPCR targeting key genes of the suberin pathway in stem and root-hypocotyl tissues, we demonstrated that lack of ANT alters the expression of genes involved in biosynthesis and assembly of suberin monomers. Moreover, we showed that exogenous ABA induces suberization in ant seedlings. The expression analysis of key ABA pathway genes in ant-9 root-hypocotyl tissues revealed reduced biosynthesis and downregulated signaling, suggesting ANT might be a positive regulator of ABA signaling. Overall, our results support a regulatory role for WOX9 and ANT in the mechanisms underlying the suberization process.

2025-10-28T12:15:39Z

http://hdl.handle.net/10451/63626

Vila Verde, Ana Catarina dos Santos

The Word Composite Effect Depends on Abstract Lexical Representations But Not Surface Features Like Case and Font

Prior studies have shown that words show a composite effect: When readers perform a same-different matching task on a target-part of a word, performance is affected by the irrelevant part, whose influence is severely reduced when the two parts are misaligned. However, the locus of this word composite effect is largely unknown. To shed light on it, in two experiments, Portuguese readers performed the composite task on letter strings: in Experiment 1, written words varying in surface features (between-participants: courier, notera, alternating-cAsE), and in Experiment 2, pseudowords. The word composite effect, signaled by a significant interaction between alignment of the two word parts and congruence between parts, was found in the three conditions of Experiment 1, being unaffected by NoVeLtY of the configuration or by handwritten form. This effect seems to have a lexical locus, given that in Experiment 2 only the main effect of congruence between parts was significant and was not modulated by alignment. Indeed, the cross-experiment analysis showed that words presented stronger congruence effects than pseudowords only in the aligned condition, because when misaligned the whole lexical item configuration was disrupted. Therefore, the word composite effect strongly depends on abstract lexical representations, as it is unaffected by surface features and is specific to lexical items.

2025-10-28T12:29:40Z

http://hdl.handle.net/10451/63627

Ventura, Paulo Fernandes, Tânia Leite, Isabel Almeida, Vítor Casqueiro, Inês Wong, Alan C.-N.

AutoVizuA11y: A tool to automate accessibility in data visualizations for screen reader users

Data visualizations remain widely inaccessible for screen reader users on the web. Despite recent research, developers still lack the experience, knowledge, and time to consistently implement accessible features. As a result, screen reader users lose access to information and are compelled to resort to tabular alternatives, when available, limiting their access to information. These issues make it impossible for users of screen readers to work in areas, like financial fraud detection, that demand the exploration of data through different chart types. We worked with screen reader users and chart creators at Feedzai to develop AutoVizuA11y, a tool that automates the addition of accessible features to web-based charts. It enables keyboard navigation, offers shortcuts for faster exploration, provides automatic labeling, and generates human-like descriptions of the data using a large language model. Through a series of task-based tests (16 tasks and 15 users) comparing two interfaces, one with AutoVizuA11y charts resembling Feedzai’s environment and the other with accessible tables, we show that with AutoVizuA11y users are, on average, faster (66 vs 78 seconds) and more accurate (89% vs 79% accuracy).

2025-10-28T12:19:40Z

http://hdl.handle.net/10451/63628

Duarte, Diogo Ramalho

A Hybrid Machine Learning System for Vulnerability Detection in Web Applications

Security in web applications is often compromised by poorly written code that is exploited by attackers. Source code vulnerability detection tools have been developed using static analysis and machine learning techniques. The best performing tools seek very low false negative rates along with acceptable false positive rates. Static analysis requires manual programming to identify vulnerabilities, depends on human expertise, and is usually limited to a specific programming language. On the other hand, the classical supervised machine learning approaches previously used may be limited in identifying zero-day vulnerabilities or prone to overfitting due to the limited datasets available. This dissertation aims to develop a hybrid machine learning (ML) system for vulnerability detection in web applications. The system developed will use a combination of static analysis and Natural Language Processing (NLP) techniques to identify functions related to vulnerabilities, which will be used to build representative datasets. The datasets will be used as input for unsupervised machine learning and other behaviour-based anomaly detection algorithms in order to flag the code snippets under analysis as suspicious. For these source code snippets, the system will aim to confirm which are vulnerable and identify the type of vulnerability via supervised machine learning techniques. The dissertation explores a novel approach to vulnerability detection by combining unsupervised anomaly detection models with supervised machine learning and Natural Language Processing techniques. Previous research in vulnerability detection has primarily focused on either unsupervised or supervised methods, neglecting the potential benefits of a hybrid approach. The goal of this research is to investigate the efficacy of hybrid architectures in identifying software vulnerabilities and to determine the optimal machine learning models and datasets for this purpose. The proposed hybrid model consists of different layers. The first uses a One Class Support Vector Machine (OCSVM) model to detect anomalies; the second employs a Random Forest model to confirm the presence of vulnerabilities in the anomalies. The type of vulnerability is classified by a Logistic Regression model that relies on the Doc2Vec model for feature extraction. The research includes experimentation with various machine learning models and datasets, evaluating features that range from simple binary features to more complex Doc2Vec embeddings. The thesis demonstrates OCSVM’s suitability for semi-unsupervised anomaly detection, yielding promising results across various datasets. Additionally, the study assesses Random Forests’ effectiveness in classifying vulnerable source code snippets based on OCSVM-detected anomalies and validates the use of NLP techniques for feature extraction from source code snippets. Overall, the proposed hybrid model achieved an accuracy of 65%. Although this result seems low, this research offers a promising hybrid approach to vulnerability detection, leveraging the strengths of unsupervised and supervised machine learning models. The findings suggest opportunities for further enhancements and optimizations, paving the way for more effective software vulnerability detection systems.
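
The layered flow described in this abstract (anomaly detection first, then confirmation of the vulnerability, then classification of its type) can be sketched as plain control flow. This is only an illustrative skeleton: the `Verdict` type, the `analyse` function, and the three keyword-based toy stages are hypothetical stand-ins, not the dissertation's trained OCSVM, Random Forest, or Logistic Regression/Doc2Vec models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    anomalous: bool            # layer 1: flagged by the anomaly detector
    vulnerable: bool           # layer 2: confirmed as vulnerable
    vuln_type: Optional[str]   # layer 3: predicted vulnerability type


def analyse(snippet: str,
            is_anomalous: Callable[[str], bool],
            is_vulnerable: Callable[[str], bool],
            classify_type: Callable[[str], str]) -> Verdict:
    """Run a snippet through the three layers, stopping at the first
    layer that clears it."""
    if not is_anomalous(snippet):
        return Verdict(False, False, None)
    if not is_vulnerable(snippet):
        return Verdict(True, False, None)
    return Verdict(True, True, classify_type(snippet))


# Toy stand-ins for the trained models (assumptions for illustration only).
def toy_anomaly(snippet: str) -> bool:
    return "query(" in snippet


def toy_confirm(snippet: str) -> bool:
    return "$_GET" in snippet


def toy_type(snippet: str) -> str:
    return "sql-injection"
```

Passing the stages in as callables keeps the cascade independent of any particular model library, so each layer could be swapped for a real trained classifier without changing `analyse`.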

2025-10-28T12:28:20Z

http://hdl.handle.net/10451/63629

Oliveira, Miguel César de Albuquerque

Cognitive subtyping of university students with dyslexia in a semi-transparent orthography: what can weaknesses and strengths tell us about compensation?

Developmental dyslexia is characterized by a profile of reading- and writing-related difficulties in which a core deficit in phonological processing stands out. Although these difficulties seem to persist into adulthood, it is still an open question to what extent they are immune, or not, to the extensive training resulting from extended schooling. The main objective of this study was to explore the heterogeneity of the cognitive profile of highly literate European Portuguese adults with dyslexia. Thirty-one university students diagnosed with dyslexia during childhood and their matched skilled adult control readers were assessed through a battery of reading and cognitive tests. A cluster analysis of data obtained from participants with dyslexia identified two profile groups. While Cluster 1 grouped participants with clear phonological deficits and concomitant reading difficulties, Cluster 2 showed better performance on most of the core skills associated with reading and also better general cognitive abilities, suggesting that these dyslexic readers have partially resolved their phonological constraints over the course of development, probably due to systematic exposure to reading and writing. As Cluster 2 matched typical readers in general cognitive abilities, it might also be the case that cognitive strengths associated with general intelligence worked as protective factors, helping students to strategically compensate for their reading difficulties. Overall, these results suggest that both mechanisms (partial remediation of the core phonological deficit and adoption of compensatory strategies supported by general cognitive skills) might contribute together to improving the reading performance of highly literate adults with dyslexia.

2025-10-28T12:16:21Z

http://hdl.handle.net/10451/63630

Faísca, Luís; Reis, Alexandra; Araújo, Susana

Development of a Mechatronic Platform for Passive Tactile Stimulation Using a Rotating Drum with Embossed Patterns

This work presents Emily, an automatically controlled mechatronic platform for passive tactile stimulation using a rotating drum with different embedded textured surfaces. The stimulator has two DC motors and therefore two degrees of freedom: one from the rotating drum and one from the linear guide on which the drum is mounted, which moves the stimulator along the guide and changes the stimuli presented to the subject. The stimulator is an in-house design built with 3D-printed components, with a versatile design concept that allows for a variety of experimental protocols. The platform uses an sbRIO-9637 with LabVIEW FPGA installed, which guarantees a high degree of flexibility in how the platform can be programmed to operate. The goal of this platform is to create an automatically controlled, versatile, and standardized manner of performing tactile stimulation while conducting parallel electrophysiological and psychophysiological tests to deepen our understanding of the neuronal processes underlying the human sense of touch. Emily offers a series of advantages: (1) automatic control; (2) small size and ease of transportation; (3) control of the motors' rotation speed; (4) real-time display, via a force sensor, of the contact force exerted by the subject; (5) stimulation with different topographies in short intervals; (6) the use of the sbRIO with embedded LabVIEW FPGA, which makes the experimental protocol configurable, improvable and versatile while allowing the status of the platform to be viewed live with great accuracy; (7) low electromagnetic interference through the use of a linear current amplifier; (8) use of commercially available components. This thesis is a guide to how the platform was thought out, designed, and built, so that in the future other investigators can recreate and improve upon it to further our knowledge of the neural processes of touch in humans.

2025-10-28T12:11:02Z

http://hdl.handle.net/10451/63631

Januário, Rodrigo Nogueira

Changes in the dimensions of access to commercial establishments: an analysis of the effects of tourism from the perspective of the vulnerable population of Lisbon's historic centre

No summary/description provided

2025-10-28T12:20:21Z

http://hdl.handle.net/10451/63632

Guimarães, Pedro; Silva, Katielle

Dyslexia and Literacy Instruction: From Scientific Evidence to the Classroom

This chapter presents a current scientific view of developmental dyslexia, one of the most common neurodevelopmental disorders, which is characterized by specific and persistent difficulties in the acquisition and development of reading and writing. In the first part, I introduce the concept of dyslexia and outline the typical behavioural manifestations associated with it, in children and adults, and then examine in more depth the core cognitive deficits of the disorder and their brain correlates. Finally, I describe the key ingredients of effective intervention programmes, which can be used as strategies with a high probability of success in teaching children with and without reading difficulties. I provide some examples of activities focused on successful reading acquisition that can (and should) be implemented in the classroom.

2025-10-28T12:12:26Z

http://hdl.handle.net/10451/63633

Araújo, Susana

Early Brain Sensitivity to Word Frequency and Lexicality During Reading Aloud and Implicit Reading

The present study investigated the influence of lexical word properties on the early stages of visual word processing (<250 ms) and how the dynamics of lexical access interact with task-driven top-down processes. We compared the brain's electrical response (event-related potentials, ERPs) of 39 proficient adult readers for the effects of word frequency and word lexicality during an explicit reading task versus a visual immediate-repetition detection task where no linguistic intention is required. In general, we observed that left-lateralized processes linked to perceptual expertise for reading are task-independent. Moreover, there was no hint of a word frequency effect in early ERPs, while there was a lexicality effect which was modulated by task demands: during implicit reading, we observed larger N1 negativity in the ERP to real words compared to pseudowords, but in contrast, this modulation by stimulus type was absent for the explicit reading aloud task (where words yielded the same activation as pseudowords). Thus, the data indicate that the brain's response to lexical properties of a word is open to influences from top-down processes according to the representations that are relevant for the task, and this occurs from the earliest stages of visual recognition (within ~200 ms). We conjectured that the loci of these early top-down influences identified for implicit reading are probably restricted to lower levels of processing (such as whole word orthography) rather than the process of lexical access itself.

2025-10-28T12:11:16Z

http://hdl.handle.net/10451/63634

Faísca, Luís; Reis, Alexandra; Araújo, Susana

Hall Screens for Gymnastics Competitions

This document presents an engineering project developed at Acro Companion, a company specialized in gymnastics software. This project is important to Acro Companion's flagship product, an application designed to streamline gymnastics competitions for organizers, coaches, judges, and gymnasts. The platform automates various aspects of competition management, real-time scoring, club membership, and display control in competition venues. The project's focus was on improving the Scoring service, particularly the Hall Screens area, responsible for displaying and controlling real-time information on the competition's screens. The problems affecting the Hall Screens fall into two distinct categories. The first category comprises the inadequacies in the scoring screen's architectural design, which hamper maintainability, enhancements and the diversification of the view types. The second category, related to the controller mechanism, is that the mechanism lacks power, reactivity and usability, which ends up disrupting the user's workflow. The project had three primary objectives. Using UI designs as blueprints, the first objective was to re-imagine the Hall Screens, fostering new designs and better architectural foundations to enhance maintainability, correct behaviour and expansion potential. The second objective was the development of a new, powerful and reactive solution for the mechanism that controls the screens. The last objective was the development of a suite of end-to-end tests specific to the Hall Screens area to ensure reliability and correctness. The results of this project were significant. The Hall Screens component improved significantly in terms of stability, robustness, and architectural organization. Simultaneously, the new controller mechanism, though not feature-complete, was entirely reinvented, enhancing maintainability, understandability, reactivity and user experience.
Additionally, a large suite of end-to-end tests was implemented, improving the overall stability of the Hall Screens and increasing safety during development. In summary, this project within Acro Companion tackled critical issues in their gymnastics competition platform, resulting in substantial improvements to the Hall Screens, Controller Mechanism, and overall system stability. These enhancements contributed to the company’s mission of providing efficient solutions for gymnastic competitions.

2025-10-28T12:25:26Z

http://hdl.handle.net/10451/63635

Lamelas, Cláudio André Rodrigues
