I made a list of terms in [[Category:Portuguese_lemmas]] and not in [[Category:Portuguese_forms_superseded_by_AO1990]]. Afterwards, I used a regex to keep only terms containing a hyphen that is neither at the start nor at the end of the word (to exclude prefixes, suffixes, and infixes).
The code responsible for filtering terms is available to check at https://github.com/PolomoPT/estranhezas-wiktionary/tree/main/Hífen.
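As a rough illustration of the filtering step described above (not taken from the linked repository), the set difference and the hyphen filter could look like the sketch below. The names `INTERNAL_HYPHEN` and `candidate_terms` are hypothetical, and the two inputs are assumed to be plain lists of page titles.

```python
import re

# Hypothetical pattern: the term has at least one hyphen, and neither its
# first nor its last character is a hyphen. This rejects prefixes ("anti-"),
# suffixes ("-ar") and infixes ("-i-").
INTERNAL_HYPHEN = re.compile(r"[^-].*-.*[^-]")


def candidate_terms(lemmas, superseded):
    """Return lemmas that are not marked superseded and contain an internal hyphen."""
    remaining = set(lemmas) - set(superseded)
    return sorted(t for t in remaining if INTERNAL_HYPHEN.fullmatch(t))
```

For example, `candidate_terms(["guarda-chuva", "anti-", "casa"], [])` keeps only `guarda-chuva`.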
The result should include words that are no longer spelled with a hyphen but are not yet categorized as superseded. The process produces many false hits; it might be possible to narrow this list down further.
The lists below already filter out the terms included above.
Prefixes ending in vowels only take hyphens if the following term starts with the same vowel or the letter h.
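The rule above can be sketched as a small check. This is only an illustration under stated assumptions: the function names and the prefix list passed in are hypothetical, diacritics are ignored when comparing vowels (so `ô` counts as `o`), and no exceptions to the rule are modelled.

```python
import unicodedata

VOWELS = set("aeiou")


def _base_letter(ch):
    """Lowercase a character and strip its diacritics (e.g. 'Ô' -> 'o')."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def hyphen_expected(prefix, base):
    """True if a vowel-final prefix keeps its hyphen before this base."""
    last = _base_letter(prefix[-1])
    first = _base_letter(base[0])
    if last not in VOWELS:
        raise ValueError("rule only covers prefixes ending in a vowel")
    return first == "h" or first == last


def looks_superseded(term, prefixes):
    """Flag 'prefix-base' terms whose hyphen the rule does not call for."""
    for prefix in prefixes:
        head = prefix + "-"
        if term.startswith(head) and len(term) > len(head):
            return not hyphen_expected(prefix, term[len(head):])
    return False  # no listed prefix matched
```

With the hypothetical prefix list `["auto", "anti"]`, `auto-estrada` is flagged, while `anti-inflamatório` and `guarda-chuva` are not.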
My code did not find any wrongly hyphenated terms resulting from the prefixes sob- and sub-. When I manually inserted some in the corpus, it managed to detect them; I conclude that there just aren't any.
My code did not find any wrongly hyphenated terms resulting from the prefixes co- and re-. When I manually inserted some in the corpus, it managed to detect them; I conclude that there just aren't any.
My code did not find any wrongly hyphenated terms resulting from the prefixes in- and des-. When I manually inserted some in the corpus, it managed to detect them; I conclude that there just aren't any.
My code did not find any wrongly hyphenated terms resulting from the prefixes circum- and pan-. When I manually inserted some in the corpus, it managed to detect them; I conclude that there just aren't any.
My code did not find any wrongly hyphenated terms resulting from the prefix mal-. When I manually inserted some in the corpus, it managed to detect them; I conclude that there just aren't any.
The list below was compiled by searching for terms in which two hyphens set off the middle part of the compound. It therefore includes terms that have a linking element and yet are still hyphenated. In names of plants and animals this is correct; other terms in this format, however, should be marked as pre-1990 spellings.
Of course, other terms with no linking element may be mistakenly included.
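A search of this kind might be sketched as follows. The sketch assumes that "two hyphens" means exactly two, which may not match the original search, and the names `TWO_HYPHENS` and `middle_element` are hypothetical. Returning the middle element makes it easier to review by hand whether it really is a linking element.

```python
import re

# Hypothetical pattern: three non-empty parts separated by exactly two hyphens.
TWO_HYPHENS = re.compile(r"([^-]+)-([^-]+)-([^-]+)")


def middle_element(term):
    """Return the part between the two hyphens, or None if the term does not fit."""
    match = TWO_HYPHENS.fullmatch(term)
    return match.group(2) if match else None
```

For instance, `middle_element("pé-de-moleque")` gives `"de"`, while a term with a single hyphen gives `None`.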
The block below includes capitalized terms, which are mainly toponyms. Most of the terms below are not superseded.
This list includes words that didn't qualify for the other categories and that have a hyphen between a vowel and a letter that is not the same vowel, e.g., a-e, a-n, e-c. Since these mostly don't involve prefixes, almost nothing in this list is superseded.
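The vowel-plus-different-letter condition can be expressed with a backreference and a negative lookahead. The sketch below is an assumption about how such a test could be written, not the original code; it only treats unaccented a, e, i, o, u as vowels before the hyphen, and the names are hypothetical.

```python
import re

# Hypothetical pattern: a plain vowel, a hyphen, then any letter other than
# that same vowel. [^\W\d_] matches a single letter, accented or not.
VOWEL_THEN_OTHER = re.compile(r"([aeiou])-(?!\1)[^\W\d_]")


def has_vowel_hyphen_other(term):
    """True if some hyphen sits between a vowel and a different letter."""
    return VOWEL_THEN_OTHER.search(term.lower()) is not None
```

Under these assumptions `guarda-chuva` (a-c) matches, while `contra-ataque` (a-a) and `mal-estar` (l-e) do not.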
This list contains every hyphenated term that wasn't included in the previous lists. Some of these terms are superseded.