Impact of categorical variables encoding on property mass valuation


The main aim of the article was to present the impact of categorical variable encoding on property mass valuation. Categorical variables are often used to describe important property characteristics. In some countries, e.g., Poland, properties are described mainly with categorical variables, both nominal and ordinal. When property mass valuation is carried out, it is important to introduce such variables in the way that yields the most accurate results. There are many techniques for encoding categorical variables. In this study, several of them were used in data pre-processing to determine whether the choice of encoding technique affects the valuation results obtained with several regression algorithms. Three types of regression models were used in the research: ridge regression, k-nearest neighbours regression and random forest regression. Each algorithm used explanatory variables encoded with five techniques: one-hot encoding, CatBoost encoding, Helmert encoding, target encoding and ordinal encoding. The results show that mass valuation results vary depending on how the categorical variables are encoded. The regression algorithms used in the study respond differently to the encoding techniques. Nevertheless, one-hot encoding proved to be the best choice. The practical implications of the study relate to the reform of property taxation in Poland. Under this reform, property values would become the basis for taxation. This will be a complex undertaking, requiring the testing of various computational techniques to accurately determine the value of an enormous number of properties. The machine learning techniques presented in the study could form part of a decision support system for introducing the new method of property taxation.
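To make the difference between encoding techniques concrete, the following is a minimal sketch of three of the five techniques named above (one-hot, ordinal and target encoding), written in plain Python. It is not the code used in the study. The feature name `neighbourhood`, the price values and all function names are hypothetical, and the target encoder is deliberately simplified to a plain per-category mean without the smoothing that library implementations usually apply. Helmert and CatBoost encoding are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration only:
# one categorical feature and the corresponding sale prices.
neighbourhood = ["good", "average", "poor", "good", "average"]
price = [300.0, 200.0, 100.0, 340.0, 220.0]


def one_hot_encode(values):
    """Return one 0/1 indicator column per category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal_encode(values, order):
    """Map each category to its rank in an explicitly supplied order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]


def target_encode(values, targets):
    """Replace each category with the plain mean target of that category.

    Simplification (assumption of this sketch): no smoothing and no
    out-of-fold estimation, so the encoding leaks the target when it is
    computed on the same rows that are later used for training.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


one_hot = one_hot_encode(neighbourhood)
ordinal = ordinal_encode(neighbourhood, order=["poor", "average", "good"])
target = target_encode(neighbourhood, price)
```

One-hot encoding produces one column per category and imposes no order, ordinal encoding compresses the feature into a single ranked column, and target encoding replaces each category with a statistic of the dependent variable. A regression algorithm therefore sees quite different inputs for the same property description, which is consistent with the abstract's observation that valuation results depend on the encoding.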

Title
Impact of categorical variables encoding on property mass valuation
Creator
Gnat Sebastian ORCID 0000-0003-0310-4254
Keywords
categorical variables encoding; property mass valuation; valuation accuracy
Keywords (Polish)
kodowanie zmiennych jakościowych; masowa wycena nieruchomości; trafność wyceny
Date
2021
Resource type
article
Resource identifier
DOI 10.1016/j.procs.2021.09.127
Source
Procedia Computer Science, 2021, vol. 192, pp. 3542-3550
Language
English
Copyright
CC BY-NC-ND
Scientific discipline
Social sciences; Economics and finance
Categories
Publications by US staff
Date made available
29 Nov 2022, 12:33:35
Date modified
29 Nov 2022, 12:33:35
Access
Public