Abstract: The soil water characteristic curve (SWCC) is the foundation for studying the permeability, strength, and constitutive relationships of unsaturated soils. Machine learning algorithms can process large amounts of data and extract features efficiently. Six machine learning algorithms (four ensemble learning algorithms and two traditional machine learning algorithms) were used to model 154 SWCCs comprising 1976 data points from the American Unsaturated Soil Database. Their performance was assessed with four evaluation indicators (R², EVS, MAE and RMSE). Two input schemes were compared: pressure head after logarithmic transformation, and untransformed pressure head. The results indicate that the choice of input scheme has minimal effect on the four ensemble algorithms (LightGBM, XGB, RF and AdaBoost). For the two traditional algorithms (GPR and SVM), however, using untransformed pressure head has a significant effect: R² drops sharply, and the model may even fail to reproduce the SWCC. LightGBM outperforms the other models on the SWCC test set, with high trend indicators (R² and EVS) and low error indicators (MAE and RMSE). Ranked by quality of SWCC simulation from best to worst, the six algorithms are LightGBM, GPR, XGB, RF, AdaBoost and SVM. Finally, the LightGBM model trained on this database was used to predict 9 SWCCs not included in the database. The results show that LightGBM can effectively predict the soil water characteristics of unsaturated soils. These findings have important implications for improving SWCC models for different soil types.
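The logarithmic treatment of pressure head and the four evaluation indicators named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of R², explained variance score (EVS), mean absolute error (MAE) and root mean square error (RMSE), and a base-10 logarithm. The function names `log_head` and `evaluate` and the sample values are hypothetical and are not taken from the study.

```python
import math


def log_head(heads):
    """Return log10 of each pressure head (assumed strictly positive)."""
    return [math.log10(h) for h in heads]


def evaluate(y_true, y_pred):
    """Compute R2, EVS, MAE and RMSE using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    ss_res = sum(r ** 2 for r in residuals)              # residual sum of squares
    var_res = sum((r - mean_res) ** 2 for r in residuals)  # residual spread about its mean
    return {
        "R2": 1 - ss_res / ss_tot,
        "EVS": 1 - var_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Hypothetical pressure heads spanning several orders of magnitude.
    heads = [1.0, 10.0, 100.0, 1000.0, 10000.0]
    print(log_head(heads))

    # Hypothetical measured and predicted volumetric water contents.
    measured = [0.40, 0.35, 0.25, 0.15, 0.10]
    predicted = [0.41, 0.34, 0.26, 0.14, 0.11]
    print(evaluate(measured, predicted))
```

The transformation compresses pressure heads that span several orders of magnitude into a narrow range, which is one plausible reason scale-sensitive models such as GPR and SVM respond more strongly to it than tree-based ensembles do. R² and EVS differ only when the residuals have a non-zero mean: a constant offset in the predictions lowers R² but leaves EVS unchanged.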