diff --git a/Machine-Learning.html b/Machine-Learning.html index 187b855..43ad6df 100644 --- a/Machine-Learning.html +++ b/Machine-Learning.html @@ -7517,7 +7517,12 @@ a.anchor-link {
-

Machine Learning project in SoSe 2025 at HTW Saar

Idea

The goal of this project is predicting the genre(s) of a game/bundle through its given description(s)

+

Machine Learning project in SoSe 2025 at HTW Saar

Contributors

+

Idea

The goal of this project is to predict the genre(s) of a game or bundle from its description(s)

Dataset

For our project, we use a Steam dataset provided on Moodle, since it contains all the information we plan to use. The dataset has been reduced to 2,000 data points so that it can run on weaker devices.

@@ -8184,10 +8189,10 @@ When comparing these models between datasets, it is evident that a bigger datase
@@ -8298,12 +8303,12 @@ Most important words of class 'Strategy':
  • Hyperparameter validation should also be performed. For example, in LinearSVC, the C parameter controls the regularization strength (smaller values mean stronger regularization) and could be further optimized.

  • -
  • Instead of a simple train-test split, k-fold cross validation should be used to achieve better data mixing and more robust results.

    +
  • Instead of a simple train-test split, k-fold cross-validation should be used, with shuffling that is not tied to a single fixed random_state, to mix the data better, reduce the risk of overfitting to one particular split, and obtain more robust results.

  • -
  • Additionally, ensemble learning methods could be considered to further improve performance.

    +
  • Additionally, ensemble learning methods could further improve performance.

  • -

    The biggest limitation of our dataset is the presence of too many languages but too few entries for each, which is also constrained by our computing resources.

    +

    The biggest limitation of our dataset is that it contains many languages (especially CJK languages, i.e. Chinese, Japanese and Korean) but too few entries for each of them; our limited computing resources also kept us from using more data to compensate.
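The evaluation scheme suggested in the list above (shuffled k-fold splits that are repeated over several seeds instead of one fixed random_state, used to pick C by grid search) can be sketched without any dependencies as follows. This is a minimal sketch under assumptions: the function names, the candidate C values and the `evaluate` callback are hypothetical. In our notebook, `evaluate` would wrap training the actual model (for instance scikit-learn's `LinearSVC(C=C)`) on the training fold and scoring it on the held-out fold. In practice, scikit-learn's `KFold`, `RepeatedKFold` and `GridSearchCV` cover the same ground.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices with `seed` and split them into k folds.

    Returns (train_indices, test_indices) pairs; every index appears
    in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        # All folds except the i-th one form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def grid_search_c(
    n_samples: int,
    c_values: Sequence[float],
    evaluate: Callable[[float, List[int], List[int]], float],
    k: int = 5,
    seeds: Sequence[int] = (0, 1, 2),
) -> Tuple[Optional[float], float]:
    """Return (best_C, mean_score) over k-fold CV repeated for each seed.

    `evaluate(C, train_idx, test_idx)` is a hypothetical callback that
    trains a model with regularization parameter C on train_idx and
    returns a score (higher is better) on test_idx.
    """
    best_c, best_score = None, float("-inf")
    for c in c_values:
        # Each seed gives a different shuffle, so no single split dominates.
        scores = [
            evaluate(c, train, test)
            for seed in seeds
            for train, test in k_fold_indices(n_samples, k, seed)
        ]
        score = mean(scores)
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


if __name__ == "__main__":
    # Toy stand-in for real training: pretend the score peaks at C = 1.0.
    toy_evaluate = lambda c, train, test: -abs(c - 1.0)
    print(grid_search_c(200, [0.01, 0.1, 1.0, 10.0], toy_evaluate))
```

Averaging over several seeds costs k times the number of seeds in training runs per C value, which matters given the long training times described below.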
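One simple ensemble method mentioned above, hard (majority) voting, can be adapted to our multi-label genre prediction by letting each model's predicted genre set count as one vote per genre. This is a minimal sketch; the function name and genre labels are illustrative assumptions, not part of our notebook. For standard classifiers, scikit-learn's `VotingClassifier` offers hard and soft voting.

```python
from collections import Counter
from typing import List, Set


def majority_vote(model_predictions: List[Set[str]]) -> Set[str]:
    """Hard-voting ensemble for multi-label genre prediction.

    Each element is the set of genres one model predicted for the same
    game; a genre is kept if more than half of the models predicted it.
    """
    if not model_predictions:
        raise ValueError("need at least one model's predictions")
    votes = Counter(genre for prediction in model_predictions for genre in prediction)
    threshold = len(model_predictions) / 2
    return {genre for genre, count in votes.items() if count > threshold}


if __name__ == "__main__":
    predictions = [
        {"Action", "Indie"},
        {"Action", "Strategy"},
        {"Action", "Indie", "RPG"},
    ]
    print(sorted(majority_vote(predictions)))  # ['Action', 'Indie']
```

Using an odd number of models avoids ties; soft voting (averaging predicted probabilities per genre) would be the natural next step for models that expose them.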

    @@ -8317,7 +8322,7 @@ Most important words of class 'Strategy':

    Conclusion and outlook

    To conclude, our model performs reasonably well for the intended application. With a larger dataset, the results would likely improve further. Considering the limitations mentioned above, it is quite impressive that the model achieves these results with only a small dataset and limited computational resources.

    Our collaboration as a team worked very smoothly throughout the project. Communication and planning were effective, allowing us to coordinate our tasks efficiently and make steady progress.

    The main challenge we faced was the limited computational resources available to us. Especially when working with the 10k dataset, training the models for statistical evaluation took a considerable amount of time. To address this, each team member ran different models in parallel on their own machines, with some training processes running for several days.

    -

    Due to these computational constraints, we decided not to process the full dataset with 80,000 entries. Even though we had access to very powerful PCs equipped with the latest high-end components, the training times were still prohibitively long. As a result, we focused our efforts on the smaller datasets to ensure we could complete the project within a reasonable timeframe.

    +

    Due to these computational constraints, we decided not to process the full dataset with 80,000 entries. Even though we had access to PCs with mid- to high-range components, the training times were still prohibitively long. As a result, we focused our efforts on the smaller datasets to ensure we could complete the project within a reasonable timeframe.

    In summary, this project provided us with valuable insights into the challenges and opportunities of machine learning in a real-world context. Despite the limitations we faced, we were able to develop a functioning model and gain practical experience in data preprocessing, model selection, and evaluation. We are proud of what we achieved as a team and look forward to applying the knowledge and skills gained here to future projects.

    diff --git a/Machine-Learning.pdf b/Machine-Learning.pdf index 162d270..c7928bc 100644 Binary files a/Machine-Learning.pdf and b/Machine-Learning.pdf differ