Nt1330 Unit 5 Final Report

Hi all,

Sorry for the late update. I didn't have much else to do last week, so I spent it improving what was already done. Here is a brief summary of that work.

I explained earlier why perfect grouping is not possible: the data makes sense computationally, so clustering produces clusters that are clean by its own measure, but those clusters are not necessarily good groups. That's why we run classification after the initial clustering step: once the groups have been manually corrected, the classifier gets much better at predicting them. This works, but it was also my biggest pain point, because I wanted to reduce the manual work and still get near-optimal groups. I dug into this some more, but it didn't help much. I came across this question on Stack Exchange; the problem and the suggested approach are fairly similar to what we're doing. I did make some changes to the preprocessing, though, and we're getting better results than before. I also know of one particular heuristic that should give even better results; I've described it further down in this mail.
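To make the cluster, correct, classify loop concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the sample product names, and especially the token-overlap classifier, which is only a stand-in for whatever classifier we actually use.

```python
from collections import Counter, defaultdict


def tokenize(name):
    """Split a product name into lowercase word tokens."""
    return name.lower().split()


def apply_corrections(cluster_groups, corrections):
    """Overlay manual fixes on the groups that clustering produced."""
    labels = dict(cluster_groups)
    labels.update(corrections)
    return labels


def train(labels):
    """Build one bag-of-words profile per group from the corrected labels."""
    profiles = defaultdict(Counter)
    for product, group in labels.items():
        profiles[group].update(tokenize(product))
    return profiles


def predict(profiles, product):
    """Pick the group whose profile shares the most tokens with the product."""
    tokens = tokenize(product)
    return max(profiles, key=lambda g: sum(profiles[g][t] for t in tokens))


# Hypothetical clustering output: "apple chips" landed with the fruit.
cluster_groups = {
    "red apple": "g1",
    "green apple": "g1",
    "apple chips": "g1",
    "potato chips": "g2",
}
# One manual correction moves it to the snack group.
corrections = {"apple chips": "g2"}

profiles = train(apply_corrections(cluster_groups, corrections))
print(predict(profiles, "corn chips"))
```

The point of the sketch is only the order of the steps: the classifier is trained on the corrected labels, not on the raw clusters, so each manual fix carries over to new products.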

I made some changes there. There's a route, /generate/, that generates the groups. It needs no parameters for standard generation, in which I generate n upper-level groups, where n is 10% of the number of products in that category. Calling this route saves all the generated data in the generated_data folder. The standard way to choose the number of clusters is the elbow method, but I'm not using it: it optimizes n based on cluster distances in the feature matrix, which gives a good n for clusters but not for the groups we expect, and results in further division of all the upper-level groups. However, the route can also take the expected number of groups per category if you provide a JSON file with that data, for example {"produce": 120, "snacks": 300}. I added this to keep the data generation process flexible; it is not required to run /generate/.
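As a rough illustration of how the number of groups per category is chosen, here is a small Python sketch. The function name, the rounding (up, with a minimum of one group), and the sample counts are my assumptions for this example, not the actual code behind /generate/.

```python
import json


def target_group_counts(product_counts, overrides=None, percent=10):
    """Return the number of upper-level groups to generate per category.

    Default: `percent` of the category's product count, rounded up,
    with at least one group (the rounding rule is an assumption).
    `overrides` maps category -> expected number of groups and wins
    over the default for the categories it names.
    """
    counts = {
        category: max(1, (n * percent + 99) // 100)  # integer ceiling
        for category, n in product_counts.items()
    }
    if overrides:
        counts.update(overrides)
    return counts


# Hypothetical product counts per category.
product_counts = {"produce": 1500, "snacks": 2400, "dairy": 45}

# Same shape as the optional JSON file; only listed categories change.
overrides = json.loads('{"produce": 120, "snacks": 300}')

print(target_group_counts(product_counts))
print(target_group_counts(product_counts, overrides))
```

With these sample numbers the default gives 150, 240 and 5 groups, and the override replaces only the produce and snacks values, leaving dairy on the 10% rule.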
