
Know Thy Customer (KTC) is a financial consulting company that provides personalized financial advice to its clients. As a basis for developing this tailored advising, KTC would like to segment its customers into several representative groups based on key characteristics. Peyton Blake, the director of KTC’s fledgling analytics division, plans to establish the set of representative customer profiles based on 600 customer records in the file KnowThyCustomer. Each customer record contains data on age, gender, annual income, marital status, number of children, whether the customer has a car loan, and whether the customer has a home mortgage. KTC’s market research staff has determined that these seven characteristics should form the basis of the customer clustering.

Peyton has invited a summer intern, Danny Riles, into her office so they can discuss how to proceed. As they review the data on the computer screen, Peyton’s brow furrows as she realizes that this task may not be trivial. The data contains both categorical variables (Female, Married, Car, and Mortgage) and numerical variables (Age, Income, and Children).

Link to Dataset below:

https://college.cengage.com/nextbook/business/camm_180959/student/data_files/chapter_05/knowthycustomer.xlsx

  1. Using Manhattan distance to compute dissimilarity between observations, apply hierarchical clustering on all seven variables, experimenting with using complete linkage and group average linkage. Normalize the values of the input variables. Recommend a set of customer profiles (clusters). Describe these clusters according to their “average” characteristics. Why might hierarchical clustering not be a good method to use for these seven variables?

  2. Apply a two-step approach:

    1. Using matching distance to compute dissimilarity between observations, employ hierarchical clustering with group average linkage to produce four clusters using the variables Female, Married, Loan, and Mortgage.

    2. Based on the four clusters from part (a), split the original 600 observations into four separate data sets. For each of these four data sets, apply k-means clustering with k = 2 using Age, Income, and Children as variables. Normalize the values of the input variables. This will generate a total of eight clusters. Describe these eight clusters according to their “average” characteristics. What benefit does this two-step clustering approach have over just using hierarchical clustering on all seven variables as in part (1) or just using k-means clustering on all seven variables? What weakness does it have?
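
For part (1), one possible way to set up the computation is sketched below with SciPy. This is only an illustration under stated assumptions: the records are randomly generated placeholders rather than the KnowThyCustomer data, the column order (Age, Female, Income, Married, Children, Loan, Mortgage) is assumed, z-scores are used as the normalization, and the choice of four clusters is arbitrary.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the 600 KTC records (NOT the real data).
# Assumed column order: Age, Female, Income, Married, Children, Loan, Mortgage.
rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.integers(18, 70, n),       # Age
    rng.integers(0, 2, n),         # Female (0/1)
    rng.normal(30000, 10000, n),   # Income
    rng.integers(0, 2, n),         # Married (0/1)
    rng.integers(0, 4, n),         # Children
    rng.integers(0, 2, n),         # Loan (0/1)
    rng.integers(0, 2, n),         # Mortgage (0/1)
]).astype(float)

# Normalize every variable to z-scores so that no single scale dominates.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Pairwise Manhattan (city-block) distances, in SciPy's condensed form.
D = pdist(Z, metric="cityblock")


def cluster_profiles(method, k):
    """Cut the tree built with the given linkage into at most k clusters
    and return the labels plus each cluster's mean on the original scale."""
    tree = linkage(D, method=method)
    labels = fcluster(tree, t=k, criterion="maxclust")
    profiles = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    return labels, profiles


# k = 4 is an arbitrary illustrative choice, not a recommendation.
labels_complete, prof_complete = cluster_profiles("complete", 4)
labels_average, prof_average = cluster_profiles("average", 4)

for c, means in prof_complete.items():
    print(c, int((labels_complete == c).sum()), np.round(means, 2))
```

Each printed row gives a cluster label, its size, and its "average" characteristics. Note that the Manhattan distance here treats the normalized 0/1 variables as if they were numeric, which is worth keeping in mind when interpreting the clusters.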

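The two-step approach in part (2) could be sketched along the following lines. Again, this is a hedged illustration: the data are random placeholders, the variable names follow the problem statement, the matching distance is taken to be the share of binary variables on which two records disagree (which is what SciPy's `hamming` metric computes), and `kmeans2` from SciPy is used as one available k-means implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist

# Synthetic stand-in for the 600 KTC records (NOT the real data).
rng = np.random.default_rng(1)
n = 600
binary = rng.integers(0, 2, size=(n, 4))   # Female, Married, Loan, Mortgage
numeric = np.column_stack([
    rng.integers(18, 70, n),               # Age
    rng.normal(30000, 10000, n),           # Income
    rng.integers(0, 4, n),                 # Children
]).astype(float)

# Step (a): matching distance on the binary variables, then hierarchical
# clustering with group average linkage, cut into at most four clusters.
D = pdist(binary, metric="hamming")
tree = linkage(D, method="average")
groups = fcluster(tree, t=4, criterion="maxclust")

# Step (b): within each group, k-means with k = 2 on normalized
# Age, Income, and Children.
final = np.zeros(n, dtype=int)
profiles = {}
for g in np.unique(groups):
    idx = np.flatnonzero(groups == g)
    sub = numeric[idx]
    std = sub.std(axis=0)
    std[std == 0] = 1.0                    # guard against constant columns
    z = (sub - sub.mean(axis=0)) / std     # normalize within the subset
    _, km = kmeans2(z, 2, minit="++", seed=0)
    final[idx] = (g - 1) * 2 + km          # combined label in 0..7
    for j in np.unique(km):
        members = idx[km == j]
        # Profile: means of the four binary and three numeric variables.
        profiles[(int(g), int(j))] = np.concatenate(
            [binary[members].mean(axis=0), numeric[members].mean(axis=0)]
        )

for key, means in sorted(profiles.items()):
    print(key, np.round(means, 2))
```

Each key pairs the step (a) group with the k-means cluster inside it, so up to eight profiles are printed, each listing the mean of Female, Married, Loan, Mortgage, Age, Income, and Children.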