Data Mining Assignment


Part 1 – 10 marks

There are many large datasets available at:

Choose TWO large datasets and analyse them with pivot tables. Document the insights and trends that you find during the analysis. Submit a Word document that includes, FOR EACH DATASET:

1. The URL of the dataset.

2. A description of the dataset.

3. A screen capture showing the first page of the Excel spreadsheet containing the dataset.

4. Screen captures of TWO DIFFERENT pivot tables built on the dataset, together with any graphs produced from them.

5. A clear analysis of your findings.

6. Focus on the insight you are trying to gain: try differentiating dimensions from facts in the dataset. You will usually have dimensions as the rows (time, location, product type) and facts in the centre (revenue, cost, etc.).
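To illustrate the dimension/fact distinction in point 6, here is a minimal sketch in Python of what a SUM pivot table computes. The records, the field names (region, quarter, revenue) and the pivot helper are all hypothetical and invented for illustration; your own datasets will have different columns, and Excel does this work for you.

```python
from collections import defaultdict

# Hypothetical sales records (assumption, not from any real dataset):
# "region" and "quarter" are dimensions, "revenue" is the fact.
rows = [
    {"region": "North", "quarter": "Q1", "revenue": 120.0},
    {"region": "North", "quarter": "Q2", "revenue": 150.0},
    {"region": "South", "quarter": "Q1", "revenue": 90.0},
    {"region": "South", "quarter": "Q1", "revenue": 30.0},
    {"region": "South", "quarter": "Q2", "revenue": 110.0},
]


def pivot(records, row_dim, col_dim, fact):
    """Sum `fact` for every (row_dim, col_dim) pair, like a SUM pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_dim]][rec[col_dim]] += rec[fact]
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {r: dict(cols) for r, cols in table.items()}


result = pivot(rows, "region", "quarter", "revenue")
for region, cols in sorted(result.items()):
    print(region, cols)
```

The dimensions end up as the row and column labels, and the fact is the number aggregated in each cell. Swapping row_dim and col_dim, or choosing a different dimension, gives a different view of the same fact.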

Part 2 – 10 marks

Write a 1,200-word (I am not going to count them) technical report (in MS Word), complete with proper referencing, from the position of a professional business analyst, to address the following:

(a) Discuss the important features of data mining tools; and

(b) Discuss how data mining can realise the value of a data warehouse.

This exercise is from questions 18.13 and 18.14 on page 460 of the textbook.

Part 3 – 5 marks

The four “V”s of big data are volume, variety, velocity and veracity, which reflect, respectively, the amount of data, the different types of data, the speed with which it is collected, and the uncertainty relating to its truth.

You are a large department store thinking about using big data to understand your customers better.

Draw an entity relationship model containing key attributes from a customer’s internet browsing activity, your transaction sales database, social media activity and publicly available demographic data on your site (or any other interesting sources, such as CCTV or telephone records). The title of the diagram should state the purpose of the model, that is, what you are trying to achieve.
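If it helps to think about entities and relationships before drawing, the fragment below is a rough sketch in Python of how two sources might hang off a customer entity. Every entity name, attribute and the total_spend helper is a hypothetical placeholder, not a model answer; your diagram should cover more sources and reflect your own stated purpose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    transaction_id: int
    customer_id: int  # foreign key to Customer (hypothetical attribute)
    amount: float


@dataclass
class BrowsingEvent:
    event_id: int
    customer_id: int  # foreign key to Customer (hypothetical attribute)
    page_url: str


@dataclass
class Customer:
    customer_id: int
    postcode: str  # assumed link to publicly available demographic data
    # One customer has many transactions and many browsing events.
    transactions: List[Transaction] = field(default_factory=list)
    browsing_events: List[BrowsingEvent] = field(default_factory=list)


def total_spend(customer: Customer) -> float:
    """Sum the amounts of all transactions linked to one customer."""
    return sum(t.amount for t in customer.transactions)


# Made-up example values, for illustration only.
alice = Customer(customer_id=1, postcode="0000")
alice.transactions.append(Transaction(10, 1, 59.5))
alice.transactions.append(Transaction(11, 1, 20.5))
alice.browsing_events.append(BrowsingEvent(100, 1, "/shoes"))
print(total_spend(alice))
```

Each class corresponds to an entity box, each field to an attribute, and the customer_id fields to the one-to-many relationship lines you would draw between the boxes.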