Share Pdf : Introduction To Data Mining Tu E

## Transcription

University of Jyväskylä, Department of Mathematical Information Technology

TIES443 Introduction to DM, Lecture 1: Course Overview and Introduction

Course overview:
- Lectures: 15 × 2 = 30 hours. Wed 8:15-10:00, Thu 12:15-14:00, Fri 10:15-12:00, all in Ag Beeta.
- Nov 17, 12:00-14:00: Sami's public examination of PhD, after the [...]
- Tutorials, each followed by an assignment: 5 × 2 = 10 hours. Tue 14:15-16:00 in Ag B212.2 (Mountains), but in week 50 on Wed 8:15-10:00.
- Seminar: 2 hours, Ag B212.2 (Mountains). A 5-10 min presentation by each student about the final assignment.
- Final assignment (no final exam): to be sent to mpechen@cs.jyu.fi and samiayr@mit.jyu.fi by the end of Jan 07. Always use the TIES443 keyword in the subject field.

Credits, passing the course, grading:
- 5 ECTS (3 ov).
- Five assignments plus the final assignment: 5 × 5 + 3 × 5 = 40 points max.
- A 1-2 page report for each of the five assignments should be submitted by e-mail to mpechen@cs.jyu.fi and samiayr@mit.jyu.fi within a week of the day of the assignment.
- The report on the final assignment should be submitted by the end of Jan 07.
- I will tell you more during the first lab.

Communication outside the classes:
- ties443@korppi.jyu.fi
- Appointment by sending a request to mpechen@cs.jyu.fi or samiayr@mit.jyu.fi.

Course contents, BI/DW part:
- Introduction to introduction
  - Basic definitions, DM and KDD, history of DM
  - Motivation for DM, reference disciplines, DM community
  - Major DM tasks and applications: prediction, knowledge discovery
  - Major issues in DM
- Introduction to Business Intelligence
  - DM in the BI context, DM myths, OLAP vs DM
- Introduction to Data Warehousing
  - DW architecture, design, implementation
  - Data cubes, OLAP operations

Course contents, DM part:
- DM input and output
  - Input: concepts, instances, attributes. What is a concept, an example, an attribute?
  - Output: knowledge representation. Decision tables, trees and rules, relations, CBR.
- DM techniques
  - Data preparation: cleaning, missing values, transformation, curse of dimensionality
  - Clustering, classification, associations, visualization (the largest part of the course)
- DM evaluation and credibility
  - Predicting performance: train, test and validation sets, cross-validation, unbalanced data
  - Comparing data mining schemes: ROC, cost-sensitive learning, Occam's razor, parameter tuning
- DM/KDD process: iterative, interactive
- DM miscellaneous issues: privacy, ethics, distributed DM

Course contents, tutorials:
- Prototyping DM techniques and solutions: WEKA and YALE open source software, MATLAB environment
- Mining time series data: review of basic techniques
- Mining image data: review of basic techniques
- Mining text data: review of basic techniques, ExtMiner (Miika Nurminen)
- An assignment will follow each tutorial: application of DM to benchmark and real-world data

References:
- Witten, I., Frank, E. (2000). Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco. (book, software page)
- Crawford, D. (1996). Special Issue on Data Mining. Communications of the ACM, Volume 39, Number 11, November 1996.
- Reinartz, T. (1999). Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.
- Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, 550 pages, ISBN 1-55860-489-8. (ppt slides to the [...])
- Data Mining: A Practitioner's Approach. ELCA Informatique SA, 2001.
- CRISP-DM 1.0: Step-by-step data mining guide. SPSS Inc.
- Check the TIES443 homepage for more. It will be updated regularly.

More on DM and KDD: KDnuggets.com
- News, Publications
- Software, Solutions
- Courses, Meetings, Education
- Publications, Websites, Datasets
- Companies, Jobs

Special acknowledgments, for many adopted or adapted ppt slides used in the [...]:
- Piatetsky-Shapiro, KDnuggets
- Witten and Frank's book, http://www.cs.waikato.ac.nz/~ml/weka/book.html
- Eamonn Keogh, http://www.cs.ucr.edu/~eamonn
- Han's DM book, http://www.cs.sfu.ca/~han/dmbook
- and many, many DM-related courses available on the www

Topics for this week:
- Introduction to the DM field: definitions, motivation, brief history, DM tasks and application examples
- Business Intelligence: DM in the BI context, DM myths, OLAP vs DM
- Data Warehousing (DW): DW architecture, design, implementation; data cubes, OLAP operations

Topics for today: What is Data Mining?
- Basic definitions, DM and KDD, history of DM
- Motivation for DM, reference disciplines, DM community
- Major DM tasks and applications: prediction, knowledge discovery
- Major issues in DM

What is Data Mining?
- Data mining (knowledge discovery in databases): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
- "The process of selecting, exploring and modeling large amounts of data to uncover previously unknown patterns for a business advantage" (SAS).
- "Data mining is an area in the intersection of machine learning, statistics and databases" (Holsheimer et al.).

Alternative names and their "inside stories":
- Data mining: a misnomer?
- Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

What is not data mining?
- (Deductive) query processing
- Expert systems or small ML/statistical programs

Data Mining in the BI context:
- Data mining is a business-driven process, supported by adequate tools, aimed at the discovery and consistent use of meaningful, profitable knowledge from corporate data.
- A kind of operationalization of machine learning, with emphasis on process and actions.
- Hand (2000): "Data mining is the process of seeking interesting or valuable information in large databases."
- Large commercial databases could be used to increase profitability by pinpointing target classes of clients and pointing clients toward desirable options.
- The result: large data mining programs sold by SAS, SPSS, etc.

Motivation for DM

Motivation: "Necessity is the mother of invention"
- Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Trends leading to data flood. More data is [...]:
- Bank, telecom, other business transactions
- Scientific data: astronomy, biology, etc.
- Web, text and e-[...]

Big Data examples:
- Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session. Storage and analysis are a big problem.
- AT&T handles billions of calls per day: so much data that it cannot all be stored, and analysis has to be done "on the fly", on streaming data.

Largest databases in 2003:
- Commercial databases (Winter Corp. 2003 Survey): France Telecom has the largest decision-support DB, 30 TB; AT&T, 26 TB
- Alexa internet archive: 7 years of data, 500 TB
- Google searches 4 billion pages, many hundreds of TB
- IBM WebFountain: 160 TB (2003)
- Internet Archive (www.archive.org): 300 TB

Growth trends:
- Moore's law: computer speed doubles every 18 months
- Storage law: total storage doubles every 9 months
- Exabytes (million terabytes) of new data are created every year; huge DBs: telecom (AT&T), astronomy
- Consequence: very little data will ever be looked at by a human; data flood, information overload
- DM/KDD is needed to make sense and use of data

Largest database data-mined (June 2006):
- Data miners are tackling much larger databases in 2006. The median value for the largest database size is between 1.1 and 10 Gigabytes, and 12% report mining terabyte-size databases.
- http://www.kdnuggets.com/polls/2006/largest_database_mined.htm

Data mining reference disciplines (diagram): data mining at the center, surrounded by statistics, technology, visualization, information science and other disciplines.
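As a rough check of the figures quoted on the Big Data examples and Growth trends slides, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, 1 Gigabit is assumed to mean 10^9 bits, and every telescope is assumed to stream continuously for the whole session, neither of which the slides state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def vlbi_session_bytes(telescopes=16, gbit_per_s=1, days=25):
    """Bytes produced in one session (assumes 1 Gbit = 10**9 bits, continuous streaming)."""
    bits = telescopes * gbit_per_s * 10**9 * days * SECONDS_PER_DAY
    return bits // 8


def growth_factor(months, doubling_months):
    """Multiplicative growth after `months` if the quantity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# Data volume of one 25-day VLBI session, in petabytes (10**15 bytes).
print(f"VLBI session: {vlbi_session_bytes() / 1e15:.2f} PB")

# Three years of growth: computer speed (doubling every 18 months)
# versus total storage (doubling every 9 months).
print("speed growth over 36 months:  ", growth_factor(36, 18))
print("storage growth over 36 months:", growth_factor(36, 9))
```

Under these assumptions, one session produces several petabytes, and storage grows by the square of the speed growth over any period, which is the gap the slides point to as the reason very little data will ever be looked at by a human.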

Brief History of Time

Brief history:
- 1800: statistics starts. Benjamin Disraeli, later quoted by Mark Twain, said: "There are three kinds of lies: lies, damned lies, and statistics." And now comes data mining.
- 1985: machine learning starts
- 1990: data mining starts

False Positives in Astronomy (cartoon used with permission, copyright 2003 KDnuggets)

Evolution of database technology:
- Data collection, database creation, IMS and network DBMS
- Relational data model, relational DBMS implementation
- RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s-2000s: data mining and data warehousing, multimedia databases, and Web databases

Many names of data mining:
- Data fishing, data dredging (1960): used by statisticians (as a bad name)
- Data mining (1990): used by DB and business; in 2003, bad image because of TIA
- Knowledge discovery in databases (1989): used by the AI and machine learning community
- Also data archaeology, information harvesting, information discovery, knowledge extraction
- Currently, "data mining" and "knowledge discovery" are used interchangeably

A brief history of the data mining society:
- 1989: IJCAI Workshop on KDD (Piatetsky-Shapiro)
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994: Workshops on KDD
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, 1996)
- 1995-1998: International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998: ACM SIGKDD, SIGKDD'1999-2001 conferences, and SIGKDD Explorations