Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics
Author: Atanu Biswas
With the increasing use of biotechnology in medical research and the sophisticated advances in computing, it has become essential for practitioners in the biomedical sciences to be fully educated on the role statistics plays in ensuring the accurate analysis of research findings. Statistical Advances in the Biomedical Sciences explores the growing value of statistical knowledge in the management and comprehension of medical research and, more specifically, provides an accessible introduction to the contemporary methodologies used to understand complex problems in the four major areas of modern-day biomedical science: clinical trials, epidemiology, survival analysis, and bioinformatics.
Composed of contributions from eminent researchers in the field, this volume discusses the application of statistical techniques to various aspects of modern medical research and illustrates how these methods ultimately prove to be an indispensable part of proper data collection and analysis. A structural uniformity is maintained across all chapters: each begins with an introduction that discusses general concepts and the biomedical problem under focus, followed by specific details on the associated methods, algorithms, and applications. In addition, each chapter provides a summary of the main ideas and offers a concluding remarks section that presents novel ideas, approaches, and challenges for future research.
Doody Review Services
Reviewer: James C. Torner, MS, PhD (University of Iowa College of Public Health)
Description: This contemporary compilation of chapters on the main areas of biomedical research provides an overview of the state of analytical methods currently used and offers insights on future methods.
Purpose: This book is designed to highlight major contributions to methodology in topical areas, providing insight into methods and methodological issues. It does not provide mathematical details, which are not necessary given that the chapters are well referenced. It provides an excellent overview of contemporary statistical methods, meeting the authors' objectives of providing a compilation of key methods in biomedical science research.
Audience: This book is intended for advanced students or researchers. A basic understanding of common statistical methods is needed. The chapters are organized by subject matter so readers can find the methodology appropriate to their area. The authors for individual chapters are recognized contributors to the field.
Features: Areas covered include clinical trial design, epidemiology modeling, survival analysis, genetic analysis, bioinformatics, and other modeling topics. The book also focuses on contemporary methods. The outstanding organization and coverage are the best part of the book. The inconsistent use of examples is a shortcoming.
Assessment: This will be one of the landmark books that define the state of methods. It will provide an introduction and reference for advanced methods and will be a launching pad for future method development. It is challenging for the reader and may take several attempts and literature searches to use the methods, but the book lays the foundation. This is an impressive collection of topical methods.
Web Data Mining
Author: Bing Liu
Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques.
Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text.
The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
Table of Contents:
Introduction
What is the World Wide Web?
A Brief History of the Web and the Internet
Web Data Mining
What is Data Mining?
What is Web Mining?
Summary of Chapters
How to Read this Book
Bibliographic Notes
Data Mining Foundations
Association Rules and Sequential Patterns
Basic Concepts of Association Rules
Apriori Algorithm
Frequent Itemset Generation
Association Rule Generation
Data Formats for Association Rule Mining
Mining with Multiple Minimum Supports
Extended Model
Mining Algorithm
Rule Generation
Mining Class Association Rules
Problem Definition
Mining Algorithm
Mining with Multiple Minimum Supports
Basic Concepts of Sequential Patterns
Mining Sequential Patterns Based on GSP
GSP Algorithm
Mining with Multiple Minimum Supports
Mining Sequential Patterns Based on PrefixSpan
PrefixSpan Algorithm
Mining with Multiple Minimum Supports
Generating Rules from Sequential Patterns
Sequential Rules
Label Sequential Rules
Class Sequential Rules
Bibliographic Notes
Supervised Learning
Basic Concepts
Decision Tree Induction
Learning Algorithm
Impurity Function
Handling of Continuous Attributes
Some Other Issues
Classifier Evaluation
Evaluation Methods
Precision, Recall, F-score and Breakeven Point
Rule Induction
Sequential Covering
Rule Learning: Learn-One-Rule Function
Discussion
Classification Based on Associations
Classification Using Class Association Rules
Class-Association Rules as Features
Classification Using Normal Association Rules
Naive Bayesian Classification
Naive Bayesian Text Classification
Probabilistic Framework
Naive Bayesian Model
Discussion
Support Vector Machines
Linear SVM: Separable Case
Linear SVM: Non-Separable Case
Nonlinear SVM: Kernel Functions
K-Nearest Neighbor Learning
Ensemble of Classifiers
Bagging
Boosting
Bibliographic Notes
Unsupervised Learning
Basic Concepts
K-means Clustering
K-means Algorithm
Disk Version of the K-means Algorithm
Strengths and Weaknesses
Representation of Clusters
Common Ways of Representing Clusters
Clusters of Arbitrary Shapes
Hierarchical Clustering
Single-Link Method
Complete-Link Method
Average-Link Method
Strengths and Weaknesses
Distance Functions
Numeric Attributes
Binary and Nominal Attributes
Text Documents
Data Standardization
Handling of Mixed Attributes
Which Clustering Algorithm to Use?
Cluster Evaluation
Discovering Holes and Data Regions
Bibliographic Notes
Partially Supervised Learning
Learning from Labeled and Unlabeled Examples
EM Algorithm with Naive Bayesian Classification
Co-Training
Self-Training
Transductive Support Vector Machines
Graph-Based Methods
Discussion
Learning from Positive and Unlabeled Examples
Applications of PU Learning
Theoretical Foundation
Building Classifiers: Two-Step Approach
Building Classifiers: Direct Approach
Discussion
Derivation of EM for Naive Bayesian Classification
Bibliographic Notes
Web Mining
Information Retrieval and Web Search
Basic Concepts of Information Retrieval
Information Retrieval Models
Boolean Model
Vector Space Model
Statistical Language Model
Relevance Feedback
Evaluation Measures
Text and Web Page Pre-Processing
Stopword Removal
Stemming
Other Pre-Processing Tasks for Text
Web Page Pre-Processing
Duplicate Detection
Inverted Index and Its Compression
Inverted Index
Search Using an Inverted Index
Index Construction
Index Compression
Latent Semantic Indexing
Singular Value Decomposition
Query and Retrieval
An Example
Discussion
Web Search
Meta-Search: Combining Multiple Rankings
Combination Using Similarity Scores
Combination Using Rank Positions
Web Spamming
Content Spamming
Link Spamming
Hiding Techniques
Combating Spam
Bibliographic Notes
Link Analysis
Social Network Analysis
Centrality
Prestige
Co-Citation and Bibliographic Coupling
Co-Citation
Bibliographic Coupling
PageRank
PageRank Algorithm
Strengths and Weaknesses of PageRank
Timed PageRank
HITS
HITS Algorithm
Finding Other Eigenvectors
Relationships with Co-Citation and Bibliographic Coupling
Strengths and Weaknesses of HITS
Community Discovery
Problem Definition
Bipartite Core Communities
Maximum Flow Communities
Email Communities Based on Betweenness
Overlapping Communities of Named Entities
Bibliographic Notes
Web Crawling
A Basic Crawler Algorithm
Breadth-First Crawlers
Preferential Crawlers
Implementation Issues
Fetching
Parsing
Stopword Removal and Stemming
Link Extraction and Canonicalization
Spider Traps
Page Repository
Concurrency
Universal Crawlers
Scalability
Coverage vs Freshness vs Importance
Focused Crawlers
Topical Crawlers
Topical Locality and Cues
Best-First Variations
Adaptation
Evaluation
Crawler Ethics and Conflicts
Some New Developments
Bibliographic Notes
Structured Data Extraction: Wrapper Generation
Preliminaries
Two Types of Data Rich Pages
Data Model
HTML Mark-Up Encoding of Data Instances
Wrapper Induction
Extraction from a Page
Learning Extraction Rules
Identifying Informative Examples
Wrapper Maintenance
Instance-Based Wrapper Learning
Automatic Wrapper Generation: Problems
Two Extraction Problems
Patterns as Regular Expressions
String Matching and Tree Matching
String Edit Distance
Tree Matching
Multiple Alignment
Center Star Method
Partial Tree Alignment
Building DOM Trees
Extraction Based on a Single List Page: Flat Data Records
Two Observations about Data Records
Mining Data Regions
Identifying Data Records in Data Regions
Data Item Alignment and Extraction
Making Use of Visual Information
Some Other Techniques
Extraction Based on a Single List Page: Nested Data Records
Extraction Based on Multiple Pages
Using Techniques in Previous Sections
RoadRunner Algorithm
Some Other Issues
Extraction from Other Pages
Disjunction or Optional
A Set Type or a Tuple Type
Labeling and Integration
Domain Specific Extraction
Discussion
Bibliographic Notes
Information Integration
Introduction to Schema Matching
Pre-Processing for Schema Matching
Schema-Level Match
Linguistic Approaches
Constraint Based Approaches
Domain and Instance-Level Matching
Combining Similarities
1:m Match
Some Other Issues
Reuse of Previous Match Results
Matching a Large Number of Schemas
Schema Match Results
User Interactions
Integration of Web Query Interfaces
A Clustering Based Approach
A Correlation Based Approach
An Instance Based Approach
Constructing a Unified Global Query Interface
Structural Appropriateness and the Merge Algorithm
Lexical Appropriateness
Instance Appropriateness
Bibliographic Notes
Opinion Mining
Sentiment Classification
Classification Based on Sentiment Phrases
Classification Using Text Classification Methods
Classification Using a Score Function
Feature-Based Opinion Mining and Summarization
Problem Definition
Object Feature Extraction
Feature Extraction from Pros and Cons of Format 1
Feature Extraction from Reviews of Formats 2 and 3
Opinion Orientation Classification
Comparative Sentence and Relation Mining
Problem Definition
Identification of Gradable Comparative Sentences
Extraction of Comparative Relations
Opinion Search
Opinion Spam
Objectives and Actions of Opinion Spamming
Types of Spam and Spammers
Hiding Techniques
Spam Detection
Bibliographic Notes
Web Usage Mining
Data Collection and Pre-Processing
Sources and Types of Data
Key Elements of Web Usage Data Pre-Processing
Data Modeling for Web Usage Mining
Discovery and Analysis of Web Usage Patterns
Session and Visitor Analysis
Cluster Analysis and Visitor Segmentation
Association and Correlation Analysis
Analysis of Sequential and Navigational Patterns
Classification and Prediction Based on Web User Transactions
Discussion and Outlook
Bibliographic Notes
References
Index