Natural Language Processing with Python
Steven Bird, Ewan Klein, and Edward Loper
Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo
Natural Language Processing with Python
by Steven Bird, Ewan Klein, and Edward Loper
Copyright © 2009 Steven Bird, Ewan Klein, and Edward Loper. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Julie Steele
Production Editor: Loranah Dimant
Copyeditor: Genevieve d’Entremont
Proofreader: Loranah Dimant
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano
Printing History:
June 2009: First Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc.
Natural Language Processing with Python, the image of a right whale, and related
trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
ISBN: 978-0-596-51649-9
Table of Contents
Preface
1. Language Processing and Python
1.1 Computing with Language: Texts and Words
1.2 A Closer Look at Python: Texts as Lists of Words
1.3 Computing with Language: Simple Statistics
1.4 Back to Python: Making Decisions and Taking Control
1.5 Automatic Natural Language Understanding
1.6 Summary
1.7 Further Reading
1.8 Exercises
2. Accessing Text Corpora and Lexical Resources
2.1 Accessing Text Corpora
2.2 Conditional Frequency Distributions
2.3 More Python: Reusing Code
2.4 Lexical Resources
2.5 WordNet
2.6 Summary
2.7 Further Reading
2.8 Exercises
3. Processing Raw Text
3.1 Accessing Text from the Web and from Disk
3.2 Strings: Text Processing at the Lowest Level
3.3 Text Processing with Unicode
3.4 Regular Expressions for Detecting Word Patterns
3.5 Useful Applications of Regular Expressions
3.6 Normalizing Text
3.7 Regular Expressions for Tokenizing Text
3.8 Segmentation
3.9 Formatting: From Lists to Strings
3.10 Summary
3.11 Further Reading
3.12 Exercises
4. Writing Structured Programs
4.1 Back to the Basics
4.2 Sequences
4.3 Questions of Style
4.4 Functions: The Foundation of Structured Programming
4.5 Doing More with Functions
4.6 Program Development
4.7 Algorithm Design
4.8 A Sample of Python Libraries
4.9 Summary
4.10 Further Reading
4.11 Exercises
5. Categorizing and Tagging Words
5.1 Using a Tagger
5.2 Tagged Corpora
5.3 Mapping Words to Properties Using Python Dictionaries
5.4 Automatic Tagging
5.5 N-Gram Tagging
5.6 Transformation-Based Tagging
5.7 How to Determine the Category of a Word
5.8 Summary
5.9 Further Reading
5.10 Exercises
6. Learning to Classify Text
6.1 Supervised Classification
6.2 Further Examples of Supervised Classification
6.3 Evaluation
6.4 Decision Trees
6.5 Naive Bayes Classifiers
6.6 Maximum Entropy Classifiers
6.7 Modeling Linguistic Patterns
6.8 Summary
6.9 Further Reading
6.10 Exercises
7. Extracting Information from Text
7.1 Information Extraction
7.2 Chunking
7.3 Developing and Evaluating Chunkers
7.4 Recursion in Linguistic Structure
7.5 Named Entity Recognition
7.6 Relation Extraction
7.7 Summary
7.8 Further Reading
7.9 Exercises
8. Analyzing Sentence Structure
8.1 Some Grammatical Dilemmas
8.2 What’s the Use of Syntax?
8.3 Context-Free Grammar
8.4 Parsing with Context-Free Grammar
8.5 Dependencies and Dependency Grammar
8.6 Grammar Development
8.7 Summary
8.8 Further Reading
8.9 Exercises
9. Building Feature-Based Grammars
9.1 Grammatical Features
9.2 Processing Feature Structures
9.3 Extending a Feature-Based Grammar
9.4 Summary
9.5 Further Reading
9.6 Exercises
10. Analyzing the Meaning of Sentences
10.1 Natural Language Understanding
10.2 Propositional Logic
10.3 First-Order Logic
10.4 The Semantics of English Sentences
10.5 Discourse Semantics
10.6 Summary
10.7 Further Reading
10.8 Exercises
11. Managing Linguistic Data
11.1 Corpus Structure: A Case Study
11.2 The Life Cycle of a Corpus
11.3 Acquiring Data
11.4 Working with XML