034_LDA_20NewsGroupsSmall(Scala)

Topic Modeling with Latent Dirichlet Allocation

This is an augmentation of a notebook from the Databricks Guide.
This notebook will provide a brief algorithm summary, links for further reading, and an example of how to use LDA for Topic Modeling.

Algorithm Summary

Intro to LDA by David Blei

Watch at least the first 25 minutes or so of this video by David Blei for a crash introduction to topic modeling via Latent Dirichlet Allocation (LDA).

AJ's What is Collaborative Filtering

Readings for LDA

Also read the methodological and more formal papers cited in the above links if you want to know more.

Let's get a bird's eye view of LDA from http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf next.

  • See pictures (hopefully you read the paper last night!)
  • Algorithm of the generative model (this is unsupervised clustering)
  • For a careful introduction to the topic see Sections 27.3 and 27.4 (pages 950-970) of Murphy's Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
  • We will be quite application focussed or applied here!
  • To understand the Expectation Maximization algorithm, read Section 8.5, The EM Algorithm, in The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (2001, Springer Series in Statistics). A free 21MB PDF of the book is available from https://web.stanford.edu/~hastie/Papers/ESLII.pdf or from its backup at http://lamastex.org/research_events/Readings/StatLearn/ESLII.pdf.
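
The generative model mentioned in the list above can be written out compactly. The following is a sketch in the notation of the Blei (2012) review paper linked above; the symbols are not defined elsewhere in this notebook and are introduced here only for illustration: K topics, D documents, N_d words in document d, topic-word distributions β_k, per-document topic proportions θ_d, per-word topic assignments z_{d,n}, observed words w_{d,n}, and Dirichlet hyperparameters η and α.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \beta_{1:K} &\sim \mathrm{Categorical}(\beta_{z_{d,n}})
\end{align*}

% Joint distribution of hidden and observed variables implied by the steps above
\[
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{k=1}^{K} p(\beta_k)
    \prod_{d=1}^{D} p(\theta_d)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n})
\]
```

Only the words w are observed; topic modeling amounts to inferring the posterior over β, θ and z given w, which is what the Online Variational Bayes and Expectation Maximization steps in the workflow approximate.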

Probabilistic Topic Modeling Example

This is an outline of our Topic Modeling workflow. Feel free to jump to any subtopic to find out more.

  • Step 0. Dataset Review
  • Step 1. Downloading and Loading Data into DBFS
    • (Step 1. only needs to be done once per shard - see details at the end of the notebook for Step 1.)
  • Step 2. Loading the Data and Data Cleaning
  • Step 3. Text Tokenization
  • Step 4. Remove Stopwords
  • Step 5. Vector of Token Counts
  • Step 6. Create LDA model with Online Variational Bayes
  • Step 7. Review Topics
  • Step 8. Model Tuning - Refilter Stopwords
  • Step 9. Create LDA model with Expectation Maximization
  • Step 10. Visualize Results

Step 0. Dataset Review

In this example, we will use the mini 20 Newsgroups dataset, which is a random subset of the original 20 Newsgroups dataset. Each newsgroup is stored in a subdirectory, with each article stored as a separate file.



The following is the markdown file 20newsgroups.data.md of the original details on the dataset, obtained as follows:

$ wget -k http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.data.html
--2016-04-07 10:31:51--  http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.data.html
Resolving kdd.ics.uci.edu (kdd.ics.uci.edu)... 128.195.1.95
Connecting to kdd.ics.uci.edu (kdd.ics.uci.edu)|128.195.1.95|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4371 (4.3K) [text/html]
Saving to: ‘20newsgroups.data.html’

100%[======================================>] 4,371       --.-K/s   in 0s      

2016-04-07 10:31:51 (195 MB/s) - ‘20newsgroups.data.html’ saved [4371/4371]

Converting 20newsgroups.data.html... nothing to do.
Converted 1 files in 0 seconds.

$ pandoc -f html -t markdown 20newsgroups.data.html > 20newsgroups.data.md

20 Newsgroups

Data Type

text

Abstract

This data set consists of 20000 messages taken from 20 newsgroups.

Sources

Original Owner and Donor
Tom Mitchell
School of Computer Science
Carnegie Mellon University
tom.mitchell@cmu.edu

Date Donated: September 9, 1999

Data Characteristics

One thousand Usenet articles were taken from each of the following 20 newsgroups.

    alt.atheism
    comp.graphics
    comp.os.ms-windows.misc
    comp.sys.ibm.pc.hardware
    comp.sys.mac.hardware
    comp.windows.x
    misc.forsale
    rec.autos
    rec.motorcycles
    rec.sport.baseball
    rec.sport.hockey
    sci.crypt
    sci.electronics
    sci.med
    sci.space
    soc.religion.christian
    talk.politics.guns
    talk.politics.mideast
    talk.politics.misc
    talk.religion.misc

Approximately 4% of the articles are crossposted. The articles are typical postings and thus have headers including subject lines, signature files, and quoted portions of other articles.

Data Format

Each newsgroup is stored in a subdirectory, with each article stored as a separate file.

Past Usage

T. Mitchell. Machine Learning, McGraw Hill, 1997.

T. Joachims (1996). A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, Computer Science Technical Report CMU-CS-96-118. Carnegie Mellon University.

Acknowledgements, Copyright Information, and Availability

You may use this material free of charge for any educational purpose, provided attribution is given in any lectures or publications that make use of this material.

References and Further Information

Naive Bayes code for text classification is available from: http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html


The UCI KDD Archive
Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425

Last modified: September 9, 1999



NOTE: The mini dataset consists of 100 articles from each of the following 20 Usenet newsgroups (2000 articles in total):

alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc

Some of the newsgroups seem pretty similar at first glance, such as comp.sys.ibm.pc.hardware and comp.sys.mac.hardware, which may affect our results.

NOTE: A simpler and slicker version of the analysis is available in this notebook:

Step 2. Loading the Data and Data Cleaning

We have already used the wget command to download the dataset and put it in our distributed file system (this process takes about 10 minutes). To repeat these steps, or to download data from another source, follow Step 1. Downloading and Loading Data into DBFS at the bottom of this worksheet.

Let's make sure these files are in DBFS now:

display(dbutils.fs.ls("dbfs:/datasets/mini_newsgroups")) // this is where the data resides in dbfs (see below to download it first, if you go to a new shard!)
path                                                      name                       size
dbfs:/datasets/mini_newsgroups/alt.atheism/               alt.atheism/               0
dbfs:/datasets/mini_newsgroups/comp.graphics/             comp.graphics/             0
dbfs:/datasets/mini_newsgroups/comp.os.ms-windows.misc/   comp.os.ms-windows.misc/   0
dbfs:/datasets/mini_newsgroups/comp.sys.ibm.pc.hardware/  comp.sys.ibm.pc.hardware/  0
dbfs:/datasets/mini_newsgroups/comp.sys.mac.hardware/     comp.sys.mac.hardware/     0
dbfs:/datasets/mini_newsgroups/comp.windows.x/            comp.windows.x/            0
dbfs:/datasets/mini_newsgroups/misc.forsale/              misc.forsale/              0
dbfs:/datasets/mini_newsgroups/rec.autos/                 rec.autos/                 0
dbfs:/datasets/mini_newsgroups/rec.motorcycles/           rec.motorcycles/           0
dbfs:/datasets/mini_newsgroups/rec.sport.baseball/        rec.sport.baseball/        0
dbfs:/datasets/mini_newsgroups/rec.sport.hockey/          rec.sport.hockey/          0
dbfs:/datasets/mini_newsgroups/sci.crypt/                 sci.crypt/                 0
dbfs:/datasets/mini_newsgroups/sci.electronics/           sci.electronics/           0
dbfs:/datasets/mini_newsgroups/sci.med/                   sci.med/                   0
dbfs:/datasets/mini_newsgroups/sci.space/                 sci.space/                 0
dbfs:/datasets/mini_newsgroups/soc.religion.christian/    soc.religion.christian/    0
dbfs:/datasets/mini_newsgroups/talk.politics.guns/        talk.politics.guns/        0
dbfs:/datasets/mini_newsgroups/talk.politics.mideast/     talk.politics.mideast/     0
dbfs:/datasets/mini_newsgroups/talk.politics.misc/        talk.politics.misc/        0
dbfs:/datasets/mini_newsgroups/talk.religion.misc/        talk.religion.misc/        0

Now let us read in the data using wholeTextFiles().

Recall that the wholeTextFiles() command reads in an entire directory of text files and returns an RDD of key-value pairs of the form (filePath, fileContent), one pair per file.

As we do not need the file paths in this example, we will apply a map function to extract the file contents, and then convert everything to lowercase.

// Load text file, leave out file paths, convert all strings to lowercase
val corpus = sc.wholeTextFiles("/datasets/mini_newsgroups/*").map(_._2).map(_.toLowerCase()).cache() // let's cache
corpus: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7860] at map at command-1805207615647603:2
corpus.count // there are 2000 documents in total - this action will take about 2 minutes
res4: Long = 2000
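
The shape of this transformation can be tried out without a cluster. The sketch below mimics it on a local Scala collection; the object name, the paths and the file contents are all made up for illustration, and a plain Seq stands in for the RDD that wholeTextFiles() returns.

```scala
object WholeTextFilesShapeSketch {
  // Made-up (filePath, fileContent) pairs standing in for the output of wholeTextFiles().
  val files: Seq[(String, String)] = Seq(
    ("/hypothetical/group.a/1", "Subject: Hello\n\nBody ONE"),
    ("/hypothetical/group.b/2", "Subject: World\n\nBody TWO")
  )

  // Keep only the second element of each pair (the content), then lowercase it.
  val contents: Seq[String] = files.map(_._2).map(_.toLowerCase)

  def main(args: Array[String]): Unit = contents.foreach(println)
}
```

The same two map steps appear in the cell above; on an RDD they are lazy and only run when an action such as count is called.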

Review the first 5 documents to get a sense of the data format.

corpus.take(5)
res5: Array[String] = Array("xref: cantaloupe.srv.cs.cmu.edu alt.atheism:51121 soc.motss:139944 rec.scouting:5318 newsgroups: alt.atheism,soc.motss,rec.scouting path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!wupost!uunet!newsgate.watson.ibm.com!yktnews.watson.ibm.com!watson!watson.ibm.com!strom from: strom@watson.ibm.com (rob strom) subject: re: [soc.motss, et al.] "princeton axes matching funds for boy scouts" sender: @watson.ibm.com message-id: <1993apr05.180116.43346@watson.ibm.com> date: mon, 05 apr 93 18:01:16 gmt distribution: usa references: <c47efs.3q47@austin.ibm.com> <1993mar22.033150.17345@cbnewsl.cb.att.com> <n4hy.93apr5120934@harder.ccr-p.ida.org> organization: ibm research lines: 15 in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes: |> [1] however, i hate economic terrorism and political correctness |> worse than i hate this policy. |> [2] a more effective approach is to stop donating |> to any organizating that directly or indirectly supports gay rights issues |> until they end the boycott on funding of scouts. can somebody reconcile the apparent contradiction between [1] and [2]? -- rob strom, strom@watson.ibm.com, (914) 784-7641 ibm research, 30 saw mill river road, p.o. 
box 704, yorktown heights, ny 10598 ", "path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!noc.near.net!news.centerline.com!uunet!olivea!sgigate!sgiblab!adagio.panasonic.com!nntp-server.caltech.edu!keith from: keith@cco.caltech.edu (keith allan schneider) newsgroups: alt.atheism subject: re: >>>>>>pompous ass message-id: <1pi9btinnqa5@gap.caltech.edu> date: 2 apr 93 20:57:33 gmt references: <1ou4koinne67@gap.caltech.edu> <1p72bkinnjt7@gap.caltech.edu> <93089.050046mvs104@psuvm.psu.edu> <1pa6ntinns5d@gap.caltech.edu> <1993mar30.210423.1302@bmerh85.bnr.ca> <1pcnqjinnpon@gap.caltech.edu> <kmr4.1344.733611641@po.cwru.edu> organization: california institute of technology, pasadena lines: 9 nntp-posting-host: punisher.caltech.edu kmr4@po.cwru.edu (keith m. ryan) writes: >>then why do people keep asking the same questions over and over? >because you rarely ever answer them. nope, i've answered each question posed, and most were answered multiple times. keith ", "path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!noc.near.net!news.centerline.com!uunet!olivea!sgigate!sgiblab!adagio.panasonic.com!nntp-server.caltech.edu!keith from: keith@cco.caltech.edu (keith allan schneider) newsgroups: alt.atheism subject: re: >>>>>>pompous ass message-id: <1pi9jkinnqe2@gap.caltech.edu> date: 2 apr 93 21:01:40 gmt references: <1ou4koinne67@gap.caltech.edu> <1p72bkinnjt7@gap.caltech.edu> <93089.050046mvs104@psuvm.psu.edu> <1pa6ntinns5d@gap.caltech.edu> <1993mar30.205919.26390@blaze.cs.jhu.edu> <1pcnp3innpom@gap.caltech.edu> <1pdjip$jsi@fido.asd.sgi.com> organization: california institute of technology, pasadena lines: 14 nntp-posting-host: punisher.caltech.edu livesey@solntze.wpd.sgi.com (jon livesey) writes: >>>how long does it [the motto] have to stay around before it becomes the >>>default? ... where's the cutoff point? 
>>i don't know where the exact cutoff is, but it is at least after a few >>years, and surely after 40 years. >why does the notion of default not take into account changes >in population makeup? specifically, which changes are you talking about? are you arguing that the motto is interpreted as offensive by a larger portion of the population now than 40 years ago? keith ", "path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!wupost!sdd.hp.com!sgiblab!adagio.panasonic.com!nntp-server.caltech.edu!keith from: keith@cco.caltech.edu (keith allan schneider) newsgroups: alt.atheism subject: re: <political atheists? date: 2 apr 1993 21:22:59 gmt organization: california institute of technology, pasadena lines: 44 message-id: <1piarjinnqsa@gap.caltech.edu> references: <1p9bseinni6o@gap.caltech.edu> <1pamva$b6j@fido.asd.sgi.com> <1pcq4pinnqp1@gap.caltech.edu> <11702@vice.ico.tek.com> nntp-posting-host: punisher.caltech.edu bobbe@vice.ico.tek.com (robert beauchaine) writes: >>but, you don't know that capital punishment is wrong, so it isn't the same >>as shooting. a better analogy would be that you continue to drive your car, >>realizing that sooner or later, someone is going to be killed in an automobile >>accident. you *know* people get killed as a result of driving, yet you >>continue to do it anyway. >uh uh. you do not know that you will be the one to do the >killing. i'm not sure i'd drive a car if i had sufficient evidence to >conclude that i would necessarily kill someone during my lifetime. yes, and everyone thinks as you do. no one thinks that he is going to cause or be involved in a fatal accident, but the likelihood is surprisingly high. just because you are the man on the firing squad whose gun is shooting blanks does not mean that you are less guilty. >i don't know about jon, but i say *all* taking of human life is >murder. 
and i say murder is wrong in all but one situation: when >it is the only action that will prevent another murder, either of >myself or another. you mean that killing is wrong in all but one situtation? and, you should note that that situation will never occur. there are always other options thank killing. why don't you just say that all killing is wrong. this is basically what you are saying. >i'm getting a bit tired of your probabilistic arguments. are you attempting to be condescending? >that the system usually works pretty well is small consolation to >the poor innocent bastard getting the lethal injection. is your >personal value of human life based solely on a statistical approach? >you sound like an unswerving adherent to the needs of the many >outweighing the needs of the few, so fuck the few. but, most people have found the risk to be acceptable. you are probably much more likely to die in a plane crash, or even using an electric blender, than you are to be executed as an innocent. i personally think that the risk is acceptable, but in an ideal moral system, no such risk is acceptable. "acceptable" is the fudge factor necessary in such an approximation to the ideal. keith ", "path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!husc-news.harvard.edu!kuhub.cc.ukans.edu!wupost!howland.reston.ans.net!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ursa!pooh!halat newsgroups: alt.atheism subject: re: there must be a creator! (maybe) message-id: <30066@ursa.bear.com> from: halat@pooh.bears (jim halat) date: 1 apr 93 21:24:35 gmt reply-to: halat@pooh.bears (jim halat) sender: news@bear.com references: <16ba1e927.drporter@suvm.syr.edu> lines: 24 in article <16ba1e927.drporter@suvm.syr.edu>, drporter@suvm.syr.edu (brad porter) writes: > > science is wonderful at answering most of our questions. i'm not the type >to question scientific findings very often, but... personally, i find the >theory of evolution to be unfathomable. 
could humans, a highly evolved, >complex organism that thinks, learns, and develops truly be an organism >that resulted from random genetic mutations and natural selection? [...stuff deleted...] computers are an excellent example...of evolution without "a" creator. we did not "create" computers. we did not create the sand that goes into the silicon that goes into the integrated circuits that go into processor board. we took these things and put them together in an interesting way. just like plants "create" oxygen using light through photosynthesis. it's a much bigger leap to talk about something that created "everything" from nothing. i find it unfathomable to resort to believing in a creator when a much simpler alternative exists: we simply are incapable of understanding our beginnings -- if there even were beginnings at all. and that's ok with me. the present keeps me perfectly busy. -jim halat ")

To review a random document in the corpus, evaluate the following cell.

corpus.takeSample(false, 1)
res6: Array[String] = Array("path: cantaloupe.srv.cs.cmu.edu!magnesium.club.cc.cmu.edu!pitt.edu!uunet!noc.near.net!howland.reston.ans.net!zaphod.mps.ohio-state.edu!uwm.edu!linac!att!princeton!ernie.princeton.edu!qpliu from: qpliu@ernie.princeton.edu (q.p.liu) newsgroups: alt.atheism subject: re: christian morality is message-id: <1993apr20.210858.23666@princeton.edu> date: 20 apr 93 21:08:58 gmt references: <11853@vice.ico.tek.com> <4949@eastman.uucp> sender: news@princeton.edu (usenet news system) reply-to: qpliu@princeton.edu organization: princeton university lines: 14 originator: news@nimaster nntp-posting-host: ernie.princeton.edu in article <4949@eastman.uucp> dps@nasa.kodak.com writes: >simple logic arguments are folly. if you read the bible you will see >that jesus made fools of those who tried to trick him with "logic". why don't you cite the passages so that we can focus on some to discuss. then, following jesus, you can make fools of us and our "logic". > if you rely simply on your reason then you will never >know more than you do now. indeed, if you can justifiably make this assertion, you must be a genius in logic and making fools of us should be that much easier. -- qpliu@princeton.edu standard opinion: opinions are delta-correlated. ")

Note that each document begins with a header containing some metadata that we don't need, and we are only interested in the body of the document. We can do a bit of simple data cleaning here by removing the metadata of each document, which reduces the noise in our dataset. This is an important step as the accuracy of our models depends greatly on the quality of the data used.

// Split document by double newlines, drop the first block, combine again as a string and cache
val corpus_body = corpus.map(_.split("\\n\\n")).map(_.drop(1)).map(_.mkString(" ")).cache()
corpus_body: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7864] at map at command-1805207615647610:2
corpus_body.count() // there should still be the same count, but now without meta-data block
res7: Long = 2000
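
To see what the split-and-drop step does to a single document, here is a minimal sketch on a plain string. The object and helper names and the sample text are hypothetical; only the split("\\n\\n"), drop(1) and mkString(" ") calls are taken from the cell above.

```scala
object HeaderStripSketch {
  // Hypothetical helper mirroring the cell above: split on blank lines,
  // drop the first block (the header), and join the remaining blocks with spaces.
  def stripHeader(doc: String): String =
    doc.split("\\n\\n").drop(1).mkString(" ")

  def main(args: Array[String]): Unit = {
    // Made-up document: two header lines, a blank line, then two body paragraphs.
    val doc = "from: someone@example.org\nsubject: test\n\nfirst paragraph\n\nsecond paragraph"
    println(stripHeader(doc)) // prints "first paragraph second paragraph"
  }
}
```

Two consequences of this simple rule are worth keeping in mind: a document that contains no blank line is treated as all header and becomes an empty string, and a file with Windows line endings (\r\n) would not be split at all because it never contains two consecutive \n characters.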

Let's review the first 5 documents with metadata removed.

corpus_body.take(5)
res8: Array[String] = Array("in article <n4hy.93apr5120934@harder.ccr-p.ida.org>, n4hy@harder.ccr-p.ida.org (bob mcgwier) writes: |> [1] however, i hate economic terrorism and political correctness |> worse than i hate this policy. |> [2] a more effective approach is to stop donating |> to any organizating that directly or indirectly supports gay rights issues |> until they end the boycott on funding of scouts. can somebody reconcile the apparent contradiction between [1] and [2]? -- rob strom, strom@watson.ibm.com, (914) 784-7641 ibm research, 30 saw mill river road, p.o. box 704, yorktown heights, ny 10598 ", "kmr4@po.cwru.edu (keith m. ryan) writes: >>then why do people keep asking the same questions over and over? >because you rarely ever answer them. nope, i've answered each question posed, and most were answered multiple times. keith ", "livesey@solntze.wpd.sgi.com (jon livesey) writes: >>>how long does it [the motto] have to stay around before it becomes the >>>default? ... where's the cutoff point? >>i don't know where the exact cutoff is, but it is at least after a few >>years, and surely after 40 years. >why does the notion of default not take into account changes >in population makeup? specifically, which changes are you talking about? are you arguing that the motto is interpreted as offensive by a larger portion of the population now than 40 years ago? keith ", "bobbe@vice.ico.tek.com (robert beauchaine) writes: >>but, you don't know that capital punishment is wrong, so it isn't the same >>as shooting. a better analogy would be that you continue to drive your car, >>realizing that sooner or later, someone is going to be killed in an automobile >>accident. you *know* people get killed as a result of driving, yet you >>continue to do it anyway. >uh uh. you do not know that you will be the one to do the >killing. i'm not sure i'd drive a car if i had sufficient evidence to >conclude that i would necessarily kill someone during my lifetime. 
yes, and everyone thinks as you do. no one thinks that he is going to cause or be involved in a fatal accident, but the likelihood is surprisingly high. just because you are the man on the firing squad whose gun is shooting blanks does not mean that you are less guilty. >i don't know about jon, but i say *all* taking of human life is >murder. and i say murder is wrong in all but one situation: when >it is the only action that will prevent another murder, either of >myself or another. you mean that killing is wrong in all but one situtation? and, you should note that that situation will never occur. there are always other options thank killing. why don't you just say that all killing is wrong. this is basically what you are saying. >i'm getting a bit tired of your probabilistic arguments. are you attempting to be condescending? >that the system usually works pretty well is small consolation to >the poor innocent bastard getting the lethal injection. is your >personal value of human life based solely on a statistical approach? >you sound like an unswerving adherent to the needs of the many >outweighing the needs of the few, so fuck the few. but, most people have found the risk to be acceptable. you are probably much more likely to die in a plane crash, or even using an electric blender, than you are to be executed as an innocent. i personally think that the risk is acceptable, but in an ideal moral system, no such risk is acceptable. "acceptable" is the fudge factor necessary in such an approximation to the ideal. keith ", in article <16ba1e927.drporter@suvm.syr.edu>, drporter@suvm.syr.edu (brad porter) writes: > > science is wonderful at answering most of our questions. i'm not the type >to question scientific findings very often, but... personally, i find the >theory of evolution to be unfathomable. 
could humans, a highly evolved, >complex organism that thinks, learns, and develops truly be an organism >that resulted from random genetic mutations and natural selection? [...stuff deleted...] computers are an excellent example...of evolution without "a" creator. we did not "create" computers. we did not create the sand that goes into the silicon that goes into the integrated circuits that go into processor board. we took these things and put them together in an interesting way. just like plants "create" oxygen using light through photosynthesis. it's a much bigger leap to talk about something that created "everything" from nothing. i find it unfathomable to resort to believing in a creator when a much simpler alternative exists: we simply are incapable of understanding our beginnings -- if there even were beginnings at all. and that's ok with me. the present keeps me perfectly busy. -jim halat)

Feature extraction and transformation APIs

To use the convenient Feature extraction and transformation APIs, we will convert our RDD into a DataFrame.

We will also create an ID for every document using zipWithIndex.

// Convert RDD to DF with ID for every document 
val corpus_df = corpus_body.zipWithIndex.toDF("corpus", "id")
corpus_df: org.apache.spark.sql.DataFrame = [corpus: string, id: bigint]
//display(corpus_df) // uncomment to see corpus 
// this was commented out after a member of the newsgroup requested to remain anonymous on 20160525
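
As a small illustration of how zipWithIndex attaches an ID to each document, the sketch below applies it to a local Scala collection. The object name and sample strings are made up. One difference to note: on a local Seq the index is an Int, whereas RDD.zipWithIndex yields a Long (hence the bigint id column above), so the sketch converts the index to Long to keep the same shape.

```scala
object ZipWithIndexSketch {
  // Made-up document bodies standing in for corpus_body.
  val bodies: Seq[String] = Seq("first body", "second body", "third body")

  // Pair each document with its position; convert the Int index to Long
  // to match the (String, Long) pairs produced by RDD.zipWithIndex.
  val withIds: Seq[(String, Long)] =
    bodies.zipWithIndex.map { case (text, i) => (text, i.toLong) }

  def main(args: Array[String]): Unit =
    withIds.foreach { case (text, id) => println(s"$id -> $text") }
}
```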

Step 3. Text Tokenization

We will use the RegexTokenizer to split each document into tokens. We can call setMinTokenLength() here to set a minimum token length and filter away all tokens shorter than that minimum.

import org.apache.spark.ml.feature.RegexTokenizer

// Set params for RegexTokenizer
val tokenizer = new RegexTokenizer()
  .setPattern("[\\W_]+") // split on runs of non-word characters or underscores (whitespace, punctuation, symbols), which also breaks up emails and similar patterns
  .setMinTokenLength(4) // Filter away tokens with length < 4
  .setInputCol("corpus") // name of the input column
  .setOutputCol("tokens") // name of the output column

// Tokenize document
val tokenized_df = tokenizer.transform(corpus_df)
import org.apache.spark.ml.feature.RegexTokenizer
tokenizer: org.apache.spark.ml.feature.RegexTokenizer = regexTok_0c0f0302da88
tokenized_df: org.apache.spark.sql.DataFrame = [corpus: string, id: bigint ... 1 more field]
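
To get a feel for what this pattern and minimum length do to a piece of text, here is a rough plain-Scala approximation. It is not the RegexTokenizer implementation: the object and helper names are hypothetical, and the sketch simply assumes the tokenizer's default behaviour of lowercasing the input and treating the pattern as a delimiter.

```scala
object TokenizeSketch {
  // Hypothetical helper: lowercase, split on runs of non-word characters
  // or underscores, then keep only tokens of at least minTokenLength characters.
  def tokenize(text: String, minTokenLength: Int = 4): Seq[String] =
    text.toLowerCase
      .split("[\\W_]+")
      .filter(_.length >= minTokenLength)
      .toSeq

  def main(args: Array[String]): Unit = {
    // Signature fragment taken from the first document shown earlier.
    val sample = "rob strom, strom@watson.ibm.com, (914) 784-7641 ibm research"
    println(tokenize(sample).mkString(", ")) // prints "strom, strom, watson, 7641, research"
  }
}
```

Note how the email address is broken into its parts and how short pieces such as "rob", "ibm", "com", "914" and "784" are dropped by the length filter, while the four-digit "7641" survives. Numeric tokens like this remain in the token lists and may be worth filtering later.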
//display(tokenized_df) // uncomment to see tokenized_df 
// this was commented out after a member of the newsgroup requested to remain anonymous on 20160525
display(tokenized_df.select("tokens"))
["article","n4hy","93apr5120934","harder","n4hy","harder","mcgwier","writes","however","hate","economic","terrorism","political","correctness","worse","than","hate","this","policy","more","effective","approach","stop","donating","organizating","that","directly","indirectly","supports","rights","issues","until","they","boycott","funding","scouts","somebody","reconcile","apparent","contradiction","between","strom","strom","watson","7641","research","mill","river","road","yorktown","heights","10598"]
["kmr4","cwru","keith","ryan","writes","then","people","keep","asking","same","questions","over","over","because","rarely","ever","answer","them","nope","answered","each","question","posed","most","were","answered","multiple","times","keith"]
["livesey","solntze","livesey","writes","long","does","motto","have","stay","around","before","becomes","default","where","cutoff","point","know","where","exact","cutoff","least","after","years","surely","after","years","does","notion","default","take","into","account","changes","population","makeup","specifically","which","changes","talking","about","arguing","that","motto","interpreted","offensive","larger","portion","population","than","years","keith"]
["bobbe","vice","robert","beauchaine","writes","know","that","capital","punishment","wrong","same","shooting","better","analogy","would","that","continue","drive","your","realizing","that","sooner","later","someone","going","killed","automobile","accident","know","people","killed","result","driving","continue","anyway","know","that","will","killing","sure","drive","sufficient","evidence","conclude","that","would","necessarily","kill","someone","during","lifetime","everyone","thinks","thinks","that","going","cause","involved","fatal","accident","likelihood","surprisingly","high","just","because","firing","squad","whose","shooting","blanks","does","mean","that","less","guilty","know","about","taking","human","life","murder","murder","wrong","situation","when","only","action","that","will","prevent","another","murder","either","myself","another","mean","that","killing","wrong","situtation","should","note","that","that","situation","will","never","occur","there","always","other","options","thank","killing","just","that","killing","wrong","this","basically","what","saying","getting","tired","your","probabilistic","arguments","attempting","condescending","that","system","usually","works","pretty","well","small","consolation","poor","innocent","bastard","getting","lethal","injection","your","personal","value","human","life","based","solely","statistical","approach","sound","like","unswerving","adherent","needs","many","outweighing","needs","fuck","most","people","have","found","risk","acceptable","probably","much","more","likely","plane","crash","even","using","electric","blender","than","executed","innocent","personally","think","that","risk","acceptable","ideal","moral","system","such","risk","acceptable","acceptable","fudge","factor","necessary","such","approximation","ideal","keith"]
["article","16ba1e927","drporter","suvm","drporter","suvm","brad","porter","writes","science","wonderful","answering","most","questions","type","question","scientific","findings","very","often","personally","find","theory","evolution","unfathomable","could","humans","highly","evolved","complex","organism","that","thinks","learns","develops","truly","organism","that","resulted","from","random","genetic","mutations","natural","selection","stuff","deleted","computers","excellent","example","evolution","without","creator","create","computers","create","sand","that","goes","into","silicon","that","goes","into","integrated","circuits","that","into","processor","board","took","these","things","them","together","interesting","just","like","plants","create","oxygen","using","light","through","photosynthesis","much","bigger","leap","talk","about","something","that","created","everything","from","nothing","find","unfathomable","resort","believing","creator","when","much","simpler","alternative","exists","simply","incapable","understanding","beginnings","there","even","were","beginnings","that","with","present","keeps","perfectly","busy","halat"]
["livesey","solntze","livesey","writes","along","comes","keith","schneider","says","here","objective","moral","system","then","start","about","definitions","that","this","objective","system","depends","predictably","whole","thing","falls","apart","only","falls","apart","attempt","apply","this","doesn","mean","that","objective","system","exist","just","means","that","cannot","implemented","keith"]
["article","1993apr3","214741","14026","ultb","snm6394","ultb","mozumder","writes","claim","that","person","that","committs","crime","doesn","believe","moment","that","crime","committed","least","whether","they","originally","believers","believe","good","your","statistics","indicate","people","that","have","declared","atheism","doubtless","when","atheist","does","charity","they","temporarily","become","baptist"]
["article","1ph4c8","shrike","dace","shrike","dace","writes","herb","huston","huston","access","digex","wrote","actually","cannibalism","quite","widespread","favorite","examples","sand","sharks","mackerel","sharks","fetuses","begin","cannibalizing","each","other","that","eventually","born","enters","with","full","stomache","would","like","some","more","gruesome","examples","fair","enough","pretty","well","aware","examples","used","mine","were","very","rapidly","thoughtlessly","pulled","thin","point","making","that","cannibalism","doesn","imply","value","over","other","animals","something","happen","while","wasn","looking","when","homo","sapiens","become","cannibalistic","herb","huston","huston","access","digex"]
["date","1993","from","stilgar","west","next02cville","article","kmr4","1422","733983061","cwru","kmr4","cwru","keith","ryan","writes","article","1993apr5","025924","11361","west","next02cville","stilgar","writes","illiad","undisputed","word","prove","wrong","dispute","ergo","counter","example","proven","wrong","dispute","your","counter","example","ergo","counter","counter","example","wrong","right","nanny","nanny","tbbbbbbbtttttthhhhh","this","looks","like","serious","case","temporary","islam"]
["article","1993apr5","024626","19942","ultb","snm6394","ultb","mozumder","writes","peace","bobby","this","hell","your","until","learn","what","stands","really","mean","beauchaine","bobbe","vice","they","said","that","queens","could","stay","they","blew","bronx","away","sank","manhattan"]
["article","114127","jaeger","buphy","gregg","jaeger","writes","understand","point","this","petty","sarcasm","basic","principle","islam","that","born","muslim","says","testify","that","there","mohammad","prophet","that","long","does","explicitly","reject","islam","word","then","must","considered","muslim","muslims","phenomenon","attempting","make","into","general","rule","psychology","direct","odds","with","basic","islamic","principles","want","attack","islam","could","better","than","than","argue","against","something","that","islam","explicitly","contradicts","then","mozumder","incorrect","when","says","that","when","committing","acts","people","temporarily","become","atheists"]
["article","930404","112127","rusnews","w165w","mantis","mathew","mantis","mathew","writes","livesey","solntze","livesey","writes","meaning","people","drive","accept","risks","doing","contribute","money","design","systems","minimize","those","risks","already","have","systems","minimize","those","risks","just","that","drivers","want","them","they","called","bicycles","trains","buses","poor","matthew","million","posters","call","drivers","chooses","owner"]
["article","1993apr3","195642","25261","njitgw","njit","dmu5391","hertz","njit","david","utidjian","writes","article","31mar199321091163","juliet","caltech","juliet","caltech","henling","lawrence","writes","complete","description","what","atheism","agnosticism","atheism","answers","think","utidjian","remarque","berkeley","apologize","posting","this","thought","only","going","talk","origins","also","took","definitions","from","1938","websters","nonetheless","apparent","past","arguments","over","these","words","imply","that","like","bimonthly","biweekly","they","have","commonly","accepted","definitions","should","used","with","care","larry","henling","shakes","caltech"]
["article","1993apr2","223248","19014","princeton","qpliu","princeton","writes","article","1993apr2","115300","batman","jbrown","batman","writes","created","lucifer","with","perfect","nature","gave","along","with","other","angels","free","moral","will","could","have","prevented","lucifer","fall","taking","away","ability","choose","between","moral","alternatives","worship","worship","himself","lucifer","moral","choices","determined","will","what","determines","what","will","qpliu","princeton","standard","opinion","opinions","delta","correlated","bobby","posts","said","that","lucifer","free","will","from","above","seems","believes","contrary","talking","about","same","lucifer","suggest","experiment","determine","which","wrong","claim","that","both","right","norman"]
["article","kmr4","1433","734039535","cwru","kmr4","cwru","keith","ryan","writes","article","1993apr5","163050","13308","west","next02cville","stilgar","writes","article","kmr4","1422","733983061","cwru","kmr4","cwru","keith","ryan","writes","article","1993apr5","025924","11361","west","next02cville","stilgar","writes","illiad","undisputed","word","prove","wrong","dispute","ergo","counter","example","proven","wrong","dispute","your","counter","example","ergo","counter","counter","example","wrong","right","nanny","nanny","tbbbbbbbtttttthhhhh","premis","stated","that","undisputed","fine","illiad","word","disputed","dispute","that","matter","prove","wrong","brian","west","this","file","earth","have","been","this","file","here","blink","file","were","gone","tomorrow","posted","west","would","missed","doesn","care","knows","jurassic","park","diclaimer","said","this","meant","this","nobody","made"]
["article","kmr4","1444","734058912","cwru","kmr4","cwru","keith","ryan","writes","article","suopanki","93apr6024902","stekt6","oulu","suopanki","stekt6","oulu","heikki","suopanki","writes","eternal","jesus","therefore","jesus","eternal","this","works","both","logically","mathematically","things","which","eternal","jesus","subset","therefore","jesus","belongs","things","which","eternal","first","premise","conclusion","properly","translated","identity","statements","since","those","statements","predication","rather","than","identity","instead","they","should","translated","using","predicate","letter","using","designate","designate","jesus","predicate","letter","property","being","eternal","first","premise","conclusion","second","premise","appears","contain","identity","which","case","properly","symbolized","your","remark","that","jesus","subset","suggests","that","strict","identity","desired","here","however","first","premise","means","that","members","making","have","property","being","eternal","same","conclusion","follows","lippard","lippard","ccit","arizona","dept","philosophy","lippard","arizvms","bitnet","university","arizona","tucson","85721"]
["article","chrisb","734064380","baarnie","chrisb","tafe","chris","bell","writes","jbrown","batman","writes","syllogism","form","therefore","this","logically","valid","construction","your","syllogism","however","form","therefore","therefore","yours","logically","invalid","construction","your","comments","apply","those","identity","both","syllogisms","valid","however","predicate","then","second","syllogism","invalid","first","syllogism","have","pointed","valid","whether","predicate","designates","individual","lippard","lippard","ccit","arizona","dept","philosophy","lippard","arizvms","bitnet","university","arizona","tucson","85721"]
["have","addition","regarding","there","atheist","hospitals","recall","correctly","johns","hopkins","built","provide","medical","services","without","backing","religious","group","thus","making","hospital","dedicated","glory","weak","atheism","might","someone","check","this","brian","evans","mood","mood","sure","mood","bevans","carina","haven","ever","virgin","mary"]
["article","1993apr05","174537","14962","watson","strom","watson","strom","writes","article","16ba7f16c","i3150101","dbstu1","i3150101","dbstu1","benedikt","rosenau","writes","didn","have","time","read","rest","posting","respond","this","absolutely","messianic","another","mistake","sorry","should","have","read","messianic","more","carefully","benedikt"]
["article","1993apr6","124112","12959","warwick","simon","warwick","simon","clippingdale","writes","said","just","arrived","asked","whether","bobby","real","betcha","welcome","atheism","rest","assured","that","gets","worse","have","pearls","wisdom","from","bobby","which","reproduce","below","anyone","keith","keeping","file","such","stuff","sorry","somehow","have","misplaced","diskette","from","last","couple","months","however","thanks","efforts","bobby","being","replenished","rather","quickly","here","recent","favorite","satan","angels","have","freewill","they","what","tells","them","mozumder","snm6394","ultb","satan","angels","have","freewill","they","what","tells","them","mozumder","snm6394","ultb"]
["halat","pooh","bears","halat","writes","think","objective","morality","does","exist","that","most","flavors","morality","only","approximations","once","again","natural","objective","morality","fairly","easily","defined","long","have","goal","mind","that","what","purpose","this","morality","maybe","quite","getting","what","mean","this","think","objective","morality","oxymoron","definition","seems","goal","oriented","issue","like","this","subjective","nature","using","word","objective","goal","need","subjective","instance","goal","natural","morality","propogation","species","perhaps","wasn","really","until","more","intelligent","animals","came","along","that","some","revisions","this","were","necessary","intelligent","animals","have","different","needs","than","others","hence","morality","suited","them","must","more","complicated","than","jungle","think","that","self","actualization","subjective","might","think","objectivity","assuming","that","ideals","such","system","could","carried","completely","keith"]
["keith","ryan","kmr4","cwru","wrote","wild","fanciful","claims","require","greater","evidence","state","that","books","your","room","blue","certainly","need","much","evidence","believe","than","were","claim","that","there","headed","leapard","your","mean","male","lover","leotard","keith","issue","what","truth","then","consequences","whatever","proposition","argued","irrelevent","issue","what","consequences","such","such","true","then","truth","irrelevent","which","bill"]
["prudenti","juncol","juniata","wrote","upon","arriving","home","joseph","probably","took","advantage","mary","with","speak","course","word","this","couldn","around","mary","being","highly","religious","follower","that","decided","just","that","impregnated","will","ever","know","thus","seen","trustworthy","honorable","soul","believed","then","came","jesus","child","born","from","violence","dave","explain","purpose","your","post","imagine","what","must","have","thougt","meant","bill"]
["maddi","hausmann","madhaus","netcom","wrote","thank","lord","that","bill","connor","returned","straight","know","happy","when","lexus","se400","wipes","that","rain","slick","curve","1997","rest","best","straighten","because","your","time","even","more","limited","most","going","1994","maddi","know","glad","have","visit","stay","long","this","time","just","shopping","around","bill"]
["very","true","length","time","discussions","creationism","evolutionism","atheists","christians","have","been","debating","since","still","debate","with","unabated","passion","michael","cobb","raise","taxes","middle","university","illinois","class","programs","champaign","urbana","bill","clinton","debate","cobb","alexia","uiuc","with","taxes","spending","cuts","still","have","billion","dollar","deficits"]
["from","mathew","mathew","mantis","latest","news","seems","that","koresh","will","give","himself","once","finished","writing","sequel","bible","article","2944079995","p00261","psilink","robert","knowles","p00261","psilink","writes","writing","seven","seals","something","along","those","lines","already","written","first","seven","which","around","pages","handed","over","assistant","proofreading","would","expect","decent","messiah","have","built","spellchecker","maybe","koresh","will","come","with","heard","asked","provide","with","word","processor","does","anyone","know","koresh","requested","that","wordperfect5","written","owned","mormons","theological","implications","requesting","refusing","profound","darin","wilkins","scubed","scubed","will","president","food"]
["article","bskendigc5jcwx","netcom","bskendig","netcom","brian","kendig","writes","quotes","deleted","really","looks","like","these","people","have","idea","what","means","atheist","there","more","bobby","mozumder","clones","world","than","thought","well","that","explains","some","things","posted","religion","islam","with","attached","quote","bobby","effect","that","atheists","lying","evil","scum","asked","commonly","held","idea","among","muslims","response","asking","about","unknown","guess","karl","lastly","come","china","hope","touch","fulfilling","lifelong","ambition","your","life","will","ever","dropping","acid","great","wall","duke","pink","floyd","still","even","billion","people","believe"]
["article","115468","jaeger","buphy","gregg","jaeger","writes","article","1qg79g","fido","livesey","solntze","livesey","writes","amazed","that","find","difficult","grasp","when","people","justify","death","threats","against","rushdie","with","claim","born","muslim","this","empty","rhetoric","amazed","your","inability","understand","what","saying","that","find","difficult","grasp","when","people","justify","death","threats","find","amazing","that","your","ability","consider","abstract","questions","isolation","seem","believe","falsity","principles","consequence","their","abuse","must","hate","physics","closer","than","might","imagine","certainly","despised","living","under","soviet","regime","when","purported","organize","society","according","what","they","fondly","imagined","objective","conclusions","marxist","dialectic","hate","physics","long","some","clown","doesn","start","trying","control","life","assumption","that","interchangeable","atoms","rather","than","individual","human","beings"]
["1qjahh","horus","mchp","frank","d012s658","uucp","frank","dwyer","writes","article","140493214334","spac","rice","spacsun","rice","peter","walker","writes","article","1qie61","horus","mchp","frank","d012s658","uucp","frank","dwyer","wrote","objective","morality","morality","built","from","objective","values","where","those","objective","values","come","from","measure","them","what","mediated","thair","interaction","with","real","world","moralon","scalar","valuino","field","science","real","world","basis","values","other","round","would","wish","there","such","thing","objective","value","then","science","objectively","said","more","useful","than","kick","head","simple","theories","with","accurate","predictions","could","objectively","said","more","useful","than","tarot","cards","like","those","conclusions","know","they","exist","first","place","assumes","objective","reality","doesn","know","frank","dwyer","hatching","that","odwyer","from","hens","evelyn","conlon","measure","truth","beauty","goodness","love","friendship","trust","honesty","things","have","basis","objective","fact","then","aren","limited","what","know","true","that","examples","instances","reason","cannot","measure","reason","that","semantics","michael","cobb","raise","taxes","middle","university","illinois","class","programs","champaign","urbana","bill","clinton","debate","cobb","alexia","uiuc","with","taxes","spending","cuts","still","have","billion","dollar","deficits"]
["pick","former","yugoslavia","instead","their","problems","caused","communism","doesn","really","matter","guess","religious","leaders","calling","that","religiously","motivated","this","despite","fact","that","christians","carve","crosses","dead","muslims","chests","maybe","they","just","want","land","maybe","something","else","they","want","maybe","cross","carvings","just","accidental","know","just","looks","suspicious","most","likely","tragic","situation","bosnia","combination","ethnical","religious","motives","where","religion","just","attribute","that","separates","groups","from","each","other","must","agree","that","saga","bosnia","terrible","example","case","where","religion","helping","instead","used","weapon","against","other","humans","sympathies","mostly","bosnian","side","looks","like","serbs","oppressors","willing","even","christianity","weapon","against","their","former","friends","cheers","kent","sandvik","newton","apple","alink","ksand","private","activities"]
["bissda","saturn","lawrence","bissell","writes","first","want","start","right","that","christian","know","shouldn","involved","deleted","book","says","that","jesus","either","liar","crazy","modern","koresh","actually","said","some","reasons","wouldn","liar","follows","would","wouldn","people","able","tell","liar","people","gathered","around","kept","doing","many","gathered","from","hearing","seeing","someone","been","healed","call","fool","believe","heal","people","niether","lunatic","would","more","than","entire","nation","drawn","someone","crazy","very","doubtful","fact","rediculous","example","anyone","drawn","david","koresh","obviously","fool","logical","people","this","right","away","therefore","since","wasn","liar","lunatic","must","have","been","real","thing","righto","this","with","your","cornflakes","book","says","that","muhammad","either","liar","crazy","modern","mahdi","actually","said","some","reasons","wouldn","liar","follows","would","wouldn","people","able","tell","liar","people","gathered","around","kept","doing","many","gathered","from","hearing","seeing","made","stand","still","call","fool","believe","make","stand","still","niether","lunatic","would","more","than","entire","nation","drawn","someone","crazy","very","doubtful","fact","rediculous","example","anyone","drawn","mahdi","obviously","fool","logical","people","this","right","away","therefore","since","wasn","liar","lunatic","must","have","been","real","thing","house","house","helios","toowoomba","australia"]
["article","115565","jaeger","buphy","gregg","jaeger","writes","article","1qi3l5","fido","livesey","solntze","livesey","writes","hope","islamic","bank","something","other","than","bcci","which","ripped","many","small","depositors","among","muslim","community","elsewhere","grow","childish","propagandist","gregg","really","sorry","having","pointed","that","practice","things","aren","quite","wonderful","utopia","folks","seem","claim","them","upsets","exactly","being","childish","here","open","question","bbci","example","islamically","owned","operated","bank","what","will","someone","they","weren","real","islamic","owners","operators","actually","turned","long","running","quite","ruthless","operation","steal","money","from","small","often","quite","naive","depositors","these","naive","depositors","their","life","savings","into","bcci","rather","than","nasty","interest","motivated","western","bank","down","street","could","that","they","believed","islamically","owned","operated","bank","couldn","possibly","cheat","them","please","into","thinking","that","will","work","right","next","time"]
["article","1ql8ekinn635","caltech","keith","caltech","keith","allan","schneider","writes","livesey","solntze","livesey","writes","what","morally","particular","reason","then","moral","what","morality","instinctive","most","animals","saying","that","morality","instinctive","animals","attempt","assume","your","conclusion","which","conclusion","conclusion","correct","that","behaviour","which","instinctive","animals","natural","moral","system","disagreeing","definition","moral","here","earlier","said","that","must","conscious","your","definition","instinctive","behavior","pattern","could","morality","trying","apply","human","terms","humans","pardon","trying","apply","human","terms","humans","think","there","must","some","confusion","here","saying","that","animal","behaviour","instinctive","then","does","have","moral","sugnificance","does","refusing","apply","human","terms","animals","turned","into","applying","human","terms","think","that","even","someone","conscious","alternative","this","does","prevent","behavior","from","being","moral","sure","think","this","about","trying","convince","think","that","morality","behavior","pattern","what","human","morality","moral","action","that","consistent","with","given","pattern","that","enforce","certain","behavior","moral","keep","getting","this","backwards","trying","show","that","behaviour","pattern","morality","whether","morality","behavior","pattern","irrelevant","since","there","behavior","pattern","example","motions","planets","that","most","people","would","call","morality","show","your","definition","shown","offered","four","times","think","accept","your","definition","allow","ascribe","moral","significence","orbital","motion","planets","morality","thought","large","class","princples","could","defined","terms","many","things","laws","physics","wish","however","seems","silly","talk","moral","planet","because","obeys","laws","phyics","less","silly","talk","about","animals","they","have","least","some","free","will","silly","less","silly","what","livesey","finds","intuitive","silly","what","schneider","finds","intuitive","less","silly","that","devastating","argument"]
["article","115571","jaeger","buphy","gregg","jaeger","writes","article","2bcc892b","21864","bvickers","brett","vickers","writes","article","115290","jaeger","buphy","gregg","jaeger","writes","well","seeing","muslim","sort","fatwa","issued","khomeini","would","relevant","understand","your","fear","persecution","share","even","more","than","being","muslim","however","rushdie","behavior","completely","excusable","much","considered","some","called","islam","related","dialogue","here","total","waste","time","somehow","restrain","myself","this","instance","gregg","this","come","senses","accept","knowing","wisdom","power","quran","allah","only","that","allah","himself","drops","congratulate","wise","choice","allah","rolls","bones","down","then","allah","gets","crisco","bends","over","invites","take","spin","around","block","realize","that","maybe","allah","looking","more","commitment","than","ready","some","programming","gotta","call","thinking","over","renounce","islam","gregg","allah","said","still","thinks"]
["article","1qlf7ginn8sn","caltech","keith","caltech","keith","allan","schneider","writes","livesey","solntze","livesey","writes","another","part","this","thread","been","telling","that","goal","natural","morality","what","animals","survive","that","right","humans","have","gone","somewhat","beyond","this","though","perhaps","goal","self","actualization","humans","have","gone","somewhat","beyond","what","exactly","thread","telling","that","natural","morality","what","animals","survive","this","thread","claiming","that","omniscient","being","definitely","what","right","what","wrong","what","does","this","omniscient","being","criterion","long","term","survival","human","species","what","does","omniscient","into","definitely","being","able","assign","right","wrong","actions","suppose","that","your","omniscient","being","told","that","long","term","survival","humanity","requires","exterminate","some","other","species","either","terrestrial","alien","letting","omniscient","being","give","information","this","part","original","premise","well","your","original","premises","have","habit","changing","over","time","perhaps","like","review","tell","what","difference","between","omniscient","being","able","assign","right","wrong","actions","telling","result","does","that","make","moral","which","type","morality","talking","about","natural","sense","immoral","harm","another","species","long","doesn","adversely","affect","your","guess","talking","about","morality","introduced","which","going","implemented","this","omniscient","being","that","definitely","assign","right","wrong","actions","tell","what","type","morality","that"]
["livesey","solntze","livesey","writes","humans","have","gone","somewhat","beyond","what","exactly","thread","telling","that","natural","morality","what","animals","survive","this","thread","claiming","that","omniscient","being","definitely","what","right","what","wrong","what","does","this","omniscient","being","criterion","long","term","survival","human","species","what","well","that","question","goals","probably","that","obvious","goals","like","happiness","liberty","golden","rule","these","goals","aren","inherent","they","have","defined","before","objective","system","possible","does","omniscient","into","definitely","being","able","assign","right","wrong","actions","difficult","have","goals","mind","absolute","knoweldge","everyone","intent","letting","omniscient","being","give","information","this","part","original","premise","well","your","original","premises","have","habit","changing","over","time","perhaps","like","review","tell","what","difference","between","omniscient","being","able","assign","right","wrong","actions","telling","result","omniscience","fine","long","information","given","away","this","resolution","free","will","problem","interactive","omniscient","being","changes","situation","which","type","morality","talking","about","natural","sense","immoral","harm","another","species","long","doesn","adversely","affect","your","guess","talking","about","morality","introduced","which","going","implemented","this","omniscient","being","that","definitely","assign","right","wrong","actions","tell","what","type","morality","that","well","speaking","about","objective","system","general","didn","mention","specific","goal","which","would","necessary","determine","morality","action","keith"]
["article","healta","734928689","saturn","healta","saturn","tammy","healy","writes","hope","going","flame","please","give","same","coutesy","given","tammy","person","gives","well","balanced","reasoned","argument","tammy","then","happy","discuss","with","makes","astounding","claims","which","backed","with","evidence","then","must","expected","substantiate","them","original","author","said","that","everything","opinion","supportable","then","people","would","have","simply","ignored","claimed","many","things","logic","seriously","flawed","argument","christianity","effort","convince","atheists","like","myself","believe","message","will","take","things","read","told","that","pink","fluffy","elephants","dance","sugar","plum","fairy","dark","side","jupiter","then","would","demand","evidence","adda","adda","wainwright","does","atal","llanw","eczcaw","mips","nott","werth"]
["nancyo","fraser","nancy","patricia","connor","writes","timmbake","ucsb","bake","timmons","writes","rule","apples","with","oranges","that","extermination","mongols","worse","than","stalin","khan","conquered","people","unsympathetic","cause","that","atrocious","stalin","killed","millions","people","loved","worshipped","atheist","state","anyone","worse","than","that","right","david","koresh","claimed","christian","hear","millions","cheering","right","josef","stalin","your","heart","bake","timmons","there","nothing","higher","stronger","more","wholesome","more","useful","life","than","some","good","memory","alyosha","brothers","karamazov","dostoevsky"]
["article","c5r8vh","news","uiuc","cobb","alexia","uiuc","mike","cobb","writes","actually","trying","make","point","just","trying","find","coherence","societally","based","morality","position","assumption","societally","based","morals","wrong","would","need","background","reading","sociology","point","where","discussion","would","focused","enough","helpful","interaction","values","behavior","among","people","been","major","defining","element","both","psychology","sociology","century","part","both","disciplines","social","psychology","that","strikes","most","relevant","various","naive","arguments","about","morality","exceptionally","good","place","clear","view","social","norms","action","micro","sociology","ervin","goffman","there","some","very","good","introductory","essays","deference","demeanor","classic","well","accessible","books","like","interaction","ritual","more","difficult","theoretical","some","later","books","like","frame","analysis","even","most","academic","goffman","escapes","dreadful","boredom","heavily","jargon","laden","theorizing","that","makes","most","standard","sociology","unreadable","morality","essentially","playing","individual","goals","aims","setting","their","expectations","other","people","others","expectations","them","this","becomes","systematized","socially","mandated","simply","because","otherwise","have","invent","entire","context","interpersonal","realtions","with","every","single","interaction","engage","social","inter","actions","usually","hundreds","thousands","times","that","renego","tiation","human","interaction","each","time","pretty","ridiculous","notion","simply","learn","most","early","along","with","language","which","main","exemplars","michael","siemon","stand","stand","window","panix","tears","scald","start","ulysses","shall","love","your","crooked","neighbor","standard","disclaimer","with","your","crooked","heart"]
["article","1993apr20","053355","19185","bmerh85","dgraham","bmers30","douglas","graham","writes","fallacious","state","something","undisputed","fact","when","fact","truth","assumes","that","side","dispute","correct","this","were","truly","fallacy","nobody","would","able","anything","about","anything","fallacy","state","that","earth","nearly","spherical","knowing","full","well","that","there","people","think","flat","were","arguing","with","group","people","likely","contain","flat","earthers","mention","sphericity","earth","instead","made","comment","which","used","hidden","assumption","otherwise","phrased","that","flat","earthers","would","agree","with","first","after","christmas","truelove","served","leftover","turkey","second","after","christmas","truelove","served","turkey","casserole","that","made","from","leftover","turkey","days","deleted","flaming","turkey","wings","pizza","commercial","bait","arromdee","arromdee","jyusenkyou"]
["article","1qibo2","horus","mchp","frank","d012s658","uucp","frank","dwyer","writes","absence","some","convincing","evidence","that","theist","fanatics","more","dangerous","than","atheist","fanatics","continue","wary","fanatics","stripe","think","that","agnostic","fanatics","most","dangerous","fair","point","actually","mentioned","theists","atheists","left","agnostics","culpa","wonder","light","that","probably","theist","tries","pass","agnostic","still","remember","your","post","about","your","daughter","singing","chrismas","carols","your","feelings","well","would","show","marginal","honesty","answer","many","questions","left","open","when","ceased","respond","last","time","benedikt"]
["article","1ql0d3","pepper","east","geoff","east","writes","your","posting","provoked","into","checking","save","file","memorable","posts","first","captured","arromdee","1990","subject","atheist","that","article","here","your","question","article","53766","which","average","about","articles","last","three","years","others","have","noted","current","posting","rate","such","that","kill","file","depressing","large","among","posting","saved","early","days","were","articles","from","following","notables","from","loren","sunlight","llnl","loren","petrich","from","jchrist","nazareth","israel","jesus","christ","nazareth","from","tomobiki","washington","mark","crispin","from","perry","apollo","perry","from","lippard","uavax0","ccit","arizona","james","lippard","from","minsky","media","marvin","minsky","interesting","bunch","wonder","where","didn","hear","address","changed","reached","following","address","dkoresh","branch","davidian","compound","waco","think","last","seen","posting","messianic","dead","actor","plays","part","sting","words","fear","will","find","their","place","your","heart","history","without","voice","reason","every","faith","curse","will","teach","without","freedom","from","past","things","only","worse","nothing"]
["maddi","hausmann","chirps","timmbake","ucsb","bake","timmons","writes","first","seem","reasonable","more","honest","include","sentence","afterwards","that","honest","just","ended","like","that","swear","that","nice","hmmmm","recognize","warning","signs","alternating","polite","rude","coming","into","newsgroup","with","huge","chip","shoulder","calls","people","names","then","makes","nice","whirrr","click","whirrr","forgot","third","equality","whirrr","click","whirrr","below","whirr","click","whirr","frank","dwyer","might","also","contained","that","shell","stack","determine","whirr","click","whirr","killfile","keith","allen","schneider","frank","closet","theist","dwyer","maddi","sound","geek","hausmann","whirrr","click","whirrr","bake","timmons","there","nothing","higher","stronger","more","wholesome","more","useful","life","than","some","good","memory","alyosha","brothers","karamazov","dostoevsky"]
["just","side","note","this","german","translation","whatever","josephus","historian","friend","says","that","latest","craze","about","josephus","japanese","translation","says","thing","selling","like","cakes","over","japan","good","powerful","josephus","reads","english","suspect","reads","even","better","japanese"]
["mccullou","whipple","wisc","writes","turn","went","back","reread","your","post","attack","atheism","that","agnosticism","wasn","funny","atheism","nowhere","does","that","imply","that","agnostic","weak","atheist","most","people","post","such","inflammatory","remarks","theists","reasonable","assumption","sorry","right","clearly","state","rule","condescending","population","large","theists","will","many","people","your","faith","anytime","soon","only","ruins","your","credibility","being","condescending","population","large","stating","something","that","happened","true","long","time","couldn","believe","that","people","actually","believed","this","idea","alien","concept","trying","people","faith","have","faith","religion","issue","when","attitude","above","because","never","even","occurred","believe","atheist","default","guess","could","most","common","form","condescending","rational","versus","irrational","attitude","once","accepted","assumption","that","there","then","consider","other","faiths","irrational","simply","because","their","assumption","contradict","your","assumption","then","would","there","lack","consistency","here","know","about","faith","positive","belief","that","does","exist","were","closed","logical","argument","many","rational","people","have","problems","with","that","logic","probably","like","seem","soft","atheist","sorry","flamage","line","about","atheists","haveing","something","their","sleeves","what","seemed","imply","that","sorry","been","reading","much","clipper","project","lately","paranoia","over","there","have","seeped","some","what","clipper","project","rule","apples","with","oranges","that","extermination","mongols","worse","than","stalin","khan","conquered","people","unsympathetic","cause","that","atrocious","stalin","killed","millions","people","loved","worshipped","atheist","state","anyone","worse","than","that","many","rulers","have","done","similar","things","past","only","stalin","when","there","plenty","documentation","afix","blame","evidence","that","some","early","european","rulers","ruled","with","iron","fist","much","like","stalin","threw","numbers","sick","hearing","about","stalin","example","because","example","doesn","apply","managed","angry","with","your","post","because","appeared","attack","forms","atheism","might","have","appeared","attack","atheism","general","point","that","mass","killing","happens","sorts","reasons","people","will","hate","they","will","will","wave","whatever","flag","justify","cross","hammer","sickle","stalin","example","important","only","because","still","widely","unappreciated","that","people","want","forget","also","because","people","really","love","ideas","even","after","that","wrought","evidence","referring","more","lack","evidence","than","negative","evidence","claim","there","pink","crows","have","never","seen","pink","crow","that","doesn","mean","couldn","exist","this","person","here","claims","that","there","pink","crows","even","though","admits","hasn","been","able","capture","photo","find","with","sense","that","evidence","believe","existence","pink","crows","that","what","saying","when","look","evidence","look","suppossed","evidence","deity","show","flawed","doesn","show","what","theists","want","show","first","pink","crows","unicorns","elves","arguments","world","will","sway","most","people","they","simply","accept","analogy","reasons","that","many","many","people","want","something","beyond","this","life","pretend","that","they","want","this","accept","even","want","myself","sometimes","there","nothing","unique","this","example","people","want","love","truth","proven","logically","themselves","namely","gods","principle","hard","theists","necessarily","arrogant","makes","sense","they","seem","arrogant","make","such","claim","previous","refutation","still","stands","believe","there","another","john","baptist","boasted","jesus","many","people","find","hard","that","behavior","arrogant","many","christians","know","also","boast","this","still","
necessarily","arrogance","course","know","arrogant","christians","doctors","teachers","well","technically","might","consider","person","originally","made","given","claim","arrogant","jesus","instance","talking","about","atheism","just","strong","atheism","talking","about","weak","atheism","which","believe","then","refuse","such","claim","atheism","lack","belief","used","good","occam","razor","make","final","rejection","deity","that","things","even","present","hypothesises","equal","fasion","find","theist","argument","plausible","speak","against","strong","atheism","also","often","find","that","evidence","supporting","faith","very","subjective","just","evidence","supporting","love","truth","subjective","believe","answered","that","apologize","stated","incorrect","assumption","your","theism","nothing","indicate","that","were","agnostic","only","that","were","just","another","newbie","christian","trying","some","cheap","shots","apology","necessary","bake","timmons","there","nothing","higher","stronger","more","wholesome","more","useful","life","than","some","good","memory","alyosha","brothers","karamazov","dostoevsky"]
["reply","timmbake","ucsb","bake","timmons","same","kind","ignorance","demonstrated","just","about","every","post","this","newsgroup","instance","generalizations","about","christianity","popular","which","newsgroup","have","been","reading","anti","christian","posts","virtually","response","some","christian","posting","some","will","burn","hell","kind","drivel","soft","atheist","courtesy","even","know","enough","about","bible","that","repeatedly","warns","false","prophets","preaching","name","bake","transparently","obvious","that","theist","pretending","atheist","probably","think","very","clever","this","time","possibilities","creator","eternity","carry","with","them","much","emotional","power","dismiss","merely","basis","this","line","course","have","dismissed","them","because","atheist","right","just","like","other","religion","hard","atheism","faith","other","words","didn","read","after","david","nyeda","cnsvax","uwec","midelfort","clinic","claire","this","patently","absurd","whoever","wishes","become","philosopher","must","learn","frightened","absurdities","bertrand","russell"]
["article","1993apr15","223844","16453","rambo","atlanta","atlanta","bill","rawlins","writes","talking","about","origins","merely","science","science","cannot","explain","origins","person","exclude","anything","science","from","issue","origins","that","there","higher","truth","than","science","this","false","premise","what","manner","argue","that","universe","created","with","higher","truth","than","science","would","love","define","truth","this","arguement","then","must","state","know","this","what","subbornly","state","that","there","higher","truth","that","isyour","would","prove","this","obviously","cannot","besides","assume","moment","that","there","higher","truth","then","prove","your","another","religion","what","makes","arrogant","push","forward","your","idea","creation","over","many","peoples","study","laws","nature","science","science","study","nature","open","minded","theory","doesn","facts","then","trash","theory","construct","which","does","flexible","your","definition","science","presupposes","that","science","ignores","this","character","altogether","this","then","only","because","evidence","found","enjoy","science","fortunately","mentally","shackled","into","constructing","scientific","conclusions","placing","jesus","holy","ghost","into","every","paragraph","sycophantic","manner","truly","wonder","observing","creation","macroevolution","mixture","percent","science","percent","religion","guaranteed","within","three","percent","error","indeed","wonder","observing","random","effects","creation","this","course","assumes","definition","aethetics","which","forward","macroevolution","please","give","references","more","information","from","where","your","figures","adda","bill","rawlins","atlanta","speak","myself","only","adda","wainwright","does","atal","llanw","eczcaw","mips","nott","werth"]
["article","c5s9zm","cbnewsj","decay","cbnewsj","dean","kaflowitz","writes","from","decay","cbnewsj","dean","kaflowitz","subject","will","hell","date","1993","article","c5lh4p","portal","videocart","dfuller","portal","videocart","dave","fuller","writes","jsn104","psuvm","writes","blashephemers","will","hell","believing","prepared","your","eternal","damnation","what","mean","prepared","surrounded","thumpers","like","yourself","proven","hellish","enough","even","dead","well","here","prepared","those","beach","umbrellas","some","those","pack","things","coleman","cooler","which","loaded","with","miller","draft","like","miller","draft","pair","balance","sneakers","sony","watchman","couple","cartons","bonton","cheddar","cheese","popcorn","haven","decided","what","wear","what","does","wear","eternal","damnation","dean","kaflowitz","should","wear","your","nicest","boxer","shorts","bring","plenty","sunscreen","grab","bathing","suit","towerl","some","veggie","hotdogs","have","bonfire","cookout","does","that","sound","good","enough","dean","every","poster","invited","tammy","trim","healy"]
["article","healta","735350336","saturn","healta","saturn","tammy","healy","writes","should","wear","your","nicest","boxer","shorts","bring","plenty","sunscreen","grab","bathing","suit","towerl","some","veggie","hotdogs","have","bonfire","cookout","does","that","sound","good","enough","dean","every","poster","invited","there","room","nudists","after","believe","most","upstanding","moral","churches","nudity","sole","intention","learning"]
["article","1qu485","horus","mchp","frank","d012s658","uucp","frank","dwyer","writes","article","1qkovl","fido","livesey","solntze","livesey","writes","false","dichotomy","claimed","killing","were","religiously","motivated","saying","that","wrong","saying","that","each","every","killing","religiously","motivate","spelled","detail","which","killings","religously","motivated","example","would","claim","that","recent","assassination","four","catholic","construction","workers","connection","with","probably","religously","motivated","time","writing","think","that","someone","claims","current","violence","motivated","religion","reaching","what","would","call","when","someone","writes","killings","religously","motivated","possible","argue","that","religion","past","major","contributing","factor","violence","present","know","evidence","that","this","enough","historian","debate","given","that","avowed","take","northern","ireland","into","country","that","particular","church","written","into","constitution","which","restriction","civil","rights","dictated","that","church","fail","word","past","appropriate","claim","that","killings","religously","motivated","grotesque","that","means","that","church","believers","doing","what","they","always","with","history","they","face","they","rewrite","attacking","different","claim","claim","that","when","terrorist","plants","bomb","warrington","does","have","motive","greater","glory","sorry","frank","what","quotes","your","words","from","your","posting","1qi83b","horus","mchp","tell","that","different","claim","longer","stand","behind","your","original","claim","just","mean","same","thing","when","killings","religously","motivated","when","when","terrorist","plants","bomb","doesn","have","religious","motive","example","meant","clarify","claim","different","claim","which","refer","claim","which","were","seemingly","attacking","previous","post","namely","that","religion","major","historical","cause","present","violence","assert","that","assert","opposite","have","hand","bunch","double","talk","about","what","seemingly","attacking","quoted","what","attacking"]
["article","1r1cl7innknk","bozo","dsinc","perry","dsinc","perry","writes","anyway","since","seem","only","following","this","particular","line","discussion","wonder","many","rest","readership","have","read","this","book","what","your","thoughts","read","when","first","came","controversy","broke","name","waiting","list","library","that","book","really","offensive","none","money","would","find","author","publisher","read","cover","cover","phrase","that","seems","popular","here","right","liked","writing","style","little","hard","used","well","worth","effort","coming","from","similar","background","rushdie","grew","bombay","muslim","family","moved","england","grew","delhi","made","strong","impression","used","many","strange","constructions","indian","english","yaar","sentence","butbutbut","occasional","hindi","phrase","time","still","sorta","kinda","thought","myself","muslim","couldn","what","flap","about","seemed","clear","that","this","allegory","clear","that","described","some","local","prostitutes","took","names","personae","muhammed","wives","grandfather","thundered","implied","that","muhammed","wives","were","prostitutes","short","every","angry","muslim","that","read","even","part","book","seemed","have","missed","point","completely","mention","fact","that","most","militant","them","never","even","seen","book","oops","just","perhaps","deep","sense","book","insulting","islam","because","exposes","silliness","revealed","religion","does","omnipotent","deity","need","agent","come","directly","know","that","muhammed","didn","just","into","desert","smoke","something","know","that","scribes","dictated","quran","didn","screw","their","little","verses","muhammed","marry","more","than","four","women","when","other","muslim","allowed","although","think","biggest","insult","islam","that","majority","followers","would","want","suppress","book","sight","unseen","some","holy","mention","murder","author","over","years","when","have","made","this","point","various","primarily","muslim","posters","have","responded","saying","that","indeed","they","have","read","book","called","such","things","filth","lies","would","rank","rushdie","book","with","hitler","mein","kempf","worse","much","same","response","when","tried","talk","about","book","really","silly","argument","after","many","these","same","people","have","read","mein","kampf","just","made","wonder","what","they","afraid","they","just","read","book","decide","themselves","maybe","reaction","muslim","community","book","absence","protest","from","liberal","muslims","khomeini","fatwa","outrage","final","push","needed","into","atheism","shamim","mohamed","uunet","noao","cmcl2","arizona","shamim","shamim","arizona","take","this","cross","garlic","here","mezuzah","jewish","page","koran","muslim","buddhist","your","member","league","programming","freedom","write","uunet"]
["article","66615","mimsy","mangoe","charley","wingate","writes","livesey","writes","what","said","that","people","took","time","copy","text","correctly","translations","present","completely","different","issues","read","papers","that","qumram","texts","different","versions","some","texts","misunderstand","reading","newspapers","learn","about","this","kind","stuff","best","idea","world","newspaper","reporters","notoriously","ignorant","subject","religion","prone","exaggeration","interests","having","real","story","that","bigger","headline","back","1935","this","point","have","masoretic","text","various","targums","translations","commentaries","aramaic","septuagint","ancient","greek","translation","masoretic","text","standard","jewish","text","essentially","does","vary","some","places","obvious","corruptions","which","copied","faithfully","from","copy","copy","these","passages","past","were","interpreted","reference","targums","septuagint","when","they","took","time","copy","text","correctly","that","includes","obvious","corruptions","septuagint","differs","from","masoretic","text","particulars","first","includes","additional","texts","second","some","passages","there","variant","readings","from","masoretic","text","addition","fixing","predating","various","corrupted","passages","must","emphasized","that","best","knowledge","these","variations","only","signifcant","bible","scholars","have","little","theological","import","when","they","took","time","copy","text","correctly","that","does","exclude","variant","readings","from","masoretic","text","which","little","theological","import","dead","scroll","materials","this","ancient","copy","almost","isaiah","fragments","various","sizes","almost","other","books","there","also","abundance","other","material","know","there","sign","there","hebrew","antecdent","apocrypha","extra","texts","septuagint","analysis","proceeded","there","also","variations","between","texts","masoretic","versions","these","tend","reflect","septuagint","where","latter","obviously","error","again","though","differences","thus","significant","theologically","there","this","expectation","that","there","great","theological","surprises","lurking","material","this","hasn","happened","important","because","there","almost","textual","tradition","unlike","expert"]
["article","1993apr19","112008","26198","monu6","monash","darice","yoyo","monash","fred","rice","writes","found","reference","claim","that","percentage","population","that","suffers","from","depression","been","increasing","this","century","requested","will","start","heading","thread","post","under","cool","then","discuss","increase","radio","increase","fossil","fuels","increase","travel","consumption","processed","bread","instruct","which","them","causes","increased","depression"]
["article","1r15rvinnh8p","ctron","news","ctron","king","ctron","john","king","writes","adpeters","sunflower","indiana","andy","peters","writes","macroevolution","mixture","percent","science","percent","religion","guaranteed","within","three","percent","error","bullshit","this","true","only","under","your","assertion","that","only","religion","explain","origins","history","life","through","macroevolution","falsifiable","theory","think","then","make","some","substantial","argument","against","modern","theory","evolution","inadequate","that","deserves","treated","matter","faith","francis","hitching","jack","your","joking","right","this","substantial","argument","even","your","standards","tero","sand","email","cust","helsinki","custts","helsinki","feel","most","ministers","claim","they","heard","voice","eating","much","pizza","before","they","night","really","intestinal","disorder","revelation","reverend","jerry","falwell"]
["article","4949","eastman","uucp","nasa","kodak","writes","simple","logic","arguments","folly","read","bible","will","that","jesus","made","fools","those","tried","trick","with","logic","cite","passages","that","focus","some","discuss","then","following","jesus","make","fools","logic","rely","simply","your","reason","then","will","never","know","more","than","indeed","justifiably","make","this","assertion","must","genius","logic","making","fools","should","that","much","easier","qpliu","princeton","standard","opinion","opinions","delta","correlated"]
["article","11857","vice","bobbe","vice","robert","beauchaine","writes","from","bobbe","vice","robert","beauchaine","subject","requests","date","article","c5qllg","mailer","mayne","writes","excess","stuff","deleted","however","seems","that","local","church","elder","been","getting","revelations","from","about","devastating","quake","scheduled","level","area","independent","corroboration","from","several","friends","apparently","have","similar","revelations","quake","fact","response","request","from","them","seeking","sign","from","veracity","their","visions","none","this","would","terribly","interesting","except","amount","stir","created","area","many","many","people","taking","these","claims","very","seriously","there","some","making","plans","target","date","local","religious","radio","station","devoted","hours","discussion","topic","even","called","during","live","broadcasts","tell","host","that","would","have","full","account","conversion","provided","family","survived","devastation","ruin","that","will","invariably","follow","quake","beauchaine","bobbe","vice","they","said","that","queens","could","stay","they","blew","bronx","away","sank","manhattan","know","similar","incident","about","years","climatologist","ithink","that","profession","named","iben","browning","predicted","that","earthquake","would","madrid","fault","some","schools","missouri","that","were","fault","line","actually","cancelled","school","many","people","evacuated","madrid","other","towns","wouldn","suprised","there","were","more","journalists","area","than","residents","course","earthquake","never","occured","know","about","used","live","southern","illinois","lican","middle","school","built","directly","fault","line","still","school","laughed","poor","idiots","believed","prediction","wanting","excuse","convert","christianity","gonna","have","look","elsewhere","tammy","trim","healy"]
["article","c5rgkb","darkside","osrhe","uoknor","okcforum","osrhe","bill","conner","wrote","kent","sandvik","sandvik","newton","apple","wrote","social","pressure","indeed","very","important","factor","majority","passive","christians","world","today","case","early","christianity","promise","heavenly","afterlife","independent","your","social","status","also","very","promising","gift","reason","slaves","romans","accepted","religion","very","rapidly","this","hypothetical","proposition","should","fact","should","cite","your","sources","this","amateur","sociologist","branch","however","would","suffice","alert","unwary","that","just","screwing","around","well","remember","jacoby","mythmaker","talks","about","this","cite","source","sure","christians","have","read","this","book","addition","social","experiences","from","being","raised","educated","lutheran","having","christian","friends","even","have","played","christian","rock","bands","over","have","counter","claims","sources","rest","that","shows","that","christianity","does","have","concept","social","promise","that","independent","social","status","cheers","kent","sandvik","newton","apple","alink","ksand","private","activities"]
["article","1r3qab","horus","mchp","frank","d012s658","uucp","frank","dwyer","writes","article","930421","102525","rusnews","w165w","mantis","mathew","mathew","mantis","writes","frank","d012s658","uucp","frank","dwyer","writes","article","930420","100544","rusnews","w165w","mantis","mathew","mathew","mantis","writes","this","complete","nonsense","relativism","means","saying","that","there","absolut","saying","that","some","moral","systems","better","than","others","your","opinion","then","infinite","regress","what","justification","saying","that","moral","system","terrorist","inferior","that","peace","your","saying","does","make","that","according","your","premise","mine","been","reading","these","posts","while","still","figure","what","terrible","about","just","having","opinion","seem","imply","that","admitted","your","opinion","opinion","precludes","from","justifying","convincing","others","worth","ridiculous"]
["dfuller","portal","videocart","dave","fuller","writes","nice","attempt","chris","verrry","close","missed","conspiracy","step","joseph","knew","knocked","couldn","known","that","somebody","else","mary","prego","that","wouldn","well","popularity","local","circles","what","happened","that","feeling","guilty","feeling","embarrassed","they","decided","improve","both","their","images","what","could","have","otherwise","been","downfall","both","clever","indeed","come","think","have","gained","respect","couple","maybe","joseph","mary","should","receive","praise","being","paid","jesus","lucky","them","that","baby","didn","have","obvious","deformities","could","just","mary","gets","pregnant","wedlock","save","face","joseph","that","that","pregnant","then","baby","turns","deformed","even","worse","stillborn","they","have","explaining","dave","buckminster","fuller","that","keeper","nicknames","nanci","know","sure","author","this","quote","please","send","email","nm0w","andrew","life","does","cease","funny","when","people","more","than","ceases","serious","when","people","laugh"]
["article","1993apr15","212943","15118","rashid","writes","sure","about","this","think","charge","shatim","also","applies","rushdie","encompassed","under","umbrella","fasad","ruling","please","define","words","shatim","fasad","before","them","again","john","david","munch","jmunch","hertz","elee","calpoly","heart","change","full","hate","love","people","allowed","base","their","lives","through","their","hearts","anything","happen","dangerous","situation","opinion","bobby","mozumder","describing","problems","with","atheism"]
["c5iwxm","news","chalmers","d9bertil","dtek","chalmers","bertil","jonell","writes","article","kutluk","734797558","umist","kutluk","umist","kutluk","ozguven","writes","atheists","mentioned","quran","because","from","quranic","point","view","minute","reasoning","that","there","such","thing","there","people","that","they","atheists","they","aren","atheists","what","they","when","quran","uses","word","means","individual","thinking","behaving","communal","order","protocols","based","beliefs","this","often","interpreted","much","weaker","term","religion","atheists","mentioned","quran","along","with","jews","mushriqin","christians","because","latter","have","need","beliefs","assumptions","forma","social","code","example","marxist","have","those","such","history","conflict","that","they","idols","sometimes","they","represent","those","assuptions","does","mean","they","different","from","other","mushriq","roughly","polytheists","there","cannot","social","atheism","because","when","there","community","that","community","needs","common","ideas","standard","beliefs","coordinate","society","when","they","inscribe","assumptions","nation","progress","natural","consequence","human","activity","parlamentarian","democracy","doubtlessly","best","government","however","they","individually","insist","they","have","gods","from","quranic","point","view","they","therefore","definition","atheism","does","exist","atheist","society","fact","means","reject","other","than","ours","atheism","only","exist","when","people","reject","idols","gods","dogmas","suppositions","society","that","they","part","that","case","that","personal","deviation","belief","quran","tells","about","such","deviations","disbelief","mentioned","from","quranic","point","looking","things","there","atheism","macro","level","think","took","more","than","minute","kutluk"]
["article","c5l1ey","news","uiuc","cobb","alexia","uiuc","mike","cobb","writes","11825","vice","bobbe","vice","robert","beauchaine","writes","actually","atheism","based","ignorance","ignorance","existence","fall","into","atheists","believe","because","their","pride","mistake","know","based","ignorance","couldn","that","wrong","would","wrong","fall","into","trap","that","mentioned","wrong","free","time","correct","mistake","that","continues","while","supposedly","proclaiming","undying","love","eternal","soul","speaks","volumes","trap","position","tell","that","believe","because","wish","unless","know","motivations","better","than","myself","should","believe","when","that","earnestly","searched","years","never","found","beauchaine","bobbe","vice","they","said","that","queens","could","stay","they","blew","bronx","away","sank","manhattan"]
["article","4963","eastman","uucp","nasa","kodak","schaertel","writes","article","21627","ousrvr","oulu","kempmp","phoenix","oulu","petri","pihko","writes","schaertel","nasa","kodak","wrote","love","just","much","loves","wants","seduce","know","what","would","probably","consider","rape","probably","because","rape","simple","logic","arguments","folly","read","bible","will","that","jesus","made","fools","those","tried","trick","with","logic","ability","reason","just","spec","creation","some","think","ultimate","rely","simply","your","reason","then","will","never","know","more","than","your","argument","type","know","once","there","many","atheists","have","sincerely","tried","believed","many","years","were","eventually","honest","enough","admit","that","they","lived","virtual","reality","obviously","there","many","christians","have","tried","believe","nothing","work","some","others","doesn","give","insight","into","overall","overall","truth","religion","would","seem","dependent","solely","individual","well","individually","created","since","christians","have","failed","show","there","life","better","than","ours","attempt","necessary","even","particularly","attractive","learn","must","accept","that","which","know","what","does","this","mean","learn","must","accept","that","know","something","right","learn","must","accept","something","know","this","prefer","learn","unwise","merely","swallow","everything","read","suppose","write","book","telling","great","invisible","pink","unicorn","helped","daily","problems","would","accept","this","since","know","whether","true","asks","swallow","everything","fact","jesus","warns","against","question","beleive","what","learn","history","class","that","matter","anything","school","mean","just","what","other","people","have","told","want","swallow","what","others","right","well","will","nerver","know","sure","were","told","truth","very","least","there","more","evidence","pointing","fact","that","there","military","conflict","vietnam","years","then","there","supernatural","diety","wants","live","certain","fact","that","jesus","warned","against","means","nothing","warn","against","deal","life","death","resurection","christ","documented","historical","fact","this","true","first","choices","here","life","death","scantily","documented","last","total","malarky","unless","uses","bible","that","totally","circular","perhaps","better","imagination","ignorance","someone","else","will","address","this","sure","refer","plenty","documentation","much","anything","else","learn","choose","what","believe","what","could","argue","that","george","washington","myth","never","lived","because","have","proof","except","what","told","however","major","events","life","jesus","christ","were","fortold","hundreds","years","before","neat","trick","this","there","nothing","more","disgusting","than","christian","attempts","manipulate","interpret","testament","being","filled","with","signs","coming","christ","every","little","reference","stick","wood","autmoatically","interpreted","cross","what","miscarriage","philology","there","into","sceptical","heart","have","given","sincere","effort","with","attitude","seem","have","must","trust","just","church","participate","activities","were","ever","willing","what","believed","well","since","have","skeptical","hearts","thank","goodness","there","into","here","have","irreconcilable","difference","christians","glorify","exactly","what","tend","despise","snub","trust","belief","faith","without","knowledge","lucky","happen","thinking","same","time","enkephalins","then","associate","this","sign","will","feel","right","will","trust","without","knowing","maybe","religosity","does","seem","anything","that","conclusively","arrived","rather","seems","more","sudden","affliction","believe","many","were","willing","what","believed","many","were","question","suchg","attitude","reflective","correct","healthy","morality","would","seem","same","thing","could","reflect","fanaticism","example","case","expression","simple","selfishness","adam","adam","john","cooper","verily","often","have","laughed","weaklings","7521","thought","themselves","good","simply","because","acooper","macalstr","they","claws","understand","another","fear","beyond","your","comprehension","gandalf"]
["found","this","little","know","anyone","interest","comments","everyone","commited","christian","that","battling","with","problem","know","that","romans","talks","about","saved","faith","deeds","hebrews","james","that","faith","without","deeds","useless","saying","fools","still","think","that","just","believing","enough","someone","fully","believing","there","life","totally","lead","themselves","according","romans","that","person","still","saved","there","faith","then","there","which","says","that","preferes","someone","cold","doesn","know","condemned","lukewarm","christian","someone","knows","believes","doesn","make","attempt","live","bible","opinion","that","saved","through","faith","alone","what","taught","romans","square","mind","teachings","james","conjunction","with","lukewarm","christian","being","spat","anyone","help","this","really","bothers","christ","will","adam","adam","john","cooper","verily","often","have","laughed","weaklings","7521","thought","themselves","good","simply","because","acooper","macalstr","they","claws","understand","another","fear","beyond","your","comprehension","gandalf"]
["mathew","mathew","mantis","writes","markp","elvis","mark","pundurs","writes","930415","112243","rusnews","w165w","mantis","mathew","mantis","mathew","writes","there","objective","physics","einstein","bohr","have","told","that","speaking","knows","relativity","quantum","mechanics","bullshit","speaking","someone","also","knows","relativity","quantum","mechanics","ahead","punk","make","degree","beat","your","degree","refer","place","einstein","bohr","writings","where","said","there","objective","physics","there","objective","reality","should","sufficient","prove","that","speaking","taken","bullshit","well","have","your","superior","knowledge","that","think","detect","pattern","your","responses","about","some","actual","support","your","dismissals","take","skews","your","perception","reality","come","down","your","perceptions","unskew","wonders","just","what","people","such","questions","understand","term","objective","anything","consider","useful","fiction","abstract","ideal","strive","towards","like","ideal","light","inextensible","string","doesn","actually","exist","talk","about","things","they","were","like","wrong","could","striving","toward","ideal","useful","ideal","objective","existence","actual","point","perfectly","efficient","power","station","would","convert","energy","coal","into","electricity","there","absolutely","build","perfect","power","station","ideal","striving","towards","that","ideal","undeniably","useful","valuable","narrow","question","useful","strive","toward","nonexistent","objective","ethics","what","mathew","mark","pundurs","resemblance","between","opinions","those","wolfram","research","purely","coincidental"]
["article","1r98voinnr9q","lynx","cfaehl","vesta","chris","faehl","writes","myth","which","refer","convoluted","counterfeit","athiests","have","created","make","religion","appear","absurd","counterfeit","atheists","hmmmm","just","cheap","knock","offs","true","atheists","they","must","theists","disguise","event","need","create","religious","parodies","just","look","some","actual","religions","which","absurd","34mand","35mdeep","thoughts","32mby","jack","handey","36mif","parachuting","your","parachute","doesn","open","your","friends","watching","fall","think","funny","would","pretend","were","swimming"]
["sorry","about","delay","responding","conference","paper","deadline","panic","article","1qsnqqinn1nr","senator","bedfellow","bobs","thnext","robert","singleton","writes","article","1993apr18","043207","27862","warwick","simon","warwick","simon","clippingdale","writes","alarming","amounts","agreement","deleted","made","statement","about","ockums","razor","from","experiences","physics","thanks","info","baysian","statistics","very","interesting","didn","know","before","follow","your","proof","have","questions","have","hypotheses","latter","more","complicated","which","definition","means","that","complicated","fact","where","comes","from","more","other","around","from","where","complement","axiom","anything","sense","necessarily","more","complicated","than","splitting","hairs","what","trying","that","irrespective","subjective","impressions","complicated","something","holds","with","equality","only","point","very","simple","matter","show","thus","preferd","that","consistent","with","data","elaborate","some","this","well","means","that","likely","observed","operative","operative","this","implies","that","observing","does","provide","useful","information","which","might","allow","discriminate","between","respective","possibilities","that","operative","difference","reduces","difference","between","unknown","unhelpful","prior","probabilities","where","equally","consistent","with","data","that","observing","doesn","give","pointers","which","operative","particular","case","where","however","know","that","their","prior","probabilities","ordered","although","know","actual","values","this","which","allows","deploy","razor","throw","such","also","real","world","clear","seems","always","determine","whether","equality","true","that","certainly","true","particular","point","here","whether","divine","component","actually","underlies","prevalence","religion","addition","memetic","transmission","component","which","even","religious","implicitly","acknowledge","operative","when","they",
"talk","spreading","word","seems","said","that","observed","variance","religious","belief","well","accounted","memetic","transmission","model","rather","less","well","proposes","divine","component","addition","since","would","expect","latter","conspire","against","wide","variance","even","mutual","exclusion","among","beliefs","thus","personal","feeling","that","even","equal","this","case","smaller","memetic","transmission","divine","component","variance","among","beliefs","happily","acknowledge","that","this","subjective","impression","beef","with","your","baysian","argument","mathematical","checked","most","your","work","didn","find","error","seem","very","careful","there","probably","math","mistake","think","mistake","philosophical","just","make","sure","understand","please","rephrase","technical","terms","think","this","reasonable","request","always","look","ways","explaining","physics","physicist","baysian","statistician","type","statistician","this","would","very","helpful","that","statistician","such","either","idea","that","both","theism","atheism","compatible","with","read","observations","date","however","theism","type","with","which","concerned","also","suggests","that","instance","prayer","answered","people","miraculously","healed","both","principle","amenable","statistical","verification","that","generally","intervene","measurable","ways","this","means","that","these","regions","space","possible","observations","which","loosely","termed","appearances","have","some","nonzero","probability","under","theistic","hypothesis","zero","under","atheistic","since","there","only","much","probability","available","each","hypothesis","scatter","around","over","observation","space","probability","which","theism","expends","making","appearances","possible","must","come","from","somewhere","else","other","possible","observations","else","being","equal","this","means","that","observation","which","appearance","must","have","slightly","higher","probability","under","athei
sm","than","under","theism","bayesian","stuff","implies","that","such","observations","must","cause","running","estimate","probability","atheistic","hypothesis","increase","with","corresponding","decrease","running","estimate","probability","theistic","hypothesis","sorry","that","still","jargonesque","rather","difficult","other","since","does","depend","intimately","properties","conditional","probability","densities","particularly","that","total","area","under","them","always","unity","analogy","helpful","that","hypothesis","coin","fair","that","coin","unfair","headed","used","avoid","confusion","with","heads","tails","then","total","total","observations","string","heads","with","tails","this","compatible","with","both","fair","coin","headed","coin","however","probability","expended","making","possible","appearance","tails","even","though","they","actually","appear","must","come","from","somewhere","else","since","total","must","unity","comes","this","case","from","probability","appearance","heads","running","estimates","time","observation","time","another","head","estimates","modified","according","know","actual","prior","probability","head","multiplier","half","that","this","true","every","time","coin","tossed","head","observed","thus","whatever","initial","values","estimates","after","heads","have","since","time","show","that","thus","hence","estimate","fair","coin","hypothesis","must","decrease","each","trial","that","headed","coin","hypothesis","must","increase","even","though","both","hypotheses","compatible","with","string","heads","loose","analogy","between","unfair","coin","atheism","between","fair","coin","theism","with","observations","consistent","with","both","tail","which","would","falsify","unfair","coin","analogous","appearance","which","would","falsify","atheism","claiming","that","analogy","extends","numerical","values","various","probabilities","just","that","principle","same","constant","observation","evidence","gods","evidence","them","possible"
,"under","respective","theisms","constantly","increases","notional","estimated","probability","that","they","exist","important","draw","distinction","between","theism","that","could","supported","supported","evidence","theism","that","given","theism","which","evidence","principle","possible","doesn","make","sense","lack","evidence","supports","contrary","view","quite","this","type","theism","what","might","call","terms","ockham","razor","discussion","those","grounds","depends","upon","your","conception","this","conception","like","zeus","happened","come","down","earth","play","quite","frequently","then","agree","with","lack","evidence","this","conception","evidence","that","does","exist","your","conception","that","does","make","falsifiable","predictions","below","falsifiable","predictions","then","disagree","lack","evidence","does","support","disbelief","hypotheses","have","falsifiable","indeed","model","theism","falsifiable","used","phrase","should","obverse","given","specific","theism","does","make","prediction","that","used","word","should","theism","makes","predictions","about","specific","event","only","believe","that","such","such","after","such","such","happens","believe","will","such","such","given","never","priori","even","this","some","this","what","like","about","your","probability","also","have","assigning","these","probabilities","hold","science","positivistic","criteria","someone","cannot","tell","measure","even","principle","then","probability","applicable","hypothesis","such","case","when","theistic","atheistic","example","what","measure","have","need","above","analogy","know","prior","probabilities","deduce","that","updating","multiplier","fair","coin","hypothesis","less","than","unity","that","corresponding","multiplier","headed","coin","hypothesis","greater","than","unity","need","know","initial","values","running","estimates","either","clear","that","after","large","number","observations","fair","coin","approaches","zero","headed","coin","approa
ches","unity","need","know","whether","larger","than","observed","this","follows","from","assumptions","that","there","certain","events","rendered","possible","necessary","under","which","possible","under","else","equal","baysian","statistics","relies","upon","series","observations","what","hypothesis","amenable","observation","even","statements","that","amenable","observation","some","observations","relevant","sequence","observations","must","chosen","with","care","curious","know","what","types","observations","have","mind","concerning","theism","atheism","observations","like","really","doesn","matter","affect","reasoning","provided","that","there","some","possible","observations","which","would","count","appearances","examples","this","might","demonstration","efficacy","prayer","veracity","revelation","statement","about","general","still","counts","prediction","theism","question","says","that","prayer","answered","that","miracles","happen","interpretation","quoted","again","above","what","exists","means","then","this","prediction","such","what","distinguishes","from","atheist","hypothesis","which","predicts","that","this","stuff","does","happen","such","theism","does","make","claim","that","such","should","that","theism","doesn","maybe","quick","common","language","said","that","existence","mean","notion","that","deity","described","bible","christians","does","interact","with","universe","claimed","those","agents","agreed","with","this","however","must","careful","here","believe","this","making","claims","maybe","should","have","changed","does","there","important","shift","emphasis","since","only","have","belief","cannot","conclude","such","downgrade","does","interact","interact","which","would","actually","better","since","does","interact","implies","falsifiability","which","both","agree","misplaced","think","theism","makes","predictions","maybe","understanding","what","mean","prediction","could","explain","what","mean","this","word","explain","bear","mind","that
","this","central","require","theism","that","make","prediction","appearances","will","never","happen","does","atheism","before","somebody","points","that","quantum","mechanics","doesn","make","this","prediction","either","difference","that","atheism","form","partition","predictions","include","such","statements","prayer","efficacious","implying","stats","will","find","that","prayer","efficacious","prayer","efficacious","verily","unto","this","generation","shall","pass","till","these","things","fulfilled","think","have","problems","misunderstanding","here","persistent","observation","this","stuff","happening","consistent","with","though","more","consistent","with","explained","bayesian","stats","post","even","exists","unfalsifiable","that","problem","argument","other","than","that","have","number","observations","infinity","falsify","asymptotically","consider","argument","that","requires","infinite","number","observations","valid","rather","that","part","argument","valid","existing","humans","never","make","infinite","number","measurments","conclusion","that","reilies","this","accept","valid","that","fine","claim","that","theism","false","merely","that","finite","number","observations","available","suggest","that","that","continue","observe","suggestion","looks","better","better","renormalization","stuff","deleted","bayesian","stats","post","assumed","that","theism","indeed","unfalsifiable","finite","number","observations","here","relevant","quote","important","assumption","that","there","some","observations","which","compatible","with","theist","hypothesis","with","atheist","hypothesis","thus","would","falsify","atheism","these","what","called","appearances","this","need","taken","literally","observation","which","requires","explanation","that","more","gods","exist","will","count","other","observations","assumed","compatible","with","both","hypotheses","this","leaves","theism","unfalsifiable","atheism","falsifiable","single","observation","only","such","appearances
","here","problem","with","this","something","falsifiable","must","make","prediction","that","should","seen","seen","then","hypothesis","been","falsified","atheism","word","oposition","something","theism","theism","aserts","belief","atheism","aserts","disbelief","there","certain","atheisms","that","certainly","falsifiable","just","there","certain","theisms","that","falsifable","theism","asserts","world","only","years","that","does","decieve","then","this","been","falsified","however","atheism","that","oposition","unfalsifiable","theism","also","unfalsifiable","could","wrong","this","statment","contd","think","appearance","sufficient","falsify","atheism","whereas","general","corresponding","theism","unfalsifiable","think","more","about","until","then","here","general","question","suppse","were","unfalsifiable","also","unfalsifiable","counterexample","coin","fair","more","accurately","that","makes","sense","sides","coin","different","this","unfalsifiable","tossing","coin","even","string","heads","consistent","with","fair","coin","have","infinite","number","tosses","falsify","limit","converse","falsifiable","falsified","when","least","head","least","tail","have","appeared","this","partly","what","wrong","with","baysian","argument","which","requires","observations","made","there","simply","such","observations","that","have","truth","value","relation","statement","exists","your","symmetry","argument","understand","someone","would","since","statement","does","exist","makes","predictions","will","choose","believe","none","less","this","would","founded","type","faith","like","word","faith","insert","belief","which","there","falsifiable","evidence","instead","assume","meant","exists","there","highlight","agreed","definition","exists","statement","makes","predictions","said","above","although","falsifiable","finite","number","observations","actually","mean","does","exist","makes","predictions","oops","sorry","culpa","truth","this","statment","actually","depends","upon","which
","refering","think","some","conceptions","which","true","once","again","open","posibility","that","could","wrong","give","some","examples","predictions","statment","does","exist","here","that","think","true","then","there","would","healing","miricles","this","principle","never","determined","other","there","cases","which","people","seem","recover","healed","without","help","doctor","known","reason","these","situations","fact","happen","they","consistent","with","theistic","hypothesis","support","such","hypothesis","agree","here","they","inconsistent","with","atheistic","hypothesis","think","prediction","from","does","exist","that","this","type","might","missing","something","rapture","will","happen","october","1992","said","rapture","would","have","falsified","atheism","satisfaction","happened","although","failure","happen","does","course","falsify","theisms","other","than","those","which","specifically","predicted","phenomenon","which","requires","existence","more","gods","explanation","will","ever","observed","that","about","sums","whole","thing","singleton","bobs","thnext","cheers","simon","simon","clippingdale","simon","warwick","department","computer","science","523296","university","warwick","525714","coventry"]
["article","1qk1pp","kyle","eitech","kyle","eitech","eric","rescorla","writes","what","value","assign","results","real","they","just","limited","things","valuable","aside","from","desires","results","science","value","nevertheless","still","accurately","describes","universe","works","humans","humans","accurately","described","what","about","universe","works","halat","halat","bear","bear","stearns","whatever","doesn","kill","will","only","serve","annoy","speak","only","myself"]
["article","healta","734925835","saturn","healta","saturn","tammy","healy","wrote","time","ezekiel","written","israel","apostacy","again","mistaken","tyre","about","make","israel","like","said","prince","tyre","human","ruler","tyre","wicked","calling","satan","king","tyre","ezekiel","saying","that","satan","real","ruler","over","tyre","tammy","this","explicitly","stated","bible","assume","that","know","that","ezekiel","indirectly","mentioned","could","have","been","another","metaphor","instance","ezekiel","landlord","talked","about","when","wrote","about","prince","tyre","sorry","interpretation","more","mundane","ezekiel","wrote","about","prince","tyre","when","wrote","about","prince","tyre","cheers","kent","sandvik","newton","apple","alink","ksand","private","activities"]
["article","kmr4","1696","735588167","cwru","kmr4","cwru","keith","ryan","writes","34mand","35mdeep","thoughts","32mby","jack","handey","36mif","parachuting","your","parachute","doesn","open","your","friends","watching","fall","think","funny","would","pretend","were","swimming","fall","opens","gravity","just","good","idea","dean","kaflowitz"]
["reading","this","decided","there","something","else","like","earlier","comments","please","forgive","attributions","wrong","here","also","this","really","appropriate","talk","origins","hope","will","excuse","just","this","once","they","article","c5tx38","usenet","indiana","laurence","gene","battin","battin","cyclops","iucf","indiana","wrote","article","schinder","735362755","leprss","gsfc","nasa","paul","schinder","schinder","leprss","gsfc","nasa","wrote","1993apr20","154658","iastate","kv07","iastate","warren","vonroeschlaub","writes","article","lt8d3binnj1g","exodus","emarsh","hernes","eric","marsh","writes","snip","snip","that","always","confused","once","black","hole","forms","anything","could","pass","event","horizon","perhaps","including","original","mass","that","formed","forming","black","hole","first","place","that","drop","marble","into","black","hole","races","ever","faster","towards","even","horizon","thanks","curving","space","caused","excessive","gravity","object","approaches","event","horizon","further","travel","integrating","curve","gives","time","reach","event","horizon","infinity","math","says","that","nothing","enter","black","hole","seems","that","using","physical","intuition","here","point","that","talking","about","global","conditions","influencing","local","phenomena","inappropriately","remember","that","there","such","thing","global","frame","reference","time","minds","like","pretend","that","there","imagine","things","like","calendar","alpha","centaury","being","approx","years","from","ours","earth","this","simply","wrong","there","global","time","which","applied","events","alpha","centaury","concurrently","with","events","earth","this","what","special","relativity","taught","travelling","past","earth","high","rate","speed","toward","even","have","different","view","order","occurance","events","versus","earth","thus","answer","question","what","happening","alpha","centaury","well","defined","asked","earth","until","specify","relevant","pa
rameters","such","relative","velocities","like","will","have","different","answers","different","values","these","parameters","vicinity","black","hole","curvature","spacetime","becomes","important","enough","that","this","lack","global","frame","reference","becomes","very","important","particular","frame","used","distant","observer","quite","different","from","frame","appropriate","falling","object","minds","just","seem","able","easily","deal","with","idea","that","time","itself","could","behaving","differently","these","locations","equations","relativity","that","does","would","like","falling","object","hovering","above","horizon","object","whose","frame","rotated","wholly","away","from","ours","very","real","sense","once","object","fallen","into","hole","gone","forever","from","unless","volunteer","jump","after","that","gene","battin","battin","cyclops","iucf","indiana"]
["jbrown","batman","writes","regret","fact","that","sometimes","military","decisions","have","made","which","affect","lives","innocent","people","regret","circumstances","which","make","those","decisions","necessary","regret","suffering","caused","those","decisions","afraid","going","have","kill","worry","though","loving","christian","guarantee","that","will","regret","fact","that","have","kill","although","regret","actual","killing","hadn","intervened","allowing","hussein","keep","kuwait","then","would","have","been","appeasement","right","ever","hear","anyone","advocate","such","course","action","just","setting","strawman","setting","strawman","want","argue","against","then","only","logical","alternative","allow","hussein","keep","kuwait","false","dichotomy","diplomatic","alternatives","including","sanctions","were","ineffective","that","because","they","weren","even","attempted","what","about","those","didn","support","hitler","dreams","conquest","they","democratically","voted","policies","nsdap","elections","1933","that","last","chance","german","people","vote","matter","they","suffered","along","with","rest","does","this","bother","much","want","know","bothers","that","thousands","innocent","people","were","maimed","killed","bombing","when","from","clear","that","such","bombing","necessary","world","full","evil","circumstances","perfect","many","innocents","suffer","wrongful","actions","others","regretable","that","that","things","that","this","happening","before","gulf","didn","send","bombers","east","timor","aren","sending","bombers","probably","because","saviors","world","police","each","every","country","that","decides","self","destruct","invade","another","just","ones","that","have","ones","that","look","like","they","might","make","success","communism","strategic","position","relief","tibet","east","timor","some","other","places","that","getting","forces","east","timor","harder","than","getting","them","iraq","tibetan","people","rounded","tortured","execu
ted","amnesty","international","recently","reported","that","torture","still","widespread","china","aren","stopping","them","fact","actively","sucking","them","trading","freely","with","them","tell","could","stop","them","support","agree","with","present","policy","sucking","them","agree","that","deplorable","fine","write","your","congressman","president","clinton","china","status","most","favoured","nation","comes","renewal","june","point","that","shouldn","offering","favourable","trading","terms","such","despicable","regime","doubt","anything","will","happen","clinton","keener","trade","sanctions","against","europe","unbelievable","comments","about","rodney","king","case","deleted","media","totally","monolithic","even","though","there","prevailing","liberal","bias","programs","such","macneil","lehrer","news","hour","give","balanced","fair","reporting","news","there","even","conservative","sources","there","know","where","look","hurrah","rush","idea","many","kill","files","just","ended","atheist","arguing","against","killing","innocent","people","supposed","christian","arguing","that","kill","innocent","people","long","some","guilty","ones","well","hardly","didn","that","good","thing","kill","innocent","people","just","unfortunately","live","perfect","world","there","perfect","solutions","going","resist","tyranny","then","innocent","people","both","sides","going","suffer","didn","unfortunate","sometimes","necessary","ends","justify","means","having","criticised","moral","relativism","past","arguing","that","position","judge","morality","allied","actions","certainly","such","position","moral","relativist","same","tired","misunderstanding","moral","relativism","means","that","there","objective","standard","morality","doesn","mean","judge","other","people","morals","christ","bike","many","times","have","tried","hammer","that","into","your","head","where","your","christian","love","where","your","absolute","morality","quick","discard","them","when","suits","ivan","stan
g","would","jesus","would","puke","will","stand","before","jesus","give","account","every","word","action","even","this","discourse","this","forum","understand","full","ramifications","that","prepared","believe","that","make","same","claim","obviously","atheist","think","with","jesus","though","long","haired","lunatic","peace","reason","brought","blanket","bombing","germany","because","were","bemoaning","iraqi","civilian","casualties","being","deplorable","blanket","bombing","instituted","because","bombing","wasn","accurate","enough","industrial","military","targets","decisive","other","method","that","time","gulf","precision","bombing","norm","point","make","stink","about","relatively","civilian","casualties","that","resulted","spite","precision","bombing","when","many","more","civilians","proportionately","quantitatively","died","under","blanket","bombing","right","unfortunately","turned","that","opinions","matter","were","entirely","consistent","that","condemned","bombing","dresden","think","being","glib","with","your","explanation","blanket","bombing","policy","make","sound","though","were","aiming","military","targets","could","only","them","destroying","civilian","buildings","next","door","understand","that","case","aimed","deliberately","civilian","targets","order","cause","massive","damage","inspire","terror","amongst","german","people","civilians","suffer","less","civilians","suffered","this","than","other","iany","other","history","come","with","wars","like","falklands","fresh","people","minds","that","sort","propaganda","going","fool","anyone","stories","hundreds","thousands","iraqi","civilian","dead","just","plain","bunk","bunk","lost","servicemen","over","four","years","majority","them","were","directly","involved","fighting","what","about","millions","casualties","russians","suffered","hardly","surprising","didn","lose","many","given","that","turned","late","mathew"]
["acooper","macalstr","turin","turambar","department","utter","misery","writes","that","ever","modified","define","strong","atheists","those","assert","nonexistence","those","assert","that","they","believe","nonexistence","word","mathew"]
["article","1993apr21","171807","16785","rashid","writes","article","115694","jaeger","buphy","gregg","jaeger","wrote","think","many","reading","this","group","would","also","benefit","knowing","deviant","view","articulated","above","which","true","view","khomeini","from","basic","principles","islam","that","muslim","readers","this","group","will","from","simple","basics","islam","such","views","face","them","they","contradiction","with","basics","islam","subtle","such","issues","seems","sects","exist","islam","while","they","explicitly","proscribed","discussing","here","fine","shall","start","thread","called","infallibility","islam","move","discussion","there","think","this","should","illuminating","make","first","suggestion","when","arabic","words","especially","technical","ones","become","define","them","those","especially","atheists","whom","they","terribly","familiar","please","also","note","that","though","initially","refer","khomeini","heretic","what","understood","claim","rejected","since","personal","infallibility","withdraw","this","basis","such","statement","conditionally","retain","this","reference","regard","khomeini","advocacy","thesis","infallibility","called","twelve","imams","which","clear","conflict","with","that","places","twelve","imams","category","behavior","example","higher","than","that","muhammad","that","shows","that","prophet","clearly","fallible","well","appears","given","your","abstruse","theological","statment","regarding","natures","twelve","imams","placing","them","different","metaphysical","category","than","remainder","humanity","with","possible","exception","muhammad","something","which","verges","association","salam","laikum","alaikum","wassalam","gregg"]
["article","93112","164435j5j","psuvm","john","johnson","psuvm","writes","article","1r39kh","horus","mchp","frank","d012s658","uucp","frank","dwyer","says","specifically","like","know","what","relativism","concludes","when","people","grotesquely","disagree","both","right","them","wrong","sometimes","though","perhaps","rarely","have","pretty","good","idea","them","wrong","never","have","information","make","best","guess","really","must","make","decision","idea","right","moral","judgement","meaningless","implying","that","whether","peace","better","than","meaningless","question","need","discussed","correct","answer","something","else","short","positive","assertion","would","nice","hope","tell","actually","predicated","assumption","that","values","real","statements","like","these","consistently","derive","from","relativist","assumption","that","values","aren","part","objective","reality","relativist","would","like","answer","your","question","phrase","question","makes","unanswerable","concepts","right","wrong","correct","incorrect","true","false","belong","domain","epistemological","rather","than","moral","questions","makes","sense","moral","position","right","wrong","although","legitimate","good","better","than","another","position","illustrate","this","point","looking","psychological","derivatives","epistemology","ethics","perception","motivation","respectively","certainly","percept","right","correct","true","veridical","wrong","incorrect","false","illusory","makes","little","sense","motive","true","false","other","hand","strange","whether","percept","morally","good","evil","certainly","that","question","about","motives","therefore","your","suggested","answers","simply","considered","they","assume","judge","correctness","moral","judgment","true","correct","mean","thing","valued","really","good","should","evaluative","terms","always","sorry","sloppy","phrasing","answer","betterness","used","place","correctness","problem","with","that","double","barrelled","agree","wit
h","first","part","that","rightness","moral","position","meaningless","question","reasons","stated","above","that","irrelevant","alleged","implication","implication","that","cannot","feel","peace","better","than","certainly","make","value","judgments","better","best","without","asserting","correctness","position","never","that","thing","really","better","more","likely","better","from","realistic","frames","reference","sorry","lengthy","dismissal","short","answer","that","when","individuals","grotesquely","disagree","moral","issue","neither","right","correct","wrong","incorrect","they","simply","hold","different","moral","values","feelings","this","where","difficulty","arises","though","starting","thing","there","anything","simple","about","different","moral","values","when","those","values","human","rights","ruthless","doctrinaire","avoidance","degeneracy","degeneracy","another","sort","getting","drunk","picking","ladies","writing","metaphysics","part","life","from","lila","pirsig","peculiar","getting","relativism","from","this","getting","objectivism","good","book","though","good","quote","frank","dwyer","hatching","that","odwyer","from","hens","evelyn","conlon"]
["article","30151","ursa","bear","halat","pooh","bears","halat","writes","article","c5sncl","usenet","indiana","adpeters","sunflower","indiana","andy","peters","writes","evolution","have","said","before","theory","fact","exactly","same","amount","each","existence","atoms","existence","gravity","accept","existence","atoms","gravity","fact","then","should","also","accept","existence","evolution","fact","accept","atoms","gravity","fact","either","deletions","essentially","agree","except","about","definition","fact","scientific","definition","fact","ultimate","truth","rather","theory","which","supported","evidence","predictive","that","pointless","test","anymore","have","fact","evolution","have","theories","evolution","just","have","fact","gravity","theories","gravity","fact","atomic","nature","matter","atomic","theory","fact","evolution","that","current","diversity","life","arose","through","common","descent","this","supported","evidence","that","ever","bothers","test","anymore","theories","evolution","include","theories","regarding","mechanism","common","descent","natural","selection","drift","actual","pathways","evolution","number","other","things","these","constantly","being","tested","because","actual","mechanisms","behind","fact","common","descent","still","question","note","that","fact","evolution","still","theory","other","words","could","theoretically","still","falsified","rejected","since","predictive","consistently","supported","evidence","seems","pointless","explicitly","falsify","anymore","description","atomic","theory","alternative","theories","gravity","deleted","both","very","useful","models","that","have","religious","overtones","requirements","faith","unless","course","want","demand","that","factual","physical","entity","described","exactly","theory","formulated","talks","about","here","where","fail","make","important","distinction","have","shoehorned","facts","existence","gravity","atoms","evolution","into","category","with","theories","which","have",
"been","proposed","explain","mechanisms","existence","these","things","predictive","considered","fact","mechanisms","other","hand","still","worth","discussing","halat","halat","bear","andy","real","estate","developer","with","offices","around","nation","they","liquidate","holdings","high","speculation","michelle","shocked"]
["article","1993apr20","191048","6139","cnsvax","uwec","nyeda","cnsvax","uwec","david","writes","reply","frank","d012s658","uucp","frank","dwyer","problem","objectivist","determine","status","moral","truths","method","which","they","established","accept","that","such","judgements","reports","what","only","relate","what","ought","naturalistic","fallacy","then","they","cannot","proved","facts","about","nature","world","this","avoided","least","ways","leaving","good","undefined","since","anyone","claims","that","they","know","what","either","lying","touch","with","humanity","undeserving","reply","good","undefined","undefinable","require","everyone","that","they","know","innately","what","right","back","subjectivism","begging","question","below","defining","good","solely","terms","evaluative","terms","ditto","here","evaluative","statement","implies","value","judgement","part","person","making","again","incorrect","question","begging","below","this","point","objectivist","talk","self","evident","truths","pretty","perceptive","that","prof","flew","deny","subjectivist","claim","that","self","evidence","mind","beholder","course","denying","that","subject","object","true","dichotomy","please","explain","this","helps","your","argument","yours","seems","rest","assertion","that","everything","either","subject","object","there","nothing","compelling","about","that","dichotomy","might","just","well","divide","world","into","subject","object","event","even","seems","more","sensible","causation","example","event","subject","object","furthermore","subject","object","were","true","dichotomy","everything","either","subject","object","then","that","statement","self","evident","truth","then","mind","beholder","according","relativist","hardly","compelling","that","fact","that","world","quickly","shoved","entirety","into","subjective","category","idealist","solipsist","argument","that","have","this","perfectly","good","alternate","categories","subject","object","event","which","reduced","
subject","object","quality","without","logical","difficulty","guess","denying","that","self","evident","truths","mind","beholder","what","left","claim","that","some","moral","judgements","true","nothing","then","moral","judgements","true","this","thing","that","commonly","referred","nihilism","entails","that","science","value","irrepective","fact","that","some","people","find","useful","anyone","arrives","relativism","subjectivism","from","this","argument","beats","this","makes","sense","either","flew","arguing","that","this","where","objectivist","winds","subjectivist","furthermore","nihilists","believed","nothing","except","science","materialism","revolution","people","referring","ethical","nihilism","subjectivist","well","feel","that","that","remains","that","there","some","moral","judgements","with","which","would","wish","associate","himself","hold","moral","opinion","suggests","know","something","true","have","preferences","regarding","human","activity","those","preferences","should","include","terrorism","that","moral","opinion","true","likewise","preferences","should","include","noterrorism","that","moral","opinion","true","should","choose","preferences","which","include","terrorisim","over","which","includes","noterrorism","reason","this","patently","absurd","also","position","subjectivist","been","pointed","already","others","ditch","strawman","already","reply","mike","cobb","root","message","thread","societal","basis","morality","responded","over","there","intend","this","strawman","something","logically","entailed","relativism","really","ethical","system","where","values","assumed","unreal","different","relativists","than","relativism","implies","frank","dwyer","hatching","that","odwyer","from","hens","evelyn","conlon"]
["article","1993apr16","223250","15242","ncsu","aiken","news","ncsu","wayne","aiken","writes","jsn104","psuvm","wrote","blashephemers","will","hell","believing","prepared","your","eternal","damnation","someone","leave","their","terminal","unattended","again","holy","temple","mass","slack","ncsu","used","underwear","consumption","legal","tender","30904","3095","countries","raleigh","27622","warning","hoard","pennies","probably","jesus","freak","post","probably","jsn104","psuvm","penn","state","just","loaded","hilt","with","bible","bangers","there","vomit","reason","left","they","even","group","stop","playing","rock","music","dining","halls","year","they","deemed","satanic","kampus","krusade","khrist","people","damn","place","most","part","except","liberal","arts","departments","they","safe","havens","rock","music","dining"]
["article","30160","ursa","bear","halat","pooh","bears","halat","writes","speed","quantifiable","measure","resulting","from","methods","that","will","result","same","value","measured","matter","reference","hmmm","bullet","with","zero","velocity","sitting","table","train","moving","60mph","will","moving","speed","0mph","someone","train","60mph","someone","stationary","next","train","what","coincidence","that","exactly","experienced","trouble","knowing","just","coincidence","that","appears","have","been","that","measurements","date","wouldn","saying","that","will","always","that","need","always","reference","frame","makes","speed","relative","what","interesting","here","that","every","person","train","will","stationary","bullet","every","person","bullet","moving","60mph","more","coincidence","still","wish","could","sure","that","always","going","like","that","tommy","definition","physics","short","story","long"]
["article","30192","ursa","bear","halat","pooh","bears","halat","writes","there","objective","physics","einstein","bohr","have","told","that","speaking","knows","relativity","quantum","mechanics","bullshit","speaking","someone","also","knows","relativity","quantum","mechanics","ahead","punk","make","degree","beat","your","degree","simple","take","some","physics","books","start","looking","statements","which","that","there","objective","physics","doubt","will","find","might","find","statements","that","there","objective","length","objective","location","objective","physics","consider","instance","that","speed","light","vacuum","invariant","this","sounds","awful","like","objective","speed","light","vacuum","confuse","construct","with","constructor","take","look","quantum","mechanics","many","objective","observations","made","well","however","physics","objective","bohr","said","randomness","atomic","motion","inherent","motion","itself","einstein","said","that","nature","deterministic","method","observation","that","inserts","randomness","they","were","talking","about","exact","same","results","that","some","results","objective","means","neither","that","results","objective","that","physics","objective","first","after","christmas","truelove","served","leftover","turkey","second","after","christmas","truelove","served","turkey","casserole","that","made","from","leftover","turkey","days","deleted","flaming","turkey","wings","pizza","commercial","bait","arromdee","arromdee","jyusenkyou"]
["here","suggestion","logical","argument","think","covered","though","fallacy","probably","better","name","than","used","about","mathew","inconsistency","counterexample","this","occurs","when","party","points","that","some","source","information","takes","stand","which","inconsistent","with","there","variations","which","either","mutually","agreed","premise","else","stand","elsewhere","from","same","source","second","party","fallaciously","responds","saying","source","really","does","right","here","this","reply","does","refute","allegation","inconsistency","because","does","show","that","source","only","says","example","first","type","koran","says","unbelievers","should","treated","these","ways","both","agree","these","immoral","koran","clearly","says","this","other","passage","that","unbelievers","treated","that","example","second","type","there","biblical","creation","stories","wrong","since","bible","clearly","describes","creation","description","first","after","christmas","truelove","served","leftover","turkey","second","after","christmas","truelove","served","turkey","casserole","that","made","from","leftover","turkey","days","deleted","flaming","turkey","wings","pizza","commercial","bait","arromdee","arromdee","jyusenkyou"]
["article","115621","jaeger","buphy","gregg","jaeger","writes","article","1993apr15","135650","28926","andrews","andrews","norman","paterson","writes","think","right","about","germany","daughter","born","there","think","german","rights","vote","live","there","beyond","rights","citizens","british","citizen","virtue","parentage","that","full","citizenship","example","think","children","could","british","virtue","same","fairly","sure","that","could","obtain","citizenship","making","application","might","require","immigration","germany","almost","certain","that","once","applied","citizenship","inevitable","this","case","nope","germany","extremely","restrictive","citizenship","laws","ethnic","germans","have","lived","russia","over","years","automatically","become","citizens","they","move","germany","turks","their","third","generation","germany","very","good","example","show","citizenship","without","descent","karl","lastly","come","china","hope","touch","fulfilling","lifelong","ambition","your","life","will","ever","dropping","acid","great","wall","duke","pink","floyd","still","even","billion","people","believe"]
["date","1993","0100","from","mathew","mathew","mantis","latest","news","seems","that","koresh","will","give","himself","once","finished","writing","sequel","bible","mathew","writing","seven","seals","something","along","those","lines","already","written","first","seven","which","around","pages","handed","over","assistant","proofreading","would","expect","decent","messiah","have","built","spellchecker","maybe","koresh","will","come","with"]
["date","1993","from","umar","khan","khan","navy","conclusion","that","while","impressed","that","what","little","holy","about","science","accurate","more","impressed","that","holy","contain","same","rampant","errors","evidenced","traditions","would","century","arabia","have","known","what","include","holy","assuming","authored","well","looks","like","folks","religion","islam","have","loosened","discussing","this","topic","well","banking","interest","topic","books","subject","have","also","been","mentioned","addition","mentioned","these","hard","find","think","take","stab","curiosity","know","film","this","subject","pretty","weak","only","quotes","have","seen","which","were","used","show","science","koran","which","posted","here","were","also","pretty","vague","suspect","that","these","books","will","extrapolate","awful","quotes","they","have","least","poster","islam","channel","seems","have","some","misgivings","about","practice","using","koran","decide","what","good","science","wonder","islam","ever","come","with","equivalent","christians","creation","science","topic","would","interesting","find","history","scientific","interpretations","koran","anyone","used","koran","support","earlier","science","which","since","been","discarded","easy","look","science","exists","today","then","interpret","passages","match","those","findings","people","similar","things","with","sayings","nostradamus","time","anyway","rather","unique","claim","islam","worth","checking"]
["article","kmr4","1718","735827952","cwru","kmr4","cwru","keith","ryan","writes","article","c63aec","cbnewsj","decay","cbnewsj","dean","kaflowitz","writes","thing","trademarked","know","charles","lazarus","dead","alive","careful","because","with","name","like","lazarus","might","rise","again","just","start","lawsuit","trademarked","backwards","believe","think","right","mistake","make","backwards","using","computer","keyboard","gods","know","this","atheism","after","tell","what","start","coming","backwards","when","type","from","become","believer","that","asking","miracles","asked","miracle","real","miracle","like","buchanan","become","closet","drag","queen","well","maybe","that","wouldn","miraculous","think","look","fabulous","feather","sequined","like","farrow","wore","gatsby","dean","kaflowitz"]
["article","1r39kh","horus","mchp","frank","d012s658","uucp","frank","dwyer","writes","specifically","like","know","what","relativism","concludes","when","people","grotesquely","disagree","both","right","them","wrong","sometimes","though","perhaps","rarely","have","pretty","good","idea","them","wrong","never","have","information","make","best","guess","really","must","make","decision","idea","right","moral","judgement","meaningless","implying","that","whether","peace","better","than","meaningless","question","need","discussed","correct","answer","something","else","short","positive","assertion","would","nice","close","misses","point","that","there","third","party","with","their","subjective","viewpoint","asked","make","decision","peace","better","than","unless","view","situation","subjectively","humans","course","humans","view","situations","from","perspective","meaningful","discuss","questions","peace","perry","perry","dsinc","decision","support","matthews","these","opinions","nominal","they","yours"]
["article","1993apr25","165315","1190","monu6","monash","darice","yoyo","monash","fred","rice","writes","deletion","created","night","moon","each","travelling","orbit","with","motion","deletion","well","that","certainly","different","looks","there","translation","found","everything","most","surprised","hear","that","night","move","orbit","thought","about","this","some","translations","refer","only","latter","objects","being","orbit","bucaille","translation","seems","indicate","night","travelling","orbit","perhaps","this","understood","when","looks","from","earth","reference","frame","from","this","reference","frame","night","would","appear","orbit","earth","travelling","from","east","west","this","from","reference","frame","when","earth","still","well","that","belongs","other","group","there","interpretation","found","everything","however","allowing","form","interpretation","reduces","information","text","interprteted","zero","have","checked","quote","think","lines","preceding","those","quoted","above","more","interesting","where","mountains","earth","order","immobilize","earth","where","skies","heavens","referred","well","supported","lines","given","above","after","edition","maybe","this","what","meant","above","just","possibility","that","travels","orbit","without","saying","that","earth","does","sounds","geocentric","will","find","more","about","this","still","geocentric","that","moon","move","earth","immobile","sounds","geocentric","benedikt"]
["article","1r5emjinnmk","caltech","keith","caltech","keith","allan","schneider","writes","kmr4","cwru","keith","ryan","writes","this","just","shows","then","that","painful","execution","considered","cruel","unusual","punishment","this","shows","that","cruel","used","constitution","does","refer","whether","punishment","causes","physical","pain","rather","must","different","meaning","think","although","some","forms","execution","painful","electric","chair","looks","particularly","think","pain","relatively","short","lived","drawing","quartering","other","hand","looks","very","painful","victim","wouldn","right","away","bleed","death","imagine","what","have","integral","over","pain","time","lash","with","noodle","ever","only","with","power","quick","about"]
["sandvik","250493163828","sandvik","kent","apple","sandvik","newton","apple","kent","sandvik","writes","article","markp","735580401","avignon","markp","avignon","mark","pundurs","wrote","take","skews","your","perception","reality","come","down","your","perceptions","unskew","never","taken","read","about","strange","lifes","times","ashbury","heights","culture","something","that","usually","profound","these","trippers","mentioned","that","after","their","first","trip","they","changed","their","view","world","other","words","taking","would","change","their","reference","frames","which","would","indicate","that","deep","changes","rewiring","brain","temporarily","will","indeed","change","frames","this","leads","statement","that","there","solid","reference","frame","trippers","modified","their","relative","view","much","haight","ashbury","crowd","probably","existing","dissatisfactions","with","their","lives","dissatisfactions","ameliorated","mumbo","jumbo","about","realities","only","change","experienced","after","gain","knowledge","that","didn","enjoy","twisted","perception","mark","pundurs","resemblance","between","opinions","those","wolfram","research","purely","coincidental"]
["bobbe","vice","robert","beauchaine","6086","yayg","said","execellent","examples","luther","insane","rantings","deleted","sooooo","surprised","that","they","teach","this","part","ideology","high","schools","today","mccreary","twisto","compaq","were","laughter","there","would"]
["article","1qla0g","fido","livesey","solntze","livesey","writes","article","115565","jaeger","buphy","gregg","jaeger","writes","hope","islamic","bank","something","other","than","bcci","which","ripped","many","small","depositors","among","muslim","community","elsewhere","grow","childish","propagandist","gregg","really","sorry","having","pointed","that","practice","things","aren","quite","wonderful","utopia","folks","seem","claim","them","upsets","have","done","such","thing","bbci","example","islamically","owned","operated","bank","what","will","someone","they","weren","real","islamic","owners","operators","islamic","bank","bank","which","operates","according","rules","islam","regard","banking","this","done","explicitly","bank","this","case","with","bcci","these","naive","depositors","their","life","savings","into","bcci","rather","than","nasty","interest","motivated","western","bank","down","street","this","crap","bcci","motivated","same","motives","other","international","banks","with","perhaps","emphasis","dealing","with","outlaws","intelligence","services","various","governments","please","into","thinking","that","will","work","right","next","time","back","childish","propaganda","again","really","ought","life","rather","than","wasting","bandwith","such","empty","typing","there","thousands","islamic","banks","operating","throughout","world","which","ever","hears","about","want","talk","about","corrupted","banks","talk","about","people","been","robbed","american","banks","gregg"]
["article","1993apr22","130421","113279","zeus","calpoly","dmcaloon","tuba","calpoly","david","mcaloon","writes","remember","einstien","said","imagination","greater","than","knowledge","then","einstein","should","have","lunch","with","tien","castro","street","yesterday","when","they","handed","fortune","cookie","that","said","imagination","knowledge","wings","feet"]
["article","1993apr22","015922","7418","daffy","wisc","mccullou","snake2","wisc","mark","mccullough","writes","article","37501","optima","arizona","sham","arizona","shamim","zvonko","mohamed","writes","bullshit","gulf","massacre","ordnance","used","smart","rest","that","just","regular","dumb","iron","bombs","stuff","have","heard","figures","closer","that","smart","stuff","again","follow","here","that","means","this","smart","arsenal","missed","most","figures","have","seen","place","ratio","close","which","still","higher","than","your","have","source","that","says","that","date","civilian","death","count","excuse","mean","collateral","damage","about","have","never","seen","source","that","claiming","such","figure","please","post","source","reliability","judged","obviously","have","different","sources","bill","moyers","happens","theist","this","atheism","documentary","after","main","source","think","still","have","videotape","others","include","nation","progressive","rest","article","mere","rationalisation","claim","that","sanitation","plants","strategic","legitimate","targets","what","happens","civilians","city","with","sewer","system","what","happens","civilians","when","destroy","water","purification","plants","when","hospitals","handle","resultant","epidemics","because","there","more","electricity","what","exactly","your","sources","have","sure","seen","postol","interviews","media","where","demostrates","pentagon","lied","about","patriot","effectiveness","what","your","source","effectiveness","claim","case","know","this","relevant","atheism","about","move","somewhere","else","shamim","mohamed","uunet","noao","cmcl2","arizona","shamim","shamim","arizona","take","this","cross","garlic","here","mezuzah","jewish","page","koran","muslim","buddhist","your","member","league","programming","freedom","write","uunet"]
["default","followups","newsgroups","short","excerpt","brookfield","mother","filed","complaint","with","elmbrook","school","board","alleging","elementary","school","parent","teacher","organization","show","discrimination","supporting","scouts","gisele","klemp","said","wednesday","sponsorship","scout","troop","scout","pack","that","meet","hillside","elementary","school","surbarban","milwaukee","discrimination","because","scouts","homosexuals","president","gail","pludeman","disputed","charges","discrimination","said","believes","scouts","beneficial","carl","kadie","represent","organization","this","just","kadie","uiuc"]
["article","1993apr21","144114","8057","willdb","william","david","battles","says","article","1993apr16","223250","15242","ncsu","aiken","news","ncsu","wayne","aiken","writes","jsn104","psuvm","wrote","blashephemers","will","hell","believing","prepared","your","eternal","damnation","someone","leave","their","terminal","unattended","again","probably","jesus","freak","post","probably","jsn104","psuvm","penn","state","just","loaded","hilt","with","bible","bangers","there","vomit","reason","left","they","even","group","stop","playing","rock","music","dining","halls","year","they","deemed","satanic","kampus","krusade","khrist","people","damn","place","most","part","except","liberal","arts","departments","they","safe","havens","sounds","like","were","going","different","penn","state","something","kampus","krusade","khrist","very","vocal","here","they","really","have","little","power","anything","done","sometimes","seems","like","there","them","because","they","generally","more","vocal","than","their","opposition","there","really","aren","that","many","krusaders","liberals","tend","keep","themselves","they","help","since","they","really","want","allowed","about","their","lives","they","want","hear","from","about","most","them","bible","bangers","stand","because","they","want","everyone","forced","live","according","bible","banger","rules","krusaders","certainly","this","place","rather","average","people","here","much","like","rest","just","like","everywhere","else","some","factions","louder","than","others","andrew"]
["dean","kaflowitz","decay","cbnewsj","wrote","think","letting","atheist","mythology","great","start","realize","immediately","that","interested","discussion","going","thump","your","babble","would","much","prefer","answer","from","healy","seems","have","reasonable","reasoned","approach","things","aren","creationist","made","silly","statements","about","evolution","some","time","then","must","talking","christian","mythology","hoping","discuss","something","with","reasonable","logical","person","seem","have","your","side","repetition","same","boring","mythology","seen","thousand","times","before","deleting","rest","your","remarks","unless","spot","something","that","approaches","answer","because","they","merely","repetition","some","uninteresting","doctrine","other","contain","thought","have","congratulate","though","bill","wouldn","know","logical","argument","balls","such","persistent","lack","function","face","repeated","attempts","assist","learning","which","have","seen","this","forum","others","past","speaks","talent","that","goes","well","beyond","meager","abilities","just","seem","have","that","capacity","ignoring","outside","influences","dean","kaflowitz","dean","read","your","comments","think","that","merely","characterizing","argument","same","refuting","think","that","hominum","attacks","sufficient","make","point","other","than","disapproval","have","contribution","make","bill"]
["schaertel","nasa","kodak","wrote","back","1000","years","whatever","pretend","someone","says","someday","there","will","moon","remember","still","think","world","flat","this","quite","extraordinary","claim","think","lewis","argued","that","medieval","people","think","world","flat","however","this","argument","goes","both","ways","pretend","someone","telling","plato","that","highly","probable","that","people","really","have","souls","their","minds","their","consciousness","just","something","their","brains","make","their","brains","their","body","actually","ahead","their","mind","even","voluntarly","actions","think","plato","would","have","been","happy","with","this","neither","would","paul","although","paul","ideas","were","quite","different","however","would","read","what","discuss","this","group","just","preach","would","that","there","currently","much","evidence","favour","these","statements","same","applies","theory","natural","selection","other","sacred","cows","christianity","origins","human","nature","believe","spirits","devils","immortal","souls","more","than","gods","fact","argue","existence","until","time","there","really","either","prove","disprove","there","will","time","when","know","truth","hope","believe","right","hope","pray","that","find","your","said","believe","what","want","this","what","assumed","along","maybe","shouldn","have","said","guess","really","believe","there","plant","seeds","either","they","grow","they","might","well","planting","satan","seeds","ever","thought","this","besides","haven","explained","must","believe","blindly","without","guiding","light","least","haven","noticed","think","this","fair","play","part","your","argument","sounds","like","version","pascal","wager","please","read","this","fallacy","discussed","there","they","they","planted","holy","spirit","nurishment","that","helps","them","grow","that","comes","from","failed","help","from","because","wrong","attitude","sorry","think","this","spirit","exists","people","claim
","have","access","just","look","badly","deluded","gifted","petri","petri","pihko","mathematics","truth","pihatie","finou","oulu","physics","rule","90650","oulu","kempmp","game","finland","phoenix","oulu","chemistry","game"]
["okcforum","osrhe","bill","conner","writes","that","still","grasp","obvious","because","your","devious","nature","only","find","fault","with","argument","misrepresenting","since","ignored","entire","substance","substantial","post","nerve","claiming","that","understand","what","being","talked","about","respond","previous","post","shut","fuck","really","annoying","maddi","hausmann","madhaus","netcom","centigram","communications","corp","jose","california","3553","kids","please","this","home","remember","post","professionally"]
["article","1993apr26","070405","3615","doug","wisc","kahraman","hprisc","wisc","gokalp","kahraman","writes","this","respect","since","atheists","dominantly","arrogant","claim","self","control","self","ownership","they","would","make","pharoahs","look","like","very","humble","decent","people","comparison","logic","this","since","myself","others","like","should","also","themselves","going","further","things","self","existent","self","standing","self","living","atheists","tend","claim","self","control","self","ownership","saying","that","theists","claim","have","self","control","think","atheists","dominantly","arrogant","they","claim","some","that","supremacy","over","mankind","this","claim","would","arrogant","atheists","claim","most","atheists","claim","themselves","think","disagreement","with","this","claim","self","ownership","would","supremely","arrogant","john","david","munch","jmunch","hertz","elee","calpoly","heart","change","full","hate","love","people","allowed","base","their","lives","through","their","hearts","anything","happen","dangerous","situation","opinion","bobby","mozumder","describing","problems","with","atheism"]
["article","1993apr27","073723","18577","csis","csiro","csis","csiro","peter","lamb","writes","king","ctron","john","king","writes","again","context","quotes","from","king","deleted","along","with","context","thoughtfully","provided","lamb","john","there","commandments","that","says","something","like","shall","bear","false","witness","doesn","quoting","someone","that","completely","inverts","what","they","were","trying","constitute","bearing","false","witness","doesn","this","cause","internal","conflict","this","because","christian","very","perturbed","creation","science","camp","what","would","characterize","sleazy","tactics","order","debate","there","long","tradition","christian","thought","that","maintains","that","essential","christian","ethic","that","does","justify","means","other","words","something","important","what","intended","accomplish","think","that","using","misquoted","excerpts","from","people","disagree","with","brings","very","much","glory","dave","david","knapp","imager","llnl","1023","statistics","made"]

Showing the first 880 rows.

Step 4. Remove Stopwords

We can remove stopwords with Spark ML's StopWordsRemover().

If no list of stopwords is provided, StopWordsRemover() falls back to a default list, which is shown below.

are,around,as,at,back,be,became,because,become,becomes,becoming,been,before,beforehand,behind,being,below,beside,besides,between,beyond,bill,both,bottom,but,by,call,can,cannot,cant,co,computer,con,could,
couldnt,cry,de,describe,detail,do,done,down,due,during,each,eg,eight,either,eleven,else,elsewhere,empty,enough,etc,even,ever,every,everyone,everything,everywhere,except,few,fifteen,fify,fill,find,fire,first,
five,for,former,formerly,forty,found,four,from,front,full,further,get,give,go,had,has,hasnt,have,he,hence,her,here,hereafter,hereby,herein,hereupon,hers,herself,him,himself,his,how,however,hundred,i,ie,if,
in,inc,indeed,interest,into,is,it,its,itself,keep,last,latter,latterly,least,less,ltd,made,many,may,me,meanwhile,might,mill,mine,more,moreover,most,mostly,move,much,must,my,myself,name,namely,neither,never,
nevertheless,next,nine,no,nobody,none,noone,nor,not,nothing,now,nowhere,of,off,often,on,once,one,only,onto,or,other,others,otherwise,our,ours,ourselves,out,over,own,part,per,perhaps,please,put,rather,re,same,
see,seem,seemed,seeming,seems,serious,several,she,should,show,side,since,sincere,six,sixty,so,some,somehow,someone,something,sometime,sometimes,somewhere,still,such,system,take,ten,than,that,the,their,them,
themselves,then,thence,there,thereafter,thereby,therefore,therein,thereupon,these,they,thick,thin,third,this,those,though,three,through,throughout,thru,thus,to,together,too,top,toward,towards,twelve,twenty,two,
un,under,until,up,upon,us,very,via,was,we,well,were,what,whatever,when,whence,whenever,where,whereafter,whereas,whereby,wherein,whereupon,wherever,whether,which,while,whither,who,whoever,whole,whom,whose,why,will,
with,within,without,would,yet,you,your,yours,yourself,yourselves

You can use getStopWords() to see the list of stopwords that will be used.
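
As a minimal sketch of that call, assuming the Spark ML library is on the classpath (as it is in a Databricks notebook), the default list can be inspected as follows. The names defaultRemover and defaultStopWords are illustrative and do not come from the original notebook.

```scala
import org.apache.spark.ml.feature.StopWordsRemover

// Build a remover without calling setStopWords, so the built-in default list applies.
val defaultRemover = new StopWordsRemover()

// getStopWords returns the list the remover will actually use.
val defaultStopWords: Array[String] = defaultRemover.getStopWords

println(s"Number of default stopwords: ${defaultStopWords.length}")
println(defaultStopWords.take(10).mkString(", "))
```

The exact contents and length of the default list depend on the Spark version, so no expected output is given here.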

In this example, we will specify a list of stopwords for the StopWordsRemover() to use. We do this so that we can add on to the list later on.
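
The idea can be sketched as follows, under stated assumptions: the base list here is a tiny stand-in for the file-based list loaded later, the extra words ("article", "writes") are only illustrative picks from the token output above, and the column names "tokens" and "filtered" are hypothetical.

```scala
import org.apache.spark.ml.feature.StopWordsRemover

// Stand-in for the stopwords read from file later in this step (assumption).
val baseStopwords: Array[String] = Array("a", "about", "above", "the")

// Illustrative corpus-specific additions; these tokens recur in the newsgroup posts above.
val extraStopwords: Array[String] = Array("article", "writes")

// Merge both lists and drop duplicates.
val customStopwords: Array[String] = (baseStopwords ++ extraStopwords).distinct

// Configure the remover with the explicit list instead of the default one.
// "tokens" and "filtered" are hypothetical column names.
val customRemover = new StopWordsRemover()
  .setStopWords(customStopwords)
  .setInputCol("tokens")
  .setOutputCol("filtered")
```

Keeping the list in our own array means further words can be appended and the remover reconfigured without touching the defaults.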

display(dbutils.fs.ls("dbfs:/tmp/stopwords")) // check if the file already exists from earlier wget and dbfs-load
dbfs:/tmp/stopwords    stopwords    2237

If the file dbfs:/tmp/stopwords already exists, skip the next two cells; otherwise uncomment and evaluate them to download the file and load it into DBFS.
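
The existence check can also be done programmatically rather than by eye. The helper below is a hypothetical sketch, not part of the original notebook: it treats a path as present when a listing call succeeds and as absent when that call throws. The listing function is passed in, so the helper itself does not depend on dbutils.

```scala
import scala.util.Try

// Returns true when listing the path succeeds and false when the listing call throws.
// `ls` is a parameter so the helper is independent of any particular file-system API.
def pathExists(ls: String => Any, path: String): Boolean =
  Try(ls(path)).isSuccess

// In a Databricks notebook (assumption: dbutils is in scope) it could be used as:
// val haveStopwords = pathExists(p => dbutils.fs.ls(p), "dbfs:/tmp/stopwords")
```

If the result is false, the download and copy cells below need to be run once.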

//%sh wget http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words -O /tmp/stopwords # uncomment '//' at the beginning and repeat only if needed again
//%fs cp file:/tmp/stopwords dbfs:/tmp/stopwords # uncomment '//' at the beginning and repeat only if needed again
// List of stopwords
val stopwords = sc.textFile("/tmp/stopwords").collect()
stopwords: Array[String] = Array(a, about, above, across, after, afterwards, again, against, all, almost, alone, along, already, also, although, always, am, among, amongst, amoungst, amount, an, and, another, any, anyhow, anyone, anything, anyway, anywhere, are, around, as, at, back, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, below, beside, besides, between, beyond, bill, both, bottom, but, by, call, can, cannot, cant, co, computer, con, could, couldnt, cry, de, describe, detail, do, done, down, due, during, each, eg, eight, either, eleven, else, elsewhere, empty, enough, etc, even, ever, every, everyone, everything, everywhere, except, few, fifteen, fify, fill, find, fire, first, five, for, former, formerly, forty, found, four, from, front, full, further, get, give, go, had, has, hasnt, have, he, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, him, himself, his, how, however, hundred, i, ie, if, in, inc, indeed, interest, into, is, it, its, itself, keep, last, latter, latterly, least, less, ltd, made, many, may, me, meanwhile, might, mill, mine, more, moreover, most, mostly, move, much, must, my, myself, name, namely, neither, never, nevertheless, next, nine, no, nobody, none, noone, nor, not, nothing, now, nowhere, of, off, often, on, once, one, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, part, per, perhaps, please, put, rather, re, same, see, seem, seemed, seeming, seems, serious, several, she, should, show, side, since, sincere, six, sixty, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, system, take, ten, than, that, the, their, them, themselves, then, thence, there, thereafter, thereby, therefore, therein, thereupon, these, they, thick, thin, third, this, those, though, three, through, throughout, thru, thus, to, together, too, top, toward, towards, twelve, twenty, two, un, under, until, up, upon, us, very, via, was, we, 
well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves)
stopwords.length // find the number of stopwords in the scala Array[String]
res17: Int = 319
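
Because the stopwords are held in an ordinary Scala array, adding to the list later is just array concatenation. The following is a minimal plain-Scala sketch of that idea (no Spark needed). The names `baseStopwords`, `extraStopwords` and `allStopwords`, the shortened base list, and the sample tokens are all hypothetical and only serve as illustration.

```scala
// Hypothetical base list: a tiny stand-in for the full stopword file.
val baseStopwords: Array[String] = Array("a", "about", "the", "you")

// Hypothetical corpus-specific additions (assumed here to be newsgroup boilerplate words).
val extraStopwords: Array[String] = Array("writes", "article")

// Array concatenation gives the extended list; distinct guards against repeats.
val allStopwords: Array[String] = (baseStopwords ++ extraStopwords).distinct

// Filtering tokens against a Set mimics what stopword removal does to one document.
val stopSet: Set[String] = allStopwords.toSet
val tokens: Seq[String] = Seq("john", "writes", "about", "the", "shuttle", "launch")
val kept: Seq[String] = tokens.filterNot(stopSet.contains)

println(kept.mkString(", ")) // john, shuttle, launch
```

An extended array of this kind could then be passed to setStopWords in place of the original list.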

Finally, we can just remove the stopwords using the StopWordsRemover as follows:

import org.apache.spark.ml.feature.StopWordsRemover

// Set params for StopWordsRemover
val remover = new StopWordsRemover()
.setStopWords(stopwords) // This parameter is optional
.setInputCol("tokens")
.setOutputCol("filtered")

// Create new DF with Stopwords removed
val filtered_df = remover.transform(tokenized_df)
import org.apache.spark.ml.feature.StopWordsRemover remover: org.apache.spark.ml.feature.StopWordsRemover = stopWords_f8c580d1e0ea filtered_df: org.apache.spark.sql.DataFrame = [corpus: string, id: bigint ... 2 more fields]

Step 5. Vector of Token Counts

LDA takes in a vector of token counts as input. We can use the CountVectorizer() to easily convert our text documents into vectors of token counts.

For each document, the CountVectorizer will return a sparse vector of the form (VocabSize, Array(Indexed Tokens), Array(Token Frequency)).

Two handy parameters to note:

  • setMinDF: Specifies the minimum number of different documents a term must appear in to be included in the vocabulary.
  • setMinTF: Specifies the minimum number of times a term has to appear in a document to be counted in that document's vector. It is applied when transforming documents and does not change the vocabulary.
import org.apache.spark.ml.feature.CountVectorizer

// Set params for CountVectorizer
val vectorizer = new CountVectorizer()
.setInputCol("filtered")
.setOutputCol("features")
.setVocabSize(10000) 
.setMinDF(5) // the minimum number of different documents a term must appear in to be included in the vocabulary.
.fit(filtered_df)
import org.apache.spark.ml.feature.CountVectorizer vectorizer: org.apache.spark.ml.feature.CountVectorizerModel = cntVec_7754118c6a03
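
To make the difference between the two parameters concrete, here is a small plain-Scala sketch of the filtering idea on three toy documents. It is not Spark's implementation: the data, the helper `counts`, and the alphabetical vocabulary ordering are assumptions made for illustration.

```scala
// Three toy documents, already tokenized and stopword-filtered (hypothetical data).
val docs: Seq[Seq[String]] = Seq(
  Seq("disk", "drive", "disk", "scsi"),
  Seq("disk", "drive", "launch"),
  Seq("launch", "orbit", "disk")
)

val minDF = 2 // a term must occur in at least 2 documents to enter the vocabulary
val minTF = 2 // within one document, counts below 2 are dropped from its vector

// Document frequency: number of distinct documents containing each term.
val docFreq: Map[String, Int] =
  docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => (t, occ.size) }

// Vocabulary after the minDF filter, sorted alphabetically for a deterministic order.
val toyVocab: Seq[String] = docFreq.filter(_._2 >= minDF).keys.toSeq.sorted

// Per-document term counts restricted to the vocabulary.
def counts(doc: Seq[String]): Map[String, Int] =
  doc.filter(toyVocab.contains).groupBy(identity).map { case (t, occ) => (t, occ.size) }

val withoutMinTF: Seq[Map[String, Int]] = docs.map(counts)
val withMinTF: Seq[Map[String, Int]] = docs.map(d => counts(d).filter(_._2 >= minTF))

println(toyVocab.mkString(", ")) // disk, drive, launch
println(withMinTF.head)          // Map(disk -> 2)
```

Here "scsi" and "orbit" never enter the vocabulary because each occurs in only one document, while minTF only thins out the counts inside individual documents.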
// Create vector of token counts
val countVectors = vectorizer.transform(filtered_df).select("id", "features")
countVectors: org.apache.spark.sql.DataFrame = [id: bigint, features: vector]
// see the first countVectors
countVectors.take(2)
res20: Array[org.apache.spark.sql.Row] = Array([0,(6139,[0,1,147,229,317,499,500,534,571,596,775,843,850,853,959,1166,1662,1924,2063,2481,2806,3044,3437,3843,4720,5403,5712],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])], [1,(6139,[0,2,43,135,188,243,720,778,923,949,1389,2194,2320,2725,6051,6065],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0])])
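
Each row above is a sparse vector written as (size, indices, values). The plain-Scala sketch below shows how such a triple can be read back into (term, count) pairs and into a dense vector. The five-word vocabulary `toyVocab` and the index and value arrays are hypothetical stand-ins, not values taken from the fitted model.

```scala
// Hypothetical miniature vocabulary; index i of a count vector refers to toyVocab(i).
val toyVocab: Array[String] = Array("writes", "article", "people", "just", "know")

// A sparse count vector written as (size, indices, values).
val size = 5
val indices: Array[Int] = Array(0, 2, 4)
val values: Array[Double] = Array(1.0, 2.0, 1.0)

// Pair each index with its count and look the index up in the vocabulary.
val termCounts: Seq[(String, Double)] =
  indices.zip(values).map { case (i, c) => (toyVocab(i), c) }.toSeq

// Expand to the equivalent dense vector: zero everywhere except the listed indices.
val dense: Array[Double] = Array.fill(size)(0.0)
indices.zip(values).foreach { case (i, c) => dense(i) = c }

termCounts.foreach { case (term, c) => println(s"$term\t$c") }
println(dense.mkString("[", ", ", "]")) // [1.0, 0.0, 2.0, 0.0, 1.0]
```

The sparse form only stores the non-zero entries, which matters when the vocabulary has thousands of terms and each document uses a few dozen of them.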

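To use the LDA algorithm in the RDD-based MLlib library, we have to convert the DataFrame back into an RDD of (id, vector) pairs. We first map each Row to a typed (id, vector) pair; the conversion to an RDD of MLlib vectors is done just before running LDA.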

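// Convert the DataFrame into a Dataset of (id, countVector) pairs (the .rdd conversion happens later)
import org.apache.spark.sql.Row // needed for the Row pattern match below
import org.apache.spark.ml.linalg.Vector

val lda_countVector = countVectors.map { case Row(id: Long, countVector: Vector) => (id, countVector) }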
import org.apache.spark.ml.linalg.Vector lda_countVector: org.apache.spark.sql.Dataset[(Long, org.apache.spark.ml.linalg.Vector)] = [_1: bigint, _2: vector]
// format: Array(id, (VocabSize, Array(indexedTokens), Array(Token Frequency)))
lda_countVector.take(1)
res22: Array[(Long, org.apache.spark.ml.linalg.Vector)] = Array((0,(6139,[0,1,147,229,317,499,500,534,571,596,775,843,850,853,959,1166,1662,1924,2063,2481,2806,3044,3437,3843,4720,5403,5712],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])))

Let's get an overview of LDA in Spark's MLlib

Create LDA model with Online Variational Bayes

We will now set the parameters for LDA. We will use the OnlineLDAOptimizer() here, which implements Online Variational Bayes.

Choosing the number of topics for your LDA model requires a bit of domain knowledge. As we know that there are 20 unique newsgroups in our dataset, we will set numTopics to be 20.

val numTopics = 20
numTopics: Int = 20

We will set the parameters needed to build our LDA model. We can also call setMiniBatchFraction on the OnlineLDAOptimizer, which sets the fraction of the corpus that is sampled and used at each iteration. In this example, we will set this to 0.8.

import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}

// Set LDA params
val lda = new LDA()
.setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.8))
.setK(numTopics)
.setMaxIterations(3)
.setDocConcentration(-1) // use default values
.setTopicConcentration(-1) // use default values
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer} lda: org.apache.spark.mllib.clustering.LDA = org.apache.spark.mllib.clustering.LDA@40380725
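
To get a feel for what a mini-batch fraction of 0.8 means, the plain-Scala sketch below draws one random subset of a toy corpus per iteration. It assumes simple independent (Bernoulli) sampling of documents, which is an illustrative assumption and not a description of Spark's internals. The corpus `docIds`, the seed, and the helper `sampleMiniBatch` are hypothetical.

```scala
import scala.util.Random

// Hypothetical corpus of 1000 document ids.
val docIds: Seq[Long] = (0L until 1000L)
val miniBatchFraction = 0.8
val rng = new Random(42L) // fixed seed so the sketch is repeatable

// One iteration: keep each document independently with probability miniBatchFraction.
def sampleMiniBatch(ids: Seq[Long]): Seq[Long] =
  ids.filter(_ => rng.nextDouble() < miniBatchFraction)

// Three iterations, matching the maximum number of iterations set above.
val batches: Seq[Seq[Long]] = Seq.fill(3)(sampleMiniBatch(docIds))

batches.zipWithIndex.foreach { case (batch, i) =>
  println(s"iteration $i: ${batch.size} of ${docIds.size} documents")
}
```

Each iteration sees roughly 80% of the documents, and a different subset each time, which is one reason results vary between runs.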

Create the LDA model with Online Variational Bayes.

// convert ML vectors into MLlib vectors
val lda_countVector_mllib = lda_countVector.map { case (id, vector) => (id, org.apache.spark.mllib.linalg.Vectors.fromML(vector)) }.rdd

val ldaModel = lda.run(lda_countVector_mllib)
lda_countVector_mllib: org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.linalg.Vector)] = MapPartitionsRDD[7888] at rdd at command-1805207615647650:2 ldaModel: org.apache.spark.mllib.clustering.LDAModel = org.apache.spark.mllib.clustering.LocalLDAModel@19cac4a1

Watch Online Learning for Latent Dirichlet Allocation, Matt Hoffman's talk at NIPS 2010.

Also see Matt Hoffman's paper on online variational Bayes, linked from the talk's page, for more details: http://videolectures.net/site/normal_dl/tag=83534/nips2010_1291.pdf

Note that running LDA with the OnlineLDAOptimizer returns a LocalLDAModel, which stores the inferred topics of the corpus.

Review Topics

We can now review the results of our LDA model. We will print out all 20 topics with their corresponding term probabilities.

Note that you will get slightly different results every time you run an LDA model since LDA includes some randomization.

Let us review results of LDA model with Online Variational Bayes, step by step.

val topicIndices = ldaModel.describeTopics(maxTermsPerTopic = 5)
topicIndices: Array[(Array[Int], Array[Double])] = Array((Array(2, 137, 192, 6, 0),Array(0.001360193122510551, 9.997969337219624E-4, 9.151267747290565E-4, 8.605244610467347E-4, 7.415440144947349E-4)), (Array(0, 1, 125, 2, 92),Array(4.0252690024786584E-4, 3.5896651722410806E-4, 3.461004819482739E-4, 3.275547808421133E-4, 3.117746311459888E-4)), (Array(3, 0, 1, 2, 1786),Array(6.40547098662335E-4, 5.994060831654556E-4, 5.714062311271855E-4, 5.535576359640754E-4, 5.346198886998113E-4)), (Array(259, 451, 235, 719, 20),Array(9.312612236661755E-4, 8.884529725963027E-4, 8.520706418120034E-4, 7.618481436503527E-4, 7.521813283215468E-4)), (Array(0, 1, 3, 4, 6),Array(6.880376441768029E-4, 5.357808751045427E-4, 5.036417628014656E-4, 5.022527362963184E-4, 4.517682900758292E-4)), (Array(102, 0, 1, 5, 3),Array(0.001259288300019392, 0.0012487164774715495, 0.0011493174646660682, 0.0010179666203659974, 0.0010155405867303218)), (Array(132, 34, 4, 5, 33),Array(7.067730951858402E-4, 6.545370665265432E-4, 6.227317831261301E-4, 5.40739770950819E-4, 5.185243035811351E-4)), (Array(2, 898, 38, 459, 41),Array(7.043415748580601E-4, 6.351533412493357E-4, 5.995760219487369E-4, 5.816122948936323E-4, 5.49899856481796E-4)), (Array(184, 771, 77, 19, 1446),Array(8.193252643514662E-4, 6.915793935933891E-4, 6.44633710573037E-4, 6.242749432300566E-4, 4.923742115599386E-4)), (Array(0, 357, 5, 339, 795),Array(9.103299472424208E-4, 6.36360604343079E-4, 6.019624843300716E-4, 5.819178258390477E-4, 5.111887220315569E-4)), (Array(0, 4, 5, 1, 2),Array(0.004344099284108751, 0.004189668657244558, 0.003952619908656169, 0.003743448415527585, 0.0036838944764180427)), (Array(457, 4, 429, 2, 7),Array(4.1181535960353586E-4, 3.913415560443006E-4, 3.412161093997264E-4, 3.1771882364713035E-4, 2.997685637898774E-4)), (Array(206, 279, 647, 11, 227),Array(7.898424704380656E-4, 6.81204339923066E-4, 4.403151884063283E-4, 4.398038745044428E-4, 4.288031551071563E-4)), (Array(0, 1, 3, 429, 463),Array(5.827465353614684E-4, 
4.6267807461009117E-4, 4.279308831997346E-4, 4.093463256726439E-4, 3.881473704697335E-4)), (Array(709, 1090, 0, 2, 3),Array(0.0011042737496341044, 9.4903184167095E-4, 5.972014159984239E-4, 5.431187773747703E-4, 5.091620939840125E-4)), (Array(0, 491, 832, 1, 2),Array(9.46311173130666E-4, 8.118917102231931E-4, 7.642966110832318E-4, 7.47748687062401E-4, 6.340033856937127E-4)), (Array(331, 17, 16, 339, 531),Array(5.488534056507825E-4, 3.7143402261938013E-4, 3.436804529912375E-4, 3.3658232741781057E-4, 3.2119290341321315E-4)), (Array(0, 1, 300, 8, 70),Array(0.0026942056027941526, 0.0024101385819321947, 0.0016260574771431863, 9.885349181624383E-4, 8.838203808326238E-4)), (Array(498, 2, 733, 6, 31),Array(6.452057396033239E-4, 5.459150677179816E-4, 3.776044615466835E-4, 3.425151589658119E-4, 3.284099266387307E-4)), (Array(19, 21, 0, 1, 27),Array(0.003720456303782953, 0.0023308228780012084, 0.0015938703779361924, 0.0015500697683953292, 0.001530021217797736)))
val vocabList = vectorizer.vocabulary
vocabList: Array[String] = Array(writes, article, people, just, know, like, think, does, time, good, make, used, windows, want, work, right, problem, need, really, image, said, data, going, information, better, believe, using, software, years, year, mail, sure, point, thanks, drive, program, available, space, power, file, help, government, things, question, doesn, number, case, world, look, read, line, version, come, thing, different, long, jpeg, best, fact, university, probably, real, didn, course, state, true, files, high, possible, actually, 1993, list, game, little, news, group, david, send, tell, wrong, graphics, based, support, able, place, free, called, subject, john, post, reason, color, great, second, card, having, public, email, info, following, start, hard, science, says, example, code, means, evidence, person, maybe, note, general, president, heard, mean, quite, problems, source, systems, life, price, standard, order, window, access, jesus, claim, paul, getting, looking, control, trying, disk, seen, simply, times, book, team, local, play, chip, encryption, idea, truth, opinions, issue, given, research, church, wrote, images, large, display, makes, remember, thought, doing, national, format, away, nasa, human, home, change, saying, small, mark, interested, current, today, area, internet, original, word, left, agree, memory, machine, works, microsoft, instead, working, hardware, kind, request, higher, sort, programs, questions, money, entry, later, israel, mike, hand, guess, pretty, include, netcom, address, matter, cause, technology, uiuc, video, speed, wire, type, days, server, usually, view, april, open, package, earth, christian, told, stuff, unless, similar, important, major, size, house, known, provide, faith, michael, rights, ground, phone, body, including, center, health, american, apple, feel, user, cost, text, lines, answer, bible, care, copy, wouldn, understand, check, anybody, security, mind, live, started, certainly, mouse, running, message, 
women, level, network, study, clinton, making, position, company, came, board, screen, groups, common, white, talking, single, black, special, quality, test, wiring, christians, monitor, likely, effect, nice, light, medical, members, uucp, posted, certain, hope, sources, cars, write, clear, difference, canada, fine, hear, launch, press, police, love, history, couple, build, situation, books, words, particular, jewish, specific, sense, anti, model, religion, stop, posting, unix, talk, private, discussion, school, contact, cable, frank, turkish, keys, built, legal, sound, consider, features, service, taking, simple, reference, argument, tools, comes, short, children, date, night, clipper, applications, jews, application, comments, device, scsi, force, process, theory, doubt, tried, objective, usenet, steve, early, self, experience, expect, games, uses, tape, needed, manager, killed, interesting, value, station, turn, easy, death, exactly, response, needs, ones, amiga, correct, according, wanted, shuttle, considered, language, reading, james, states, drug, strong, goes, koresh, term, insurance, personal, taken, result, future, form, opinion, past, sorry, mentioned, rules, especially, religious, hell, drivers, written, guns, various, author, country, design, happy, went, society, plus, gets, latest, longer, haven, asked, results, analysis, previous, cases, york, laws, main, section, parts, advance, aren, christ, weapons, required, mode, input, looks, week, accept, community, washington, option, series, circuit, robert, numbers, disease, head, fast, israeli, range, exist, venus, andrew, period, offer, macintosh, driver, office, moral, allow, organization, toronto, involved, clock, players, runs, values, department, half, months, choice, knows, picture, colors, brian, sell, wasn, hockey, object, took, includes, individual, cards, federal, dave, armenians, takes, currently, suggest, protect, follow, americans, candida, policy, directly, total, title, statement, present, 
devices, happened, equipment, assume, close, food, purpose, recently, scientific, christianity, require, reasons, deal, users, media, provides, happen, couldn, goal, bike, save, george, wants, city, shall, dead, lost, action, speak, road, condition, complete, court, uunet, easily, terms, batf, engineering, league, details, california, mission, voice, useful, baseball, lead, obviously, completely, algorithm, water, disclaimer, output, responsible, administration, ways, international, compatible, sent, clearly, rest, pass, hours, appreciated, freedom, digital, kill, issues, business, coming, operating, average, project, deleted, context, processing, companies, story, figure, error, fans, newsgroup, appropriate, events, leave, port, berkeley, carry, season, face, trade, convert, political, page, lower, environment, player, king, points, armenian, basis, final, requires, building, heart, performance, difficult, addition, related, stanford, suppose, site, sale, volume, actual, resolution, field, willing, knowledge, apply, claims, supposed, designed, explain, advice, directory, anonymous, commercial, sounds, worth, orbit, lots, limited, defense, entries, basic, radio, necessary, programming, wonder, suspect, wait, changes, neutral, forget, handle, inside, ability, included, signal, young, turkey, family, reply, enforcement, homosexuality, natural, morality, russian, finally, land, services, shot, greek, month, create, installed, printer, paper, friend, thinking, understanding, population, hold, break, comment, homosexual, normal, interface, eric, formats, names, machines, report, peter, setting, product, communications, comp, percent, escrow, avoid, room, east, supply, types, lives, colorado, secure, miles, rutgers, logic, reasonable, arab, library, cubs, expensive, agencies, cheap, recent, gary, million, soon, developed, peace, cancer, multiple, allowed, event, technical, street, caused, gives, soviet, physics, happens, looked, mention, suggestions, doctor, supported, 
release, obvious, outside, entire, friends, treatment, bitnet, radar, install, chance, mass, folks, table, return, archive, choose, development, print, generally, muslim, jack, meaning, united, wish, smith, trouble, weeks, social, member, electrical, illegal, diet, ideas, exists, areas, concept, requests, straight, child, learn, supports, behavior, morning, asking, appear, provided, pick, studies, possibly, practice, answers, drives, attempt, motif, west, engine, bring, thank, worked, unit, reality, remove, stand, middle, belief, compound, continue, errors, false, modem, henry, trust, bits, existence, changed, decided, near, yeah, safe, facts, loss, contains, extra, guys, arguments, proper, congress, particularly, class, command, drugs, wide, stupid, nature, constitution, institute, frame, armenia, function, manual, attack, fonts, aware, privacy, andy, pages, operations, appears, worse, heat, thread, edge, division, shouldn, knew, effective, wall, distribution, approach, hands, speaking, unfortunately, conference, independent, bought, 1990, turks, modern, civil, ethernet, solution, 1992, serial, added, compression, safety, crime, shows, indiana, virginia, wondering, germany, simms, gave, operation, record, internal, faster, arms, cramer, blood, blue, letter, plastic, spend, allows, hello, utility, rate, appreciate, regular, writing, floppy, abortion, atheism, additional, method, described, base, concerned, stated, surface, kids, played, articles, scott, actions, font, giving, views, switch, tool, decision, playing, step, highly, military, considering, keith, resources, cover, levels, connected, north, capability, places, products, attitude, costs, patients, prevent, controller, fair, rule, buying, late, quote, brought, functions, account, received, creation, watch, majority, cwru, driving, released, authority, committee, chips, quick, forward, student, protection, hate, calls, richard, boston, countries, excellent, poor, market, necessarily, wires, created, shell, 
western, america, valid, turned, apparently, plan, moon, minutes, lord, arabs, properly, fairly, boxes, murder, keyboard, complex, visual, absolutely, sold, arizona, produce, notice, intelligence, acts, greatly, begin, tests, living, electronics)
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
topics: Array[Array[(String, Double)]] = Array(Array((people,0.001360193122510551), (team,9.997969337219624E-4), (israel,9.151267747290565E-4), (think,8.605244610467347E-4), (writes,7.415440144947349E-4)), Array((writes,4.0252690024786584E-4), (article,3.5896651722410806E-4), (jesus,3.461004819482739E-4), (people,3.275547808421133E-4), (great,3.117746311459888E-4)), Array((just,6.40547098662335E-4), (writes,5.994060831654556E-4), (article,5.714062311271855E-4), (people,5.535576359640754E-4), (votes,5.346198886998113E-4)), Array((women,9.312612236661755E-4), (disease,8.884529725963027E-4), (health,8.520706418120034E-4), (cancer,7.618481436503527E-4), (said,7.521813283215468E-4)), Array((writes,6.880376441768029E-4), (article,5.357808751045427E-4), (just,5.036417628014656E-4), (know,5.022527362963184E-4), (think,4.517682900758292E-4)), Array((science,0.001259288300019392), (writes,0.0012487164774715495), (article,0.0011493174646660682), (like,0.0010179666203659974), (just,0.0010155405867303218)), Array((disk,7.067730951858402E-4), (drive,6.545370665265432E-4), (know,6.227317831261301E-4), (like,5.40739770950819E-4), (thanks,5.185243035811351E-4)), Array((people,7.043415748580601E-4), (abortion,6.351533412493357E-4), (power,5.995760219487369E-4), (period,5.816122948936323E-4), (government,5.49899856481796E-4)), Array((request,8.193252643514662E-4), (requests,6.915793935933891E-4), (send,6.44633710573037E-4), (image,6.242749432300566E-4), (sequence,4.923742115599386E-4)), Array((writes,9.103299472424208E-4), (objective,6.36360604343079E-4), (like,6.019624843300716E-4), (tools,5.819178258390477E-4), (reality,5.111887220315569E-4)), Array((writes,0.004344099284108751), (know,0.004189668657244558), (like,0.003952619908656169), (article,0.003743448415527585), (people,0.0036838944764180427)), Array((venus,4.1181535960353586E-4), (know,3.913415560443006E-4), (york,3.412161093997264E-4), (people,3.1771882364713035E-4), (does,2.997685637898774E-4)), 
Array((wire,7.898424704380656E-4), (wiring,6.81204339923066E-4), (neutral,4.403151884063283E-4), (used,4.398038745044428E-4), (faith,4.288031551071563E-4)), Array((writes,5.827465353614684E-4), (article,4.6267807461009117E-4), (just,4.279308831997346E-4), (york,4.093463256726439E-4), (office,3.881473704697335E-4)), Array((cubs,0.0011042737496341044), (suck,9.4903184167095E-4), (writes,5.972014159984239E-4), (people,5.431187773747703E-4), (just,5.091620939840125E-4)), Array((writes,9.46311173130666E-4), (armenians,8.118917102231931E-4), (armenia,7.642966110832318E-4), (article,7.47748687062401E-4), (people,6.340033856937127E-4)), Array((sound,5.488534056507825E-4), (need,3.7143402261938013E-4), (problem,3.436804529912375E-4), (tools,3.3658232741781057E-4), (lost,3.2119290341321315E-4)), Array((writes,0.0026942056027941526), (article,0.0024101385819321947), (launch,0.0016260574771431863), (time,9.885349181624383E-4), (1993,8.838203808326238E-4)), Array((candida,6.452057396033239E-4), (people,5.459150677179816E-4), (doctor,3.776044615466835E-4), (think,3.425151589658119E-4), (sure,3.284099266387307E-4)), Array((image,0.003720456303782953), (data,0.0023308228780012084), (writes,0.0015938703779361924), (article,0.0015500697683953292), (software,0.001530021217797736)))

Feel free to take things apart to understand!

topicIndices(0)
res23: (Array[Int], Array[Double]) = (Array(2, 137, 192, 6, 0),Array(0.001360193122510551, 9.997969337219624E-4, 9.151267747290565E-4, 8.605244610467347E-4, 7.415440144947349E-4))
topicIndices(0)._1
res24: Array[Int] = Array(2, 137, 192, 6, 0)
topicIndices(0)._1(0)
res25: Int = 2
vocabList(topicIndices(0)._1(0))
res26: String = people

Review Results of LDA model with Online Variational Bayes: doing all four earlier steps at once.

val topicIndices = ldaModel.describeTopics(maxTermsPerTopic = 5)
val vocabList = vectorizer.vocabulary
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
println(s"$numTopics topics:")
topics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i")
  topic.foreach { case (term, weight) => println(s"$term\t$weight") }
  println(s"==========")
}
20 topics: TOPIC 0 people 0.001360193122510551 team 9.997969337219624E-4 israel 9.151267747290565E-4 think 8.605244610467347E-4 writes 7.415440144947349E-4 ========== TOPIC 1 writes 4.0252690024786584E-4 article 3.5896651722410806E-4 jesus 3.461004819482739E-4 people 3.275547808421133E-4 great 3.117746311459888E-4 ========== TOPIC 2 just 6.40547098662335E-4 writes 5.994060831654556E-4 article 5.714062311271855E-4 people 5.535576359640754E-4 votes 5.346198886998113E-4 ========== TOPIC 3 women 9.312612236661755E-4 disease 8.884529725963027E-4 health 8.520706418120034E-4 cancer 7.618481436503527E-4 said 7.521813283215468E-4 ========== TOPIC 4 writes 6.880376441768029E-4 article 5.357808751045427E-4 just 5.036417628014656E-4 know 5.022527362963184E-4 think 4.517682900758292E-4 ========== TOPIC 5 science 0.001259288300019392 writes 0.0012487164774715495 article 0.0011493174646660682 like 0.0010179666203659974 just 0.0010155405867303218 ========== TOPIC 6 disk 7.067730951858402E-4 drive 6.545370665265432E-4 know 6.227317831261301E-4 like 5.40739770950819E-4 thanks 5.185243035811351E-4 ========== TOPIC 7 people 7.043415748580601E-4 abortion 6.351533412493357E-4 power 5.995760219487369E-4 period 5.816122948936323E-4 government 5.49899856481796E-4 ========== TOPIC 8 request 8.193252643514662E-4 requests 6.915793935933891E-4 send 6.44633710573037E-4 image 6.242749432300566E-4 sequence 4.923742115599386E-4 ========== TOPIC 9 writes 9.103299472424208E-4 objective 6.36360604343079E-4 like 6.019624843300716E-4 tools 5.819178258390477E-4 reality 5.111887220315569E-4 ========== TOPIC 10 writes 0.004344099284108751 know 0.004189668657244558 like 0.003952619908656169 article 0.003743448415527585 people 0.0036838944764180427 ========== TOPIC 11 venus 4.1181535960353586E-4 know 3.913415560443006E-4 york 3.412161093997264E-4 people 3.1771882364713035E-4 does 2.997685637898774E-4 ========== TOPIC 12 wire 7.898424704380656E-4 wiring 6.81204339923066E-4 neutral 4.403151884063283E-4 used 
4.398038745044428E-4 faith 4.288031551071563E-4 ========== TOPIC 13 writes 5.827465353614684E-4 article 4.6267807461009117E-4 just 4.279308831997346E-4 york 4.093463256726439E-4 office 3.881473704697335E-4 ========== TOPIC 14 cubs 0.0011042737496341044 suck 9.4903184167095E-4 writes 5.972014159984239E-4 people 5.431187773747703E-4 just 5.091620939840125E-4 ========== TOPIC 15 writes 9.46311173130666E-4 armenians 8.118917102231931E-4 armenia 7.642966110832318E-4 article 7.47748687062401E-4 people 6.340033856937127E-4 ========== TOPIC 16 sound 5.488534056507825E-4 need 3.7143402261938013E-4 problem 3.436804529912375E-4 tools 3.3658232741781057E-4 lost 3.2119290341321315E-4 ========== TOPIC 17 writes 0.0026942056027941526 article 0.0024101385819321947 launch 0.0016260574771431863 time 9.885349181624383E-4 1993 8.838203808326238E-4 ========== TOPIC 18 candida 6.452057396033239E-4 people 5.459150677179816E-4 doctor 3.776044615466835E-4 think 3.425151589658119E-4 sure 3.284099266387307E-4 ========== TOPIC 19 image 0.003720456303782953 data 0.0023308228780012084 writes 0.0015938703779361924 article 0.0015500697683953292 software 0.001530021217797736 ========== topicIndices: Array[(Array[Int], Array[Double])] = Array((Array(2, 137, 192, 6, 0),Array(0.001360193122510551, 9.997969337219624E-4, 9.151267747290565E-4, 8.605244610467347E-4, 7.415440144947349E-4)), (Array(0, 1, 125, 2, 92),Array(4.0252690024786584E-4, 3.5896651722410806E-4, 3.461004819482739E-4, 3.275547808421133E-4, 3.117746311459888E-4)), (Array(3, 0, 1, 2, 1786),Array(6.40547098662335E-4, 5.994060831654556E-4, 5.714062311271855E-4, 5.535576359640754E-4, 5.346198886998113E-4)), (Array(259, 451, 235, 719, 20),Array(9.312612236661755E-4, 8.884529725963027E-4, 8.520706418120034E-4, 7.618481436503527E-4, 7.521813283215468E-4)), (Array(0, 1, 3, 4, 6),Array(6.880376441768029E-4, 5.357808751045427E-4, 5.036417628014656E-4, 5.022527362963184E-4, 4.517682900758292E-4)), (Array(102, 0, 1, 5, 
3),Array(0.001259288300019392, 0.0012487164774715495, 0.0011493174646660682, 0.0010179666203659974, 0.0010155405867303218)), (Array(132, 34, 4, 5, 33),Array(7.067730951858402E-4, 6.545370665265432E-4, 6.227317831261301E-4, 5.40739770950819E-4, 5.185243035811351E-4)), (Array(2, 898, 38, 459, 41),Array(7.043415748580601E-4, 6.351533412493357E-4, 5.995760219487369E-4, 5.816122948936323E-4, 5.49899856481796E-4)), (Array(184, 771, 77, 19, 1446),Array(8.193252643514662E-4, 6.915793935933891E-4, 6.44633710573037E-4, 6.242749432300566E-4, 4.923742115599386E-4)), (Array(0, 357, 5, 339, 795),Array(9.103299472424208E-4, 6.36360604343079E-4, 6.019624843300716E-4, 5.819178258390477E-4, 5.111887220315569E-4)), (Array(0, 4, 5, 1, 2),Array(0.004344099284108751, 0.004189668657244558, 0.003952619908656169, 0.003743448415527585, 0.0036838944764180427)), (Array(457, 4, 429, 2, 7),Array(4.1181535960353586E-4, 3.913415560443006E-4, 3.412161093997264E-4, 3.1771882364713035E-4, 2.997685637898774E-4)), (Array(206, 279, 647, 11, 227),Array(7.898424704380656E-4, 6.81204339923066E-4, 4.403151884063283E-4, 4.398038745044428E-4, 4.288031551071563E-4)), (Array(0, 1, 3, 429, 463),Array(5.827465353614684E-4, 4.6267807461009117E-4, 4.279308831997346E-4, 4.093463256726439E-4, 3.881473704697335E-4)), (Array(709, 1090, 0, 2, 3),Array(0.0011042737496341044, 9.4903184167095E-4, 5.972014159984239E-4, 5.431187773747703E-4, 5.091620939840125E-4)), (Array(0, 491, 832, 1, 2),Array(9.46311173130666E-4, 8.118917102231931E-4, 7.642966110832318E-4, 7.47748687062401E-4, 6.340033856937127E-4)), (Array(331, 17, 16, 339, 531),Array(5.488534056507825E-4, 3.7143402261938013E-4, 3.436804529912375E-4, 3.3658232741781057E-4, 3.2119290341321315E-4)), (Array(0, 1, 300, 8, 70),Array(0.0026942056027941526, 0.0024101385819321947, 0.0016260574771431863, 9.885349181624383E-4, 8.838203808326238E-4)), (Array(498, 2, 733, 6, 31),Array(6.452057396033239E-4, 5.459150677179816E-4, 3.776044615466835E-4, 3.425151589658119E-4, 
3.284099266387307E-4)), (Array(19, 21, 0, 1, 27),Array(0.003720456303782953, 0.0023308228780012084, 0.0015938703779361924, 0.0015500697683953292, 0.001530021217797736))) vocabList: Array[String] = Array(writes, article, people, just, know, like, think, does, time, good, make, used, windows, want, work, right, problem, need, really, image, said, data, going, information, better, believe, using, software, years, year, mail, sure, point, thanks, drive, program, available, space, power, file, help, government, things, question, doesn, number, case, world, look, read, line, version, come, thing, different, long, jpeg, best, fact, university, probably, real, didn, course, state, true, files, high, possible, actually, 1993, list, game, little, news, group, david, send, tell, wrong, graphics, based, support, able, place, free, called, subject, john, post, reason, color, great, second, card, having, public, email, info, following, start, hard, science, says, example, code, means, evidence, person, maybe, note, general, president, heard, mean, quite, problems, source, systems, life, price, standard, order, window, access, jesus, claim, paul, getting, looking, control, trying, disk, seen, simply, times, book, team, local, play, chip, encryption, idea, truth, opinions, issue, given, research, church, wrote, images, large, display, makes, remember, thought, doing, national, format, away, nasa, human, home, change, saying, small, mark, interested, current, today, area, internet, original, word, left, agree, memory, machine, works, microsoft, instead, working, hardware, kind, request, higher, sort, programs, questions, money, entry, later, israel, mike, hand, guess, pretty, include, netcom, address, matter, cause, technology, uiuc, video, speed, wire, type, days, server, usually, view, april, open, package, earth, christian, told, stuff, unless, similar, important, major, size, house, known, provide, faith, michael, rights, ground, phone, body, including, center, health, 
american, apple, feel, user, cost, text, lines, answer, bible, care, copy, wouldn, understand, check, anybody, security, mind, live, started, certainly, mouse, running, message, women, level, network, study, clinton, making, position, company, came, board, screen, groups, common, white, talking, single, black, special, quality, test, wiring, christians, monitor, likely, effect, nice, light, medical, members, uucp, posted, certain, hope, sources, cars, write, clear, difference, canada, fine, hear, launch, press, police, love, history, couple, build, situation, books, words, particular, jewish, specific, sense, anti, model, religion, stop, posting, unix, talk, private, discussion, school, contact, cable, frank, turkish, keys, built, legal, sound, consider, features, service, taking, simple, reference, argument, tools, comes, short, children, date, night, clipper, applications, jews, application, comments, device, scsi, force, process, theory, doubt, tried, objective, usenet, steve, early, self, experience, expect, games, uses, tape, needed, manager, killed, interesting, value, station, turn, easy, death, exactly, response, needs, ones, amiga, correct, according, wanted, shuttle, considered, language, reading, james, states, drug, strong, goes, koresh, term, insurance, personal, taken, result, future, form, opinion, past, sorry, mentioned, rules, especially, religious, hell, drivers, written, guns, various, author, country, design, happy, went, society, plus, gets, latest, longer, haven, asked, results, analysis, previous, cases, york, laws, main, section, parts, advance, aren, christ, weapons, required, mode, input, looks, week, accept, community, washington, option, series, circuit, robert, numbers, disease, head, fast, israeli, range, exist, venus, andrew, period, offer, macintosh, driver, office, moral, allow, organization, toronto, involved, clock, players, runs, values, department, half, months, choice, knows, picture, colors, brian, sell, wasn, hockey, object, 
took, includes, individual, cards, federal, dave, armenians, takes, currently, suggest, protect, follow, americans, candida, policy, directly, total, title, statement, present, devices, happened, equipment, assume, close, food, purpose, recently, scientific, christianity, require, reasons, deal, users, media, provides, happen, couldn, goal, bike, save, george, wants, city, shall, dead, lost, action, speak, road, condition, complete, court, uunet, easily, terms, batf, engineering, league, details, california, mission, voice, useful, baseball, lead, obviously, completely, algorithm, water, disclaimer, output, responsible, administration, ways, international, compatible, sent, clearly, rest, pass, hours, appreciated, freedom, digital, kill, issues, business, coming, operating, average, project, deleted, context, processing, companies, story, figure, error, fans, newsgroup, appropriate, events, leave, port, berkeley, carry, season, face, trade, convert, political, page, lower, environment, player, king, points, armenian, basis, final, requires, building, heart, performance, difficult, addition, related, stanford, suppose, site, sale, volume, actual, resolution, field, willing, knowledge, apply, claims, supposed, designed, explain, advice, directory, anonymous, commercial, sounds, worth, orbit, lots, limited, defense, entries, basic, radio, necessary, programming, wonder, suspect, wait, changes, neutral, forget, handle, inside, ability, included, signal, young, turkey, family, reply, enforcement, homosexuality, natural, morality, russian, finally, land, services, shot, greek, month, create, installed, printer, paper, friend, thinking, understanding, population, hold, break, comment, homosexual, normal, interface, eric, formats, names, machines, report, peter, setting, product, communications, comp, percent, escrow, avoid, room, east, supply, types, lives, colorado, secure, miles, rutgers, logic, reasonable, arab, library, cubs, expensive, agencies, cheap, recent, gary, 
million, soon, developed, peace, cancer, multiple, allowed, event, technical, street, caused, gives, soviet, physics, happens, looked, mention, suggestions, doctor, supported, release, obvious, outside, entire, friends, treatment, bitnet, radar, install, chance, mass, folks, table, return, archive, choose, development, print, generally, muslim, jack, meaning, united, wish, smith, trouble, weeks, social, member, electrical, illegal, diet, ideas, exists, areas, concept, requests, straight, child, learn, supports, behavior, morning, asking, appear, provided, pick, studies, possibly, practice, answers, drives, attempt, motif, west, engine, bring, thank, worked, unit, reality, remove, stand, middle, belief, compound, continue, errors, false, modem, henry, trust, bits, existence, changed, decided, near, yeah, safe, facts, loss, contains, extra, guys, arguments, proper, congress, particularly, class, command, drugs, wide, stupid, nature, constitution, institute, frame, armenia, function, manual, attack, fonts, aware, privacy, andy, pages, operations, appears, worse, heat, thread, edge, division, shouldn, knew, effective, wall, distribution, approach, hands, speaking, unfortunately, conference, independent, bought, 1990, turks, modern, civil, ethernet, solution, 1992, serial, added, compression, safety, crime, shows, indiana, virginia, wondering, germany, simms, gave, operation, record, internal, faster, arms, cramer, blood, blue, letter, plastic, spend, allows, hello, utility, rate, appreciate, regular, writing, floppy, abortion, atheism, additional, method, described, base, concerned, stated, surface, kids, played, articles, scott, actions, font, giving, views, switch, tool, decision, playing, step, highly, military, considering, keith, resources, cover, levels, connected, north, capability, places, products, attitude, costs, patients, prevent, controller, fair, rule, buying, late, quote, brought, functions, account, received, creation, watch, majority, cwru, driving, 
released, authority, committee, chips, quick, forward, student, protection, hate, calls, richard, boston, countries, excellent, poor, market, necessarily, wires, created, shell, western, america, valid, turned, apparently, plan, moon, minutes, lord, arabs, properly, fairly, boxes, murder, keyboard, complex, visual, absolutely, sold, arizona, produce, notice, intelligence, acts, greatly, begin, tests, living, electronics) topics: Array[Array[(String, Double)]] = Array(Array((people,0.001360193122510551), (team,9.997969337219624E-4), (israel,9.151267747290565E-4), (think,8.605244610467347E-4), (writes,7.415440144947349E-4)), Array((writes,4.0252690024786584E-4), (article,3.5896651722410806E-4), (jesus,3.461004819482739E-4), (people,3.275547808421133E-4), (great,3.117746311459888E-4)), Array((just,6.40547098662335E-4), (writes,5.994060831654556E-4), (article,5.714062311271855E-4), (people,5.535576359640754E-4), (votes,5.346198886998113E-4)), Array((women,9.312612236661755E-4), (disease,8.884529725963027E-4), (health,8.520706418120034E-4), (cancer,7.618481436503527E-4), (said,7.521813283215468E-4)), Array((writes,6.880376441768029E-4), (article,5.357808751045427E-4), (just,5.036417628014656E-4), (know,5.022527362963184E-4), (think,4.517682900758292E-4)), Array((science,0.001259288300019392), (writes,0.0012487164774715495), (article,0.0011493174646660682), (like,0.0010179666203659974), (just,0.0010155405867303218)), Array((disk,7.067730951858402E-4), (drive,6.545370665265432E-4), (know,6.227317831261301E-4), (like,5.40739770950819E-4), (thanks,5.185243035811351E-4)), Array((people,7.043415748580601E-4), (abortion,6.351533412493357E-4), (power,5.995760219487369E-4), (period,5.816122948936323E-4), (government,5.49899856481796E-4)), Array((request,8.193252643514662E-4), (requests,6.915793935933891E-4), (send,6.44633710573037E-4), (image,6.242749432300566E-4), (sequence,4.923742115599386E-4)), Array((writes,9.103299472424208E-4), (objective,6.36360604343079E-4), 
(like,6.019624843300716E-4), (tools,5.819178258390477E-4), (reality,5.111887220315569E-4)), Array((writes,0.004344099284108751), (know,0.004189668657244558), (like,0.003952619908656169), (article,0.003743448415527585), (people,0.0036838944764180427)), Array((venus,4.1181535960353586E-4), (know,3.913415560443006E-4), (york,3.412161093997264E-4), (people,3.1771882364713035E-4), (does,2.997685637898774E-4)), Array((wire,7.898424704380656E-4), (wiring,6.81204339923066E-4), (neutral,4.403151884063283E-4), (used,4.398038745044428E-4), (faith,4.288031551071563E-4)), Array((writes,5.827465353614684E-4), (article,4.6267807461009117E-4), (just,4.279308831997346E-4), (york,4.093463256726439E-4), (office,3.881473704697335E-4)), Array((cubs,0.0011042737496341044), (suck,9.4903184167095E-4), (writes,5.972014159984239E-4), (people,5.431187773747703E-4), (just,5.091620939840125E-4)), Array((writes,9.46311173130666E-4), (armenians,8.118917102231931E-4), (armenia,7.642966110832318E-4), (article,7.47748687062401E-4), (people,6.340033856937127E-4)), Array((sound,5.488534056507825E-4), (need,3.7143402261938013E-4), (problem,3.436804529912375E-4), (tools,3.3658232741781057E-4), (lost,3.2119290341321315E-4)), Array((writes,0.0026942056027941526), (article,0.0024101385819321947), (launch,0.0016260574771431863), (time,9.885349181624383E-4), (1993,8.838203808326238E-4)), Array((candida,6.452057396033239E-4), (people,5.459150677179816E-4), (doctor,3.776044615466835E-4), (think,3.425151589658119E-4), (sure,3.284099266387307E-4)), Array((image,0.003720456303782953), (data,0.0023308228780012084), (writes,0.0015938703779361924), (article,0.0015500697683953292), (software,0.001530021217797736)))

Going through the results, you may notice that some of the topic words returned are actually stopwords that are specific to our dataset (e.g. "writes", "article"). Let's try improving our model.

Step 8. Model Tuning - Refilter Stopwords

We will try to improve the results of our model by identifying some stopwords that are specific to our dataset. We will filter these stopwords out and rerun our LDA model to see if we get better results.
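One way to spot such dataset-specific stopwords is to count how many topics each top term appears in: a term that ranks highly in many unrelated topics carries little topical signal. The sketch below is a minimal illustration in plain Scala, using the top-5 terms of the first five topics from the first model's output above. The helper names `topicCounts` and `candidateStopwords` and the threshold `minTopics` are hypothetical choices made for this example, not part of Spark.

```scala
// Top-5 terms of the first five topics, copied from the first model's output above
val topTerms: Seq[Seq[String]] = Seq(
  Seq("people", "team", "israel", "think", "writes"),
  Seq("writes", "article", "jesus", "people", "great"),
  Seq("just", "writes", "article", "people", "votes"),
  Seq("women", "disease", "health", "cancer", "said"),
  Seq("writes", "article", "just", "know", "think")
)

// For each term, count the number of topics whose top terms include it
def topicCounts(topics: Seq[Seq[String]]): Map[String, Int] =
  topics.flatMap(_.distinct).groupBy(identity).map { case (term, occ) => (term, occ.size) }

// Terms appearing in at least minTopics topics are candidate stopwords
// (minTopics is a hypothetical threshold chosen for illustration)
def candidateStopwords(topics: Seq[Seq[String]], minTopics: Int): Seq[String] =
  topicCounts(topics).filter(_._2 >= minTopics).keys.toSeq.sorted

val candidates = candidateStopwords(topTerms, minTopics = 3)
println(candidates.mkString(", ")) // prints: article, people, writes
```

The count only produces candidates. Whether a frequent but meaningful word such as "people" should be removed remains a judgment call.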

val add_stopwords = Array("article", "writes", "entry", "date", "udel", "said", "tell", "think", "know", "just", "newsgroup", "line", "like", "does", "going", "make", "thanks")
add_stopwords: Array[String] = Array(article, writes, entry, date, udel, said, tell, think, know, just, newsgroup, line, like, does, going, make, thanks)
// Combine newly identified stopwords with our existing list of stopwords
val new_stopwords = stopwords.union(add_stopwords)
new_stopwords: Array[String] = Array(a, about, above, across, after, afterwards, again, against, all, almost, alone, along, already, also, although, always, am, among, amongst, amoungst, amount, an, and, another, any, anyhow, anyone, anything, anyway, anywhere, are, around, as, at, back, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, below, beside, besides, between, beyond, bill, both, bottom, but, by, call, can, cannot, cant, co, computer, con, could, couldnt, cry, de, describe, detail, do, done, down, due, during, each, eg, eight, either, eleven, else, elsewhere, empty, enough, etc, even, ever, every, everyone, everything, everywhere, except, few, fifteen, fify, fill, find, fire, first, five, for, former, formerly, forty, found, four, from, front, full, further, get, give, go, had, has, hasnt, have, he, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, him, himself, his, how, however, hundred, i, ie, if, in, inc, indeed, interest, into, is, it, its, itself, keep, last, latter, latterly, least, less, ltd, made, many, may, me, meanwhile, might, mill, mine, more, moreover, most, mostly, move, much, must, my, myself, name, namely, neither, never, nevertheless, next, nine, no, nobody, none, noone, nor, not, nothing, now, nowhere, of, off, often, on, once, one, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, part, per, perhaps, please, put, rather, re, same, see, seem, seemed, seeming, seems, serious, several, she, should, show, side, since, sincere, six, sixty, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, system, take, ten, than, that, the, their, them, themselves, then, thence, there, thereafter, thereby, therefore, therein, thereupon, these, they, thick, thin, third, this, those, though, three, through, throughout, thru, thus, to, together, too, top, toward, towards, twelve, twenty, two, un, under, until, up, upon, us, very, via, was, 
we, well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves, article, writes, entry, date, udel, said, tell, think, know, just, newsgroup, line, like, does, going, make, thanks)
import org.apache.spark.ml.feature.StopWordsRemover

// Set Params for StopWordsRemover with new_stopwords
val remover = new StopWordsRemover()
.setStopWords(new_stopwords)
.setInputCol("tokens")
.setOutputCol("filtered")

// Create new df with new list of stopwords removed
val new_filtered_df = remover.transform(tokenized_df)
import org.apache.spark.ml.feature.StopWordsRemover
remover: org.apache.spark.ml.feature.StopWordsRemover = stopWords_d2016c08c27d
new_filtered_df: org.apache.spark.sql.DataFrame = [corpus: string, id: bigint ... 2 more fields]
// Set Params for CountVectorizer
val vectorizer = new CountVectorizer()
.setInputCol("filtered")
.setOutputCol("features")
.setVocabSize(10000)
.setMinDF(5)
.fit(new_filtered_df)

// Create new df of countVectors
val new_countVectors = vectorizer.transform(new_filtered_df).select("id", "features")
vectorizer: org.apache.spark.ml.feature.CountVectorizerModel = cntVec_c4d2fcacf493
new_countVectors: org.apache.spark.sql.DataFrame = [id: bigint, features: vector]
// Convert the DataFrame to a Dataset of (id, countVector) pairs (it is turned into an RDD later with .rdd)
val new_lda_countVector = new_countVectors.map { case Row(id: Long, countVector: Vector) => (id, countVector) }
new_lda_countVector: org.apache.spark.sql.Dataset[(Long, org.apache.spark.ml.linalg.Vector)] = [_1: bigint, _2: vector]

We will also increase MaxIterations to 10 to see if we get better results.

// Set LDA parameters

val new_lda = new LDA()
.setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.8))
.setK(numTopics)
.setMaxIterations(10) // more than 3 this time
.setDocConcentration(-1) // use default values
.setTopicConcentration(-1) // use default values
new_lda: org.apache.spark.mllib.clustering.LDA = org.apache.spark.mllib.clustering.LDA@307777a9

How do we find out what the default values are?

Dive into the source!!!

  1. Let's find the default value for docConcentration now.
  2. Go to the Apache Spark package root: https://spark.apache.org/docs/latest/api/scala/#package
  3. Search for 'ml' in the search box on the top left (ml is for the ml library).
  4. Then find LDA by scrolling down on the left to mllib's clustering methods and click on LDA.
  5. Then click on the source code link, which should take you here:

    /**
     * Concentration parameter (commonly named "alpha") for the prior placed on documents'
     * distributions over topics ("theta").
     *
     * This is the parameter to a Dirichlet distribution, where larger values mean more smoothing
     * (more regularization).
     *
     * If not set by the user, then docConcentration is set automatically. If set to
     * singleton vector [alpha], then alpha is replicated to a vector of length k in fitting.
     * Otherwise, the [[docConcentration]] vector must be length k.
     * (default = automatic)
     *
     * Optimizer-specific parameter settings:
     *  - EM
     *     - Currently only supports symmetric distributions, so all values in the vector should be
     *       the same.
     *     - Values should be > 1.0
     *     - default = uniformly (50 / k) + 1, where 50/k is common in LDA libraries and +1 follows
     *       from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
     *  - Online
     *     - Values should be >= 0
     *     - default = uniformly (1.0 / k), following the implementation from
     *       [[https://github.com/Blei-Lab/onlineldavb]].
     * @group param
     */
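
As a worked example of the defaults quoted above, the following minimal sketch evaluates both formulas for k = 20, the number of topics reported in this notebook's "20 topics:" output. The helper names `emDefaultAlpha` and `onlineDefaultAlpha` are hypothetical and not part of the Spark API; only the formulas come from the source comment.

```scala
// Hypothetical helpers: they only restate the formulas quoted in the Spark source comment above

// EM optimizer: default docConcentration = (50 / k) + 1
def emDefaultAlpha(k: Int): Double = 50.0 / k + 1.0

// Online optimizer: default docConcentration = 1.0 / k
def onlineDefaultAlpha(k: Int): Double = 1.0 / k

val k = 20 // assumption: numTopics = 20, as in the "20 topics:" output of this notebook
println(s"EM default alpha for k=$k: ${emDefaultAlpha(k)}")         // 3.5
println(s"Online default alpha for k=$k: ${onlineDefaultAlpha(k)}") // 0.05
```

Under that assumption, passing -1 to setDocConcentration would correspond to 0.05 per topic for the online optimizer and 3.5 per topic for EM, which also satisfies the "> 1.0" requirement stated for EM.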
    

HOMEWORK: Try to find the default value for TopicConcentration.

// convert ML vectors into MLlib vectors
val new_lda_countVector_mllib = new_lda_countVector.map { case (id, vector) => (id, org.apache.spark.mllib.linalg.Vectors.fromML(vector)) }.rdd

// Create LDA model with stopwords refiltered
val new_ldaModel = new_lda.run(new_lda_countVector_mllib)
new_lda_countVector_mllib: org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.linalg.Vector)] = MapPartitionsRDD[7912] at rdd at command-1805207615647675:2
new_ldaModel: org.apache.spark.mllib.clustering.LDAModel = org.apache.spark.mllib.clustering.LocalLDAModel@1f7a447f
val topicIndices = new_ldaModel.describeTopics(maxTermsPerTopic = 5)
val vocabList = vectorizer.vocabulary
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
println(s"$numTopics topics:")
topics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i")
  topic.foreach { case (term, weight) => println(s"$term\t$weight") }
  println(s"==========")
}
20 topics: TOPIC 0 israeli 0.0012682775352141438 israel 0.0012334266012252358 lebanese 0.0012331470362663087 moral 0.0010217957072046744 villages 8.124497941173431E-4 ========== TOPIC 1 files 6.083532042392117E-4 ozonehole 5.979170539482378E-4 entries 5.879548658639909E-4 file 5.706131358599424E-4 anthony 5.576251166903767E-4 ========== TOPIC 2 sequence 0.0013318016027935183 frank 0.0012874484883529601 protein 0.0012228654485564386 molecular 0.0010710052205935403 biology 9.784475687066357E-4 ========== TOPIC 3 food 8.707413259996776E-4 absolute 5.653607201998066E-4 bible 4.7492295210548567E-4 believe 4.080945020960394E-4 people 3.843085112490988E-4 ========== TOPIC 4 windows 9.124016944246823E-4 time 6.193706174198874E-4 card 5.849305068481676E-4 rendering 5.514611376379115E-4 problem 5.440257463307185E-4 ========== TOPIC 5 temperature 5.845737910804102E-4 company 5.748008941928583E-4 pettefar 4.801991541532276E-4 battery 4.581058351595934E-4 nick 4.2380072462009055E-4 ========== TOPIC 6 cubs 0.0022230164981559128 suck 0.0014596915175869088 jake 7.269034745460029E-4 bony 6.698453043080173E-4 bony1 6.515092909910678E-4 ========== TOPIC 7 people 0.005539091431192967 time 0.0036917115495561 good 0.003256420703496491 windows 0.0031405377368962963 used 0.002628979268884021 ========== TOPIC 8 simms 5.926347685770616E-4 paul 5.653392409102178E-4 people 4.4425316705052297E-4 disc 4.256295198010862E-4 512k 3.891396378274092E-4 ========== TOPIC 9 people 5.891624367782062E-4 gregg 5.376789532292662E-4 jaeger 5.19939219676995E-4 attendance 4.572989608847126E-4 gretzky 4.522975039422793E-4 ========== TOPIC 10 right 5.773985518374905E-4 armenian 4.8186346642645325E-4 used 4.7424120967366957E-4 left 4.049523298750746E-4 jesus 4.029939637348875E-4 ========== TOPIC 11 turkish 0.0023615202482913412 armenians 0.002336996332944892 police 0.0020605522990715494 government 0.002049748444864232 people 0.0018470072411200834 ========== TOPIC 12 picture 0.0020933245021566753 period 
0.0012925613357139829 power 9.474696404874255E-4 play 9.348138314316739E-4 boys 6.232448160203388E-4 ========== TOPIC 13 pitt 7.732278946121311E-4 people 4.3503652608725517E-4 tony 4.1542912327723465E-4 morgan 4.112304631294204E-4 disk 3.969272403272078E-4 ========== TOPIC 14 russotto 5.30595525883793E-4 security 4.229098665604208E-4 people 4.11208108402967E-4 work 3.9215975135275345E-4 arranged 3.858475996065931E-4 ========== TOPIC 15 good 0.0014711684655765544 people 8.977926660638134E-4 gatech 8.673672979665585E-4 time 7.425238802010879E-4 mike 7.292301744075631E-4 ========== TOPIC 16 working 5.720696450380848E-4 venus 5.091162320627994E-4 cards 5.020946184701743E-4 space 4.798594755904161E-4 work 4.058760720700143E-4 ========== TOPIC 17 space 0.002495598593157907 venus 0.0023088328007000664 candida 0.002284781260918398 mission 0.0022154135982700826 launch 0.00214785575861375 ========== TOPIC 18 file 4.046959211891645E-4 denning 3.9116187064786875E-4 basketball 3.6358367375394283E-4 georgetown 3.363865146807065E-4 espn 3.2984107399099957E-4 ========== TOPIC 19 lost 5.830232695299712E-4 idle 5.456807860943806E-4 time 4.094774426224454E-4 baseball 3.9732072142373433E-4 church 3.946847164628468E-4 ========== topicIndices: Array[(Array[Int], Array[Double])] = Array((Array(444, 178, 1775, 449, 1576),Array(0.0012682775352141438, 0.0012334266012252358, 0.0012331470362663087, 0.0010217957072046744, 8.124497941173431E-4)), (Array(54, 3312, 622, 27, 1837),Array(6.083532042392117E-4, 5.979170539482378E-4, 5.879548658639909E-4, 5.706131358599424E-4, 5.576251166903767E-4)), (Array(1410, 314, 2128, 1478, 1476),Array(0.0013318016027935183, 0.0012874484883529601, 0.0012228654485564386, 0.0010710052205935403, 9.784475687066357E-4)), (Array(497, 1242, 229, 15, 0),Array(8.707413259996776E-4, 5.653607201998066E-4, 4.7492295210548567E-4, 4.080945020960394E-4, 3.843085112490988E-4)), (Array(4, 1, 81, 1683, 8),Array(9.124016944246823E-4, 6.193706174198874E-4, 5.849305068481676E-4, 
5.514611376379115E-4, 5.440257463307185E-4)), (Array(1341, 252, 3779, 1849, 1323),Array(5.845737910804102E-4, 5.748008941928583E-4, 4.801991541532276E-4, 4.581058351595934E-4, 4.2380072462009055E-4)), (Array(698, 1057, 1589, 3159, 3177),Array(0.0022230164981559128, 0.0014596915175869088, 7.269034745460029E-4, 6.698453043080173E-4, 6.515092909910678E-4)), (Array(0, 1, 2, 4, 3),Array(0.005539091431192967, 0.0036917115495561, 0.003256420703496491, 0.0031405377368962963, 0.002628979268884021)), (Array(861, 113, 0, 3083, 4453),Array(5.926347685770616E-4, 5.653392409102178E-4, 4.4425316705052297E-4, 4.256295198010862E-4, 3.891396378274092E-4)), (Array(0, 2463, 2751, 2319, 2768),Array(5.891624367782062E-4, 5.376789532292662E-4, 5.19939219676995E-4, 4.572989608847126E-4, 4.522975039422793E-4)), (Array(7, 585, 3, 162, 114),Array(5.773985518374905E-4, 4.8186346642645325E-4, 4.7424120967366957E-4, 4.049523298750746E-4, 4.029939637348875E-4)), (Array(312, 478, 289, 30, 0),Array(0.0023615202482913412, 0.002336996332944892, 0.0020605522990715494, 0.002049748444864232, 0.0018470072411200834)), (Array(464, 441, 28, 127, 1850),Array(0.0020933245021566753, 0.0012925613357139829, 9.474696404874255E-4, 9.348138314316739E-4, 6.232448160203388E-4)), (Array(1627, 0, 1362, 2593, 119),Array(7.732278946121311E-4, 4.3503652608725517E-4, 4.1542912327723465E-4, 4.112304631294204E-4, 3.969272403272078E-4)), (Array(2921, 237, 0, 6, 4903),Array(5.30595525883793E-4, 4.229098665604208E-4, 4.11208108402967E-4, 3.9215975135275345E-4, 3.858475996065931E-4)), (Array(2, 0, 1220, 1, 179),Array(0.0014711684655765544, 8.977926660638134E-4, 8.673672979665585E-4, 7.425238802010879E-4, 7.292301744075631E-4)), (Array(170, 443, 469, 26, 6),Array(5.720696450380848E-4, 5.091162320627994E-4, 5.020946184701743E-4, 4.798594755904161E-4, 4.058760720700143E-4)), (Array(26, 443, 484, 535, 287),Array(0.002495598593157907, 0.0023088328007000664, 0.002284781260918398, 0.0022154135982700826, 0.00214785575861375)), 
(Array(27, 2497, 3553, 4304, 2278),Array(4.046959211891645E-4, 3.9116187064786875E-4, 3.6358367375394283E-4, 3.363865146807065E-4, 3.2984107399099957E-4)), (Array(516, 2592, 1, 532, 132),Array(5.830232695299712E-4, 5.456807860943806E-4, 4.094774426224454E-4, 3.9732072142373433E-4, 3.946847164628468E-4))) vocabList: Array[String] = Array(people, time, good, used, windows, want, work, right, problem, need, really, image, data, information, better, believe, using, software, years, year, mail, sure, point, drive, program, available, space, file, power, help, government, things, question, doesn, number, case, world, look, read, version, come, thing, different, long, best, jpeg, fact, university, probably, real, didn, course, state, true, files, high, possible, actually, 1993, list, game, little, news, group, david, send, wrong, based, graphics, support, able, place, called, free, john, subject, post, reason, color, great, second, card, public, having, email, info, following, start, hard, science, example, says, means, code, evidence, person, note, maybe, president, heard, general, mean, problems, quite, source, systems, life, price, standard, order, window, access, claim, paul, jesus, getting, looking, trying, control, disk, seen, simply, times, book, team, local, chip, play, encryption, idea, truth, given, church, issue, research, opinions, wrote, images, large, display, makes, remember, thought, doing, national, format, away, nasa, human, home, change, small, saying, interested, current, mark, area, internet, today, word, original, agree, left, memory, works, microsoft, machine, instead, hardware, kind, working, request, higher, sort, programs, questions, money, later, israel, mike, guess, hand, pretty, include, netcom, address, cause, matter, technology, uiuc, speed, wire, video, type, days, server, view, usually, april, earth, package, open, told, christian, stuff, unless, similar, important, size, major, house, provide, known, faith, ground, rights, michael, phone, 
body, center, including, health, american, apple, feel, cost, text, user, lines, bible, answer, care, copy, wouldn, understand, check, anybody, security, mind, live, certainly, started, running, message, mouse, level, network, women, study, clinton, making, position, company, came, groups, board, screen, white, common, talking, single, special, quality, black, wiring, test, likely, christians, monitor, nice, effect, light, members, medical, posted, uucp, hope, sources, certain, clear, difference, cars, write, canada, fine, hear, press, launch, build, police, love, history, couple, situation, books, particular, words, jewish, specific, sense, model, religion, anti, stop, posting, unix, talk, private, discussion, school, contact, cable, turkish, keys, frank, built, consider, service, sound, features, legal, taking, simple, comes, reference, argument, tools, children, short, night, jews, applications, clipper, device, application, comments, scsi, process, theory, objective, force, doubt, tried, self, experience, games, early, usenet, expect, steve, needed, tape, uses, interesting, killed, station, exactly, easy, death, value, turn, manager, needs, correct, according, amiga, ones, response, wanted, shuttle, language, states, drug, james, considered, reading, strong, koresh, insurance, personal, term, goes, result, future, taken, past, form, opinion, especially, religious, sorry, mentioned, rules, hell, written, various, author, guns, drivers, went, country, design, plus, happy, society, longer, gets, latest, results, analysis, haven, asked, main, section, laws, previous, cases, york, parts, aren, advance, weapons, christ, mode, required, input, looks, week, accept, community, washington, option, series, circuit, disease, robert, fast, numbers, head, exist, andrew, period, range, venus, israeli, macintosh, driver, office, offer, moral, allow, organization, involved, toronto, clock, players, department, runs, values, months, choice, half, colors, knows, picture, sell, 
brian, object, took, cards, includes, federal, hockey, individual, wasn, currently, suggest, dave, armenians, takes, protect, follow, americans, directly, candida, title, policy, total, devices, happened, statement, present, purpose, assume, close, recently, equipment, food, require, reasons, scientific, christianity, happen, users, media, provides, deal, wants, city, george, goal, couldn, bike, save, shall, dead, lost, action, speak, road, uunet, terms, batf, court, condition, easily, league, complete, engineering, obviously, details, completely, baseball, california, voice, mission, useful, lead, disclaimer, output, water, algorithm, clearly, administration, ways, compatible, international, sent, rest, responsible, pass, hours, digital, business, appreciated, issues, freedom, kill, project, deleted, companies, coming, operating, average, processing, context, story, figure, error, fans, season, face, port, carry, events, appropriate, leave, berkeley, trade, lower, player, king, page, convert, environment, armenian, political, points, basis, final, requires, heart, addition, performance, building, difficult, site, sale, suppose, related, stanford, resolution, field, willing, volume, actual, apply, knowledge, designed, explain, anonymous, supposed, directory, claims, worth, orbit, lots, basic, defense, advice, commercial, sounds, entries, limited, changes, wonder, suspect, radio, turkey, neutral, forget, wait, necessary, programming, reply, enforcement, inside, family, ability, handle, young, included, signal, homosexuality, natural, morality, finally, land, russian, paper, month, greek, friend, installed, create, thinking, printer, shot, services, understanding, population, hold, break, interface, comment, normal, eric, homosexual, setting, formats, names, peter, machines, report, east, supply, comp, percent, avoid, product, lives, colorado, communications, room, escrow, types, secure, arab, logic, miles, reasonable, rutgers, multiple, gary, soon, agencies, 
developed, recent, cubs, library, peace, expensive, cheap, cancer, million, allowed, physics, suggestions, doctor, caused, supported, technical, happens, event, looked, obvious, gives, soviet, street, mention, release, outside, table, print, mass, return, radar, archive, chance, install, treatment, bitnet, generally, development, friends, folks, choose, entire, weeks, united, social, wish, smith, trouble, child, straight, learn, supports, behavior, ideas, morning, muslim, member, diet, electrical, illegal, exists, requests, jack, areas, concept, meaning, reality, drives, appear, provided, studies, motif, attempt, possibly, west, answers, asking, pick, practice, engine, worked, stand, bring, thank, unit, remove, near, compound, errors, false, belief, continue, middle, changed, decided, modem, bits, existence, henry, trust, congress, extra, safe, facts, loss, yeah, contains, guys, particularly, arguments, proper, class, manual, frame, command, drugs, stupid, wide, nature, institute, armenia, constitution, thread, pages, function, andy, attack, fonts, privacy, aware, operations, heat, worse, appears, distribution, knew, effective, edge, division, shouldn, wall, approach, speaking, independent, unfortunately, hands, conference, crime, indiana, modern, ethernet, solution, turks, civil, bought, 1992, 1990, compression, safety, serial, added, shows, letter, cramer, faster, simms, operation, arms, internal, germany, gave, record, wondering, virginia, floppy, appreciate, blue, plastic, regular, writing, allows, abortion, utility, hello, rate, blood, spend, views, articles, actions, font, additional, method, described, concerned, scott, played, stated, kids, atheism, surface, base, step, decision, switch, tool, playing, giving, attitude, quote, keith, cover, levels, considering, highly, resources, north, military, connected, buying, places, capability, products, costs, patients, controller, fair, late, prevent, rule, western, poor, brought, functions, received, account, 
creation, watch, cwru, majority, forward, student, released, driving, authority, committee, protection, richard, boston, quick, calls, chips, valid, hate, shell, excellent, countries, market, necessarily, created, wires, america, apparently, turned, complex, fairly, minutes, murder, boxes, lord, keyboard, properly, plan, moon, arabs, arizona, visual, absolutely, notice, produce, sold, panel, dangerous, killing, begin, property, damage, electronics, living, failed, acts, tests, nation, intelligence, islam, vote, rangers, effort, options, greatly, holy, review, shareware, larry) topics: Array[Array[(String, Double)]] = Array(Array((israeli,0.0012682775352141438), (israel,0.0012334266012252358), (lebanese,0.0012331470362663087), (moral,0.0010217957072046744), (villages,8.124497941173431E-4)), Array((files,6.083532042392117E-4), (ozonehole,5.979170539482378E-4), (entries,5.879548658639909E-4), (file,5.706131358599424E-4), (anthony,5.576251166903767E-4)), Array((sequence,0.0013318016027935183), (frank,0.0012874484883529601), (protein,0.0012228654485564386), (molecular,0.0010710052205935403), (biology,9.784475687066357E-4)), Array((food,8.707413259996776E-4), (absolute,5.653607201998066E-4), (bible,4.7492295210548567E-4), (believe,4.080945020960394E-4), (people,3.843085112490988E-4)), Array((windows,9.124016944246823E-4), (time,6.193706174198874E-4), (card,5.849305068481676E-4), (rendering,5.514611376379115E-4), (problem,5.440257463307185E-4)), Array((temperature,5.845737910804102E-4), (company,5.748008941928583E-4), (pettefar,4.801991541532276E-4), (battery,4.581058351595934E-4), (nick,4.2380072462009055E-4)), Array((cubs,0.0022230164981559128), (suck,0.0014596915175869088), (jake,7.269034745460029E-4), (bony,6.698453043080173E-4), (bony1,6.515092909910678E-4)), Array((people,0.005539091431192967), (time,0.0036917115495561), (good,0.003256420703496491), (windows,0.0031405377368962963), (used,0.002628979268884021)), Array((simms,5.926347685770616E-4), 
(paul,5.653392409102178E-4), (people,4.4425316705052297E-4), (disc,4.256295198010862E-4), (512k,3.891396378274092E-4)), Array((people,5.891624367782062E-4), (gregg,5.376789532292662E-4), (jaeger,5.19939219676995E-4), (attendance,4.572989608847126E-4), (gretzky,4.522975039422793E-4)), Array((right,5.773985518374905E-4), (armenian,4.8186346642645325E-4), (used,4.7424120967366957E-4), (left,4.049523298750746E-4), (jesus,4.029939637348875E-4)), Array((turkish,0.0023615202482913412), (armenians,0.002336996332944892), (police,0.0020605522990715494), (government,0.002049748444864232), (people,0.0018470072411200834)), Array((picture,0.0020933245021566753), (period,0.0012925613357139829), (power,9.474696404874255E-4), (play,9.348138314316739E-4), (boys,6.232448160203388E-4)), Array((pitt,7.732278946121311E-4), (people,4.3503652608725517E-4), (tony,4.1542912327723465E-4), (morgan,4.112304631294204E-4), (disk,3.969272403272078E-4)), Array((russotto,5.30595525883793E-4), (security,4.229098665604208E-4), (people,4.11208108402967E-4), (work,3.9215975135275345E-4), (arranged,3.858475996065931E-4)), Array((good,0.0014711684655765544), (people,8.977926660638134E-4), (gatech,8.673672979665585E-4), (time,7.425238802010879E-4), (mike,7.292301744075631E-4)), Array((working,5.720696450380848E-4), (venus,5.091162320627994E-4), (cards,5.020946184701743E-4), (space,4.798594755904161E-4), (work,4.058760720700143E-4)), Array((space,0.002495598593157907), (venus,0.0023088328007000664), (candida,0.002284781260918398), (mission,0.0022154135982700826), (launch,0.00214785575861375)), Array((file,4.046959211891645E-4), (denning,3.9116187064786875E-4), (basketball,3.6358367375394283E-4), (georgetown,3.363865146807065E-4), (espn,3.2984107399099957E-4)), Array((lost,5.830232695299712E-4), (idle,5.456807860943806E-4), (time,4.094774426224454E-4), (baseball,3.9732072142373433E-4), (church,3.946847164628468E-4)))
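
To make the index-to-term mapping in the cell above concrete, here is a minimal plain-Scala sketch that replays it for topic 7, using the first five vocabulary entries and the topic 7 entry of topicIndices copied from the output above. The names `vocabHead`, `topic7`, and `readable` are illustrative only.

```scala
// First five vocabulary entries and the describeTopics entry for topic 7, copied from the output above
val vocabHead: Array[String] = Array("people", "time", "good", "used", "windows")
val topic7: (Array[Int], Array[Double]) = (
  Array(0, 1, 2, 4, 3),
  Array(0.005539091431192967, 0.0036917115495561, 0.003256420703496491,
        0.0031405377368962963, 0.002628979268884021)
)

// Same mapping as in the notebook cell: look up each term index, then pair it with its weight
val (terms, termWeights) = topic7
val readable: Array[(String, Double)] = terms.map(vocabHead(_)).zip(termWeights)

readable.foreach { case (term, weight) => println(s"$term\t$weight") }
```

The term order (people, time, good, windows, used) follows the descending weights returned by describeTopics, not the vocabulary order, which is why index 4 comes before index 3.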

We managed to get better results here. We can easily infer that topic 3 is about space, topic 7 is about religion, etc.

==========
TOPIC 3
station    0.0022184815200582244
launch    0.0020621309179376145
shuttle    0.0019305627762549198
space    0.0017600147075534092
redesign    0.0014972130065346592
==========
TOPIC 7
people    0.0038165245379908675
church    0.0036902650900400543
jesus    0.0029942866750178893
paul    0.0026144777524277044
bible    0.0020476251853453016
==========

Step 9. Create LDA model with Expectation Maximization

Let's try creating an LDA model with Expectation Maximization on the data that has been refiltered for additional stopwords. We will also increase MaxIterations here to 100 to see if that improves results.

import org.apache.spark.mllib.clustering.EMLDAOptimizer

// Set LDA parameters
val em_lda = new LDA()
  .setOptimizer(new EMLDAOptimizer())
  .setK(numTopics)
  .setMaxIterations(100)
  .setDocConcentration(-1) // use default values
  .setTopicConcentration(-1) // use default values
import org.apache.spark.mllib.clustering.EMLDAOptimizer
em_lda: org.apache.spark.mllib.clustering.LDA = org.apache.spark.mllib.clustering.LDA@1ca56e52
val em_ldaModel = em_lda.run(new_lda_countVector_mllib)
em_ldaModel: org.apache.spark.mllib.clustering.LDAModel = org.apache.spark.mllib.clustering.DistributedLDAModel@6e698dd6

Note that the EMLDAOptimizer produces a DistributedLDAModel, which stores not only the inferred topics but also the full training corpus and topic distributions for each document in the training corpus.

val topicIndices = em_ldaModel.describeTopics(maxTermsPerTopic = 5)
topicIndices: Array[(Array[Int], Array[Double])] = Array((Array(0, 178, 444, 297, 354),Array(0.01752505038953989, 0.015012774928968905, 0.008814933415522707, 0.008141649707018096, 0.007906218310657713)), (Array(123, 443, 26, 726, 294),Array(0.011867696558681725, 0.010687276050923861, 0.009040843685617382, 0.0075228865623441655, 0.006675910504553334)), (Array(47, 102, 255, 20, 157),Array(0.014985766432210718, 0.012287189548478161, 0.010969893949852868, 0.010444586166034284, 0.010367417738701503)), (Array(114, 132, 213, 229, 203),Array(0.01449583280942609, 0.01242586502676794, 0.0101360085462943, 0.009102888033371566, 0.009009101715161442)), (Array(302, 0, 484, 8, 349),Array(0.009877278560469278, 0.009680996618978702, 0.009115033183173074, 0.008834283578952261, 0.008500508881330103)), (Array(59, 65, 153, 171, 20),Array(0.017508093502452186, 0.016552675879800884, 0.014047104056430462, 0.013885709401051908, 0.013278653314425648)), (Array(24, 85, 93, 84, 104),Array(0.018988634089288345, 0.01368277720476286, 0.011245696329104049, 0.010768131662118808, 0.010139683322127871)), (Array(134, 372, 273, 18, 221),Array(0.00930155878276897, 0.008792899458241422, 0.008270271605089537, 0.008044178942778759, 0.007917181853950715)), (Array(0, 130, 53, 66, 2),Array(0.013531923610972975, 0.011057486467983364, 0.009101917895671245, 0.008823515211060527, 0.008521430306029728)), (Array(4, 23, 81, 17, 110),Array(0.03893670826645084, 0.023437661829218952, 0.018573896604275872, 0.014945785424002746, 0.014646919443061486)), (Array(0, 30, 52, 249, 7),Array(0.020509341560030914, 0.015755699954310105, 0.009894493582038144, 0.008096889602560069, 0.008042679111450535)), (Array(26, 147, 287, 355, 369),Array(0.02020255295027804, 0.014833102725563393, 0.012758156661562998, 0.010799181289599445, 0.010172267365747301)), (Array(124, 60, 179, 61, 472),Array(0.015045835455065649, 0.01256263952282713, 0.011136225723554929, 0.010103891826195354, 0.009592871348169633)), (Array(19, 2, 60, 127, 
14),Array(0.016013832119148336, 0.015676295241325515, 0.015103999409219688, 0.011513063994991753, 0.010812580477065412)), (Array(189, 116, 281, 10, 512),Array(0.012301820581107062, 0.011766319633537829, 0.010884720454484836, 0.009728387347087072, 0.009062029042401385)), (Array(12, 11, 25, 17, 326),Array(0.02024301839425212, 0.019615560037913822, 0.011034781446150338, 0.008762363283594195, 0.006388659543460347)), (Array(128, 126, 30, 188, 313),Array(0.018359610202482144, 0.01631238429319963, 0.012327542606174332, 0.01100039536236385, 0.010929970885815312)), (Array(98, 32, 5, 50, 1),Array(0.017105316947600386, 0.016499567837837802, 0.012832762125743222, 0.012010268170216528, 0.011679691208445088)), (Array(3, 191, 264, 75, 214),Array(0.014563309337766079, 0.014357849998468886, 0.01222345778091846, 0.011972086769461172, 0.011519857061795619)), (Array(45, 27, 11, 54, 78),Array(0.028246137021162587, 0.02317299778436637, 0.019972886275115946, 0.01596784774345554, 0.012658665615719175)))
val vocabList = vectorizer.vocabulary
vocabList: Array[String] = Array(people, time, good, used, windows, want, work, right, problem, need, really, image, data, information, better, believe, using, software, years, year, mail, sure, point, drive, program, available, space, file, power, help, government, things, question, doesn, number, case, world, look, read, version, come, thing, different, long, best, jpeg, fact, university, probably, real, didn, course, state, true, files, high, possible, actually, 1993, list, game, little, news, group, david, send, wrong, based, graphics, support, able, place, called, free, john, subject, post, reason, color, great, second, card, public, having, email, info, following, start, hard, science, example, says, means, code, evidence, person, note, maybe, president, heard, general, mean, problems, quite, source, systems, life, price, standard, order, window, access, claim, paul, jesus, getting, looking, trying, control, disk, seen, simply, times, book, team, local, chip, play, encryption, idea, truth, given, church, issue, research, opinions, wrote, images, large, display, makes, remember, thought, doing, national, format, away, nasa, human, home, change, small, saying, interested, current, mark, area, internet, today, word, original, agree, left, memory, works, microsoft, machine, instead, hardware, kind, working, request, higher, sort, programs, questions, money, later, israel, mike, guess, hand, pretty, include, netcom, address, cause, matter, technology, uiuc, speed, wire, video, type, days, server, view, usually, april, earth, package, open, told, christian, stuff, unless, similar, important, size, major, house, provide, known, faith, ground, rights, michael, phone, body, center, including, health, american, apple, feel, cost, text, user, lines, bible, answer, care, copy, wouldn, understand, check, anybody, security, mind, live, certainly, started, running, message, mouse, level, network, women, study, clinton, making, position, company, came, groups, board, screen, 
white, common, talking, single, special, quality, black, wiring, test, likely, christians, monitor, nice, effect, light, members, medical, posted, uucp, hope, sources, certain, clear, difference, cars, write, canada, fine, hear, press, launch, build, police, love, history, couple, situation, books, particular, words, jewish, specific, sense, model, religion, anti, stop, posting, unix, talk, private, discussion, school, contact, cable, turkish, keys, frank, built, consider, service, sound, features, legal, taking, simple, comes, reference, argument, tools, children, short, night, jews, applications, clipper, device, application, comments, scsi, process, theory, objective, force, doubt, tried, self, experience, games, early, usenet, expect, steve, needed, tape, uses, interesting, killed, station, exactly, easy, death, value, turn, manager, needs, correct, according, amiga, ones, response, wanted, shuttle, language, states, drug, james, considered, reading, strong, koresh, insurance, personal, term, goes, result, future, taken, past, form, opinion, especially, religious, sorry, mentioned, rules, hell, written, various, author, guns, drivers, went, country, design, plus, happy, society, longer, gets, latest, results, analysis, haven, asked, main, section, laws, previous, cases, york, parts, aren, advance, weapons, christ, mode, required, input, looks, week, accept, community, washington, option, series, circuit, disease, robert, fast, numbers, head, exist, andrew, period, range, venus, israeli, macintosh, driver, office, offer, moral, allow, organization, involved, toronto, clock, players, department, runs, values, months, choice, half, colors, knows, picture, sell, brian, object, took, cards, includes, federal, hockey, individual, wasn, currently, suggest, dave, armenians, takes, protect, follow, americans, directly, candida, title, policy, total, devices, happened, statement, present, purpose, assume, close, recently, equipment, food, require, reasons, scientific, 
christianity, happen, users, media, provides, deal, wants, city, george, goal, couldn, bike, save, shall, dead, lost, action, speak, road, uunet, terms, batf, court, condition, easily, league, complete, engineering, obviously, details, completely, baseball, california, voice, mission, useful, lead, disclaimer, output, water, algorithm, clearly, administration, ways, compatible, international, sent, rest, responsible, pass, hours, digital, business, appreciated, issues, freedom, kill, project, deleted, companies, coming, operating, average, processing, context, story, figure, error, fans, season, face, port, carry, events, appropriate, leave, berkeley, trade, lower, player, king, page, convert, environment, armenian, political, points, basis, final, requires, heart, addition, performance, building, difficult, site, sale, suppose, related, stanford, resolution, field, willing, volume, actual, apply, knowledge, designed, explain, anonymous, supposed, directory, claims, worth, orbit, lots, basic, defense, advice, commercial, sounds, entries, limited, changes, wonder, suspect, radio, turkey, neutral, forget, wait, necessary, programming, reply, enforcement, inside, family, ability, handle, young, included, signal, homosexuality, natural, morality, finally, land, russian, paper, month, greek, friend, installed, create, thinking, printer, shot, services, understanding, population, hold, break, interface, comment, normal, eric, homosexual, setting, formats, names, peter, machines, report, east, supply, comp, percent, avoid, product, lives, colorado, communications, room, escrow, types, secure, arab, logic, miles, reasonable, rutgers, multiple, gary, soon, agencies, developed, recent, cubs, library, peace, expensive, cheap, cancer, million, allowed, physics, suggestions, doctor, caused, supported, technical, happens, event, looked, obvious, gives, soviet, street, mention, release, outside, table, print, mass, return, radar, archive, chance, install, treatment, bitnet, 
generally, development, friends, folks, choose, entire, weeks, united, social, wish, smith, trouble, child, straight, learn, supports, behavior, ideas, morning, muslim, member, diet, electrical, illegal, exists, requests, jack, areas, concept, meaning, reality, drives, appear, provided, studies, motif, attempt, possibly, west, answers, asking, pick, practice, engine, worked, stand, bring, thank, unit, remove, near, compound, errors, false, belief, continue, middle, changed, decided, modem, bits, existence, henry, trust, congress, extra, safe, facts, loss, yeah, contains, guys, particularly, arguments, proper, class, manual, frame, command, drugs, stupid, wide, nature, institute, armenia, constitution, thread, pages, function, andy, attack, fonts, privacy, aware, operations, heat, worse, appears, distribution, knew, effective, edge, division, shouldn, wall, approach, speaking, independent, unfortunately, hands, conference, crime, indiana, modern, ethernet, solution, turks, civil, bought, 1992, 1990, compression, safety, serial, added, shows, letter, cramer, faster, simms, operation, arms, internal, germany, gave, record, wondering, virginia, floppy, appreciate, blue, plastic, regular, writing, allows, abortion, utility, hello, rate, blood, spend, views, articles, actions, font, additional, method, described, concerned, scott, played, stated, kids, atheism, surface, base, step, decision, switch, tool, playing, giving, attitude, quote, keith, cover, levels, considering, highly, resources, north, military, connected, buying, places, capability, products, costs, patients, controller, fair, late, prevent, rule, western, poor, brought, functions, received, account, creation, watch, cwru, majority, forward, student, released, driving, authority, committee, protection, richard, boston, quick, calls, chips, valid, hate, shell, excellent, countries, market, necessarily, created, wires, america, apparently, turned, complex, fairly, minutes, murder, boxes, lord, keyboard, 
properly, plan, moon, arabs, arizona, visual, absolutely, notice, produce, sold, panel, dangerous, killing, begin, property, damage, electronics, living, failed, acts, tests, nation, intelligence, islam, vote, rangers, effort, options, greatly, holy, review, shareware, larry)
vocabList.size
res32: Int = 6122
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
topics: Array[Array[(String, Double)]] = Array(Array((people,0.01752505038953989), (israel,0.015012774928968905), (israeli,0.008814933415522707), (jewish,0.008141649707018096), (killed,0.007906218310657713)), Array((book,0.011867696558681725), (venus,0.010687276050923861), (space,0.009040843685617382), (radar,0.0075228865623441655), (books,0.006675910504553334)), Array((university,0.014985766432210718), (problems,0.012287189548478161), (board,0.010969893949852868), (mail,0.010444586166034284), (internet,0.010367417738701503)), Array((jesus,0.01449583280942609), (church,0.01242586502676794), (faith,0.0101360085462943), (bible,0.009102888033371566), (christian,0.009009101715161442)), Array((anti,0.009877278560469278), (people,0.009680996618978702), (candida,0.009115033183173074), (problem,0.008834283578952261), (steve,0.008500508881330103)), Array((list,0.017508093502452186), (send,0.016552675879800884), (interested,0.014047104056430462), (request,0.013885709401051908), (mail,0.013278653314425648)), Array((program,0.018988634089288345), (info,0.01368277720476286), (code,0.011245696329104049), (email,0.010768131662118808), (source,0.010139683322127871)), Array((research,0.00930155878276897), (drug,0.008792899458241422), (medical,0.008270271605089537), (years,0.008044178942778759), (health,0.007917181853950715)), Array((people,0.013531923610972975), (truth,0.011057486467983364), (true,0.009101917895671245), (wrong,0.008823515211060527), (good,0.008521430306029728)), Array((windows,0.03893670826645084), (drive,0.023437661829218952), (card,0.018573896604275872), (software,0.014945785424002746), (window,0.014646919443061486)), Array((people,0.020509341560030914), (government,0.015755699954310105), (state,0.009894493582038144), (clinton,0.008096889602560069), (right,0.008042679111450535)), Array((space,0.02020255295027804), (nasa,0.014833102725563393), (launch,0.012758156661562998), (station,0.010799181289599445), (shuttle,0.010172267365747301)), 
Array((team,0.015045835455065649), (game,0.01256263952282713), (mike,0.011136225723554929), (little,0.010103891826195354), (hockey,0.009592871348169633)), Array((year,0.016013832119148336), (good,0.015676295241325515), (game,0.015103999409219688), (play,0.011513063994991753), (better,0.010812580477065412)), Array((uiuc,0.012301820581107062), (looking,0.011766319633537829), (cars,0.010884720454484836), (really,0.009728387347087072), (bike,0.009062029042401385)), Array((data,0.02024301839425212), (image,0.019615560037913822), (available,0.011034781446150338), (software,0.008762363283594195), (tools,0.006388659543460347)), Array((encryption,0.018359610202482144), (chip,0.01631238429319963), (government,0.012327542606174332), (technology,0.01100039536236385), (keys,0.010929970885815312)), Array((president,0.017105316947600386), (question,0.016499567837837802), (want,0.012832762125743222), (didn,0.012010268170216528), (time,0.011679691208445088)), Array((used,0.014563309337766079), (wire,0.014357849998468886), (wiring,0.01222345778091846), (subject,0.011972086769461172), (ground,0.011519857061795619)), Array((jpeg,0.028246137021162587), (file,0.02317299778436637), (image,0.019972886275115946), (files,0.01596784774345554), (color,0.012658665615719175)))
vocabList(47) // 47 is the index of the term 'university', the top term of topic 2 above - which term leads which topic may change between runs due to randomness in the algorithm
res33: String = university
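
To see what the index-to-term mapping does without a cluster, here is a minimal sketch in plain Scala. The names toyVocab, toyTopicIndices and toyTopics and their values are hypothetical and only mimic the shape of what describeTopics returns: one (term indices, term weights) pair per topic.

```scala
// Hypothetical toy data shaped like the output of describeTopics:
// one (termIndices, termWeights) pair per topic.
val toyVocab: Array[String] = Array("space", "launch", "jesus", "church")
val toyTopicIndices: Array[(Array[Int], Array[Double])] = Array(
  (Array(0, 1), Array(0.6, 0.4)),
  (Array(2, 3), Array(0.7, 0.3))
)

// Replace each term index with its vocabulary word and pair it with its weight.
val toyTopics: Array[Array[(String, Double)]] = toyTopicIndices.map {
  case (terms, termWeights) => terms.map(toyVocab(_)).zip(termWeights)
}

// Print one line per topic, listing each term with its weight.
toyTopics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i: " + topic.map { case (term, weight) => s"$term ($weight)" }.mkString(", "))
}
```

The lookup toyVocab(_) plays the same role as vocabList(_) in the notebook cell: a topic is stored as positions in the vocabulary, so the words only appear once those positions are resolved.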

The next cell repeats the steps above in one go and prints the top terms of every topic.

val topicIndices = em_ldaModel.describeTopics(maxTermsPerTopic = 5)
val vocabList = vectorizer.vocabulary
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
println(s"$numTopics topics:")
topics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i")
  topic.foreach { case (term, weight) => println(s"$term\t$weight") }
  println(s"==========")
}
20 topics:
TOPIC 0
people    0.01752505038953989
israel    0.015012774928968905
israeli    0.008814933415522707
jewish    0.008141649707018096
killed    0.007906218310657713
==========
TOPIC 1
book    0.011867696558681725
venus    0.010687276050923861
space    0.009040843685617382
radar    0.0075228865623441655
books    0.006675910504553334
==========
TOPIC 2
university    0.014985766432210718
problems    0.012287189548478161
board    0.010969893949852868
mail    0.010444586166034284
internet    0.010367417738701503
==========
TOPIC 3
jesus    0.01449583280942609
church    0.01242586502676794
faith    0.0101360085462943
bible    0.009102888033371566
christian    0.009009101715161442
==========
TOPIC 4
anti    0.009877278560469278
people    0.009680996618978702
candida    0.009115033183173074
problem    0.008834283578952261
steve    0.008500508881330103
==========
TOPIC 5
list    0.017508093502452186
send    0.016552675879800884
interested    0.014047104056430462
request    0.013885709401051908
mail    0.013278653314425648
==========
TOPIC 6
program    0.018988634089288345
info    0.01368277720476286
code    0.011245696329104049
email    0.010768131662118808
source    0.010139683322127871
==========
TOPIC 7
research    0.00930155878276897
drug    0.008792899458241422
medical    0.008270271605089537
years    0.008044178942778759
health    0.007917181853950715
==========
TOPIC 8
people    0.013531923610972975
truth    0.011057486467983364
true    0.009101917895671245
wrong    0.008823515211060527
good    0.008521430306029728
==========
TOPIC 9
windows    0.03893670826645084
drive    0.023437661829218952
card    0.018573896604275872
software    0.014945785424002746
window    0.014646919443061486
==========
TOPIC 10
people    0.020509341560030914
government    0.015755699954310105
state    0.009894493582038144
clinton    0.008096889602560069
right    0.008042679111450535
==========
TOPIC 11
space    0.02020255295027804
nasa    0.014833102725563393
launch    0.012758156661562998
station    0.010799181289599445
shuttle    0.010172267365747301
==========
TOPIC 12
team    0.015045835455065649
game    0.01256263952282713
mike    0.011136225723554929
little    0.010103891826195354
hockey    0.009592871348169633
==========
TOPIC 13
year    0.016013832119148336
good    0.015676295241325515
game    0.015103999409219688
play    0.011513063994991753
better    0.010812580477065412
==========
TOPIC 14
uiuc    0.012301820581107062
looking    0.011766319633537829
cars    0.010884720454484836
really    0.009728387347087072
bike    0.009062029042401385
==========
TOPIC 15
data    0.02024301839425212
image    0.019615560037913822
available    0.011034781446150338
software    0.008762363283594195
tools    0.006388659543460347
==========
TOPIC 16
encryption    0.018359610202482144
chip    0.01631238429319963
government    0.012327542606174332
technology    0.01100039536236385
keys    0.010929970885815312
==========
TOPIC 17
president    0.017105316947600386
question    0.016499567837837802
want    0.012832762125743222
didn    0.012010268170216528
time    0.011679691208445088
==========
TOPIC 18
used    0.014563309337766079
wire    0.014357849998468886
wiring    0.01222345778091846
subject    0.011972086769461172
ground    0.011519857061795619
==========
TOPIC 19
jpeg    0.028246137021162587
file    0.02317299778436637
image    0.019972886275115946
files    0.01596784774345554
color    0.012658665615719175
==========

We've managed to get some good results here. For example, from the terms printed above we can easily infer that Topic 9 is about computers, Topic 11 is about space, etc.

We still get some ambiguous results like Topic 17.
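
To make such readings easier to check, it can help to print each topic's terms on one line. The snippet below is a minimal sketch in plain Scala (no Spark needed). `topicsSample` is a hypothetical name holding the first three terms of two topics copied from the output above; in the notebook one would start from `topics.zipWithIndex` instead.

```scala
// Hand-copied subset of the `topics` output above: (topicId, terms with weights).
// `topicsSample` is an illustrative name, not a variable defined in the notebook.
val topicsSample: Seq[(Int, Array[(String, Double)])] = Seq(
  9 -> Array(("windows", 0.03893670826645084), ("drive", 0.023437661829218952), ("card", 0.018573896604275872)),
  11 -> Array(("space", 0.02020255295027804), ("nasa", 0.014833102725563393), ("launch", 0.012758156661562998))
)

// One line per topic: keep only the terms, drop the weights.
val summaries: Seq[String] = topicsSample.map { case (topicId, terms) =>
  s"Topic $topicId: " + terms.map(_._1).mkString(", ")
}

summaries.foreach(println)
```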

To improve our results further, we could try some of the following methods:

  • Refilter data for additional data-specific stopwords
  • Use Stemming or Lemmatization to preprocess data
  • Experiment with a smaller number of topics, since some of these topics in the 20 Newsgroups are pretty similar
  • Increase model's MaxIterations
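
The first suggestion can be sketched in plain Scala as follows. The extra stopwords are an assumption: they were picked by eye from the topic terms printed above (for instance `didn` and `subject`), and `refilter` is a hypothetical helper, not part of the notebook.

```scala
// Hypothetical data-specific stopwords, chosen by eye from the topic terms above.
val extraStopwords: Set[String] =
  Set("didn", "really", "little", "subject", "mike", "steve", "uiuc")

// Drop the extra stopwords from one tokenized document (case-insensitive match).
def refilter(tokens: Seq[String], stopwords: Set[String]): Seq[String] =
  tokens.filterNot(t => stopwords.contains(t.toLowerCase))

val sample = Seq("president", "didn", "question", "Subject", "time")
val cleaned = refilter(sample, extraStopwords)
println(cleaned.mkString(", "))
```

In the Spark pipeline itself, the same effect could probably be had by appending these words to the stopword list given to `StopWordsRemover.setStopWords` and refitting the count vectorizer and LDA model.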

Visualize Results

We will try visualizing the results obtained from the EM LDA model with a d3 bubble chart.

// Zip topic terms with topic IDs
val termArray = topics.zipWithIndex
termArray: Array[(Array[(String, Double)], Int)] = Array((Array((people,0.01752505038953989), (israel,0.015012774928968905), (israeli,0.008814933415522707), (jewish,0.008141649707018096), (killed,0.007906218310657713)),0), (Array((book,0.011867696558681725), (venus,0.010687276050923861), (space,0.009040843685617382), (radar,0.0075228865623441655), (books,0.006675910504553334)),1), (Array((university,0.014985766432210718), (problems,0.012287189548478161), (board,0.010969893949852868), (mail,0.010444586166034284), (internet,0.010367417738701503)),2), (Array((jesus,0.01449583280942609), (church,0.01242586502676794), (faith,0.0101360085462943), (bible,0.009102888033371566), (christian,0.009009101715161442)),3), (Array((anti,0.009877278560469278), (people,0.009680996618978702), (candida,0.009115033183173074), (problem,0.008834283578952261), (steve,0.008500508881330103)),4), (Array((list,0.017508093502452186), (send,0.016552675879800884), (interested,0.014047104056430462), (request,0.013885709401051908), (mail,0.013278653314425648)),5), (Array((program,0.018988634089288345), (info,0.01368277720476286), (code,0.011245696329104049), (email,0.010768131662118808), (source,0.010139683322127871)),6), (Array((research,0.00930155878276897), (drug,0.008792899458241422), (medical,0.008270271605089537), (years,0.008044178942778759), (health,0.007917181853950715)),7), (Array((people,0.013531923610972975), (truth,0.011057486467983364), (true,0.009101917895671245), (wrong,0.008823515211060527), (good,0.008521430306029728)),8), (Array((windows,0.03893670826645084), (drive,0.023437661829218952), (card,0.018573896604275872), (software,0.014945785424002746), (window,0.014646919443061486)),9), (Array((people,0.020509341560030914), (government,0.015755699954310105), (state,0.009894493582038144), (clinton,0.008096889602560069), (right,0.008042679111450535)),10), (Array((space,0.02020255295027804), (nasa,0.014833102725563393), (launch,0.012758156661562998), (station,0.010799181289599445), 
(shuttle,0.010172267365747301)),11), (Array((team,0.015045835455065649), (game,0.01256263952282713), (mike,0.011136225723554929), (little,0.010103891826195354), (hockey,0.009592871348169633)),12), (Array((year,0.016013832119148336), (good,0.015676295241325515), (game,0.015103999409219688), (play,0.011513063994991753), (better,0.010812580477065412)),13), (Array((uiuc,0.012301820581107062), (looking,0.011766319633537829), (cars,0.010884720454484836), (really,0.009728387347087072), (bike,0.009062029042401385)),14), (Array((data,0.02024301839425212), (image,0.019615560037913822), (available,0.011034781446150338), (software,0.008762363283594195), (tools,0.006388659543460347)),15), (Array((encryption,0.018359610202482144), (chip,0.01631238429319963), (government,0.012327542606174332), (technology,0.01100039536236385), (keys,0.010929970885815312)),16), (Array((president,0.017105316947600386), (question,0.016499567837837802), (want,0.012832762125743222), (didn,0.012010268170216528), (time,0.011679691208445088)),17), (Array((used,0.014563309337766079), (wire,0.014357849998468886), (wiring,0.01222345778091846), (subject,0.011972086769461172), (ground,0.011519857061795619)),18), (Array((jpeg,0.028246137021162587), (file,0.02317299778436637), (image,0.019972886275115946), (files,0.01596784774345554), (color,0.012658665615719175)),19))
// Transform data into the form (term, probability, topicId)
val termRDD = sc.parallelize(termArray)
val termRDD2 = termRDD.flatMap { case (terms, topicId) =>
  // Attach the topic ID to every (term, probability) pair of that topic
  terms.map { case (term, probability) => (term, probability, topicId) }
}
termRDD: org.apache.spark.rdd.RDD[(Array[(String, Double)], Int)] = ParallelCollectionRDD[10375] at parallelize at command-1805207615647694:2
termRDD2: org.apache.spark.rdd.RDD[(String, Double, Int)] = MapPartitionsRDD[10376] at flatMap at command-1805207615647694:3
// Create DF with proper column names
val termDF = termRDD2.toDF("term", "probability", "topicId")
termDF: org.apache.spark.sql.DataFrame = [term: string, probability: double ... 1 more field]
display(termDF)
term        probability            topicId
people      0.01752505038953989    0
israel      0.015012774928968905   0
israeli     0.008814933415522707   0
jewish      0.008141649707018096   0
killed      0.007906218310657713   0
book        0.011867696558681725   1
venus       0.010687276050923861   1
space       0.009040843685617382   1
radar       0.0075228865623441655  1
books       0.006675910504553334   1
university  0.014985766432210718   2
problems    0.012287189548478161   2
board       0.010969893949852868   2
mail        0.010444586166034284   2
internet    0.010367417738701503   2
jesus       0.01449583280942609    3
church      0.01242586502676794    3
faith       0.0101360085462943     3
bible       0.009102888033371566   3
christian   0.009009101715161442   3
anti        0.009877278560469278   4
people      0.009680996618978702   4
candida     0.009115033183173074   4
problem     0.008834283578952261   4
steve       0.008500508881330103   4
list        0.017508093502452186   5
send        0.016552675879800884   5
interested  0.014047104056430462   5
request     0.013885709401051908   5
mail        0.013278653314425648   5
program     0.018988634089288345   6
info        0.01368277720476286    6
code        0.011245696329104049   6
email       0.010768131662118808   6
source      0.010139683322127871   6
research    0.00930155878276897    7
drug        0.008792899458241422   7
medical     0.008270271605089537   7
years       0.008044178942778759   7
health      0.007917181853950715   7
people      0.013531923610972975   8
truth       0.011057486467983364   8
true        0.009101917895671245   8
wrong       0.008823515211060527   8
good        0.008521430306029728   8
windows     0.03893670826645084    9
drive       0.023437661829218952   9
card        0.018573896604275872   9
software    0.014945785424002746   9
window      0.014646919443061486   9
people      0.020509341560030914   10
government  0.015755699954310105   10
state       0.009894493582038144   10
clinton     0.008096889602560069   10
right       0.008042679111450535   10
space       0.02020255295027804    11
nasa        0.014833102725563393   11
launch      0.012758156661562998   11
station     0.010799181289599445   11
shuttle     0.010172267365747301   11
team        0.015045835455065649   12
game        0.01256263952282713    12
mike        0.011136225723554929   12
little      0.010103891826195354   12
hockey      0.009592871348169633   12
year        0.016013832119148336   13
good        0.015676295241325515   13
game        0.015103999409219688   13
play        0.011513063994991753   13
better      0.010812580477065412   13
uiuc        0.012301820581107062   14
looking     0.011766319633537829   14
cars        0.010884720454484836   14
really      0.009728387347087072   14
bike        0.009062029042401385   14
data        0.02024301839425212    15
image       0.019615560037913822   15
available   0.011034781446150338   15
software    0.008762363283594195   15
tools       0.006388659543460347   15
encryption  0.018359610202482144   16
chip        0.01631238429319963    16
government  0.012327542606174332   16
technology  0.01100039536236385    16
keys        0.010929970885815312   16
president   0.017105316947600386   17
question    0.016499567837837802   17
want        0.012832762125743222   17
didn        0.012010268170216528   17
time        0.011679691208445088   17
used        0.014563309337766079   18
wire        0.014357849998468886   18
wiring      0.01222345778091846    18
subject     0.011972086769461172   18
ground      0.011519857061795619   18
jpeg        0.028246137021162587   19
file        0.02317299778436637    19
image       0.019972886275115946   19
files       0.01596784774345554    19
color       0.012658665615719175   19

We will convert the DataFrame into JSON format, which will be passed into d3.

// Create JSON data
val rawJson = termDF.toJSON.collect().mkString(",\n")
rawJson: String = {"term":"people","probability":0.01752505038953989,"topicId":0}, {"term":"israel","probability":0.015012774928968905,"topicId":0}, {"term":"israeli","probability":0.008814933415522707,"topicId":0}, {"term":"jewish","probability":0.008141649707018096,"topicId":0}, {"term":"killed","probability":0.007906218310657713,"topicId":0}, {"term":"book","probability":0.011867696558681725,"topicId":1}, {"term":"venus","probability":0.010687276050923861,"topicId":1}, {"term":"space","probability":0.009040843685617382,"topicId":1}, {"term":"radar","probability":0.0075228865623441655,"topicId":1}, {"term":"books","probability":0.006675910504553334,"topicId":1}, {"term":"university","probability":0.014985766432210718,"topicId":2}, {"term":"problems","probability":0.012287189548478161,"topicId":2}, {"term":"board","probability":0.010969893949852868,"topicId":2}, {"term":"mail","probability":0.010444586166034284,"topicId":2}, {"term":"internet","probability":0.010367417738701503,"topicId":2}, {"term":"jesus","probability":0.01449583280942609,"topicId":3}, {"term":"church","probability":0.01242586502676794,"topicId":3}, {"term":"faith","probability":0.0101360085462943,"topicId":3}, {"term":"bible","probability":0.009102888033371566,"topicId":3}, {"term":"christian","probability":0.009009101715161442,"topicId":3}, {"term":"anti","probability":0.009877278560469278,"topicId":4}, {"term":"people","probability":0.009680996618978702,"topicId":4}, {"term":"candida","probability":0.009115033183173074,"topicId":4}, {"term":"problem","probability":0.008834283578952261,"topicId":4}, {"term":"steve","probability":0.008500508881330103,"topicId":4}, {"term":"list","probability":0.017508093502452186,"topicId":5}, {"term":"send","probability":0.016552675879800884,"topicId":5}, {"term":"interested","probability":0.014047104056430462,"topicId":5}, {"term":"request","probability":0.013885709401051908,"topicId":5}, {"term":"mail","probability":0.013278653314425648,"topicId":5}, 
{"term":"program","probability":0.018988634089288345,"topicId":6}, {"term":"info","probability":0.01368277720476286,"topicId":6}, {"term":"code","probability":0.011245696329104049,"topicId":6}, {"term":"email","probability":0.010768131662118808,"topicId":6}, {"term":"source","probability":0.010139683322127871,"topicId":6}, {"term":"research","probability":0.00930155878276897,"topicId":7}, {"term":"drug","probability":0.008792899458241422,"topicId":7}, {"term":"medical","probability":0.008270271605089537,"topicId":7}, {"term":"years","probability":0.008044178942778759,"topicId":7}, {"term":"health","probability":0.007917181853950715,"topicId":7}, {"term":"people","probability":0.013531923610972975,"topicId":8}, {"term":"truth","probability":0.011057486467983364,"topicId":8}, {"term":"true","probability":0.009101917895671245,"topicId":8}, {"term":"wrong","probability":0.008823515211060527,"topicId":8}, {"term":"good","probability":0.008521430306029728,"topicId":8}, {"term":"windows","probability":0.03893670826645084,"topicId":9}, {"term":"drive","probability":0.023437661829218952,"topicId":9}, {"term":"card","probability":0.018573896604275872,"topicId":9}, {"term":"software","probability":0.014945785424002746,"topicId":9}, {"term":"window","probability":0.014646919443061486,"topicId":9}, {"term":"people","probability":0.020509341560030914,"topicId":10}, {"term":"government","probability":0.015755699954310105,"topicId":10}, {"term":"state","probability":0.009894493582038144,"topicId":10}, {"term":"clinton","probability":0.008096889602560069,"topicId":10}, {"term":"right","probability":0.008042679111450535,"topicId":10}, {"term":"space","probability":0.02020255295027804,"topicId":11}, {"term":"nasa","probability":0.014833102725563393,"topicId":11}, {"term":"launch","probability":0.012758156661562998,"topicId":11}, {"term":"station","probability":0.010799181289599445,"topicId":11}, {"term":"shuttle","probability":0.010172267365747301,"topicId":11}, 
{"term":"team","probability":0.015045835455065649,"topicId":12}, {"term":"game","probability":0.01256263952282713,"topicId":12}, {"term":"mike","probability":0.011136225723554929,"topicId":12}, {"term":"little","probability":0.010103891826195354,"topicId":12}, {"term":"hockey","probability":0.009592871348169633,"topicId":12}, {"term":"year","probability":0.016013832119148336,"topicId":13}, {"term":"good","probability":0.015676295241325515,"topicId":13}, {"term":"game","probability":0.015103999409219688,"topicId":13}, {"term":"play","probability":0.011513063994991753,"topicId":13}, {"term":"better","probability":0.010812580477065412,"topicId":13}, {"term":"uiuc","probability":0.012301820581107062,"topicId":14}, {"term":"looking","probability":0.011766319633537829,"topicId":14}, {"term":"cars","probability":0.010884720454484836,"topicId":14}, {"term":"really","probability":0.009728387347087072,"topicId":14}, {"term":"bike","probability":0.009062029042401385,"topicId":14}, {"term":"data","probability":0.02024301839425212,"topicId":15}, {"term":"image","probability":0.019615560037913822,"topicId":15}, {"term":"available","probability":0.011034781446150338,"topicId":15}, {"term":"software","probability":0.008762363283594195,"topicId":15}, {"term":"tools","probability":0.006388659543460347,"topicId":15}, {"term":"encryption","probability":0.018359610202482144,"topicId":16}, {"term":"chip","probability":0.01631238429319963,"topicId":16}, {"term":"government","probability":0.012327542606174332,"topicId":16}, {"term":"technology","probability":0.01100039536236385,"topicId":16}, {"term":"keys","probability":0.010929970885815312,"topicId":16}, {"term":"president","probability":0.017105316947600386,"topicId":17}, {"term":"question","probability":0.016499567837837802,"topicId":17}, {"term":"want","probability":0.012832762125743222,"topicId":17}, {"term":"didn","probability":0.012010268170216528,"topicId":17}, {"term":"time","probability":0.011679691208445088,"topicId":17}, 
{"term":"used","probability":0.014563309337766079,"topicId":18}, {"term":"wire","probability":0.014357849998468886,"topicId":18}, {"term":"wiring","probability":0.01222345778091846,"topicId":18}, {"term":"subject","probability":0.011972086769461172,"topicId":18}, {"term":"ground","probability":0.011519857061795619,"topicId":18}, {"term":"jpeg","probability":0.028246137021162587,"topicId":19}, {"term":"file","probability":0.02317299778436637,"topicId":19}, {"term":"image","probability":0.019972886275115946,"topicId":19}, {"term":"files","probability":0.01596784774345554,"topicId":19}, {"term":"color","probability":0.012658665615719175,"topicId":19}

We are now ready to use D3 on the rawJson data.
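
Note that `rawJson` is a comma-separated run of JSON objects, not yet a valid JSON document on its own. The D3 cell itself is not shown here, so the following is only a sketch of one plausible way to wrap the string before handing it to d3. The names `rawJsonSample`, `jsonArray`, and `d3Input`, and the `name`/`children` keys, are assumptions made for illustration.

```scala
// Two records copied from the rawJson output above; the full string has 100.
val rawJsonSample: String = Seq(
  """{"term":"people","probability":0.01752505038953989,"topicId":0}""",
  """{"term":"israel","probability":0.015012774928968905,"topicId":0}"""
).mkString(",\n")

// Wrap the comma-separated objects in brackets to form a JSON array.
val jsonArray: String = s"[\n$rawJsonSample\n]"

// Hypothetical nesting for a d3 hierarchy layout; the key names are assumptions.
val d3Input: String = s"""{"name":"topics","children":$jsonArray}"""

println(d3Input)
```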


You try!

NOW or Later as HOMEWORK

  1. Try the same process on the State of the Union Addresses dataset from Week1. As a first step, locate where that data is: go to week1 and check whether each SoU can be treated as a document for topic modeling, and whether the SoUs within the same topic cluster in time.

  2. Try to improve the tuning by elaborating the pipeline with stemming, lemmatization, etc. in this newsgroup dataset (perhaps as the basis for a project). You can also parse the input to bring in the newsgroup IDs from the directories (consider exploiting the file names in the wholeTextFiles method), as this will let you explore how well your unsupervised algorithm is doing relative to the known newsgroup each document falls in. Note that you generally won't have the luxury of knowing the topic labels for typical datasets in the unsupervised topic modeling domain.

  3. Try to parse the data closer to the clean dataset available in /databricks-datasets/news20.binary/* and walk through the following notebook (but in Scala!):

%fs ls /databricks-datasets/news20.binary/data-001
path                                                        name       size
dbfs:/databricks-datasets/news20.binary/data-001/test/      test/      0
dbfs:/databricks-datasets/news20.binary/data-001/training/  training/  0
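
For exercise 2 above, the newsgroup label can be read off the path that `wholeTextFiles` returns as the first element of each (path, content) pair. Here is a minimal sketch in plain Scala. `newsgroupOf` is a hypothetical helper, and the numeric file name in the example path is made up for illustration.

```scala
// Return the name of the directory that directly contains the file, if any.
def newsgroupOf(path: String): Option[String] = {
  val parts = path.split("/").filter(_.nonEmpty)
  if (parts.length >= 2) Some(parts(parts.length - 2)) else None
}

// Hypothetical file path; only the directory layout matches the listing in this notebook.
val examplePath = "dbfs:/datasets/mini_newsgroups/sci.space/12345"
val label = newsgroupOf(examplePath)
println(label)
```

With the label attached to each document, one could cross-tabulate the known newsgroup against the most probable topic per document to see how well the two agree.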

Step 1. Downloading and Loading Data into DBFS

You don't have to do the download in Databricks if the cell above shows contents in /databricks-datasets/news20.binary/data-001.

Here are the steps taken for downloading and saving data to the distributed file system. Uncomment them to repeat this process on your Databricks cluster or to download a new source of data.

//%sh wget http://kdd.ics.uci.edu/databases/20newsgroups/mini_newsgroups.tar.gz -O /tmp/newsgroups.tar.gz

Untar the file into the /tmp/ folder.

//%sh tar xvfz /tmp/newsgroups.tar.gz -C /tmp/

The cell below takes about 10 minutes to run.

NOTE: It is slow partly because each file is small, so we run into the 'small files problem': distributed file systems need metadata for each file. If the file names are not needed, it may be better to stream the contents of all the files into one large file in dbfs. We leave this as it is to show what happens when we upload a dataset of lots of little files into dbfs.
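
The alternative mentioned in the note, one large file instead of many small ones, could look roughly like this on the driver's local disk before copying to dbfs. This is only a sketch under stated assumptions: the three tiny files stand in for the posts, all paths are throwaway temp locations, and file names are deliberately discarded.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Build a throwaway directory with three tiny files standing in for the posts.
val smallDir: Path = Files.createTempDirectory("mini_newsgroups_demo")
Seq("a" -> "first post", "b" -> "second post", "c" -> "third post").foreach {
  case (name, body) => Files.write(smallDir.resolve(name), body.getBytes(UTF_8))
}

// Read every file, flatten it to one line, and keep the files in name order.
val lines: Array[String] = smallDir.toFile.listFiles().sortBy(_.getName).map { f =>
  new String(Files.readAllBytes(f.toPath), UTF_8).replace("\n", " ")
}

// Write one document per line into a single combined file.
val combined: Path = Files.createTempFile("newsgroups_combined", ".txt")
Files.write(combined, lines.mkString("\n").getBytes(UTF_8))
```

A single file like this needs only one metadata entry when copied, and could then be read line by line, at the cost of losing the per-file names.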

//%fs cp -r file:/tmp/mini_newsgroups dbfs:/datasets/mini_newsgroups
display(dbutils.fs.ls("dbfs:/datasets/mini_newsgroups"))
path                                                      name                       size
dbfs:/datasets/mini_newsgroups/alt.atheism/               alt.atheism/               0
dbfs:/datasets/mini_newsgroups/comp.graphics/             comp.graphics/             0
dbfs:/datasets/mini_newsgroups/comp.os.ms-windows.misc/   comp.os.ms-windows.misc/   0
dbfs:/datasets/mini_newsgroups/comp.sys.ibm.pc.hardware/  comp.sys.ibm.pc.hardware/  0
dbfs:/datasets/mini_newsgroups/comp.sys.mac.hardware/     comp.sys.mac.hardware/     0
dbfs:/datasets/mini_newsgroups/comp.windows.x/            comp.windows.x/            0
dbfs:/datasets/mini_newsgroups/misc.forsale/              misc.forsale/              0
dbfs:/datasets/mini_newsgroups/rec.autos/                 rec.autos/                 0
dbfs:/datasets/mini_newsgroups/rec.motorcycles/           rec.motorcycles/           0
dbfs:/datasets/mini_newsgroups/rec.sport.baseball/        rec.sport.baseball/        0
dbfs:/datasets/mini_newsgroups/rec.sport.hockey/          rec.sport.hockey/          0
dbfs:/datasets/mini_newsgroups/sci.crypt/                 sci.crypt/                 0
dbfs:/datasets/mini_newsgroups/sci.electronics/           sci.electronics/           0
dbfs:/datasets/mini_newsgroups/sci.med/                   sci.med/                   0
dbfs:/datasets/mini_newsgroups/sci.space/                 sci.space/                 0
dbfs:/datasets/mini_newsgroups/soc.religion.christian/    soc.religion.christian/    0
dbfs:/datasets/mini_newsgroups/talk.politics.guns/        talk.politics.guns/        0
dbfs:/datasets/mini_newsgroups/talk.politics.mideast/     talk.politics.mideast/     0
dbfs:/datasets/mini_newsgroups/talk.politics.misc/        talk.politics.misc/        0
dbfs:/datasets/mini_newsgroups/talk.religion.misc/        talk.religion.misc/        0