Overview
Text and Sentiment Analysis refers to a body of analytic techniques for evaluating a corpus of text to develop insights about a product, company, service, etc. The corpus could come from a wide range of sources, such as user comments from review sites, tweets, text available at web sites, 10-K reports, and the like. When humans interpret text, we use our understanding of the contents of the text and the emotional intent behind various words to infer whether a sentence, document, or corpus has a positive or negative tone, and perhaps whether words and sentences in the corpus express various emotional states (e.g., joy, anger, surprise).
In text analysis, we attempt to understand the contents and meaning of the text by exploring questions such as: What are the underlying themes or topics within a text corpus? What does the text tell us about the author(s)? What topics are trending? What keywords (contiguous sets of words) best characterize the contents of the corpus? These questions are typically easy for humans to answer by reading the text, but most of us can process perhaps 50 pages of text per hour, whereas top text analysis systems can analyze 200,000 pages per hour or more. We can use text analysis to create a summary of the contents of the text corpus, extract “important” words and phrases from the corpus, and identify related words and concepts. Important outputs of text analysis are word clouds summarizing the frequency of occurrence of various words (or word combinations, such as bigrams or trigrams), and the co-occurrence patterns of words, which help us understand the overall contents and meaning of the corpus. Other types of analyses include document similarity, namely, the extent to which one or more documents within a corpus are similar to other documents.
In sentiment analyses we attempt to summarize the sentiments and emotions embedded in the text corpus, which are subjective aspects of textual content, and not just their objective contents -- we attempt to detect, extract, and assess value judgments, subjective opinions, and the emotional contents in text data. Data useful for text and sentiment analysis are widely available today and they can provide companies valuable insights for making business decisions. The typical sources of data used for text and sentiment analysis in marketing come from sources such as Facebook and Instagram postings, Tweets, reviews from sites such as Yelp and TripAdvisor, as well as from product/service specific reviews collected by the supplier of the product or service.
In sum, the summaries generated from text and sentiment analyses provide at least a high-level overview of the meaning and emotions embedded within a text corpus.
Getting Started
You can use your own data or use a template preformatted by the Enginius software. Because the analytical models underlying Sentiment Analysis require a specific data format, users with their own data should review the appropriate preformatted template to become familiar with the data structure. The next section explains how to create an easy-to-use template to enter your own data.
Creating a Template
From the Enginius Dashboard, click the Templates dropdown and select Sentiment Analysis to open the dialog box to create a template.
The following dialog box will appear:
Select the options desired for the sentiment analysis and click “Run” to generate the template for entering your data.
Note: the options for each of the data sources are different, and users are encouraged to go through the template generation process to fully understand the model data requirements.
Data source
- Sentiment Data:
This option requires a data block containing the text to be analyzed. The format of the required data is shown below: the first column contains a unique id for each row, and the Verbatim column contains the text to be analyzed for each respondent. The optional Date and Rating columns specify the date of the corresponding text and a rating (on a 1-to-n scale) when the respondent has supplied a rating to accompany their text.
- Twitter Data:
When analyzing Twitter data, no data template is needed, and you will receive an error message if you try to generate a template. To run Sentiment analysis with Twitter data, simply select the Sentiment Analysis icon to open the model and select the Twitter handle or content you would like to analyze (see Run Analysis section in tutorial for more details).
- Web pages:
The template generation dialog box allows you to specify the number of websites to be analyzed and whether to include custom stop words.
After clicking “Run”, the following template will be generated:
Entering Your Data
*If using Twitter data as your data source, you may skip directly to the Run Analysis section of the tutorial, as there is no additional data to enter.
For Sentiment or Web pages data, it is recommended to use the Template feature, or at least review the template format, to ensure that your data is in the correct format. You may enter your data in Enginius in one of three ways:
- Enter your data directly into the Enginius online template.
- Copy and paste your data from Excel (or other data source) to Enginius.
- Download the Enginius template to Excel (using the Save function in Enginius), fill out the data in Excel (make sure to adhere to the template format), and then upload the Excel file back to Enginius (using the Load function in Enginius).
Note: For the remainder of this tutorial, we will use the “OfficeStar: Sentiment Analysis” data set that loads when you open the Enginius Sentiment tutorial.
Run Analysis
When you click on the Run Sentiment Analysis button (or Sentiment Analysis icon), the analysis setup window will display. The set-up window will vary depending on your Data source.
Data source
You need to first select the appropriate Data source for the data you are analyzing. Once selected, the remainder of the setup window will adjust to obtain the required data appropriate for analyzing that type of data.
Sentiment data
When you choose Sentiment data as your data source, the setup window displays the Data options shown above. You will need to select the Enginius data block that contains the Sentiment data and then identify the Verbatim (text to be analyzed), Date (if included), and Rating (if included) columns in that data block.
Twitter data
When the Twitter data option is selected, the data options are either a Twitter handle (e.g., @wsj) or Twitter keyword(s) (e.g., stock market). The software retrieves up to 1,000 corresponding tweets from the past week directly from Twitter to analyze. Note: only one Twitter handle may be searched at a time.
Web pages
When you choose Web pages as your data source, the setup window displays the data options shown below. You will need to select the Enginius data block that contains the list of Web pages that you want to analyze.
Stop words
Typically, text contains many words, called “stop words” (e.g., “and,” “the,” “of,” etc.), that do not perform a significant lexical function. It is best to remove these words to facilitate interpretation of the text. Enginius offers the option of removing a long list of stop words (851), a short list (174), or retaining all the words for analysis (see Appendix B for a list of stop words included in the Enginius lists).
Enginius also offers the option of including additional custom stop words that can facilitate interpreting documents in a specific domain. You can incorporate these custom words by generating a custom stop word list for a specific application. Often, an initial word cloud generated from the data could provide hints about possible custom stop words, e.g., product names such as “Pepsi” or popular words that do not add any meaning in a given context (e.g., app, game, iPhone). The custom stop words must be in the following format of column header, index column, and stop word column:
Note that custom stop words must adhere to this format.
Note: Removing stop words can significantly affect the interpretation of your text. For example, the Short and Long Lists of stop words included with Enginius contain words such as “didn’t” and “not,” which can radically change the meaning of a phrase. It is recommended to test your analysis both with and without stop words to see how it performs.
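To make the mechanics concrete, stop-word filtering amounts to a simple set lookup, as in the sketch below. The word lists and sample text are invented for illustration and are not the Enginius lists:

```python
# Illustrative stop-word removal (the lists here are made up for the example,
# not the Enginius short/long lists).
stop_words = {"the", "and", "of", "is"}
custom_stop_words = {"pepsi", "app"}  # domain-specific additions

text = "The Pepsi app is great and the staff is helpful"
tokens = [w for w in text.lower().split()
          if w not in stop_words | custom_stop_words]
print(tokens)  # ['great', 'staff', 'helpful']
```

As the note above cautions, lists that include negations such as “not” should be tested carefully before you rely on the filtered output.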
Advanced options
As shown in the dialog box below, Enginius also provides advanced analysis options, which you can access by clicking the “Advanced” checkbox in the lower left corner. Checking “Advanced” reveals two additional options: “Word co-occurrence analysis and RAKE” and “Topic Model”. Instead of the single-word analysis of a word cloud, we can analyze relationships between words. Specifically, we can explore co-occurrence of pairs of words within individual documents or within the entire corpus. The word co-occurrence analysis is then supplemented with RAKE (Rapid Automatic Keyword Extraction), which explores the occurrence of more than two contiguous words and thus extracts important longer sequences of words that help characterize the contents of the documents. Further details about these methods are provided in the Appendix.
After selecting all the options, click the Run button found at the bottom of the Sentiment analysis setup window to begin the analysis. By default, the report will output as a web page.
Reminder: Clicking the globe icon beside the “Run” option allows you to choose a different output format for the report.
You will see a pop-up indicating the progress of the sentiment analysis. Your report will be output in the format chosen (Microsoft, PDF, or Zip formats may automatically download to your hard drive).
Note: Sentiment analysis is a time-consuming process, and it may take several minutes for a report to be generated.
Interpreting the Results
The report generated by sentiment analysis contains several sections, depending on the options chosen. The results described below were generated with these model settings:
Word cloud
The first section presents the word cloud developed from the data. The word cloud is a graphical representation of the words found in a corpus, where the size and location of a word in the cloud diagram indicate the prominence of the word in the text. A word cloud is not the output of a statistical model per se. Rather, it is an analysis of the input text for frequency of occurrence of various words after preprocessing the text with techniques such as removing stop words (typically common words such as “the,” “and,” “how,” and “but,” which are unlikely to add meaning to a sentence) and stemming, which reduces the number of words in a document by reducing each word to its stem or root (e.g., “buyer,” “buyers,” and “buying” could all be reduced to the stem “buy”).
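As a sketch of this preprocessing, the snippet below counts word frequencies after applying a deliberately crude suffix-stripping “stemmer”; a real system would use an algorithm such as Porter/Snowball stemming (Enginius uses the SnowballC package):

```python
from collections import Counter

def crude_stem(word):
    # Toy stemmer for illustration only: strips a few common suffixes.
    for suffix in ("ers", "ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = ["buyer", "buyers", "buying", "buy", "price", "prices"]
freq = Counter(crude_stem(t) for t in tokens)
print(freq.most_common())  # [('buy', 4), ('price', 2)]
```

The stem counts, not the raw word counts, are what drive the size of each word in the stemmed word cloud.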
Enginius outputs two versions of word clouds, the first based on the stems (roots) of words and the second based on the un-stemmed words in the text, as in the example below:
Sentiment analysis
If the input text corpus contains information about the date of a specific review/document and an associated rating, then the Enginius output includes additional summaries, as shown below (for a data set consisting of user reviews and ratings of products posted on various dates):
Valence analysis
The next section shows the overall valence of the text corpus, a popular application of sentiment analysis. Here, we develop an overall valence or polarity of a text corpus. Valence is the technical term for the “subjective inclination” of a document, measured along a Positive/Neutral/Negative continuum. Several dictionaries or lexicons are publicly available (e.g., AFINN, NRC, Bing) which contain the valence weights associated with common words and phrases. This type of analysis can be applied to determine the polarity of a sentence, document, or the whole corpus. The Enginius output is for the entire corpus. (If you need the polarity for a subset of the corpus, you can re-run the analysis by including only the selected text).
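A minimal sketch of lexicon-based valence scoring is shown below; the lexicon and its weights are invented for illustration and are not taken from AFINN, NRC, or Bing:

```python
# Each word carries a valence weight; the document score is the sum, with
# positive, near-zero, and negative totals mapping to the
# Positive/Neutral/Negative continuum.
lexicon = {"good": 3, "great": 3, "helpful": 2, "rude": -2, "slow": -1}

def valence(text):
    return sum(lexicon.get(w, 0) for w in text.lower().split())

print(valence("great selection and helpful staff"))  # 5 (positive)
print(valence("rude and slow service"))              # -3 (negative)
```

Summing the scores of every document in the corpus (rather than one sentence at a time) gives the corpus-level polarity that Enginius reports.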
Two additional outputs are useful in analyzing the results, a Valence histogram and a Valence pie chart:
Also included in the sentiment output is the valence evolution, which shows the evolution over time if Date was included in the analysis:
A valence word cloud is also produced, as shown below.
Emotion analysis
The next section of the sentiment analysis report depicts the emotions expressed within the text corpus.
Similar to earlier sections of the report, a histogram and pie chart of Emotion are also produced.
Using emotions associated with various words as determined from publicly available emotion lexicons (for common emotions such as happy, excited, discouraged, disappointed), we can generate a word cloud of emotions associated with a corpus (shown below).
Results from Advanced analysis options
Word co-occurrences between adjacent words in a corpus
The first output in this section is a word co-occurrence graph that summarizes the relative frequencies of pairs of words that are adjacent to each other when we consider the entire corpus as a single document. For every word, adjacent words provide a context. The relative frequencies of pairs of words are compiled into a “co-occurrence matrix,” and the important elements from this matrix are displayed as a graph. We only include for display words that are likely to be most useful for interpretation, namely, nouns, verbs, and adjectives. In the output here, we see that “good selection,” “great selection,” “ink cartridge,” “helpful staff,” etc. go together often relative to other word pairs in the corpus. The strength of a connection is highlighted by the depth of the color of the line connecting two words.
Word co-occurrences within sentences in a corpus
The next graph shows word co-occurrence when we consider co-occurrences within sentences, i.e., the pair of words can occur anywhere within a sentence in a document, so the context is broader than the adjacent-word restriction for co-occurrence in the graph we considered above (see Appendix A for additional details). Again, we display only nouns, verbs, and adjectives. Although there are substantial similarities between the two graphs for this data set, the analysis at the sentence level could alter the strengths of the connections between words.
RAKE Analysis (Rapid Automatic Keyword Extraction)
The next set of results summarizes output from RAKE (Rapid Automatic Keyword Extraction) (see Appendix A for additional details). The small data set used in this tutorial is not sufficient to generate an interesting output. We basically see the same results that we get from the sentence-level co-occurrence graph, showing that two bigrams (“good selection” and “great selection”) are the most important keywords in this text corpus.
With larger text corpora, we will get more informative RAKE analysis outputs, as in the example below. Note: the example below was not generated from our OfficeStar data but is representative output from a RAKE analysis.
A related piece of information that Enginius provides is the distribution of parts-of-speech tags in the corpus. Here, nouns, verbs, and adjectives (which are the only words shown in the co-occurrence graphs above) make up about 60% of all the words in this corpus.
Topic Model
The final set of results from the advanced analysis is the topic model, which refers to a class of machine learning methods for analyzing and classifying text into broad thematic categories to help us understand the underlying structure of the contents of a text corpus (see Appendix A for additional details). Topic modeling is a data summarization tool for text, much as factor analysis is a summarization tool for numeric data. Each topic represents a theme (a grouping of words), but the analysis does not provide a label for each topic. It is up to you to interpret the theme captured by each topic by examining the most important words in that topic. Although the example data here is too small to develop a good, interpretable topic model, we can nevertheless see some patterns. For example, the first topic is likely about the characteristics of a retailer – carries computers, is in town, can walk to shop, good price, etc. It also contains some discrepant themes, such as “rude” and “slow.” In Enginius, you can pick up to 15 topics, and for each topic, it reports the top 15 words. Although each topic is likely to contain a different set of words, some words may appear in more than one topic, indicating their importance to several topics. Note also that the model is specified on the stemmed text, which adds to the challenges of interpreting the topics.
The output also includes two metrics to help you assess the adequacy of the topic model: Mean topic coherence and Mean topic exclusivity. Good models will have a mean topic coherence close to zero and a mean topic exclusivity greater than 1. The model here has a high exclusivity score, suggesting the topics refer to different themes. There is no clear benchmark for evaluating the computed value of mean topic coherence. Roughly speaking, with 15 words in a topic (which is what is built into Enginius), a mean topic coherence score that is smaller than -240 (i.e., a larger negative number) suggests that, on average, any pair of words in a topic is likely to occur together in less than 10% of the documents. A score smaller than -315 suggests that, on average, any pair of words in a topic is likely to occur together in less than 5% of the documents; the corresponding score for 1% is about -480. On this basis, the topics seem to have an adequate level of coherence. We could consider increasing or decreasing the number of topics to check whether that increases mean topic coherence.
Mean topic coherence = -149.367
Coherence is a metric based on the co-occurrence of the top words (say top 15) within a topic in each document of the corpus. For each pair of words in a topic, we compute the log of the probability that a document containing the higher ranked word also contains at least one instance of the lower-ranked word. The overall coherence value for a topic is the sum of scores for each pair of words. A number close to 0 (the highest possible value of this metric) indicates high coherence. We report the average coherence value across the topics.
Mean topic exclusivity = 13.549
Exclusivity, measured using a metric called FREX, captures the extent to which the top words in a topic are exclusive to that topic (i.e., are not as likely to occur in the other topics). The exclusivity score for each top word in a topic is the harmonic mean of two equally-weighted components: (1) the rate at which the word occurs within a topic relative to its rate of occurrence in the other topics, and (2) the frequency (number of times) of a word’s occurrence within a topic relative to its frequency in the other topics. The exclusivity score for a topic is the average of the exclusivity scores of the words in a topic. The computed score is a positive number (> 0), with values substantially greater than 1 indicating a topic’s high exclusivity. We report the average exclusivity value across the topics.
Appendix A
Co-occurrence graphs
In this analysis, we look at two words that co-occur in the same document (e.g., a single tweet). We specify a sliding window of a given size to indicate how far apart the two words can be from each other – are they just one word apart (i.e., next to each other), within two words, within three words, etc. This separation context can be specified using a skip-gram value, which, for example, is equal to 1 if we restrict the two words to occur next to each other (a word can occur before or after a focal word). In Enginius, we do two types of analyses, the first with a default skip-gram value of 1, and the second with a “variable skip-gram” that allows co-occurrence anywhere within a sentence. In both cases, the program creates a symmetric co-occurrence matrix specifying how many times in the corpus each pair of words occurs together within the specified context (skip-gram). This matrix is processed further to assess the probabilities of co-occurrence of each pair. The words are then plotted on a co-occurrence graph (the words that have higher probabilities of being connected are highlighted graphically with thicker shading). As a default, we plot up to 100 top words on the graph.
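The adjacency case (skip-gram = 1) can be sketched as follows; the helper function and sample tokens are illustrative, not the Enginius implementation:

```python
from collections import Counter

def cooccurrences(tokens, window=1):
    # Count symmetric co-occurrences: each pair of words at most `window`
    # positions apart is tallied once per occurrence.
    pairs = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[tuple(sorted((word, tokens[j])))] += 1
    return pairs

tokens = "good selection great selection helpful staff".split()
print(cooccurrences(tokens, window=1))
```

Passing a larger window, or splitting the text by sentence and counting pairs within each sentence, corresponds to the “variable skip-gram” analysis described above.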
Parts-of-speech tagging
POS tagging assigns a part of speech to every word in the text, considering the context in which the word occurs. The tags include nouns (people, places, or things, including abstractions such as health or beauty), verbs, which refer to actions and processes, adjectives, which specify properties of nouns, and adverbs, which modify a verb, an adjective, or another adverb. There are, of course, some ambiguities – for example, words like park or train could be a verb or a noun – but many of these ambiguities can be resolved by the context in which the word occurs. POS tagging of English words is quite accurate, with most programs achieving over 97% accuracy.
Across a range of English texts, the percentage of nouns varies from 17% to 25%, verbs from 14 to 17%, prepositions from 9 to 14%, pronouns from 5 to 11%, adverbs 6 to 8% and adjectives 3 to 6% (see, for example, http://infomotions.com/blog/2011/02/forays-into-parts-of-speech/). Lincoln’s Gettysburg address had 18.8% verbs, 15.1% nouns, 10.7% pronouns, 11.4% adverbs, etc. Lincoln’s speech was a bit more “action-oriented” than the average text corpus.
RAKE (Rapid Automatic Keyword Extraction)
RAKE is based on the observation that keywords (a set of contiguous words) rarely contain punctuation or stop words. Thus, a candidate keyword is any set of contiguous words (i.e., 2-gram or bigram for two contiguous words, 3-gram or trigram for three contiguous words, and more generally n-gram) that doesn’t contain phrase delimiters or stop words, i.e., candidate keywords are content-bearing words. Example bigrams are “customer service,” “merry christmas,” and “thank you.” Example trigrams are “new york times,” “miles per hour,” and “my credit card.”
In this sense, RAKE looks for specific sets of words within each document that characterize (i.e., give meaning to) a document, and thereby provides a more nuanced interpretation of the key contents of a set of documents than word co-occurrence graphs, which only consider pairs of words. The input parameters for RAKE are a list of stop words, a set of phrase delimiters (e.g., a comma or period signifying the beginning or end of a phrase), a set of word delimiters, and the n-gram level for extraction.
Once all possible candidate n-grams in a set of documents are identified, RAKE selects the most relevant ones that characterize the entire corpus. To do this, it first computes a RAKE score for each word, equal to its ratio of degree to frequency, degree(w)/frequency(w), where degree is the number of times that word co-occurs with the words in the candidate keywords in which it appears, and frequency is the total number of times that the word occurs overall in the corpus. This ratio favors words that predominantly occur in longer keywords, whereas the degree of a word favors words that occur often and in longer candidate keywords, and frequency favors words that occur frequently regardless of the number of words with which they co-occur. The RAKE score for each keyword is the sum of the RAKE scores of all the words in that keyword. Because of this summation, RAKE scores typically tend to be higher for longer sequences of words. All candidate keywords are rank-ordered from highest to lowest on their RAKE scores. In Enginius, we set the minimum frequency for words to be included in the rank-ordered list at 0.01% of the total word count, and we list up to 20 keywords that contain at most four contiguous words each (4-grams).
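The word-level scoring just described can be sketched as follows; the candidate keywords are invented for illustration:

```python
from collections import defaultdict

# RAKE word score = degree(w) / frequency(w); a keyword's score is the
# sum of the scores of its words.
candidates = [["good", "selection"], ["great", "selection"], ["selection"]]

freq = defaultdict(int)
degree = defaultdict(int)
for keyword in candidates:
    for w in keyword:
        freq[w] += 1
        degree[w] += len(keyword)  # co-occurrences within the keyword, incl. itself

word_score = {w: degree[w] / freq[w] for w in freq}
keyword_score = {" ".join(kw): sum(word_score[w] for w in kw) for kw in candidates}
print(sorted(keyword_score.items(), key=lambda kv: -kv[1]))
```

Here “selection” occurs in all three candidates, so its degree-to-frequency ratio (5/3) is lower than that of “good” or “great” (2/1), and the two bigrams outrank the lone “selection.”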
For more information about RAKE, please see Rose, Stuart, Dave Engle, Nick Cramer and Wendy Cowley (2010), “Automatic keyword extraction from individual documents,” in Text Mining: Applications and Theory, edited by Michael W. Berry and Jacob Kogan, John Wiley & Sons. Ltd.
Topic Model
Topic modeling is based on an underlying probability model of a stylized data generating process that helped create the latent (hidden) topics in a set of documents. The stylized data generating process is as follows: (1) For each document, the author selects topics from an unknown (Dirichlet) probability distribution of topics, where the number of topics is fixed beforehand. (2) Likewise, the author selects each word for a given topic using an unknown probability (Dirichlet) distribution over a fixed number of words. A Bayesian model estimates the parameters of the unknown probability distributions such that the estimated probability distributions recover as closely as possible the known set of words in each document.
Here is an example of the stylized process of how two documents focused on two topics, sports and fashion, are crafted by their writers. Let’s say the first document is mostly about sports but with a few components related to fashion, and let’s say the second document is mostly about fashion but with some discussion of sports. For both documents, the writers choose a combination of words that characterize these two topics, namely, when some words appear together, they convey something about sports, and other words when they appear together convey something about fashion. When there are many documents in a corpus, each document will contain a set of topics characterized by various word combinations. Even though each document may contain only a few topics, a text corpus may contain dozens of topics. Topic modeling is designed to leverage this stylized notion of how documents are generated, to identify a few topics and the relatively unique combination of words that pertain to each topic.
To compute mean topic coherence, Enginius uses a metric called the intrinsic UMass measure, based on the empirical conditional log-probability log p(wj | wi), where wi is a higher-ranked word in a topic than word wj. The empirical score for a pair of words in the ordered list of words in the topic is then equal to:
score(wi, wj) = log[(D(wi, wj) + 1) / D(wi)]
D is a counting function that counts the number of documents in which a word occurs. The numerator is a count of the number of documents that contain both words, and the denominator is a count of the number of documents in which the higher-ranked word occurs. The extra 1 is a smoothing factor that allows for situations where D(wi, wj) = 0. The topic coherence score is then specified by summing up the scores for all pairs of words in the ordered list of the top 15 words in a topic. With 15 words in a topic, and assuming 10% average overlap of the words across documents, the benchmark value for topic coherence is -241.8; with 5% average overlap, the benchmark value is -314.6. A computed value of topic coherence that is higher than these values (i.e., closer to 0) could be considered to indicate good coherence.
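The benchmark values quoted above can be reproduced from the number of word pairs among the top 15 words, C(15, 2) = 105, with each pair contributing the (natural) log of the assumed overlap probability:

```python
import math

pairs = 15 * 14 // 2  # C(15, 2) = 105 word pairs among the top 15 words
print(round(pairs * math.log(0.10), 1))  # -241.8 (10% average overlap)
print(round(pairs * math.log(0.05), 1))  # -314.6 (5% average overlap)
```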
To compute mean topic exclusivity, we use the FREX (Frequency, Exclusivity) measure proposed by Bischof and Airoldi (2012). This metric captures the extent to which the top words in a topic are exclusive to that topic (i.e., are not as likely to occur in the other topics). The exclusivity score for each top word in a topic is the harmonic mean of two components for which we assign equal weights as default: (1) the rate at which the word occurs within a topic relative to its rate of occurrence in the other topics, and (2) the frequency (number of times) of a word’s occurrence within a topic relative to its frequency in the other topics. The exclusivity score for a topic is the average of the exclusivity scores of the words in a topic. By default, Enginius considers the top 15 words in a topic. The computed score is a positive number (> 0), with values substantially greater than 1 indicating a topic’s high exclusivity. We report the average exclusivity value across the topics.
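As a small illustration of the equally weighted harmonic mean used to combine the two FREX components (the input values below are hypothetical, not Enginius output):

```python
def harmonic_mean(a, b):
    # Equally weighted harmonic mean of two positive components.
    return 2 / (1 / a + 1 / b)

# A word that is highly exclusive (0.9) but only moderately frequent (0.5):
print(harmonic_mean(0.9, 0.5))  # ≈ 0.643
```

The harmonic mean penalizes imbalance: a word must score reasonably well on both components to receive a high FREX value.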
Bischof, Jonathan M. and Edoardo M. Airoldi (2012), “Summarizing topical content with word frequency and exclusivity.” In International Conference on Machine Learning, volume 29, Edinburgh, Scotland, UK.
R – Packages and routines used in Enginius
- tm – text manipulation
- stringr – remove graphical characters
- SnowballC – stemming of words
- wordcloud – plot word cloud
- RCurl – html download
- XML – html text extraction
- scales – pie chart modifications
- syuzhet – emotion selection
- curl, httr, ROAuth, httpuv, rtweet – tweet extraction
- topicmodels, textmineR, text2vec – LDA model (topic model)
- textclean, spacy – removing emojis and replacing them with text
- udpipe – POS, Co-occurrence, and RAKE analysis
Appendix B
Stop Words – Short List
a
about
above
after
again
against
all
am
an
and
any
are
aren't
as
at
be
because
been
before
being
below
between
both
but
by
can't
cannot
could
couldn't
did
didn't
do
does
doesn't
doing
don't
down
during
each
few
for
from
further
had
hadn't
has
hasn't
have
haven't
having
he
he'd
he'll
he's
her
here
here's
hers
herself
him
himself
his
how
how's
i
i'd
i'll
i'm
i've
if
in
into
is
isn't
it
it's
its
itself
let's
me
more
most
mustn't
my
myself
no
nor
not
of
off
on
once
only
or
other
ought
our
ours
ourselves
out
over
own
same
shan't
she
she'd
she'll
she's
should
shouldn't
so
some
such
than
that
that's
the
their
theirs
them
themselves
then
there
there's
these
they
they'd
they'll
they're
they've
this
those
through
to
too
under
until
up
very
was
wasn't
we
we'd
we'll
we're
we've
were
weren't
what
what's
when
when's
where
where's
which
while
who
who's
whom
why
why's
with
won't
would
wouldn't
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves
Stop Words – Long List
able
about
above
abroad
according
accordingly
across
actually
adj
after
afterwards
again
against
ago
ahead
ain't
all
allow
allows
almost
alone
along
alongside
already
also
although
always
am
amid
amidst
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
a's
aside
ask
asking
associated
at
available
away
awfully
back
backward
backwards
be
became
because
become
becomes
becoming
been
before
beforehand
begin
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
came
can
cannot
cant
can't
caption
cause
causes
certain
certainly
changes
clearly
c'mon
co
co.
com
come
comes
concerning
consequently
consider
considering
contain
containing
contains
corresponding
could
couldn't
course
c's
currently
dare
daren't
definitely
described
despite
did
didn't
different
directly
do
does
doesn't
doing
done
don't
down
downwards
during
each
edu
eg
eight
eighty
either
else
elsewhere
end
ending
enough
entirely
especially
et
etc
even
ever
evermore
every
everybody
everyone
everything
everywhere
ex
exactly
example
except
fairly
far
farther
few
fewer
fifth
first
five
followed
following
follows
for
forever
former
formerly
forth
forward
found
four
from
further
furthermore
get
gets
getting
given
gives
go
goes
going
gone
got
gotten
greetings
had
hadn't
half
happens
hardly
has
hasn't
have
haven't
having
he
he'd
he'll
hello
help
hence
her
here
hereafter
hereby
herein
here's
hereupon
hers
herself
he's
hi
him
himself
his
hither
hopefully
how
howbeit
however
hundred
i'd
ie
if
ignored
i'll
i'm
immediate
in
inasmuch
inc
inc.
indeed
indicate
indicated
indicates
inner
inside
insofar
instead
into
inward
is
isn't
it
it'd
it'll
its
it's
itself
i've
just
k
keep
keeps
kept
know
known
knows
last
lately
later
latter
latterly
least
less
lest
let
let's
like
liked
likely
likewise
little
look
looking
looks
low
lower
ltd
made
mainly
make
makes
many
may
maybe
mayn't
me
mean
meantime
meanwhile
merely
might
mightn't
mine
minus
miss
more
moreover
most
mostly
mr
mrs
much
must
mustn't
my
myself
name
namely
nd
near
nearly
necessary
need
needn't
needs
neither
never
neverf
neverless
nevertheless
new
next
nine
ninety
no
nobody
non
none
nonetheless
noone
no-one
nor
normally
not
nothing
notwithstanding
novel
now
nowhere
obviously
of
off
often
oh
ok
okay
old
on
once
one
ones
one's
only
onto
opposite
or
other
others
otherwise
ought
oughtn't
our
ours
ourselves
out
outside
over
overall
own
particular
particularly
past
per
perhaps
placed
please
plus
possible
presumably
probably
provided
provides
que
quite
qv
rather
rd
re
really
reasonably
recent
recently
regarding
regardless
regards
relatively
respectively
right
round
said
same
saw
say
saying
says
second
secondly
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sensible
sent
serious
seriously
seven
several
shall
shan't
she
she'd
she'll
she's
should
shouldn't
since
six
so
some
somebody
someday
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specified
specify
specifying
still
sub
such
sup
sure
take
taken
taking
tell
tends
th
than
thank
thanks
thanx
that
that'll
thats
that's
that've
the
their
theirs
them
themselves
then
thence
there
thereafter
thereby
there'd
therefore
therein
there'll
there're
theres
there's
thereupon
there've
these
they
they'd
they'll
they're
they've
thing
things
think
third
thirty
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
till
to
together
too
took
toward
towards
tried
tries
truly
try
trying
t's
twice
two
un
under
underneath
undoing
unfortunately
unless
unlike
unlikely
until
unto
up
upon
upwards
us
use
used
useful
uses
using
usually
v
value
various
versus
very
via
viz
vs
want
wants
was
wasn't
way
we
we'd
welcome
well
we'll
went
were
we're
weren't
we've
what
whatever
what'll
what's
what've
when
whence
whenever
where
whereafter
whereas
whereby
wherein
where's
whereupon
wherever
whether
which
whichever
while
whilst
whither
who
who'd
whoever
whole
who'll
whom
whomever
who's
whose
why
will
willing
wish
with
within
without
wonder
won't
would
wouldn't
yes
yet
you
you'd
you'll
your
you're
yours
yourself
yourselves
you've
zero
a
how's
i
when's
why's
b
c
d
e
f
g
h
j
l
m
n
o
p
q
r
s
t
u
uucp
w
x
y
z
I
www
amount
bill
bottom
call
computer
con
couldnt
cry
de
describe
detail
due
eleven
empty
fifteen
fifty
fill
find
fire
forty
front
full
give
hasnt
herse
himse
interest
itse
mill
move
myse
part
put
show
side
sincere
sixty
system
ten
thick
thin
top
twelve
twenty
abst
accordance
act
added
adopted
affected
affecting
affects
ah
announce
anymore
apparently
approximately
aren
arent
arise
auth
beginning
beginnings
begins
biol
briefly
ca
date
ed
effect
et-al
ff
fix
gave
giving
heres
hes
hid
home
id
im
immediately
importance
important
index
information
invention
itd
keys
kg
km
largely
lets
line
'll
means
mg
million
ml
mug
na
nay
necessarily
nos
noted
obtain
obtained
omitted
ord
owing
page
pages
poorly
possibly
potentially
pp
predominantly
present
previously
primarily
promptly
proud
quickly
ran
readily
ref
refs
related
research
resulted
resulting
results
run
sec
section
shed
shes
showed
shown
showns
shows
significant
significantly
similar
similarly
slightly
somethan
specifically
state
states
stop
strongly
substantially
successfully
sufficiently
suggest
thered
thereof
therere
thereto
theyd
theyre
thou
thoughh
thousand
throug
til
tip
ts
ups
usefully
usefulness
've
vol
vols
wed
whats
wheres
whim
whod
whos
widely
words
world
youd
youre