Practical-5

Aim: Data Pre-processing and text analytics using Orange.

What is text analytics?

Text analytics is the process of converting unstructured text into structured, meaningful data for analysis. It is used to measure customer opinion in product reviews and feedback, to provide search facilities, and to perform sentiment analysis and entity modeling in support of fact-based decision making.

What is sentiment analysis?

Sentiment analysis (also known as opinion mining) uses natural language processing and machine learning to interpret and classify emotions in subjective data. In essence, it is the process of determining the emotional tone behind a series of words, in order to understand the attitudes, opinions, and emotions expressed in an online mention.

Why is it useful?

Sentiment analysis is often used in business to detect sentiment in social data, gauge brand reputation, and understand customers. It is useful for quickly gaining insights from large volumes of text.

Discretize:

Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. It is usually carried out as a first step toward making the data suitable for numerical evaluation and for use by models.

The Discretize widget discretizes continuous attributes with a selected method.
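As an illustration of what a discretization method does, here is a minimal sketch of equal-width binning in plain Python. It is not Orange's implementation; the function name `equal_width_bins` and the sample `ages` list are assumptions made for this example.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land one past the end, so clamp it.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical continuous attribute: ages of eight people.
ages = [18, 22, 25, 31, 40, 47, 58, 63]
print(equal_width_bins(ages, 3))
```

Each age is replaced by the index of the interval it falls into, so a continuous attribute becomes a discrete one with three possible values.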

Randomization:

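Randomization is a method, based on chance alone, by which study participants are assigned to a treatment group. It minimizes the differences among groups by distributing people with particular characteristics equally among all the trial arms. In Orange, the Randomize widget applies chance to a data table instead: it shuffles the class values, attributes, and/or meta attributes of the input data.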
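In a data-mining workflow, the same idea of chance-based assignment is usually applied to a table: the values of one column, such as the class label, are randomly permuted across the rows. The sketch below shows this in plain Python. The helper `shuffle_column` and the sample rows are hypothetical and are not taken from Orange.

```python
import random


def shuffle_column(rows, column, seed=0):
    """Return a copy of rows with the values of one column randomly permuted."""
    values = [row[column] for row in rows]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical labeled tweets.
rows = [
    {"text": "great talk", "label": "positive"},
    {"text": "boring slides", "label": "negative"},
    {"text": "nice demo", "label": "positive"},
    {"text": "too long", "label": "negative"},
]
shuffled = shuffle_column(rows, "label", seed=42)
for row in shuffled:
    print(row)
```

A model trained on the shuffled labels should perform no better than chance, which makes this a simple sanity check for a classifier.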

Continuization:

The Continuize widget returns a new table in which the discrete attributes are replaced with continuous ones or removed.
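One common way to replace a discrete attribute with continuous ones is to create one 0/1 indicator column per value (one-hot encoding). The following is a minimal sketch of that idea in plain Python, assuming a hypothetical `one_hot` helper and a made-up `colors` attribute. It is not Orange's own code.

```python
def one_hot(values):
    """Replace a discrete attribute with one numeric 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical discrete attribute.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)
print(categories)
for row in encoded:
    print(row)
```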

Normalization:

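In data pre-processing, normalization rescales numeric attributes to a common range or distribution (for example to the interval [0, 1], or to zero mean and unit variance) so that attributes measured on different scales contribute comparably to a model. It should not be confused with database normalization, which is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like insertion, update, and deletion anomalies.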
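For numeric attributes, normalization in pre-processing typically refers to rescaling values rather than restructuring tables. Below is a minimal sketch of two common variants, min-max scaling and standardization, using only the Python standard library. The function names and the sample `followers` list are assumptions for illustration.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical numeric attribute: follower counts.
followers = [10, 20, 30]
print(min_max_scale(followers))
print(standardize(followers))
```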

Text / Data Preprocessing with Orange tool:

From the Twitter developer account we get the API key and its secret. We have to enter those details in the Twitter widget.

Here we have given #DataScience as the query, so it will fetch tweets matching that query.

We have set the language to English, so it will fetch only tweets written in English. We have also set a limit of 100 tweets.

After that we add the Preprocess Text widget. It offers many options for cleaning and filtering the text of the tweets. We can also load a stop word file, which lists words that should be removed from the tweets.
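The kind of cleaning a text pre-processing step performs can be sketched in a few lines of Python: lowercase the text, strip URLs, split it into tokens, and drop stop words. This is only an illustration under assumed settings. The `STOPWORDS` set, the regular expressions, and the sample tweet are hypothetical and are not what Orange uses internally.

```python
import re

# Hypothetical stop word list; a real file would contain many more entries.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "for", "rt"}


def preprocess(tweet, stopwords=STOPWORDS):
    """Lowercase a tweet, remove URLs, tokenize it, and filter out stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    tokens = re.findall(r"[a-z0-9#@_]+", text)  # keep words, hashtags, mentions
    return [t for t in tokens if t not in stopwords]


print(preprocess("RT The future of #DataScience is bright! https://example.com"))
```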

Next we add a Word Cloud widget, which generates a cloud of the words obtained from the tweets.
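A word cloud is essentially a picture of word frequencies: the more often a word occurs, the larger it is drawn. The counting step can be sketched with the standard library as follows. The `word_frequencies` helper and the tokenized sample tweets are assumptions for this example.

```python
from collections import Counter


def word_frequencies(token_lists, top_n=5):
    """Count how often each token occurs across all tweets."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(top_n)


# Hypothetical tweets that have already been tokenized.
tweets = [
    ["data", "science", "python"],
    ["data", "science"],
    ["data"],
]
print(word_frequencies(tweets, top_n=2))
```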

Next we add the Sentiment Analysis widget. Double-click this widget and select Vader. VADER uses a list of lexical features (e.g., words) that are labeled according to their semantic orientation as either positive or negative. VADER not only reports whether a text is positive or negative but also tells us how positive or negative the sentiment is.
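To make the lexicon idea concrete, here is a toy scorer in plain Python. It is not VADER and does not use VADER's lexicon or rules. The tiny `LEXICON`, `BOOSTERS`, and `NEGATIONS` tables and their numeric values are invented for illustration only. It shows how word-level scores, intensity words, and negations can combine into a signed score whose magnitude reflects how strong the sentiment is.

```python
# Toy word lists with made-up scores (assumptions, not VADER's data).
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}
BOOSTERS = {"very": 1.5, "really": 1.5}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon scores, adjusting for a preceding booster or negation."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    total = 0.0
    for i, word in enumerate(tokens):
        if word not in LEXICON:
            continue
        value = LEXICON[word]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in BOOSTERS:
            value *= BOOSTERS[previous]  # "very good" is stronger than "good"
        elif previous in NEGATIONS:
            value = -value  # "not good" flips the polarity
        total += value
    return total


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["Great talk!", "Not good.", "I really love this, but the ending was bad"]:
    score = sentiment_score(text)
    print(text, score, label(score))
```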

We can see the output of the sentiment analysis in the Tweet Profiler widget, which labels each tweet with an emotion.

After that we plot the emotions with a Box Plot widget. Other visualization widgets can also be used to represent the emotions. In the first figure I have selected author name as the variable, and in the second figure I have selected emotion. The second figure is therefore an emotion vs. emotion graph, and it shows that most tweets express joy.


Yash Radadiya
