Let's say you are working on a Natural Language Processing (NLP) project and are currently collecting the dataset that will be used in it. The team has assigned you the task of collecting this dataset in an optimized manner.
Storing every occurrence of every word would make storage usage grow quite large. So you come up with the idea of storing only the unique words (letter case matters here) together with their frequencies in a pandas Series object, since data in this form can be reused more easily in different processes.
Given the raw data as a string, write a function that takes the string as input and returns the unique words and their corresponding frequencies as a pandas Series object. The index of the Series should contain the unique words, and the values should be the frequencies of those words.
Just complete the function so that it returns a Series object.
Notes
- The string contains no special characters.
- The string is always non-empty.
- Matching is case-sensitive, i.e. "He" and "he" are treated as two different word tokens.
- The Series index is sorted using Python's built-in ordering (as sorted() would order it), so, for example, "He" comes before "he".
- The code for reading the input and printing the Series is already written; you only need to complete the function that returns a Series satisfying all of these requirements.
- Optional: using the same strategies, you can also look up approaches and functions for sorting the words by their frequencies.
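A minimal sketch of one possible solution follows, assuming pandas is available. The function name word_frequency is hypothetical, since the statement does not give the name of the function to complete. It splits the string on whitespace (the notes guarantee there are no special characters), counts each token case-sensitively with value_counts, and sorts the index with sort_index, which orders the words the same way Python's sorted() would. The hypothetical words_by_frequency helper sketches the optional task: it orders words by descending frequency and breaks ties alphabetically, a tie-break rule that is an assumption rather than part of the statement.

```python
import pandas as pd


def word_frequency(text: str) -> pd.Series:
    """Hypothetical name: count unique, case-sensitive words, index sorted."""
    # split() breaks the sentence on whitespace; "He" and "he" stay distinct
    counts = pd.Series(text.split()).value_counts()
    # sort_index() orders the words by code point, like Python's sorted()
    return counts.sort_index()


def words_by_frequency(text: str) -> pd.Series:
    """Hypothetical helper for the optional task: most frequent words first."""
    counts = word_frequency(text)
    # Assumed tie-break: higher count first, then alphabetical order
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # A dict preserves insertion order, so the Series keeps this ordering
    return pd.Series(dict(ordered))


print(word_frequency("He said he is the king"))
print(words_by_frequency("the cat and the hat and the bat"))
```

Using value_counts keeps the counting inside pandas; building the Series from collections.Counter(text.split()) and then calling sort_index() would work equally well.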
Input Format
A single string of space-separated words (essentially a sentence).
Output Format
First line: the space-separated words (the Series index).
Second line: the space-separated values (the frequencies).
Sample Input
He said he is the king
Sample Output
He he is king said the
1 1 1 1 1 1
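The printing code itself is not shown in the statement. As a hypothetical illustration of how the two output lines map onto the Series, the sketch below (with assumed helper names word_frequency and format_series) reproduces the sample output: the index supplies the first line and the values supply the second.

```python
import pandas as pd


def word_frequency(text: str) -> pd.Series:
    # Hypothetical name: case-sensitive word counts with a sorted index
    return pd.Series(text.split()).value_counts().sort_index()


def format_series(s: pd.Series) -> str:
    # Hypothetical helper: line 1 is the index, line 2 the frequencies
    words = " ".join(s.index)
    counts = " ".join(str(v) for v in s.tolist())
    return words + "\n" + counts


print(format_series(word_frequency("He said he is the king")))
```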