Lesson 4 (part2) – data management and visualization

nbiswas
149 New Member
Sorting data in a vector is quite easy in R. You simply pass the vector to the sort() function and a new sorted vector is generated.
The default sort order is ascending, but this can be changed using the argument decreasing=TRUE.
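As a minimal sketch of the two sort orders, here is sort() applied to a small vector. The vector x and its values are made up for illustration and are not part of the lesson's data.

```r
# Hypothetical numeric vector, used only for illustration
x <- c(42, 7, 19, 3, 25)

# sort() returns a new vector; x itself is left unchanged
asc  <- sort(x)                     # ascending is the default
desc <- sort(x, decreasing = TRUE)  # descending on request

print(asc)
print(desc)
```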
However, in our scenario the data is not in a vector; it is in a data frame. Let's examine how we sort data contained in a data frame.
The data frame n.2010 contains three columns: Year, Name and Frequency.
There is no need to keep the Year attribute so we decide to filter our data frame to remove the Year column.
Here is an example using the bracket notation of how we can filter out the Year column by specifying a character vector of column names within the brackets.
If you are familiar with SQL, this task would be equivalent to using projection to specify the columns in the SELECT list for an SQL statement.
We use the head() function to verify that the Year column has been removed. The function shows that the data is still in its original sort order by name.
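The column filter described above might look like the sketch below. The n.2010 data frame here is a tiny made-up stand-in with the same three column names; the frequencies are invented and the real data set is much larger.

```r
# Made-up stand-in for the lesson's n.2010 data frame (invented frequencies)
n.2010 <- data.frame(
  Year      = rep(2010, 5),
  Name      = c("Ethan", "Jacob", "Liam", "Lucas", "Noah"),
  Frequency = c(500, 300, 400, 200, 100),
  stringsAsFactors = TRUE
)

# Keep all rows (empty first index) and only the named columns,
# which drops Year
n.2010 <- n.2010[, c("Name", "Frequency")]

# Verify that Year is gone; rows are still in name order
print(head(n.2010))
```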
To find the most popular names we must sort by Frequency in decreasing or descending order.
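We can't use the sort() function as we did with vectors. With data frames we use the order() function within our bracket notation: order() returns the row positions in sorted order, and the brackets use those positions to rearrange the rows.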
Here we want the data sorted by decreasing frequency values and all of the columns should be included in our output.
The final expression converts the Factor vector into a simple character vector to determine that the most popular name was: Ethan, followed by Liam, Jacob, Lucas, and Noah.
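A sketch of the sorting step, again on a made-up stand-in for n.2010. The frequencies are invented and chosen only so that the resulting order matches the names listed above.

```r
# Made-up stand-in for the filtered n.2010 data frame (invented frequencies)
n.2010 <- data.frame(
  Name      = c("Ethan", "Jacob", "Liam", "Lucas", "Noah"),
  Frequency = c(500, 300, 400, 200, 100),
  stringsAsFactors = TRUE   # Name is stored as a factor, as in the lesson
)

# order() gives row positions sorted by Frequency, largest first;
# the empty column index keeps every column
sorted <- n.2010[order(n.2010$Frequency, decreasing = TRUE), ]

# Convert the factor column to a plain character vector
top.names <- as.character(sorted$Name)
print(top.names)
```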
Whenever we have explored a data set and wish to save it for future reference we can use one of the write() family of functions.
Here we see some examples of exporting the sorted male names from the year 2010 to a comma separated values or csv file and also to a tab delimited file.
In many data frames the row names are not really used to identify data so we have used the row.names=FALSE option to avoid writing the row name information in our output file.
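The export step could be sketched as follows, using write.csv() and write.table(). The file names are hypothetical and the files are written to R's temporary directory so the example does not touch your working directory.

```r
# Small made-up data frame standing in for the sorted 2010 names
sorted <- data.frame(Name = c("Ethan", "Liam"), Frequency = c(500, 400))

# Hypothetical output paths inside the session's temporary directory
csv.file <- file.path(tempdir(), "male_names_2010.csv")
tab.file <- file.path(tempdir(), "male_names_2010.txt")

# Comma separated values, without the row names
write.csv(sorted, file = csv.file, row.names = FALSE)

# Tab delimited, without row names or quotes around strings
write.table(sorted, file = tab.file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read the csv file back to confirm what was written
check <- read.csv(csv.file)
print(check)
```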
So far we have worked with data within data frames. Now let's visualize trends in our data using data visualizations, or graphics.
In this task we would like to provide an interactive experience so that the user can provide a baby name and then view a single visualization of its popularity from 1917 to 2010.

Let's get started.
So far we have imported data from external files, but we have not obtained input from users before.
In this example we use the readline() function to obtain a line of input from the standard input device, or keyboard, during an interactive session. (Reading lines from text files or websites is done with the separately named readLines() function.) We specify the prompt with the optional prompt argument.
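A minimal sketch of prompting for a name. The prompt text and the fallback value are assumptions added for illustration; the fallback exists only because readline() does not wait for input when the code is run as a script.

```r
# Ask the user for a name; this waits for input only in an interactive session
name <- readline(prompt = "Enter a baby name: ")

# When run non-interactively, readline() returns "" immediately,
# so fall back to a hypothetical default name
if (!nzchar(name)) name <- "Grant"

# Upper-case the input so it can be compared with upper-case names
name <- toupper(name)
print(name)
```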
To visualize trends in baby names we decide to use a scatter plot visualization based on frequencies over the years 1917 to 2010.
To do this we create a new data frame which is a subset of frequencies for the name provided by the user. The built-in toupper() function is used to convert the input character string to all uppercase so we can match the data stored in the data frame. We also only require the two variables of Year and Frequency. If you are familiar with SQL you can imagine how this query would look using an SQL SELECT statement, but in R we do things a bit differently.
We then sort the data within the data frame and call the plot() function.
The plot function uses the values provided to generate a visualization.
Plot is an example of a generic function in R as it can accept many different data formats and it will attempt to create a visualization.
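The subset, sort and plot steps might be sketched like this. The data frame name names.all and all of its values are made up; the lesson's real data frame covers 1917 to 2010.

```r
# When run as a script, draw to a null PDF device instead of creating a file
if (!interactive()) pdf(NULL)

# Made-up stand-in for the full baby-names data frame (invented values)
names.all <- data.frame(
  Year      = c(1961, 1960, 2010, 1961, 1960, 2010),
  Name      = c("GRANT", "GRANT", "GRANT", "MARY", "MARY", "MARY"),
  Frequency = c(950, 900, 300, 4000, 4100, 1000)
)

# In the lesson this value would come from readline()
name <- toupper("Grant")

# Rows for the requested name, keeping only Year and Frequency
one.name <- names.all[names.all$Name == name, c("Year", "Frequency")]

# Sort by year so the points are in chronological order
one.name <- one.name[order(one.name$Year), ]

# A two-column data frame is drawn as a scatterplot
plot(one.name)
```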
Let's take a look at the visualization.
When the data is provided to the plot function as a data frame of 2 variables, plot will create a scatterplot using circles for each observation.
Notice how default labels and scales were generated.
The x and y axis labels are determined based on the defined column names.
We will examine a few additional commonly used options for the plot() function next.
Most visualizations should have a title. The main argument is used to provide a title for a visualization.
The label names can be specified for each axis, and the method of plotting can be changed from the default to lines, stairs, or other representations.
Let's look at another example with our baby name data frame.
This plot uses the same data, but it uses the paste() function to construct a more precise title for our chart, and the type of plot is changed from using circles to using stairs.
R has been able to help us spot trends in our data set across 2 variables. It is fairly obvious from this data that the first name or given name of "GRANT" was most popular in the early 1960s and from our earlier analysis we know that it was definitely not one of the top 5 names of newborns in 2010.
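A sketch of a stair-step plot with a constructed title. The data values and the wording of the title and axis labels are assumptions; only the use of paste(), main and a stairs plot type comes from the lesson.

```r
# When run as a script, draw to a null PDF device instead of creating a file
if (!interactive()) pdf(NULL)

# Made-up frequencies for one name, already sorted by year
one.name <- data.frame(
  Year      = c(1958, 1960, 1962, 1964, 2010),
  Frequency = c(700, 900, 950, 800, 300)
)
name <- "GRANT"

# paste() joins its arguments with a space by default
chart.title <- paste("Popularity of the name", name)

# type = "s" draws stair steps instead of the default circles
plot(one.name$Year, one.name$Frequency, type = "s",
     main = chart.title, xlab = "Year", ylab = "Frequency")
```

Other values of type include "l" for lines and "b" for both points and lines.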
Let's look at one more example of trend analysis and also learn a few more plotting options along the way.
In this dataset we plan to examine the maximum and minimum temperatures during the month of July 2013 in Toronto.
After the data is loaded we check the structure of the data frame and decide to move forward.
Note that there are some missing values of wind speed in our data, but we will only examine temperatures so we can ignore this variable at this time.
We would like our graph to include both maximum and minimum values for each day of the month.
Therefore, we need to determine the range of values to set our axis properly.
Here we have created a variable called yrange that contains the lowest temperature and the highest temperature for the month.
As we plot the data we will specify a line graph type with a line width of 6 (relative to the default width of 1) and a colour of blue. The y-axis limit is specified using our yrange vector and we have also included a title in our initial plot.
At this point we are not finished as we now wish to add the maximum temperatures and our axis labels to the same plot.
The par() function can be used to control various features for our graph. Here we are telling R that our next plot() function call should reuse the existing plot.
Now we add the red line of maximum temperatures for the month. We also decide to add the x and y axis labels at this time.
The final step is to add the axes, with their ranges of values, to our chart.
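The temperature plot described in the last few steps could be assembled roughly as below. The data frame july, its column names and its values are made up, and suppressing the axes with axes = FALSE before adding them at the end is an assumption about how the original code was written.

```r
# When run as a script, draw to a null PDF device instead of creating a file
if (!interactive()) pdf(NULL)

# Made-up daily temperatures (hypothetical column names and values)
july <- data.frame(
  Day     = 1:6,
  MaxTemp = c(24.5, 27.1, 31.8, 33.2, 22.4, 21.0),
  MinTemp = c(15.2, 17.8, 21.3, 22.9, 14.1, 12.6)
)

# Lowest and highest temperature, so both lines fit on one y-axis
yrange <- range(c(july$MinTemp, july$MaxTemp))

# First layer: minimum temperatures in blue, with the title
plot(july$Day, july$MinTemp, type = "l", lwd = 6, col = "blue",
     ylim = yrange, main = "Toronto temperatures, July 2013",
     xlab = "", ylab = "", axes = FALSE)

# Tell R that the next plot() call should draw on the existing plot
par(new = TRUE)

# Second layer: maximum temperatures in red, plus the axis labels
plot(july$Day, july$MaxTemp, type = "l", lwd = 6, col = "red",
     ylim = yrange, xlab = "Day of month",
     ylab = "Temperature (degrees Celsius)", axes = FALSE)

# Finally add the axes and a frame around the plot region
axis(1, at = july$Day)
axis(2)
box()
```

Both plot() calls share the same ylim, which keeps the two lines on a common scale. A common alternative to par(new = TRUE) is to add the second series with lines().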
R has helped us see that July was very warm in the middle of the month, followed by an extended cool period lasting until the end of the month.
We can use summary statistics to determine the mean or average temperature in the month of July or examine trends like we did in the previous example.
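For the summary statistics mentioned here, a minimal sketch with made-up maximum temperatures (the vector name and values are hypothetical):

```r
# Made-up daily maximum temperatures in degrees Celsius
max.temp <- c(24.5, 27.1, 31.8, 33.2, 22.4, 21.0)

# Mean (average) of the values
avg.max <- mean(max.temp)
print(avg.max)

# Minimum, quartiles, median, mean and maximum in one call
print(summary(max.temp))
```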
Now we would like to examine the distribution of temperatures during the month. A distribution of values is often visualized using a histogram, where the data is grouped into categories and the categories are visualized.
Here is an example that shows the most frequent maximum temperature in July was between 22 and 24 degrees Celsius.
The breaks argument can be used to determine the size of the categories.
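A sketch of a histogram with explicit breaks. The temperature values, the title and the axis label are made up; the breaks are set every 2 degrees so the categories line up with ranges such as 22 to 24.

```r
# When run as a script, draw to a null PDF device instead of creating a file
if (!interactive()) pdf(NULL)

# Made-up daily maximum temperatures in degrees Celsius
max.temp <- c(24.5, 27.1, 31.8, 33.2, 22.4, 21.0,
              23.3, 22.9, 23.8, 25.4, 28.4, 22.1)

# breaks gives the category boundaries: 20, 22, 24, ..., 34
h <- hist(max.temp, breaks = seq(20, 34, by = 2),
          main = "Maximum temperatures",
          xlab = "Temperature (degrees Celsius)")

# hist() also returns the counts per category
print(h$counts)

# Lower boundary of the most frequent category
print(h$breaks[which.max(h$counts)])
```

Passing a single number to breaks instead of a vector is treated by hist() as a suggested number of categories rather than an exact one.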
So now if someone asks what it was like in July 2013 in Toronto, you will have lots to discuss based on learning R.
R has many more data visualization capabilities.
One of the most popular R extensions for graphics is called ggplot2 by Hadley Wickham and if your goal is to use R to create data visualizations then exploring the ggplot2 package is a good idea.
Sep 4 '14 #1
zmbd
5,501 Recognized Expert Moderator Expert
nbiswas:
Please provide proper citations for these articles.
-z
Sep 29 '14 #2
