63429 – Lab 2: Estimating Probabilities and Exploratory Statistics

Assignment Details:
IF YOU NEED HELP: Review the following
- Logical operators (R Handbook)
- Indexing of data frames/matrices/vectors (R Handbook)

Setting:
As an environmental consultant working on development planning for the eastern United States, you have successfully located and begun to explore data on sea duck wintering areas (Lab 1!). Now that you have explored and mapped some of the raw data, you are ready to ask more detailed questions of the dataset. In particular, you are interested in exploring some characteristics of the different duck species, as well as understanding the probabilities of duck flock sizes relative to given critical thresholds. You’re going to be working with probabilities and descriptive statistics.

Lab Purpose:
In lecture we will discuss definitions, axioms, and theorems of probability – now let’s apply them to our duck data sets. Recall that probability quantifies the likelihood that an event will occur, so our first task is to define some simple events, each defined for a given observation. Let’s focus on something of concern for coastal development: duck wintering sites on shore vs. off shore.
- Are ducks found on shore (0 km from coast)?
- Are ducks found off shore?
We might ask if these are connected to particular species.
To do so, it may be easiest to first separate species into particular nominal categories.
- Black scoter (Melanitta americana, code BLSC), a near threatened species
- American common eider (Somateria mollissima dresseri, code COEI), a near threatened species
- Long-tailed duck (Clangula hyemalis, code LTDU), a vulnerable species
- Surf scoter (Melanitta perspicillata, code SUSC), a species of least concern for conservation
- White-winged scoter (Melanitta fusca, code WWSC), a species of least concern for conservation
- Unidentified dark-winged scoter (surf or black scoter, code DWSC)
- Unidentified scoter (Melanitta sp., code SCOT)
The “codes” correspond to the “species” column in your duck dataset.

For this lab, we’ll begin to explore species-at-risk designations for development planning by assessing the probability of finding species at risk, both over the full dataset and conditional on i) flock size and ii) year.

How?
We can use logical operators to find observations that meet certain conditions, then mark them as belonging to a certain event. For example, suppose I wanted to identify flocks at or above sea level (depth <= 0 m, where negative values indicate height above sea level), and flocks over deep water (depth > 20 m). I could set up three events:
- depth <= 0 m
- depth > 0 m AND depth <= 20 m
- depth > 20 m

    evnts <- data$depth*NA                          # Creates a new (empty) vector,
                                                    # the same size as my depth vector
    evnts[data$depth <= 0] <- 1                     # Where depth <= 0, mark evnts
                                                    # with 1
    evnts[data$depth > 0 & data$depth <= 20] <- 2   # Where depth btwn 0,20, mark
                                                    # evnts with 2
    evnts[data$depth > 20] <- 3                     # Where depth > 20, mark evnts
                                                    # with 3

You can also use which; it adds a line to each step, but would accomplish the same thing:

    i <- which(data$depth <= 0)   # Find entries where depth <= 0;
                                  # save them to object 'i'
    evnts[i] <- 1                 # Mark the same entries in evnts
                                  # with '1'.

You can also use which to list out the elements in your vector “evnts”.
HINT: this could help count the number of times a given event occurs.

    which(evnts == 1)

You’ll need to think a little about how to identify species at risk and flock size events: what combination of logical operations (<, >, <=, >=, ==, !=, ...) and ways to connect statements (like ‘&’ or ‘|’) will pull out the set you want?

You’ll also need to think about how to ‘point’ to certain entries in a matrix, vector, or data frame! That might mean ‘indexing’ vectors/matrices with [ ] and commas (where appropriate). It might mean correctly naming columns (e.g. data$flock_size).

You’ll need to think about creating new vectors or variables to hold your information. I often like to create empty objects by copying existing data; this way, the new object automatically has the right dimensions (e.g. evnts <- data$depth*NA creates ‘evnts’ from the “depth” column of data; multiplying by NA sets all entries to NA).

You’ll need to think about how to estimate the probability of events, conditional probability, etc. THIS IS JUST DOING MATH IN R!

    Pr{E} ≈ (Number of times E occurs) / (Number of opportunities for E to occur) = a/n    (Eq. 1)

where a is the number of times E was observed, and n is the number of observations.

Descriptive statistics
It is usually useful to look at the distribution of your samples, and at simple descriptive statistics, to understand basic features of the data. For example, if the probability calculations above show that most species at risk have small flock sizes, plotting the distribution may reveal a long “tail” made up of a few cases of very, very large flock sizes. Plotting data can help you understand the location, spread, and symmetry of your data, as well as assess the robustness and resistance of the statistics used to quantify these characteristics.

One way to test resistance and robustness of a statistical measure is to calculate it repeatedly with subsamples of a larger data set.
If a measure is robust and resistant, it shouldn’t vary too much with each re-calculation, even as your subsamples get small (and your estimates become less certain). This kind of repeated sampling has several useful applications in stats, notably estimating a measurement’s uncertainty. We’ll explore them further in later labs.

It’s relatively easy to create a subsample in R: just use the sample function. Given a vector (x) and a sample size (n), sample will pull n entries from x at random. By default, sample will not pull the same entry more than once, but you can request that it does (by setting replace = TRUE).

Useful functions:
- sum(x)        Adds all values in the numerical object x
- length(x)     Gives the length of the vector x (NOTE: x MUST be a vector)
- +, -, *, /    Math operations (addition, subtraction, multiplication, division)
- sample(x)     Creates a subsample in R
- table(x)      Creates a table counting all the occurrences of specific entries
- mean(x)       Calculates the mean of a vector
- median(x)     Finds the median of a vector
- sd(x)         Calculates the standard deviation of a vector
- IQR(x)        Calculates the interquartile range of a vector
- skewness(x)   Calculates the skewness of a vector. You need to source the “num.sum.funcs.R” script to run this
- YKi(x)        Calculates the Yule-Kendall Index of a vector.
You need to source the “num.sum.funcs.R” script to run this.

Grading:
- 30% for a script that i) runs and ii) completes all required tasks. 5% deducted for each line of script that produces a crash/error message.
- 15% for including comments that make it easy to understand how the script works
- 15% for formatting your script in the requested manner.
   - Should be named appropriately
   - Must be easy to read for your TA
- 40% for your written interpretations

Tasks
Submit an R script that does the following:
1. Loads your data set
2. For each observation, assigns one of the following categories:
   - Depth <= 0 m; flock on shore; call this S1
   - Depth > 0 m and Depth <= 20 m; flock near shore; call this S2
   - Depth > 20 m; flock off shore; call this S3
3. Calculates the probability that flocks are found: a) on shore, b) near shore, c) off shore. These probabilities will simply be the number of flocks found at a given depth, divided by the number of observations we have for each species (Pr{S1}, Pr{S2}, Pr{S3})
4. Applies the custom specsplit function to create a list of seven datasets, one for each duck species (or species grouping)
5. Calculates the conditional probability of finding a flock on shore given that it is each one of the species (or species groups). Hint: you will calculate 7 different conditional probabilities here (Pr{S1 | species})
6. Calculates the conditional probability of finding a species at risk (near threatened or vulnerable) given it is found on shore (Pr{at risk | S1})
7. Reports (prints) a paragraph comparing the probability of finding flocks on shore by species, versus the likelihood of finding a species at risk if looking on shore.
8. Proves Bayes’ Theorem applies to your duck data set; that is, shows that:

       Pr{at risk | S1} = (Pr{S1 | at risk} Pr{at risk}) / Pr{S1}

9. Plots a histogram of the depths of all duck flocks
10. Calculates (and reports) the following descriptive statistics for the depth data for all duck flocks.
HINT: You will need to source the num.sum.funcs.R script for some of these calculations:
   a. Mean
   b. Median
   c. Standard deviation
   d. IQR
   e. Skewness
   f. Yule-Kendall Index
11. Creates three random subsamples of the depths of all duck flocks, storing them as objects. Each should have a length of 100 (n = 100)
12. Calculates the same set of descriptive statistics for each subsample:
   a. Mean
   b. Median
   c. Standard deviation
   d. IQR
   e. Skewness
   f. Yule-Kendall Index
13. For each statistic, saves the results for your 3 subsamples as a new (3 element) vector. So you’ll have a vector with your 3 subsample means, another for your 3 subsample medians, etc.
14. Reports the RANGE of your subsampled statistics, and prints comments on the relative i) agreement and ii) resistance of:
   - Location measures (a-b)
   - Spread measures (c-d)
   - Symmetry measures (e-f)
15. Prints a paragraph commenting on the uncertainty of the statistics calculated as part of question 10
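
The probability tasks above (Eq. 1, conditional probability, and the Bayes check) can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: the data frame toy, its ten rows, and every variable name below are invented for the example and are not part of the lab's duck dataset; only the depth thresholds and the at-risk species codes (BLSC, COEI, LTDU) come from the text.

```r
# Toy data, invented purely for illustration (NOT the lab's duck dataset)
toy <- data.frame(
  depth   = c(-1, 0, 5, 12, 25, 30, 0, 18, 40, -2),
  species = c("BLSC", "SUSC", "COEI", "LTDU", "WWSC",
              "SUSC", "LTDU", "BLSC", "SCOT", "WWSC")
)

# Mark each observation with its depth event (1 = S1, 2 = S2, 3 = S3)
evnts <- toy$depth * NA
evnts[toy$depth <= 0] <- 1
evnts[toy$depth > 0 & toy$depth <= 20] <- 2
evnts[toy$depth > 20] <- 3

# Logical vectors: TRUE where the event holds for that observation
S1      <- evnts == 1
at_risk <- toy$species %in% c("BLSC", "COEI", "LTDU")

# Eq. 1: number of times the event occurs / number of observations
n          <- nrow(toy)
pr_S1      <- sum(S1) / n
pr_at_risk <- sum(at_risk) / n

# Conditional probabilities: restrict the denominator to the conditioning event
pr_risk_given_S1 <- sum(at_risk & S1) / sum(S1)
pr_S1_given_risk <- sum(at_risk & S1) / sum(at_risk)

# Bayes check: both sides should agree up to floating-point rounding
bayes_rhs <- pr_S1_given_risk * pr_at_risk / pr_S1
print(all.equal(pr_risk_given_S1, bayes_rhs))
```

Summing a logical vector counts its TRUE entries, which is why sum(S1) gives the number of on-shore observations without needing which. Comparing with all.equal rather than == avoids false mismatches caused by floating-point rounding.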

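The subsampling tasks can be sketched in the same spirit. The vector depth_toy below is an invented, right-skewed stand-in generated with rexp, not the real depth data, and the seed is arbitrary. Skewness and the Yule-Kendall Index are left out because they depend on the course's num.sum.funcs.R script, which is not reproduced here.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is repeatable

# Invented right-skewed stand-in for the depth column (assumption)
depth_toy <- rexp(1000, rate = 1 / 15)

# Three random subsamples of length 100, drawn without replacement (the default)
subs <- list(
  sample(depth_toy, 100),
  sample(depth_toy, 100),
  sample(depth_toy, 100)
)

# One 3-element vector per statistic, one entry per subsample
sub_means   <- sapply(subs, mean)
sub_medians <- sapply(subs, median)
sub_sds     <- sapply(subs, sd)
sub_iqrs    <- sapply(subs, IQR)

# range() returns the min and max; diff(range()) gives the width of that range
print(range(sub_means))
print(diff(range(sub_means)))
print(diff(range(sub_medians)))
```

Holding the subsamples in a list keeps the three objects together so that sapply can apply each statistic in one call; storing them as three separately named objects would work just as well.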