File a4_stateMicromaps
By Dr. Carr

Sections
1. Introduction
2. Dot plots with precalculated 95% confidence bounds
3. Change plots
   Arrows and percent change
4. Boxplots
5. stateMicromap arguments
6. Getting Data for Different Cancers
7. State estimates and county variability
8. Comments about possible class projects related to micromaps

Due
Plots from 2, 3, 4, 6, 7

Setup

You may already have most, if not all, of the panel functions
in panelLayoutObjects. It probably won't hurt and may help
to reinstall them as indicated below.
The stateMicromapObjects file contains new functions and data.

Get the two Object files from the class schedule, put them in
your working directory, and then load them.

## Run
load("panelLayoutObjects.txt")
load("stateMicromapObjects.txt")
##End

The R save() function will write the objects named in a list to a file
that the R load() function can read. While the file extension used
here is .txt, these files are meant for R to read, not for humans.
It is my current understanding that these files can be read on
different operating systems. Students using operating systems other
than Windows should contact me if there is a problem. I can easily
supply all of the functions in text files and, with a little work,
the data in comma delimited files.

## Run
load("stateMicromapObjects.txt")
load("panelLayoutObjects.txt")
##End

Data for 2-4

Get the data sets from the class schedule and put them in your
working directory.

Statelungwm50_69.txt
Statelungwf50_69.txt
Statelungwm70_94.txt
Statelungwf70_94.txt
Countylungwm70_94.txt
Countylungwf70_94.txt

1. Introduction

Constructing linked micromap (LM) plots can take a lot of work.
Many people who have use for LM plots don't know R and have no
plans to use R.

Staff at the National Cancer Institute (NCI) liked my LM plots
and built them into an applet used in the web site.
See statecancerprofiles.cancer.gov/micromaps

NCI developed a Java application to simplify production of
row-labeled plots and LM plots.
This was for SEER registrars' quality control efforts.
The QC application inputs comma delimited data files and shape files
(for micromaps). This application is being well received by people
who would not dream of learning a scripting language like R.
I hope to get an updated version that students can use.

This assignment involves use of my R function that partially
automates the production of LM plots. This particular function
addresses the special case of showing the 50 US States plus the
District of Columbia. Many federal agencies produce reports for
these 51 entities, so it is reasonable to produce a special design
for this situation.

Constructive comments to
  1) clarify layout specification via data.frame
  2) clarify other function arguments
  3) improve the plot appearance
  4) add more options
are appreciated.

Exercises 2-4 fix up the data and show different variations
of state micromaps.
Exercise 5 is to read the description of the stateMicromap arguments.
Some are needed for 7.
Exercise 6 is to get data for a different type of cancer
and produce the given plot.
Exercise 7 is to combine pieces from 2 and 4.

2.
Dot plots with precalculated 95% confidence bounds__________

# 50 means 50-69
# 70 means 70-94

##Run

wmlung70 = read.table('Statelungwm70_94.txt',
  header=T,sep=';',row.names=1)
wmlung70US = wmlung70[1,]   # obtain
wmlung70 = wmlung70[-1,]    # and remove US value

wflung70 = read.table('Statelungwf70_94.txt',
  header=T,sep=';',row.names=1)
wflung70US = wflung70[1,]   # obtain
wflung70 = wflung70[-1,]    # and remove US values

# merge the two data sets
wmflung70 = merge(wmlung70,wflung70,by="row.names")
row.names(wmflung70) = wmflung70$Row.names

panelDesc = data.frame(
  type=c('maptail','id','dotconf','dotconf'),
  lab1=c('','','White Males','White Females'),
  lab2=c('','','Rate and 95% CI','Rate and 95% CI'),
  lab3=c('','','Deaths per 100,000','Deaths per 100,000'),
  col1=c(NA,NA,2,9),
  col2=c(NA,NA,4,11),
  col3=c(NA,NA,5,12),
  ref=c(NA,NA,NA,wflung70US[,1]))

#pdf(file="a6DotConf.pdf",width=7.5,height=10)
windows(width=7.5,height=10)
stateMicromap(wmflung70,panelDesc,sortVar=2,
  title=c('Lung Cancer Mortality Rates',
          '1970-1994'))
#dev.off()

##End

3.
Change plots
   Arrows and percent change

# 50 means 50-69
# 70 means 70-94

##Run

wmlung50 = read.table('Statelungwm50_69.txt',
  header=T,sep=';',row.names=1)
wmlung50US = wmlung50[1,]   # obtain
wmlung50 = wmlung50[-1,]    # and remove US value

# wmlung70 and wmlung70US come from the script in section 2
wmlung5070 = merge(wmlung50[,c(1,2)],wmlung70[,c(1,2)],by="row.names")
row.names(wmlung5070) = wmlung5070$Row.names

# percent change from the 1950-69 rate (column 2)
# to the 1970-94 rate (column 4)
percent = 100*(wmlung5070[,4]-wmlung5070[,2])/wmlung5070[,2]
wmlung5070 = data.frame(wmlung5070,percent=percent)

panelDesc = data.frame(
  type=c('map','id','arrow','bar'),
  lab1=c('','','Change','Percent Change'),
  lab2=c('','','1950-69 To 1970-94','1950-69 To 1970-94'),
  lab3=c('','','Deaths per 100,000','Percent'),
  col1=c(NA,NA,2,6),
  col2=c(NA,NA,4,NA),
  refval=unlist(c(NA,NA,NA,100*(wmlung70US[,1]-wmlung50US[1])/wmlung50US[1]))
)

#pdf(width=7.5,height=10,file="a6ArrowBar.pdf")
windows(width=7.5,height=10)
stateMicromap(wmlung5070,panelDesc,sortVar=2,
  title=c('Lung Cancer Mortality Rates',"White Males"))
#dev.off()

##End

4. Boxplots________________________________________________

##Run

# white males
wmlung70cnty = read.table('Countylungwm70_94.txt',
  header=T,sep=';',as.is=T)
wmlung70cntyUS = wmlung70cnty[1,]   # save
wmlung70cnty = wmlung70cnty[-1,]    # and remove US value
nam = c('fips','rate','deaths','lower','upper','us1','us2','us3')
names(wmlung70cnty) = nam
wmlung70cnty[1:5,]
# fips is a character string, not numeric as intended;
# the non-numeric "US" value causes the problem
wmlung70cnty$fips = as.numeric(wmlung70cnty$fips)

# white females
wflung70cnty = read.table('Countylungwf70_94.txt',
  header=T,sep=';',as.is=T)
wflung70cntyUS = wflung70cnty[1,]   # save
wflung70cnty = wflung70cnty[-1,]    # and remove US value
names(wflung70cnty) = nam
wflung70cnty[1:5,]
wflung70cnty$fips = as.numeric(wflung70cnty$fips)

# boxplot summaries for males______________________________

# The first two digits of county fips codes are state fips codes
state = floor(wmlung70cnty$fips/1000)

# boxplot of county values
wmboxlist =
  boxplot(split(wmlung70cnty$rate,state),plot=F)
wmboxlist$names   # the names are state fips codes

# convert fips codes to two letter state abbreviations
statefips = rlStateNamesFips   # Also used for females below
ab = statefips[match(as.integer(wmboxlist$names),statefips[,1]),2]
wmboxlist$names = ab

# extract the state median for sorting
wmMedian = wmboxlist$stats[3,]
names(wmMedian) = ab
wmMedian = as.data.frame(wmMedian)

# boxplot summaries for white females____________________

state = floor(wflung70cnty$fips/1000)
wfboxlist = boxplot(split(wflung70cnty$rate,state),plot=F)
ab = statefips[match(as.integer(wfboxlist$names),statefips[,1]),2]
wfboxlist$names = ab

# extract the state median for sorting
wfMedian = wfboxlist$stats[3,]
names(wfMedian) = ab
wfMedian = as.data.frame(wfMedian)

# Build a data frame with the state medians for the counties
wmfbcnty = merge(wmMedian,wfMedian,by='row.names')
row.names(wmfbcnty) = wmfbcnty$Row.names

# panel description and plot

panelDesc = data.frame(
  type=c('mapmedian','id','boxplot','boxplot'),
  lab1=c('','','White Male','White Female'),
  lab2=c('','','County Boxplots','County Boxplots'),
  lab3=c('','','Deaths per 100,000','Deaths per 100,000'),
  col1=c(2,2,2,2),   # only the first argument is used.
  boxplot=c('','','wmboxlist','wfboxlist'))

#!!! Note that the state data frame for stateMicromap
#    just has medians.
#    The bulk of the information comes from
#    wmboxlist and wfboxlist, whose names
#    are passed via panelDesc.

#pdf(width=7.5,height=10,file="a6Boxplots.pdf")
windows(width=7.5,height=10)
stateMicromap(wmfbcnty,panelDesc,sortVar=2,
  title=c('Lung Cancer Mortality Rates','1970 - 1994'))
#dev.off()

##End

5.
stateMicromap arguments

stateMicromap = function(
  stateframe,
  panelDesc,
  rowNames=c("ab","fips","full")[1],
  sortVar=NULL,
  ascend=T,
  title=c("",""),
  plotNames=c("ab","full")[2],
  colors=rlStateDefaults$colors,
  details=rlStateDefaults$details)
{

# stateframe   data.frame
#              rownames must be state abbreviations, names, or fips codes
# panelDesc    data.frame
#   The panel description data.frame provides specifics for each plot column
#   in each data.frame row. Different kinds of arguments are in the
#   panelDesc columns. The number of rows must match for all the
#   columns of the data frame. Thus a place holder, such as NA or "", is
#   needed when the argument is not used for some of the plot columns.
#   If the argument is not used in any column it can typically be omitted.
#   For example, if the plot has no boxplots, there is no need for
#   a boxplot column in the panelDesc data.frame.
#
#   The basic arguments are:
#     type:            The type of panel to use in each plot column: dot, bar, ...
#     lab1,lab2,lab3:  Two top labels and one bottom label for the column
#     col1,col2,col3:  The data.frame columns with the needed data.
#     refval:          Reference values, if any are used.
#     boxplot:         The location of boxplot summaries, if any
#
#   Example
#   panelDesc = data.frame(
#     type=c('map','id','dotconf','dotconf'),
#     lab1=c('','','White Males','White Females'),
#     lab2=c('','','Rate and 95% CI','Rate and 95% CI'),
#     lab3=c('','','Deaths per 100,000','Deaths per 100,000'),
#     col1=c(NA,NA,2,9),
#     col2=c(NA,NA,4,11),
#     col3=c(NA,NA,5,12),
#     refval=c(NA,NA,NA,wflungbUS[,1]),
#     boxplot=c('','','','') )
#
#   Each column in the data frame is a description vector
#
#   The first element in each description vector
#   applies to the first column in the plot.
#   The jth element in each description vector
#   applies to the jth column in the plot.
#
#   Several plot types do not utilize all of the
#   description vectors. Use NA as a place holder in these situations.
#   '' or "" can also be used for character strings
#
#   For example, a 'map' column already comes with labels
#   and does not need data. In the panel description above,
#   the first elements of the other description vectors
#   are either '' or NA.
#
#   The third column in the micromap will be a dotconf
#   column. That is, it will show an estimate plotted on lower
#   and upper confidence bounds that have been calculated.
#   The data for the values comes from columns 2, 4, and 5
#   in the data data.frame. This time the description
#   provides the labels.
#
#   type refers to the panel type; valid types are
#      "map", "mapTail", "mapMedian",
#      "id",
#      "dot", "dotse", "dotconf",
#      "bar", "arrow", "boxplot"
#
#      For non-highlighted contours
#        map accumulates states top to bottom
#        maptail accumulates states outside in
#        mapMedian features above-median states above the median and vice versa
#
#      bar will accept negative values and plot from 0 in that direction.
#
#   col1, col2, col3
#      numbers indicating stateframe columns to be used as data
#      Dot and bar plots require one variable: Supply the column number in col1
#
#      Dotse and arrow plots require two variables:
#        dotse needs estimates and standard errors
#        arrow needs beginning and ending values
#        Supply the two column numbers in col1 and col2 respectively
#
#      Dotconf requires 3 variables:
#        estimate, lower and upper bounds
#        Supply the column numbers in col1, col2, and col3 respectively
#
#   lab1, lab2
#      Two label lines appear at the top of columns. Use "" for blank labels
#
#   lab3
#      One label line to appear at the bottom of each column,
#      typically measurement units
#
#   refValues
#      name of objects providing reference values shown
#      as a line down the column
#
#   boxplot
#      names a list object with a boxplot for each state;
#      states must be labeled by their abbreviation.
#
#   Note: Descriptors may be omitted if none of the
#         panel plots need them.
#         Often refValues and boxplots can be omitted.
#
# rowNames    type of state id used as row.names in stateframe
#             default: "ab" for abbreviation
# plotNames   state label used in the plot
#             default: full name
# sortVar     a column subscript of stateframe to specify
#             the variable used in sorting.
#             Can be a vector of column subscripts to break ties
# title       vector with one or two character strings to use in the title.
# ascend      default: T sorts in ascending order
# colors      a color palette as a vector of strings
#               5 colors for states in a group of 5
#               1 color for the median state
#               1 foreground color for non-highlighted states in the map
# details     spacing, line widths and other details
#             see rlStateDefaults$details

The function rlStateSetDefaults creates the object rlStateDefaults.
This object controls many rendering details.

rlStateDefaults = rlStateSetDefaults()

You can edit the defaults directly or modify the function
and use it to produce the object. I edit the function in a
script file and source it. Both the function and the object
are available after loading the stateMicromapObjects.txt file.

6. Getting Data for Different Cancers_________________________________

Produce a plot like 2) above using data for a different cancer.
Get state data for the 1970-1994 time period for both males and females.
Choose a race, but make sure the male and female files match.
Fix the labels accordingly.

NCI Web Resources:
  Atlas of Cancer Mortality in the U.S. 1950-1994
  Data available for two time periods: 1950-1969, 1970-1994

Access: http://www3.cancer.gov/atlasplus/new.html

Directions
  Select Mortality Maps and Rates by Cancer
  Select Type of Cancer (Not Bronchus, Lung, Trachea, Pleura)
  Select Download Data
  Click the desired red button in the table
  Right click on Ascii Text further up the page
  Use save link as to put the file where you want it.

If the files are opened with Notepad, they look odd because of how
Notepad handles the line endings.
I right click on the files and usually open them with WordPad.
Then the files look fine as semicolon delimited files.
I then save the files from WordPad. After that they look
okay in Notepad as well. (Probably Notepad has an option
I could have used.)

Note that the first line is variable labels and the second
line is the US value. The scripts earlier removed the US
values, but saved them for use as a reference value.

7. State estimates and county variability

For the time period 1970-1994 and white males (or females)
make a 3-column stateMicromap plot:
  id in column 1,
  dotplot of state estimates with confidence bounds in column 2,
  boxplot of county values in column 3.
Produce it as a pdf file.

8. Comments about possible class projects related to micromaps

I have many components for a general LMplot version that
I would like to make available in an R package.
Producing a first cut at this could be the final project for
someone in this class.

Final projects can focus on different parts of the world.
I started working on the nations of Africa and the regions
of India. There are some issues related to regions and
boundary files. African nations have changed over time.
India has at least one disputed region, so the map might be
drawn differently depending on the target audience.

I will provide some help to people working to produce LMplots
for new regions or using new data. Getting boundary files
and data can be challenging.

At NCI I
  Produced legends and modifications that responded to
    usability feedback.
  Suggested and demonstrated adding ranks as a new panel,
    which met with a positive response.
  Envisioned alternative views that I called egocentric views.
Some are now available via the NCI applet.
I didn't have or take the time to really automate these
items in R or Splus functions. Working on these could be
a class project.

I am still looking for a better way to handle the data storage
for boxplots, and am open to constructive suggestions in areas
such as data handling, appearance, utility, and easier production.
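
As a reference point for the boxplot storage question above, the
following is a minimal sketch of what a stored boxplot summary looks
like. It uses the same boxplot(..., plot=F) call as section 4, but the
county fips codes and rates are made up for illustration (they are
assumptions, not values from the class data files), and the names cnty,
bx and medians are hypothetical.

```r
# Toy county data: hypothetical fips codes and rates, not real values.
cnty = data.frame(
  fips = c(1001, 1003, 1005, 2013, 2016, 2020),
  rate = c(50, 60, 70, 10, 20, 40))

# The leading digits of a county fips code give the state fips code.
state = floor(cnty$fips/1000)

# plot=F returns the summaries as a list without drawing anything.
bx = boxplot(split(cnty$rate, state), plot=F)

bx$names   # one name per group, here the state fips codes as strings
bx$stats   # one column per group, five rows:
           # lower whisker, lower hinge, median, upper hinge, upper whisker
bx$out     # outlying values, if any

# Row 3 of stats holds the group medians, as used for sorting in section 4.
medians = bx$stats[3,]
names(medians) = bx$names
medians
```

In section 4, lists of this kind (wmboxlist and wfboxlist) are the
objects that panelDesc refers to by name in its boxplot column.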