appendix c: analytical strategy for quantifying the effect of … · 2020. 3. 17. · neither...



Appendix C: Analytical Strategy for Quantifying the Effect of Facility and Operator Characteristics on Venting and Flaring Practices

In this appendix, I review the strategies behind the analysis of the types of facilities and operators most responsible for Texas oil and gas venting and flaring practices in 2012. In essence, this appendix provides more details about the methods and findings underlying the research presented in chapter 4.

C.1. Research Method

C.1.1. Units of Analysis, Population, and Sample

This study involves all producing Texas oil and gas extraction facilities that submitted a monthly production and disposition report in 2012 and that are within one mile of a Census block group for which at least one American Community Survey five-year summary file block group estimate was publicly released, as in Appendix B. In addition, this study also involves the companies with direct ownership of the oil and gas extraction facility (i.e., the operator). In 2012, there were 4,713 different operators in control of producing oil and gas extraction facilities.

C.1.2. Data Sources

In addition to the five different sources described in Appendix B, this study also relies on the following three sources: (1) additional Texas Railroad Commission datasets, (2) United States Energy Information Administration Interstate and Intrastate Pipeline Shapefile, and (3) Corporate Structure Information on LexisNexis and Google.

C.1.2.1. Additional Texas Railroad Commission Datasets

C.1.2.1.1. Organization Report (P-5)

The Texas Railroad Commission Organization Report (P-5) dataset provides information on all organizations that have completed Form P-5, which is required to legally engage in the oil and gas extraction business in Texas. Since 1981, organizations directly involved in oil and gas activities in Texas, including organizations involved in drilling, operating, or producing any oil or gas well, have been required to file an organization report, Form P-5. This dataset is ideal because, to my knowledge, it is the only dataset that provides researchers with the capacity to link production and disposition at individual gas wells to specific operating companies.

C.1.2.1.2. 2012 Inspection Extract

An extract of all inspections conducted by the Texas Railroad Commission in 2012 was received on June 25, 2016. This is an ideal dataset because it provides the most comprehensive information on inspection activities and when facilities violate state regulations. Since the state (not the federal government) is primarily responsible for regulating oil and gas extraction facilities, state regulatory activity is critical to the analysis.

C.1.2.1.3. 2012 Permit Extract

An extract of all venting and flaring permits granted by the Texas Railroad Commission in 2012 was received on August 10, 2015. It includes information regarding approved flaring permits. This is an ideal dataset because it is maintained by the agency responsible for approving and tracking permits to vent and flare gas in Texas.


C.1.2.2. United States Energy Information Administration Intrastate and Interstate Natural Gas Pipeline Shapefile

A shapefile of the natural gas interstate and intrastate pipelines as of January 1, 2012 is publicly available for download at www.eia.gov/maps/layer_info-m.cfm. The EIA collected this dataset from the Federal Energy Regulatory Commission (FERC). This dataset is ideal because it provides the most extensive map of all natural gas pipelines in the continental United States. Like the Census TIGER/Line shapefile, this shapefile uses the NAD83 datum.

C.1.2.3. Corporate Structure Information on Lexis Nexis and Google

The Texas A&M University Sociology Department Graduate Research Award supported an outstanding undergraduate student, Garrison Reed Barrilleaux, in collecting corporate structure information on the operators identified in the Texas Railroad Commission Organization Report Form. First, using the operator names listed in the Texas Railroad Commission Organization Report Form, the student identified operators listed in the LexisNexis Corporate Affiliations Database. Then, operators not identified in LexisNexis were searched on Google. All companies that could not be found in LexisNexis Corporate Affiliations but were found on Google were identified as private companies. Operators identified through neither Google nor LexisNexis Corporate Affiliations are assumed to be small private companies or trusts without a multilayered subsidiary form. The procedures for collecting this information are as follows.

C.1.2.3.1. LexisNexis Corporate Affiliations Data Collection Procedures

I. Set Up Excel Document for Data Collection
    A. Open the Excel document 32-operator_20160516.xls
    B. In cell I:1 type opNotes
    C. In cell J:1 type opType
    D. In cell K:1 type opTicker
    E. In cell L:1 type subsidLevel
    F. In cell M:1 type parTicker
    G. In cell N:1 type parName
    H. In cell O:1 type parSubsid
    I. In cell P:1 type otherNotes
        i. If you ever need to note something about your search that is not listed in this document, put your note in this column.
    J. Right click cell B:1
    K. Select Sort A to Z
    L. Save the document as 32-operatorMatch_InProgress.xls.
        i. Always keep a backup of this data by saving it on the computer you are using and in our GoogleDrive project folder.
        ii. To ensure accuracy, make sure the .xls has autocorrect turned off. If you do not know how to turn off autocorrect, Google how to turn off autocorrect in your version of Excel.
II. Go to LexisNexis Corporate Affiliations database
    A. Go to A&M Libraries website at http://library.tamu.edu/
    B. Type lexisnexis corporate affiliations in the search box and select the search button
    C. Click LexisNexis Corporate Affiliations
        i. If there is an issue with accessing the database, contact the library at http://askus.library.tamu.edu/ . Be quick to ask for help.


III. Collect Data
    A. If the row has new company name information:
        i. Copy the name in the cell for operator_name
        ii. Paste the parent company name information in the company search bar and select the search button
            1. If LexisNexis says No Records Match, try shortening the company name (e.g., deleting strings like “USA”, “CORP”, “LTD”, “LLC”, “INC”, “LP”, or “CO”)
                a. If LexisNexis still says No Records Match:
                    i. type notFound in the row’s opNotes cell
                    ii. type . in the row’s remaining cells
                    iii. move on to collect data for the next row
            2. If LexisNexis finds more than one company, use the information in the parState, parZip, or parAddress column to find the correct company.
                a. If you cannot narrow it down, often the companies you can narrow it down to are all a part of the same ultimate parent company (this is the case if the companies you narrow it down to all have the same Hierarchy/Family Role name). If this is the case, type estNotUnique in the row’s opNotes cell, type subsidiary in the opType cell, type . in the opTicker, subsidLevel and parSubsid cells, and move on to collect the ultimate parent company information
        iii. Once the company is found, collect the ultimate parent company information:
            1. Record company match found
                a. Type found in the row’s opNotes cell
            2. Find operating company 2012 information
                a. Select the company name listed on the first column on the left
                b. On the new screen, in the Historical Data scroll box, select 2012 and wait for the data to load
            3. Record operating company type
                a. Record the Company Type listed for 2012 in the opType cell
                    i. Type parent in the row’s opType cell if the company type is a parent company
                    ii. Type subsidiary in the row’s opType cell if the company type is a subsidiary
                    iii. Type private in the row’s opType cell if the company type is a private company
                        1. If the company type is private, there will be no corresponding ticker symbol, subsidiary level, parent company ticker, parent subsidiary levels or parent company name information. As such, type . in the row’s remaining cells and move on to collect information for the next operating company.
            4. Record the operating company ticker symbol information
                a. Scroll down to the Industry/Other table
                b. Find the Ticker Symbol 0 row


                    i. If there is not a Ticker Symbol 0 row, or if there is no information in the Ticker Symbol 0 row, type . in the opTicker cell.
                    ii. If there is a Ticker Symbol 0 row, copy the ticker symbol information and paste it in the opTicker cell.
            5. Record operating company subsidiary level information
                a. If the company type is parent, type 0 in the operating company’s subsidLevel cell and move on to collect remaining parent company information
                b. If the company is a subsidiary, count the number of subsidiary levels between the operating company and the ultimate parent company. A subsidiary level is designated by a dotted line and dash after an entity with the code S. For example:

                   [Figure omitted: example corporate hierarchy with each "Subsidiary Level" labeled]

                   In the example, if the operating company were Concrete, Inc. the number of subsidiary levels is 3. If the operating company were Knife River Corporation, the number of subsidiary levels is 2. Once you find the number of subsidiary levels between the operating company and its parent, type the number in the row’s subsidLevel cell.
            6. Record operating company ultimate parent company name


                a. Look in the Ultimate Parent Name row in the Company/Financial Table.
                    i. If there is not an Ultimate Parent Name row, or if there is no information in the Ultimate Parent Name row, type . in the parName cell.
                    ii. If there is an Ultimate Parent Name row, copy the ultimate parent company name information and paste it in the parName cell.
            7. Record operating company ultimate parent company ticker information
                a. If the operating company is the parent company, re-record the operating company ticker symbol information in the parTicker cell (i.e., for operating companies that are ultimate parent companies, like Exxon Mobil, the parTicker cell and the opTicker cell should have the same information).
                b. If the operating company is a subsidiary, find the ultimate parent company information by looking at the corporate hierarchy graph on the bottom of the page, and clicking on the ultimate parent company (i.e., the company on the top of the hierarchy with the symbol P). Wait for the ultimate parent company information to load.
                c. Once the 2012 ultimate parent company information loads, scroll down to the Industry/Other table and look at the Ticker Symbol 0 row
                    i. If there is not a Ticker Symbol 0 row, or if there is no information in the Ticker Symbol 0 row, type . in the parTicker cell.
                    ii. If there is a Ticker Symbol 0 row, copy the ticker symbol information and paste it in the parTicker cell.
            8. Record ultimate parent company subsidiary levels.
                a. Count the total subsidiary levels within the ultimate parent company. Drawing from the example image in step 5 above, the value for every operating company within the ultimate parent company MDU Resources Group would be 3, since that is the number of subsidiary levels within the organization.
                b. Once you find the number of subsidiary levels within the operating company’s ultimate parent company organization, type the number in the row’s parSubsid cell.
IV. Save Collected Data
    A. Save the dataset as 32-operatorMatch_COMPLETE_YYYYMMDD.xls where YYYY is the year, MM is the month, and DD is the day you completed the data collection. In addition to saving the document on your computer, also save it in our Google Drive project folder.
    B. Email the dataset. Kate will replicate these procedures on a random sample of the collected dataset to verify the data was correctly collected.
    C. If necessary, Kate may ask you to go back and search for the operating company using other mechanisms, such as a Google Search. If this occurs, another set of data collection procedures will be established.
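The name-shortening step in the LexisNexis procedure (retrying a search after deleting strings such as "USA" or "LLC") was carried out by hand. A minimal Python sketch of the same idea follows. The function name, the choice to match suffixes only as whole words, and the handling of punctuation are assumptions made for illustration and are not part of the original procedure.

```python
# Suffix strings listed in the procedure; matching them as whole tokens is an assumption.
SUFFIXES = {"USA", "CORP", "LTD", "LLC", "INC", "LP", "CO"}


def shorten_company_name(name: str) -> str:
    """Return an upper-cased name with the listed corporate suffixes removed."""
    # Drop periods so "L.L.C." becomes "LLC"; turn commas into spaces.
    cleaned = name.upper().replace(".", "").replace(",", " ")
    tokens = cleaned.split()
    kept = [token for token in tokens if token not in SUFFIXES]
    # If every token was a suffix, fall back to the cleaned name unchanged.
    return " ".join(kept) if kept else " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical operator names, not taken from the P-5 dataset.
    for raw in ["Example Energy USA, Inc.", "Sample Oil & Gas Co.", "Demo Resources, L.L.C."]:
        print(raw, "->", shorten_company_name(raw))
```

A shortened name only widens the search; the match still has to be confirmed against the parState, parZip, or parAddress columns as described above.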


C.1.2.3.2. Google Data Collection Procedures

V. Set Up Excel Document for Data Collection
    M. Open the Excel document Finished-32-operator_Complete_20170705.xls
    N. In cell Q:1 type googleSearch
    O. In cell R:1 type googleOpNotes
    P. In cell S:1 type dateChecked
    Q. In cell T:1 type url1
    R. In cell U:1 type url2
    S. In cell V:1 type url3
    T. In cell W:1 type url4
    U. In cell X:1 type url5
    V. Sort cell I:1 opNotes by right clicking it and selecting Sort A to Z
    W. For all rows where opNotes is found, type 0 in the googleSearch column
    X. Hide all rows where opNotes is found
        i. If you ever need to note something about your search that is not listed in this document, put your note in this column.
    Y. Save the document as 32-operatorGoogleMatch_InProgress.xls.
        i. Always keep a backup of this data by saving it on the computer you are using and in our GoogleDrive project folder.
        ii. To ensure accuracy, make sure the .xls has autocorrect turned off. If you do not know how to turn off autocorrect, Google how to turn off autocorrect in your version of Excel.
VI. Go to Google Search Engine
    D. Go to https://www.google.com/
VII. Collect Data
    A. If the row has new company name information:
        iv. Copy the name in the cell for operator_name
        v. Paste the parent company name information in the company search bar and select the search button
        vi. Search the first page of the results to see if they have any information about whether the company is private or public. Stick to the first page of results so that we can maintain the same procedures for all companies. Keep an eye out for the Bloomberg.com results, as this website often has the information we need. You can tell if the company is a private company if the url has the word private in it, for example: https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=145709387 is a private company
            1. If no information is found
                a. re-google the company, this time the search should include the operating company name and the word “Bloomberg”
                b. If still no information is found:
                    i. type 1 in the row’s googleSearch cell
                    ii. type notFound in the row’s googleOpNotes cell
                    iii. type the date it was checked in the row’s dateChecked cell
                        1. use the yyyymmdd format


                        2. for example, if the information is collected on July 10, 2017, type 20170710 in the dateChecked cell
                c. move on to collect data for the next row
        vii. Once information about the parent company is found, collect the following information:
            1. Record company match found
                a. Type found in the row’s googleOpNotes cell
                b. Copy the url where the information was found and paste it in the row’s url1 cell
                c. type the date it was checked in the row’s dateChecked cell
            2. Record operating company type
                a. Record the Company Type listed for 2012 in the opType cell
                    i. Type parent in the row’s opType cell if the company type is a parent company
                    ii. Type subsidiary in the row’s opType cell if the company type is a subsidiary
                    iii. Type private in the row’s opType cell if the company type is a private company
                        1. If the company type is private, there will be no corresponding ticker symbol, subsidiary level, parent company ticker, parent subsidiary levels or parent company name information. As such, type . in the row’s remaining cells and move on to collect information for the next operating company.
            3. Record the operating company ticker symbol information in the opTicker cell.
                a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
                b. If you obtain information about the operating company’s ticker information from a different url from where you found out the company’s type (i.e., if it is private/parent/subsidiary), record this url in the url2 column
            4. Record operating company subsidiary level information in the subsidLevel column.
                a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
                b. If you obtain information about the operating company’s subsidiary level from a different url from where you found out the company’s type (i.e., if it is private/parent/subsidiary) and/or ticker information, record this url in the url2 or url3 column
            9. Record operating company ultimate parent company ticker information in the parTicker cell
                a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
                b. If you obtain information about the ultimate parent company’s ticker from a different url from where you found out the company’s type (i.e., if it is private/parent/subsidiary), ticker information, and/or subsidiary level, record this url in the url2, url3, or url4 column
            10. Record ultimate parent company subsidiary levels in the row’s parSubsid cell
                a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
                b. If you obtain information about the ultimate parent company’s subsidiary levels from a different url from where you found out the company’s type (i.e., if it is private/parent/subsidiary), ticker information, subsidiary level and/or parent ticker, record this url in the url2, url3, url4, or url5 column
VIII. Save Collected Data
    A. Save the dataset as 32-operatorGoogleMatch_COMPLETE_YYYYMMDD.xls where YYYY is the year, MM is the month, and DD is the day you completed the data collection. In addition to saving the document on your computer, also save it in our Google Drive project folder.
    B. Email the dataset. Kate will replicate these procedures on a random sample of the collected dataset to verify the data was correctly collected.
    C. If necessary, Kate may ask you to go back and search for the operating company using other mechanisms, such as a Google Search. If this occurs, another set of data collection procedures will be established.
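Two of the Google steps above are mechanical enough to sketch in Python: checking whether a result URL contains the word private, and writing the dateChecked value in yyyymmdd format. The function names are hypothetical, and requiring "private" to be a whole path segment (rather than appearing anywhere in the URL) is a stricter reading than the procedure states; the original checks were done by eye.

```python
from datetime import date
from urllib.parse import urlparse


def looks_private(url: str) -> bool:
    """True if 'private' is one of the path segments of the URL (assumed rule)."""
    path = urlparse(url).path.lower()
    return "private" in path.split("/")


def date_checked(day: date) -> str:
    """Format a date as yyyymmdd, the format required for the dateChecked cell."""
    return day.strftime("%Y%m%d")


if __name__ == "__main__":
    example = "https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=145709387"
    print(looks_private(example))
    print(date_checked(date(2017, 7, 10)))
```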

C.1.3. Variables and Measurements

Variables and measures are described in the table below.

Figure C.1: Variables and Measures for Analysis Presented in Chapter 4

Variables | Facility-Level Measure
Venting/Flaring Participation | 1 = Facility vented or flared in 2012; 0 = Not
Venting/Flaring Magnitude | Log(100 * [Volume (in mcf) of gas well or casinghead gas vented or flared at the facility] / [Volume (in mcf) of gas well or casinghead gas produced at the facility])
Income | Median income category of households in block groups within one mile of facility: 1: Less than $10K; 2: $10K – $14,999.99; 3: $15K – $19,999.99; 4: $20K – $24,999.99; 5: $25K – $29,999.99; 6: $30K – $34,999.99; 7: $35K – $39,999.99; 8: $40K – $44,999.99; 9: $45K – $49,999.99; 10: $50K – $59,999.99; 11: $60K – $74,999.99; 12: $75K – $99,999.99; 13: $100K – $124,999.99; 14: $125K – $149,999.99; 15: $150K – $199,999.99; 16: $200K or more
Home Values | Median owner-occupied home value category of households in block groups within one mile of facility: 1: Less than $10K; 2: $10K – $14,999.99; 3: $15K – $19,999.99; 4: $20K – $24,999.99; 5: $25K – $29,999.99; 6: $30K – $34,999.99; 7: $35K – $39,999.99; 8: $40K – $49,999.99; 9: $50K – $59,999.99; 10: $60K – $69,999.99; 11: $70K – $79,999.99; 12: $80K – $89,999.99; 13: $90K – $99,999.99; 14: $100K – $124,999.99; 15: $125K – $149,999.99; 16: $150K – $174,999.99; 17: $175K – $199,999.99; 18: $200K – $249,999.99; 19: $250K – $299,999.99; 20: $300K – $399,999.99; 21: $400K – $499,999.99; 22: $500K – $749,999.99; 23: $750K – $999,999.99; 24: $1 million or more
Poor | 100 * Number of households living at or below the poverty line in block groups within one mile of the facility / Number of households living within one mile of the facility
Poor Education | 100 * Number of individuals 25 and older without a high school diploma living in a block group within one mile of the facility / Number of individuals 25 and older residing in a block group within one mile of the facility
Limited English | 100 * Number of households with limited or no English fluency living in block groups within one mile of the facility / Number of households living within one mile of the facility
Population Density | Number of individuals living in block groups within one mile of the facility / Land area of block groups within one mile of the facility (in square miles)
Nonprofit Organizations | Number of registered nonprofits in the county in which the facility is located
Black | 100 * Number of non-Hispanic black individuals residing in block groups within one mile of the facility / Number of individuals residing in a block group within one mile of the facility
Hispanic | 100 * Number of Hispanic individuals residing in block groups within one mile of the facility / Number of individuals residing in a block group within one mile of the facility
Permitted | 1 = Facility had permit to legally vent or flare in 2012; 0 = Not
Violation | Number of violations facility received for venting or flaring in 2012
Oil/Condensate Production | Volume (in barrels) of oil or condensate produced at facility, squared
Gas/Casinghead Production | Volume (in mcf) of gas or casinghead produced at the facility, squared
New Well | 1 = Facility drilled new wells in 2012; 0 = Not
Gas Well | 1 = Facility is a gas well; 0 = Facility is an oil lease
Well Density | Number of other active wells within one mile of the surface locations of active wells on the lease
Wells on Lease | Number of wells active on lease
Distance to Nearest Pipeline | Nearest distance (in feet) between the surface locations of wells active on the lease and a natural gas pipeline built by January 1, 2012
Operator Gas Production | Volume of gas well gas and casinghead gas produced at the operator’s facilities (in thousand cubic feet)
Operator Oil Production | Volume of condensate and oil produced at the operator’s facilities (in barrels)
Operator Wells | Number of active wells directly owned by the operator
Operator Gas Ratio | Volume (in mcf) of gas well gas produced by facilities directly owned by the operator / Volume of petrochemicals (barrels of oil, barrels of condensate, mcf of casinghead, and mcf of gas well gas) produced by facilities directly owned by the operator
Multilayered Subsidiary Form | 1 = Facility operator is subsidiary organization; 0 = Not
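As a rough illustration of how the two dependent variables and the percentage measures in Figure C.1 could be computed, the Python sketch below follows the formulas in the table. The function names are hypothetical, the natural logarithm is an assumption (the table says only "Log"), and returning None for facilities with no venting, flaring, or production is an illustrative choice, not the author's stated handling.

```python
import math
from typing import Optional


def participation(vented_flared_mcf: float) -> int:
    """1 if the facility vented or flared any gas in the year, else 0."""
    return 1 if vented_flared_mcf > 0 else 0


def magnitude(vented_flared_mcf: float, produced_mcf: float) -> Optional[float]:
    """Log of the percentage of produced gas that was vented or flared.

    Natural log is assumed. Undefined (None) when nothing was vented or
    flared, or when nothing was produced.
    """
    if vented_flared_mcf <= 0 or produced_mcf <= 0:
        return None
    return math.log(100.0 * vented_flared_mcf / produced_mcf)


def percent(part: float, whole: float) -> Optional[float]:
    """100 * part / whole, the form shared by Poor, Black, Hispanic, etc."""
    if whole <= 0:
        return None
    return 100.0 * part / whole


if __name__ == "__main__":
    # Hypothetical facility: 50 mcf flared out of 1,000 mcf produced.
    print(participation(50.0), magnitude(50.0, 1000.0), percent(25, 200))
```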

C.1.4. Connecting Texas Railroad Commission Datasets

In addition to connecting the Texas Railroad Commission datasets as described in Appendix B, the following steps were also taken. First, I parsed the Organization Report dataset and connected it to the production data query dump using the operator number. Then I connected both the permit extract and the inspection extract to the production data query dump using the district number, lease name, and operator number.
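The two joins described above can be sketched in plain Python. This is only an illustration of the key structure (operator number for the Organization Report; district number, lease name, and operator number for the permit and inspection extracts). All field names and sample records are hypothetical, and the actual Railroad Commission files use their own layouts.

```python
from collections import Counter


def lease_key(row):
    """Composite key used for the permit and inspection extracts."""
    return (row["district_no"], row["lease_name"], row["operator_no"])


def link_records(production, operators, permits, inspections):
    """Attach operator, permit, and inspection information to production rows."""
    # Organization Report (P-5) rows are matched on operator number alone.
    operator_names = {op["operator_no"]: op["operator_name"] for op in operators}
    permitted = {lease_key(p) for p in permits}
    inspection_counts = Counter(lease_key(i) for i in inspections)

    linked = []
    for row in production:
        key = lease_key(row)
        linked.append({
            **row,
            "operator_name": operator_names.get(row["operator_no"]),
            "permitted": int(key in permitted),
            "inspections": inspection_counts[key],  # 0 when no inspection matched
        })
    return linked


if __name__ == "__main__":
    production = [
        {"district_no": "08", "lease_name": "SMITH UNIT", "operator_no": "123456"},
        {"district_no": "01", "lease_name": "JONES", "operator_no": "654321"},
    ]
    operators = [{"operator_no": "123456", "operator_name": "EXAMPLE OPERATING CO"}]
    permits = [{"district_no": "08", "lease_name": "SMITH UNIT", "operator_no": "123456"}]
    inspections = permits * 2
    for record in link_records(production, operators, permits, inspections):
        print(record)
```

Production rows with no match keep a missing operator name, a permit flag of 0, and an inspection count of 0, which mirrors a left join on the production data.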

C.1.5. Connecting Texas Railroad Commission Information with Other Datasets

To connect facility points to the nearest pipeline, I built upon the Geographic Information System described in Appendix B. I started by adding the 2012 United States Energy Information Administration Interstate and Intrastate Natural Gas Pipeline shapefile to the geodatabase and projecting it to the North American Datum 1983 (NAD83) State Plane Texas Central (FIPS 4203) coordinate system. Then, I used the nearest distance tool to find the nearest distance (in feet) between facility wellbore surface locations and pipelines established as of January 1, 2012.
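The nearest-distance calculation was done with a GIS tool. The geometry behind it can be sketched in Python under the assumption that wells and pipelines are already in the same projected coordinate system with units of feet, so that planar distance is meaningful. The function names and coordinates below are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_pipeline_distance(well, pipelines):
    """Smallest distance from a well to any segment of any pipeline polyline."""
    return min(
        point_segment_distance(well, line[i], line[i + 1])
        for line in pipelines
        for i in range(len(line) - 1)
    )


if __name__ == "__main__":
    # One hypothetical pipeline running east-west, coordinates in feet.
    pipelines = [[(-100.0, 0.0), (100.0, 0.0)]]
    print(nearest_pipeline_distance((0.0, 300.0), pipelines))
    print(nearest_pipeline_distance((200.0, 0.0), pipelines))
```

For a lease with several active wells, the measure in Figure C.1 would correspond to the minimum of this value across the lease's well surface locations.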


C.1.6. Data Analysis

This research uses a two-part (hurdle) model to determine correlations between facility and operator characteristics and both (1) whether, among all producing oil and gas extraction facilities, the facility vented or flared (i.e., participation), and (2) the venting and flaring rate among facilities that vented or flared (i.e., magnitude). The final model accounts for the clustering of standard errors by facility operator. A two-part model accounting for the clustering of standard errors by facility operator was chosen over a multi-level model for two reasons: (1) there is not enough variation at level one to run a multi-level regression model, and (2) multi-level regression model outcomes are very similar to regression model outcomes that account for the clustering of standard errors. Using Stata’s vce(cluster) option, I use Huber’s (1967) formula to produce consistent standard errors even though the data are clustered. Clustered sandwich variance estimators were produced, rather than simply sampling one facility for each operator, to ensure that operators with many facilities are not undersampled.
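For reference, the general form of a cluster-robust (sandwich) variance estimator can be written as follows. This is the textbook form, stated here as an aid to the reader; the notation ($\hat{V}$, $u_j$, $G$) is introduced for illustration and does not appear in the original analysis, and statistical software typically also applies a small-sample adjustment that is omitted here.

$$\hat{V}_{\text{cluster}} = \hat{V}\left(\sum_{g=1}^{G} u_g^{\top} u_g\right)\hat{V}, \qquad u_g = \sum_{j \in g} u_j$$

Here $\hat{V}$ is the conventional model-based variance estimate, $u_j$ is the row vector of score contributions from facility $j$, $u_g$ sums those contributions over all facilities owned by operator $g$, and $G$ is the number of operators. Summing within operators before taking outer products is what allows facilities of the same operator to be correlated.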

C.1.6.1. Participation Generalized Linear Model

The first part of the model (i.e., the participation model) investigates the direct effects of lease and operator characteristics on whether or not the lease vents or flares using the following equation:

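$$\log\left(\frac{\varphi_{1j}}{1-\varphi_{1j}}\right) = \gamma_0 + \sum_{k=1}^{K} \beta_k\left(M_{kj} - \bar{M}_k\right) + e_j, \quad \text{where } e_j \sim N(0, \sigma_e^2)$$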

In the full participation model above, $\varphi_{1j}$ denotes the probability that lease $j$ vented or flared; $\gamma_0$ denotes the average log odds that a lease will vent or flare; $\beta_k$ is the corresponding coefficient that represents the direction and strength of explanatory variable $k$ ($k$ indexes the $K$ explanatory variables at the lease level); $M_{kj}$ is the observation of explanatory variable $k$ for lease $j$, and $\bar{M}_k$ is the mean of explanatory variable $k$; $e_j$ represents the random error, which is assumed to be normally distributed with a mean of 0 and variance of $\sigma_e^2$.
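To make the link between the linear predictor and the probability concrete, the Python sketch below inverts the log-odds expression for a single lease, ignoring the error term. The function name and the coefficient values are hypothetical and are not estimates from this study.

```python
import math


def participation_probability(gamma0, betas, values, means):
    """Probability of venting or flaring implied by mean-centered log odds."""
    # Linear predictor: gamma0 + sum_k beta_k * (M_kj - mean of M_k).
    eta = gamma0 + sum(b * (m - m_bar) for b, m, m_bar in zip(betas, values, means))
    # Inverse of log(p / (1 - p)) = eta.
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # A lease at the mean of every explanatory variable: the probability
    # depends only on the intercept (hypothetical value of -3).
    print(participation_probability(-3.0, [0.4, -0.1], [2.0, 5.0], [2.0, 5.0]))
```

Because every explanatory variable is centered at its mean, a lease with average characteristics has log odds equal to $\gamma_0$.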

C.1.6.2. Magnitude Generalized Linear Mixed Model

The second part of the model (i.e., the magnitude model) investigates the direct effects of lease and operator characteristics on the venting or flaring rate for leases that vented or flared gas using the following equation:

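$$\log\left(E[\varphi_{2j} \mid \varphi_{2j} > 0]\right) = \gamma_0 + \sum_{k=1}^{K} \beta_k\left(M_{kj} - \bar{M}_k\right) + e_j, \quad \text{where } e_j \sim N(0, \sigma_e^2)$$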

In the full magnitude model above, $\varphi_{2j}$ denotes the venting or flaring rate at lease $j$; $\gamma_0$ denotes the average log venting or flaring rate of all leases that vented or flared; $\beta_k$ is the coefficient that represents the direction and strength of the association with explanatory variable $k$ ($K$ is the number of different explanatory variables in the model); $M_{kj}$ is the observation of explanatory variable $k$ for lease $j$, and $\bar{M}_k$ is the mean of explanatory variable $k$; $e_j$ represents the random error, which is assumed to be normally distributed with a mean of 0 and variance of $\sigma_e^2$.
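The two parts can be combined into a single expected rate, since the unconditional mean equals the probability of any venting or flaring multiplied by the mean rate among leases that vent or flare. The sketch below shows this, taking the log-link form of the magnitude equation at face value; if the magnitude model is instead fit by least squares on the logged rate, a retransformation adjustment would be needed. All inputs and function names are hypothetical and are not estimates from the study.

```python
from math import exp


def expected_rate(participation_log_odds, magnitude_log_mean):
    """Unconditional expected rate implied by a two-part model.

    E[rate] = P(rate > 0) * E[rate | rate > 0], with a logit link for the
    first factor and a log link for the second.
    """
    p_positive = 1.0 / (1.0 + exp(-participation_log_odds))
    mean_if_positive = exp(magnitude_log_mean)
    return p_positive * mean_if_positive


def percent_change(beta, delta=1.0):
    """Percent change in the conditional mean rate for a delta-unit increase."""
    return 100.0 * (exp(beta * delta) - 1.0)


# Hypothetical linear predictors for one lease (not estimates from the study).
rate = expected_rate(participation_log_odds=-3.0, magnitude_log_mean=-2.0)

# Hypothetical magnitude coefficient of -0.5 on some covariate.
change = percent_change(-0.5)
print(f"Expected rate: {rate:.4f}; percent change: {change:.1f}%")
```

Because the magnitude model is on the log scale, each coefficient acts multiplicatively on the conditional mean rate, which is why it is reported here as a percent change rather than an additive shift.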

C.2. Research Findings

C.2.1. Trends in Who Vents and Flares

Summary statistics for facility- and operator-level venting and flaring volumes and the characteristics of communities surrounding all producing oil and gas extraction facilities in 2012 are as follows.


C.2.1.1. Facility-Level Summary Statistics

Figure C.2: Facility-Level Analysis Summary Statistics

| Variable | N | Mean | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| VENTING AND FLARING PRACTICES | | | | | |
| Venting and Flaring Facilities | 126,862 | 0.05 | 0.22 | 0 | 1 |
| Venting and Flaring Rate (log) | 6,651 | -3.62 | 2.96 | -13.929 | 0 |
| COMMUNITY ECONOMIC STATUS | | | | | |
| Median Household Income | 126,862 | 9.65 | 1.94 | 1 | 15 |
| Median Owner Occupied Housing Value | 126,862 | 12.81 | 2.73 | 1 | 21 |
| Portion of Households Living At or Below the Poverty Line | 126,862 | 13.99 | 9.43 | 0 | 100 |
| COMMUNITY CULTURAL CAPITAL | | | | | |
| Portion of Residents 25 and Older Without High School Diploma | 126,862 | 19.74 | 10.84 | 0 | 78.553 |
| Portion of Households with Limited Fluency in the English Language | 126,861 | 4.91439 | 7.470736 | 0 | 44.85981 |
| COMMUNITY ORGANIZATION CAPACITY | | | | | |
| Population Density | 126,862 | 38.905 | 156.712 | 0.007 | 6,707.434 |
| Registered Nonprofit Organizations | 126,862 | 283.443 | 845.645 | 0 | 14,502 |
| COMMUNITY RACE AND ETHNICITY | | | | | |
| Portion of Residents that are non-Hispanic black | 126,862 | 4.211 | 7.436 | 0 | 88.075 |
| Portion of Residents that are Hispanic | 126,862 | 25.703 | 23.534 | 0 | 100 |
| STATE REGULATION | | | | | |
| Permitted to Vent or Flare | 126,862 | 0.004 | 0.066 | 0 | 1 |
| Venting or Flaring Violations | 126,862 | 0.002 | 0.068 | 0 | 9 |
| Lease Inspections | 126,862 | .2970866 | 1.177388 | 0 | 95 |
| FACILITY SIZE | | | | | |
| Oil and Condensate Produced (square) | 126,862 | 3.98E+09 | 4.66E+11 | 0 | 1.12E+14 |
| Gas and Casinghead Produced (square) | 126,862 | 4.5E+10 | 4.1E+12 | 0 | 9.84E+14 |
| FACILITY COMPLEXITY | | | | | |
| Facility Wellbores | 126,862 | 4.816 | 41.446 | 1 | 5,413 |
| ECONOMIC COSTS | | | | | |
| New Wellbores Established | 126,862 | 0.043 | 0.203 | 0 | 1 |
| Gas Well | 126,862 | .6078495 | .4882319 | 0 | 1 |
| Other Wellbores within One Mile | 126,862 | 43.567 | 5527.855 | 0 | 1,968,882 |
| Nearest Distance to Gas Pipeline | 126,862 | 9,398.80 | 17,562.53 | 0 | 176,658.40 |
| OPERATOR SIZE | | | | | |
| Operator Volume of Gas and Casinghead Produced | 126,862 | 1.12e+08 | 2.27e+08 | 0 | 7.61e+08 |
| Operator Volume of Oil and Condensate Produced | 126,862 | 2766057 | 5893063 | 0 | 4.27e+07 |
| OPERATOR COMPLEXITY | | | | | |
| Wellbores Controlled by Facility Operator (log) | 126,862 | 6.158 | 2.157 | 0 | 10.296 |
| OPERATOR GAS DEPENDENCE | | | | | |
| Gas Ratio | 126,862 | .8029547 | .2939673 | 0 | 1 |
| ORGANIZATIONAL STRUCTURE | | | | | |
| Subsidiary | 126,862 | 0.291 | 0.454 | 0 | 1 |
| SIZE INTERACTION TERM | | | | | |
| Facility gas/cond production volume x Operator gas/cond production volume | 126,862 | 9.85e+12 | 4.71e+13 | 0 | 2.34e+15 |

Figure C.3: Correlations Between Variables, Facility-Level Analysis

| | vent/flare | vent/flare rate (ln) | income | house value | poverty | uneducated | limited english |
|---|---|---|---|---|---|---|---|
| income | -0.0301* | 0.0352 | 1.000 | | | | |
| house value | -0.026* | -0.0504* | 0.471* | 1.000 | | | |
| poverty | -0.035* | -0.0871* | -0.469* | -0.404* | 1.000 | | |
| uneducated | 0.1312* | -0.0464* | -0.434* | -0.485* | 0.5512* | 1.000 | |
| limited english | 0.1616* | -0.0775* | -0.4706* | -0.303* | 0.3495* | 0.7545* | 1.000 |
| pop. density | -0.035* | -0.0601* | 0.0391* | 0.1341* | 0.0062 | -0.0615* | 0.0014 |
| nonprofits | -0.026* | -0.0865* | 0.0495* | 0.1597* | -0.0019 | -0.0276* | 0.0214* |
| black | -0.055* | 0.0472* | -0.155* | -0.032* | 0.0400* | -0.045* | -0.1431* |
| hispanic | 0.1612* | -0.0566* | -0.322* | -0.404* | 0.4608* | 0.8049* | 0.7789* |
| permit | 0.2083* | 0.2001* | -0.026* | -0.036* | 0.0117* | 0.0461* | 0.0472* |
| violation | 0.0282* | 0.0466* | -0.012* | -0.0067 | 0.0062 | 0.0036 | -0.0008 |
| oil/cond | 0.011* | -0.0384 | 0.0019 | -0.0027 | -0.0025 | 0.0059 | 0.0047 |
| gas/csgd | 0.0076* | -0.0224 | 0.0016 | -0.0019 | -0.0016 | 0.0021 | 0.0012 |
| new | 0.1588* | 0.0360 | 0.0054 | 0.0191* | -0.022* | 0.0393* | 0.0297* |
| gas well | -0.143* | -0.6385* | 0.0148* | 0.0517* | 0.1267* | -0.0734* | -0.0975* |
| well density | -0.0008 | -0.0619* | 0.0009 | 0.0013 | 0.0010 | 0.0002 | 0.0008 |
| wells | 0.0434* | 0.0191 | 0.0153* | 0.0004 | -0.019* | 0.0267 | 0.0172* |
| pipe distance | -0.007* | 0.1696* | -0.043* | -0.057* | -0.055* | -0.0226* | -0.0128* |
| oper. oil/cond | 0.2214* | 0.1003* | 0.1016* | 0.0504* | -0.0358* | 0.1007* | 0.0756* |
| op. csgd/gas | -0.075* | -0.3502* | 0.0967* | 0.0997* | -0.0504* | -0.1036* | -0.1157* |
| operator wells | 0.0801* | -0.2472* | 0.1532* | 0.1072* | -0.051* | 0.0328* | 0.0316* |
| gas ratio | -0.077* | -0.6451* | 0.0482* | 0.0924* | 0.0823* | -0.0370* | -0.0373* |
| subsidiary | 0.0162* | 0.1078* | 0.0895* | 0.0582* | -0.0469* | -0.0441* | -0.0460* |
| size interact | -0.019* | -0.1778* | 0.0173* | 0.0428* | -0.0321* | -0.0469* | -0.0389* |

| | pop. density | ngos | black | Hisp. | permit | violation | oil/cond | gas/csgd |
|---|---|---|---|---|---|---|---|---|
| pop dens | 1.000 | | | | | | | |
| ngos | 0.4844* | 1.000 | | | | | | |
| black | 0.1261* | 0.0775* | 1.000 | | | | | |
| hispanic | -0.0112* | 0.0238* | -0.208* | 1.000 | | | | |
| permit | -0.0138* | -0.016* | -0.015* | 0.0539* | 1.00 | | | |
| violation | 0.0009 | -0.0012 | -0.002 | 0.0045 | 0.00 | 1.000 | | |
| inspect. | 0.0216* | 0.0479* | 0.0016 | -0.017* | -0.01 | 0.1275* | | |
| oil/cond | -0.0014 | -0.0018 | -0.002 | 0.0063 | 0.00 | 0.0005 | 1.000 | |
| gas/csgd | 0.0000 | -0.001 | 0.0013 | 0.0034 | -0.00 | -0.0001 | 0.755* | 1.000 |
| new | -0.0013 | -0.014* | -0.027* | 0.0474* | 0.12* | 0.0037 | 0.01* | 0.01* |
| gas well | 0.0863* | 0.0593* | 0.1195* | -0.09* | -0.1* | -0.026* | -0.011* | -0.006 |
| well dens | 0.0001 | 0.0002 | 0.0025 | 0.0013 | -0.00 | -0.0001 | 0.0003 | 0.000 |
| wells | -0.0052 | 0.0073 | -0.015* | 0.0299* | 0.00 | 0.0142* | 0.158* | 0.058* |
| pipe dist | -0.0731* | -0.083* | -0.144* | -0.021* | 0.00 | 0.0346* | -0.003 | -0.003 |
| op oil/cond | -0.0244* | -0.036* | -0.085* | 0.1093* | 0.0866* | 0.0009 | 0.1335* | 0.0700* |
| op. csgd/gas | 0.0871* | 0.0772* | 0.0600* | -0.13* | -0.0197* | -0.012* | -0.0275* | 0.1029* |
| op wells | 0.0141* | 0.0066 | 0.0020 | 0.0257* | 0.0178 | -0.0072 | 0.0055 | 0.003 |
| gas ratio | 0.0448* | 0.0016 | 0.0977* | -0.059* | -0.03* | -0.026* | -0.0046 | -0.006 |
| subsidiary | 0.0285* | 0.0276* | 0.0286* | -0.056* | 0.0206* | -0.0075 | 0.0698* | 0.0026 |
| size interact | 0.0948* | 0.0515* | 0.0325* | -0.066* | -0.0074 | -0.0048 | 0.1283* | 0.4938* |

| | new | gas well | well dens | wells | pipe dist | oper. oil/cond | op csgd/gas | op wells | gas ratio | subsid |
|---|---|---|---|---|---|---|---|---|---|---|
| new | 1.000 | | | | | | | | | |
| gas well | -0.058* | 1.0000 | | | | | | | | |
| well dens | -0.000 | 0.0024 | 1.000 | | | | | | | |
| wells | 0.12* | -0.12* | 0.001 | 1.00 | | | | | | |
| pipe dist | -0.005 | -0.22* | -0.002 | 0.01 | 1.00 | | | | | |
| op oil/cond | 0.1335* | -0.166* | -0.0001 | 0.0730 | -0.1* | 1.00 | | | | |
| op csgd/gas | -0.0275* | 0.3665* | 0.0001* | -0.04* | -0.1* | 0.1396* | 1.00 | | | |
| op wells | 0.062* | 0.063* | 0.0027 | 0.05* | -0.1* | 0.5068* | 0.5394* | 1.000 | | |
| gas ratio | -0.061 | 0.775* | 0.0024 | -0.1* | -0.2* | -0.02* | 0.2976* | 0.063 | 1.000 | |
| subsid | 0.036* | 0.222* | -0.001 | 0.00 | -0.1* | 0.1969* | 0.6314* | 0.528* | 0.22* | 1.000 |
| size inter | 0.1283* | 0.1584* | -0.0003 | 0.0025 | -0.0* | 0.0584* | 0.4462* | 0.2270* | 0.1281* | 0.2775* |

* = significant at p < 0.001

C.2.1.2. Operator-Level Summary Statistics

I also examined all operators directly responsible for oil and gas facility operations. Summary

statistics for facility operators are as follows:

Figure C.4: Operator-Level Summary Statistics


| Variable | N | Mean | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| VENTING AND FLARING PRACTICES | | | | | |
| Venting and Flaring Operator | 6,135 | 0.11 | 0.32 | 0 | 1 |
| Venting and Flaring Rate (ln) | 6,135 | 0.05 | 0.58 | -13.929 | 37.96377 |
| COMMUNITY ECONOMIC STATUS | | | | | |
| Median Household Income (mean) | 6,135 | 8.99 | 1.86 | 1.15709 | 14 |
| Median Owner Occupied House Value (mean) | 6,135 | 12.00 | 2.68 | 1 | 20.18182 |
| Portion At or Below the Poverty Line (mean) | 6,135 | 13.74 | 7.08 | 0 | 58.36614 |
| COMMUNITY CULTURAL CAPITAL | | | | | |
| Portion Without High School Diploma (mean) | 6,135 | 19.07 | 8.27 | 0 | 56.972 |
| Portion with Limited English Language Fluency (mean) | 6,135 | 4.465024 | 5.613945 | 0 | 44.85981 |
| COMMUNITY ORGANIZATION CAPACITY | | | | | |
| Population Density (mean) | 6,135 | 35.391 | 145.222 | 0.007 | 4612.576 |
| Registered Nonprofit Organizations (mean) | 6,135 | 289.822 | 1051.664 | 0 | 14,502 |
| COMMUNITY RACE AND ETHNICITY | | | | | |
| Portion of Residents that are non-Hispanic Black (mean) | 6,135 | 3.771 | 6.220 | 0 | 88.075 |
| Portion of Residents that are Hispanic (mean) | 6,135 | 24.813 | 21.138 | 0 | 98.74068 |
| STATE REGULATION | | | | | |
| Permitted to Vent or Flare (square mean) | 6,135 | 0.001 | 0.021 | 0 | 1 |
| Venting or Flaring Violations (mean) | 6,135 | 0.003 | 0.085 | 0 | 4.5 |
| Lease Inspections | 6,135 | 22.94442 | 124.2467 | 0 | 3490 |
| FACILITY SIZE | | | | | |
| Oil and Condensate Produced (square mean) | 6,135 | 1.30E+09 | 6.46E+10 | 0 | 3.57E+12 |
| Gas and Casinghead Produced (square mean) | 6,135 | 1.56E+10 | 7.33E+11 | 0 | 4.05E+13 |
| FACILITY COMPLEXITY | | | | | |
| Facility Wellbores (log mean) | 6,135 | 6.265916 | 19.13736 | 1 | 707 |
| ECONOMIC COSTS | | | | | |
| New Wellbores Established (mean) | 6,135 | 0.031 | 0.124 | 0 | 1 |
| Gas wells (mean) | 6,135 | .3180042 | .3615361 | 0 | 1 |
| Other Wellbores within One Mile (mean) | 6,135 | 17.413 | 37.31328 | 0 | 2,179 |
| Nearest Distance to Gas Pipeline (mean) | 6,135 | 13,558.99 | 20,400.24 | 0 | 175,789.10 |
| OPERATOR COMPLEXITY | | | | | |
| Wellbores Controlled by Facility Operator (log) | 6,135 | 2.759 | 1.721 | 0 | 10.3 |
| OPERATOR GAS DEPENDENCE | | | | | |
| Gas Ratio | 6,135 | .5302668 | .4331749 | 0 | 1 |
| ORGANIZATIONAL STRUCTURE | | | | | |
| Subsidiary | 6,135 | .0176039 | .1315174 | 0 | 1 |

C.2.2. Regression Results

My primary analysis involves a facility-level two-part regression model. The development of both the participation and magnitude models is shown below.

Figure C.5: Development of Facility-Level Regression Models

| | Model 1 Part. | Model 1 Mag. | Model 2 Part. | Model 2 Mag. | Model 3 Part. | Model 3 Mag. | Model 4 Part. | Model 4 Mag. |
|---|---|---|---|---|---|---|---|---|
| N | 126,862 | 6,651 | 126,862 | 6,651 | 126,862 | 6,651 | 126,861 | 6,651 |
| Operator Clusters | | | 4,608 | 455 | 4,608 | 455 | 4,608 | 455 |
| R2/Pseudo R2 | 0.0821 | 0.028 | 0.0821 | 0.028 | 0.1681 | 0.4544 | 0.2057 | 0.5570 |
| Constant | -3.7391* | -1.67627* | -3.7391* | -1.676 | -3.3632* | -1.358349 | -3.2315* | 1.001008 |
| SURROUNDING COMMUNITY DEMOGRAPHICS | | | | | | | | |
| median income | NA | -.026922 | NA | -.0269 | NA | -.0102889 | NA | .0639128 |
| housing value | .041787* | -.095357* | .041787 | -.0954 | .0387254 | -.0759229 | NA | -.022374 |
| percent living at or below the poverty line | -.044509* | -.038167* | -.04451* | -.0382 | -.033188 | .0027215 | -.03291* | -.002882 |
| percent without high school diploma | NA | .0077081 | NA | .0077 | NA | -.0037406 | NA | .0041144 |
| percent with limited fluency in the English language | NA | -.022509* | NA | -.023* | NA | -.0083465 | NA | .007259 |
| population density | -.002532* | -.000712 | -.002532 | -.0007 | -.001753 | -.0001171 | -.001491 | -.000062 |
| number of NGOs | -.000225* | -.000209* | -.000225 | -.0002 | -.000068 | .000054 | .000021 | -.00004 |
| percent black | -.011402* | .0227* | -.011402 | .0227 | -.008752 | .0197132 | -.002243 | .0081067 |
| percent Hispanic | .032917* | .0009338 | .032917* | .0009 | .030933* | -.0007944 | .027933* | .0005596 |
| STATE REGULATION | | | | | | | | |
| permit | | | | | 3.08145* | 1.191801* | 2.93243* | 1.22202* |
| violations | | | | | .730726* | .3233757* | .741784* | .268556* |
| FACILITY SIZE | | | | | | | | |
| oil/cond produced | | | | | 9.80e-15 | -1.08e-13* | -1.9e-15 | 1.e-14* |
| csgd/gas produced | | | | | 1.66e-15 | -1.92e-15 | -1.2e-14 | -8.9e-14* |
| FACILITY COMPLEXITY | | | | | | | | |
| facility wellbores | | | | | .0005629 | .0010193* | .0001769 | .000878* |
| ECONOMIC COSTS | | | | | | | | |
| new | | | | | 1.4752* | .391152 | 1.41654* | .4877963 |
| gas wells | | | | | -1.0853* | -4.061098* | -.521976 | -2.9638* |
| wellbores within one mile | | | | | -.003954 | -.0065501* | -.004127 | -.00279* |
| nearest distance to pipeline | | | | | -8.7e-06* | .0000159* | -6.4e-06* | 6.06e-06 |
| OPERATOR SIZE | | | | | | | | |
| oil/cond produced | | | | | | | 6.71e-08* | 2.52e-09* |
| csgd/gas produced | | | | | | | -3.2e-09* | 6.65e-09 |
| OPERATOR COMPLEXITY | | | | | | | | |
| operator wellbores | | | | | | | NA | -.3678* |
| OPERATOR GAS DEPENDENCE | | | | | | | | |
| gas portion | | | | | | | NA | -2.9718* |
| ORGANIZATIONAL STRUCTURE | | | | | | | | |
| subsidiary | | | | | | | .0982878 | .6126543 |
| SIZE INTERACTION | | | | | | | | |
| facility gas production volume x operator gas production volume | | | | | | | 2.4e-15* | -7.8e-15* |

* significant at p < 0.05

C.2.3. Post Regression Analysis

C.2.3.1. Checking For Multicollinearity

Before selecting a regression model, I checked whether multicollinearity would be a problem. To do this, I estimated the regression model and then computed variance inflation factors (VIFs), as shown below.

Figure C.6: Table of Variance Inflation Factor Scores for Facility Participation Analysis

| Variable | VIF | 1/VIF |
|---|---|---|
| SURROUNDING COMMUNITY DEMOGRAPHICS | | |
| percent living at or below the poverty line | 3.89 | 0.257054 |
| population density | 1.41 | 0.707269 |
| number of NGOs | 1.45 | 0.691777 |
| percent black | 1.37 | 0.730625 |
| percent Hispanic | 2.90 | 0.345352 |
| STATE REGULATION | | |
| permit | 1.03 | 0.971903 |
| violation | 1.00 | 0.997537 |
| FACILITY SIZE | | |
| oil/cond produced | 2.40 | 0.417042 |
| csgd/gas produced | 2.39 | 0.418156 |
| FACILITY COMPLEXITY | | |
| facility wellbores | 1.07 | 0.930883 |
| ECONOMIC COSTS | | |
| new | 1.10 | 0.907388 |
| gas well | 2.60 | 0.384956 |
| wellbores within one mile | 1.00 | 0.999919 |
| nearest distance to pipeline | 1.17 | 0.855453 |
| OPERATOR SIZE | | |
| csgd/gas produced | 2.70 | 0.370656 |
| oil/cond produced | 1.35 | 0.740576 |
| ORGANIZATIONAL STRUCTURE | | |
| subsidiary | 2.20 | 0.454092 |
| SIZE INTERACTION | | |
| facility gas production volume * operator gas production volume | 1.40 | 0.715139 |
| Mean VIF | 1.80 | |

While there was moderate correlation among the explanatory variables, no VIF score in the final specification exceeded 5, so multicollinearity was not determined to be a problem. To arrive at this specification, I removed various community and operator variables with VIF scores substantially larger than 5 in order to ensure confidence in the model estimates.
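For readers unfamiliar with the statistic, the VIF for variable k equals 1 / (1 - R_k^2), where R_k^2 comes from regressing variable k on the other explanatory variables, and the 1/VIF column is the corresponding tolerance. The sketch below is one way to compute it, using the equivalent fact that the VIFs are the diagonal of the inverse correlation matrix. It is not the procedure used in the study; the function names and toy predictors are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """Return the VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical toy predictors: x2 tracks x1 closely, x3 is nearly unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
vifs = variance_inflation_factors([x1, x2, x3])
tolerances = [1.0 / v for v in vifs]  # corresponds to the 1/VIF column
```

In this toy example the two nearly collinear predictors receive very large VIFs while the third stays close to 1, which is the pattern a threshold such as 5 is meant to flag.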

Figure C.7: Table of Variance Inflation Factor Scores for Facility Magnitude Analysis

| Variable | VIF | 1/VIF |
|---|---|---|
| SURROUNDING COMMUNITY DEMOGRAPHICS | | |
| median income | 3.16 | 0.316646 |
| median owner occupied housing value | 1.93 | 0.518510 |
| percent living at or below the poverty line | 2.48 | 0.403658 |
| percent without high school diploma | 5.00 | 0.200166 |
| percent with limited fluency in the English language | 4.96 | 0.201544 |
| population density | 1.26 | 0.790534 |
| number of NGOs | 1.31 | 0.763892 |
| percent black | 1.17 | 0.853475 |
| percent Hispanic | 5.27 | 0.189803 |
| STATE REGULATION | | |
| permit | 1.07 | 0.933906 |
| violation | 1.01 | 0.989903 |
| FACILITY SIZE | | |
| oil/cond produced | 1.50 | 0.665079 |
| csgd/gas produced | 1.48 | 0.677098 |
| FACILITY COMPLEXITY | | |
| facility wellbores | 1.41 | 0.711328 |
| ECONOMIC COSTS | | |
| new | 1.17 | 0.854952 |
| gas well | 3.49 | 0.286199 |
| wellbores within one mile | 1.43 | 0.701469 |
| nearest distance to pipeline | 1.18 | 0.849737 |
| OPERATOR SIZE | | |
| csgd/gas produced | 2.31 | 0.433732 |
| oil/cond produced | 2.31 | 0.433381 |
| OPERATOR COMPLEXITY | | |
| operator wellbores | 2.48 | 0.402839 |
| OPERATOR GAS DEPENDENCE | | |
| gas portion | 2.93 | 0.341042 |
| ORGANIZATIONAL STRUCTURE | | |
| subsidiary | 1.19 | 0.843047 |
| SIZE INTERACTION | | |
| facility gas production volume * operator gas production volume | 2.02 | 0.495082 |
| Mean VIF | 2.23 | |

While there was moderate correlation among the explanatory variables, no VIF score was much larger than 5 (the largest, for percent Hispanic, was 5.27), so multicollinearity was not determined to be a problem.

C.2.3.2. Checking Residuals

Finally, I examined the residual distributions of the regression models used in the analysis.

Figure C.8: Final Participation Regression Model Residual Distribution


As Figure C.8 shows, the facility-level logit model predicts most observations well. The average residual (residuals ranged from -12.96196 to 18.50433) is -.0039995, with a standard deviation of .9616276. The model assumptions that the residuals are close to normal and approximately independently distributed are not significantly violated.

Figure C.9: Final Magnitude Regression Model Residual Distribution

The Ordinary Least Squares (OLS) regression model assumptions that the residuals are close to normal and approximately independently distributed are not significantly violated. The average residual (residuals ranged from -9.63593 to 8.536046) is -6.56e-10, with a standard deviation of 1.968609.


C.3. References

Huber, Peter. 1967. “The behavior of maximum likelihood estimates under nonstandard conditions.” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. pp. 221–233.