Posts

Showing posts from December 10, 2018

Programmatically identified cookie is not getting accepted

I am working on a web scraper in Python 2 that reads some of the contents of a website. To access the contents, I need to pass a cookie. Right now, I find the cookie by opening the website in Chrome and copying it from the site information, then hardcode it into my scraper. However, the cookie is invalidated after a few hours, and then no information can be extracted from the website. To address this, I am trying to refresh the cookie from within the scraper whenever a new one is needed. I have tried the following two approaches.

First approach:

    import requests
    import browsercookie

    try:
        cj = browsercookie.chrome()
        session = requests.Session()
        r = session.get(base_url, cookies=cj)
        new_cookie = str(session.cookies.get_dict()['JSESSIONID

Azerbaijan at the 2004 Summer Olympics

National Olympic Committee
IOC code: AZE
Name: National Olympic Committee of the Azerbaijani Republic
Official site (in Azerbaijani)

2004 Summer Olympics
Host: Athens, Greece
Competitors: 36 (30 men, 6 women) in 10 sports
Flag bearer: Nizami Paşayev
Medals: rank 50, with 1 gold, 0 silver, 4 bronze, 5 total

Olympic appearances
Summer: 1996 • 2000 • 2004 • 2008 • 2012 • 2016
Winter: 1998 • 2002 • 2006 • 2010 • 2014 • 2018

Other related appearances
Russian Empire (RU1), 1900–1912
Soviet Union (URS), 1952–1988
Unified Team (EUN), 1992

PySpark apply same StringIndexer on multiple columns

I have the following DataFrame:

+--------------+---------------+
|       SrcAddr|        DstAddr|
+--------------+---------------+
| 192.168.100.5| 192.168.220.16|
| 192.168.100.5| 192.168.220.15|
|192.168.220.15|  192.168.100.5|
|192.168.220.16|  192.168.100.5|
| 192.168.100.5| 192.168.220.15|
|192.168.220.16|  192.168.100.5|
| 192.168.220.9|  192.168.100.5|
| 192.168.100.5|  192.168.220.9|
| 192.168.220.9|  192.168.100.5|
+--------------+---------------+

It contains source and destination IP addresses. I want to transform them into numerical indices by means of StringIndexer, but I want to learn a common mapping shared by both columns. Unfortunately, StringIndexer does not provide such a rich interface in PySpark. I therefore found a workaround, but I wanted to know if there is a bet
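
To make the idea of a "common mapping" concrete, here is a minimal sketch in plain Python, without Spark. It is not the workaround from the question and does not use the StringIndexer API. It pools the values of both columns, ranks the labels by descending frequency (the ordering StringIndexer uses by default), and applies the single resulting mapping to every column. The function names `fit_shared_index` and `apply_shared_index` are hypothetical, and breaking ties alphabetically is an assumption made here for determinism.

```python
from collections import Counter


def fit_shared_index(rows):
    """Build one label -> index mapping from the values of every column."""
    counts = Counter(value for row in rows for value in row)
    # Most frequent label gets index 0; ties are broken alphabetically
    # (an assumption of this sketch, chosen so the result is deterministic).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: idx for idx, label in enumerate(ordered)}


def apply_shared_index(rows, mapping):
    """Replace every value in every column with its shared index."""
    return [tuple(mapping[value] for value in row) for row in rows]


# The (SrcAddr, DstAddr) pairs from the DataFrame in the question.
rows = [
    ("192.168.100.5", "192.168.220.16"),
    ("192.168.100.5", "192.168.220.15"),
    ("192.168.220.15", "192.168.100.5"),
    ("192.168.220.16", "192.168.100.5"),
    ("192.168.100.5", "192.168.220.15"),
    ("192.168.220.16", "192.168.100.5"),
    ("192.168.220.9", "192.168.100.5"),
    ("192.168.100.5", "192.168.220.9"),
    ("192.168.220.9", "192.168.100.5"),
]

mapping = fit_shared_index(rows)
indexed = apply_shared_index(rows, mapping)

for original, encoded in zip(rows, indexed):
    print(original, "->", encoded)
```

Because the mapping is fitted once on the pooled values, the same address receives the same index whether it appears as a source or as a destination. In Spark, a comparable effect could presumably be obtained by fitting a single indexer on a union of the two columns and reusing the fitted model for each column.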